Convert CSV records to TSV: Difference between revisions

m (→‎{{header|Wren}}: Changed to Wren S/H)
 
(14 intermediate revisions by 4 users not shown)
Line 14:
 
; A CSV record
[[Category:PEG]]
Our starting point will be a character set that includes ASCII; the language
of regular expressions (which will be denoted by strings of the form
r'REGEX'); and the following [[:Category:PEG|PEG]] (parsing expression grammar) for a
single CSV record:
<pre>
Line 113 ⟶ 114:
aRETURNb, Be sure to change RETURN to the '\r' control character (#xd)
a\b
</pre>
 
[[category:CSV]]
[[category:TSV]]
 
=={{header|ALGOL 68}}==
All input \ characters are doubled in the output.<br>
As with some of the other samples, the input data is stored in an array, not read from a file.
<syntaxhighlight lang="algol68">
BEGIN # convert lines of CSV to TSV #
CHAR nul = REPR 0;
CHAR tab = REPR 9;
CHAR lf = REPR 10;
CHAR cr = REPR 13;
# returns s with some control characters converted to <name> #
PROC show ctrl = ( STRING s )STRING:
BEGIN
STRING result := "";
FOR i FROM LWB s TO UPB s DO
result +:= IF s[ i ] = nul THEN "<nul>"
ELIF s[ i ] = tab THEN "<tab>"
ELIF s[ i ] = cr THEN "<cr>"
ELIF s[ i ] = lf THEN "<lf>"
ELSE s[ i ]
FI
OD;
result
END # show ctrl # ;
# returns csv converted to TSV #
PROC csv2tsv = ( STRING csv )STRING:
BEGIN
BOOL at end := FALSE;
CHAR ch := nul;
# sets ch to the next character in csv, if there is one #
PROC next = VOID: ch := IF c pos < c max
THEN csv[ c pos +:= 1 ]
ELSE at end := TRUE
; nul
FI;
# skips over spaces and returns the count of skipped spaces #
PROC spaces = INT:
BEGIN
INT s count := 0;
WHILE NOT at end AND ch = " " DO s count +:= 1; next OD;
s count
END # spaces # ;
# adds ch to the TSV - converting some characters to escaped form #
PROC add = VOID: tsv +:= IF ch = "\" THEN "\\"
ELIF ch = nul THEN "\0"
ELIF ch = cr THEN "\r"
ELIF ch = lf THEN "\n"
ELIF ch = tab THEN "\t"
ELSE ch
FI;
# parse the csv and generate the tsv #
STRING tsv := "";
INT c pos := LWB csv - 1;
INT c max = UPB csv;
WHILE NOT at end DO
# spaces are not significant around quoted fields but are part of unquoted fields #
next;
INT space count := spaces;
IF ch = """" THEN
# quoted field part #
WHILE next;
WHILE NOT at end AND ch /= """" DO add; next OD;
IF NOT at end THEN
next;
IF ch = """" THEN
# embedded quote #
add
FI
FI;
NOT at end AND ch = """"
DO SKIP OD;
space count := spaces;
IF at end OR ch = "," THEN
# nothing significant after the quoted field part #
space count := 0
FI
FI;
# unquoted field part #
tsv +:= space count * " ";
WHILE NOT at end AND ch /= "," DO add; next OD;
IF ch = "," THEN
# have another field following this one #
tsv +:= tab
FI
OD;
tsv
END # csv2tsv # ;
# task test cases #
[]STRING tests =
( "a,""b"""
, """a"",""b""""c"""
, ""
, ",a"
, "a,"""
, " a , ""b"""
, """12"",34"
, "a" + tab + "b, That is a TAB character"
, "a\tb"
, "a\n\rb"
, "a" + nul + "b, That is a NUL character"
, "a" + cr + "b, Be sure to change RETURN to the '\r' control character (#xd)"
, "a\b"
);
FOR i FROM LWB tests TO UPB tests DO
print( ( " {{", show ctrl( tests[ i ] ), "}}", newline
, " -> {{", show ctrl( csv2tsv( tests[ i ] ) ), "}}", newline
)
)
OD
END
</syntaxhighlight>
{{out}}
<pre>
{{a,"b"}}
-> {{a<tab>b}}
{{"a","b""c"}}
-> {{a<tab>b"c}}
{{}}
-> {{}}
{{,a}}
-> {{<tab>a}}
{{a,"}}
-> {{a<tab>}}
{{ a , "b"}}
-> {{ a <tab>b}}
{{"12",34}}
-> {{12<tab>34}}
{{a<tab>b, That is a TAB character}}
-> {{a\tb<tab> That is a TAB character}}
{{a\tb}}
-> {{a\\tb}}
{{a\n\rb}}
-> {{a\\n\\rb}}
{{a<nul>b, That is a NUL character}}
-> {{a\0b<tab> That is a NUL character}}
{{a<cr>b, Be sure to change RETURN to the '\r' control character (#xd)}}
-> {{a\rb<tab> Be sure to change RETURN to the '\\r' control character (#xd)}}
{{a\b}}
-> {{a\\b}}
</pre>
 
Line 157 ⟶ 301:
Here, also, we interpret "nonsense" as starting immediately after a closing quote which is not followed by a delimiter, and ending immediately before the next newline.
 
For csv parsing we first break out fields using <b><tt>[[j:Vocabulary/semico#dyadic|;:]]</tt></b>. Here, each field is preceded by a delimiter. (We discard an optional trailing newline from the csv text and prepend a newline at the beginning so that every field has a preceding delimiter. Also, of course, if we were given a file reference, we work with the text of the file rather than its name.)
 
Then, these fields are formed into rows (fields which began immediately following newlines start a new row), and each field is stripped of delimiters and non-textual quotes are removed. The result is a two-dimensional matrix: the result of csv2mat.
 
To translate to tsv form, we would first escape special characters in each field, then insert delimiters between each field and terminate each record with a newline. Thus, mat2tsv takes a two-dimensional matrix and its result is a tsv string. (For utility, mat2tsv also supports numeric matrices, since that was trivial.)
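
As a language-neutral illustration of the three steps just described (escape, delimit, terminate), here is a minimal Python sketch. It is not the J implementation: the names <code>ESCAPES</code> and <code>escape_field</code> are hypothetical, and the set of escaped characters is assumed from the escapes used elsewhere on this page (backslash, TAB, NUL, LF, CR).

```python
# Assumed escape table: backslash, TAB, LF, CR and NUL, as used by other entries.
ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r", "\0": "\\0"}


def escape_field(field):
    """Escape special characters in one field; non-strings are stringified first."""
    return "".join(ESCAPES.get(ch, ch) for ch in str(field))


def mat2tsv(rows):
    """Join escaped fields with TABs and terminate every record with a newline."""
    return "".join(
        "\t".join(escape_field(f) for f in row) + "\n" for row in rows
    )


if __name__ == "__main__":
    # A text row followed by a numeric row.
    print(mat2tsv([["a", "b\tc"], [1, 2]]), end="")
```

Escaping happens per field before joining, so a TAB inside a field can never be confused with the delimiter added afterwards.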
 
Task example:
Line 183 ⟶ 327:
 
'''The following program can also be used with gojq, the Go implementation of jq,
but until recently NUL (#x0) was left unaltered.'''
 
In this entry, the PEG grammar for "record" as defined in the task
Line 189 ⟶ 333:
closely to jq operators, notably PEG's '/' to jq's '//'.
 
In translating the PEG grammar to a jq program, the main idea is to define a
pipeline for each grammar rule. A JSON object with keys "remainder" and
"result" is passed through this pipeline, consuming the text in .remainder and
Line 227 ⟶ 371:
 
def record: field | star(consume(",") | field);
 
def parse: {remainder: .} | record | .result;
 
def csv2tsv:
  parse
  | @tsv ;
 
# Transform an entire file: assuming jq is invoked with the -n option
inputs | csv2tsv
</syntaxhighlight>
{{output}}
As required:
* Backslashes are uniformly duplicated.
* Until recently gojq did not handle NUL (#x0) properly.
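
For readers unfamiliar with jq, the "pipeline per grammar rule" idea used in this entry can be sketched in Python. This is only an illustration under stated assumptions: the helpers <code>consume</code>, <code>seq</code> and <code>star</code> are hypothetical analogues of the jq definitions, a failed match is modelled as <code>None</code>, and <code>field</code> is a deliberately simplified stand-in that accepts unquoted fields only, so it does not implement the task's full grammar.

```python
def consume(literal):
    """Rule: drop a literal prefix from the remainder without recording it."""
    def rule(state):
        if state["remainder"].startswith(literal):
            rest = state["remainder"][len(literal):]
            return {"remainder": rest, "result": state["result"]}
        return None  # no match
    return rule


def seq(*rules):
    """Rule: apply the rules in order; fail if any of them fails."""
    def rule(state):
        for r in rules:
            state = r(state)
            if state is None:
                return None
        return state
    return rule


def star(inner):
    """Rule: apply inner zero or more times, stopping at the first failure."""
    def rule(state):
        while True:
            nxt = inner(state)
            if nxt is None or nxt["remainder"] == state["remainder"]:
                return state  # failure, or no progress: stop repeating
            state = nxt
    return rule


def field(state):
    """Simplified stand-in: an unquoted field runs up to the next comma."""
    text = state["remainder"]
    end = text.find(",")
    if end == -1:
        end = len(text)
    return {"remainder": text[end:], "result": state["result"] + [text[:end]]}


# record <- field ("," field)*
record = seq(field, star(seq(consume(","), field)))


def parse(text):
    """Thread a {remainder, result} state through the record rule."""
    return record({"remainder": text, "result": []})["result"]


if __name__ == "__main__":
    print(parse("a,b,,c"))
```

Each rule receives and returns the same two-key state, which is what allows rules to be chained without any shared mutable parser object.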
 
=={{header|Julia}}==
{{trans|Phix}}
<syntaxhighlight lang="julia">function csv_tsv(str)
p = split(str, ",")
for (i, f) in enumerate(p)
if count(==('"'), f) > 1
p[i] = replace(strip(f, [' ', '"']), "\"\"" => "\"")
elseif f == "\""
p[i] = ""
end
end
t = join(p,"<TAB>")
s = replace(str, "\\" => "\\\\", "\t" => "\\t", "\0" => "\\0", "\n" => "\\n", "\r" => "\\r")
t = replace(t, "\\" => "\\\\", "\t" => "\\t", "\0" => "\\0", "\n" => "\\n", "\r" => "\\r")
return s, t
end
 
const testfile = "test.tmp"
fh = open(testfile, "w")
 
write(fh, """
a,"b"
"a","b""c"
 
,a
a,"
a , "b"
"12",34
a\tb, TAB
a\\tb
a\\n\\rb
a\0b, NUL
a\rb, RETURN
a\\b""")
 
close(fh)
 
for test_string in split(read(testfile, String), "\n")
csv, tsv = csv_tsv(test_string)
println(lpad(csv, 12), " => ", tsv)
end
</syntaxhighlight>{{out}}
<pre>
a,"b" => a<TAB>b
"a","b""c" => a<TAB>b"c
=>
,a => <TAB>a
a," => a<TAB>
a , "b" => a <TAB>b
"12",34 => 12<TAB>34
a\tb, TAB => a\tb<TAB> TAB
a\\tb => a\\tb
a\\n\\rb => a\\n\\rb
a\0b, NUL => a\0b<TAB> NUL
a\rb, RETURN => a\rb<TAB> RETURN
a\\b => a\\b
</pre>
 
=={{header|Phix}}==
Line 296 ⟶ 496:
{{libheader|Wren-str}}
Backslashes are only duplicated for escaped \t, \n and \r.
<syntaxhighlight lang="wren">import "./ioutil" for FileUtil
import "./str" for Str
 