Convert CSV records to TSV: Difference between revisions
m
→{{header|Wren}}: Changed to Wren S/H
m (→{{header|J}}) |
m (→{{header|Wren}}: Changed to Wren S/H) |
||
(14 intermediate revisions by 4 users not shown) | |||
Line 14:
; A CSV record
[[Category:PEG]]
Our starting point will be a character set that includes ASCII; the language
of regular expressions (which will be denoted by strings of the form
r'REGEX'); and the following [[:Category:PEG|PEG]] (parsing expression grammar) grammar for a
single CSV record:
<pre>
Line 113 ⟶ 114:
aRETURNb, Be sure to change RETURN to the '\r' control character (#xd)
a\b
</pre>
[[category:CSV]]
[[category:TSV]]
=={{header|ALGOL 68}}==
All input \ characters are doubled in the output.<br>
As with some of the other samples, the input data is stored in an array, not read from a file.
<syntaxhighlight lang="algol68">
BEGIN # convert lines of CSV to TSV #
CHAR nul = REPR 0;
CHAR tab = REPR 9;
CHAR lf = REPR 10;
CHAR cr = REPR 13;
# returns s with some control characters converted to <name> #
PROC show ctrl = ( STRING s )STRING:
BEGIN
STRING result := "";
FOR i FROM LWB s TO UPB s DO
result +:= IF s[ i ] = nul THEN "<nul>"
ELIF s[ i ] = tab THEN "<tab>"
ELIF s[ i ] = cr THEN "<cr>"
ELIF s[ i ] = lf THEN "<lf>"
ELSE s[ i ]
FI
OD;
result
END # show ctrl # ;
# returns csv converted to TSV #
PROC csv2tsv = ( STRING csv )STRING:
BEGIN
BOOL at end := FALSE;
CHAR ch := nul;
# sets ch to the next character in csv, if there is one #
PROC next = VOID: ch := IF c pos < c max
THEN csv[ c pos +:= 1 ]
ELSE at end := TRUE
; nul
FI;
# skips over spaces and returns the count of skipped spaces #
PROC spaces = INT:
BEGIN
INT s count := 0;
WHILE NOT at end AND ch = " " DO s count +:= 1; next OD;
s count
END # spaces # ;
# adds ch to the TSV - converting some characters to escaped form #
PROC add = VOID: tsv +:= IF ch = "\" THEN "\\"
ELIF ch = nul THEN "\0"
ELIF ch = cr THEN "\r"
ELIF ch = lf THEN "\n"
ELIF ch = tab THEN "\t"
ELSE ch
FI;
# parse the csv and generate the tsv #
STRING tsv := "";
INT c pos := LWB csv - 1;
INT c max = UPB csv;
WHILE NOT at end DO
# spaces are not significant around quoted fields but are part of unquoted fields #
next;
INT space count := spaces;
IF ch = """" THEN
# quoted field part #
WHILE next;
WHILE NOT at end AND ch /= """" DO add; next OD;
IF NOT at end THEN
next;
IF ch = """" THEN
# embedded quote #
add
FI
FI;
NOT at end AND ch = """"
DO SKIP OD;
space count := spaces;
IF at end OR ch = "," THEN
# nothing significant after the quoted field part #
space count := 0
FI
FI;
# unquoted field part #
tsv +:= space count * " ";
WHILE NOT at end AND ch /= "," DO add; next OD;
IF ch = "," THEN
# have another field following this one #
tsv +:= tab
FI
OD;
tsv
END # csv2tsv # ;
# task test cases #
[]STRING tests =
( "a,""b"""
, """a"",""b""""c"""
, ""
, ",a"
, "a,"""
, " a , ""b"""
, """12"",34"
, "a" + tab + "b, That is a TAB character"
, "a\tb"
, "a\n\rb"
, "a" + nul + "b, That is a NUL character"
, "a" + cr + "b, Be sure to change RETURN to the '\r' control character (#xd)"
, "a\b"
);
FOR i FROM LWB tests TO UPB tests DO
print( ( " {{", show ctrl( tests[ i ] ), "}}", newline
, " -> {{", show ctrl( csv2tsv( tests[ i ] ) ), "}}", newline
)
)
OD
END
</syntaxhighlight>
{{out}}
<pre>
{{a,"b"}}
-> {{a<tab>b}}
{{"a","b""c"}}
-> {{a<tab>b"c}}
{{}}
-> {{}}
{{,a}}
-> {{<tab>a}}
{{a,"}}
-> {{a<tab>}}
{{ a , "b"}}
-> {{ a <tab>b}}
{{"12",34}}
-> {{12<tab>34}}
{{a<tab>b, That is a TAB character}}
-> {{a\tb<tab> That is a TAB character}}
{{a\tb}}
-> {{a\\tb}}
{{a\n\rb}}
-> {{a\\n\\rb}}
{{a<nul>b, That is a NUL character}}
-> {{a\0b<tab> That is a NUL character}}
{{a<cr>b, Be sure to change RETURN to the '\r' control character (#xd)}}
-> {{a\rb<tab> Be sure to change RETURN to the '\\r' control character (#xd)}}
{{a\b}}
-> {{a\\b}}
</pre>
Line 157 ⟶ 301:
Here, also, we interpret "nonsense" as starting immediately after a closing quote which is not followed by a delimiter, and ending immediately before the most immediately following newline.
For csv parsing we first break out fields using <b><tt>[[j:Vocabulary/semico#dyadic|;:]]</tt></b>. Here, each field is preceded by a delimiter. (We discard an optional trailing newline from the csv text and prepend a newline at the beginning so that every field has a preceding delimiter. Also, of course, if we were given a file reference, we work with the text of the file rather than its name.)
Then, these fields are formed into rows (fields which
To translate to tsv form, we would first escape special characters in each field, then insert delimiters between each field and terminate each record with a newline. Thus, mat2tsv takes a two-dimensional matrix and its result is a tsv string. (For utility, mat2tsv also supports numeric matrices, since that was trivial.)
Task example:
Line 183 ⟶ 327:
'''The following program can also be used with gojq, the Go implementation of jq,
but until recently NUL (#x0)
In this entry, the PEG grammar for "record" as defined in the task
Line 189 ⟶ 333:
closely to jq operators, notably PEG's '/' to jq's '//'.
In translating the PEG grammar to a jq program, the main idea is to define a
pipeline for each grammar rule. A JSON object with keys "remainder" and
"result" is passed through this pipeline, consuming the text in .remainder and
Line 227 ⟶ 371:
def record: field | star(consume(",") | field);
def csv2tsv:
{remainder: .} | record | .result | @tsv ;
# Transform an entire file
inputs | csv2tsv
</syntaxhighlight>
{{output}}
As required:
* Backslashes are uniformly duplicated.
* Until recently gojq did not handle NUL (#x0) properly.
=={{header|Julia}}==
{{trans|Phix}}
<syntaxhighlight lang="julia">function csv_tsv(str)
p = split(str, ",")
for (i, f) in enumerate(p)
if count(==('"'), f) > 1
p[i] = replace(strip(f, [' ', '"']), "\"\"" => "\"")
elseif f == "\""
p[i] = ""
end
end
t = join(p,"<TAB>")
s = replace(str, "\\" => "\\\\", "\t" => "\\t", "\0" => "\\0", "\n" => "\\n", "\r" => "\\r")
t = replace(t, "\\" => "\\\\", "\t" => "\\t", "\0" => "\\0", "\n" => "\\n", "\r" => "\\r")
return s, t
end
const testfile = "test.tmp"
fh = open(testfile, "w")
write(fh, """
a,"b"
"a","b""c"
,a
a,"
a , "b"
"12",34
a\tb, TAB
a\\tb
a\\n\\rb
a\0b, NUL
a\rb, RETURN
a\\b""")
close(fh)
for test_string in split(read(testfile, String), "\n")
csv, tsv = csv_tsv(test_string)
println(lpad(csv, 12), " => ", tsv)
end
</syntaxhighlight>{{out}}
<pre>
a,"b" => a<TAB>b
"a","b""c" => a<TAB>b"c
=>
,a => <TAB>a
a," => a<TAB>
a , "b" => a <TAB>b
"12",34 => 12<TAB>34
a\tb, TAB => a\tb<TAB> TAB
a\\tb => a\\tb
a\\n\\rb => a\\n\\rb
a\0b, NUL => a\0b<TAB> NUL
a\rb, RETURN => a\rb<TAB> RETURN
a\\b => a\\b
</pre>
=={{header|Phix}}==
Line 296 ⟶ 496:
{{libheader|Wren-str}}
Backslashes are only duplicated for escaped \t, \n and \r.
<syntaxhighlight lang="
import "./str" for Str
|