Text processing/1

 
Maximum run of 589 consecutive false readings ends at 1993-03-05</pre>
 
=={{header|jq}}==
{{works with|jq|with foreach}}
 
This article highlights jq's recently added "foreach" and "inputs" filters,
as they allow the input file to be processed efficiently on a line-by-line basis,
with minimal memory requirements.
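As a minimal sketch of line-by-line reading with "inputs", the following filter prints the first tab-separated field of every line. The file name first_field.jq is hypothetical. The filter assumes jq is invoked with -n (so that jq does not consume the first line itself, leaving all lines to "inputs") and -R (so that each line is read as a raw string rather than parsed as JSON).

<lang jq># Sketch: emit the first tab-separated field (the date) of every line.
# Assumed invocation: jq -n -R -r -f first_field.jq readings.txt
inputs | split("\t") | .[0]</lang>

Because "inputs" produces one line at a time, only the current line needs to be held in memory.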
 
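The "foreach" syntax is:
<lang jq>foreach STREAM as $row ( INITIAL; EXPRESSION; VALUE )</lang>
The basic idea is that the state starts as INITIAL; for each $row in STREAM, EXPRESSION computes the new state from the current one, and the value specified by VALUE (evaluated against that new state) is emitted.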
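A small illustration, independent of the task data, may help: the following sketch computes running totals over a three-item stream. The state is a single number, and each emitted value pairs the current item with the total so far.

<lang jq># State starts at 0; each $row is added to the state, and
# [$row, running_total] is emitted once per $row.
# Run with: jq -n -c
foreach (1, 2, 3) as $row (0; . + $row; [$row, .])</lang>

This emits [1,1], [2,3] and [3,6] in that order.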
 
If we wished only to produce per-line synopses of the "readings.txt"
file, the following pattern could be used:
<lang jq>foreach (inputs | split("\t")) as $line (INITIAL; EXPRESSION; VALUE)</lang>
In order to distinguish the single-line synopsis from the whole-file synopsis, we will use the following pattern instead:
<lang jq>foreach ((inputs | split("\t")), null) as $line (INITIAL; EXPRESSION; VALUE)</lang>
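The trailing "null" serves as an end-of-input marker: when $line is null, all lines have been read, so the whole-file synopsis can be emitted instead of a per-line value. This relies on no genuine item in the stream being null, which holds here because "split" always yields an array.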
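The sentinel pattern can be sketched on a toy stream as follows. The field names "count", "item" and "done" are chosen for illustration only and are not those used in the program below.

<lang jq># Emit each item as it arrives, then a total once the null sentinel is seen.
# Run with: jq -n -r
foreach (("a", "b", "c"), null) as $x
  ( {"count": 0, "done": false};
    if $x == null then .done = true      # sentinel: end of stream
    else .count += 1 | .item = $x        # ordinary item
    end;
    if .done then "total: \(.count)" else .item end )</lang>

This emits a, b and c, followed by "total: 3".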
 
In this section, the whole-file synopsis focuses on runs of consecutive lines that each have at least one flag<=0. The maximal length of such runs is computed, and the starting line(s) and date(s) of all runs of that maximal length are recorded.
 
One point of interest in the following program is the use of JSON objects to store values. This allows mnemonic names to be used instead of local variables.
<lang jq># Input: { "max": max_run_length,
# "starts": array_of_start_line_values, # of all the maximal runs
# "start_dates": array_of_start_dates # of all the maximal runs
# }
def report:
(.starts | length) as $l
| if $l == 1 then
"There is one maximal run of lines with flag<=0.",
"The maximal run has length \(.max) and starts at line \(.starts[0]) and has start date \(.start_dates[0])."
elif $l == 0 then
"There are no lines with flag<=0."
else
"There are \($l) maximal runs of lines with flag<=0.",
"These runs have length \(.max) and start at the following line numbers:",
"\(.starts)",
"The corresponding dates are:",
"\(.start_dates)"
end;
 
# "process" processes "tab-separated string values" on stdin
def process:
 
# Given a line in the form of an array [date, datum1, flag1, ...],
# "synopsis" returns [ number of data items on the line with flag>0, sum, number of data items on the line with flag<=0 ]
def synopsis: # of a line
. as $row
| reduce range(0; (length - 1) / 2) as $i
( [0,0,0];
($row[1+ (2*$i)] | tonumber) as $datum
| ($row[2+(2*$i)] | tonumber) as $flag
| if ($flag>0) then .[0] += 1 | .[1] += $datum else .[2] += 1 end );
 
# state: {"line": line_number, # (first line is line 0)
# "synopsis": _, # value returned by "synopsis"
# "start": line_number_of_start_of_current_run,
# "start_date": date_of_start_of_current_run,
# "length": length_of_current_run # so far
# "max": max_run_length # so far
# "starts": array_of_start_values # of all the maximal runs
# "start_dates": array_of_start_dates # of all the maximal runs
# }
foreach ((inputs | split("\t")), null) as $line # null signals END
# Slots are effectively initialized by default to null
( { "line": -1, "length": 0, "max": 0, "starts": [], "start_dates": [] };
if $line == null then .line = null
else
.line += 1
# | debug
# synopsis returns [number with flag>0, sum, number with flag<=0 ]
| .synopsis = ($line | synopsis)
| if .synopsis[2] > 0 then
if .start then . else .start = .line | .start_date = $line[0] end
| .length += 1
| if .max < .length then
(.max = .length)
| .starts = [ .start ]
| .start_dates = [ .start_date ]
elif .max == .length then
.starts += [ .start ]
| .start_dates += [ .start_date ]
else .
end
else .start = null | .length = 0
end
end;
.)
| if .line == null then {max, starts, start_dates} | report
else .synopsis
end;
 
process</lang>
 
{{out}}
<lang sh>$ jq -c -n -R -r -f Text_processing_1.jq readings.txt
[22,590,2]
[24,410,0]
...
[23,47.3,1]
There is one maximal run of lines with flag<=0.
The maximal run has length 93 and starts at line 5378 and has start date 2004-09-30.</lang>
 
=={{header|Lua}}==