Text processing/1: Difference between revisions

(→‎Process whole file at once: Add example output)
(→‎Process file in chunks: add commentary)
Line 648: Line 648:
<lang j>
<lang j>
'Dates DailySumry Flags'=: mungeDataBlocks jpath '~temp/readings.txt'
'Dates DailySumry Flags'=: mungeDataBlocks jpath '~temp/readings.txt'
Line: Accept: Line_tot: Line_avg:
NB. output as for example above
2004-12-28 23 77.800 3.383
$ each Dates;DailySumry;Flags
2004-12-29 23 56.300 2.448
NB. output as for example above
2004-12-30 23 65.300 2.839
2004-12-31 23 47.300 2.057

Total: 1358393.400
Readings: 129403
Average: 10.497

Maximum run(s) of 589 consecutive false readings ends at line(s) starting with date(s): 1993-03-05

$ each Dates;DailySumry;Flags NB. show array shapes (rows cols) of each of the nouns defined
+-------+------+-------+
|5471 10|5471 3|5471 24|
+-------+------+-------+
</lang>
</lang>


==== Process lines at a time ====
==== Process lines at a time ====
The <tt>fapplylines</tt> adverb reads the file in chunks of 1,000,000 bytes, retaining any trailing partial line in a buffer. It then processes the complete lines one at a time before reading the next chunk from the file.
Because this results in more operations on smaller arrays it is likely to be slower in J than the preceding approaches.

Because this approach results in more operations on smaller arrays it is likely to be slower in J than the preceding methods.
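
The chunk-and-buffer idea can be sketched as follows. This is only an illustration and not the library definition of <tt>fapplylines</tt>: the names <tt>splitChunk</tt> and <tt>applyLinesChunked</tt> are hypothetical, and the sketch assumes the text is already held in memory as a character list with LF line endings, rather than being read from a file.
<lang j>
NB. Hypothetical helper: split a chunk of text into its complete lines
NB. and the trailing partial line (both returned boxed).
splitChunk=: monad define
  i=. y i: LF                        NB. index of the last LF; equals #y when there is none
  if. i = #y do.
    '' ; y                           NB. no complete line yet: keep everything as remainder
  else.
    ((>: i) {. y) ; (>: i) }. y      NB. complete lines (with LFs) ; trailing partial line
  end.
)

NB. Hypothetical adverb: apply verb u to each line of text y,
NB. consuming y in chunks of at most x characters.
applyLinesChunked=: adverb define
:
  buf=. ''                           NB. carries a partial line over to the next chunk
  res=. ''
  while. #y do.
    n=. x <. #y
    buf=. buf , n {. y               NB. append the next chunk to the buffer
    y=. n }. y
    'lines buf'=. splitChunk buf
    res=. res , u each <;._2 lines   NB. process the complete lines one at a time
  end.
  if. #buf do. res=. res , < u buf end.   NB. final line without a trailing LF
  res
)
</lang>
For instance, <tt>4 # applyLinesChunked 'ab',LF,'cde',LF,'f'</tt> would box the length of each line while only ever looking at 4 characters of new input at a time. Each line triggers a separate call of <tt>u</tt> on a small array, which is the overhead referred to above.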


Example using fapplylines to process a line at a time.
Example using the <tt>fapplylines</tt> adverb to process a line at a time.
<lang j>
<lang j>
processLine=: monad define
processLine=: monad define
Line 687: Line 702:
Example usage
Example usage
<lang j>
<lang j>
mungeDataLines jpath '~temp/readings.txt'
mungeDataLines jpath '~temp/readings.txt'
NB. add output
NB. output as for previous example
$ each Dates;DailySumry
$ each Dates;DailySumry;MaxRuns
NB. add output
NB. output as for previous example
</lang>
</lang>