
Text processing/1: Difference between revisions

→‎{{header|J}}: Add some structure and discussion
(→‎{{header|J}}: break up code and usage)
(→‎{{header|J}}: Add some structure and discussion)
Line 511:
 
=={{header|J}}==
J is an array language and as such is most efficient when performing a few operations on big arrays of data at once, rather than many operations on small arrays of data. However, when working with very big files it is sometimes desirable to work on blocks of lines at a time. Below are examples of both.
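
As a small illustration of that difference, the following sketch computes the same row means in two ways. The <tt>readings</tt> table and both result names are hypothetical and are not part of the task data.
<lang j>
readings=: 3 4 $ 10 20 30 40 1 2 3 4 5 5 5 5   NB. hypothetical table: 3 days of 4 readings

NB. Array style: a single expression applied to the whole table at rank 1.
meansAtOnce=: (+/ % #)"1 readings

NB. Item-at-a-time style: box each row, apply the same mean to each box, then unbox.
meansEach=: > (+/ % #) each <"1 readings

meansAtOnce -: meansEach                       NB. compares the two results
</lang>
Both forms should give the same values; the first leaves the iteration over rows to the interpreter, which is usually where J performs best.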
 
=== Common utilities ===
 
First load the <tt>files</tt> library and define some utility verbs for use in all the examples.
 
<lang j>
require 'files' NB. for fread/freadblock/fapplylines
Line 552 ⟶ 553:
</lang>
 
=== Process whole file at once ===
==== Using addon package to parse file ====
J has a growing Addon library that contains packages contributed by users but not installed as part of the system. A Package Manager is available to browse, download & install addons.
 
The <tt>tables/dsv</tt> addon provides verbs for working with '''d'''elimiter-'''s'''eparated-'''v'''alue files.
 
Below is a copy of a J session:
<lang j>
load 'tables/dsv' NB. use tables/dsv addon
Line 570 ⟶ 576:
</lang>
 
==== Build custom parser instead of using addon package ====
Sometimes a parser isn't available for the file format you are working with. Also, for simpler file structures, better performance can sometimes be achieved by building a custom parser instead of using an addon package that is generalized to deal with many possible variants.
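
The idea can be sketched with the cut conjunction (<tt>;.</tt>) on a tiny tab-delimited sample. This is only an illustration under assumed names (<tt>sample</tt>, <tt>cutLast</tt>, <tt>rows</tt>, <tt>fields</tt>, <tt>dates</tt>, <tt>values</tt> are all hypothetical) and an assumed record layout; it is not the parser used for the task.
<lang j>
NB. Hypothetical two-record sample: date, value and flag separated by TAB, lines ended by LF.
sample=: '1991-03-30',TAB,'10.0',TAB,'1',LF,'1991-03-31',TAB,'20.5',TAB,'2',LF

cutLast=: <;._2                          NB. box the pieces that end at each copy of the last character
rows=: cutLast sample                    NB. one box per LF-terminated line
fields=: > cutLast@(,&TAB) each rows     NB. append TAB to each line, cut on it: table of boxed fields
dates=: > {."1 fields                    NB. first column as a character table
values=: {.@(0&".)&> 1 {"1 fields        NB. second column as numbers (0 where conversion fails)
</lang>
Because the whole file is cut in two passes rather than line by line, a parser along these lines stays close to J's array style.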
 
<lang j>
Line 586 ⟶ 593:
</lang>
 
Example usage:
<lang j>
'Dates DailySumry Flags'=: processBlock fread jpath '~temp/readings.txt'
MaxRuns=: findLongestRuns Flags
 
(_4{.Dates) formatDailySumry _4{. DailySumry
NB. output as for example above
 
MaxRuns formatFileSumry DailySumry
NB. output as for example above
</lang>
 
=== Process file in chunks ===
For huge files it might be desirable to read/process the file in chunks.
 
==== Process blocks at a time ====
The <tt>freadblock</tt> verb reads the file in chunks of about 1,000,000 bytes, discarding any trailing partial line and returning the starting position of the next chunk to read.
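
A minimal sketch of the reading loop, assuming <tt>freadblock</tt> takes a boxed <tt>filename;position</tt> pair and returns the block together with the next position, and that <tt>fsize</tt> from the same library gives the file size. The verb name <tt>countLinesInBlocks</tt> is hypothetical; it only counts lines rather than processing the readings.
<lang j>
require 'files'

NB. Hypothetical helper: count LF-terminated lines by reading the file block by block.
countLinesInBlocks=: monad define
  size=. fsize y                       NB. total size in bytes (negative if the file is missing)
  pos=. 0                              NB. starting position of the next block
  count=. 0
  while. pos < size do.
    'block pos'=. freadblock y;pos     NB. whole lines only, plus the new starting position
    count=. count + +/ LF = block      NB. per-block work goes here
  end.
  count
)
</lang>
Only one block is held in memory at a time, so per-block results have to be accumulated across iterations, as the counter is here.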
 
Example using <tt>freadblock</tt> to process blocks of lines
<lang j>
mungeDataBlocks=: monad define
Line 625 ⟶ 638:
<lang j>
'Dates DailySumry Flags'=: mungeDataBlocks jpath '~temp/readings.txt'
NB. output as for example above
$ each Dates;DailySumry;Flags
NB. output as for example above
</lang>
 
==== Process lines at a time ====
Because this results in more operations on smaller arrays, it is likely to be slower in J than the preceding approaches.
 
Example using <tt>fapplylines</tt> to process a line at a time: