
Text processing/1

 
main bye
</lang>
 
=={{header|J}}==
<lang j>
NB. Compares all-at-once and block processing methods
 
require 'files' NB. for fapplylines adverb
 
NB. Utility verbs
mean=: +/ % #
summarize=: # , +/ , mean NB. count , sum , mean
filter=: #~ 0&< NB. keep left arg where 0 < right arg
 
NB. Longest run(s) of successive invalid measurements
findLongestRuns=: 3 : 0
badflags=. , 0 >: y
 
NB. define local utility verbs
getrunlengths=. [: #(;.1) 0 , }. *. }:
getidxmax=. >:@I.@e. >./
getdateidx=. (24 <.@%~ +/)@{.~ getidxmax
 
maxrun=. >./ getrunlengths badflags
enddates=. (Dates {~ getdateidx@getrunlengths) badflags
maxrun;enddates
)
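
NB. Illustration of the run-length step (the local verb getrunlengths above)
NB. on a made-up flag vector where 1 marks an invalid reading.
NB. A new piece starts at every reading that does not continue an invalid run,
NB. so a run of k consecutive invalid readings yields a piece of length k.
assert 1 2 1 1 -: ([: #(;.1) 0 , }. *. }:) 0 1 1 0 1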
 
NB. Report creating verbs
formatDailySumry=: dyad define
labels=. , ];.2 'Line: Accept: Line_tot: Line_avg: '
labels , x ,. 7j0 10j3 10j3 ": y
)
 
formatFileSumry=: dyad define
labels=. ];.2 'Total: Readings: Average: '
sumryvals=. (, %/) 1 0{ +/y
out=. labels ,. 12j3 12j0 12j3 ":&> sumryvals
'maxrun dates'=. x
out=. out,LF,'Maximum run(s) of ',(": maxrun),' consecutive false readings ends at line(s) starting with date(s): ',dates
)
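
NB. Illustration of sumryvals with two made-up daily summaries (count, sum, mean):
NB. the file average is the grand total divided by the total count (13 % 5 = 2.6),
NB. not the mean of the daily means (which would be 2.5)
assert 13 5 2.6 -: (, %/) 1 0 { +/ 2 3 $ 2 4 2  3 9 3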
 
NB. Process whole file at once
 
NB. Using addon package to parse file
Note 'Example session'
load 'tables/dsv' NB. use tables/dsv addon
dat=: TAB readdsv jpath '~temp/readings.txt'
Dates=: >{."1 dat
vals=: _99 ". >(1 + +: i.24){"1 dat
flags=: _99 ". >(2 + +: i.24){"1 dat
DailySumry=: vals summarize@filter"1 flags
MaxRuns=: findLongestRuns flags
(_4{.Dates) formatDailySumry _4{. DailySumry
MaxRuns formatFileSumry DailySumry
)
 
NB. Build custom parser instead of using addon package
 
NB. Processing verbs
parseLine=: 10&({. ,&< (_99&".;._1)@:}.)
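
NB. Illustration with a made-up, shortened line (two value/flag pairs instead of 24):
NB. the first 10 characters are the date, and each TAB-separated field after it
NB. is converted to a number (_99 if it cannot be parsed; "-" is read as a minus sign).
NB. The numbers are ravelled before comparing so the check does not depend on their shape.
assert ('1991-03-30';10 1 _5 _2) -: ,&.> parseLine '1991-03-30',TAB,'10.000',TAB,'1',TAB,'-5.000',TAB,'-2'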
 
processBlock=: monad define
'dates dat'=. |: parseLine;._2 CR -.~ y
vals=. (+: i.24){"1 dat
flags=. (>: +: i.24){"1 dat
sumry=. vals summarize@filter"1 flags
NB. sumry=. ([: summarize@filter/@|: _2 ]\ ])"1 dat
dates;sumry;flags NB. return block of dates and summaries
)
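
NB. Illustration with a made-up, shortened row (two value/flag pairs instead of 24):
NB. even-numbered columns hold the readings and odd-numbered columns hold the flags
assert 10 _5 -: (+: i.2) { 10 1 _5 _2
assert 1 _2 -: (>: +: i.2) { 10 1 _5 _2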
 
Note 'Example usage'
'Dates DailySumry Flags'=: processBlock fread jpath '~temp/readings.txt'
(_4{.Dates) formatDailySumry _4{. DailySumry
(findLongestRuns Flags) formatFileSumry DailySumry
)
 
NB. Process file in blocks
NB. For huge files it may be desirable to read and process the file in blocks
 
NB. Example using freadblock to process blocks of lines
mungeDataBlocks=: monad define
sz=. fsize y
NB. Dates is assigned globally because findLongestRuns reads the global Dates
Dates=: 0 10$' '
DailySumry=. 0 3$0
Flags=. 0 24$0
strt=. 0
whilst. strt < sz do.
'dat newstrt'=. freadblock y;strt
'dates sumry flags'=. processBlock dat
Dates=: Dates,dates
DailySumry=. DailySumry,sumry
Flags=. Flags,flags
strt=. newstrt
end.
MaxRuns=. findLongestRuns Flags
smoutput (_4{. Dates) formatDailySumry _4{. DailySumry
smoutput ''
smoutput MaxRuns formatFileSumry DailySumry
Dates;DailySumry;Flags
)
 
Note 'Example usage'
'Dates DailySumry Flags'=: mungeDataBlocks jpath '~temp/readings.txt'
$ each Dates;DailySumry;Flags
)
 
 
NB. Example using fapplylines to process a line at a time.
processLine=: monad define
'dates dat'=. parseLine y
'vals flags'=. |: _2 ]\ dat
sumry=. vals summarize@filter flags
Dates=: Dates,dates NB. append to global
DailySumry=: DailySumry,sumry
Flags=: Flags,flags
)
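
NB. Illustration with a made-up, shortened row (two value/flag pairs instead of 24):
NB. _2 ]\ splits the row into value/flag pairs, and |: turns those pairs
NB. into one row of values (10 _5) and one row of flags (1 _2)
assert (2 2 $ 10 _5 1 _2) -: |: _2 ]\ 10 1 _5 _2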
 
mungeDataLines=: monad define
NB. initialize globals
Dates=: 0 10$' '
DailySumry=: 0 3$0
Flags=: 0 24$0
 
NB. read file in blocks of 1,000,000 bytes
NB. process line by line
processLine fapplylines y
 
MaxRuns=. findLongestRuns Flags
NB. Format output
smoutput (_4{. Dates) formatDailySumry _4{. DailySumry
smoutput ''
smoutput MaxRuns formatFileSumry DailySumry
)
 
Note 'Example usage'
mungeDataLines jpath '~temp/readings.txt'
$ each Dates;DailySumry
)
 
</lang>
 