Text processing/2

{{task|Text processing}}
 
The following task concerns data that came from a pollution monitoring station with twenty-four instruments monitoring twenty-four aspects of pollution in the air. Periodically a record is added to the file, each record being a line of 49 fields separated by white-space, which can be one or more space or tab characters.
 
The fields (from the left) are:
DATESTAMP [ VALUEn FLAGn ] * 24
i.e. a datestamp followed by twenty-four repetitions of a floating-point instrument value and that instrument's associated integer flag. Flag values are >= 1 if the instrument is working and < 1 if there is some problem with it, in which case that instrument's value should be ignored.
 
A sample from the full data file [http://rosettacode.org/resources/readings.zip readings.txt], which is also used in the [[Text processing/1]] task, follows:
 
Data is no longer available at that link. A zipped mirror is available [https://github.com/thundergnat/rc/blob/master/resouces/readings.zip here].
<pre style="height:17ex;overflow:scroll">
1991-03-30 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1
1991-03-31 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 20.000 1 20.000 1 20.000 1 35.000 1 50.000 1 60.000 1 40.000 1 30.000 1 30.000 1 30.000 1 25.000 1 20.000 1 20.000 1 20.000 1 20.000 1 20.000 1 35.000 1
</pre>
 
;Task:
# Confirm the general field format of the file.
# Identify any DATESTAMPs that are duplicated.
# Report the number of records that have good readings for all instruments.
<br><br>
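For reference, the three checks above can be sketched outside any particular solution as follows. This is an illustrative outline only (the function and variable names are invented for this sketch); it assumes the whitespace-separated record format described above:

```python
import re
from collections import Counter

# One datestamp followed by 24 value/flag pairs, separated by spaces or tabs.
LINE_PAT = re.compile(r"\d{4}-\d{2}-\d{2}([ \t]+[-+]?\d+\.\d+[ \t]+[-+]?\d+){24}$")

def analyze(lines):
    """Return (bad_format_count, sorted duplicate datestamps, good_record_count)."""
    dates = Counter()
    bad_format = good = 0
    for line in lines:
        line = line.rstrip("\n")
        if not LINE_PAT.match(line):        # check 1: general field format
            bad_format += 1
            continue
        fields = line.split()
        dates[fields[0]] += 1               # check 2: count datestamp occurrences
        flags = [int(f) for f in fields[2::2]]  # every second field after the date
        if all(f >= 1 for f in flags):      # check 3: all instruments working
            good += 1
    duplicates = sorted(d for d, n in dates.items() if n > 1)
    return bad_format, duplicates, good
```

The solutions below follow this same outline in their own idioms.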
 
=={{header|11l}}==
{{trans|Python}}
 
<syntaxhighlight lang="11l">V debug = 0B
V datePat = re:‘\d{4}-\d{2}-\d{2}’
V valuPat = re:‘[-+]?\d+\.\d+’
V statPat = re:‘-?\d+’
V totalLines = 0
Set[String] dupdate
Set[String] badform
Set[String] badlen
V badreading = 0
Set[String] datestamps

L(line) File(‘readings.txt’).read().rtrim("\n").split("\n")
   totalLines++
   V fields = line.split("\t")
   V date = fields[0]
   V pairs = (1 .< fields.len).step(2).map(i -> (@fields[i], @fields[i + 1]))

   V lineFormatOk = datePat.match(date)
      & all(pairs.map(p -> :valuPat.match(p[0])))
      & all(pairs.map(p -> :statPat.match(p[1])))
   I !lineFormatOk
      I debug
         print(‘Bad formatting ’line)
      badform.add(date)

   I pairs.len != 24 | any(pairs.map(p -> Int(p[1]) < 1))
      I debug
         print(‘Missing values ’line)
      I pairs.len != 24
         badlen.add(date)
      I any(pairs.map(p -> Int(p[1]) < 1))
         badreading++

   I date C datestamps
      I debug
         print(‘Duplicate datestamp ’line)
      dupdate.add(date)

   datestamps.add(date)

print("Duplicate dates:\n "sorted(Array(dupdate)).join("\n "))
print("Bad format:\n "sorted(Array(badform)).join("\n "))
print("Bad number of fields:\n "sorted(Array(badlen)).join("\n "))
print("Records with good readings: #. = #2.2%\n".format(
   totalLines - badreading, (totalLines - badreading) / Float(totalLines) * 100))
print(‘Total records: ’totalLines)</syntaxhighlight>
 
{{out}}
<pre>
Duplicate dates:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
Bad format:
 
Bad number of fields:
 
Records with good readings: 5017 = 91.70%
 
Total records: 5471
</pre>
 
=={{header|Ada}}==
{{libheader|Simple components for Ada}}
<syntaxhighlight lang="ada">with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO; use Ada.Text_IO;
with Strings_Edit; use Strings_Edit;
Close (File);
Put_Line ("Valid records " & Image (Count) & " of " & Image (Line_No) & " total");
end Data_Munging_2;</syntaxhighlight>
Sample output
<pre>
</pre>
 
=={{header|Aime}}==
<syntaxhighlight lang="aime">check_format(list l)
{
    integer i;
    text s;

    if (~l != 49) {
        error("bad field count");
    }

    s = l[0];
    if (!match(s, "????-??-??")) {
        error("bad date format");
    }
    l[0] = s.delete(7).delete(4).atoi;

    i = 1;
    while (i < 49) {
        l_r_real(l, i, atof(l[i]));
        i += 1;
        l[i >> 1] = atoi(l[i]);
        i += 1;
    }

    l.erase(25, -1);
}

integer
main(void)
{
    integer goods, i, v;
    file f;
    list l;
    index x;

    goods = 0;

    f.affix("readings.txt");

    while (f.list(l, 0) != -1) {
        if (!trap(check_format, l)) {
            if ((x[v = lf_x_integer(l)] += 1) != 1) {
                v_form("duplicate ~ line\n", v);
            }

            i = 1;
            l.ucall(min_i, 1, i);
            goods += iclip(0, i, 1);
        }
    }

    o_(goods, " good lines\n");

    return 0;
}</syntaxhighlight>
{{out}} (the "readings.txt" needs to be converted to UNIX end-of-line)
<pre>duplicate 19900325 line
duplicate 19910331 line
duplicate 19920329 line
duplicate 19930328 line
duplicate 19950326 line
5017 good lines</pre>
 
 
=={{header|Amazing Hopper}}==
{{Trans|AWK}}
<syntaxhighlight lang="c">
#include <basico.h>
 
algoritmo
 
número de campos correcto = `awk 'NF != 49' basica/readings.txt`
 
fechas repetidas = `awk '++count[$1] >= 2{print $1, "(",count[$1],")"}' basica/readings.txt`
 
resultados buenos = `awk '{rec++;ok=1; for(i=0;i<24;i++){if($(2*i+3)<1){ok=0}}; recordok += ok} END {print "Total records",rec,"OK records", recordok, "or", recordok/rec*100,"%"}' basica/readings.txt`
 
"Check field number by line: ", #( !(number(número de campos correcto)) ? "Ok\n" : "Nok\n";),\
"\nCheck duplicated dates:\n", fechas repetidas,NL, \
"Number of records have good readings for all instruments:\n",resultados buenos,\
"(including "
fijar separador( NL )
contar tokens en 'fechas repetidas'
" duplicated records)\n", luego imprime todo
 
terminar
</syntaxhighlight>
{{out}}
<pre>
Check field number by line: Ok
 
Check duplicated dates:
1990-03-25 ( 2 )
1991-03-31 ( 2 )
1992-03-29 ( 2 )
1993-03-28 ( 2 )
1995-03-26 ( 2 )
 
Number of records have good readings for all instruments:
Total records 5471 OK records 5017 or 91,7017 %
(including 5 duplicated records)
 
</pre>
 
=={{header|AutoHotkey}}==
 
<syntaxhighlight lang="autohotkey">; Author: AlephX Aug 17 2011
data = %A_scriptdir%\readings.txt
 
msgbox, Duplicate Dates:`n%wrongDates%`nRead Lines: %lines%`nValid Lines: %valid%`nwrong lines: %totwrong%`nDuplicates: %TotWrongDates%`nWrong Formatted: %unvalidformat%`n
</syntaxhighlight>
 
Sample Output:
=={{header|AWK}}==

If there are any scientific-notation fields then there will be an e in the file:
<syntaxhighlight lang="awk">bash$ awk '/[eE]/' readings.txt
bash$</syntaxhighlight>
Quick check on the number of fields:
<syntaxhighlight lang="awk">bash$ awk 'NF != 49' readings.txt
bash$</syntaxhighlight>
Full check on the file format using a regular expression:
<syntaxhighlight lang="awk">bash$ awk '!(/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+)+$/ && NF==49)' readings.txt
bash$</syntaxhighlight>
Full check on the file format as above but using regular expressions allowing intervals (gnu awk):
<syntaxhighlight lang="awk">bash$ awk --re-interval '!(/^[0-9]{4}-[0-9]{2}-[0-9]{2}([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+){24}$/ )' readings.txt
bash$</syntaxhighlight>
 
 
 
Accomplished by counting how many times the first field occurs and noting any second occurrences.
<syntaxhighlight lang="awk">bash$ awk '++count[$1]==2{print $1}' readings.txt
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
bash$</syntaxhighlight>
 
 
 
<div style="width:100%;overflow:scroll">
<syntaxhighlight lang="awk">bash$ awk '{rec++;ok=1; for(i=0;i<24;i++){if($(2*i+3)<1){ok=0}}; recordok += ok} END {print "Total records",rec,"OK records", recordok, "or", recordok/rec*100,"%"}' readings.txt
Total records 5471 OK records 5017 or 91.7017 %
bash$</syntaxhighlight>
</div>
 
=={{header|C}}==
<syntaxhighlight lang="c">#include <stdio.h>
#include <string.h>
#include <stdlib.h>
    read_file("readings.txt");
    return 0;
}</syntaxhighlight>
 
{{out}}
<pre>
5017 out 5471 lines good
</pre>
 
 
=={{header|C sharp|C#}}==
<syntaxhighlight lang="csharp">using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
        }
    }
}</syntaxhighlight>
 
<pre>
1993-03-28 is duplicated at Lines : 1183,1184
1995-03-26 is duplicated at Lines : 1910,1911
</pre>
 
=={{header|C++}}==
{{libheader|Boost}}
<syntaxhighlight lang="cpp">#include <boost/regex.hpp>
#include <fstream>
#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <cstdlib>
#include <algorithm>
using namespace std ;

boost::regex e ( "\\s+" ) ;

int main( int argc , char *argv[ ] ) {
   ifstream infile( argv[ 1 ] ) ;
   vector<string> duplicates ;
   set<string> datestamps ; //for the datestamps
   if ( ! infile.is_open( ) ) {
      cerr << "Can't open file " << argv[ 1 ] << '\n' ;
      return 1 ;
   }
   int all_ok = 0 ;//all_ok for lines in the given pattern e
   int pattern_ok = 0 ; //overall field pattern of record is ok
   while ( infile ) {
      string eingabe ;
      getline( infile , eingabe ) ;
      boost::sregex_token_iterator i ( eingabe.begin( ), eingabe.end( ) , e , -1 ), j ;//we tokenize on empty fields
      vector<string> fields( i, j ) ;
      if ( fields.size( ) == 49 ) //we expect 49 fields in a record
         pattern_ok++ ;
      else
         cout << "Format not ok!\n" ;
      if ( datestamps.insert( fields[ 0 ] ).second ) { //not duplicated
         int howoften = ( fields.size( ) - 1 ) / 2 ;//number of measurement
         //devices and values
         for ( int n = 1 ; atoi( fields[ 2 * n ].c_str( ) ) >= 1 ; n++ ) {
            if ( n == howoften ) {
               all_ok++ ;
               break ;
            }
         }
      }
      else {
         duplicates.push_back( fields[ 0 ] ) ;//first field holds datestamp
      }
   }
   infile.close( ) ;
   cout << "The following " << duplicates.size() << " datestamps were duplicated:\n" ;
   copy( duplicates.begin( ) , duplicates.end( ) ,
      ostream_iterator<string>( cout , "\n" ) ) ;
   cout << all_ok << " records were complete and ok!\n" ;
   return 0 ;
}</syntaxhighlight>
 
{{out}}
<pre>
Format not ok!
The following 6 datestamps were duplicated:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
2004-12-31
</pre>
 
 
=={{header|Clojure}}==
<syntaxhighlight lang="clojure">
(defn parse-line [s]
  (let [[date & data-toks] (str/split s #"\s+")
        data-fields (map read-string data-toks)
        valid-date? (fn [s] (re-find #"\d{4}-\d{2}-\d{2}" s))
        valid-line? (and (valid-date? date)
                         (= 48 (count data-toks))
                         (every? number? data-fields))
        readings (for [[v flag] (partition 2 data-fields)]
                   {:val v :flag flag})]
    (when (not valid-line?)
      (println "Malformed Line: " s))
    {:date date
     :no-missing-readings? (and (= 48 (count data-toks))
                                (every? pos? (map :flag readings)))}))

(defn analyze-file [path]
  (reduce (fn [m line]
            (let [{:keys [all-dates dupl-dates n-full-recs invalid-lines]} m
                  this-date (:date line)
                  dupl? (contains? all-dates this-date)
                  full? (:no-missing-readings? line)]
              (cond-> m
                dupl? (update-in [:dupl-dates] conj this-date)
                full? (update-in [:n-full-recs] inc)
                true (update-in [:all-dates] conj this-date))))
          {:dupl-dates #{} :all-dates #{} :n-full-recs 0}
          (->> (slurp path)
               clojure.string/split-lines
               (map parse-line))))

(defn report-summary [path]
  (let [m (analyze-file path)]
    (println (format "%d unique dates" (count (:all-dates m))))
    (println (format "%d duplicated dates [%s]"
                     (count (:dupl-dates m))
                     (clojure.string/join " " (sort (:dupl-dates m)))))
    (println (format "%d lines with no missing data" (:n-full-recs m)))))
</syntaxhighlight>
 
{{out}}
<pre>
5466 unique dates
5 duplicated dates [1990-03-25 1991-03-31 1992-03-29 1993-03-28 1995-03-26]
5017 lines with no missing data
</pre>
 
=={{header|COBOL}}==
{{works with|OpenCOBOL}}
<syntaxhighlight lang="cobol">       IDENTIFICATION DIVISION.
       PROGRAM-ID. text-processing-2.
 
           INSPECT input-data (offset:) TALLYING data-len
               FOR CHARACTERS BEFORE delim
           .</syntaxhighlight>
 
{{out}}
 
=={{header|D}}==
<syntaxhighlight lang="d">void main() {
    import std.stdio, std.array, std.string, std.regex, std.conv,
        std.algorithm;
            repeatedDates.byKey.filter!(k => repeatedDates[k] > 1));
    writeln("Good reading records: ", goodReadings);
}</syntaxhighlight>
{{out}}
<pre>Duplicated timestamps: 1990-03-25, 1991-03-31, 1992-03-29, 1993-03-28, 1995-03-26
Good reading records: 5017</pre>
 
=={{header|Eiffel}}==
<syntaxhighlight lang="eiffel">
class
APPLICATION
 
create
make
 
feature
 
make
-- Finds double date stamps and wrong formats.
local
found: INTEGER
double: STRING
do
read_wordlist
fill_hash_table
across
hash as h
loop
if h.key.has_substring ("_double") then
io.put_string ("Double date stamp: %N")
double := h.key
double.remove_tail (7)
io.put_string (double)
io.new_line
end
if h.item.count /= 24 then
io.put_string (h.key.out + " has the wrong format. %N")
found := found + 1
end
end
io.put_string (found.out + " records have not 24 readings.%N")
good_records
end
 
good_records
-- Number of records that have flag values > 0 for all readings.
local
count, total: INTEGER
end_date: STRING
do
create end_date.make_empty
across
hash as h
loop
count := 0
across
h.item as d
loop
if d.item.flag > 0 then
count := count + 1
end
end
if count = 24 then
total := total + 1
end
end
io.put_string ("%NGood records: " + total.out + ". %N")
end
 
original_list: STRING = "readings.txt"
 
read_wordlist
--Preprocesses data in 'data'.
local
l_file: PLAIN_TEXT_FILE
do
create l_file.make_open_read_write (original_list)
l_file.read_stream (l_file.count)
data := l_file.last_string.split ('%N')
l_file.close
end
 
data: LIST [STRING]
 
fill_hash_table
--Fills 'hash' using the date as key.
local
by_dates: LIST [STRING]
date: STRING
data_tup: TUPLE [val: REAL; flag: INTEGER]
data_arr: ARRAY [TUPLE [val: REAL; flag: INTEGER]]
i: INTEGER
do
create hash.make (data.count)
across
data as d
loop
if not d.item.is_empty then
by_dates := d.item.split ('%T')
date := by_dates [1]
by_dates.prune (date)
create data_tup
create data_arr.make_empty
from
i := 1
until
i > by_dates.count - 1
loop
data_tup := [by_dates [i].to_real, by_dates [i + 1].to_integer]
data_arr.force (data_tup, data_arr.count + 1)
i := i + 2
end
hash.put (data_arr, date)
if not hash.inserted then
date.append ("_double")
hash.put (data_arr, date)
end
end
end
end
 
hash: HASH_TABLE [ARRAY [TUPLE [val: REAL; flag: INTEGER]], STRING]
 
end
</syntaxhighlight>
{{out}}
<pre>
Double date stamp:
1990-03-25
Double date stamp:
1991-03-31
Double date stamp:
1992-03-29
Double date stamp:
1993-03-28
Double date stamp:
1995-03-26
0 records have not 24 readings.
 
Good records: 5017.
</pre>
 
=={{header|Erlang}}==
Uses function from [[Text_processing/1]]. It does some correctness checks for us.
<syntaxhighlight lang="erlang">
-module( text_processing2 ).
 
 
value_flag_records() -> 24.
</syntaxhighlight>
{{out}}
<pre>
</pre>
 
=={{header|F Sharp|F#}}==
<syntaxhighlight lang="fsharp">
let file = @"readings.txt"
 
        ok <- ok + 1
printf "%d records were ok\n" ok
</syntaxhighlight>
Prints:
<syntaxhighlight lang="fsharp">
Date 1990-03-25 is duplicated
Date 1991-03-31 is duplicated
Date 1992-03-29 is duplicated
Date 1993-03-28 is duplicated
Date 1995-03-26 is duplicated
5017 records were ok
</syntaxhighlight>
 
=={{header|Factor}}==
{{works with|Factor|0.99 2020-03-02}}
<syntaxhighlight lang="factor">USING: io io.encodings.ascii io.files kernel math math.parser
prettyprint sequences sequences.extras sets splitting ;

: check-format ( seq -- )
    [ " \t" split length 49 = ] all?
    "Format okay." "Format not okay." ? print ;

"readings.txt" ascii file-lines [ check-format ] keep
[ "Duplicates:" print [ "\t" split1 drop ] map duplicates . ]
[ [ " \t" split rest <odds> [ string>number 0 <= ] none? ] count ]
bi pprint " records were good." print</syntaxhighlight>
{{out}}
<pre>
Format okay.
Duplicates:
{
"1990-03-25"
"1991-03-31"
"1992-03-29"
"1993-03-28"
"1995-03-26"
}
5017 records were good.
</pre>
 
=={{header|Fortran}}==
The trouble with the dates rather suggests that they should be checked for correctness in themselves, and that the sequence check should be that each new record advances the date by one day. Daynumber calculations were long ago presented by H. F. Fliegel and T.C. van Flandern, in Communications of the ACM, Vol. 11, No. 10 (October, 1968).
 
Rather than copy today's data to a PDATA holder so that on the next read the new data may be compared to the old, a two-row array is used, with IT flip-flopping 1,2,1,2,1,2,... Comparison of the data as numerical values rather than text strings means that different texts that evoke the same value will not be regarded as different. If the data format were invalid, there would be horrible messages. There aren't, so ... the values should be read and plotted...
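The daynumber check suggested above can be sketched independently of the Fortran below. The following Python fragment uses the standard civil-date-to-Julian-day-number conversion (integer arithmetic equivalent to the Fliegel and van Flandern formula); the helper names are invented for this sketch. With it, "each new record advances the date by one day" reduces to <code>jdn(new) == jdn(old) + 1</code>:

```python
def jdn(y, m, d):
    """Julian day number of a Gregorian calendar date (integer arithmetic)."""
    a = (14 - m) // 12           # 1 for January/February, else 0
    y2 = y + 4800 - a            # years since -4800, March-based
    m2 = m + 12 * a - 3          # March=0 ... February=11
    return d + (153 * m2 + 2) // 5 + 365 * y2 + y2 // 4 - y2 // 100 + y2 // 400 - 32045

def strictly_daily(stamps):
    """True when each YYYY-MM-DD stamp is exactly one day after the previous."""
    days = [jdn(*map(int, s.split("-"))) for s in stamps]
    return all(b == a + 1 for a, b in zip(days, days[1:]))
```

Converting back from the day number (reversion) would complete the self-check the author describes.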
 
<syntaxhighlight lang="fortran">
Crunches a set of hourly data. Starts with a date, then 24 pairs of value,indicator for that day, on one line.
      INTEGER Y,M,D !Year, month, and day.
      INTEGER GOOD(24,2) !The indicators.
      REAL*8 V(24,2) !The grist.
      CHARACTER*10 DATE(2) !Along with the starting date.
      INTEGER IT,TI !A flipper and its antiflipper.
      INTEGER NV !Number of entirely good records.
      INTEGER I,NREC,HIC !Some counters.
      LOGICAL INGOOD !State flipper for the runs of data.
      INTEGER IN,MSG !I/O mnemonics.
      CHARACTER*666 ACARD !Scratchpad, of sufficient length for all expectation.
      IN = 10 !Unit number for the input file.
      MSG = 6 !Output.
      OPEN (IN,FILE="Readings1.txt", FORM="FORMATTED", !This should be a function.
     1 STATUS ="OLD",ACTION="READ") !Returning success, or failure.
      NV = 0 !No pure records seen.
      NREC = 0 !No records read.
      HIC = 0 !Provoking no complaints.
      DATE = "snargle" !No date should look like this!
      IT = 2 !Syncopation for the 1-2 flip flop.
Chew into the file.
   10 READ (IN,11,END=100,ERR=666) L,ACARD(1:MIN(L,LEN(ACARD))) !With some protection.
      NREC = NREC + 1 !So, a record has been read.
   11 FORMAT (Q,A) !Obviously, Q ascertains the length of the record being read.
      READ (ACARD,12,END=600,ERR=601) Y,M,D !The date part is trouble, as always.
   12 FORMAT (I4,2(1X,I2)) !Because there are no delimiters between the parts.
      TI = IT !Thus finger the previous value.
      IT = 3 - IT !Flip between 1 and 2.
      DATE(IT) = ACARD(1:10) !Save the date field.
      READ (ACARD(11:L),*,END=600,ERR=601) (V(I,IT),GOOD(I,IT),I = 1,24) !But after the date, delimiters abound.
Comparisons. Should really convert the date to a daynumber, check it by reversion, and then check for + 1 day only.
   20 IF (DATE(IT).EQ.DATE(TI)) THEN !Same date?
        IF (ALL(V(:,IT) .EQ.V(:,TI)) .AND. !Yes. What about the data?
     1      ALL(GOOD(:,IT).EQ.GOOD(:,TI))) THEN !This disregards details of the spacing of the data.
          WRITE (MSG,21) NREC,DATE(IT),"same." !Also trailing zeroes, spurious + signs, blah blah.
   21     FORMAT ("Record",I8," Duplicate date field (",A,"), data ",A) !Say it.
        ELSE !But if they're not all equal,
          WRITE (MSG,21) NREC,DATE(IT),"different!" !They're different!
        END IF !So much for comparing the data.
      END IF !So much for just comparing the date's text.
      IF (ALL(GOOD(:,IT).GT.0)) NV = NV + 1 !A fully healthy record, either way?
      GO TO 10 !More! More! I want more!!

Complaints. Should really distinguish between trouble in the date part and in the data part.
  600 WRITE (MSG,*) '"END" declared - insufficient data?' !Not enough numbers, presumably.
      GO TO 602 !Reveal the record.
  601 WRITE (MSG,*) '"ERR" declared - improper number format?' !Ah, but which number?
  602 WRITE (MSG,603) NREC,L,ACARD(1:L) !Anyway, reveal the uninterpreted record.
  603 FORMAT("Record",I8,", length ",I0," reads ",A) !Just so.
      HIC = HIC + 1 !This may grow into a habit.
      IF (HIC.LE.12) GO TO 10 !But if not yet, try the next record.
      STOP "Enough distaste." !Or, give up.
  666 WRITE (MSG,101) NREC,"format error!" !For A-style data? Should never happen!
      GO TO 900 !But if it does, give up!

Closedown.
  100 WRITE (MSG,101) NREC,"then end-of-file" !Discovered on the next attempt.
  101 FORMAT ("Record",I8,": ",A) !A record number plus a remark.
      WRITE (MSG,102) NV !The overall results.
  102 FORMAT (" with",I8," having all values good.") !This should do.
  900 CLOSE(IN) !Done.
      END !Spaghetti rules.
</syntaxhighlight>
 
Output:
 Record      85 Duplicate date field (1990-03-25), data different!
 Record     456 Duplicate date field (1991-03-31), data different!
 Record     820 Duplicate date field (1992-03-29), data different!
 Record    1184 Duplicate date field (1993-03-28), data different!
 Record    1911 Duplicate date field (1995-03-26), data different!
 Record    5471: then end-of-file
 with    5017 having all values good.
 
=={{header|Go}}==
<syntaxhighlight lang="go">package main
 
import (
	fmt.Println(uniqueGood,
		"unique dates with good readings for all instruments.")
}</syntaxhighlight>
{{out}}
<pre>
</pre>
 
=={{header|Haskell}}==
<syntaxhighlight lang="haskell">
import Data.List (nub, (\\))
 
  putStr (unlines ("duplicated dates:": duplicatedDates (map date inputs)))
  putStrLn ("number of good records: " ++ show (length $ goodRecords inputs))
</syntaxhighlight>
 
this script outputs:
=={{header|Icon}} and {{header|Unicon}}==
This solution only reports duplicated timestamps that are on well-formed records.
 
<syntaxhighlight lang="unicon">procedure main(A)
    dups := set()
    goodRecords := 0
    }
end</syntaxhighlight>
 
Sample run:
 
=={{header|J}}==
<syntaxhighlight lang="j">   require 'tables/dsv dates'
   dat=: TAB readdsv jpath '~temp/readings.txt'
   Dates=: getdate"1 >{."1 dat
1992 3 29
1993 3 28
1995 3 26</syntaxhighlight>
 
=={{header|Java}}==
{{trans|C++}}
{{works with|Java|1.5+}}
<syntaxhighlight lang="java5">import java.util.*;
import java.util.regex.*;
import java.io.*;
        }
    }
}</syntaxhighlight>
The program produces the following output:
<pre>
</pre>
=={{header|JavaScript}}==
{{works with|JScript}}
<syntaxhighlight lang="javascript">// wrap up the counter variables in a closure.
function analyze_func(filename) {
    var dates_seen = {};
 
var analyze = analyze_func('readings.txt');
analyze();</syntaxhighlight>
 
=={{header|jq}}==
{{works with|jq|with regex support}}
 
For this problem, it is convenient to use jq in a pipeline: the first invocation of jq will convert the text file into a stream of JSON arrays (one array per line):
<syntaxhighlight lang="sh">$ jq -R '[splits("[ \t]+")]' Text_processing_2.txt</syntaxhighlight>
 
The second part of the pipeline performs the task requirements. The following program is used in the second invocation of jq.
 
'''Generic Utilities'''
<syntaxhighlight lang="jq"># Given any array, produce an array of [item, count] pairs for each run.
def runs:
reduce .[] as $item
( [];
if . == [] then [ [ $item, 1] ]
else .[length-1] as $last
| if $last[0] == $item then (.[0:length-1] + [ [$item, $last[1] + 1] ] )
else . + [[$item, 1]]
end
end ) ;
 
def is_float: test("^[-+]?[0-9]*[.][0-9]*([eE][-+]?[0-9]+)?$");
 
def is_integral: test("^[-+]?[0-9]+$");
 
def is_date: test("[12][0-9]{3}-[0-9][0-9]-[0-9][0-9]");</syntaxhighlight>
 
'''Validation''':
<syntaxhighlight lang="jq"># Report line and column numbers using conventional numbering (IO=1).
def validate_line(nr):
def validate_date:
if is_date then empty else "field 1 in line \(nr) has an invalid date: \(.)" end;
def validate_length(n):
if length == n then empty else "line \(nr) has \(length) fields" end;
def validate_pair(i):
( .[2*i + 1] as $n
| if ($n | is_float) then empty else "field \(2*i + 2) in line \(nr) is not a float: \($n)" end),
( .[2*i + 2] as $n
| if ($n | is_integral) then empty else "field \(2*i + 3) in line \(nr) is not an integer: \($n)" end);
(.[0] | validate_date),
(validate_length(49)),
(range(0; (length-1) / 2) as $i | validate_pair($i)) ;
 
def validate_lines:
. as $in
| range(0; length) as $i | ($in[$i] | validate_line($i + 1));</syntaxhighlight>
 
'''Check for duplicate timestamps'''
<syntaxhighlight lang="jq">def duplicate_timestamps:
[.[][0]] | sort | runs | map( select(.[1]>1) );</syntaxhighlight>
 
'''Number of valid readings for all instruments''':
<syntaxhighlight lang="jq"># The following ignores any issues with respect to duplicate dates,
# but does check the validity of the record, including the date format:
def number_of_valid_readings:
def check:
. as $in
| (.[0] | is_date)
and length == 49
and all(range(0; 24) | $in[2*. + 1] | is_float)
and all(range(0; 24) | $in[2*. + 2] | (is_integral and tonumber >= 1) );
 
map(select(check)) | length ;</syntaxhighlight>
 
'''Generate Report'''
<syntaxhighlight lang="jq">validate_lines,
"\nChecking for duplicate timestamps:",
duplicate_timestamps,
"\nThere are \(number_of_valid_readings) valid rows altogether."</syntaxhighlight>
{{out}}
'''Part 1: Simple demonstration'''
 
To illustrate that the program does report invalid lines, we first use the six lines at the top but mangle the last line.
<syntaxhighlight lang="sh">$ jq -R '[splits("[ \t]+")]' Text_processing_2.txt | jq -s -r -f Text_processing_2.jq
field 1 in line 6 has an invalid date: 991-04-03
line 6 has 47 fields
field 2 in line 6 is not a float: 10000
field 3 in line 6 is not an integer: 1.0
field 47 in line 6 is not an integer: x
 
Checking for duplicate timestamps:
[
[
"1991-03-31",
2
]
]
 
There are 5 valid rows altogether.</syntaxhighlight>
 
'''Part 2: readings.txt'''
<syntaxhighlight lang="sh">$ jq -R '[splits("[ \t]+")]' readings.txt | jq -s -r -f Text_processing_2.jq
Checking for duplicate timestamps:
[
[
"1990-03-25",
2
],
[
"1991-03-31",
2
],
[
"1992-03-29",
2
],
[
"1993-03-28",
2
],
[
"1995-03-26",
2
]
]
 
There are 5017 valid rows altogether.</syntaxhighlight>
 
=={{header|Julia}}==
Refer to the code at https://rosettacode.org/wiki/Text_processing/1#Julia. Add at the end of that code the following:
<syntaxhighlight lang="julia">
dupdate = df[nonunique(df[:,[:Date]]),:][:Date]
println("The following rows have duplicate DATESTAMP:")
println(df[df[:Date] .== dupdate,:])
println("All values good in these rows:")
println(df[df[:ValidValues] .== 24,:])
</syntaxhighlight>
{{output}}
<pre>
The following rows have duplicate DATESTAMP:
2×29 DataFrames.DataFrame
│ Row │ Date │ Mean │ ValidValues │ MaximumGap │ GapPosition │ 0:00 │ 1:00 │ 2:00 │ 3:00 │ 4:00 │
├─────┼─────────────────────┼─────────┼─────────────┼────────────┼─────────────┼──────┼──────┼──────┼──────┼──────┤
│ 1 │ 1991-03-31T00:00:00 │ 23.5417 │ 24 │ 0 │ 0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2 │ 1991-03-31T00:00:00 │ 40.0 │ 1 │ 23 │ 2 │ 40.0 │ NaN │ NaN │ NaN │ NaN │
 
│ Row │ 5:00 │ 6:00 │ 7:00 │ 8:00 │ 9:00 │ 10:00 │ 11:00 │ 12:00 │ 13:00 │ 14:00 │ 15:00 │ 16:00 │ 17:00 │ 18:00 │
├─────┼──────┼──────┼──────┼──────┼──────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ 10.0 │ 10.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0 │ 50.0 │ 60.0 │ 40.0 │ 30.0 │ 30.0 │ 30.0 │ 25.0 │ 20.0 │
│ 2 │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │
 
│ Row │ 19:00 │ 20:00 │ 21:00 │ 22:00 │ 23:00 │
├─────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ 20.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0 │
│ 2 │ NaN │ NaN │ NaN │ NaN │ NaN │
All values good in these rows:
4×29 DataFrames.DataFrame
│ Row │ Date │ Mean │ ValidValues │ MaximumGap │ GapPosition │ 0:00 │ 1:00 │ 2:00 │ 3:00 │ 4:00 │
├─────┼─────────────────────┼─────────┼─────────────┼────────────┼─────────────┼──────┼──────┼──────┼──────┼──────┤
│ 1 │ 1991-03-30T00:00:00 │ 10.0 │ 24 │ 0 │ 0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2 │ 1991-03-31T00:00:00 │ 23.5417 │ 24 │ 0 │ 0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 3 │ 1991-04-02T00:00:00 │ 19.7917 │ 24 │ 0 │ 0 │ 8.0 │ 9.0 │ 11.0 │ 12.0 │ 12.0 │
│ 4 │ 1991-04-03T00:00:00 │ 13.9583 │ 24 │ 0 │ 0 │ 10.0 │ 9.0 │ 10.0 │ 10.0 │ 9.0 │
 
│ Row │ 5:00 │ 6:00 │ 7:00 │ 8:00 │ 9:00 │ 10:00 │ 11:00 │ 12:00 │ 13:00 │ 14:00 │ 15:00 │ 16:00 │ 17:00 │ 18:00 │
├─────┼──────┼──────┼──────┼──────┼──────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2 │ 10.0 │ 10.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0 │ 50.0 │ 60.0 │ 40.0 │ 30.0 │ 30.0 │ 30.0 │ 25.0 │ 20.0 │
│ 3 │ 12.0 │ 27.0 │ 26.0 │ 27.0 │ 33.0 │ 32.0 │ 31.0 │ 29.0 │ 31.0 │ 25.0 │ 25.0 │ 24.0 │ 21.0 │ 17.0 │
│ 4 │ 10.0 │ 15.0 │ 24.0 │ 28.0 │ 24.0 │ 18.0 │ 14.0 │ 12.0 │ 13.0 │ 14.0 │ 15.0 │ 14.0 │ 15.0 │ 13.0 │
 
│ Row │ 19:00 │ 20:00 │ 21:00 │ 22:00 │ 23:00 │
├─────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2 │ 20.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0 │
│ 3 │ 14.0 │ 15.0 │ 12.0 │ 12.0 │ 10.0 │
│ 4 │ 13.0 │ 13.0 │ 12.0 │ 10.0 │ 10.0 │
</pre>
 
=={{header|Kotlin}}==
<syntaxhighlight lang="scala">// version 1.2.31

import java.io.File

fun main(args: Array<String>) {
    val rx = Regex("""\s+""")
    val file = File("readings.txt")
    var count = 0
    var invalid = 0
    var allGood = 0
    var map = mutableMapOf<String, Int>()
    file.forEachLine { line ->
        count++
        val fields = line.split(rx)
        val date = fields[0]
        if (fields.size == 49) {
            if (map.containsKey(date))
                map[date] = map[date]!! + 1
            else
                map.put(date, 1)
            var good = 0
            for (i in 2 until fields.size step 2) {
                if (fields[i].toInt() >= 1) {
                    good++
                }
            }
            if (good == 24) allGood++
        }
        else invalid++
    }

    println("File = ${file.name}")
    println("\nDuplicated dates:")
    for ((k,v) in map) {
        if (v > 1) println(" $k ($v times)")
    }
    println("\nTotal number of records : $count")
    var percent = invalid.toDouble() / count * 100.0
    println("Number of invalid records : $invalid (${"%5.2f".format(percent)}%)")
    percent = allGood.toDouble() / count * 100.0
    println("Number which are all good : $allGood (${"%5.2f".format(percent)}%)")
}</syntaxhighlight>
 
{{out}}
<pre>
File = readings.txt
 
Duplicated dates:
1990-03-25 (2 times)
1991-03-31 (2 times)
1992-03-29 (2 times)
1993-03-28 (2 times)
1995-03-26 (2 times)
 
Total number of records : 5471
Number of invalid records : 0 ( 0.00%)
Number which are all good : 5017 (91.70%)
</pre>
 
=={{header|Lua}}==
<syntaxhighlight lang="lua">filename = "readings.txt"
io.input( filename )
 
for i = 1, #bad_format do
print( " ", bad_format[i] )
end</syntaxhighlight>
Output:
<pre>Lines read: 5471
 
</pre>
=={{header|M2000 Interpreter}}==
The file is expected in the user directory. Use Win Dir$ to open an Explorer window there and copy readings.txt into it.
 
<syntaxhighlight lang="m2000 interpreter">Module TestThis {
Document a$, exp$
\\ automatically find the encoding and the line break
Load.doc a$, "readings.txt"
m=0
n=doc.par(a$)
k=list
nl$={
}
l=0
exp$=format$("Records: {0}", n)+nl$
For i=1 to n
b$=paragraph$(a$, i)
If exist(k,Left$(b$, 10)) then
m++ : where=eval(k)
exp$=format$("Duplicate for {0} at {1}",where, i)+nl$
Else
Append k, Left$(b$, 10):=i
End if
Stack New {
Stack Mid$(Replace$(chr$(9)," ", b$), 11)
while not empty {
Read a, b
if b<=0 then l++ : exit
}
}
Next
exp$= format$("Duplicates {0}",m)+nl$
exp$= format$("Valid Records {0}",n-l)+nl$
clipboard exp$
report exp$
}
TestThis
</syntaxhighlight>
{{out}}
<pre>
Records: 5471
Duplicate for 84 at 85
Duplicate for 455 at 456
Duplicate for 819 at 820
Duplicate for 1183 at 1184
Duplicate for 1910 at 1911
Duplicates 5
Valid Records 5017
 
</pre>
 
=={{header|Mathematica}}/{{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">data = Import["Readings.txt","TSV"]; Print["duplicated dates: "];
Select[Tally@data[[;;,1]], #[[2]]>1&][[;;,1]]//Column
Print["number of good records: ", Count[(Times@@#[[3;;All;;2]])& /@ data, 1],
" (out of a total of ", Length[data], ")"]</syntaxhighlight>
{{out}}
 
<pre>duplicated dates:
1990-03-25
1993-03-28
1995-03-26
 
number of good records: 5017 (out of a total of 5471)</pre>
 
=={{header|MATLAB}} / {{header|Octave}}==
 
<syntaxhighlight lang="matlab">function [val,count] = readdat(configfile)
% READDAT reads readings.txt file
%
dix = find(diff(d)==0) % check for two consecutive timestamps with zero difference
 
printf('number of valid records: %i\n ', sum( all( val(:,5:2:end) >= 1, 2) ) );</syntaxhighlight>
 
<pre>>> [val,count]=readdat;
number of valid records: 5017
</pre>
 
=={{header|Nim}}==
<syntaxhighlight lang="nim">import strutils, tables
 
const NumFields = 49
const DateField = 0
const FlagGoodValue = 1
 
var badRecords: int # Number of records that have invalid formatted values.
var totalRecords: int # Total number of records in the file.
var badInstruments: int # Total number of records that have at least one instrument showing error.
var seenDates: Table[string, bool] # Table to keep track of what dates we have seen.
 
proc checkFloats(floats: seq[string]): bool =
## Ensure we can parse all records as floats (except the date stamp).
for index in 1..<NumFields:
try:
# We're assuming all instrument flags are floats not integers.
discard parseFloat(floats[index])
except ValueError:
return false
true
 
proc areAllFlagsOk(instruments: seq[string]): bool =
## Ensure that all sensor flags are ok.
 
# Flags start at index 2, and occur every 2 fields.
for index in countup(2, NumFields, 2):
# We're assuming all instrument flags are floats not integers
var flag = parseFloat(instruments[index])
if flag < FlagGoodValue: return false
true
 
 
# Note: we're not checking the format of the date stamp.
 
# Main.
 
var currentLine = 0
for line in "readings.txt".lines:
currentLine.inc
if line.len == 0: continue # Empty lines don't count as records.
 
var tokens = line.split({' ', '\t'})
totalRecords.inc
 
if tokens.len != NumFields:
badRecords.inc
continue
 
if not checkFloats(tokens):
badRecords.inc
continue
 
if not areAllFlagsOk(tokens):
badInstruments.inc
 
if seenDates.hasKeyOrPut(tokens[DateField], true):
echo tokens[DateField], " duplicated on line ", currentLine
 
let goodRecords = totalRecords - badRecords
let goodInstruments = goodRecords - badInstruments
 
echo "Total Records: ", totalRecords
echo "Records with wrong format: ", badRecords
echo "Records where all instruments were OK: ", goodInstruments</syntaxhighlight>
 
{{out}}
<pre>1990-03-25 duplicated on line 85
1991-03-31 duplicated on line 456
1992-03-29 duplicated on line 820
1993-03-28 duplicated on line 1184
1995-03-26 duplicated on line 1911
Total Records: 5471
Records with wrong format: 0
Records where all instruments were OK: 5017</pre>
 
=={{header|OCaml}}==
<syntaxhighlight lang="ocaml">#load "str.cma"
open Str
 
let strip_cr str =
let last = pred (String.length str) in
if str.[last] <> '\r' then (str) else (String.sub str 0 last)
 
let map_records =
aux (e::acc) tail
| [_] -> invalid_arg "invalid data"
| [] -> (List.rev acc)
in
aux [] ;;
aux acc tl
| [] ->
(List.rev acc)
in
aux [] ;;
 
let record_ok (_,record) =
let is_ok (_,v) = (v >= 1) in
let sum_ok =
List.fold_left (fun sum this ->
if is_ok this then succ sum else sum) 0 record
in
(sum_ok = 24)
 
let num_good_records =
let li = split (regexp "[ \t]+") line in
let records = map_records (List.tl li)
and date = (List.hd li) in
(date, records)
 
let ic = open_in "readings.txt" in
let rec read_loop acc =
    let line_opt = try Some (strip_cr (input_line ic))
                   with End_of_file -> None
    in
    match line_opt with
      None -> close_in ic; List.rev acc
    | Some line -> read_loop (parse_line line :: acc)
in
let inputs = read_loop [] in
 
Printf.printf "number of good records: %d\n" (num_good_records inputs);
;;</syntaxhighlight>
 
this script outputs:
 
=={{header|Perl}}==
<syntaxhighlight lang="perl">use List::MoreUtils 'natatime';
use constant FIELDS => 49;
 
map {" $_\n"}
grep {$dates{$_} > 1}
    sort keys %dates;</syntaxhighlight>
 
Output:
1995-03-26</pre>
 
=={{header|Phix}}==
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">-- demo\rosetta\TextProcessing2.exw</span>
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span> <span style="color: #000080;font-style:italic;">-- (include version/first of next three lines only)</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">readings</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span> <span style="color: #000080;font-style:italic;">-- global constant lines, or:
--assert(write_lines("readings.txt",lines)!=-1) -- first run, then:
--constant lines = read_lines("readings.txt")</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">\</span><span style="color: #004080;">timedate</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>

<span style="color: #004080;">integer</span> <span style="color: #000000;">all_good</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">fmt</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">"%d-%d-%d\t"</span><span style="color: #0000FF;">&</span><span style="color: #7060A8;">join</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"%f"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">48</span><span style="color: #0000FF;">),</span><span style="color: #008000;">'\t'</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">extset</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sq_mul</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">tagset</span><span style="color: #0000FF;">(</span><span style="color: #000000;">24</span><span style="color: #0000FF;">),</span><span style="color: #000000;">2</span><span style="color: #0000FF;">),</span> <span style="color: #000080;font-style:italic;">-- {2,4,6,..48}</span>
         <span style="color: #000000;">curr</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">last</span>

<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">lines</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
    <span style="color: #004080;">string</span> <span style="color: #000000;">li</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">lines</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span>
    <span style="color: #004080;">sequence</span> <span style="color: #000000;">r</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">scanf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">li</span><span style="color: #0000FF;">,</span><span style="color: #000000;">fmt</span><span style="color: #0000FF;">)</span>
    <span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">r</span><span style="color: #0000FF;">)!=</span><span style="color: #000000;">1</span> <span style="color: #008080;">then</span>
        <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"bad line [%d]:%s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">i</span><span style="color: #0000FF;">,</span><span style="color: #000000;">li</span><span style="color: #0000FF;">})</span>
    <span style="color: #008080;">else</span>
        <span style="color: #000000;">curr</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">r</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">][</span><span style="color: #000000;">1</span><span style="color: #0000FF;">..</span><span style="color: #000000;">3</span><span style="color: #0000FF;">]</span>
        <span style="color: #008080;">if</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">></span><span style="color: #000000;">1</span> <span style="color: #008080;">and</span> <span style="color: #000000;">curr</span><span style="color: #0000FF;">=</span><span style="color: #000000;">last</span> <span style="color: #008080;">then</span>
            <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"duplicate line for %04d/%02d/%02d\n"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">last</span><span style="color: #0000FF;">)</span>
        <span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
        <span style="color: #000000;">last</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">curr</span>
        <span style="color: #000000;">all_good</span> <span style="color: #0000FF;">+=</span> <span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">sq_le</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">extract</span><span style="color: #0000FF;">(</span><span style="color: #000000;">r</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">][</span><span style="color: #000000;">4</span><span style="color: #0000FF;">..$],</span><span style="color: #000000;">extset</span><span style="color: #0000FF;">),</span><span style="color: #000000;">0</span><span style="color: #0000FF;">))=</span><span style="color: #000000;">0</span>
    <span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>

<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Valid records %d of %d total\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">all_good</span><span style="color: #0000FF;">,</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">lines</span><span style="color: #0000FF;">)})</span>
<span style="color: #0000FF;">?</span><span style="color: #008000;">"done"</span>
<span style="color: #0000FF;">{}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">wait_key</span><span style="color: #0000FF;">()</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
duplicate line for 1990/03/25
duplicate line for 1991/03/31
duplicate line for 1992/03/29
duplicate line for 1993/03/28
duplicate line for 1995/03/26
Valid records 5017 of 5471 total
</pre>
 
=={{header|PHP}}==
<syntaxhighlight lang="php">$handle = fopen("readings.txt", "rb");
$missformcount = 0;
$totalcount = 0;
foreach ($duplicates as $key => $val){
echo $val . ' at Line : ' . $key . '<br>';
}</syntaxhighlight>
<pre>Valid records 5017 of 5471 total
Duplicates :
1993-03-28 at Line : 1184
1995-03-26 at Line : 1911</pre>
 
=={{header|Picat}}==
<syntaxhighlight lang="picat">import util.
 
go =>
Readings = [split(Record) : Record in read_file_lines("readings.txt")],
DateStamps = new_map(),
GoodReadings = 0,
foreach({Rec,Id} in zip(Readings,1..Readings.length))
if Rec.length != 49 then printf("Entry %d has bad_length %d\n", Id, Rec.length) end,
Date = Rec[1],
if DateStamps.has_key(Date) then
printf("Entry %d (date %w) is a duplicate of entry %w\n", Id, Date, DateStamps.get(Date))
else
if sum([1: I in 3..2..49, check_field(Rec[I])]) == 0 then
GoodReadings := GoodReadings + 1
end
end,
DateStamps.put(Date, Id)
end,
nl,
printf("Total readings: %d\n",Readings.len),
printf("Good readings: %d\n",GoodReadings),
nl.
 
check_field(Field) =>
Field == "-2" ; Field == "-1" ; Field == "0".</syntaxhighlight>
 
{{out}}
<pre>Entry 85 (date 1990-03-25) is a duplicate of entry 84
Entry 456 (date 1991-03-31) is a duplicate of entry 455
Entry 820 (date 1992-03-29) is a duplicate of entry 819
Entry 1184 (date 1993-03-28) is a duplicate of entry 1183
Entry 1911 (date 1995-03-26) is a duplicate of entry 1910
 
Total readings: 5471
Good readings: 5013</pre>
 
 
=={{header|PicoLisp}}==
Put the following into an executable file "checkReadings":
<syntaxhighlight lang="picolisp">#!/usr/bin/picolisp /usr/lib/picolisp/lib.l
 
(load "@lib/misc.l")
 
(in (opt)
(until (eof)
(let Lst (split (line) "^I")
(unless
(and
(= 49 (length Lst)) # Check total length
($dat (car Lst) "-") # Check for valid date
(fully # Check data format
'((L F)
(if F # Alternating:
(format L 3) # Number
(>= 9 (format L) -9) ) ) # or flag
(cdr Lst)
'(T NIL .) ) )
(prinl "Bad line format: " (glue " " Lst))
(bye 1) ) ) ) )
 
(bye)</syntaxhighlight>
Then it can be called as
<pre>$ ./checkReadings readings.txt</pre>
 
=={{header|PL/I}}==
<syntaxhighlight lang="pli">
/* To process readings produced by automatic reading stations. */
 
put skip list ('There were ' || k-faulty || ' good readings' );
end check;
</syntaxhighlight>
 
 
=={{header|PowerShell}}==
<syntaxhighlight lang="powershell">$dateHash = @{}
$goodLineCount = 0
get-content c:\temp\readings.txt |
}
[string]$goodLineCount + " good lines"
</syntaxhighlight>
 
Output:
 
An alternative using regular expression syntax:
<syntaxhighlight lang="powershell">
$dateHash = @{}
$goodLineCount = 0
}
[string]$goodLineCount + " good lines"
</syntaxhighlight>
 
Output:
5017 good lines
</pre>
 
=={{header|PureBasic}}==
Using regular expressions.
<syntaxhighlight lang="purebasic">Define filename.s = "readings.txt"
#instrumentCount = 24
 
CloseConsole()
EndIf
EndIf</syntaxhighlight>
Sample output:
<pre>Duplicate date: 1990-03-25 occurs on lines 85 and 84.
 
=={{header|Python}}==
<syntaxhighlight lang="python">import re
import zipfile
import StringIO
#readings = StringIO.StringIO(zfs.read('readings.txt'))
readings = open('readings.txt','r')
munge2(readings)</syntaxhighlight>
The results indicate 5013 good records, which differs from the Awk implementation. The final few lines of the output are as follows
<pre style="height:10ex;overflow:scroll">
* Generate mostly summary information that is easier to compare to other solutions.
 
<syntaxhighlight lang="python">import re
import zipfile
import StringIO
readings = open('readings.txt','r')
munge2(readings)</syntaxhighlight>
<pre>bash$ /cygdrive/c/Python26/python munge2.py
Duplicate dates:
 
=={{header|R}}==
<syntaxhighlight lang="r"># Read in data from file
dfr <- read.delim("d:/readings.txt", colClasses=c("character", rep(c("numeric", "integer"), 24)))
dates <- strptime(dfr[,1], "%Y-%m-%d")
# Number of rows with no bad values
flags <- as.matrix(dfr[,seq(3,49,2)])>0
sum(apply(flags, 1, all))</syntaxhighlight>
 
=={{header|Racket}}==
<syntaxhighlight lang="racket">#lang racket
(read-decimal-as-inexact #f)
;; files to read is a sequence, so it could be either a list or vector of files
 
(printf "~a records have good readings for all instruments~%"
        (text-processing/2 (current-command-line-arguments)))</syntaxhighlight>
Example session:
<pre>$ racket 2.rkt readings/readings.txt
duplicate datestamp: 1995-03-26 at line: 1911 (first seen at: 1910)
5013 records have good readings for all instruments</pre>
 
=={{header|Raku}}==
(formerly Perl 6)
{{trans|Perl}}
{{works with|Rakudo|2018.03}}
 
This version does validation with a single Raku regex that is much more readable than the typical regex, and arguably expresses the data structure more straightforwardly.
Here we use normal quotes for literals, and <tt>\h</tt> for horizontal whitespace.
 
Variables like <tt>$good-record</tt> that are going to be autoincremented do not need to be initialized.
 
The <tt>.push</tt> method on a hash is magical and loses no information; if a duplicate key is found in the pushed pair, an array of values is automatically created of the old value and the new value pushed. Hence we can easily track all the lines that a particular duplicate occurred at.
 
The <tt>.all</tt> method does "junctional" logic: it autothreads through comparators as any English speaker would expect. Junctions can also short-circuit as soon as they find a value that doesn't match, and the evaluation order is up to the computer, so it can be optimized or parallelized.
 
The final line simply greps out the pairs from the hash whose value is an array with more than 1 element. (Those values that are not arrays nevertheless have a <tt>.elems</tt> method that always reports <tt>1</tt>.) The <tt>.pairs</tt> is merely there for clarity; grepping a hash directly has the same effect.
Note that we sort the pairs after we've grepped them, not before; this works fine in Raku, sorting on the key and value as primary and secondary keys. Finally, pairs and arrays provide a default print format that is sufficient without additional formatting in this case.
 
<syntaxhighlight lang="raku" line>my $good-records;
my $line;
my %dates;
 
for lines() {
$line++;
/ ^
(\d ** 4 '-' \d\d '-' \d\d)
[ \h+ \d+'.'\d+ \h+ ('-'?\d+) ] ** 24
$ /
or note "Bad format at line $line" and next;
%dates.push: $0 => $line;
$good-records++ if $1.all >= 1;
}
 
say "$good-records good records out of $line total";
 
say 'Repeated timestamps (with line numbers):';
.say for sort %dates.pairs.grep: *.value.elems > 1;</syntaxhighlight>
Output:
<pre>5017 good records out of 5471 total
Repeated timestamps (with line numbers):
1990-03-25 => [84 85]
1991-03-31 => [455 456]
1992-03-29 => [819 820]
1993-03-28 => [1183 1184]
1995-03-26 => [1910 1911]</pre>
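The duplicate-tracking bookkeeping described above (mapping each datestamp to every line number it occurs on, then keeping only the entries with more than one line, sorted after filtering) is language-independent. A minimal Python sketch of the same idea, using hypothetical sample pairs rather than the real file:

```python
from collections import defaultdict

def find_duplicates(stamped_lines):
    """Map each datestamp to all line numbers it appears on,
    then keep only the datestamps seen more than once."""
    seen = defaultdict(list)
    for lineno, date in stamped_lines:
        seen[date].append(lineno)
    # sort after filtering, as in the Raku version above
    return {d: nums for d, nums in sorted(seen.items()) if len(nums) > 1}

# Hypothetical sample: one datestamp repeated on lines 84 and 85.
sample = [(84, "1990-03-25"), (85, "1990-03-25"), (86, "1990-03-26")]
print(find_duplicates(sample))  # {'1990-03-25': [84, 85]}
```

Run against the real file, this bookkeeping yields the same five duplicated datestamps reported by the solutions on this page.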
 
=={{header|REXX}}==
This REXX program processes the file mentioned in "text processing 1" and performs further validation on the dates, flags, and data.
<br><br>
Some of the checks performed are:
::* &nbsp; checks for duplicated date records.
::* &nbsp; checks for a bad date (YYYY-MM-DD) format, among:
::* &nbsp; wrong length
::* &nbsp; year > current year
::* &nbsp; year < 1970 (to allow for posthumous data)
::* &nbsp; mm < 1 or mm > 12
::* &nbsp; dd < 1 or dd > days for the month
::* &nbsp; yyyy, dd, mm isn't numeric
::* &nbsp; missing data (or flags)
::* &nbsp; flag isn't an integer
::* &nbsp; flag contains a decimal point
::* &nbsp; data isn't numeric
In addition, all of the presented numbers (may) have commas inserted.
<br><br>
The program has (negated) code to write the report to a file in addition to the console.
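The checks listed above are easy to cross-check outside REXX. A minimal Python sketch of the core record validation follows; the 49-field layout (a datestamp plus 24 value/flag pairs) and the flag/date rules come from the task and the checklist above, while the function name and error strings are illustrative:

```python
from datetime import date

def validate_record(line):
    """Return an error string, or None if the record passes the checks
    above: 49 fields, a valid 1970..current-year datestamp,
    numeric values, and integer flags."""
    fields = line.split()
    if len(fields) != 49:                      # datestamp + 24 value/flag pairs
        return "wrong field count"
    try:
        y, m, d = map(int, fields[0].split("-"))
        date(y, m, d)                          # rejects out-of-range mm/dd, leap days
    except ValueError:
        return "bad datestamp"
    if not 1970 <= y <= date.today().year:
        return "year out of range"
    try:
        [float(v) for v in fields[1::2]]       # 24 instrument values
        [int(f) for f in fields[2::2]]         # 24 flags, must be integers
    except ValueError:
        return "non-numeric value or flag"
    return None

print(validate_record("1991-03-30 " + "10.000 1 " * 24))  # None
```

Note that `date(y, m, d)` gives the month-length and leap-year handling for free, where the REXX code below computes February's length itself.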
<syntaxhighlight lang="rexx">/*REXX program to process instrument data from a data file. */
numeric digits 20 /*allow for bigger numbers. */
ifid='READINGS.TXT' /*name of the input file. */
ofid='READINGS.OUT' /*name of the output file. */
grandSum=0 /*grand sum of the whole file. */
grandFlg=0 /*grand number of flagged data. */
grandOKs=0
Lflag=0 /*longest period of flagged data. */
Cflag=0 /*longest continuous flagged data. */
oldDate =0 /*placeholder of penultimate date. */
w =16 /*width of fields when displayed. */
dupDates=0 /*count of duplicated timestamps. */
badFlags=0 /*count of bad flags (not integer). */
badDates=0 /*count of bad dates (bad format). */
badData =0 /*count of bad data (not numeric). */
ignoredR=0 /*count of ignored (bad) records. */
maxInstruments=24 /*maximum number of instruments. */
yyyyCurr=right(date(),4) /*get the current year (today). */
monDD. =31 /*number of days in every month. */
/*# days in Feb. is figured on the fly.*/
monDD.4 =30
monDD.6 =30
monDD.11=30
 
do records=1 while lines(ifid)\==0 /*read until finished. */
rec=linein(ifid) /*read the next record (line). */
parse var rec datestamp Idata /*pick off the dateStamp and data. */
if datestamp==oldDate then do /*found a duplicate timestamp. */
dupDates=dupDates+1 /*bump the dupDate counter.*/
call sy datestamp copies('~',30),
'is a duplicate of the',
"previous datestamp."
ignoredR=ignoredR+1 /*bump # of ignoredRecs.*/
iterate /*ignore this duplicate record. */
end
 
parse var datestamp yyyy '-' mm '-' dd /*obtain YYYY, MM, and the DD. */
monDD.2=28+leapyear(yyyy) /*how long is February in year YYYY ? */
/*check for various bad formats. */
if verify(yyyy||mm||dd,1234567890)\==0 |,
length(datestamp)\==10 |,
Line 1,977 ⟶ 2,777:
yyyy<1970 |,
yyyy>yyyyCurr |,
mm=0 | dd=0 |,
mm>12 | dd>monDD.mm then do
badDates=badDates+1
call sy datestamp copies('~'),
'has an illegal format.'
ignoredR=ignoredR+1 /*bump number ignoredRecs.*/
iterate /*ignore this bad date record. */
end
oldDate=datestamp /*save datestamp for the next read. */
sum=0
flg=0
OKs=0
 
do j=1 until Idata='' /*process the instrument data. */
parse var Idata data.j flag.j Idata
 
if pos('.',flag.j)\==0 |, /*does flag have a decimal point -or- */
\datatype(flag.j,'W') then do /* ··· is the flag not a whole number? */
badFlags=badFlags+1 /*bump badFlags counter.*/
call sy datestamp copies('~'),
'instrument' j "has a bad flag:",
flag.j
iterate /*ignore it and its data. */
end
 
if \datatype(data.j,'N') then do /*is the data not numeric? */
badData=badData+1 /*bump counter.*/
call sy datestamp copies('~'),
'instrument' j "has bad data:",
data.j
iterate /*ignore it and its flag. */
end
 
if flag.j>0 then do /*if good data ~~~ */
OKs=OKs+1
sum=sum+data.j
if Cflag>Lflag then do
   Ldate=datestamp
   Lflag=Cflag
   end
Cflag=0
end
else do /*flagged data ~~~ */
flg=flg+1
Cflag=Cflag+1
end
end /*j*/
 
if j>maxInstruments then do
badData=badData+1 /*bump the badData counter.*/
call sy datestamp copies('~'),
'too many instrument datum'
end
 
if OKs\==0 then avg=format(sum/OKs,,3)
else avg='[n/a]'
grandOKs=grandOKs+OKs
_=right(commas(avg),w)
grandSum=grandSum+sum
grandFlg=grandFlg+flg
Line 2,041 ⟶ 2,841:
end /*records*/
 
records=records-1 /*adjust for reading the end─of─file. */
if grandOKs\==0 then grandAvg=format(grandsum/grandOKs,,3)
else grandAvg='[n/a]'
call sy
call sy copies('=',60)
call sy ' records read:' right(commas(records ),w)
call sy ' records ignored:' right(commas(ignoredR),w)
call sy ' grand sum:' right(commas(grandSum),w+4)
call sy ' grand average:' right(commas(grandAvg),w+4)
call sy ' grand OK data:' right(commas(grandOKs),w)
call sy ' grand flagged:' right(commas(grandFlg),w)
call sy ' duplicate dates:' right(commas(dupDates),w)
call sy ' bad dates:' right(commas(badDates),w)
call sy ' bad data:' right(commas(badData ),w)
call sy ' bad flags:' right(commas(badFlags),w)
if Lflag\==0 then call sy ' longest flagged:' right(commas(LFlag),w) " ending at " Ldate
call sy copies('=',60)
exit /*stick a fork in it, we're all done.*/
/*────────────────────────────────────────────────────────────────────────────*/
commas: procedure;  parse arg _;  n=_'.9';  #=123456789;  b=verify(n,#,"M")
        e=verify(n,#'0',,verify(n,#"0.",'M'))-4
        do j=e  to b  by -3;  _=insert(',',_,j);  end  /*j*/;  return _
/*────────────────────────────────────────────────────────────────────────────*/
leapyear: procedure;  arg y            /*year could be:  Y, YY, YYY, or YYYY*/
if length(y)==2  then y=left(right(date(),4),2)y       /*adjust for YY year.*/
if y//4\==0  then return 0             /*not divisible by 4?  Not a leapyear.*/
return y//100\==0 | y//400==0          /*apply the 100 and the 400 year rule.*/
/*────────────────────────────────────────────────────────────────────────────*/
sy: say arg(1);  call lineout ofid,arg(1);  return</syntaxhighlight>
'''output''' &nbsp; when using the default input file:
<pre style="height:35ex">
 
=={{header|Ruby}}==
<syntaxhighlight lang="ruby">require 'set'
 
def munge2(readings, debug=false)
open('readings.txt','r') do |readings|
munge2(readings)
end</syntaxhighlight>
 
=={{header|Scala}}==
{{works with|Scala|2.8}}
<syntaxhighlight lang="scala">object DataMunging2 {
import scala.io.Source
import scala.collection.immutable.{TreeMap => Map}
Line 2,201 ⟶ 2,993:
dateMap.valuesIterable.sum))
}
}</syntaxhighlight>
 
Sample output:
Line 2,217 ⟶ 3,009:
Invalid data records: 454
Total records: 5471
</pre>
 
=={{header|Sidef}}==
{{trans|Raku}}
<syntaxhighlight lang="ruby">var good_records = 0;
var dates = Hash();
 
ARGF.each { |line|
var m = /^(\d\d\d\d-\d\d-\d\d)((?:\h+\d+\.\d+\h+-?\d+){24})\s*$/.match(line);
m || (warn "Bad format at line #{$.}"; next);
dates{m[0]} := 0 ++;
var i = 0;
m[1].words.all{|n| i++.is_even || (n.to_num >= 1) } && ++good_records;
}
 
say "#{good_records} good records out of #{$.} total";
say 'Repeated timestamps:';
say dates.to_a.grep{ .value > 1 }.map { .key }.sort.join("\n");</syntaxhighlight>
{{out}}
<pre>
$ sidef script.sf < readings.txt
5017 good records out of 5471 total
Repeated timestamps:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
</pre>
 
=={{header|Snobol4}}==
 
Developed using the Snobol4 dialect Spitbol for Linux, version 4.0
 
<syntaxhighlight lang="snobol4">* Read text/2
 
v = array(24)
f = array(24)
tos = char(9) " " ;* break characters are both tab and space
pat1 = break(tos) . dstamp
pat2 = span(tos) break(tos) . *v[i] span(tos) (break(tos) | (len(1) rem)) . *f[i]
rowcount = 0
hold_dstamp = ""
num_bad_rows = 0
num_invalid_rows = 0
 
in0
row = input :f(endinput)
rowcount = rowcount + 1
row ? pat1 = :f(invalid_row)
 
* duplicated datestamp?
* if dstamp = hold_dstamp then duplicated
hold_dstamp = differ(hold_dstamp,dstamp) dstamp :s(nodup)
output = dstamp ": datestamp at row " rowcount " duplicates datestamp at " rowcount - 1
nodup
 
i = 1
in1
row ? pat2 = :f(invalid_row)
i = lt(i,24) i + 1 :s(in1)
 
* Is this a goodrow?
* if any flag is < 1 then row has bad data
c = 0
goodrow
c = lt(c,24) c + 1 :f(goodrow2)
num_bad_rows = lt(f[c],1) num_bad_rows + 1 :s(goodrow2)f(goodrow)
goodrow2
 
:(in0)
invalid_row
num_invalid_rows = num_invalid_rows + 1
:(in0)
endinput
output =
output = "Total number of rows : " rowcount
output = "Total number of rows with invalid format: " num_invalid_rows
output = "Total number of rows with bad data : " num_bad_rows
output = "Total number of good rows : " rowcount - num_invalid_rows - num_bad_rows
 
end
 
</syntaxhighlight>
{{out}}
<pre>1990-03-25: datestamp at row 85 duplicates datestamp at 84
1991-03-31: datestamp at row 456 duplicates datestamp at 455
1992-03-29: datestamp at row 820 duplicates datestamp at 819
1993-03-28: datestamp at row 1184 duplicates datestamp at 1183
1995-03-26: datestamp at row 1911 duplicates datestamp at 1910
 
Total number of rows : 5471
Total number of rows with invalid format: 0
Total number of rows with bad data : 454
Total number of good rows : 5017
</pre>
 
=={{header|Tcl}}==
 
<syntaxhighlight lang="tcl">set data [lrange [split [read [open "readings.txt" "r"]] "\n"] 0 end-1]
set total [llength $data]
set correct $total
Line 2,245 ⟶ 3,134:
 
puts "$correct records with good readings = [expr $correct * 100.0 / $total]%"
puts "Total records: $total"</syntaxhighlight>
<pre>$ tclsh munge2.tcl
Duplicate datestamp: 1990-03-25
Line 2,260 ⟶ 3,149:
To demonstrate a different method to iterate over the file, and different ways to verify data types:
 
<syntaxhighlight lang="tcl">set total [set good 0]
array set seen {}
set fh [open readings.txt]
Line 2,298 ⟶ 3,187:
 
puts "total: $total"
puts [format "good: %d = %5.2f%%" $good [expr {100.0 * $good / $total}]]</syntaxhighlight>
Results:
<pre>duplicate date on line 85: 1990-03-25
Line 2,311 ⟶ 3,200:
compiled and run in a single step, with the input file accessed as a list of strings
pre-declared in readings_dot_txt
<syntaxhighlight lang="ursala">#import std
#import nat
 
Line 2,324 ⟶ 3,213:
#show+
 
main = valid_format?(^C/good_readings duplicate_dates,-[invalid format]-!) readings</syntaxhighlight>
output:
<pre>5017 good readings
Line 2,333 ⟶ 3,222:
1991-03-31
1990-03-25</pre>
 
=={{header|VBScript}}==
<syntaxhighlight lang="vb">Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(objFSO.GetParentFolderName(WScript.ScriptFullName) &_
"\readings.txt",1)
Set objDateStamp = CreateObject("Scripting.Dictionary")
 
Total_Records = 0
Valid_Records = 0
Duplicate_TimeStamps = ""
 
Do Until objFile.AtEndOfStream
line = objFile.ReadLine
If line <> "" Then
token = Split(line,vbTab)
If objDateStamp.Exists(token(0)) = False Then
objDateStamp.Add token(0),""
Total_Records = Total_Records + 1
If IsValid(token) Then
Valid_Records = Valid_Records + 1
End If
Else
Duplicate_TimeStamps = Duplicate_TimeStamps & token(0) & vbCrLf
Total_Records = Total_Records + 1
End If
End If
Loop
 
Function IsValid(arr)
IsValid = True
Bad_Readings = 0
n = 1
Do While n <= UBound(arr)
If n + 1 <= UBound(arr) Then
If CInt(arr(n+1)) < 1 Then
Bad_Readings = Bad_Readings + 1
End If
End If
n = n + 2
Loop
If Bad_Readings > 0 Then
IsValid = False
End If
End Function
 
WScript.StdOut.Write "Total Number of Records = " & Total_Records
WScript.StdOut.WriteLine
WScript.StdOut.Write "Total Valid Records = " & Valid_Records
WScript.StdOut.WriteLine
WScript.StdOut.Write "Duplicate Timestamps:"
WScript.StdOut.WriteLine
WScript.StdOut.Write Duplicate_TimeStamps
WScript.StdOut.WriteLine
 
objFile.Close
Set objFSO = Nothing</syntaxhighlight>
 
{{Out}}
<pre>
Total Number of Records = 5471
Total Valid Records = 5013
Duplicate Timestamps:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
</pre>
 
=={{header|Vedit macro language}}==
Line 2,341 ⟶ 3,298:
* Reads flag value and checks if it is positive
* Requires 24 value/flag pairs on each line
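The validation rule described in the points above (a record must contain 24 value/flag pairs, and every flag must be positive) can be sketched briefly in Python. This is only an illustrative sketch, not part of any entry on this page; `check_record` is a hypothetical helper name, and the field layout follows the task description:

```python
# Hypothetical sketch of the per-record check described above: a record is a
# datestamp followed by 24 (value, flag) pairs; it counts as "good" only
# when every flag is >= 1.
def check_record(line):
    fields = line.split()              # one or more spaces/tabs between fields
    if len(fields) != 49:              # datestamp + 24 * (value, flag)
        return None                    # malformed record
    flags = (int(f) for f in fields[2::2])   # flags sit at even indices >= 2
    return all(f >= 1 for f in flags)

print(check_record("1991-03-30" + " 10.000 1" * 24))  # True: all flags good
print(check_record("1991-03-30" + " 10.000 0" * 24))  # False: bad flags
```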
<syntaxhighlight lang="vedit">#50 = Buf_Num // Current edit buffer (source data)
File_Open("|(PATH_ONLY)\output.txt")
#51 = Buf_Num // Edit buffer for output file
Line 2,388 ⟶ 3,345:
IT("Date format errors: ") Num_Ins(#14)
IT("Invalid data records:") Num_Ins(#15)
IT("Total records: ") Num_Ins(#12)</syntaxhighlight>
Sample output:
<syntaxhighlight lang="vedit">1990-03-25: duplicate record at 85
1991-03-31: duplicate record at 456
1992-03-29: duplicate record at 820
Line 2,400 ⟶ 3,357:
Date format errors: 0
Invalid data records: 454
Total records: 5471</syntaxhighlight>
 
=={{header|Wren}}==
{{trans|Kotlin}}
{{libheader|Wren-pattern}}
{{libheader|Wren-fmt}}
{{libheader|Wren-sort}}
<syntaxhighlight lang="wren">import "io" for File
import "./pattern" for Pattern
import "./fmt" for Fmt
import "./sort" for Sort
 
var p = Pattern.new("+1/s")
var fileName = "readings.txt"
var lines = File.read(fileName).trimEnd().split("\r\n")
var count = 0
var invalid = 0
var allGood = 0
var map = {}
for (line in lines) {
count = count + 1
var fields = p.splitAll(line)
var date = fields[0]
if (fields.count == 49) {
map[date] = map.containsKey(date) ? map[date] + 1 : 1
var good = 0
var i = 2
while (i < fields.count) {
if (Num.fromString(fields[i]) >= 1) good = good + 1
i = i + 2
}
if (good == 24) allGood = allGood + 1
} else {
invalid = invalid + 1
}
}
 
Fmt.print("File = $s", fileName)
System.print("\nDuplicated dates:")
var keys = map.keys.toList
Sort.quick(keys)
for (k in keys) {
var v = map[k]
if (v > 1) Fmt.print(" $s ($d times)", k, v)
}
Fmt.print("\nTotal number of records : $d", count)
var percent = invalid/count * 100
Fmt.print("Number of invalid records : $d ($5.2f)\%", invalid, percent)
percent = allGood/count * 100
Fmt.print("Number which are all good : $d ($5.2f)\%", allGood, percent)</syntaxhighlight>
 
{{out}}
<pre>
File = readings.txt
 
Duplicated dates:
1990-03-25 (2 times)
1991-03-31 (2 times)
1992-03-29 (2 times)
1993-03-28 (2 times)
1995-03-26 (2 times)
 
Total number of records : 5471
Number of invalid records : 0 ( 0.00)%
Number which are all good : 5017 (91.70)%
</pre>
 
=={{header|zkl}}==
<syntaxhighlight lang="zkl"> // the RegExp engine has a low limit on groups so
// I can't use it to select all fields, only verify them
re:=RegExp(0'|^(\d+-\d+-\d+)| + 0'|\s+\d+\.\d+\s+-*\d+| * 24 + ".+$");
w:=Utils.Helpers.zipW(File("readings.txt"),[1..]); //-->lazy (line,line #)
reg datep,N, good=0, dd=0;
foreach line,n in (w){
N=n; // since n is local to this scope
if (not re.search(line)){ println("Line %d: malformed".fmt(n)); continue; }
Line 2,421 ⟶ 3,443:
good+=1;
}
println("%d records read, %d duplicate dates, %d valid".fmt(N,dd,good));</syntaxhighlight>
{{out}}
<pre>
5471 records read, 5 duplicate dates, 5017 valid
</pre>