Text processing/1: Difference between revisions

Line 1,284:

main bye</lang>

=={{header|Fortran}}==

Aside from formatted I/O, Fortran also offers free-format or "list-directed" I/O that accepts numerical data without many constraints - though complex numbers must be presented in (x,y) style. There are complications when character data are to be read in, but this example does not involve that. Unfortunately, although the date part could be considered as three integers (with the hyphens separating the tokens), Fortran's free-format scheme requires an actual delimiter, not an implied delimiter. Had slashes been used instead, the behaviour would be even less helpful, as the / is recognised as terminating the scan of the line! This may well allow comments to be added to data lines, but it makes reading such dates difficult. The free-format rule is that to read N data, input will be read from as many records as necessary until N data have been obtained. Should the last-read record have further data, they will not be seen by the next READ because it will start on a new record.
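The effect of the slash can be seen with a small test, offered as a sketch in free-form source and not as part of the solution. The program name, the variables and the sample strings are invented for the demonstration, and an internal READ with IOSTAT stands in for a read from a file.

<lang Fortran>program slash_demo
  implicit none
  character(len=10) :: text
  integer :: y, m, d, ios

  ! Preset the targets so that untouched items are visible afterwards.
  y = -1;  m = -1;  d = -1
  text = "1990/01/01"
  read (text, *, iostat=ios) y, m, d     ! the first "/" ends the input
  print *, "slashes:", ios, y, m, d      ! m and d keep their preset values

  y = -1;  m = -1;  d = -1
  text = "1990 01 01"
  read (text, *, iostat=ios) y, m, d     ! blanks are actual delimiters
  print *, "blanks: ", ios, y, m, d      ! all three items are assigned
end program slash_demo</lang>

In the first read only Y receives a value; the slash ends the statement and the remaining items are left as they were, without any error being signalled.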

So, to handle this, the plan becomes to read the record into a CHARACTER variable, read the date part with a FORMAT statement working on the first ten characters, and then read the rest of the input via free-format. If the data field is corrupt (say, a letter in place of a digit) the ERR label will be selected. Similarly, when reading from a CHARACTER variable there is no read-next-record available, so if there is a shortage of data (a number is missing) the END label will be selected. In either case a complaint is printed and counted in HIC, and if HIC is not too large, the program goes back to try the next record. This means that the record count no longer relates to the number of values read, so a count of bad values is maintained as well. In the event, no such troublesome records were encountered in this example. An advantage of reading the record into a scratchpad is that any error messages can quote the text of the record. Similarly, tabs could be converted to spaces, etc. but this isn't needed as the free-format reading allows either, and commas. More generally, a much more comprehensive analysis of the record's text could be made should the need arise, most obviously that the date is a valid one. Another useful check is for the appearance of additional text beyond the twenty-four pairs specified - for example, a CRLF has been "lost". Somehow.
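As a minimal sketch of that plan, the following free-form fragment splits one made-up record holding two pairs instead of twenty-four. It uses IOSTAT in place of the END and ERR labels of the full program (a negative value corresponds to END, a positive one to ERR); the record text and all the names are assumptions for illustration only.

<lang Fortran>program split_read
  implicit none
  integer, parameter :: npair = 2          ! the task has 24; 2 keeps it short
  character(len=80) :: acard               ! scratchpad for one record
  integer :: y, m, d, i, ios
  integer :: good(npair)
  double precision :: v(npair)

  acard = "1990-01-01 10.000 1 20.500 -2"  ! made-up record, blank-separated

  ! Date: fixed columns, so an explicit FORMAT on the first ten characters.
  read (acard(1:10), "(I4,2(1X,I2))", iostat=ios) y, m, d
  if (ios /= 0) stop "trouble in the date part"

  ! The rest: list-directed, as the delimiters are now actual ones.
  read (acard(11:), *, iostat=ios) (v(i), good(i), i = 1, npair)
  if (ios < 0) then
    print *, "END condition: too few values in ", trim(acard)
  else if (ios > 0) then
    print *, "ERR condition: malformed value in ", trim(acard)
  else
    print "(I5,2I3,2(F8.3,I3))", y, m, d, (v(i), good(i), i = 1, npair)
  end if
end program split_read</lang>

Because the scratchpad is still to hand when a read fails, the complaint can quote the offending text, which is the advantage noted above.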

In decades of working with (half-)hourly data on the generation and consumption of electricity, it has been evident that data suppliers endlessly demonstrate an inability to stick with a particular format, even one of their own design, and that the daylight savings changeover days (where there are ''not'' twenty-four hours in a day) surpass their competence annually. Persons from a mainframe background do a much more reliable job than those who have only tinkered with spreadsheets.

<lang Fortran>Crunches a set of hourly data. Starts with a date, then 24 pairs of value,indicator for that day, on one line.
      INTEGER Y,M,D             !Year, month, and day.
      INTEGER GOOD(24)          !The indicators.
      REAL*8 V(24),VTOT,T       !The grist.
      INTEGER NV,N,NB           !Number of good values overall, and in a day.
      INTEGER I,NREC,HIC        !Some counters.
      INTEGER BI,BN,BBI,BBN     !Stuff to locate the longest run of bad data,
      CHARACTER*10 BDATE,BBDATE !Along with the starting date.
      LOGICAL INGOOD            !State flipper for the runs of data.
      INTEGER IN,MSG            !I/O mnemonics.
      CHARACTER*666 ACARD       !Scratchpad, of sufficient length for all expectation.
      IN = 10                   !Unit number for the input file.
      MSG = 6                   !Output.
      OPEN (IN,FILE="Readings1.txt", FORM="FORMATTED", !This should be a function.
     1 STATUS ="OLD",ACTION="READ")                    !Returning success, or failure.
      NB = 0                    !No bad values read.
      NV = 0                    !Nor good values read.
      VTOT = 0                  !Their average is to come.
      NREC = 0                  !No records read.
      HIC = 0                   !Provoking no complaints.
      INGOOD = .TRUE.           !I start in hope.
      BBN = 0                   !And the longest previous bad run is short.
Chew into the file.
   10 READ (IN,11,END=100,ERR=666) L,ACARD(1:MIN(L,LEN(ACARD)))  !With some protection.
      NREC = NREC + 1           !So, a record has been read.
   11 FORMAT (Q,A)              !Obviously, Q ascertains the length of the record being read.
      READ (ACARD,12,END=600,ERR=601) Y,M,D     !The date part is trouble, as always.
   12 FORMAT (I4,2(1X,I2))      !Because there are no delimiters between the parts.
      READ (ACARD(11:L),*,END=600,ERR=601) (V(I),GOOD(I),I = 1,24)  !But after the date, delimiters abound.
Calculations. Could use COUNT(array) and SUM(array), but each requires its own pass through the array.
   20 T = 0                     !Start on the day's statistics.
      N = 0                     !No values yet.
      DO I = 1,24               !So, scan the cargo and do all the twiddling in one pass..
        IF (GOOD(I).GT.0) THEN          !A good value?
          N = N + 1                     !Yes. Count it in.
          T = T + V(I)                  !And augment for the average.
          IF (.NOT.INGOOD) THEN         !Had we been ungood?
            INGOOD = .TRUE.             !Yes. But now it changes.
            IF (BN.GT.BBN) THEN         !The run just ending: is it longer?
              BBN = BN                  !Yes. Make it the new baddest.
              BBI = BI                  !Recalling its start index,
              BBDATE = BDATE            !And its start date.
            END IF                      !So much for bigger badness.
          END IF                        !Now we're in good data.
        ELSE                            !Otherwise, a bad value is upon us.
          IF (INGOOD) THEN              !Were we good?
            INGOOD = .FALSE.            !No longer. A new bad run is starting.
            BDATE = ACARD(1:10)         !Recall the date for this starter.
            BI = I                      !And its index.
            BN = 0                      !Start the run-length counter.
          END IF                        !So much for a fall.
          BN = BN + 1                   !Count another bad value.
        END IF                          !Good or bad, so much for that value.
      END DO                    !On to the next.
Commentary for the day's data..
      NB = NB + 24 - N          !Count the bad by implication, all-bad days included.
      IF (N.LE.0) THEN          !I prefer to avoid dividing by zero.
        WRITE (MSG,21) NREC,ACARD(1:10) !So, no average to report.
   21   FORMAT ("Record",I8," (",A,") has no good data!")       !Just a remark.
      ELSE                      !But otherwise,
        WRITE(MSG,22) NREC,ACARD(1:10),N,T/N    !An average is possible.
   22   FORMAT("Record",I8," (",A,")",I3," good, average",F9.3) !So here it is.
        NV = NV + N             !Count the good directly.
        VTOT = VTOT + T         !Should really sum deviations from a working average.
      END IF                    !So much for that line.
      GO TO 10                  !More! More! I want more!!
Complaints. Should really distinguish between trouble in the date part and in the data part.
  600 WRITE (MSG,*) '"END" declared - insufficient data?'       !Not enough numbers, presumably.
      GO TO 602                 !Reveal the record.
  601 WRITE (MSG,*) '"ERR" declared - improper number format?'  !Ah, but which number?
  602 WRITE (MSG,603) NREC,L,ACARD(1:L) !Anyway, reveal the uninterpreted record.
  603 FORMAT(" Record ",I0,", length ",I0," reads ",A)  !Just so.
      HIC = HIC + 1             !This may grow into a habit.
      IF (HIC.LE.12) GO TO 10   !But if not yet, try the next record.
      STOP "Enough distaste."   !Or, give up.
  666 WRITE (MSG,101) NREC,"format error!"      !For A-style data? Should never happen!
      GO TO 900                 !But if it does, give up!
Closedown.
  100 WRITE (MSG,101) NREC,"then end-of-file"   !Discovered on the next attempt.
  101 FORMAT (" Record ",I0,": ",A)     !A record number plus a remark.
      WRITE (MSG,102) NV,NB,VTOT/NV     !The overall results.
  102 FORMAT (I8," values, ",I0," bad. Average",F9.4)   !This should do.
      IF (BBN.LE.0) THEN        !Now for a special report.
        WRITE (MSG,*) "No bad value presented, so no longest run."      !Unneeded!
      ELSE                      !But actually, the example data has some bad values.
        WRITE (MSG,103) BBN,BBI,BBDATE  !And this is for the longest encountered.
  103   FORMAT ("Longest bad run: ",I0,", starting hour ",I0," on ",A)  !Just so.
      END IF                    !Enough remarks.
  900 CLOSE(IN)                 !Done.
      END                       !Spaghetti rules.

</lang>
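Two remarks in the comments of the program can be illustrated with a small sketch on made-up data: the whole-array alternative using COUNT and SUM with a mask, and the summing of deviations from a working average. The working average here is simply the first good value, which is one possible choice and not necessarily what was intended; the names and the four sample readings are likewise invented.

<lang Fortran>program day_stats
  implicit none
  double precision :: v(4), w, dev
  integer :: good(4), n, nv, i

  v    = (/ 10.0d0, 20.0d0, 99.0d0, 30.0d0 /)   ! made-up readings
  good = (/ 1, 1, -2, 1 /)                      ! positive means usable

  ! Whole-array form: one pass for COUNT, another for SUM.
  n = count(good > 0)
  if (n > 0) print *, n, " good, average ", sum(v, mask = good > 0)/n

  ! Working-average form: accumulate small deviations, not a large total.
  nv = 0
  w = 0
  dev = 0
  do i = 1, size(v)
    if (good(i) > 0) then
      if (nv == 0) w = v(i)      ! first good value serves as working average
      nv = nv + 1
      dev = dev + (v(i) - w)     ! deviation from the working average
    end if
  end do
  if (nv > 0) print *, nv, " good, average ", w + dev/nv
end program day_stats</lang>

Both forms should report an average of 20 for the three good values. With some 130,000 readings of modest size the plain total in VTOT is unlikely to lose much accuracy in REAL*8, so the second form matters more for longer series or single precision.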

Output:

Record 1 (1990-01-01) 22 good, average 26.818

Record 2 (1990-01-02) 24 good, average 17.083

Record 3 (1990-01-03) 24 good, average 58.958

...etc.

Record 945 (1992-08-01) has no good data!

...etc.

Record 5471 (2004-12-31) 23 good, average 2.057

Record 5471: then end-of-file

129403 values, 1901 bad. Average 10.4974

Longest bad run: 589, starting hour 2 on 1993-02-09

=={{header|Go}}==