Talk:Text processing/1

From Rosetta Code
Revision as of 18:37, 8 November 2008 by rosettacode>Dmitry-kazakov (Why not to describe it?)

Why?

I was reading through old blog entries and thought it would be appropriate (minus the focus on speed).

Please clarify the task

  1. Should syntax errors in the file be detected?
  2. What is the field separator: one space, any non-empty run of spaces, any non-empty run of spaces or tabs, or something else?
  3. Is the average to be evaluated over all fields together, or over each field separately?
  4. When some fields on the same line are flagged invalid but others are not, is that a gap? Or is it a gap only when all fields are invalid?
  5. Further, do valid fields participate in averaging when other fields on the same line are invalid?
  6. When a field is not present, is that a syntax error or a gap?
  7. What should be done when syntactically wrong fields appear (not a number, too large a number, etc.)?

--Dmitry-kazakov 12:13, 8 November 2008 (UTC)
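
To make questions 2, 4 and 5 concrete, here is a minimal Python sketch of how the answers change the result. Everything in it is an assumption for illustration, not the task's definition: the line layout (a date followed by value/flag pairs), the rule that a flag greater than zero means "valid", the separator pattern, and the function names `parse_line`, `mean_valid_fields` and `mean_all_or_nothing` are all hypothetical.

```python
import re


def parse_line(line, sep=r"[ \t]+"):
    """Split a line into a date and a list of (value, flag) pairs.

    Assumed layout (not confirmed by the task): date, then value/flag pairs,
    separated by any non-empty run of spaces or tabs.
    Raises ValueError on a missing field or a non-numeric field.
    """
    fields = re.split(sep, line.strip())
    date, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("odd number of value/flag fields")
    pairs = [(float(rest[i]), int(rest[i + 1])) for i in range(0, len(rest), 2)]
    return date, pairs


def mean_valid_fields(pairs):
    """Interpretation A: average only the fields whose flag is > 0 (assumed rule)."""
    valid = [value for value, flag in pairs if flag > 0]
    return sum(valid) / len(valid) if valid else None


def mean_all_or_nothing(pairs):
    """Interpretation B: one invalid field makes the whole line unusable."""
    if not pairs or any(flag <= 0 for _, flag in pairs):
        return None
    return sum(value for value, _ in pairs) / len(pairs)


date, pairs = parse_line("2008-11-08\t10.0\t1\t20.0\t-2\t30.0\t1")
print(mean_valid_fields(pairs))    # 20.0
print(mean_all_or_nothing(pairs))  # None
```

Both functions are defensible readings of the task, yet they disagree on the same line, so two implementations could each look correct while producing different output.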

Hi Dmitry, the comp.lang.awk newsgroup thread contains all the information necessary for the original poster to get his job done. The example records are probably typical, but you need to try something out and make your own decisions on the format and error handling. The original newsgroup thread actually has more information than you get for some data munging problems, where in many cases someone just says "wouldn't it be good if this talked to this?" or "When wasn't this working?".

Data format information might be here. (Sorry if I seem patronising; it was not meant.) --Paddy3118 17:34, 8 November 2008 (UTC)

I think any task should be defined in the article itself. The code presented for this task looks like a translation from one language into another, rather than independent implementations. As it stands, there is no way to verify whether they do the job or not. What would be the right output if the input file were:
2008/Mar/21    -1E-2 1
On second thought, I would suggest replacing it with a more general and better-defined text processing task, such as parsing a CSV file. --Dmitry-kazakov 18:37, 8 November 2008 (UTC)