Talk:Letter frequency

From Rosetta Code
Revision as of 21:25, 25 July 2012 by Gerard Schildberger (→ Task description: removed an observation.)

Task description

A more detailed task description is needed. For example, should we count only the ASCII letters A-Z? Case-insensitive?

Or, for that matter, what is a letter? For what language? Most programs seemed to assume the Latin alphabet for English. -- Gerard Schildberger 01:47, 30 June 2012 (UTC)

Maybe the results can be displayed with whatever method is most convenient? I assume opening the file is required (not just handling a file that is already open)?

Since the first solutions were copied from another page and may not be correct, they should be marked somehow. (Is there a specific tag for this purpose?) At least the Pascal solution seems to have nothing to do with the task.

--PauliKL 13:03, 19 September 2011 (UTC)

I took it that anything left unsaid should be whatever is most convenient for the implementer:
  • ASCII? Count whatever the file open routine makes easiest.
  • Case sensitivity? Count what you get without applying any uppercase/lowercase filters.
  • Output format? Whatever is convenient.
Open the file in your code? I interpreted this as being a requirement.
That leaves the core of the task as iterating through the characters while keeping count. --Paddy3118 14:58, 19 September 2011 (UTC)
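A minimal Python sketch of that interpretation, assuming UTF-8 input: the program opens the file itself and tallies every character it reads, with no uppercase/lowercase filtering. The function names count_chars and count_file are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def count_chars(text):
    """Tally every character exactly as read: no case folding, no filtering."""
    counts = Counter()
    for ch in text:       # iterate through the characters...
        counts[ch] += 1   # ...keeping a count for each one
    return counts


def count_file(path, encoding="utf-8"):
    """Open the file ourselves (treated as a requirement) and count its characters.

    The UTF-8 default is an assumption; pass whatever encoding the input uses.
    Text mode is used as-is, so line endings are whatever Python's default
    newline handling returns ("\r\n" is read as "\n").
    """
    with open(path, encoding=encoding) as f:
        return count_chars(f.read())
```

For example, sorted(count_file("input.txt").items()) yields (character, count) pairs in a convenient order; input.txt is a placeholder name. Keeping count_chars separate from the file handling makes the counting loop easy to check on a literal string.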

"Letter frequency" is not the same as "letter occurences". The title hints that more is needed in the task description. It would seem that some description of output is required as well. --Demivec 16:43, 9 November 2011 (UTC)

It seems that many program examples interpreted a letter as a character. A (Latin) letter has two forms: its uppercase and lowercase versions. So if two H characters and three h characters were in a file, there would be five occurrences of the letter aitch. [Aitch is the English name for the letter H or h.] The task description could have been clearer on that point, so for REXX version 1, a count was done for each (Latin) letter AND for each character, and the counts are provided in separate lists. Since it wasn't stated what a letter is (I used the primary definition: any of the symbols of an alphabet), it seemed appropriate to provide both lists: a list of letters, and a list of all characters (for any language's alphabet). -- Gerard Schildberger 21:19, 25 July 2012 (UTC)
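The two-list approach can be sketched in Python as follows. This is not the REXX version 1 code; it assumes that a "letter" means one of the 26 Latin letters A-Z with upper and lower case merged, while the character list is case-sensitive and covers everything in the input. The function name letter_and_char_counts is hypothetical.

```python
import string
from collections import Counter


def letter_and_char_counts(text):
    """Return (letters, chars), two separate tallies of the same input.

    letters: Latin letters A-Z only, upper and lower case merged, so two 'H'
             plus three 'h' count as five occurrences of the letter aitch.
    chars:   every character exactly as it appears (case-sensitive).
    """
    chars = Counter(text)
    letters = Counter()
    for ch, n in chars.items():
        if ch in string.ascii_letters:  # restrict to Latin A-Z / a-z
            letters[ch.upper()] += n    # merge both cases under the capital
    return letters, chars
```

Replacing the string.ascii_letters test with ch.isalpha() would extend the letter list beyond the Latin alphabet, at the cost of the exact 26-letter interpretation.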

In hindsight, it would have been nice to require that the program example itself be used as the primary input (but not necessarily the only input); that way, everyone could see what the input is. At least one example used UNIXDICT.TXT, which has no capital letters. -- Gerard Schildberger 21:19, 25 July 2012 (UTC)
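A Python sketch of that suggestion, assuming UTF-8 files: with no argument, the hypothetical letter_counts function reads the program's own source (Python's __file__), so readers can see exactly what was counted, while an explicit path such as UNIXDICT.TXT still works. Merging upper and lower case follows the "five occurrences of aitch" reading of a letter; it is one interpretation, not a task requirement.

```python
from collections import Counter


def letter_counts(path=None):
    """Count letters in the file at `path`, with upper and lower case merged.

    With no path, the program example itself is the input. UTF-8 is assumed.
    """
    if path is None:
        path = __file__  # this program's own source file
    with open(path, encoding="utf-8") as f:
        # str.isalpha() decides what a "letter" is; upper() merges H and h.
        return Counter(ch.upper() for ch in f.read() if ch.isalpha())
```

Run as a script, for letter, n in sorted(letter_counts().items()): print(letter, n) prints the counts for the program's own source. __file__ is not defined in an interactive session, so pass an explicit path there.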

a few remarks for Rexx

A few typos:
  • carraiage -> carriage
  • occurances -> occurrences
  • independant -> independent

And some more substantial observations:

 y=d2x(L); if @.up.y==0 then iterate /*zero count? Then ignore letter*/

 c=d2c(L)                             /*C is the hex version of of char*/

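In such cases I use cnt.0up.y, so that a possible variable named up can never interfere (a tail component that starts with a digit, such as 0up, is a constant symbol and is never substituted). Also, it is actually y that is the hex version (c is the character itself), and one ‘of’ should be dropped.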

 @.=0 /*wouldn't it be neat to use Θ ? */

Why not use cnt. ?

In the discussion I read: “Case sensitivity? Count what you get without applying any uppercase/lowercase filters”

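Replacing upper c with c=translate(c) would help for other Rexxes (in particular ooRexx): not every Rexx implementation has the UPPER instruction, while translate(c) with no table arguments returns c in uppercase.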

--Walterpachl 09:21, 25 July 2012 (UTC)