Talk:Letter frequency

From Rosetta Code
Revision as of 22:48, 25 July 2012 by rosettacode>Gerard Schildberger (→‎Task description: added comment about only counting capital letters. -- ~~~~)

Task description

A more detailed task description is needed. For example, should we only count ASCII letters A-Z? Should the count be case-insensitive?

Or for that matter, what is a letter? For what language? Most programs seemed to assume the Latin alphabet for English. -- Gerard Schildberger 01:47, 30 June 2012 (UTC)

Maybe the results can be displayed with whatever method is the most convenient? I assume opening the file is required (not just handling a file that is already open)?

Since the first solutions were copied from another page and may not be correct solutions, those should be marked somehow. (Was there some specific tag for this purpose?) At least the Pascal solution seems to have nothing to do with the task.

--PauliKL 13:03, 19 September 2011 (UTC)

I took it that anything left unsaid should be whatever is convenient to the implementer.
  • ASCII? Count whatever the file open routine makes easiest.
  • Case sensitivity? Count what you get without applying any uppercase/lowercase filters.
  • Output format? Whatever is convenient.
Open the file in your code? I interpreted this as being a requirement.
This leaves the guts as being a way to iterate through the characters keeping count. --Paddy3118 14:58, 19 September 2011 (UTC)
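The reading above (open the file in your code, apply no case filters, count whatever comes back) could be sketched as follows. This is only an illustration in Python, not any of the posted solutions; the function name count_characters and the UTF-8 encoding are assumptions made for the example.

```python
from collections import Counter


def count_characters(path):
    """Open the file at `path` and count every character exactly as read.

    No uppercase/lowercase filter is applied, so 'H' and 'h' are counted
    separately. The name and the UTF-8 encoding are assumptions.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            counts.update(line)  # a string is iterated character by character
    return counts
```

The output format is left to the caller, for example by looping over sorted(counts.items()).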

"Letter frequency" is not the same as "letter occurrences". The title hints that more is needed in the task description. It would seem that some description of output is required as well. --Demivec 16:43, 9 November 2011 (UTC)
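One way to read the distinction: an occurrence count is a raw tally, while a frequency is that tally divided by the total number of characters counted. A minimal sketch in Python, assuming that reading (the task itself does not define it, and the name relative_frequencies is made up for the example):

```python
from collections import Counter


def relative_frequencies(text):
    """Return each character's share of the total, as a value in 0..1.

    Assumes "frequency" means occurrences divided by the total count;
    the task description does not say so explicitly.
    """
    counts = Counter(text)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty input: nothing to divide by
    return {ch: n / total for ch, n in counts.items()}
```

For the text "aab" this gives two thirds for a and one third for b.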

It seems that many program examples interpreted a letter as a character. A (Latin) letter has two forms: its uppercase and lowercase versions. So if two H characters and three h characters were in a file, then there would be five occurrences of the letter aitch. [Aitch is the English name for the letter H or h.] The task description could've been clearer on that point, so for REXX version 1, a count was done for each (Latin) letter AND also for each character, and the counts are provided in separate lists. This made the loosey-goosey interpretation moot. Since it wasn't stated what a letter is (I used the primary definition that it's any of the symbols of an alphabet), it seemed appropriate to provide both lists: a list of letters, and a list of all characters (for any language's alphabet). -- Gerard Schildberger 21:19, 25 July 2012 (UTC)
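The two-list idea can be sketched like this. It is a Python illustration only, not the REXX version 1 code; the function name letters_and_characters is hypothetical, and restricting "letter" to ASCII A-Z/a-z is an assumption made to keep the example short.

```python
import string
from collections import Counter


def letters_and_characters(text):
    """Return (letters, characters) counts for `text`.

    letters:    Latin letters only, with uppercase and lowercase folded
                together (assumption: ASCII A-Z and a-z).
    characters: every character, counted exactly as it appears.
    """
    characters = Counter(text)
    letters = Counter(ch.upper() for ch in text if ch in string.ascii_letters)
    return letters, characters
```

With the text "HHhhh", the letter list reports 5 for H, while the character list reports 2 for H and 3 for h.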

In hindsight, it would've been nice to make a requirement to use the program example as the primary input (but not necessarily the only input); that way, everyone could see what was used for its input. At least one example used UNIXDICT.TXT, which has no capital letters. Another example only counts capital letters. Still others showed a list, but excluded most of the counts, so it can't be verified if the uppercase letters were included (or not) with the lowercase letters, or kept as separate counts. -- Gerard Schildberger 21:19, 25 July 2012 (UTC)

a few remarks for Rexx

A few typos: carraiage -> carriage, occurances -> occurrences, independant -> independent

(see next section on typos and misspellings) -- Gerard Schildberger 22:17, 25 July 2012 (UTC)

And some more substantial observations:

 y=d2x(L); if @.up.y==0 then iterate  /*zero count? Then ignore letter*/
 c=d2c(L)                             /*C is the hex version of of char*/

In such cases I use cnt.0up.y so that a possible variable up never interferes. Actually, y is the hex version; drop one ‘of’.

I don't understand what you are observing, substantial or not. Thank you for your suggestion that I modify the version 1 code, but one of "them" can't be dropped as REXX version 1 keeps track of letters as well as characters, and both of "them" are needed for their respective (count) lists. -- Gerard Schildberger 22:01, 25 July 2012 (UTC)

 @.=0                                 /*wouldn't it be neat to use Θ ? */

Why not use cnt. ?

Because it's my style of coding. I use @. for important stemmed arrays, and it makes it easier to find where that stemmed array is referenced in the code. Using such constructs as c.c is very confusing to a novice reader of REXX. There are two variables being used, c and c: one is the stemmed array name, the other the stemmed array index. But --- defending one's programming style will just start a religious war, so there is no sense in pointing out faults in another's coding style. -- Gerard Schildberger 22:01, 25 July 2012 (UTC)

In the discussion I read: “Case sensitivity? Count what you get without applying any uppercase/lowercase filters”

No filters were used in REXX version 1, letters and characters were counted correctly without case sensitivity, and a count was provided for each. -- Gerard Schildberger 22:01, 25 July 2012 (UTC)

upper c -> c=translate(c) would help for other Rexxes (in particular ooRexx)

REXX version 1 was coded for classic REXX (not the object-oriented version of REXX, ooRexx), and the use of the upper statement is more intuitive when read by people who don't know REXX that well; the upper BIF explains itself. It also has uses that translate can't cover, but that discussion should be held elsewhere. -- Gerard Schildberger 22:01, 25 July 2012 (UTC)

--Walterpachl 09:21, 25 July 2012 (UTC)

typos and misspellings

Rather than point out typos and/or misspellings (and hope that the original author notices the critique and corrects them), I believe it is quite acceptable, and more than that, expedient, to just correct the typo or misspelling as long as it's in a comment (in a program) or within a "talk" page --- if the error is an obvious one. If there's a doubt, don't change it. It's harder to tell if there's an error when the wrong word is used (was it intentional?). If I'd bothered to complain about everybody's bad spelling, typos, or wrong word use, I'd never get any real work done. For the few I did correct, I made sure it was the only thing I did in that update, so other people (especially the original poster) could see what was changed. And even then, I did the Rosetta Code update with trepidation and consternation. -- Gerard Schildberger 22:24, 25 July 2012 (UTC)

Changing program code or (input) data is much more problematic. General rule of thumb: don't.

Some programmers intentionally use misspelled words like kount (instead of count) for variable names, for whatever reasons. I also use misspelled words like Ka-razy (for crazy) at times in the comment portions; sometimes these attempts at humor may be hard to discern. There is a fine line between humor and ... not humor. -- Gerard Schildberger 22:17, 25 July 2012 (UTC)