Word frequency: Difference between revisions
m added highlighting and whitespace, added periods to some end-of-sentences in the task preamble. |
m favored highlighting and whitespace over the use of double-quoted text, split a series of directives into separate bullets, |
||
Line 5: | Line 5: | ||
Given a text file and an integer '''n''', print/display the '''n''' most |
Given a text file and an integer '''n''', print/display the '''n''' most |
||
common words in the file (and the number of their occurrences) in decreasing frequency. |
common words in the file (and the number of their occurrences) in decreasing frequency. |
||
For the purposes of this task: |
For the purposes of this task: |
||
* A word is a sequence of one or more contiguous letters. |
* A word is a sequence of one or more contiguous letters. |
||
* You are free to define what a ''letter'' is. |
|||
* You are free to define what a letter is. Underscores, accented letters, apostrophes, and other special characters can be handled at the example writer's discretion. For example, you may treat a compound word like "well-dressed" as either one word or two. The word "it's" could also be one or two words as you see fit. You may also choose not to support non US-ASCII characters. Feel free to explicitly state the thoughts behind the program decisions. |
|||
* Underscores, accented letters, apostrophes, hyphens, and other special characters can be handled at your discretion. |
|||
* You may treat a compound word like '''well-dressed''' as either one word or two. |
|||
* The word '''it's''' could also be one or two words as you see fit. |
|||
* You may also choose not to support non US-ASCII characters. |
|||
* Treat '''color''' and '''colour''' as two distinct words. |
|||
* Feel free to explicitly state the thoughts behind the program decisions. |
|||
Show example output using [http://www.gutenberg.org/files/135/135-0.txt Les Misérables from Project Gutenberg] as the text file input and display the top 10 most used words. |
Show example output using [http://www.gutenberg.org/files/135/135-0.txt Les Misérables from Project Gutenberg] as the text file input and display the top '''10''' most used words. |
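A minimal Python sketch of the task as worded above, not a reference solution. It makes the following assumptions, none of which the task mandates: a letter is an ASCII character a-z, case is folded so that "The" and "the" count as the same word, and hyphens and apostrophes act as separators (so '''well-dressed''' and '''it's''' each count as two words). The function name <code>top_words</code> is made up for illustration.

```python
import re
from collections import Counter


def top_words(text, n):
    """Return the n most common words in text as (word, count) pairs,
    ordered by decreasing frequency."""
    # Assumption: a "letter" is ASCII a-z after lower-casing the text.
    # Anything else (digits, hyphens, apostrophes, accented letters)
    # ends the current word.
    words = re.findall(r"[a-z]+", text.lower())
    # Counter.most_common sorts by count, highest first; words with equal
    # counts keep the order in which they were first seen.
    return Counter(words).most_common(n)
```

To run it on a file, read the contents first, for example <code>top_words(open(path, encoding="utf-8").read(), 10)</code>, where <code>path</code> points to a local copy of the text. Because no spelling normalization is applied, '''color''' and '''colour''' are counted as two distinct words.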
||