
Wordlists

From Rosetta Code
Revision as of 02:51, 5 August 2020 by Petelomax.

Many of the tasks on Rosettacode need to be tested against some list of words. This is a repository of links to various word lists that are publicly available. Rosettacode does not own or claim copyright over ANY of these lists. This is just a reference for public convenience.


In no particular order, lists actually used on the site:



Approximately 25,000 words. All lower case. Minimal plurals, tenses, names, contractions or compound words. Good general-purpose word list.


Approximately 45,000 words. Mixed case. Contains plurals, tenses, names, minimal contractions / compounds. Heavy on computer related terminology.


10,000 common American English words. Lower case. Contains some plurals and tenses, but is necessarily constrained by its size.


  • words.txt (link to the repository rather than directly to the list). From GitHub user dwyl.
Approximately 466,000 English words. Mixed case. Lots of tenses, contractions, compounds, abbreviations and initialisms. Quite a few questionable / archaic words too.


Approximately 74,000 English words. Mixed case. Unfortunately, severely truncated (it stops at lower-case i).


These are lists that have been mentioned or used in one or more Rosettacode tasks. By no stretch of the imagination is this (or is it intended to be) a comprehensive list of wordlists.


If there is some other freely and publicly available wordlist that you feel should be mentioned, feel free to add a link / description. Cite a primary source if possible.

Like perhaps:

Aspell spell-checker-oriented wordlists: 12dicts (link to the page, not directly to a file). From Aspell and friends.

12 different dictionaries geared towards various purposes.

A couple more to consider, not yet used anywhere on RC. The first was contributed by Matt Sephton to rapideuphoria in 1997 and is pretty good for its size, I think. Not entirely sure where the second one came from though, or for that matter how useful it is.

https://github.com/petelomax/Phix/blob/master/demo/tinEWGdemo/tindemo/words.txt (193KB, 21086 words, lower case)
https://github.com/petelomax/Phix/blob/master/demo/tinEWGdemo/WORDS.TXT (446KB, 51796 words, upper case)
--Pete Lomax (talk) 02:50, 5 August 2020 (UTC)
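Descriptions like "21086 words, lower case" or "Mixed case, minimal contractions" are easy to check yourself once a list has been downloaded. Below is a minimal Python sketch, assuming a plain-text file with one word per line. The function names `summarize_wordlist` and `summarize_file` are hypothetical and not part of any Rosettacode task. It also treats an apostrophe as a rough signal for contractions, which is only a heuristic.

```python
from collections import Counter
from pathlib import Path


def summarize_wordlist(lines):
    """Summarize a wordlist given as an iterable of lines, one word per line."""
    # Strip whitespace/newlines and skip blank lines.
    words = [line.strip() for line in lines if line.strip()]
    # Classify each word by letter case; "other" covers mixed case
    # (e.g. "Boston") and entries with no letters at all.
    cases = Counter(
        "lower" if w.islower() else "upper" if w.isupper() else "other"
        for w in words
    )
    return {
        "words": len(words),
        "lower": cases["lower"],
        "upper": cases["upper"],
        "other": cases["other"],
        # Heuristic: an apostrophe usually marks a contraction or possessive.
        "with_apostrophe": sum("'" in w for w in words),
    }


def summarize_file(path):
    """Summarize a downloaded wordlist file (hypothetical helper)."""
    # Assumes UTF-8; errors="replace" keeps odd bytes in older lists
    # from aborting the read.
    with Path(path).open(encoding="utf-8", errors="replace") as f:
        return summarize_wordlist(f)


if __name__ == "__main__":
    print(summarize_wordlist(["apple\n", "Boston\n", "don't\n", "NASA\n"]))
```

For example, `summarize_wordlist(["apple", "Boston", "don't", "NASA"])` reports 4 words: 2 lower case, 1 upper case, 1 other, and 1 containing an apostrophe. Running `summarize_file("words.txt")` on a local copy of a list gives the same summary for the whole file.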
