Talk:Textonyms: Difference between revisions

(→‎duplicate words in dictionary: make a tool do one thing and combine tools as required)
 
== Example word list incomplete ==
It ends in the middle of "P". --[[User:Ledrug|Ledrug]] ([[User talk:Ledrug|talk]]) 19:48, 5 February 2015 (UTC)
 
: Will the truncation of that file/dictionary be fixed, or was it by design?   A note or comment stating that the dictionary was intentionally reduced (and the reason) would answer this issue. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 20:33, 10 February 2015 (UTC)
 
==Example word list has 'odd' words==
If you say "two", then that affects the count of how many words can be represented by the ''key digits''   (as described by this Rosetta Code task).
 
The REXX program that I coded detects duplicate words and ignores them (displaying a count of them if it is non-zero).   I believe that duplicate words shouldn't alter the count of words representable by ''key digits''.   Because it ignores duplicates, the REXX program reports a different digit combination count:   '''650''' digit combinations instead of '''661'''.   The latter figure counts duplicate words and reflects another way of counting the words representable by ''key digits''.
 
Better still, it would be nice to have a clean dictionary, or at the least, agree on whether or not duplicate words should be ignored   (and instead report on unique words that are in the dictionary).
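
To make the counting difference described above concrete, here is a minimal Python sketch (not the REXX entry, and not run against the real word list). The function names <code>to_digits</code> and <code>textonym_combos</code> and the sample words are assumptions made up for illustration. It counts the digit combinations shared by more than one entry, once on the raw list and once after dropping exact duplicates.

```python
from collections import defaultdict

# Standard phone keypad layout (letters only).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word):
    """Return the key-digit string for word, or None if it contains a non-letter."""
    try:
        return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())
    except KeyError:
        return None


def textonym_combos(words):
    """Count digit combinations that are shared by more than one entry."""
    groups = defaultdict(list)
    for w in words:
        key = to_digits(w)
        if key is not None:
            groups[key].append(w)
    return sum(1 for ws in groups.values() if len(ws) > 1)


# Hypothetical sample, not taken from the real word list.
words = ["aleph", "aleph", "good", "home", "gone", "zebra"]
unique = list(dict.fromkeys(words))  # drop exact duplicates, keep first-seen order

duplicates = len(words) - len(unique)
print(duplicates, textonym_combos(words), textonym_combos(unique))  # 1 2 1
```

In this sample the repeated "aleph" makes its digit combination look like a textonym group on the raw list, which is the kind of inflation that could separate the two totals quoted above.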
:If someone has a non-unique word list they should pipe it through <tt>uniq</tt> or <tt>sort -f -u</tt> (or, with respect to Rosetta Code, see the relevant task for filtering out duplicates). &mdash;[[User:dchapes|dchapes]] ([[User talk:dchapes|talk]] | [[Special:Contributions/dchapes|contribs]]) 20:04, 10 February 2015 (UTC)
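
For entries that would rather filter inside the program than in a pipeline, a rough Python equivalent of that filtering step might look like the sketch below. The name <code>unique_words</code> and its <code>ignore_case</code> flag are hypothetical. Unlike <tt>uniq</tt>, which only drops adjacent repeated lines, this does not need sorted input, and it keeps first-seen order rather than sorting.

```python
def unique_words(words, ignore_case=False):
    """Yield each word once, in first-seen order.

    ignore_case=False removes exact duplicates only.
    ignore_case=True also treats case variants as duplicates,
    roughly what `sort -f -u` does (but without sorting).
    """
    seen = set()
    for w in words:
        key = w.casefold() if ignore_case else w
        if key not in seen:
            seen.add(key)
            yield w


# Hypothetical sample, not taken from the real word list.
sample = ["aleph", "zebra", "Aleph", "aleph"]
print(list(unique_words(sample)))                    # ['aleph', 'zebra', 'Aleph']
print(list(unique_words(sample, ignore_case=True)))  # ['aleph', 'zebra']
```

Whether case variants should count as duplicates is a separate decision from removing exact repeats, so the sketch leaves it as a switch.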
 
:: Well, the '''if''' &nbsp; ··· &nbsp; '''IS'''. &nbsp; That is, the &nbsp; ''someone'' &nbsp; is Rosetta Code (or at least, the holder of that file), and the '''Textonyms/wordlist''' dictionary file does contain duplicate words, and it (the dictionary file) is referred to as a possible example dictionary to use (from the Rosetta Code task description). &nbsp; It shouldn't have to be massaged or piped through a filter to solve this Rosetta Code task. &nbsp; Furthermore, ''uniq'' or ''sort'' (or any specific tool) isn't necessary to weed out duplicates. &nbsp; Of course, that is only if duplicate words are to be rejected/ignored, and so far, nobody has rung that bell yet. &nbsp; I went proactive (for the REXX programming solution) and ignored/rejected duplicate words, as that appeared to be the correct way to handle duplicates. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 20:27, 10 February 2015 (UTC)
-----
 
::: It would be senseless to say that aleph is a textonym for aleph, so duplicate words should be rejected and rejected words should not be counted; that is, any duplicated word should be counted only once. The dictionary my real-time spell checker is using at the moment doesn't think aleph is a word, but our wordlist knows better.
 
:::: I assume we all know that 'as' is a word.
:::: 'AS' is a qualification in UK schools approximately equivalent to half an A level.
:::: 'As' is a cuneiform symbol well known to those who have read the original Epic of Gilgamesh or closely followed the 14th-century BC correspondence with the Egyptian Pharaohs. Maybe this last one is a little too Rosetta.
 
::: 'AS', 'As', and 'as' are textonyms. The wordlist is meant to be clean, so any entries that actually are duplicates are in error and can be removed. --[[User:Nigel Galloway|Nigel Galloway]] ([[User talk:Nigel Galloway|talk]]) 15:47, 11 February 2015 (UTC)
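
A small Python sketch of the distinction drawn above, assuming that case is ignored only when mapping letters to keys (an assumption for illustration; the task description is the authority). The helper name <code>key_digits</code> and the sample entries are hypothetical. Removing exact duplicates keeps the three spellings as separate words, and all three land on the same key digits.

```python
# Standard phone keypad layout, inverted to a letter -> digit table.
LETTER_TO_DIGIT = {
    ch: digit
    for digit, letters in {
        "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    }.items()
    for ch in letters
}


def key_digits(word):
    """Map a purely alphabetic word to its key digits, ignoring case."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


# Hypothetical entries; the final "as" is a true duplicate.
entries = ["AS", "As", "as", "as"]
distinct = sorted(set(entries))            # exact duplicates removed
keys = {key_digits(w) for w in distinct}   # key digits of the survivors

print(distinct, keys)  # ['AS', 'As', 'as'] {'27'}
```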