Talk:Textonyms: Difference between revisions

Revision as of 15:47, 11 February 2015

Revision as of 18:27, 10 February 2015 by Gerard Schildberger (→Correct number of Textonyms in unixdict.txt?: added a comment and a new talk section.)
Latest revision as of 15:47, 11 February 2015 by Nigel Galloway (→duplicate words in dictionary)
(7 intermediate revisions by 2 users not shown)
Line 7:
== Example word list incomplete ==
It ends in the middle of "P". --[[User:Ledrug|Ledrug]] ([[User talk:Ledrug|talk]]) 19:48, 5 February 2015 (UTC)

: Will the truncation of that file/dictionary be fixed, or was it by design? A note or comment stating that the dictionary was intentionally reduced (and the reason) would answer this issue. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 20:33, 10 February 2015 (UTC)

==Example word list has 'odd' words==
Line 26 ⟶ 28:
:The code should work with any text list. I don't mind if the specific list isn't critical to the task. The advantage of the one I am proposing is that it includes names (i.e. Nigel and Paddy). The Perl entry has a wildcard feature that could be foreseeing 7733428483 8398. You may want a version which allows numbers. 2748424767 -> "Briticisms", "criticisms". --[[User:Nigel Galloway|Nigel Galloway]] ([[User talk:Nigel Galloway|talk]]) 15:29, 6 February 2015 (UTC)

:: I agree with your statement of ''should work with any text list'' (dictionary). However (see the talk section '''duplicate words in dictionary'''), it depends on whether or not the dictionary contains duplicated words and, more importantly, on how the computer program handles those duplicates. So the inclusion of one particular dictionary can cause differences in the output of various computer programs, depending on, in this case, duplicate words. Rejected words don't appear to be a problem, but showing a count of rejected words would be a nice and easy thing to add, as any dictionary is fair game. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 18:43, 10 February 2015 (UTC)

== What does "mapping" mean here? ==
Line 81 ⟶ 85:
If you say "two", then that affects the count of how many words can be represented by the ''key digits'' (as described by this Rosetta Code task).
In the REXX program that I coded, duplicate words are detected and ignored (a count of them is displayed if non-zero). I believe that having duplicate words shouldn't alter the count of words representable by ''key digits''. As the REXX program is currently coded, it ignores duplicated words and shows a different digit combination count ('''650''' digit combinations instead of '''661'''; the latter counts duplicate words and reflects another way to count words representable by ''key digits'').

Better still, it would be nice to have a clean dictionary, or at the least, to agree on whether or not duplicate words should be ignored (and instead report on unique words that are in the dictionary).
Line 89 ⟶ 93:
The '''UNIXDICT''' dictionary doesn't have that problem, fortunately. In reality, almost all dictionaries have duplicate words (either by meaning, by use, by their derivation/root, by case/capitalization, or by whatever). That shouldn't preclude the correct/accurate counting of (unique) words. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 18:27, 10 February 2015 (UTC)

:If someone has a non-unique word list they should pipe it through <tt>uniq</tt> or <tt>sort -f -u</tt> (or wrt Rosetta Code, see the relevant task for uniquely filtering). --[[User:dchapes|dchapes]] ([[User talk:dchapes|talk]] | [[Special:Contributions/dchapes|contribs]]) 20:04, 10 February 2015 (UTC)

-----

:: Well, the '''if''' ··· '''IS'''. That is, the ''someone'' is Rosetta Code (or at least, the holder of that file), and the '''Textonyms/wordlist''' dictionary file does contain duplicate words, and it (the dictionary file) is referred to as a possible example dictionary to use (from the Rosetta Code task description). It shouldn't have to be massaged or piped through a filter to solve this Rosetta Code task. Furthermore, ''uniq'' or ''sort'' (or any specific tool) isn't necessary to weed out duplicates.
Of course, that is, if duplicate words are to be rejected/ignored, and so far, nobody has rung that bell yet. I went proactive (for the REXX programming solution) and ignored/rejected duplicate words, as that appeared to be the correct way to handle duplicates. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 20:27, 10 February 2015 (UTC)

::: It would be senseless to say that aleph is a textonym for aleph, so duplicate words should be rejected, and rejected words should not be counted, so any duplicates should only be counted once. The dictionary my real-time spell checker is using at the moment doesn't think aleph is a word, but our wordlist knows better.
:::: I assume we all know that 'as' is a word.
:::: 'AS' is a qualification in UK schools approximately equivalent to half an A level.
:::: 'As' is a cuneiform symbol well known to those who have read the original Epic of Gilgamesh or followed closely the 14th-century BC correspondences to the Egyptian Pharaohs. Maybe this latter is a little too rosetta.
::: 'AS', 'As', and 'as' are textonyms. The wordlist is meant to be clean, so any duplicates that actually are duplicates are in error and can be removed. --[[User:Nigel Galloway|Nigel Galloway]] ([[User talk:Nigel Galloway|talk]]) 15:47, 11 February 2015 (UTC)
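
::: For illustration only, here is a minimal Python sketch of the counting rule discussed above, not taken from any posted solution. It assumes the standard phone keypad letter groups, treats only exact repeats (same spelling and same case) as duplicates, and rejects any word containing a character outside a-z. The function names <tt>to_digits</tt> and <tt>textonyms</tt> and the sample list are hypothetical.

```python
from collections import defaultdict

# Standard phone keypad letter groups (digits 2-9); an assumption of this sketch.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word):
    """Return the key-digit string for word, or None if any character is not a-z."""
    try:
        return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())
    except KeyError:
        return None


def textonyms(words):
    """Group words by digit string, counting exact duplicates only once.

    Returns (groups with more than one word, duplicate count, rejected count).
    """
    groups = defaultdict(set)
    seen = set()
    duplicates = 0
    rejected = 0
    for word in words:
        if word in seen:          # exact repeat: same spelling and same case
            duplicates += 1
            continue
        seen.add(word)
        digits = to_digits(word)
        if not digits:            # None (bad character) or "" (empty word)
            rejected += 1
        else:
            groups[digits].add(word)
    multi = {d: ws for d, ws in groups.items() if len(ws) > 1}
    return multi, duplicates, rejected


if __name__ == "__main__":
    sample = ["aleph", "aleph", "as", "As", "AS",
              "Briticisms", "criticisms", "it's"]
    multi, duplicates, rejected = textonyms(sample)
    for digits in sorted(multi):
        print(digits, sorted(multi[digits]))
    print("duplicates ignored:", duplicates)
    print("words rejected:", rejected)
```

::: With this rule the repeated "aleph" is ignored rather than counted as its own textonym, while 'AS', 'As' and 'as' stay distinct words sharing one digit combination. Whether case variants should count separately is exactly the point under discussion, so a different choice would change the totals.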