There are many different string similarity algorithms, some of which are detailed here: https://itnext.io/string-similarity-the-basic-know-your-algorithms-guide-3de3d7346227. It looks like the reference Java implementation is using Hamming distance. Maybe the task description should require this, or otherwise clarify what is meant by similarity. --Chunes (talk) 04:42, 28 July 2020 (UTC)
- Good point, I have refined the task description and will find resources on algorithms that can be used.
- JusC 10:01, 29 July 2020 (UTC)
So... I have to say, I am fairly baffled by this task. The task title is "Text Completion", but the actual requested operation is to use some kind of similarity function to find possible alternates to an entered word. Though honestly, "... takes in a user inputted word and prints out possible words that are valid in the English dictionary" could be satisfied by just printing the entire dictionary. From reading the reference implementation, it seems like we are supposed to do some kind of "typo check". What does completion have to do with anything? Are we supposed to update on the fly as the word is entered character by character? I could see the use of that, but no implementation, including the reference Java entry, makes any attempt to do that.
So it's just "Algorithmic word similarity as a percentage", pick an algorithm. Maybe something like the existing Levenshtein distance task, or the existing Jaro distance task, or the existing Soundex task. (Actually, I would be intrigued by someone trying to use Soundex as a spelling similarity function. 🤔)
The reference implementation uses a Hamming distance, and fair enough, Rosetta Code doesn't have a Hamming distance task, possibly because it is absolute crap as a similarity function in general. The Sørensen-Dice algorithm looks interesting at least, but it's only mentioned in passing and only the alternate Raku entry has done it so far.
My point is the task is poorly named, poorly specced and is mostly a duplicate of several existing tasks. --Thundergnat (talk) 00:29, 31 July 2020 (UTC)
- I agree -- in its present form, the task is underspecified and does not have a good name. We should drop the task if those issues can't be fixed. We should probably have a trashed task template for deleted tasks to make this kind of thing more manageable (and to allow dead tasks to be resurrected back to draft status, if someone feels up to fixing them). --Rdm (talk) 20:40, 15 July 2022 (UTC)
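For anyone comparing the two measures mentioned in this thread, here is a minimal Python sketch of a Hamming-based percentage and a Sørensen-Dice percentage over character bigrams. It is not taken from the reference Java entry or from the Raku entry. The function names, the use of bigrams, and the choice to score unequal-length words as 0% under Hamming are all assumptions made for illustration.

```python
from collections import Counter


def hamming_similarity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length strings agree.

    Hamming distance is only defined for equal lengths; returning 0.0
    otherwise is an assumption of this sketch, not a task requirement.
    """
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def bigrams(word: str) -> Counter:
    """Multiset of adjacent character pairs in the word."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def dice_similarity(a: str, b: str) -> float:
    """Sørensen-Dice coefficient over character bigrams, as a percentage."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 100.0 * 2 * overlap / total


if __name__ == "__main__":
    for candidate in ("complate", "compete"):
        print(candidate,
              hamming_similarity("complete", candidate),
              round(dice_similarity("complete", candidate), 2))
```

With a substituted letter ("complate") both measures give a high score, but with a dropped letter ("compete") the Hamming-based score in this sketch falls to 0% while the bigram-based score stays high, which is the kind of weakness raised above.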