
Talk:Text completion


String similarity

There are many different string similarity algorithms, some of which are detailed here: https://itnext.io/string-similarity-the-basic-know-your-algorithms-guide-3de3d7346227. It looks like the reference Java implementation is using Hamming distance. Maybe the task description should require this, or otherwise clarify what is meant by similarity. --Chunes (talk) 04:42, 28 July 2020 (UTC)

Good point, I have refined the task description and will find resources on algorithms that can be used.
JusC 10:01, 29 July 2020 (UTC)
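
For reference, the Hamming-style similarity mentioned above can be sketched roughly as follows. This is an illustration only, not the reference Java code: the function name `hamming_similarity` and the rule that every extra position in the longer word counts as a mismatch are assumptions made here, since classic Hamming distance is only defined for strings of equal length.

```python
def hamming_similarity(a: str, b: str) -> float:
    """Percentage of positions at which two words agree.

    Assumption: when the lengths differ, each position beyond the end
    of the shorter word is counted as a mismatch.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    # zip() stops at the shorter word, so only shared positions are compared
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / longest


print(hamming_similarity("complition", "completion"))  # one substituted letter
print(hamming_similarity("compltion", "completion"))   # one dropped letter
```

A substituted letter costs a single position, but a dropped letter shifts every later character out of alignment, so the second call scores far lower than the first.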


So... I have to say, I am fairly baffled by this task. The task title is "Text Completion", but the requested operation is to use some kind of similarity function to find possible alternates to an entered word. Though honestly, "... takes in a user inputted word and prints out possible words that are valid in the English dictionary" could be satisfied by just printing the entire dictionary. From reading the reference implementation, it seems like we are supposed to do some kind of "typo check". What does completion have to do with anything? Are we supposed to update on the fly as the word is entered character by character? I could see the use of that, but no implementation, including the reference Java entry, makes any attempt to do that.

So it's just "Algorithmic word similarity as a percentage", pick an algorithm. Maybe something like the existing Levenshtein distance task, or maybe the existing Jaro distance task, or the existing Soundex task. ( Actually I would be intrigued by someone trying to use Soundex as a spelling similarity function. 🤔 )

The reference implementation uses a Hamming distance, and fair enough, Rosetta Code doesn't have a Hamming distance task, possibly because it is absolute crap as a similarity function in general. The Sørensen-Dice algorithm looks interesting at least, but it's only mentioned in passing, and only the alternate Raku entry has done it so far.
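
To make the comparison concrete, here is a minimal sketch of a Sørensen-Dice coefficient over character bigrams, expressed as a percentage. It is not taken from the Raku entry; the names `bigrams` and `dice_similarity`, the choice of bigrams as the unit, and the handling of words shorter than two characters are all assumptions for illustration.

```python
from collections import Counter


def bigrams(word: str) -> Counter:
    """Multiset of adjacent character pairs in a word."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def dice_similarity(a: str, b: str) -> float:
    """Sørensen-Dice coefficient on bigram multisets, as a percentage."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Assumption: words too short to have bigrams match only if equal
        return 100.0 if a == b else 0.0
    # Counter & Counter keeps the minimum count of each shared bigram
    overlap = sum((x & y).values())
    return 100.0 * 2 * overlap / total


print(dice_similarity("night", "nacht"))
print(dice_similarity("compltion", "completion"))
```

Because the score depends on shared bigrams rather than on aligned positions, a single dropped letter only removes the bigrams that touch it, which is why this kind of measure tends to suit typo checking better than a positional comparison.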

My point is the task is poorly named, poorly specced and is mostly a duplicate of several existing tasks. --Thundergnat (talk) 00:29, 31 July 2020 (UTC)