Talk:Most frequent k chars distance: Difference between revisions

This is quite possibly the worst "similarity" function I have ever encountered, and I wonder whether it is just a prank to see if someone can get something completely bogus published. The results table is inconsistent with the pseudocode, and it is difficult to devise an algorithm that gives those results for the given inputs. Strings like 'iiii' and 'Mississippi' would rank as having a much shorter "distance" (whatever that means in this context) than identical strings such as 'abcdefghijklmnopqrstuvwxyz' and 'abcdefghijklmnopqrstuvwxyz'. That may be useful information, but this seems like an awfully convoluted way to obtain it. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 01:31, 24 March 2014 (UTC)
 
:I completely agree; the algorithm is not one of the best for string similarity, and there are many cases where it fails. However, in text-mining studies where words like 'iiiii' or 'abcdefghijklmnopqrstuvwxyz' appear only exceptionally, the algorithm gives better results than binary string-similarity functions and runs faster than more accurate algorithms such as the Levenshtein distance. Beyond these specific text-mining studies, the algorithm may perhaps be used in other kinds of studies in the future. I also found the inconsistency between the pseudocode and the table and am fixing it now. I will also spend time rewriting the article and fixing its English. [[User:Shedai|Shedai]] ([[User talk:Shedai|talk]]) 08:01, 24 March 2014 (UTC)
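A minimal Python sketch of Thundergnat's point, assuming one common formulation: each string is reduced to its k most frequent characters (ties broken by first occurrence), the similarity sums both strings' counts for every character the two lists share, and the distance is a caller-supplied limit minus that similarity. The function names and the <code>limit</code> parameter are hypothetical and not taken from the task's pseudocode, which may differ in details such as adding only one string's count. For the two pairs compared here the ordering comes out the same under either choice.

```python
from collections import Counter


def most_freq_k_hash(s, k):
    """Return the k most frequent characters of s as (char, count) pairs.

    Counter.most_common orders equal counts by first occurrence, which is
    the tie-break assumed in this sketch.
    """
    return Counter(s).most_common(k)


def most_freq_k_similarity(s1, s2, k):
    """Sum both strings' counts for every character in both top-k lists.

    Assumption: some write-ups add only one side's count instead.
    """
    h1 = dict(most_freq_k_hash(s1, k))
    h2 = dict(most_freq_k_hash(s2, k))
    return sum(h1[c] + h2[c] for c in h1 if c in h2)


def most_freq_k_distance(s1, s2, k, limit):
    """Distance = limit - similarity ('limit' is a hypothetical parameter).

    Nothing clamps the result, so a large similarity gives a negative distance.
    """
    return limit - most_freq_k_similarity(s1, s2, k)


if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    print(most_freq_k_distance("iiii", "Mississippi", 2, 10))  # 2
    print(most_freq_k_distance(alphabet, alphabet, 2, 10))     # 6
```

With k = 2, the alphabet reduces to a1b1, so two identical copies score only 4, while 'iiii' (i4) and 'Mississippi' (i4s4) share the heavily repeated 'i' and score 8. Under these assumptions the non-identical pair is therefore "closer" than the identical one.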