Talk:Most frequent k chars distance: Difference between revisions


Revision as of 08:01, 24 March 2014

What is this?

This is a terrible write-up. Please try to define terms using plain English or math formulas, not some identifiers from a piece of sample code. Also, this looks identical to the linked WP page except for the code, which was removed by another WP editor. What's the purpose of this copy/paste? --Ledrug (talk) 02:05, 23 March 2014 (UTC)

I have asked the author to answer here. It seems to be their very first contribution to RC and seems as if it might be an attempt at self-publication. Even if it is not, I would be inclined to delete this if the author does not explain. --Paddy3118 (talk) 07:25, 24 March 2014 (UTC)

Prank Page?

This is quite possibly the worst "similarity" function I have ever encountered, and I wonder if this is just a prank to see if someone can get something completely bogus published. The results table is inconsistent with the pseudocode. It is difficult to determine an algorithm that will give those results for the given inputs. Strings like 'iiii' and 'Mississippi' would rank as having a much shorter "distance" (whatever that means in this context) than identical strings like 'abcdefghijklmnopqrstuvwxyz' and 'abcdefghijklmnopqrstuvwxyz'. That may be useful information, but this seems like an awfully convoluted way to obtain it. --Thundergnat (talk) 01:31, 24 March 2014 (UTC)

I completely agree: the algorithm is not one of the best for string similarity, and there are many cases where it fails. But in text mining studies, where words like 'iiiii' or 'abcdefghijklmnopqrstuvwxyz' appear only exceptionally, the algorithm is more successful than binary string similarity functions and runs faster than more successful algorithms such as Levenshtein distance. Besides these specific text mining studies, perhaps the algorithm can be deployed in other studies in the future. I also found the inconsistency between the pseudocode and the table and am fixing it now. I will also spend time rewriting the article and fixing its English. Shedai (talk) 08:01, 24 March 2014 (UTC)
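The objection about 'iiii' and 'Mississippi' can be checked with a small sketch. The pseudocode under discussion is not reproduced on this page, so the following is only one plausible reading of it, not the author's definition: keep the k most frequent characters of each string, score each shared character by the sum of its counts in both strings, and subtract the score from a maximum distance. The function names, k = 2, and the maximum distance of 100 are all assumptions made for illustration.

```python
from collections import Counter

def most_freq_k_hash(s, k):
    # Keep the k most frequent characters of s with their counts.
    # Ties are broken by first occurrence (Counter.most_common, Python 3.7+).
    return dict(Counter(s).most_common(k))

def most_freq_k_similarity(h1, h2):
    # Assumed scoring rule: each character present in both hashes
    # contributes the sum of its counts in the two strings.
    return sum(h1[c] + h2[c] for c in h1 if c in h2)

def most_freq_k_distance(s1, s2, k, max_distance):
    # Assumed definition: distance = max_distance - similarity.
    h1 = most_freq_k_hash(s1, k)
    h2 = most_freq_k_hash(s2, k)
    return max_distance - most_freq_k_similarity(h1, h2)

alphabet = 'abcdefghijklmnopqrstuvwxyz'
d_different = most_freq_k_distance('iiii', 'Mississippi', 2, 100)  # 92
d_identical = most_freq_k_distance(alphabet, alphabet, 2, 100)     # 96
```

Under these assumptions the two different strings come out closer (92) than the two identical ones (96), because only repeated characters add much to the score. This matches the behaviour described above, though a different scoring rule would give different numbers.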

Revision as of 07:25, 24 March 2014 by Paddy3118 (→What is this?: Author, please explain.); revision as of 08:01, 24 March 2014 by Shedai (→Prank Page?)