Talk:Levenshtein distance: Difference between revisions

m
→‎Java: l/c i
(J: follow bug fix on main page)
m (→‎Java: l/c i)
 
(8 intermediate revisions by 4 users not shown)
Line 1:
 
==J Implementation==
 
Line 81 ⟶ 82:
{. 1 3 1 2 NB. find if current character matches
1
<./31 1, 1 2 NB. find cheapestedit ofcost previousdelta delete,from insertprevious oredit substitutecosts
1
1 1 1+3 1 2 NB. addfind costedit ofcosts fixingfrom currentadjacent characterstates to cheapestthis previous editstate
4 2 3
<./4 2 3 NB. find cheapest of previous delete, insert or substitute
2
1&{1 3 1 2 NB. retain current delete cost to be used as next substitute cost
Line 95 ⟶ 98:
 
<lang j> 'kitten' {."1@((1&{ ,~ (1 1,{.) <./@:+ }.)@,/\.)@,./\.@levD 'sitting'
3 3 4 45 5 6 6 6
3 2 3 34 4 5 5 5
4 3 2 23 4 4 4 4
5 4 3 2 2 3 3 3 3
6 5 4 3 2 2 2 2
6 5 4 3 2 1 1 1
Line 104 ⟶ 107:
 
Here, the rows correspond to suffixes of the string 'kitten' and the columns correspond to suffixes of the string 'sitting', and the numbers represent the edit cost of matching the two corresponding suffixes.
 
== Changed definition ==
 
The [http://rosettacode.org/mw/index.php?title=Levenshtein_distance&oldid=211835 2015-09-15] edits to the task description change the task. Prior to those edits, substitutions cost the same as an insert or a delete. Those edits instead declare that a substitution should cost the same as an insert plus a delete. And, those edits include a reference to a paper which describes this same system.
 
But that's a different task from what this was. A [[https://en.wikipedia.org/wiki/Edit_distance#Formal_definition_and_properties|wikipedia entry]] currently seems to suggest that this would make the task equivalent to the [[http://rosettacode.org/wiki/Longest_Common_Subsequence|longest common subsequence]] task. I don't think that's precise, but there is a pretty strong relationship there.
 
Anyways, I am reverting these edits, but did not want to lose track of them.
 
Perhaps this is worth doing a new task for? I'm not sure... --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 13:56, 15 September 2015 (UTC)
 
== Java ==
 
The Java Iterative space optimized (even bounded) version translated from Python was missing the initialization of the cost array. Implicitly done in Python, but missing for Java. This results in subtle bugs for a subset of strings. One such example being ["abcdefgh", "defghabc"] with the correct answer being 6 and the previously translated version here returning 5. I updated the source code with the correct initialization, (i.e. cost[i] = i;). --[[User:Caislen|Caislen]] ([[User talk:Caislen|talk]]) 20:41, 16 November 2020 (UTC)
7,805

edits