Levenshtein distance
This page uses content from Wikipedia. The original article was at Levenshtein distance. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance) |
In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
- kitten → sitten (substitution of 'k' with 's')
- sitten → sittin (substitution of 'e' with 'i')
- sittin → sitting (insert 'g' at the end).
The Levenshtein distance between "rosettacode", "raisethysword" is 8; The distance between two strings is same as that when both strings is reversed.
Task : Implements a Levenshtein distance function, or uses a library function, to show the Levenshtein distance between "kitten" and "sitting".
Other edit distance at Rosetacode.org :
D
The standard library std.algorithm module includes a Levenshtein distance function. <lang d>import std.algorithm ;
writeln(levenshteinDistance("kitten", "sitting")) ; // print 3</lang>
PureBasic
Based on Wikipedia's pseudocode. <lang PureBasic>Procedure LevenshteinDistance(A_string$, B_String$)
Protected m, n, i, j, min, k, l m = Len(A_string$) n = Len(B_String$) Dim D(m, n) For i=0 To m: D(i,0)=i: Next For j=0 To n: D(0,j)=j: Next For j=1 To n For i=1 To m If Mid(A_string$,i,1) = Mid(B_String$,j,1) D(i,j) = D(i-1, j-1); no operation required Else min = D(i-1, j)+1 ; a deletion k = D(i, j-1)+1 ; an insertion l = D(i-1, j-1)+1 ; a substitution If k < min: min=k: EndIf If l < min: min=l: EndIf D(i,j) = min EndIf Next Next ProcedureReturn D(m,n)
EndProcedure
- - Testing
n = LevenshteinDistance("kitten", "sitting") MessageRequester("Info","Levenshtein Distance= "+Str(n))</lang>
Tcl
<lang tcl>proc levenshteinDistance {s t} {
# Edge cases if {![set n [string length $t]]} {
return [string length $s]
} elseif {![set m [string length $s]]} {
return $n
} # Fastest way to initialize for {set i 0} {$i <= $m} {incr i} {
lappend d 0 lappend p $i
} # Loop, computing the distance table (well, a moving section) for {set j 0} {$j < $n} {} {
set tj [string index $t $j] lset d 0 [incr j] for {set i 0} {$i < $m} {} { set a [expr {[lindex $d $i]+1}] set b [expr {[lindex $p $i]+([string index $s $i] ne $tj)}] set c [expr {[lindex $p [incr i]]+1}] # Faster than min($a,$b,$c) lset d $i [expr {$a<$b ? $c<$a ? $c : $a : $c<$b ? $c : $b}] } # Swap set nd $p; set p $d; set d $nd
} # The score is at the end of the last-computed row return [lindex $p end]
}</lang> Usage: <lang tcl>puts [levenshteinDistance "kitten" "sitting"]; # Prints 3</lang>