Levenshtein distance
You are encouraged to solve this task according to the task description, using any language you may know.
This page uses content from Wikipedia. The original article was at Levenshtein distance. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance) |
In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
- kitten sitten (substitution of 'k' with 's')
- sitten sittin (substitution of 'e' with 'i')
- sittin sitting (insert 'g' at the end).
The Levenshtein distance between "rosettacode", "raisethysword" is 8; The distance between two strings is same as that when both strings is reversed.
Task : Implements a Levenshtein distance function, or uses a library function, to show the Levenshtein distance between "kitten" and "sitting".
Other edit distance at Rosettacode.org :
Ada
<lang Ada>with Ada.Text_IO;
procedure Main is
function Levenshtein_Distance (S, T : String) return Natural is D : array (0 .. S'Length, 0 .. T'Length) of Natural; begin for I in D'Range (1) loop D (I, 0) := I; end loop; for I in D'Range (2) loop D (0, I) := I; end loop; for J in T'Range loop for I in S'Range loop if S (I) = T (J) then D (I, J) := D (I - 1, J - 1); else D (I, J) := Natural'Min (Natural'Min (D (I - 1, J) + 1, D (I, J - 1) + 1), D (I - 1, J - 1) + 1); end if; end loop; end loop; return D (S'Length, T'Length); end Levenshtein_Distance;
begin
Ada.Text_IO.Put_Line ("kitten -> sitting:" & Integer'Image (Levenshtein_Distance ("kitten", "sitting"))); Ada.Text_IO.Put_Line ("rosettacode -> raisethysword:" & Integer'Image (Levenshtein_Distance ("rosettacode", "raisethysword")));
end Main;</lang>
- Output:
kitten -> sitting: 3 rosettacode -> raisethysword: 8
Aime
<lang aime>integer dist(text s, integer i, integer ls, text t, integer j, integer lt, list d) {
integer x;
x = l_q_integer(d, i * (lt + 1) + j); if (x == -1) { if (i == ls) { x = lt - j; } elif (j == lt) { x = ls - i; } elif (character(s, i) == character(t, j)) { x = dist(s, i + 1, ls, t, j + 1, lt, d); } else { x = dist(s, i + 1, ls, t, j + 1, lt, d); x = min(x, dist(s, i, ls, t, j + 1, lt, d)); x = min(x, dist(s, i + 1, ls, t, j, lt, d));
x += 1; }
l_r_integer(d, i * (lt + 1) + j, x); }
return x;
}
integer levenshtein(text s, text t) {
integer i, ls, lt; list d;
ls = length(s); lt = length(t);
i = (ls + 1) * (lt + 1); while (i) { lb_p_integer(d, -1); i -= 1; }
return dist(s, 0, ls, t, 0, lt, d);
}
integer main(void) {
text s1, s2;
s1 = "rosettacode"; s2 = "raisethysword";
o_form("distance between `~' and `~' is ~\n", s1, s2, levenshtein(s1, s2));
return 0;
}</lang>
AutoHotkey
<lang AutoHotkey>levenshtein(s, t){ If s = return StrLen(t) If t = return strLen(s) If SubStr(s, 1, 1) = SubStr(t, 1, 1) return levenshtein(SubStr(s, 2), SubStr(t, 2)) a := Levenshtein(SubStr(s, 2), SubStr(t, 2)) b := Levenshtein(s, SubStr(t, 2)) c := Levenshtein(SubStr(s, 2), t ) If (a > b) a := b if (a > c) a := c return a + 1 } s1 := "kitten" s2 := "sitting" MsgBox % "distance between " s1 " and " s2 ": " levenshtein(s1, s2)</lang>It correctly outputs '3'
AWK
Slavishly copied from the very clear AutoHotKey example.
<lang awk>#!/usr/bin/awk -f
BEGIN {
a = "kitten"; b = "sitting"; d = levenshteinDistance(a, b); p = d == 1 ? "" : "s"; printf("%s -> %s after %d edit%s\n", a, b, d, p); exit;
}
function levenshteinDistance(s1, s2,
s1First, s2First, s1Rest, s2Rest, distA, distB, distC, minDist) {
# If either string is empty, # then distance is insertion of the other's characters. if (length(s1) == 0) return length(s2); if (length(s2) == 0) return length(s1);
# Rest of process uses first characters # and remainder of each string. s1First = substr(s1, 1, 1); s2First = substr(s2, 1, 1); s1Rest = substr(s1, 2, length(s1)); s2Rest = substr(s2, 2, length(s2));
# If leading characters are the same, # then distance is that between the rest of the strings. if (s1First == s2First) { return levenshteinDistance(s1Rest, s2Rest); }
# Find the distances between sub strings. distA = levenshteinDistance(s1Rest, s2); distB = levenshteinDistance(s1, s2Rest); distC = levenshteinDistance(s1Rest, s2Rest);
# Return the minimum distance between substrings. minDist = distA; if (distB < minDist) minDist = distB; if (distC < minDist) minDist = distC; return minDist + 1; # Include change for the first character.
} </lang>
Example output:
kitten -> sitting after 3 edits
BBC BASIC
<lang bbcbasic> PRINT "'kitten' -> 'sitting' has distance " ;
PRINT ; FNlevenshtein("kitten", "sitting") PRINT "'rosettacode' -> 'raisethysword' has distance " ; PRINT ; FNlevenshtein("rosettacode", "raisethysword") END DEF FNlevenshtein(s$, t$) LOCAL i%, j%, m%, d%() DIM d%(LENs$, LENt$) FOR i% = 0 TO DIM(d%(),1) d%(i%,0) = i% NEXT FOR j% = 0 TO DIM(d%(),2) d%(0,j%) = j% NEXT FOR j% = 1 TO DIM(d%(),2) FOR i% = 1 TO DIM(d%(),1) IF MID$(s$,i%,1) = MID$(t$,j%,1) THEN d%(i%,j%) = d%(i%-1,j%-1) ELSE m% = d%(i%-1,j%-1) IF d%(i%,j%-1) < m% m% = d%(i%,j%-1) IF d%(i%-1,j%) < m% m% = d%(i%-1,j%) d%(i%,j%) = m% + 1 ENDIF NEXT NEXT j% = d%(i%-1,j%-1)</lang>
Output:
'kitten' -> 'sitting' has distance 3 'rosettacode' -> 'raisethysword' has distance 8
Bracmat
Recursive method, but with memoization. <lang bracmat>(levenshtein=
lev cache
. ( lev
= s s0 s1 t t0 t1 L a b c val key . (cache..find)$(str$!arg:?key):(?.?val) & !val | !arg:(?s,?t) & ( !s:&@(!t:? [?L) | !t:&@(!s:? [?L) ) & (cache..insert)$(!key.!L) & !L | !arg:(@(?:%?s0 ?s1),@(?:%?t0 ?t1)) & !s0:!t0 & lev$(!s1,!t1) | lev$(!s1,!t1):?a & lev$(!s,!t1):?b & lev$(!s1,!t):?c & (!b:<!a:?a|) & (!c:<!a:?a|) & (cache..insert)$(!key.1+!a) & 1+!a ) & new$hash:?cache & lev$!arg);</lang>
- Demonstrating:
levenshtein$(kitten,sitting) 3 levenshtein$(rosettacode,raisethysword) 8
C
Recursive method. Deliberately left in an inefficient state to show the recursive nature of the algorithm; notice how it would have become the Wikipedia algorithm if we memoized the function against parameters ls
and lt
.
<lang C>#include <stdio.h>
- include <string.h>
/* s, t: two strings; ls, lt: their respective length */ int levenshtein(const char *s, int ls, const char *t, int lt) {
int a, b, c;
/* if either string is empty, difference is inserting all chars * from the other */ if (!ls) return lt; if (!lt) return ls;
/* if last letters are the same, the difference is whatever is * required to edit the rest of the strings */ if (s[ls] == t[ls]) return levenshtein(s, ls - 1, t, lt - 1);
/* else try: * changing last letter of s to that of t; or * remove last letter of s; or * remove last letter of t, * any of which is 1 edit plus editing the rest of the strings */ a = levenshtein(s, ls - 1, t, lt - 1); b = levenshtein(s, ls, t, lt - 1); c = levenshtein(s, ls - 1, t, lt );
if (a > b) a = b; if (a > c) a = c;
return a + 1;
}
int main() {
const char *s1 = "rosettacode"; const char *s2 = "raisethysword"; printf("distance betweeh `%s' and `%s': %d\n", s1, s2, levenshtein(s1, strlen(s1), s2, strlen(s2)));
return 0;
}</lang> Take the above and add caching, we get (in C99): <lang c>#include <stdio.h>
- include <string.h>
int levenshtein(const char *s, const char *t) { int ls = strlen(s), lt = strlen(t); int d[ls + 1][lt + 1];
for (int i = 0; i <= ls; i++) for (int j = 0; j <= lt; j++) d[i][j] = -1;
int dist(int i, int j) { if (d[i][j] >= 0) return d[i][j];
int x; if (i == ls) x = lt - j; else if (j == lt) x = ls - i; else if (s[i] == t[j]) x = dist(i + 1, j + 1); else { x = dist(i + 1, j + 1);
int y; if ((y = dist(i, j + 1)) < x) x = y; if ((y = dist(i + 1, j)) < x) x = y; x++; } return d[i][j] = x; } return dist(0, 0); }
int main(void) { const char *s1 = "rosettacode"; const char *s2 = "raisethysword"; printf("distance betweeh `%s' and `%s': %d\n", s1, s2, levenshtein(s1, s2));
return 0;
}</lang>
C++
<lang c>#include <string>
- include <iostream>
using namespace std;
// Compute Levenshtein Distance // Martin Ettl, 2012-10-05
size_t uiLevenshteinDistance(const std::string &s1, const std::string &s2) {
const size_t m(s1.size()); const size_t n(s2.size());
if( m==0 ) return n; if( n==0 ) return m;
size_t *costs = new size_t[n + 1];
for( size_t k=0; k<=n; k++ ) costs[k] = k;
size_t i = 0; for ( std::string::const_iterator it1 = s1.begin(); it1 != s1.end(); ++it1, ++i ) { costs[0] = i+1; size_t corner = i;
size_t j = 0; for ( std::string::const_iterator it2 = s2.begin(); it2 != s2.end(); ++it2, ++j ) { size_t upper = costs[j+1]; if( *it1 == *it2 ) {
costs[j+1] = corner; }
else
{ size_t t(upper<corner?upper:corner);
costs[j+1] = (costs[j]<t?costs[j]:t)+1;
}
corner = upper; } }
size_t result = costs[n]; delete [] costs;
return result;
}
int main() { string s0 = "rosettacode";
string s1 = "raisethysword";
cout << "distance between " << s0 << " and " << s1 << " : " << uiLevenshteinDistance(s0,s1) << std::endl;
return 0;
} </lang>
- Example output:
$ ./a.out distance between rosettacode and raisethysword : 8
C#
This is a straightforward translation of the Wikipedia pseudocode. <lang csharp>using System;
namespace LevenshteinDistance {
class Program { static int LevenshteinDistance(string s, string t) { int[,] d = new int[s.Length + 1, t.Length + 1]; for (int i = 0; i <= s.Length; i++) d[i, 0] = i; for (int j = 0; j <= t.Length; j++) d[0, j] = j; for (int j = 1; j <= t.Length; j++) for (int i = 1; i <= s.Length; i++) if (s[i - 1] == t[j - 1]) d[i, j] = d[i - 1, j - 1]; //no operation else d[i, j] = Math.Min(Math.Min( d[i - 1, j] + 1, //a deletion d[i, j - 1] + 1), //an insertion d[i - 1, j - 1] + 1 //a substitution ); return d[s.Length, t.Length]; }
static void Main(string[] args) { if (args.Length == 2) Console.WriteLine("{0} -> {1} = {2}", args[0], args[1], LevenshteinDistance(args[0], args[1])); else Console.WriteLine("Usage:-\n\nLevenshteinDistance <string1> <string2>"); } }
}</lang>
- Example output:
> LevenshteinDistance kitten sitting kitten -> sitting = 3 > LevenshteinDistance rosettacode raisethysword rosettacode -> raisethysword = 8
CoffeeScript
<lang coffeescript>levenshtein = (str1, str2) ->
# more of less ported simple algorithm from JS m = str1.length n = str2.length d = []
return n unless m return m unless n
d[i] = [i] for i in [0..m] d[0][j] = j for j in [1..n] for i in [1..m] for j in [1..n] if str1[i-1] is str2[j-1] d[i][j] = d[i-1][j-1] else d[i][j] = Math.min( d[i-1][j] d[i][j-1] d[i-1][j-1] ) + 1
d[m][n]
console.log levenshtein("kitten", "sitting") console.log levenshtein("rosettacode", "raisethysword") console.log levenshtein("stop", "tops") console.log levenshtein("yo", "") console.log levenshtein("", "yo")</lang>
Common Lisp
<lang lisp>(defun levenshtein (a b)
(let* ((la (length a))
(lb (length b)) (rec (make-array (list (1+ la) (1+ lb)) :initial-element nil)))
(defun leven (x y) (cond
((zerop x) y) ((zerop y) x) ((aref rec x y) (aref rec x y)) (t (setf (aref rec x y) (+ (if (char= (char a (- la x)) (char b (- lb y))) 0 1) (min (leven (1- x) y) (leven x (1- y)) (leven (1- x) (1- y))))))))
(leven la lb)))
(print (levenshtein "rosettacode" "raisethysword"))</lang>
- Output:
8
Clojure
<lang lisp>(defn levenshtein [str1 str2]
(let [len1 (count str1) len2 (count str2)] (cond (zero? len1) len2 (zero? len2) len1 :else (let [cost (if (= (first str1) (first str2)) 0 1)] (min (inc (levenshtein (rest str1) str2)) (inc (levenshtein str1 (rest str2))) (+ cost (levenshtein (rest str1) (rest str2))))))))
(println (levenshtein "rosettacode" "raisethysword"))</lang>
- Output:
8
D
Standard Version
The standard library std.algorithm module includes a Levenshtein distance function: <lang d>import std.stdio, std.algorithm;
void main() {
writeln(levenshteinDistance("kitten", "sitting"));
}</lang>
- Output:
3
Iterative Version
<lang d>import std.stdio, std.algorithm;
int distance(in string s1, in string s2) pure nothrow {
auto costs = new int[s2.length + 1]; foreach (i; 0 .. s1.length + 1) { int lastValue = i; foreach (j; 0 .. s2.length + 1) { if (i == 0) costs[j] = j; else { if (j > 0) { int newValue = costs[j - 1]; if (s1[i - 1] != s2[j - 1]) newValue = min(newValue, lastValue, costs[j]) + 1; costs[j - 1] = lastValue; lastValue = newValue; } } } if (i > 0) costs[$ - 1] = lastValue; } return costs[$ - 1];
}
void main() {
foreach(p; [["kitten", "sitting"], ["rosettacode", "raisethysword"]]) writefln("distance(%s, %s): %d", p[0], p[1], distance(p[0], p[1]));
}</lang>
Recursive Version
A recursive version, as suggested. I think the result distance is the maximum depth of recursion. A distance of 8 may cost an aeon time to finish! <lang d>import std.stdio, std.algorithm, std.range;
int lDistR(T)(in T[] a, in T[] b) /*pure nothrow*/ {
if (a.length == 0) return b.length; if (b.length == 0) return a.length; if (a.length == b.length) { auto r = iota(a.length); auto u = r.until!(i => a[i] != b[i])(); if (equal(r, u)) // u is a short-circuit test until any // mismatch return 0; // u is equivalent to r, means no // mismatch found }
const(T)[][] candidate; immutable alen = a.length; immutable blen = b.length; // mutate _a_ by 1 edit to create members of candidate
// delete an _a_ element if (alen > blen) foreach (i; 0 .. alen) candidate ~= a[0 .. i] ~ a[i+1 .. $]; // insert matching _b_ element to _a_ if (alen < blen) { foreach (i; 0 .. alen+1) // from left candidate ~= a[0 .. i] ~ b[i] ~ a[i .. $]; foreach (i; 0 .. alen+1) // from right candidate ~= a[0 .. $-i] ~ b[$ - i - 1] ~ a[$-i .. $]; } // subsistute matching _a_ element with _b_'s if (alen == blen) foreach (i; 0 .. alen) if (a[i] != b[i]) candidate ~= a[0..i] ~ b[i] ~ a[i+1 .. $];
// exclusive cases, so only 1 edit is make to create each // new candidate
// minimum distance on this run return candidate.map!(e => lDistR(e, b) + 1)().reduce!min();
}
void main() {
writeln(lDistR("kitten", "sitting"));
}</lang>
DWScript
Based on Wikipedia version <lang delphi>function LevenshteinDistance(s, t : String) : Integer; var
i, j : Integer;
begin
var d:=new Integer[Length(s)+1, Length(t)+1]; for i:=0 to Length(s) do d[i, 0] := i; for j:=0 to Length(t) do d[0, j] := j; for j:=1 to Length(t) do for i:=1 to Length(s) do if s[i]=t[j] then d[i, j] := d[i-1, j-1] // no operation else d[i,j]:=MinInt(MinInt( d[i-1, j] +1 , // a deletion d[i, j-1] +1 ), // an insertion d[i- 1,j-1] +1 // a substitution ); Result:=d[Length(s), Length(t)];
end;
PrintLn(LevenshteinDistance('kitten', 'sitting'));</lang>
Ela
This code is translated from Haskell version.
<lang ela>open list
levenshtein s1 s2 = last <| foldl transform [0 .. length s1] s2
where transform (n::ns')@ns c = scanl calc (n+1) <| zip3 s1 ns ns' where calc z (c', x, y) = minimum [y+1, z+1, x + toInt (c' <> c)]</lang>
Executing:
<lang ela>(levenshtein "kitten" "sitting", levenshtein "rosettacode" "raisethysword")</lang>
- Output:
(3, 8)
Erlang
Here are two implementations. The first is the naive version, the second is a memoized version using Erlang's dictionary datatype. <lang erlang> -module(levenshtein). -compile(export_all).
distance_cached(S,T) ->
{L,_} = ld(S,T,dict:new()), L.
distance(S,T) ->
ld(S,T).
ld([],T) ->
length(T);
ld(S,[]) ->
length(S);
ld([X|S],[X|T]) ->
ld(S,T);
ld([_SH|ST]=S,[_TH|TT]=T) ->
1 + lists:min([ld(S,TT),ld(ST,T),ld(ST,TT)]).
ld([]=S,T,Cache) ->
{length(T),dict:store({S,T},length(T),Cache)};
ld(S,[]=T,Cache) ->
{length(S),dict:store({S,T},length(S),Cache)};
ld([X|S],[X|T],Cache) ->
ld(S,T,Cache);
ld([_SH|ST]=S,[_TH|TT]=T,Cache) ->
case dict:is_key({S,T},Cache) of true -> {dict:fetch({S,T},Cache),Cache}; false -> {L1,C1} = ld(S,TT,Cache), {L2,C2} = ld(ST,T,C1), {L3,C3} = ld(ST,TT,C2), L = 1+lists:min([L1,L2,L3]), {L,dict:store({S,T},L,C3)} end.
</lang> Below is a comparison of the runtimes, measured in microseconds, between the two implementations. <lang erlang> 68> timer:tc(levenshtein,distance,["rosettacode","raisethysword"]). {774799,8} % {Time, Result} 69> timer:tc(levenshtein,distance_cached,["rosettacode","raisethysword"]). {659,8} 70> timer:tc(levenshtein,distance,["kitten","sitting"]). {216,3} 71> timer:tc(levenshtein,distance_cached,["kitten","sitting"]). {213,3} </lang>
Euphoria
Code by Jeremy Cowgar from the Euphoria File Archive. <lang euphoria>function min(sequence s)
atom m m = s[1] for i = 2 to length(s) do if s[i] < m then m = s[i] end if end for return m
end function
function levenshtein(sequence s1, sequence s2)
integer n, m sequence d n = length(s1) + 1 m = length(s2) + 1
if n = 1 then return m-1 elsif m = 1 then return n-1 end if
d = repeat(repeat(0, m), n) for i = 1 to n do d[i][1] = i-1 end for
for j = 1 to m do d[1][j] = j-1 end for
for i = 2 to n do for j = 2 to m do d[i][j] = min({ d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + (s1[i-1] != s2[j-1]) }) end for end for
return d[n][m]
end function
? levenshtein("kitten", "sitting") ? levenshtein("rosettacode", "raisethysword")</lang>
- Output:
3 8
F#
<lang FSharp> open System
let getInput (name : string) =
Console.Write ("String {0}: ", name) Console.ReadLine ()
let levDist (strOne : string) (strTwo : string) =
let strOne = strOne.ToCharArray () let strTwo = strTwo.ToCharArray ()
let (distArray : int[,]) = Array2D.zeroCreate (strOne.Length + 1) (strTwo.Length + 1)
for i = 0 to strOne.Length do distArray.[i, 0] <- i for j = 0 to strTwo.Length do distArray.[0, j] <- j
for j = 1 to strTwo.Length do for i = 1 to strOne.Length do if strOne.[i - 1] = strTwo.[j - 1] then distArray.[i, j] <- distArray.[i - 1, j - 1] else distArray.[i, j] <- List.min ( [distArray.[i-1, j] + 1; distArray.[i, j-1] + 1; distArray.[i-1, j-1] + 1] ) distArray.[strOne.Length, strTwo.Length]
let stringOne = getInput "One"
let stringTwo = getInput "Two"
printf "%A" (levDist stringOne stringTwo)
Console.ReadKey () |> ignore </lang>
Forth
<lang forth>: levenshtein ( a1 n1 a2 n2 -- n3)
dup \ if either string is empty, difference if \ is inserting all chars from the other 2>r dup if 2dup 1- chars + c@ 2r@ 1- chars + c@ = if 1- 2r> 1- recurse exit else \ else try: 2dup 1- 2r@ 1- recurse -rot \ changing first letter of s to t; 2dup 2r@ 1- recurse -rot \ remove first letter of s; 1- 2r> recurse min min 1+ \ remove first letter of t, then \ any of which is 1 edit plus else \ editing the rest of the strings 2drop 2r> nip then else 2drop nip then
s" kitten" s" sitting" levenshtein . cr s" rosettacode" s" raisethysword" levenshtein . cr</lang>
Frink
Frink has a built-in function to calculate the Levenshtein edit distance between two strings: <lang frink>println[editDistance["kitten","sitting"]]</lang>
Go
WP algorithm: <lang go>package main
import "fmt"
func main() {
fmt.Println(ld("kitten", "sitting"))
}
func ld(s, t string) int {
d := make([][]int, len(s)+1) for i := range d { d[i] = make([]int, len(t)+1) } for i := range d { d[i][0] = i } for j := range d[0] { d[0][j] = j } for j := 1; j <= len(t); j++ { for i := 1; i <= len(s); i++ { if s[i-1] == t[j-1] { d[i][j] = d[i-1][j-1] } else { min := d[i-1][j] if d[i][j-1] < min { min = d[i][j-1] } if d[i-1][j-1] < min { min = d[i-1][j-1] } d[i][j] = min + 1 } }
} return d[len(s)][len(t)]
}</lang>
- Output:
3
<lang go>package main
import "fmt"
func levenshtein(s, t string) int {
if s == "" { return len(t) } if t == "" { return len(s) } if s[0] == t[0] { return levenshtein(s[1:], t[1:]) } a := levenshtein(s[1:], t[1:]) b := levenshtein(s, t[1:]) c := levenshtein(s[1:], t) if a > b { a = b } if a > c { a = c } return a + 1
}
func main() {
s1 := "rosettacode" s2 := "raisethysword" fmt.Printf("distance between %s and %s: %d\n", s1, s2, levenshtein(s1, s2))
}</lang>
- Output:
distance between rosettacode and raisethysword: 8
Groovy
<lang groovy>def distance(String str1, String str2) {
def dist = new int[str1.size() + 1][str2.size() + 1] (0..str1.size()).each { dist[it][0] = it } (0..str2.size()).each { dist[0][it] = it }
(1..str1.size()).each { i -> (1..str2.size()).each { j -> dist[i][j] = [dist[i - 1][j] + 1, dist[i][j - 1] + 1, dist[i - 1][j - 1] + ((str1[i - 1] == str2[j - 1]) ? 0 : 1)].min() } } return dist[str1.size()][str2.size()]
}
[ ['kitten', 'sitting']: 3,
['rosettacode', 'raisethysword']: 8, ['edocattesor', 'drowsyhtesiar']: 8 ].each { key, dist -> println "Checking distance(${key[0]}, ${key[1]}) == $dist" assert distance(key[0], key[1]) == dist
}</lang>
- Output:
Checking distance(kitten, sitting) == 3 Checking distance(rosettacode, raisethysword) == 8 Checking distance(edocattesor, drowsyhtesiar) == 8
Haskell
<lang haskell>levenshtein s1 s2 = last $ foldl transform [0 .. length s1] s2
where transform ns@(n:ns') c = scanl calc (n+1) $ zip3 s1 ns ns' where calc z (c', x, y) = minimum [y+1, z+1, x + fromEnum (c' /= c)]
main = print (levenshtein "kitten" "sitting")</lang>
- Output:
3
Icon and Unicon
<lang unicon>procedure main()
every process(!&input)
end
procedure process(s)
s ? (s1 := tab(upto(' \t')), s2 := (tab(many(' \t')), tab(0))) | fail write("'",s1,"' -> '",s2,"' = ", levenshtein(s1,s2))
end
procedure levenshtein(s, t)
if (n := *s+1) = 1 then return *t if (m := *t+1) = 1 then return *s every !(d := list(n,0)) := list(m, 0) every i := 1 to max(n,m) do d[i,1] := d[1,i] := i every d[1(i := 2 to n, s_i := s[iM1 := i-1]), j := 2 to m] := min(d[iM1,j], d[i,jM1:=j-1], d[iM1,jM1] + if s_i == t[jM1] then -1 else 0) + 1
return d[n,m]-1
end</lang>
- Example:
->leven kitten sitting 'kitten' -> 'sitting' = 3 ->
J
One approach would be a literal transcription of the wikipedia implementation: <lang j>levenshtein=:4 :0
D=. x +/&i.&>:&# y for_i.1+i.#x do. for_j.1+i.#y do. if. ((<:i){x)=(<:j){y do. D=.(D {~<<:i,j) (<i,j)} D else. min=. 1+<./D{~(i,j) <@:-"1#:1 2 3 D=. min (<i,j)} D end. end. end. {:{:D
)</lang> However, this is a rather slow and bulky approach. Another alternative would be: <lang j>levD=: i.@-@>:@#@] ,~ >:@i.@-@#@[ ,.~ ~:/ lev=: [: {. {."1@((1&{ ,~ (1 1 , {.) <./@:+ }.)@,/\.)@,./@levD</lang> First, we setup up an matrix of costs, with 0 or 1 for unexplored cells (1 being where the character pair corresponding to that cell position has two different characters). Note that the "cost to reach the empty string" cells go on the bottom and the right, instead of the top and the left, because this works better with J's "insert" operation (which we will call "reduce" in the next paragraph here).
Then we reduce the rows of that matrix using an operation that treats those two rows as columns and reduces the rows of this derived matrix with an operation that gives the (unexplored cell + the minumum of the other cells) followed by (the cell adjacent to the previously unexplored cell.
- Example use:
<lang j> 'kitten' levenshtein 'sitting' 3
'kitten' lev 'sitting'
3</lang> Time and space use: <lang j> ts=: 6!:2,7!:2
ts kitten levenshtein sitting
0.00153132 12800
ts kitten lev sitting
0.000132101 5376</lang> (The J flavored variant winds up being about 10 times faster, in J, for this test case, than the explicit version.)
See the Levenshtein distance essay on the Jwiki for additional solutions.
Java
Based on the definition for Levenshtein distance given in the Wikipedia article: <lang java>public class Levenshtein {
public static int distance(String a, String b) { a = a.toLowerCase(); b = b.toLowerCase(); // i == 0 int [] costs = new int [b.length() + 1]; for (int j = 0; j < costs.length; j++) costs[j] = j; for (int i = 1; i <= a.length(); i++) { // j == 0; nw = lev(i - 1, j) costs[0] = i; int nw = i - 1; for (int j = 1; j <= b.length(); j++) { int n = costs[j]; costs[j] = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1); nw = n; } } return costs[b.length()]; }
public static void main(String [] args) { String [] data = { "kitten", "sitting", "saturday", "sunday", "rosettacode", "raisethysword" }; for (int i = 0; i < data.length; i += 2) System.out.println("distance(" + data[i] + ", " + data[i+1] + ") = " + distance(data[i], data[i+1])); }
}</lang>
- Output:
distance(kitten, sitting) = 3 distance(saturday, sunday) = 3 distance(rosettacode, raisethysword) = 8
<lang java>public class Levenshtein{
public static int levenshtein(String s, String t){ /* if either string is empty, difference is inserting all chars * from the other */ if(s.length() == 0) return t.length(); if(t.length() == 0) return s.length();
/* if first letters are the same, the difference is whatever is * required to edit the rest of the strings */ if(s.charAt(0) == t.charAt(0)) return levenshtein(s.substring(1), t.substring(1));
/* else try: * changing first letter of s to that of t, * remove first letter of s, or * remove first letter of t */ int a = levenshtein(s.substring(1), t.substring(1)); int b = levenshtein(s, t.substring(1)); int c = levenshtein(s.substring(1), t);
if(a > b) a = b; if(a > c) a = c;
//any of which is 1 edit plus editing the rest of the strings return a + 1; }
public static void main(String[] args) { String s1 = "kitten"; String s2 = "sitting"; System.out.println("distance between '" + s1 + "' and '" + s2 + "': " + levenshtein(s1, s2)); s1 = "rosettacode"; s2 = "raisethysword"; System.out.println("distance between '" + s1 + "' and '" + s2 + "': " + levenshtein(s1, s2)); StringBuilder sb1 = new StringBuilder(s1); StringBuilder sb2 = new StringBuilder(s2); System.out.println("distance between '" + sb1.reverse() + "' and '" + sb2.reverse() + "': " + levenshtein(sb1.reverse().toString(), sb2.reverse().toString())); }
}</lang>
- Output:
distance between 'kitten' and 'sitting': 3 distance between 'rosettacode' and 'raisethysword': 8 distance between 'edocattesor' and 'drowsyhtesiar': 8
JavaScript
Based on the algorithm outlined in the Wikipedia article: <lang javascript>function levenshtein(str1, str2) {
var m = str1.length, n = str2.length, d = [], i, j;
if (!m) return n; if (!n) return m;
for (i = 0; i <= m; i++) d[i] = [i]; for (j = 0; j <= n; j++) d[0][j] = j;
for (j = 1; j <= n; j++) { for (i = 1; i <= m; i++) { if (str1[i-1] == str2[j-1]) d[i][j] = d[i - 1][j - 1]; else d[i][j] = Math.min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1; } } return d[m][n];
}
console.log(levenshtein("kitten", "sitting")); console.log(levenshtein("stop", "tops")); console.log(levenshtein("rosettacode", "raisethysword"));</lang>
- Output:
3 2 8
Liberty BASIC
<lang lb>'Levenshtein Distance translated by Brandon Parker '08/19/10 'from http://www.merriampark.com/ld.htm#VB 'No credit was given to the Visual Basic Author on the site :-(
Print LevenshteinDistance("kitten", "sitting") End
'Get the minum of three values Function Minimum(a, b, c)
Minimum = Min(a, Min(b, c))
End Function
'Compute the Levenshtein Distance Function LevenshteinDistance(string1$, string2$)
n = Len(string1$) m = Len(string2$) If n = 0 Then LevenshteinDistance = m Exit Function End If If m = 0 Then LevenshteinDistance = n Exit Function End If Dim d(n, m) For i = 0 To n d(i, 0) = i Next i For i = 0 To m d(0, i) = i Next i For i = 1 To n si$ = Mid$(string1$, i, 1) For ii = 1 To m tj$ = Mid$(string2$, ii, 1) If si$ = tj$ Then cost = 0 Else cost = 1 End If d(i, ii) = Minimum((d(i - 1, ii) + 1), (d(i, ii - 1) + 1), (d(i - 1, ii - 1) + cost)) Next ii Next i LevenshteinDistance = d(n, m)
End Function </lang>
Lua
<lang lua>function Levenshtein_Distance( s1, s2 )
if s1:len() == 0 then return s2:len() end if s2:len() == 0 then return s1:len() end
if s1:sub( -1, -1 ) == s2:sub( -1, -1 ) then return Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -2 ) ) end
local a = Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -2 ) ) local b = Levenshtein_Distance( s1:sub( 1, -1 ), s2:sub( 1, -2 ) ) local c = Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -1 ) )
if a > b then return b + 1 end if a > c then return c + 1 end return a + 1 end
print( Levenshtein_Distance( "kitten", "sitting" ) ) print( Levenshtein_Distance( "rosettacode", "raisethysword" ) )</lang>
- Output:
3 8
Maple
<lang Maple> > with(StringTools): > Levenshtein("kitten","sitting");
3
> Levenshtein("rosettacode","raisethysword");
8
</lang>
Mathematica
<lang Mathematica>EditDistance["kitten","sitting"] ->3 EditDistance["rosettacode","raisethysword"] ->8</lang>
NetRexx
<lang NetRexx>/* NetRexx */ options replace format comments java crossref symbols nobinary
n = 0 w = n = n + 1; w[0] = n; w[n] = "kitten sitting" n = n + 1; w[0] = n; w[n] = "rosettacode raisethysword"
loop n = 1 to w[0]
say w[n].word(1) "->" w[n].word(2)":" levenshteinDistance(w[n].word(1), w[n].word(2)) end n
return
method levenshteinDistance(s, t) private static
s = s.lower t = t.lower
m = s.length n = t.length
-- for all i and j, d[i,j] will hold the Levenshtein distance between -- the first i characters of s and the first j characters of t; -- note that d has (m+1)x(n+1) values d = 0
-- source prefixes can be transformed into empty string by -- dropping all characters (Note, ooRexx arrays are 1-based) loop i = 2 to m + 1 d[i, 1] = 1 end i
-- target prefixes can be reached from empty source prefix -- by inserting every characters loop j = 2 to n + 1 d[1, j] = 1 end j
loop j = 2 to n + 1 loop i = 2 to m + 1 if s.substr(i - 1, 1) == t.substr(j - 1, 1) then do d[i, j] = d[i - 1, j - 1] -- no operation required end else do d[i, j] = - (d[i - 1, j] + 1).min( - -- a deletion (d[i, j - 1] + 1)).min( - -- an insertion (d[i - 1, j - 1] + 1)) -- a substitution end end i end j
return d[m + 1, n + 1]
</lang> Output:
kitten -> sitting: 3 rosettacode -> raisethysword: 8
Objective-C
Translation of the C# code into a NSString category <lang objc>@interface NSString (levenshteinDistance) - (NSUInteger)levenshteinDistanceToString:(NSString *)string; @end
@implementation NSString (levenshteinDistance) - (NSUInteger)levenshteinDistanceToString:(NSString *)string {
NSUInteger sl = [self length]; NSUInteger tl = [string length]; NSUInteger *d = calloc(sizeof(*d), (sl+1) * (tl+1));
- define d(i, j) d[((j) * sl) + (i)]
for (NSUInteger i = 0; i <= sl; i++) { d(i, 0) = i; } for (NSUInteger j = 0; j <= tl; j++) { d(0, j) = j; } for (NSUInteger j = 1; j <= tl; j++) { for (NSUInteger i = 1; i <= sl; i++) { if ([self characterAtIndex:i-1] == [string characterAtIndex:j-1]) { d(i, j) = d(i-1, j-1); } else { d(i, j) = MIN(d(i-1, j), MIN(d(i, j-1), d(i-1, j-1))) + 1; } } } NSUInteger r = d(sl, tl);
- undef d
free(d); return r;
} @end</lang>
OCaml
Translation of the pseudo-code of the Wikipedia article: <lang ocaml>let minimum a b c =
min a (min b c)
let levenshtein_distance s t =
let m = String.length s and n = String.length t in (* for all i and j, d.(i).(j) will hold the Levenshtein distance between the first i characters of s and the first j characters of t *) let d = Array.make_matrix (m+1) (n+1) 0 in
for i = 0 to m do d.(i).(0) <- i (* the distance of any first string to an empty second string *) done; for j = 0 to n do d.(0).(j) <- j (* the distance of any second string to an empty first string *) done;
for j = 1 to n do for i = 1 to m do
if s.[i-1] = t.[j-1] then d.(i).(j) <- d.(i-1).(j-1) (* no operation required *) else d.(i).(j) <- minimum (d.(i-1).(j) + 1) (* a deletion *) (d.(i).(j-1) + 1) (* an insertion *) (d.(i-1).(j-1) + 1) (* a substitution *) done; done;
d.(m).(n)
let test s t =
Printf.printf " %s -> %s = %d\n" s t (levenshtein_distance s t);
let () =
test "kitten" "sitting"; test "rosettacode" "raisethysword";
- </lang>
A recursive functional version
This could be made faster with memoization <lang OCaml>let levenshtein s t =
let rec dist i j = match (i,j) with | (i,0) -> i | (0,j) -> j | (i,j) -> if s.[i-1] = t.[j-1] then dist (i-1) (j-1) else let d1, d2, d3 = dist (i-1) j, dist i (j-1), dist (i-1) (j-1) in 1 + min d1 (min d2 d3) in dist (String.length s) (String.length t)
let test s t =
Printf.printf " %s -> %s = %d\n" s t (levenshtein s t)
let () =
test "kitten" "sitting"; test "rosettacode" "raisethysword";</lang>
- Output:
kitten -> sitting = 3 rosettacode -> raisethysword = 8
ooRexx
<lang ooRexx> say "kitten -> sitting:" levenshteinDistance("kitten", "sitting") say "rosettacode -> raisethysword:" levenshteinDistance("rosettacode", "raisethysword")
- routine levenshteinDistance
use arg s, t s = s~lower t = t~lower
m = s~length n = t~length
-- for all i and j, d[i,j] will hold the Levenshtein distance between -- the first i characters of s and the first j characters of t; -- note that d has (m+1)x(n+1) values d = .array~new(m + 1, m + 1)
-- clear all elements in d loop i = 1 to d~dimension(1) loop j = 1 to d~dimension(2) d[i, j] = 0 end end
-- source prefixes can be transformed into empty string by -- dropping all characters (Note, ooRexx arrays are 1-based) loop i = 2 to m + 1 d[i, 1] = 1 end
-- target prefixes can be reached from empty source prefix -- by inserting every characters loop j = 2 to n + 1 d[1, j] = 1 end
loop j = 2 to n + 1 loop i = 2 to m + 1 if s~subchar(i - 1) == t~subchar(j - 1) then d[i, j] = d[i - 1, j - 1] -- no operation required else d[i, j] = min(d[i - 1, j] + 1, - -- a deletion d[i, j-1] + 1, - -- an insertion d[i - 1, j - 1] + 1) -- a substitution end end
return d[m + 1, n + 1 ]
</lang> Output:
kitten -> sitting: 3 rosettacode -> raisethysword: 8
Pascal
A fairly direct translation of the wikipedia pseudo code: <lang pascal>Program LevenshteinDistanceDemo(output);
uses
Math;
function LevenshteinDistance(s, t: string): longint;
var d: array of array of integer; i, j, n, m: integer; begin n := length(t); m := length(s); setlength(d, n+1, m+1); for i := 0 to n do d[i,0] := i; for j := 0 to m do d[0,j] := j; for j := 1 to n do for i := 1 to m do if s[i] = t[j] then d[i,j] := d[i-1,j-1] else d[i,j] := min(d[i-1,j] + 1, min(d[i,j-1] + 1, d[i-1,j-1] + 1)); LevenshteinDistance := d[m,n]; end;
var
s1, s2: string;
begin
s1 := 'kitten'; s2 := 'sitting'; writeln('The Levenshtein distance between "', s1, '" and "', s2, '" is: ', LevenshteinDistance(s1, s2)); s1 := 'rosettacode'; s2 := 'raisethysword'; writeln('The Levenshtein distance between "', s1, '" and "', s2, '" is: ', LevenshteinDistance(s1, s2));
end.</lang>
- Output:
The Levenshtein distance between "kitten" and "sitting" is: 3 The Levenshtein distance between "rosettacode" and "raisethysword" is: 8
Perl
Recursive algorithm, as in the C sample. You are invited to comment out the line where it says so, and see the speed difference. By the way, there's the Memoize
standard module, but it requires setting quite a few parameters to work right for this example, so I'm just showing the simple minded caching scheme here.
<lang Perl>use List::Util 'min';
my %cache;
sub leven {
my ($s, $t) = @_; return length($t) if !$s; return length($s) if !$t;
$cache{$s}{$t} //= # try commenting out this line do { my ($s1, $t1) = (substr($s, 1), substr($t, 1));
(substr($s, 0, 1) eq substr($t, 0, 1)) ? leven($s1, $t1) : 1 + min(leven($s1, $t1), leven($s, $t1), leven($s1, $t )); };
}
print leven('rosettacode', 'raisethysword'), "\n";</lang>
Perl 6
Implementation of the wikipedia algorithm. Since column 0 and row 0 are used for base distances, the original algorithm would require us to compare "@s[$i-1] eq @t[$j-1]", and reference the $m and $n separately. Prepending an unused value (undef) onto @s and @t makes their indices align with the $i,$j numbering of @d, and lets us use .end instead of $m,$n. <lang perl6>sub levenshtein_distance ( Str $s, Str $t --> Int ) {
my @s = *, $s.comb; my @t = *, $t.comb;
my @d; @d[$_][ 0] = $_ for ^@s.end; @d[ 0][$_] = $_ for ^@t.end;
for 1..@s.end X 1..@t.end -> $i, $j { @d[$i][$j] = @s[$i] eq @t[$j] ?? @d[$i-1][$j-1] # No operation required when eq !! ( @d[$i-1][$j ], # Deletion @d[$i ][$j-1], # Insertion @d[$i-1][$j-1], # Substitution ).min + 1; }
return @d[*-1][*-1];
}
my @a = [<kitten sitting>], [<saturday sunday>], [<rosettacode raisethysword>];
for @a -> [$s, $t] {
say "levenshtein_distance('$s', '$t') == ", levenshtein_distance($s, $t);
}</lang>
- Output:
levenshtein_distance('kitten', 'sitting') == 3 levenshtein_distance('saturday', 'sunday') == 3 levenshtein_distance('rosettacode', 'raisethysword') == 8
PHP
<lang PHP> echo levenshtein('kitten','sitting'); echo levenshtein('rosettacode', 'raisethysword'); </lang>
- Output:
3 8
PicoLisp
Translation of the pseudo-code in the Wikipedia article: <lang PicoLisp>(de levenshtein (A B)
(let D (cons (range 0 (length A)) (mapcar '((I) (cons I (copy A))) (range 1 (length B)) ) ) (for (J . Y) B (for (I . X) A (set (nth D (inc J) (inc I)) (if (= X Y) (get D J I) (inc (min (get D J (inc I)) (get D (inc J) I) (get D J I) ) ) ) ) ) ) ) )</lang>
or, using 'map' to avoid list indexing: <lang PicoLisp>(de levenshtein (A B)
(let D (cons (range 0 (length A)) (mapcar '((I) (cons I (copy A))) (range 1 (length B)) ) ) (map '((B Y) (map '((A X P) (set (cdr P) (if (= (car A) (car B)) (car X) (inc (min (cadr X) (car P) (car X))) ) ) ) A (car Y) (cadr Y) ) ) B D ) ) )</lang>
- Output (both cases):
: (levenshtein (chop "kitten") (chop "sitting")) -> 3
Prolog
Works with SWI-Prolog.
Based on Wikipedia's pseudocode.
<lang Prolog>levenshtein(S, T, R) :-
length(S, M),
M1 is M+1,
length(T, N),
N1 is N+1,
length(Lev, N1),
maplist(init(M1), Lev),
numlist(0, N, LN),
maplist(init_n, LN, Lev),
nth0(0, Lev, Lev0),
numlist(0, M, Lev0),
% compute_levenshtein distance numlist(1, N, LN1), maplist(work_on_T(Lev, S), LN1, T), last(Lev, LevLast), last(LevLast, R).
work_on_T(Lev, S, J, TJ) :-
length(S, M),
numlist(1, M, LM),
maplist(work_on_S(Lev, J, TJ), LM, S).
work_on_S(Lev, J, C, I, C) :- % same char !, I1 is I-1, J1 is J-1, nth0(J1, Lev, LevJ1), nth0(I1, LevJ1, V), nth0(J, Lev, LevJ), nth0(I, LevJ, V).
work_on_S(Lev, J, _C1, I, _C2) :-
I1 is I-1, J1 is J - 1,
% compute the value for deletion
nth0(J, Lev, LevJ),
nth0(I1, LevJ, VD0),
VD is VD0 + 1,
% compute the value for insertion nth0(J1, Lev, LevJ1), nth0(I, LevJ1, VI0), VI is VI0 + 1,
% compute the value for substitution nth0(I1, LevJ1, VS0), VS is VS0 + 1,
% set the minimum value to cell(I,J) sort([VD, VI, VS], [V|_]),
nth0(I, LevJ, V).
init(Len, C) :-
length(C, Len).
init_n(N, L) :- nth0(0, L, N).</lang>
- Output examples:
?- levenshtein("kitten", "sitting", R). R = 3. ?- levenshtein("saturday", "sunday", R). R = 3. ?- levenshtein("rosettacode", "raisethysword", R). R = 8.
PureBasic
Based on Wikipedia's pseudocode. <lang PureBasic>Procedure LevenshteinDistance(A_string$, B_String$)
Protected m, n, i, j, min, k, l m = Len(A_string$) n = Len(B_String$) Dim D(m, n) For i=0 To m: D(i,0)=i: Next For j=0 To n: D(0,j)=j: Next For j=1 To n For i=1 To m If Mid(A_string$,i,1) = Mid(B_String$,j,1) D(i,j) = D(i-1, j-1); no operation required Else min = D(i-1, j)+1 ; a deletion k = D(i, j-1)+1 ; an insertion l = D(i-1, j-1)+1 ; a substitution If k < min: min=k: EndIf If l < min: min=l: EndIf D(i,j) = min EndIf Next Next ProcedureReturn D(m,n)
EndProcedure
- - Testing
n = LevenshteinDistance("kitten", "sitting") MessageRequester("Info","Levenshtein Distance= "+Str(n))</lang>
Python
Iterative
Implementation of the wikipedia algorithm, optimized for memory <lang python>def levenshteinDistance(s1,s2):
if len(s1) > len(s2): s1,s2 = s2,s1 distances = range(len(s1) + 1) for index2,char2 in enumerate(s2): newDistances = [index2+1] for index1,char1 in enumerate(s1): if char1 == char2: newDistances.append(distances[index1]) else: newDistances.append(1 + min((distances[index1], distances[index1+1], newDistances[-1]))) distances = newDistances return distances[-1]
print(levenshteinDistance("kitten","sitting")) print(levenshteinDistance("rosettacode","raisethysword")) </lang>
- Output:
3 8
Memoized recursive version
(Uses this cache from the standard library). <lang python>>>> from functools import lru_cache >>> @lru_cache(maxsize=4095) def ld(s, t): if not s: return len(t) if not t: return len(s) if s[0] == t[0]: return ld(s[1:], t[1:]) l1 = ld(s, t[1:]) l2 = ld(s[1:], t) l3 = ld(s[1:], t[1:]) return 1 + min(l1, l2, l3)
>>> print( ld("kitten","sitting"),ld("rosettacode","raisethysword") ) 3 8</lang>
Racket
A memoized recursive implementation. <lang racket>#lang racket
(define (levenshtein a b)
(define (ls0 a-index b-index) (cond [(or (= a-index -1) (= b-index -1)) (abs (- a-index b-index))] [else (define a-char (string-ref a a-index)) (define b-char (string-ref b b-index)) (if (equal? a-char b-char) (ls (sub1 a-index) (sub1 b-index)) (min (add1 (ls (sub1 a-index) b-index)) (add1 (ls a-index (sub1 b-index))) (add1 (ls (sub1 a-index) (sub1 b-index)))))])) (define memo (make-hash)) (define (ls a-i b-i) (hash-ref! memo (cons a-i b-i) (λ() (ls0 a-i b-i)))) (ls (sub1 (string-length a)) (sub1 (string-length b))))
(levenshtein "kitten" "sitting") (levenshtein "rosettacode" "raisethysword")</lang>
- Output:
3 8
REXX
<lang rexx>/*REXX program calculates the Levenshtein distance between two strings.*/ call levenshtein 'kitten' , "sitting" call levenshtein 'rosettacode' , "raisethysword" call levenshtein 'Sunday' , "Saturday" call levenshtein 'Vladimir_Levenshtein[1965]' , "Vladimir_Levenshtein[1965]" call levenshtein 'this_algorithm_is_similar_to', "Damerau-Levenshtein_distance" exit /*stick a fork in it, we're done.*/ /*──────────────────────────────────LEVENSHTEIN subroutine──────────────*/ levenshtein: procedure; parse arg s,t; sL=length(s); tL=length(t)
say ' 1st string = ' s say ' 2nd string = ' t
@.=0
do j=1 for tL; @.0.j=j; end /*j*/ do k=1 for sL; @.k.0=k; end /*k*/
do j=1 for tL; j_=j-1; q=substr(t,j,1) do k=1 for sL; k_=k-1 if q==substr(s,k,1) then @.k.j=@.k_.j_ else @.k.j=1 + min(@.k_.j, @.k.j_, @.k_.j_) end /*k*/ end /*j*/
say 'Levenshtein distance = ' @.sL.tL; say return</lang> output using various internal strings:
1st string = kitten 2nd string = sitting Levenshtein distance = 3 1st string = rosettacode 2nd string = raisethysword Levenshtein distance = 8 1st string = Sunday 2nd string = Saturday Levenshtein distance = 3 1st string = Vladimir_Levenshtein[1965] 2nd string = Vladimir_Levenshtein[1965] Levenshtein distance = 0 1st string = this_algorithm_is_similar_to 2nd string = Damerau-Levenshtein_distance Levenshtein distance = 24
Ruby
Implementation of the wikipedia algorithm. Invariant is that for current loop indices i
and j
, costs[k]
for k < j
contains lev(i, k)
and for k >= j
contains lev(i-1, k). The inner loop body restores the invariant for the
new value of j
.
<lang ruby>module Levenshtein
def self.distance(a, b) a, b = a.downcase, b.downcase costs = Array(0..b.length) # i == 0 (1..a.length).each do |i| costs[0], nw = i, i - 1 # j == 0; nw is lev(i-1, j) (1..b.length).each do |j| costs[j], nw = [costs[j] + 1, costs[j-1] + 1, a[i-1] == b[j-1] ? nw : nw + 1].min, costs[j] end end costs[b.length] end
def self.test %w{kitten sitting saturday sunday rosettacode raisethysword}.each_slice(2) do |a, b| puts "distance(#{a}, #{b}) = #{distance(a, b)}" end end
end</lang>
- Output:
> Levenshtein.test distance(kitten, sitting) = 3 distance(saturday, sunday) = 3 distance(rosettacode, raisethysword) = 8
Run BASIC
<lang runbasic>print levenshteinDistance("kitten", "sitting") print levenshteinDistance("rosettacode", "raisethysword") end function levenshteinDistance(s1$, s2$)
n = len(s1$) m = len(s2$) if n = 0 then levenshteinDistance = m goto [ex] end if if m = 0 then levenshteinDistance = n goto [ex] end if dim d(n, m) for i = 0 to n d(i, 0) = i next i for i = 0 to m d(0, i) = i next i for i = 1 to n si$ = mid$(s1$, i, 1) for j = 1 to m tj$ = mid$(s2$, j, 1) if si$ = tj$ then cost = 0 else cost = 1 d(i, j) = min((d(i - 1, j) + 1),min((d(i, j - 1) + 1),(d(i - 1, j - 1) + cost))) next j next i levenshteinDistance = d(n, m)
[ex]
end function</lang>Output:
3 8
Rust
Implementation of the wikipedia algorithm <lang rust>// rust 0.8
fn main() {
let x = levenshtein_distance("kitten", "sitting"); println!("{}", x);
}
fn levenshtein_distance(word1: &str, word2: &str) -> uint {
let word1_length = word1.len() + 1; let word2_length = word2.len() + 1;
let mut matrix = ~[~[0]];
for i in range(1, word1_length) { matrix[0].push(i); } for j in range(1, word2_length) { matrix.push(~[j]); }
for j in range(1, word2_length) { for i in range(1, word1_length) { let x: uint = if word1[i - 1] == word2[j - 1] { matrix[j-1][i-1] } else { let min_distance = [matrix[j][i-1], matrix[j-1][i], matrix[j-1][i-1]]; *min_distance.iter().min().unwrap() + 1 };
matrix[j].push(x); } }
matrix[word2_length-1][word1_length-1]
}</lang>
- Output:
3
Scala
Based on Wikipedia algorithm. <lang scala>import scala.math._
object Levenshtein {
def minimum(i1: Int, i2: Int, i3: Int)=min(min(i1, i2), i3) def distance(s1:String, s2:String)={ val dist=Array.tabulate(s2.length+1, s1.length+1){(j,i)=>if(j==0) i else if (i==0) j else 0}
for(j<-1 to s2.length; i<-1 to s1.length) dist(j)(i)=if(s2(j-1)==s1(i-1)) dist(j-1)(i-1)
else minimum(dist(j-1)(i)+1, dist(j)(i-1)+1, dist(j-1)(i-1)+1)
dist(s2.length)(s1.length) } def main(args: Array[String]): Unit = { printDistance("kitten", "sitting") printDistance("rosettacode", "raisethysword") } def printDistance(s1:String, s2:String)=println("%s -> %s : %d".format(s1, s2, distance(s1, s2)))
}</lang>
- Output:
kitten -> sitting : 3 rosettacode -> raisethysword : 8
Scheme
Recursive version slavishly copied from AWK.
<lang scheme>#!/usr/local/bin/gosh
(define (main args)
(let* ((a "kitten") (b "sitting") (d (levenshteinDistance a b))) (format #t "Distance between ~a and ~a is ~a.~%" a b d)))
- Number of edits needed to turn s1 into s2.
(define (levenshteinDistance s1 s2)
(let* ((s1Len (string-length s1)) (s2Len (string-length s2))) (cond ; If either empty, distance is length of the other. (insert all) ((= 0 s1Len) s2Len) ((= 0 s2Len) s1Len) (else (let* ((s1Chars (string->list s1)) (s2Chars (string->list s2)) (s1First (car s1Chars)) (s2First (car s2Chars)) (s1Rest (list->string (cdr s1Chars))) (s2Rest (list->string (cdr s2Chars)))) (cond ; If first chars equal, then use distance between the tails. ((equal? s1First s2First) (levenshteinDistance s1Rest s2Rest)) ; Distance is minimum of sub-strings. (else (+ 1 (min (levenshteinDistance s1Rest s2) (levenshteinDistance s1 s2Rest) (levenshteinDistance s1Rest s2Rest))))))))))
</lang>
- Output:
Distance between kitten and sitting is 3.
Seed7
<lang seed7>$ include "seed7_05.s7i";
const func integer: levenshteinDistance (in string: s, in string: t) is func
result var integer: distance is 0; local var array array integer: d is 0 times 0 times 0; var integer: i is 0; var integer: j is 0; begin d := [0 .. length(s)] times [0 .. length(t)] times 0; for key i range s do d[i][0] := i; end for; for key j range t do d[0][j] := j; for key i range s do if s[i] = t[j] then d[i][j] := d[pred(i)][pred(j)]; else d[i][j] := min(min(succ(d[pred(i)][j]), succ(d[i][pred(j)])), succ(d[pred(i)][pred(j)])); end if; end for; end for; distance := d[length(s)][length(t)]; end func;
const proc: main is func
begin writeln("kitten -> sitting: " <& levenshteinDistance("kitten", "sitting")); writeln("rosettacode -> raisethysword: " <& levenshteinDistance("rosettacode", "raisethysword")); end func;</lang>
- Output:
kitten -> sitting: 3 rosettacode -> raisethysword: 8
Smalltalk
ST/X provides a customizable levenshtein method in the String class (weights for individual operations can be passed in): <lang smalltalk>'kitten' levenshteinTo: 'sitting' s:1 k:1 c:1 i:1 d:1 -> 3 'rosettacode' levenshteinTo: 'raisethysword' s:1 k:1 c:1 i:1 d:1 -> 8</lang>
Tcl
<lang tcl>proc levenshteinDistance {s t} {
# Edge cases if {![set n [string length $t]]} {
return [string length $s]
} elseif {![set m [string length $s]]} {
return $n
} # Fastest way to initialize for {set i 0} {$i <= $m} {incr i} {
lappend d 0 lappend p $i
} # Loop, computing the distance table (well, a moving section) for {set j 0} {$j < $n} {} {
set tj [string index $t $j] lset d 0 [incr j] for {set i 0} {$i < $m} {} { set a [expr {[lindex $d $i]+1}] set b [expr {[lindex $p $i]+([string index $s $i] ne $tj)}] set c [expr {[lindex $p [incr i]]+1}] # Faster than min($a,$b,$c) lset d $i [expr {$a<$b ? $c<$a ? $c : $a : $c<$b ? $c : $b}] } # Swap set nd $p; set p $d; set d $nd
} # The score is at the end of the last-computed row return [lindex $p end]
}</lang>
- Usage:
<lang tcl>puts [levenshteinDistance "kitten" "sitting"]; # Prints 3</lang>
TSE SAL
<lang TSESAL>// library: math: get: damerau: levenshtein <description></description> <version>1.0.0.0.23</version> <version control></version control> (filenamemacro=getmadle.s) [kn, ri, th, 08-09-2011 23:04:55] INTEGER PROC FNMathGetDamerauLevenshteinDistanceI( STRING s1, STRING s2 )
INTEGER L1 = Length( s1 ) INTEGER L2 = Length( s2 ) INTEGER substitutionCostI = 0 STRING h1[255] = "" STRING h2[255] = "" IF ( ( L1 == 0 ) OR ( L2 == 0 ) ) // Trivial case: one string is 0-length RETURN( Max( L1, L2 ) ) ELSE // The cost of substituting the last character IF ( ( s1[ L1 ] ) == ( s2[ L2 ] ) ) substitutionCostI = 0 ELSE substitutionCostI = 1 ENDIF // h1 and h2 are s1 and s2 with the last character chopped off h1 = SubStr( s1, 1, L1 - 1 ) h2 = SubStr( s2, 1, L2 - 1 ) IF ( ( L1 > 1 ) AND ( L2 > 1 ) AND ( s1[ L1 - 0 ] == s2[ L2 - 1 ] ) AND ( s1[ L1 - 1 ] == s2[ L2 - 0 ] ) ) RETURN( Min( Min( FNMathGetDamerauLevenshteinDistanceI( h1, s2 ) + 1, FNMathGetDamerauLevenshteinDistanceI( s1, h2 ) + 1 ), Min( FNMathGetDamerauLevenshteinDistanceI( h1 , h2 ) + substitutionCostI, FNMathGetDamerauLevenshteinDistanceI( SubStr( s1, 1, L1 - 2 ), SubStr( s2, 1, L2 - 2 ) ) + 1 ) ) ) ENDIF RETURN( Min( Min( FNMathGetDamerauLevenshteinDistanceI( h1, s2 ) + 1, FNMathGetDamerauLevenshteinDistanceI( s1, h2 ) + 1 ), FNMathGetDamerauLevenshteinDistanceI( h1 , h2 ) + substitutionCostI ) ) ENDIF
END
PROC Main() STRING s1[255] = "arcain" STRING s2[255] = "arcane" Warn( "Minimum amount of steps to convert ", s1, " to ", s2, " = ", FNMathGetDamerauLevenshteinDistanceI( s1, s2 ) ) // gives e.g. 2 s1 = "algorithm" s2 = "altruistic" Warn( "Minimum amount of steps to convert ", s1, " to ", s2, " = ", FNMathGetDamerauLevenshteinDistanceI( s1, s2 ) ) // gives e.g. 6 END</lang>
Visual Basic .NET
<lang vbnet> Function LevenshteinDistance(ByVal String1 As String, ByVal String2 As String) As Integer
Dim Matrix(String1.Length, String2.Length) As Integer Dim Key As Integer For Key = 0 To String1.Length Matrix(Key, 0) = Key Next For Key = 0 To String2.Length Matrix(0, Key) = Key Next For Key1 As Integer = 1 To String2.Length For Key2 As Integer = 1 To String1.Length If String1(Key2 - 1) = String2(Key1 - 1) Then Matrix(Key2, Key1) = Matrix(Key2 - 1, Key1 - 1) Else Matrix(Key2, Key1) = Math.Min(Matrix(Key2 - 1, Key1) + 1, Math.Min(Matrix(Key2, Key1 - 1) + 1, Matrix(Key2 - 1, Key1 - 1) + 1)) End If Next Next Return Matrix(String1.Length - 1, String2.Length - 1) End Function</lang>
- Programming Tasks
- Solutions by Programming Task
- WikipediaSourced
- Ada
- Aime
- AutoHotkey
- AWK
- BBC BASIC
- Bracmat
- C
- C++
- C sharp
- CoffeeScript
- Common Lisp
- Clojure
- D
- DWScript
- Ela
- Erlang
- Euphoria
- F Sharp
- Forth
- Frink
- Go
- Groovy
- Haskell
- Icon
- Unicon
- J
- Java
- JavaScript
- Liberty BASIC
- Lua
- Maple
- Mathematica
- NetRexx
- Objective-C
- OCaml
- OoRexx
- Pascal
- Perl
- Perl 6
- PHP
- PicoLisp
- Prolog
- PureBasic
- Python
- Racket
- REXX
- Ruby
- Run BASIC
- Rust
- Scala
- Scheme
- Scheme examples needing attention
- Examples needing attention
- Seed7
- Smalltalk
- Tcl
- TSE SAL
- Visual Basic .NET