Levenshtein distance

From Rosetta Code
Jump to: navigation, search
Task
Levenshtein distance
You are encouraged to solve this task according to the task description, using any language you may know.
This page uses content from Wikipedia. The original article was at Levenshtein distance. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

  1. kitten sitten (substitution of 'k' with 's')
  2. sitten sittin (substitution of 'e' with 'i')
  3. sittin sitting (insert 'g' at the end).

The Levenshtein distance between "rosettacode", "raisethysword" is 8; The distance between two strings is same as that when both strings is reversed.

Task : Implements a Levenshtein distance function, or uses a library function, to show the Levenshtein distance between "kitten" and "sitting".

Other edit distance at Rosettacode.org :

Contents

[edit] Ada

with Ada.Text_IO;
 
procedure Main is
function Levenshtein_Distance (S, T : String) return Natural is
D : array (0 .. S'Length, 0 .. T'Length) of Natural;
begin
for I in D'Range (1) loop
D (I, 0) := I;
end loop;
for I in D'Range (2) loop
D (0, I) := I;
end loop;
for J in T'Range loop
for I in S'Range loop
if S (I) = T (J) then
D (I, J) := D (I - 1, J - 1);
else
D (I, J) :=
Natural'Min
(Natural'Min (D (I - 1, J) + 1, D (I, J - 1) + 1),
D (I - 1, J - 1) + 1);
end if;
end loop;
end loop;
return D (S'Length, T'Length);
end Levenshtein_Distance;
begin
Ada.Text_IO.Put_Line
("kitten -> sitting:" &
Integer'Image (Levenshtein_Distance ("kitten", "sitting")));
Ada.Text_IO.Put_Line
("rosettacode -> raisethysword:" &
Integer'Image (Levenshtein_Distance ("rosettacode", "raisethysword")));
end Main;
Output:
kitten -> sitting: 3
rosettacode -> raisethysword: 8

[edit] Aime

Translation of: C
integer
dist(text s, integer i, integer ls, text t, integer j, integer lt, list d)
{
integer x;
 
x = l_q_integer(d, i * (lt + 1) + j);
if (x == -1) {
if (i == ls) {
x = lt - j;
} elif (j == lt) {
x = ls - i;
} elif (s[i] == t[j]) {
x = dist(s, i + 1, ls, t, j + 1, lt, d);
} else {
x = dist(s, i + 1, ls, t, j + 1, lt, d);
x = min(x, dist(s, i, ls, t, j + 1, lt, d));
x = min(x, dist(s, i + 1, ls, t, j, lt, d));
 
x += 1;
}
 
l_r_integer(d, i * (lt + 1) + j, x);
}
 
return x;
}
 
integer
levenshtein(text s, text t)
{
integer i, ls, lt;
list d;
 
ls = length(s);
lt = length(t);
 
i = (ls + 1) * (lt + 1);
while (i) {
lb_p_integer(d, -1);
i -= 1;
}
 
return dist(s, 0, ls, t, 0, lt, d);
}
 
integer
main(void)
{
text s1, s2;
 
s1 = "rosettacode";
s2 = "raisethysword";
 
o_form("distance between `~' and `~' is ~\n", s1, s2, levenshtein(s1, s2));
 
return 0;
}

[edit] Arc

[edit] Waterhouse Arc

O(n * m) time, linear space, using lists instead of vectors

(def levenshtein (str1 str2)
(withs l1 len.str1
l2 len.str2
row range0:inc.l1
 
(times j l2
(let next list.j
(times i l1
(push
(inc:min
car.next
((if (is str1.i str2.j) dec id) car.row)
(car:zap cdr row))
next))
(= row nrev.next)))
row.l1))

[edit] AutoHotkey

Translation of: Go
levenshtein(s, t){
If s =
return StrLen(t)
If t =
return strLen(s)
If SubStr(s, 1, 1) = SubStr(t, 1, 1)
return levenshtein(SubStr(s, 2), SubStr(t, 2))
a := Levenshtein(SubStr(s, 2), SubStr(t, 2))
b := Levenshtein(s, SubStr(t, 2))
c := Levenshtein(SubStr(s, 2), t )
If (a > b)
a := b
if (a > c)
a := c
return a + 1
}
s1 := "kitten"
s2 := "sitting"
MsgBox % "distance between " s1 " and " s2 ": " levenshtein(s1, s2)
It correctly outputs '3'

[edit] AWK

Slavishly copied from the very clear AutoHotKey example.

#!/usr/bin/awk -f
 
BEGIN {
a = "kitten";
b = "sitting";
d = levenshteinDistance(a, b);
p = d == 1 ? "" : "s";
printf("%s -> %s after %d edit%s\n", a, b, d, p);
exit;
}
 
function levenshteinDistance(s1, s2,
s1First, s2First, s1Rest, s2Rest,
distA, distB, distC, minDist) {
 
# If either string is empty,
# then distance is insertion of the other's characters.
if (length(s1) == 0) return length(s2);
if (length(s2) == 0) return length(s1);
 
# Rest of process uses first characters
# and remainder of each string.
s1First = substr(s1, 1, 1);
s2First = substr(s2, 1, 1);
s1Rest = substr(s1, 2, length(s1));
s2Rest = substr(s2, 2, length(s2));
 
# If leading characters are the same,
# then distance is that between the rest of the strings.
if (s1First == s2First) {
return levenshteinDistance(s1Rest, s2Rest);
}
 
# Find the distances between sub strings.
distA = levenshteinDistance(s1Rest, s2);
distB = levenshteinDistance(s1, s2Rest);
distC = levenshteinDistance(s1Rest, s2Rest);
 
# Return the minimum distance between substrings.
minDist = distA;
if (distB < minDist) minDist = distB;
if (distC < minDist) minDist = distC;
return minDist + 1; # Include change for the first character.
}
 

Example output:

kitten -> sitting after 3 edits

[edit] BBC BASIC

      PRINT "'kitten' -> 'sitting' has distance " ;
PRINT ; FNlevenshtein("kitten", "sitting")
PRINT "'rosettacode' -> 'raisethysword' has distance " ;
PRINT ; FNlevenshtein("rosettacode", "raisethysword")
END
 
DEF FNlevenshtein(s$, t$)
LOCAL i%, j%, m%, d%()
DIM d%(LENs$, LENt$)
FOR i% = 0 TO DIM(d%(),1)
d%(i%,0) = i%
NEXT
FOR j% = 0 TO DIM(d%(),2)
d%(0,j%) = j%
NEXT
FOR j% = 1 TO DIM(d%(),2)
FOR i% = 1 TO DIM(d%(),1)
IF MID$(s$,i%,1) = MID$(t$,j%,1) THEN
d%(i%,j%) = d%(i%-1,j%-1)
ELSE
m% = d%(i%-1,j%-1)
IF d%(i%,j%-1) < m% m% = d%(i%,j%-1)
IF d%(i%-1,j%) < m% m% = d%(i%-1,j%)
d%(i%,j%) = m% + 1
ENDIF
NEXT
NEXT j%
= d%(i%-1,j%-1)

Output:

'kitten' -> 'sitting' has distance 3
'rosettacode' -> 'raisethysword' has distance 8

[edit] Bracmat

Translation of: C

Recursive method, but with memoization.

(levenshtein=
lev cache
. ( lev
= s s0 s1 t t0 t1 L a b c val key
. (cache..find)$(str$!arg:?key):(?.?val)
& !val
|  !arg:(?s,?t)
& ( !s:&@(!t:? [?L)
| !t:&@(!s:? [?L)
)
& (cache..insert)$(!key.!L)
& !L
|  !arg:(@(?:%?s0 ?s1),@(?:%?t0 ?t1))
& !s0:!t0
& lev$(!s1,!t1)
| lev$(!s1,!t1):?a
& lev$(!s,!t1):?b
& lev$(!s1,!t):?c
& (!b:<!a:?a|)
& (!c:<!a:?a|)
& (cache..insert)$(!key.1+!a)
& 1+!a
)
& new$hash:?cache
& lev$!arg);
Demonstrating:
 levenshtein$(kitten,sitting)
 3
 levenshtein$(rosettacode,raisethysword)
 8

[edit] C

Recursive method. Deliberately left in an inefficient state to show the recursive nature of the algorithm; notice how it would have become the Wikipedia algorithm if we memoized the function against parameters ls and lt.

#include <stdio.h>
#include <string.h>
 
/* s, t: two strings; ls, lt: their respective length */
int levenshtein(const char *s, int ls, const char *t, int lt)
{
int a, b, c;
 
/* if either string is empty, difference is inserting all chars
* from the other
*/

if (!ls) return lt;
if (!lt) return ls;
 
/* if last letters are the same, the difference is whatever is
* required to edit the rest of the strings
*/

if (s[ls] == t[ls])
return levenshtein(s, ls - 1, t, lt - 1);
 
/* else try:
* changing last letter of s to that of t; or
* remove last letter of s; or
* remove last letter of t,
* any of which is 1 edit plus editing the rest of the strings
*/

a = levenshtein(s, ls - 1, t, lt - 1);
b = levenshtein(s, ls, t, lt - 1);
c = levenshtein(s, ls - 1, t, lt );
 
if (a > b) a = b;
if (a > c) a = c;
 
return a + 1;
}
 
int main()
{
const char *s1 = "rosettacode";
const char *s2 = "raisethysword";
printf("distance between `%s' and `%s': %d\n", s1, s2,
levenshtein(s1, strlen(s1), s2, strlen(s2)));
 
return 0;
}

Take the above and add caching, we get (in C99):

#include <stdio.h>
#include <string.h>
 
int levenshtein(const char *s, const char *t)
{
int ls = strlen(s), lt = strlen(t);
int d[ls + 1][lt + 1];
 
for (int i = 0; i <= ls; i++)
for (int j = 0; j <= lt; j++)
d[i][j] = -1;
 
int dist(int i, int j) {
if (d[i][j] >= 0) return d[i][j];
 
int x;
if (i == ls)
x = lt - j;
else if (j == lt)
x = ls - i;
else if (s[i] == t[j])
x = dist(i + 1, j + 1);
else {
x = dist(i + 1, j + 1);
 
int y;
if ((y = dist(i, j + 1)) < x) x = y;
if ((y = dist(i + 1, j)) < x) x = y;
x++;
}
return d[i][j] = x;
}
return dist(0, 0);
}
 
int main(void)
{
const char *s1 = "rosettacode";
const char *s2 = "raisethysword";
printf("distance between `%s' and `%s': %d\n", s1, s2,
levenshtein(s1, s2));
 
return 0;
}

[edit] C++

#include <string>
#include <iostream>
using namespace std;
 
// Compute Levenshtein Distance
// Martin Ettl, 2012-10-05
 
size_t uiLevenshteinDistance(const std::string &s1, const std::string &s2)
{
const size_t m(s1.size());
const size_t n(s2.size());
 
if( m==0 ) return n;
if( n==0 ) return m;
 
size_t *costs = new size_t[n + 1];
 
for( size_t k=0; k<=n; k++ ) costs[k] = k;
 
size_t i = 0;
for ( std::string::const_iterator it1 = s1.begin(); it1 != s1.end(); ++it1, ++i )
{
costs[0] = i+1;
size_t corner = i;
 
size_t j = 0;
for ( std::string::const_iterator it2 = s2.begin(); it2 != s2.end(); ++it2, ++j )
{
size_t upper = costs[j+1];
if( *it1 == *it2 )
{
costs[j+1] = corner;
}
else
{
size_t t(upper<corner?upper:corner);
costs[j+1] = (costs[j]<t?costs[j]:t)+1;
}
 
corner = upper;
}
}
 
size_t result = costs[n];
delete [] costs;
 
return result;
}
 
int main()
{
string s0 = "rosettacode";
string s1 = "raisethysword";
cout << "distance between " << s0 << " and " << s1 << " : "
<< uiLevenshteinDistance(s0,s1) << std::endl;
 
return 0;
}
 
Example output:
$ ./a.out 
distance between rosettacode and raisethysword : 8

[edit] C#

This is a straightforward translation of the Wikipedia pseudocode.

using System;
 
namespace LevenshteinDistance
{
class Program
{
static int LevenshteinDistance(string s, string t)
{
int[,] d = new int[s.Length + 1, t.Length + 1];
for (int i = 0; i <= s.Length; i++)
d[i, 0] = i;
for (int j = 0; j <= t.Length; j++)
d[0, j] = j;
for (int j = 1; j <= t.Length; j++)
for (int i = 1; i <= s.Length; i++)
if (s[i - 1] == t[j - 1])
d[i, j] = d[i - 1, j - 1]; //no operation
else
d[i, j] = Math.Min(Math.Min(
d[i - 1, j] + 1, //a deletion
d[i, j - 1] + 1), //an insertion
d[i - 1, j - 1] + 1 //a substitution
);
return d[s.Length, t.Length];
}
 
static void Main(string[] args)
{
if (args.Length == 2)
Console.WriteLine("{0} -> {1} = {2}",
args[0], args[1], LevenshteinDistance(args[0], args[1]));
else
Console.WriteLine("Usage:-\n\nLevenshteinDistance <string1> <string2>");
}
}
}
Example output:
> LevenshteinDistance kitten sitting
kitten -> sitting = 3

> LevenshteinDistance rosettacode raisethysword
rosettacode -> raisethysword = 8

[edit] CoffeeScript

levenshtein = (str1, str2) ->
# more of less ported simple algorithm from JS
m = str1.length
n = str2.length
d = []
 
return n unless m
return m unless n
 
d[i] = [i] for i in [0..m]
d[0][j] = j for j in [1..n]
 
for i in [1..m]
for j in [1..n]
if str1[i-1] is str2[j-1]
d[i][j] = d[i-1][j-1]
else
d[i][j] = Math.min(
d[i-1][j]
d[i][j-1]
d[i-1][j-1]
) + 1
 
d[m][n]
 
console.log levenshtein("kitten", "sitting")
console.log levenshtein("rosettacode", "raisethysword")
console.log levenshtein("stop", "tops")
console.log levenshtein("yo", "")
console.log levenshtein("", "yo")

[edit] Common Lisp

(defun levenshtein (a b)
(let* ((la (length a))
(lb (length b))
(rec (make-array (list (1+ la) (1+ lb)) :initial-element nil)))
 
(defun leven (x y)
(cond
((zerop x) y)
((zerop y) x)
((aref rec x y) (aref rec x y))
(t (setf (aref rec x y)
(+ (if (char= (char a (- la x)) (char b (- lb y))) 0 1)
(min (leven (1- x) y)
(leven x (1- y))
(leven (1- x) (1- y))))))))
(leven la lb)))
 
(print (levenshtein "rosettacode" "raisethysword"))
Output:
8

[edit] Clojure

[edit] Recursive Version

(defn levenshtein [str1 str2]
(let [len1 (count str1)
len2 (count str2)]
(cond (zero? len1) len2
(zero? len2) len1
:else
(let [cost (if (= (first str1) (first str2)) 0 1)]
(min (inc (levenshtein (rest str1) str2))
(inc (levenshtein str1 (rest str2)))
(+ cost
(levenshtein (rest str1) (rest str2))))))))
 
(println (levenshtein "rosettacode" "raisethysword"))
Output:
8

[edit] Iterative version

(defn levenshtein [w1 w2]
(letfn [(cell-value [same-char? prev-row cur-row col-idx]
(min (inc (nth prev-row col-idx))
(inc (last cur-row))
(+ (nth prev-row (dec col-idx)) (if same-char?
0
1))))]
(loop [row-idx 1
max-rows (inc (count w2))
prev-row (range (inc (count w1)))]
(if (= row-idx max-rows)
(last prev-row)
(let [ch2 (nth w2 (dec row-idx))
next-prev-row (reduce (fn [cur-row i]
(let [same-char? (= (nth w1 (dec i)) ch2)]
(conj cur-row (cell-value same-char?
prev-row
cur-row
i))))
[row-idx] (range 1 (count prev-row)))]
(recur (inc row-idx) max-rows next-prev-row))))))

[edit] D

[edit] Standard Version

The standard library std.algorithm module includes a Levenshtein distance function:

void main() {
import std.stdio, std.algorithm;
 
levenshteinDistance("kitten", "sitting").writeln;
}
Output:
3

[edit] Iterative Version

Translation of: Java
import std.stdio, std.algorithm;
 
int distance(in string s1, in string s2) pure nothrow {
auto costs = new int[s2.length + 1];
 
foreach (immutable i; 0 .. s1.length + 1) {
int lastValue = i;
foreach (immutable j; 0 .. s2.length + 1) {
if (i == 0)
costs[j] = j;
else {
if (j > 0) {
int newValue = costs[j - 1];
if (s1[i - 1] != s2[j - 1])
newValue = min(newValue, lastValue, costs[j]) + 1;
costs[j - 1] = lastValue;
lastValue = newValue;
}
}
}
 
if (i > 0)
costs[$ - 1] = lastValue;
}
 
return costs[$ - 1];
}
 
void main() {
foreach (p; [["kitten", "sitting"], ["rosettacode", "raisethysword"]])
writefln("distance(%s, %s): %d", p[0], p[1], distance(p[0], p[1]));
}

[edit] Memoized Recursive Version

Translation of: Python
import std.stdio, std.array, std.algorithm, std.functional;
 
uint lDist(T)(in const(T)[] s, in const(T)[] t) nothrow {
alias mlDist = memoize!lDist;
if (s.empty || t.empty) return max(t.length, s.length);
if (s[0] == t[0]) return mlDist(s[1 .. $], t[1 .. $]);
return min(mlDist(s, t[1 .. $]),
mlDist(s[1 .. $], t),
mlDist(s[1 .. $], t[1 .. $])) + 1;
}
 
void main() {
lDist("kitten", "sitting").writeln;
lDist("rosettacode", "raisethysword").writeln;
}

[edit] DWScript

Based on Wikipedia version

function LevenshteinDistance(s, t : String) : Integer;
var
i, j : Integer;
begin
var d:=new Integer[Length(s)+1, Length(t)+1];
for i:=0 to Length(s) do
d[i, 0] := i;
for j:=0 to Length(t) do
d[0, j] := j;
 
for j:=1 to Length(t) do
for i:=1 to Length(s) do
if s[i]=t[j] then
d[i, j] := d[i-1, j-1] // no operation
else d[i,j]:=MinInt(MinInt(
d[i-1, j] +1 , // a deletion
d[i, j-1] +1 ), // an insertion
d[i- 1,j-1] +1 // a substitution
);
Result:=d[Length(s), Length(t)];
end;
 
PrintLn(LevenshteinDistance('kitten', 'sitting'));

[edit] Ela

This code is translated from Haskell version.

open list
 
levenshtein s1 s2 = last <| foldl transform [0 .. length s1] s2
where transform (n::ns')@ns c = scanl calc (n+1) <| zip3 s1 ns ns'
where calc z (c', x, y) = minimum [y+1, z+1, x + toInt (c' <> c)]

Executing:

(levenshtein "kitten" "sitting", levenshtein "rosettacode" "raisethysword")
Output:
(3, 8)

[edit] Erlang

Here are two implementations. The first is the naive version, the second is a memoized version using Erlang's dictionary datatype.

 
-module(levenshtein).
-compile(export_all).
 
distance_cached(S,T) ->
{L,_} = ld(S,T,dict:new()),
L.
 
distance(S,T) ->
ld(S,T).
 
ld([],T) ->
length(T);
ld(S,[]) ->
length(S);
ld([X|S],[X|T]) ->
ld(S,T);
ld([_SH|ST]=S,[_TH|TT]=T) ->
1 + lists:min([ld(S,TT),ld(ST,T),ld(ST,TT)]).
 
ld([]=S,T,Cache) ->
{length(T),dict:store({S,T},length(T),Cache)};
ld(S,[]=T,Cache) ->
{length(S),dict:store({S,T},length(S),Cache)};
ld([X|S],[X|T],Cache) ->
ld(S,T,Cache);
ld([_SH|ST]=S,[_TH|TT]=T,Cache) ->
case dict:is_key({S,T},Cache) of
true -> {dict:fetch({S,T},Cache),Cache};
false ->
{L1,C1} = ld(S,TT,Cache),
{L2,C2} = ld(ST,T,C1),
{L3,C3} = ld(ST,TT,C2),
L = 1+lists:min([L1,L2,L3]),
{L,dict:store({S,T},L,C3)}
end.
 

Below is a comparison of the runtimes, measured in microseconds, between the two implementations.

 
68> timer:tc(levenshtein,distance,["rosettacode","raisethysword"]).
{774799,8} % {Time, Result}
69> timer:tc(levenshtein,distance_cached,["rosettacode","raisethysword"]).
{659,8}
70> timer:tc(levenshtein,distance,["kitten","sitting"]).
{216,3}
71> timer:tc(levenshtein,distance_cached,["kitten","sitting"]).
{213,3}
 

[edit] Euphoria

Code by Jeremy Cowgar from the Euphoria File Archive.

function min(sequence s)
atom m
m = s[1]
for i = 2 to length(s) do
if s[i] < m then
m = s[i]
end if
end for
return m
end function
 
function levenshtein(sequence s1, sequence s2)
integer n, m
sequence d
n = length(s1) + 1
m = length(s2) + 1
 
if n = 1 then
return m-1
elsif m = 1 then
return n-1
end if
 
d = repeat(repeat(0, m), n)
for i = 1 to n do
d[i][1] = i-1
end for
 
for j = 1 to m do
d[1][j] = j-1
end for
 
for i = 2 to n do
for j = 2 to m do
d[i][j] = min({
d[i-1][j] + 1,
d[i][j-1] + 1,
d[i-1][j-1] + (s1[i-1] != s2[j-1])
})
end for
end for
 
return d[n][m]
end function
 
? levenshtein("kitten", "sitting")
? levenshtein("rosettacode", "raisethysword")
Output:
3
8

[edit] F#

 
open System
 
let getInput (name : string) =
Console.Write ("String {0}: ", name)
Console.ReadLine ()
 
let levDist (strOne : string) (strTwo : string) =
let strOne = strOne.ToCharArray ()
let strTwo = strTwo.ToCharArray ()
 
let (distArray : int[,]) = Array2D.zeroCreate (strOne.Length + 1) (strTwo.Length + 1)
 
for i = 0 to strOne.Length do distArray.[i, 0] <- i
for j = 0 to strTwo.Length do distArray.[0, j] <- j
 
for j = 1 to strTwo.Length do
for i = 1 to strOne.Length do
if strOne.[i - 1] = strTwo.[j - 1] then distArray.[i, j] <- distArray.[i - 1, j - 1]
else
distArray.[i, j] <- List.min (
[distArray.[i-1, j] + 1;
distArray.[i, j-1] + 1;
distArray.[i-1, j-1] + 1]
)
distArray.[strOne.Length, strTwo.Length]
 
 
let stringOne = getInput "One"
let stringTwo = getInput "Two"
printf "%A" (levDist stringOne stringTwo)
 
Console.ReadKey () |> ignore
 

[edit] Forth

Translation of: C
: levenshtein                          ( a1 n1 a2 n2 -- n3)
dup \ if either string is empty, difference
if \ is inserting all chars from the other
2>r dup
if
2dup 1- chars + c@ 2r@ 1- chars + c@ =
if
1- 2r> 1- recurse exit
else \ else try:
2dup 1- 2r@ 1- recurse -rot \ changing first letter of s to t;
2dup 2r@ 1- recurse -rot \ remove first letter of s;
1- 2r> recurse min min 1+ \ remove first letter of t,
then \ any of which is 1 edit plus
else \ editing the rest of the strings
2drop 2r> nip
then
else
2drop nip
then
;
 
s" kitten" s" sitting" levenshtein . cr
s" rosettacode" s" raisethysword" levenshtein . cr

[edit] Frink

Frink has a built-in function to calculate the Levenshtein edit distance between two strings:

println[editDistance["kitten","sitting"]]

[edit] Go

WP algorithm:

package main
 
import "fmt"
 
func main() {
fmt.Println(ld("kitten", "sitting"))
}
 
func ld(s, t string) int {
d := make([][]int, len(s)+1)
for i := range d {
d[i] = make([]int, len(t)+1)
}
for i := range d {
d[i][0] = i
}
for j := range d[0] {
d[0][j] = j
}
for j := 1; j <= len(t); j++ {
for i := 1; i <= len(s); i++ {
if s[i-1] == t[j-1] {
d[i][j] = d[i-1][j-1]
} else {
min := d[i-1][j]
if d[i][j-1] < min {
min = d[i][j-1]
}
if d[i-1][j-1] < min {
min = d[i-1][j-1]
}
d[i][j] = min + 1
}
}
 
}
return d[len(s)][len(t)]
}
Output:
3
Translation of: C
package main
 
import "fmt"
 
func levenshtein(s, t string) int {
if s == "" { return len(t) }
if t == "" { return len(s) }
if s[0] == t[0] {
return levenshtein(s[1:], t[1:])
}
a := levenshtein(s[1:], t[1:])
b := levenshtein(s, t[1:])
c := levenshtein(s[1:], t)
if a > b { a = b }
if a > c { a = c }
return a + 1
}
 
func main() {
s1 := "rosettacode"
s2 := "raisethysword"
fmt.Printf("distance between %s and %s: %d\n", s1, s2,
levenshtein(s1, s2))
}
Output:
distance between rosettacode and raisethysword: 8

[edit] Groovy

def distance(String str1, String str2) {
def dist = new int[str1.size() + 1][str2.size() + 1]
(0..str1.size()).each { dist[it][0] = it }
(0..str2.size()).each { dist[0][it] = it }
 
(1..str1.size()).each { i ->
(1..str2.size()).each { j ->
dist[i][j] = [dist[i - 1][j] + 1, dist[i][j - 1] + 1, dist[i - 1][j - 1] + ((str1[i - 1] == str2[j - 1]) ? 0 : 1)].min()
}
}
return dist[str1.size()][str2.size()]
}
 
[ ['kitten', 'sitting']: 3,
['rosettacode', 'raisethysword']: 8,
['edocattesor', 'drowsyhtesiar']: 8 ].each { key, dist ->
println "Checking distance(${key[0]}, ${key[1]}) == $dist"
assert distance(key[0], key[1]) == dist
}
Output:
Checking distance(kitten, sitting) == 3
Checking distance(rosettacode, raisethysword) == 8
Checking distance(edocattesor, drowsyhtesiar) == 8

[edit] Haskell

levenshtein s1 s2 = last $ foldl transform [0 .. length s1] s2
where transform ns@(n:ns') c = scanl calc (n+1) $ zip3 s1 ns ns'
where calc z (c', x, y) = minimum [y+1, z+1, x + fromEnum (c' /= c)]
 
main = print (levenshtein "kitten" "sitting")
Output:
3

[edit] Icon and Unicon

procedure main()
every process(!&input)
end
 
procedure process(s)
s ? (s1 := tab(upto(' \t')), s2 := (tab(many(' \t')), tab(0))) | fail
write("'",s1,"' -> '",s2,"' = ", levenshtein(s1,s2))
end
 
procedure levenshtein(s, t)
if (n := *s+1) = 1 then return *t
if (m := *t+1) = 1 then return *s
 
every !(d := list(n,0)) := list(m, 0)
every i := 1 to max(n,m) do d[i,1] := d[1,i] := i
every d[1(i := 2 to n, s_i := s[iM1 := i-1]), j := 2 to m] :=
min(d[iM1,j], d[i,jM1:=j-1],
d[iM1,jM1] + if s_i == t[jM1] then -1 else 0) + 1
 
return d[n,m]-1
end
Example:
->leven
kitten  sitting
'kitten' -> 'sitting' = 3
->

[edit] J

One approach would be a literal transcription of the wikipedia implementation:

levenshtein=:4 :0
D=. x +/&i.&>:&# y
for_i.1+i.#x do.
for_j.1+i.#y do.
if. ((<:i){x)=(<:j){y do.
D=.(D {~<<:i,j) (<i,j)} D
else.
min=. 1+<./D{~(i,j) <@:-"1#:1 2 3
D=. min (<i,j)} D
end.
end.
end.
{:{:D
)

However, this is a rather slow and bulky approach. Another alternative would be:

levD=: i.@-@>:@#@] ,~ >:@i.@-@#@[ ,.~ ~:/
lev=: [: {. {."1@((1&{ ,~ (1 1 , {.) <./@:+ }.)@,/\.)@,./@levD

First, we setup up an matrix of costs, with 0 or 1 for unexplored cells (1 being where the character pair corresponding to that cell position has two different characters). Note that the "cost to reach the empty string" cells go on the bottom and the right, instead of the top and the left, because this works better with J's "insert" operation (which we will call "reduce" in the next paragraph here).

Then we reduce the rows of that matrix using an operation that treats those two rows as columns and reduces the rows of this derived matrix with an operation that gives the (unexplored cell + the minumum of the other cells) followed by (the cell adjacent to the previously unexplored cell.

Example use:
   'kitten' levenshtein 'sitting'
3
'kitten' lev 'sitting'
3

Time and space use:

   ts=: 6!:2,7!:2
ts '''kitten'' levenshtein ''sitting'''
0.00153132 12800
ts '''kitten'' lev ''sitting'''
0.000132101 5376

(The J flavored variant winds up being about 10 times faster, in J, for this test case, than the explicit version.)

See the Levenshtein distance essay on the Jwiki for additional solutions.

[edit] Java

Based on the definition for Levenshtein distance given in the Wikipedia article:

public class Levenshtein {
 
public static int distance(String a, String b) {
a = a.toLowerCase();
b = b.toLowerCase();
// i == 0
int [] costs = new int [b.length() + 1];
for (int j = 0; j < costs.length; j++)
costs[j] = j;
for (int i = 1; i <= a.length(); i++) {
// j == 0; nw = lev(i - 1, j)
costs[0] = i;
int nw = i - 1;
for (int j = 1; j <= b.length(); j++) {
int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
nw = costs[j];
costs[j] = cj;
}
}
return costs[b.length()];
}
 
public static void main(String [] args) {
String [] data = { "kitten", "sitting", "saturday", "sunday", "rosettacode", "raisethysword" };
for (int i = 0; i < data.length; i += 2)
System.out.println("distance(" + data[i] + ", " + data[i+1] + ") = " + distance(data[i], data[i+1]));
}
}
Output:
distance(kitten, sitting) = 3
distance(saturday, sunday) = 3
distance(rosettacode, raisethysword) = 8
Translation of: C
public class Levenshtein{
public static int levenshtein(String s, String t){
/* if either string is empty, difference is inserting all chars
* from the other
*/

if(s.length() == 0) return t.length();
if(t.length() == 0) return s.length();
 
/* if first letters are the same, the difference is whatever is
* required to edit the rest of the strings
*/

if(s.charAt(0) == t.charAt(0))
return levenshtein(s.substring(1), t.substring(1));
 
/* else try:
* changing first letter of s to that of t,
* remove first letter of s, or
* remove first letter of t
*/

int a = levenshtein(s.substring(1), t.substring(1));
int b = levenshtein(s, t.substring(1));
int c = levenshtein(s.substring(1), t);
 
if(a > b) a = b;
if(a > c) a = c;
 
//any of which is 1 edit plus editing the rest of the strings
return a + 1;
}
 
public static void main(String[] args) {
String s1 = "kitten";
String s2 = "sitting";
System.out.println("distance between '" + s1 + "' and '"
+ s2 + "': " + levenshtein(s1, s2));
s1 = "rosettacode";
s2 = "raisethysword";
System.out.println("distance between '" + s1 + "' and '"
+ s2 + "': " + levenshtein(s1, s2));
StringBuilder sb1 = new StringBuilder(s1);
StringBuilder sb2 = new StringBuilder(s2);
System.out.println("distance between '" + sb1.reverse() + "' and '"
+ sb2.reverse() + "': "
+ levenshtein(sb1.reverse().toString(), sb2.reverse().toString()));
}
}
Output:
distance between 'kitten' and 'sitting': 3
distance between 'rosettacode' and 'raisethysword': 8
distance between 'edocattesor' and 'drowsyhtesiar': 8

[edit] JavaScript

Based on the algorithm outlined in the Wikipedia article:

function levenshtein(str1, str2) {
var m = str1.length,
n = str2.length,
d = [],
i, j;
 
if (!m) return n;
if (!n) return m;
 
for (i = 0; i <= m; i++) d[i] = [i];
for (j = 0; j <= n; j++) d[0][j] = j;
 
for (j = 1; j <= n; j++) {
for (i = 1; i <= m; i++) {
if (str1[i-1] == str2[j-1]) d[i][j] = d[i - 1][j - 1];
else d[i][j] = Math.min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1;
}
}
return d[m][n];
}
 
console.log(levenshtein("kitten", "sitting"));
console.log(levenshtein("stop", "tops"));
console.log(levenshtein("rosettacode", "raisethysword"));
Output:
3
2
8

[edit] jq

The main point of interest about the following implementation is that it shows how the naive recursive algorithm can be tweaked within a completely functional framework to yield an efficient implementation.

Performance: Here is a breakdown of the run-times on a 2.53GHz machine:

  9ms overhead (invoking jq and compiling the program)
 17ms for kitten/sitting
 67ms for rosettacode/raisethysword
 71ms for edocattesor/drowsyhtesiar
 
# lookup the distance between s and t in the nested cache,
# which uses basic properties of the Levenshtein distance to save space:
def lookup(s;t):
if (s == t) then 0
elif (s|length) == 0 then (t|length)
elif (t|length) == 0 then (s|length)
elif (s|length) > (t|length) then
.[t] as $t | if $t then $t[s] else null end
else .[s] as $s | if $s then $s[t] else null end
end ;
 
# output is the updated cache;
# basic properties of the Levenshtein distance are used to save space:
def store(s;t;value):
if (s == t) then .
else (s|length) as $s | (t|length) as $t
| if $s == 0 or $t == 0 then .
elif $s < $t then .[s][t] = value
elif $t < $s then .[t][s] = value
else (.[s][t] = value) | (.[t][s] = value)
end
end ;
 
# Input is a cache of nested objects; output is [distance, cache]
def ld(s1; s2):
 
# emit [distance, cache]
# input: cache
def cached_ld(s;t):
lookup(s;t) as $check
| if $check then [ $check, . ]
else ld(s;t)
end
 ;
 
# If either string is empty,
# then distance is insertion of the other's characters.
if (s1|length) == 0 then [(s2|length), .]
elif (s2|length) == 0 then [(s1|length), .]
elif (s1[0:1] == s2[0:1]) then
cached_ld(s1[1:]; s2[1:])
else
cached_ld(s1[1:]; s2) as $a
| ($a[1] | cached_ld(s1; s2[1:])) as $b
| ($b[1] | cached_ld(s1[1:]; s2[1:])) as $c
| [$a[0], $b[0], $c[0]] | (min + 1) as $d
| [$d, ($c[1] | store(s1;s2;$d)) ]
end ;
 
def levenshteinDistance(s;t):
s as $s | t as $t | {} | ld($s;$t) | .[0];

Task

def demo:
"levenshteinDistance between \(.[0]) and \(.[1]) is \( levenshteinDistance(.[0]; .[1]) )";
 
(["kitten", "sitting"] | demo),
(["rosettacode","raisethysword"] | demo),
(["edocattesor", "drowsyhtesiar"] | demo),
(["this_algorithm_is_similar_to",
"Damerau-Levenshtein_distance"] | demo)
Output:
levenshteinDistance between kitten and sitting is 3
levenshteinDistance between rosettacode and raisethysword is 8
levenshteinDistance between edocattesor and drowsyhtesiar is 8

[edit] LFE

[edit] Simple Implementation

Suitable for short strings:

 
(defun levenshtein-simple
(('() str)
(length str))
((str '())
(length str))
(((cons a str1) (cons b str2)) (when (== a b))
(levenshtein-simple str1 str2))
(((= (cons _ str1-tail) str1) (= (cons _ str2-tail) str2))
(+ 1 (lists:min
(list
(levenshtein-simple str1 str2-tail)
(levenshtein-simple str1-tail str2)
(levenshtein-simple str1-tail str2-tail))))))
 

You can copy and paste that function into an LFE REPL and run it like so:

 
> (levenshtein-simple "a" "a")
0
> (levenshtein-simple "a" "")
1
> (levenshtein-simple "" "a")
1
> (levenshtein-simple "kitten" "sitting")
3
 

It is not recommended to test strings longer than the last example using this implementation, as performance quickly degrades.

[edit] Cached Implementation

 
(defun levenshtein-distance (str1 str2)
(let (((tuple distance _) (levenshtein-distance
str1 str2 (dict:new))))
distance))
 
(defun levenshtein-distance
(((= '() str1) str2 cache)
(tuple (length str2)
(dict:store (tuple str1 str2)
(length str2)
cache)))
((str1 (= '() str2) cache)
(tuple (length str1)
(dict:store (tuple str1 str2)
(length str1)
cache)))
(((cons a str1) (cons b str2) cache) (when (== a b))
(levenshtein-distance str1 str2 cache))
(((= (cons _ str1-tail) str1) (= (cons _ str2-tail) str2) cache)
(case (dict:is_key (tuple str1 str2) cache)
('true (tuple (dict:fetch (tuple str1 str2) cache) cache))
('false (let* (((tuple l1 c1) (levenshtein-distance str1 str2-tail cache))
((tuple l2 c2) (levenshtein-distance str1-tail str2 c1))
((tuple l3 c3) (levenshtein-distance str1-tail str2-tail c2))
(len (+ 1 (lists:min (list l1 l2 l3)))))
(tuple len (dict:store (tuple str1 str2) len c3)))))))
 

As before, here's some usage in the REPL. Note that longer strings are now possible to compare without incurring long execution times:

 
> (levenshtein-distance "a" "a")
0
> (levenshtein-distance "a" "")
1
> (levenshtein-distance "" "a")
1
> (levenshtein-distance "kitten" "sitting")
3
> (levenshtein-distance "rosettacode" "raisethysword")
8
 

[edit] Liberty BASIC

'Levenshtein Distance translated by Brandon Parker
'08/19/10
'from http://www.merriampark.com/ld.htm#VB
'No credit was given to the Visual Basic Author on the site :-(
 
Print LevenshteinDistance("kitten", "sitting")
End
 
'Get the minum of three values
Function Minimum(a, b, c)
Minimum = Min(a, Min(b, c))
End Function
 
'Compute the Levenshtein Distance
Function LevenshteinDistance(string1$, string2$)
n = Len(string1$)
m = Len(string2$)
If n = 0 Then
LevenshteinDistance = m
Exit Function
End If
If m = 0 Then
LevenshteinDistance = n
Exit Function
End If
Dim d(n, m)
For i = 0 To n
d(i, 0) = i
Next i
For i = 0 To m
d(0, i) = i
Next i
For i = 1 To n
si$ = Mid$(string1$, i, 1)
For ii = 1 To m
tj$ = Mid$(string2$, ii, 1)
If si$ = tj$ Then
cost = 0
Else
cost = 1
End If
d(i, ii) = Minimum((d(i - 1, ii) + 1), (d(i, ii - 1) + 1), (d(i - 1, ii - 1) + cost))
Next ii
Next i
LevenshteinDistance = d(n, m)
End Function

[edit] Lua

function Levenshtein_Distance( s1, s2 )
if s1:len() == 0 then return s2:len() end
if s2:len() == 0 then return s1:len() end
 
if s1:sub( -1, -1 ) == s2:sub( -1, -1 ) then
return Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -2 ) )
end
 
local a = Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -2 ) )
local b = Levenshtein_Distance( s1:sub( 1, -1 ), s2:sub( 1, -2 ) )
local c = Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -1 ) )
 
if a > b then return b + 1 end
if a > c then return c + 1 end
return a + 1
end
 
print( Levenshtein_Distance( "kitten", "sitting" ) )
print( Levenshtein_Distance( "rosettacode", "raisethysword" ) )
Output:
3
8

[edit] Maple

 
> with(StringTools):
> Levenshtein("kitten","sitting");
3
 
> Levenshtein("rosettacode","raisethysword");
8
 


[edit] Mathematica

EditDistance["kitten","sitting"]
->3
EditDistance["rosettacode","raisethysword"]
->8

[edit] NetRexx

Translation of: ooRexx
/* NetRexx */
options replace format comments java crossref symbols nobinary
 
n = 0
w = ''
n = n + 1; w[0] = n; w[n] = "kitten sitting"
n = n + 1; w[0] = n; w[n] = "rosettacode raisethysword"
 
loop n = 1 to w[0]
say w[n].word(1) "->" w[n].word(2)":" levenshteinDistance(w[n].word(1), w[n].word(2))
end n
return
 
method levenshteinDistance(s, t) private static
s = s.lower
t = t.lower
 
m = s.length
n = t.length
 
-- for all i and j, d[i,j] will hold the Levenshtein distance between
-- the first i characters of s and the first j characters of t;
-- note that d has (m+1)x(n+1) values
d = 0
 
-- source prefixes can be transformed into empty string by
-- dropping all characters (Note, ooRexx arrays are 1-based)
loop i = 2 to m + 1
d[i, 1] = 1
end i
 
-- target prefixes can be reached from empty source prefix
-- by inserting every characters
loop j = 2 to n + 1
d[1, j] = 1
end j
 
loop j = 2 to n + 1
loop i = 2 to m + 1
if s.substr(i - 1, 1) == t.substr(j - 1, 1) then do
d[i, j] = d[i - 1, j - 1] -- no operation required
end
else do
d[i, j] = -
(d[i - 1, j] + 1).min( - -- a deletion
(d[i, j - 1] + 1)).min( - -- an insertion
(d[i - 1, j - 1] + 1)) -- a substitution
end
end i
end j
 
return d[m + 1, n + 1]
 

Output:

kitten -> sitting: 3 
rosettacode -> raisethysword: 8 

[edit] Nimrod

Translation of: Python
import sequtils
 
proc levenshteinDistance(s1, s2): int =
var (s1, s2) = (s1, s2)
 
if s1.len > s2.len:
swap s1, s2
 
var distances = toSeq(0..s1.len)
 
for i2, c2 in s2:
var newDistances = @[i2+1]
for i1, c1 in s1:
if c1 == c2:
newDistances.add(distances[i1])
else:
newDistances.add(1 + min(distances[i1], distances[i1+1],
newDistances[newDistances.high]))
 
distances = newDistances
result = distances[distances.high]
 
echo levenshteinDistance("kitten","sitting")
echo levenshteinDistance("rosettacode","raisethysword")

[edit] Objeck

Translation of: C#
class Levenshtein {
function : Main(args : String[]) ~ Nil {
if(args->Size() = 2) {
s := args[0]; t := args[1]; d := Distance(s,t);
"{$s} -> {$t} = {$d}"->PrintLine();
};
}
 
function : native : Distance(s : String,t : String) ~ Int {
d := Int->New[s->Size() + 1, t->Size() + 1];
for(i := 0; i <= s->Size(); i += 1;) {
d[i,0] := i;
};
 
for(j := 0; j <= t->Size(); j += 1;) {
d[0,j] := j;
};
 
for(j := 1; j <= t->Size(); j += 1;) {
for(i := 1; i <= s->Size(); i += 1;) {
if(s->Get(i - 1) = t->Get(j - 1)) {
d[i,j] := d[i - 1, j - 1];
}
else {
d[i,j] := (d[i - 1, j] + 1)
->Min(d[i, j - 1] + 1)
->Min(d[i - 1, j - 1] + 1);
};
};
};
 
return d[s->Size(), t->Size()];
}
}

[edit] Objective-C

Translation of the C# code into a NSString category

@interface NSString (levenshteinDistance)
- (NSUInteger)levenshteinDistanceToString:(NSString *)string;
@end
 
@implementation NSString (levenshteinDistance)
- (NSUInteger)levenshteinDistanceToString:(NSString *)string {
NSUInteger sl = [self length];
NSUInteger tl = [string length];
NSUInteger *d = calloc(sizeof(*d), (sl+1) * (tl+1));
 
#define d(i, j) d[((j) * sl) + (i)]
for (NSUInteger i = 0; i <= sl; i++) {
d(i, 0) = i;
}
for (NSUInteger j = 0; j <= tl; j++) {
d(0, j) = j;
}
for (NSUInteger j = 1; j <= tl; j++) {
for (NSUInteger i = 1; i <= sl; i++) {
if ([self characterAtIndex:i-1] == [string characterAtIndex:j-1]) {
d(i, j) = d(i-1, j-1);
} else {
d(i, j) = MIN(d(i-1, j), MIN(d(i, j-1), d(i-1, j-1))) + 1;
}
}
}
 
NSUInteger r = d(sl, tl);
#undef d
 
free(d);
 
return r;
}
@end

[edit] OCaml

Translation of the pseudo-code of the Wikipedia article:

let minimum a b c =
min a (min b c)
 
let levenshtein_distance s t =
let m = String.length s
and n = String.length t in
(* for all i and j, d.(i).(j) will hold the Levenshtein distance between
the first i characters of s and the first j characters of t *)

let d = Array.make_matrix (m+1) (n+1) 0 in
 
for i = 0 to m do
d.(i).(0) <- i (* the distance of any first string to an empty second string *)
done;
for j = 0 to n do
d.(0).(j) <- j (* the distance of any second string to an empty first string *)
done;
 
for j = 1 to n do
for i = 1 to m do
 
if s.[i-1] = t.[j-1] then
d.(i).(j) <- d.(i-1).(j-1) (* no operation required *)
else
d.(i).(j) <- minimum
(d.(i-1).(j) + 1) (* a deletion *)
(d.(i).(j-1) + 1) (* an insertion *)
(d.(i-1).(j-1) + 1) (* a substitution *)
done;
done;
 
d.(m).(n)
;;
 
let test s t =
Printf.printf " %s -> %s = %d\n" s t (levenshtein_distance s t);
;;
 
let () =
test "kitten" "sitting";
test "rosettacode" "raisethysword";
;;

[edit] A recursive functional version

This could be made faster with memoization

let levenshtein s t =
let rec dist i j = match (i,j) with
| (i,0) -> i
| (0,j) -> j
| (i,j) ->
if s.[i-1] = t.[j-1] then dist (i-1) (j-1)
else let d1, d2, d3 = dist (i-1) j, dist i (j-1), dist (i-1) (j-1) in
1 + min d1 (min d2 d3)
in
dist (String.length s) (String.length t)
 
let test s t =
Printf.printf " %s -> %s = %d\n" s t (levenshtein s t)
 
let () =
test "kitten" "sitting";
test "rosettacode" "raisethysword";
Output:
 kitten -> sitting = 3
 rosettacode -> raisethysword = 8

[edit] ooRexx

 
say "kitten -> sitting:" levenshteinDistance("kitten", "sitting")
say "rosettacode -> raisethysword:" levenshteinDistance("rosettacode", "raisethysword")
 
::routine levenshteinDistance
use arg s, t
s = s~lower
t = t~lower
 
m = s~length
n = t~length
 
-- for all i and j, d[i,j] will hold the Levenshtein distance between
-- the first i characters of s and the first j characters of t;
-- note that d has (m+1)x(n+1) values
d = .array~new(m + 1, n + 1)
 
-- clear all elements in d
loop i = 1 to d~dimension(1)
loop j = 1 to d~dimension(2)
d[i, j] = 0
end
end
 
-- source prefixes can be transformed into empty string by
-- dropping all characters (Note, ooRexx arrays are 1-based)
loop i = 2 to m + 1
d[i, 1] = 1
end
 
-- target prefixes can be reached from empty source prefix
-- by inserting every characters
loop j = 2 to n + 1
d[1, j] = 1
end
 
loop j = 2 to n + 1
loop i = 2 to m + 1
if s~subchar(i - 1) == t~subchar(j - 1) then
d[i, j] = d[i - 1, j - 1] -- no operation required
else d[i, j] = min(d[i - 1, j] + 1, - -- a deletion
d[i, j-1] + 1, - -- an insertion
d[i - 1, j - 1] + 1) -- a substitution
end
end
 
return d[m + 1, n + 1 ]
 

Output:

kitten -> sitting: 3
rosettacode -> raisethysword: 8

[edit] Pascal

A fairly direct translation of the wikipedia pseudo code:

Program LevenshteinDistanceDemo(output);
 
uses
Math;
 
function LevenshteinDistance(s, t: string): longint;
var
d: array of array of integer;
i, j, n, m: integer;
begin
n := length(t);
m := length(s);
setlength(d, m+1, n+1);
 
for i := 0 to m do
d[i,0] := i;
for j := 0 to n do
d[0,j] := j;
for j := 1 to n do
for i := 1 to m do
if s[i] = t[j] then
d[i,j] := d[i-1,j-1]
else
d[i,j] := min(d[i-1,j] + 1, min(d[i,j-1] + 1, d[i-1,j-1] + 1));
LevenshteinDistance := d[m,n];
end;
 
var
s1, s2: string;
 
begin
s1 := 'kitten';
s2 := 'sitting';
writeln('The Levenshtein distance between "', s1, '" and "', s2, '" is: ', LevenshteinDistance(s1, s2));
s1 := 'rosettacode';
s2 := 'raisethysword';
writeln('The Levenshtein distance between "', s1, '" and "', s2, '" is: ', LevenshteinDistance(s1, s2));
end.
Output:
The Levenshtein distance between "kitten" and "sitting" is: 3
The Levenshtein distance between "rosettacode" and "raisethysword" is: 8

[edit] Perl

Recursive algorithm, as in the C sample. You are invited to comment out the line where it says so, and see the speed difference. By the way, there's the Memoize standard module, but it requires setting quite a few parameters to work right for this example, so I'm just showing the simple minded caching scheme here.

use List::Util 'min';
 
my %cache;
 
sub leven {
my ($s, $t) = @_;
return length($t) if !$s;
return length($s) if !$t;
 
$cache{$s}{$t} //= # try commenting out this line
do {
my ($s1, $t1) = (substr($s, 1), substr($t, 1));
 
(substr($s, 0, 1) eq substr($t, 0, 1))
? leven($s1, $t1)
: 1 + min(leven($s1, $t1),
leven($s, $t1),
leven($s1, $t ));
};
}
 
print leven('rosettacode', 'raisethysword'), "\n";

[edit] Perl 6

Implementation of the wikipedia algorithm. Since column 0 and row 0 are used for base distances, the original algorithm would require us to compare "@s[$i-1] eq @t[$j-1]", and reference the $m and $n separately. Prepending an unused value (undef) onto @s and @t makes their indices align with the $i,$j numbering of @d, and lets us use .end instead of $m,$n.

sub levenshtein_distance ( Str $s, Str $t --> Int ) {
my @s = *, $s.comb;
my @t = *, $t.comb;
 
my @d;
@d[$_][ 0] = $_ for ^@s.end;
@d[ 0][$_] = $_ for ^@t.end;
 
for 1..@s.end X 1..@t.end -> $i, $j {
@d[$i][$j] = @s[$i] eq @t[$j]
?? @d[$i-1][$j-1] # No operation required when eq
!! ( @d[$i-1][$j ], # Deletion
@d[$i ][$j-1], # Insertion
@d[$i-1][$j-1], # Substitution
).min + 1;
}
 
return @d[*-1][*-1];
}
 
my @a = [<kitten sitting>], [<saturday sunday>], [<rosettacode raisethysword>];
 
for @a -> [$s, $t] {
say "levenshtein_distance('$s', '$t') == ", levenshtein_distance($s, $t);
}
Output:
levenshtein_distance('kitten', 'sitting') == 3
levenshtein_distance('saturday', 'sunday') == 3
levenshtein_distance('rosettacode', 'raisethysword') == 8

[edit] PHP

 
echo levenshtein('kitten','sitting');
echo levenshtein('rosettacode', 'raisethysword');
 
Output:
3
8

[edit] PicoLisp

Translation of the pseudo-code in the Wikipedia article:

(de levenshtein (A B)
(let D
(cons
(range 0 (length A))
(mapcar
'((I) (cons I (copy A)))
(range 1 (length B)) ) )
(for (J . Y) B
(for (I . X) A
(set
(nth D (inc J) (inc I))
(if (= X Y)
(get D J I)
(inc
(min
(get D J (inc I))
(get D (inc J) I)
(get D J I) ) ) ) ) ) ) ) )

or, using 'map' to avoid list indexing:

(de levenshtein (A B)
(let D
(cons
(range 0 (length A))
(mapcar
'((I) (cons I (copy A)))
(range 1 (length B)) ) )
(map
'((B Y)
(map
'((A X P)
(set (cdr P)
(if (= (car A) (car B))
(car X)
(inc (min (cadr X) (car P) (car X))) ) ) )
A
(car Y)
(cadr Y) ) )
B
D ) ) )
Output (both cases):
: (levenshtein (chop "kitten") (chop "sitting"))
-> 3

[edit] Prolog

Works with SWI-Prolog.
Based on Wikipedia's pseudocode.

levenshtein(S, T, R) :-
length(S, M),
M1 is M+1,
length(T, N),
N1 is N+1,
length(Lev, N1),
maplist(init(M1), Lev),
numlist(0, N, LN),
maplist(init_n, LN, Lev),
nth0(0, Lev, Lev0),
numlist(0, M, Lev0),
 
% compute_levenshtein distance
numlist(1, N, LN1),
maplist(work_on_T(Lev, S), LN1, T),
last(Lev, LevLast),
last(LevLast, R).
 
 
work_on_T(Lev, S, J, TJ) :-
length(S, M),
numlist(1, M, LM),
maplist(work_on_S(Lev, J, TJ), LM, S).
 
work_on_S(Lev, J, C, I, C) :-
% same char
!,
I1 is I-1, J1 is J-1,
nth0(J1, Lev, LevJ1),
nth0(I1, LevJ1, V),
nth0(J, Lev, LevJ),
nth0(I, LevJ, V).
 
 
work_on_S(Lev, J, _C1, I, _C2) :-
I1 is I-1, J1 is J - 1,
% compute the value for deletion
nth0(J, Lev, LevJ),
nth0(I1, LevJ, VD0),
VD is VD0 + 1,
 
% compute the value for insertion
nth0(J1, Lev, LevJ1),
nth0(I, LevJ1, VI0),
VI is VI0 + 1,
 
% compute the value for substitution
nth0(I1, LevJ1, VS0),
VS is VS0 + 1,
 
% set the minimum value to cell(I,J)
sort([VD, VI, VS], [V|_]),
 
nth0(I, LevJ, V).
 
 
init(Len, C) :-
length(C, Len).
 
init_n(N, L) :-
nth0(0, L, N).
Output examples:
 ?- levenshtein("kitten", "sitting", R).
R = 3.

 ?- levenshtein("saturday", "sunday", R).
R = 3.

 ?- levenshtein("rosettacode", "raisethysword", R).
R = 8.

[edit] PureBasic

Based on Wikipedia's pseudocode.

Procedure LevenshteinDistance(A_string$, B_String$)
Protected m, n, i, j, min, k, l
m = Len(A_string$)
n = Len(B_String$)
Dim D(m, n)
 
For i=0 To m: D(i,0)=i: Next
For j=0 To n: D(0,j)=j: Next
 
For j=1 To n
For i=1 To m
If Mid(A_string$,i,1) = Mid(B_String$,j,1)
D(i,j) = D(i-1, j-1); no operation required
Else
min = D(i-1, j)+1 ; a deletion
k = D(i, j-1)+1 ; an insertion
l = D(i-1, j-1)+1 ; a substitution
If k < min: min=k: EndIf
If l < min: min=l: EndIf
D(i,j) = min
EndIf
Next
Next
ProcedureReturn D(m,n)
EndProcedure
 
;- Testing
n = LevenshteinDistance("kitten", "sitting")
MessageRequester("Info","Levenshtein Distance= "+Str(n))

[edit] Python

[edit] Iterative

Implementation of the wikipedia algorithm, optimized for memory

def levenshteinDistance(s1,s2):
if len(s1) > len(s2):
s1,s2 = s2,s1
distances = range(len(s1) + 1)
for index2,char2 in enumerate(s2):
newDistances = [index2+1]
for index1,char1 in enumerate(s1):
if char1 == char2:
newDistances.append(distances[index1])
else:
newDistances.append(1 + min((distances[index1],
distances[index1+1],
newDistances[-1])))
distances = newDistances
return distances[-1]
 
print(levenshteinDistance("kitten","sitting"))
print(levenshteinDistance("rosettacode","raisethysword"))
 
Output:
3
8

[edit] Memoized recursive version

(Uses this cache from the standard library).

>>> from functools import lru_cache
>>> @lru_cache(maxsize=4095)
def ld(s, t):
if not s: return len(t)
if not t: return len(s)
if s[0] == t[0]: return ld(s[1:], t[1:])
l1 = ld(s, t[1:])
l2 = ld(s[1:], t)
l3 = ld(s[1:], t[1:])
return 1 + min(l1, l2, l3)
 
>>> print( ld("kitten","sitting"),ld("rosettacode","raisethysword") )
3 8

[edit] Racket

A memoized recursive implementation.

#lang racket
 
(define (levenshtein a b)
(define (ls0 a-index b-index)
(cond [(or (= a-index -1) (= b-index -1)) (abs (- a-index b-index))]
[else
(define a-char (string-ref a a-index))
(define b-char (string-ref b b-index))
(if (equal? a-char b-char)
(ls (sub1 a-index) (sub1 b-index))
(min (add1 (ls (sub1 a-index) b-index))
(add1 (ls a-index (sub1 b-index)))
(add1 (ls (sub1 a-index) (sub1 b-index)))))]))
(define memo (make-hash))
(define (ls a-i b-i)
(hash-ref! memo (cons a-i b-i) (λ() (ls0 a-i b-i))))
(ls (sub1 (string-length a)) (sub1 (string-length b))))
 
(levenshtein "kitten" "sitting")
(levenshtein "rosettacode" "raisethysword")
Output:
3
8

[edit] REXX

/*REXX program calculates the  Levenshtein distance between two strings.*/
call levenshtein 'kitten' , "sitting"
call levenshtein 'rosettacode' , "raisethysword"
call levenshtein 'Sunday' , "Saturday"
call levenshtein 'Vladimir_Levenshtein[1965]' , "Vladimir_Levenshtein[1965]"
call levenshtein 'this_algorithm_is_similar_to', "Damerau-Levenshtein_distance"
exit /*stick a fork in it, we're done.*/
/*──────────────────────────────────LEVENSHTEIN subroutine──────────────*/
levenshtein: procedure; parse arg s,t; sL=length(s); tL=length(t)
say ' 1st string = ' s
say ' 2nd string = ' t
@.=0
do j=1 for tL; @.0.j=j; end /*j*/
do k=1 for sL; @.k.0=k; end /*k*/
 
do j=1 for tL; j_=j-1; q=substr(t,j,1)
do k=1 for sL; k_=k-1
if q==substr(s,k,1) then @.k.j=@.k_.j_
else @.k.j=1 + min(@.k_.j, @.k.j_, @.k_.j_)
end /*k*/
end /*j*/
 
say 'Levenshtein distance = ' @.sL.tL; say
return

output using various internal strings:

          1st string  =  kitten
          2nd string  =  sitting
Levenshtein distance  =  3

          1st string  =  rosettacode
          2nd string  =  raisethysword
Levenshtein distance  =  8

          1st string  =  Sunday
          2nd string  =  Saturday
Levenshtein distance  =  3

          1st string  =  Vladimir_Levenshtein[1965]
          2nd string  =  Vladimir_Levenshtein[1965]
Levenshtein distance  =  0

          1st string  =  this_algorithm_is_similar_to
          2nd string  =  Damerau-Levenshtein_distance
Levenshtein distance  =  24

[edit] Ruby

Implementation of the wikipedia algorithm. Invariant is that for current loop indices i and j, costs[k] for k < j contains lev(i, k) and for k >= j contains lev(i-1, k). The inner loop body restores the invariant for the new value of j.

module Levenshtein
 
def self.distance(a, b)
a, b = a.downcase, b.downcase
costs = Array(0..b.length) # i == 0
(1..a.length).each do |i|
costs[0], nw = i, i - 1 # j == 0; nw is lev(i-1, j)
(1..b.length).each do |j|
costs[j], nw = [costs[j] + 1, costs[j-1] + 1, a[i-1] == b[j-1] ? nw : nw + 1].min, costs[j]
end
end
costs[b.length]
end
 
def self.test
%w{kitten sitting saturday sunday rosettacode raisethysword}.each_slice(2) do |a, b|
puts "distance(#{a}, #{b}) = #{distance(a, b)}"
end
end
 
end
 
Levenshtein.test
Output:
distance(kitten, sitting) = 3
distance(saturday, sunday) = 3
distance(rosettacode, raisethysword) = 8

A variant can be found used in Rubygems [1]

def levenshtein_distance(str1, str2)
n = str1.length
m = str2.length
max = n/2
 
return m if 0 == n
return n if 0 == m
return n if (n - m).abs > max
 
d = (0..m).to_a
x = nil
 
str1.each_char.with_index do |char1,i|
e = i+1
 
str2.each_char.with_index do |char2,j|
cost = (char1 == char2) ? 0 : 1
x = [ d[j+1] + 1, # insertion
e + 1, # deletion
d[j] + cost # substitution
].min
d[j] = e
e = x
end
 
d[m] = x
end
 
x
end
 
%w{kitten sitting saturday sunday rosettacode raisethysword}.each_slice(2) do |a, b|
puts "distance(#{a}, #{b}) = #{levenshtein_distance(a, b)}"
end

same output

[edit] Run BASIC

print levenshteinDistance("kitten", "sitting")
print levenshteinDistance("rosettacode", "raisethysword")
end
function levenshteinDistance(s1$, s2$)
n = len(s1$)
m = len(s2$)
if n = 0 then
levenshteinDistance = m
goto [ex]
end if
if m = 0 then
levenshteinDistance = n
goto [ex]
end if
dim d(n, m)
for i = 0 to n
d(i, 0) = i
next i
for i = 0 to m
d(0, i) = i
next i
for i = 1 to n
si$ = mid$(s1$, i, 1)
for j = 1 to m
tj$ = mid$(s2$, j, 1)
if si$ = tj$ then cost = 0 else cost = 1
d(i, j) = min((d(i - 1, j) + 1),min((d(i, j - 1) + 1),(d(i - 1, j - 1) + cost)))
next j
next i
levenshteinDistance = d(n, m)
[ex]
end function
Output:
3
8

[edit] Rust

Implementation of the wikipedia algorithm

// rust 0.8
 
fn main() {
let x = levenshtein_distance("kitten", "sitting");
println!("{}", x);
}
 
fn levenshtein_distance(word1: &str, word2: &str) -> uint {
let word1_length = word1.len() + 1;
let word2_length = word2.len() + 1;
 
let mut matrix = ~[~[0]];
 
for i in range(1, word1_length) { matrix[0].push(i); }
for j in range(1, word2_length) { matrix.push(~[j]); }
 
for j in range(1, word2_length) {
for i in range(1, word1_length) {
let x: uint = if word1[i - 1] == word2[j - 1] {
matrix[j-1][i-1]
}
else {
let min_distance = [matrix[j][i-1], matrix[j-1][i], matrix[j-1][i-1]];
*min_distance.iter().min().unwrap() + 1
};
 
matrix[j].push(x);
}
}
 
matrix[word2_length-1][word1_length-1]
}
Output:
3

[edit] Scala

Based on Wikipedia algorithm.

import scala.math._
 
object Levenshtein {
def minimum(i1: Int, i2: Int, i3: Int)=min(min(i1, i2), i3)
def distance(s1:String, s2:String)={
val dist=Array.tabulate(s2.length+1, s1.length+1){(j,i)=>if(j==0) i else if (i==0) j else 0}
 
for(j<-1 to s2.length; i<-1 to s1.length)
dist(j)(i)=if(s2(j-1)==s1(i-1)) dist(j-1)(i-1)
else minimum(dist(j-1)(i)+1, dist(j)(i-1)+1, dist(j-1)(i-1)+1)
 
dist(s2.length)(s1.length)
}
 
def main(args: Array[String]): Unit = {
printDistance("kitten", "sitting")
printDistance("rosettacode", "raisethysword")
}
 
def printDistance(s1:String, s2:String)=println("%s -> %s : %d".format(s1, s2, distance(s1, s2)))
}
Output:
kitten -> sitting : 3
rosettacode -> raisethysword : 8

[edit] Scheme

Recursive version from wikipedia article.

 
(define (levenshtein s t)
(define (%levenshtein s sl t tl)
(cond ((zero? sl) tl)
((zero? tl) sl)
(else
(min (+ (%levenshtein (cdr s) (- sl 1) t tl) 1)
(+ (%levenshtein s sl (cdr t) (- tl 1)) 1)
(+ (%levenshtein (cdr s) (- sl 1) (cdr t) (- tl 1))
(if (char=? (car s) (car t)) 0 1))))))
(%levenshtein (string->list s)
(string-length s)
(string->list t)
(string-length t)))
 
Output:
> (levenshtein "kitten" "sitting")
3
> (levenshtein "rosettacode" "raisethysword")
8

[edit] Seed7

$ include "seed7_05.s7i";
 
const func integer: levenshteinDistance (in string: s, in string: t) is func
result
var integer: distance is 0;
local
var array array integer: d is 0 times 0 times 0;
var integer: i is 0;
var integer: j is 0;
begin
d := [0 .. length(s)] times [0 .. length(t)] times 0;
for key i range s do
d[i][0] := i;
end for;
for key j range t do
d[0][j] := j;
for key i range s do
if s[i] = t[j] then
d[i][j] := d[pred(i)][pred(j)];
else
d[i][j] := min(min(succ(d[pred(i)][j]), succ(d[i][pred(j)])), succ(d[pred(i)][pred(j)]));
end if;
end for;
end for;
distance := d[length(s)][length(t)];
end func;
 
const proc: main is func
begin
writeln("kitten -> sitting: " <& levenshteinDistance("kitten", "sitting"));
writeln("rosettacode -> raisethysword: " <& levenshteinDistance("rosettacode", "raisethysword"));
end func;
Output:
kitten -> sitting: 3
rosettacode -> raisethysword: 8

[edit] Sidef

Works on both arrays and strings:

func lev(s, t) {
var d = [ 0 .. t.len, 1 .. s.len -> map {[_]}...];
{ |i|
{ |j|
d[i][j] = (
s[i-1] == t[j-1]
 ? d[i-1][j-1]
 : [d[i-1][j], d[i][j-1], d[i-1][j-1]].min+1;
);
} * t.len;
} * s.len;
d[-1][-1] \\ [s.len, t.len].min;
};
 
say lev('kitten', 'sitting'); # prints: 3
say lev('rosettacode', 'raisethysword'); # prints: 8

[edit] Smalltalk

Works with: Smalltalk/X

ST/X provides a customizable levenshtein method in the String class (weights for individual operations can be passed in):

'kitten' levenshteinTo: 'sitting' s:1 k:1 c:1 i:1 d:1 -> 3
'rosettacode' levenshteinTo: 'raisethysword' s:1 k:1 c:1 i:1 d:1 -> 8


[edit] Tcl

proc levenshteinDistance {s t} {
# Edge cases
if {![set n [string length $t]]} {
return [string length $s]
} elseif {![set m [string length $s]]} {
return $n
}
# Fastest way to initialize
for {set i 0} {$i <= $m} {incr i} {
lappend d 0
lappend p $i
}
# Loop, computing the distance table (well, a moving section)
for {set j 0} {$j < $n} {} {
set tj [string index $t $j]
lset d 0 [incr j]
for {set i 0} {$i < $m} {} {
set a [expr {[lindex $d $i]+1}]
set b [expr {[lindex $p $i]+([string index $s $i] ne $tj)}]
set c [expr {[lindex $p [incr i]]+1}]
# Faster than min($a,$b,$c)
lset d $i [expr {$a<$b ? $c<$a ? $c : $a : $c<$b ? $c : $b}]
}
# Swap
set nd $p; set p $d; set d $nd
}
# The score is at the end of the last-computed row
return [lindex $p end]
}
Usage:
puts [levenshteinDistance "kitten" "sitting"];   # Prints 3

[edit] TSE SAL

// library: math: get: damerau: levenshtein <description></description> <version>1.0.0.0.23</version> <version control></version control> (filenamemacro=getmadle.s) [kn, ri, th, 08-09-2011 23:04:55]
INTEGER PROC FNMathGetDamerauLevenshteinDistanceI( STRING s1, STRING s2 )
INTEGER L1 = Length( s1 )
INTEGER L2 = Length( s2 )
INTEGER substitutionCostI = 0
STRING h1[255] = ""
STRING h2[255] = ""
IF ( ( L1 == 0 ) OR ( L2 == 0 ) )
// Trivial case: one string is 0-length
RETURN( Max( L1, L2 ) )
ELSE
// The cost of substituting the last character
IF ( ( s1[ L1 ] ) == ( s2[ L2 ] ) )
substitutionCostI = 0
ELSE
substitutionCostI = 1
ENDIF
// h1 and h2 are s1 and s2 with the last character chopped off
h1 = SubStr( s1, 1, L1 - 1 )
h2 = SubStr( s2, 1, L2 - 1 )
IF ( ( L1 > 1 ) AND ( L2 > 1 ) AND ( s1[ L1 - 0 ] == s2[ L2 - 1 ] ) AND ( s1[ L1 - 1 ] == s2[ L2 - 0 ] ) )
RETURN( Min( Min( FNMathGetDamerauLevenshteinDistanceI( h1, s2 ) + 1, FNMathGetDamerauLevenshteinDistanceI( s1, h2 ) + 1 ), Min( FNMathGetDamerauLevenshteinDistanceI( h1 , h2 ) + substitutionCostI, FNMathGetDamerauLevenshteinDistanceI( SubStr( s1, 1, L1 - 2 ), SubStr( s2, 1, L2 - 2 ) ) + 1 ) ) )
ENDIF
RETURN( Min( Min( FNMathGetDamerauLevenshteinDistanceI( h1, s2 ) + 1, FNMathGetDamerauLevenshteinDistanceI( s1, h2 ) + 1 ), FNMathGetDamerauLevenshteinDistanceI( h1 , h2 ) + substitutionCostI ) )
ENDIF
END
 
PROC Main()
STRING s1[255] = "arcain"
STRING s2[255] = "arcane"
Warn( "Minimum amount of steps to convert ", s1, " to ", s2, " = ", FNMathGetDamerauLevenshteinDistanceI( s1, s2 ) ) // gives e.g. 2
s1 = "algorithm"
s2 = "altruistic"
Warn( "Minimum amount of steps to convert ", s1, " to ", s2, " = ", FNMathGetDamerauLevenshteinDistanceI( s1, s2 ) ) // gives e.g. 6
END

[edit] Visual Basic .NET

 Function LevenshteinDistance(ByVal String1 As String, ByVal String2 As String) As Integer
Dim Matrix(String1.Length, String2.Length) As Integer
Dim Key As Integer
For Key = 0 To String1.Length
Matrix(Key, 0) = Key
Next
For Key = 0 To String2.Length
Matrix(0, Key) = Key
Next
For Key1 As Integer = 1 To String2.Length
For Key2 As Integer = 1 To String1.Length
If String1(Key2 - 1) = String2(Key1 - 1) Then
Matrix(Key2, Key1) = Matrix(Key2 - 1, Key1 - 1)
Else
Matrix(Key2, Key1) = Math.Min(Matrix(Key2 - 1, Key1) + 1, Math.Min(Matrix(Key2, Key1 - 1) + 1, Matrix(Key2 - 1, Key1 - 1) + 1))
End If
Next
Next
Return Matrix(String1.Length - 1, String2.Length - 1)
End Function

[edit] zkl

Translation of: D
fcn levenshtein(s1,s2){
costs := (0).pump(s2.len() + 1,List).copy(); // List of garbage
ns2:=s2.len()+1;
foreach i in (s1.len()+1){
lastValue := i;
foreach j in (ns2){
if (i==0) costs[j] = j;
else if (j>0){
newValue := costs[j-1];
if (s1[i-1] != s2[j-1])
newValue = newValue.min(lastValue, costs[j]) + 1;
costs[j-1] = lastValue;
lastValue = newValue;
}
}
if (i>0) costs[-1] = lastValue;
}
costs[-1]
}
foreach a,b in (T(T("kitten","sitting"), T("rosettacode","raisethysword"),
T("yo",""), T("","yo"), T("abc","abc")) ){
println(a," --> ",b,": ",levenshtein(a,b));
}
Output:
kitten --> sitting: 3
rosettacode --> raisethysword: 8
yo --> : 2
 --> yo: 2
abc --> abc: 0
Personal tools
Namespaces

Variants
Actions
Community
Explore
Misc
Toolbox