Levenshtein distance

From Rosetta Code
Revision as of 18:41, 6 May 2012 by rosettacode>Gerard Schildberger (→‎{{header|REXX}}: improved speed of the Levenshtein subroutine. -- ~~~~)
Task
Levenshtein distance
You are encouraged to solve this task according to the task description, using any language you may know.
This page uses content from Wikipedia. The original article was at Levenshtein distance. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

  1. kitten sitten (substitution of 'k' with 's')
  2. sitten sittin (substitution of 'e' with 'i')
  3. sittin sitting (insert 'g' at the end).

The Levenshtein distance between "rosettacode", "raisethysword" is 8; The distance between two strings is same as that when both strings is reversed.

Task : Implements a Levenshtein distance function, or uses a library function, to show the Levenshtein distance between "kitten" and "sitting".

Other edit distance at Rosettacode.org :

Ada

<lang Ada>with Ada.Text_IO;

procedure Main is

  function Levenshtein_Distance (S, T : String) return Natural is
     D : array (0 .. S'Length, 0 .. T'Length) of Natural;
  begin
     for I in D'Range (1) loop
        D (I, 0) := I;
     end loop;
     for I in D'Range (2) loop
        D (0, I) := I;
     end loop;
     for J in T'Range loop
        for I in S'Range loop
           if S (I) = T (J) then
              D (I, J) := D (I - 1, J - 1);
           else
              D (I, J) :=
                 Natural'Min
                   (Natural'Min (D (I - 1, J) + 1, D (I, J - 1) + 1),
                    D (I - 1, J - 1) + 1);
           end if;
        end loop;
     end loop;
     return D (S'Length, T'Length);
  end Levenshtein_Distance;

begin

  Ada.Text_IO.Put_Line
    ("kitten -> sitting:" &
     Integer'Image (Levenshtein_Distance ("kitten", "sitting")));
  Ada.Text_IO.Put_Line
    ("rosettacode -> raisethysword:" &
     Integer'Image (Levenshtein_Distance ("rosettacode", "raisethysword")));

end Main;</lang>

output:

kitten -> sitting: 3
rosettacode -> raisethysword: 8

AutoHotkey

Translation of: Go

<lang AutoHotkey>levenshtein(s, t){ If s = return StrLen(t) If t = return strLen(s) If SubStr(s, 1, 1) = SubStr(t, 1, 1) return levenshtein(SubStr(s, 2), SubStr(t, 2)) a := Levenshtein(SubStr(s, 2), SubStr(t, 2)) b := Levenshtein(s, SubStr(t, 2)) c := Levenshtein(SubStr(s, 2), t ) If (a > b) a := b if (a > c) a := c return a + 1 } s1 := "kitten" s2 := "sitting" MsgBox % "distance between " s1 " and " s2 ": " levenshtein(s1, s2)</lang>It correctly outputs '3'

Bracmat

Recursive method (Translation of C), but with memoization.

<lang bracmat>(levenshtein=

 lev cache

. ( lev

   =   s s0 s1 t t0 t1 L a b c val key
     .     (cache..find)$(str$!arg:?key):(?.?val)
         & !val
       |   !arg:(?s,?t)
         & ( !s:&@(!t:? [?L)
           | !t:&@(!s:? [?L)
           )
         & (cache..insert)$(!key.!L)
         & !L
       |   !arg:(@(?:%?s0 ?s1),@(?:%?t0 ?t1))
         & !s0:!t0
         & lev$(!s1,!t1)
       |   lev$(!s1,!t1):?a
         & lev$(!s,!t1):?b
         & lev$(!s1,!t):?c
         & (!b:<!a:?a|)
         & (!c:<!a:?a|)
         & (cache..insert)$(!key.1+!a)
         & 1+!a
   )
 & new$hash:?cache
 & lev$!arg);</lang>
 levenshtein$(kitten,sitting)
 3
 levenshtein$(rosettacode,raisethysword)
 8

C

Recursive method. Deliberately left in an inefficient state to show the recursive nature of the algorithm; notice how it would have become the wikipedia algorithm if we memoized the function against parameters ls and lt. <lang C>#include <stdio.h>

  1. include <string.h>

/* s, t: two strings; ls, lt: their respective length */ int levenshtein(char *s, int ls, char *t, int lt) {

       int a, b, c;
       /* if either string is empty, difference is inserting all chars 
        * from the other
        */
       if (!ls) return lt;
       if (!lt) return ls;
       /* if first letters are the same, the difference is whatever is
        * required to edit the rest of the strings
        */
       if (s[ls] == t[lt])
               return levenshtein(s, ls - 1, t, lt - 1);
       /* else try:
        *      changing first letter of s to that of t; or
        *      remove first letter of s; or
        *      remove first letter of t,
        * any of which is 1 edit plus editing the rest of the strings
        */
       a = levenshtein(s, ls - 1, t, lt - 1);
       b = levenshtein(s, ls,     t, lt - 1);
       c = levenshtein(s, ls - 1, t, lt    );
       if (a > b) a = b;
       if (a > c) a = c;
       return a + 1;

}

int main() {

       char *s1 = "rosettacode";
       char *s2 = "raisethysword";
       printf("distance betweeh `%s' and `%s': %d\n", s1, s2,
               levenshtein(s1, strlen(s1), s2, strlen(s2)));
       return 0;

}</lang> Take the above and add caching, we get (in C99):<lang c>#include <stdio.h>

  1. include <string.h>

int levenshtein(char *s, char *t) { int ls = strlen(s), lt = strlen(t); int d[ls + 1][lt + 1];

for (int i = 0; i <= ls; i++) for (int j = 0; j <= lt; j++) d[i][j] = -1;

int dist(int i, int j) { if (d[i][j] >= 0) return d[i][j];

int x; if (i == ls) x = lt - j; else if (j == lt) x = ls - i; else if (s[i] == t[j]) x = dist(i + 1, j + 1); else { x = dist(i + 1, j + 1);

int y; if ((y = dist(i, j + 1)) < x) x = y; if ((y = dist(i + 1, j)) < x) x = y; x++; } return d[i][j] = x; } return dist(0, 0); }

int main(void) { char *s1 = "rosettacode"; char *s2 = "raisethysword"; printf("distance betweeh `%s' and `%s': %d\n", s1, s2, levenshtein(s1, s2));

       return 0;

}</lang>

C#

This is a straightforward translation of the Wikipedia pseudocode.

<lang csharp> using System;

namespace LevenshteinDistance {

   class Program
   {
       static int LevenshteinDistance(string s, string t)
       {
           int[,] d = new int[s.Length + 1, t.Length + 1];
           for (int i = 0; i <= s.Length; i++)
               d[i, 0] = i;
           for (int j = 0; j <= t.Length; j++)
               d[0, j] = j;
           for (int j = 1; j <= t.Length; j++)
               for (int i = 1; i <= s.Length; i++)
                   if (s[i - 1] == t[j - 1])
                       d[i, j] = d[i - 1, j - 1];  //no operation
                   else
                       d[i, j] = Math.Min(Math.Min(
                           d[i - 1, j] + 1,    //a deletion
                           d[i, j - 1] + 1),   //an insertion
                           d[i - 1, j - 1] + 1 //a substitution
                           );
           return d[s.Length, t.Length];
       }
       static void Main(string[] args)
       {
           if (args.Length == 2)
               Console.WriteLine("{0} -> {1} = {2}",
                   args[0], args[1], LevenshteinDistance(args[0], args[1]));
           else
               Console.WriteLine("Usage:-\n\nLevenshteinDistance <string1> <string2>");
       }
   }

} </lang> Example output:-

> LevenshteinDistance kitten sitting
kitten -> sitting = 3

> LevenshteinDistance rosettacode raisethysword
rosettacode -> raisethysword = 8

CoffeeScript

<lang coffeescript>

levenshtein = (str1, str2) ->

 # more of less ported simple algorithm from JS
 m = str1.length
 n = str2.length
 d = []
 return n  unless m
 return m  unless n
 d[i] = [i] for i in [0..m]
 d[0][j] = j for j in [1..n]  
   
 for i in [1..m]
   for j in [1..n]
     if str1[i-1] is str2[j-1]
       d[i][j] = d[i-1][j-1]
     else
       d[i][j] = Math.min(
         d[i-1][j]
         d[i][j-1]
         d[i-1][j-1]
       ) + 1
 d[m][n]

console.log levenshtein("kitten", "sitting") console.log levenshtein("rosettacode", "raisethysword") console.log levenshtein("stop", "tops") console.log levenshtein("yo", "") console.log levenshtein("", "yo") </lang>

Common Lisp

<lang lisp>(defun levenshtein (a b)

 (let* ((la  (length a))

(lb (length b)) (rec (make-array (list (1+ la) (1+ lb)) :initial-element nil)))

   (defun leven (x y)
     (cond

((zerop x) y) ((zerop y) x) ((aref rec x y) (aref rec x y)) (t (setf (aref rec x y) (+ (if (char= (char a (- la x)) (char b (- lb y))) 0 1) (min (leven (1- x) y) (leven x (1- y)) (leven (1- x) (1- y))))))))

   (leven la lb)))

(print (levenshtein "rosettacode" "raisethysword"))</lang>output<lang>8</lang>

D

Standard Version

The standard library std.algorithm module includes a Levenshtein distance function: <lang d>import std.stdio, std.algorithm;

void main() {

   writeln(levenshteinDistance("kitten", "sitting"));

}</lang>

Output:

3

Iterative Version

Translation of: Java

<lang d>import std.stdio, std.algorithm;

int distance(in string s1, in string s2) pure nothrow {

 auto costs = new int[s2.length + 1];
 
 foreach (i; 0 .. s1.length + 1) {
   int lastValue = i;
   foreach (j; 0 .. s2.length + 1) {
     if (i == 0)
       costs[j] = j;
     else {
       if (j > 0) {
         int newValue = costs[j - 1];
         if (s1[i - 1] != s2[j - 1])
           newValue = min(newValue, lastValue, costs[j]) + 1;
         costs[j - 1] = lastValue;
         lastValue = newValue;
       }
     }
   }
   
   if (i > 0)
     costs[$ - 1] = lastValue;
 }
 
 return costs[$ - 1];

}

void main() {

 foreach(p; [["kitten", "sitting"], ["rosettacode", "raisethysword"]])
   writefln("distance(%s, %s): %d", p[0], p[1], distance(p[0], p[1]));

}</lang>

Recursive Version

A recursive version, as suggested. I think the result distance is the maximum depth of recursion. A distance of 8 may cost an aeon time to finish! <lang d>import std.stdio, std.algorithm, std.range;

int lDistR(T)(in T[] a, in T[] b) /*pure nothrow*/ {

   if (a.length == 0)
       return b.length;
   if (b.length == 0)
       return a.length;
   if (a.length == b.length) {
       auto r = iota(a.length);
       auto u = r.until!(i => a[i] != b[i])();
       if (equal(r, u)) // u is a short-circuit test until any
                        // mismatch
           return 0;    // u is equivalent to r, means no
                        // mismatch found
   }
   const(T)[][] candidate;
   immutable alen = a.length;
   immutable blen = b.length;
   // mutate _a_ by 1 edit to create members of candidate
   // delete an _a_ element
   if (alen > blen)
       foreach (i; 0 .. alen)
           candidate ~= a[0 .. i] ~ a[i+1 .. $];
   // insert matching _b_ element to _a_
   if (alen < blen) {
       foreach (i; 0 .. alen+1) // from left
           candidate ~=  a[0 .. i] ~ b[i] ~ a[i .. $];
       foreach (i; 0 .. alen+1) // from right
           candidate ~=  a[0 .. $-i] ~ b[$ - i - 1] ~ a[$-i .. $];
   }
   // subsistute matching _a_ element with _b_'s
   if (alen == blen)
       foreach (i; 0 .. alen)
           if (a[i] != b[i])
               candidate ~= a[0..i] ~ b[i] ~ a[i+1 .. $];
   // exclusive cases, so only 1 edit is make to create each
   // new candidate
   // minimum distance on this run
   return candidate.map!(e => lDistR(e, b) + 1)().reduce!min();

}

void main() {

   writeln(lDistR("kitten", "sitting"));

}</lang>

DWScript

Based on Wikipedia version

<lang delphi>function LevenshteinDistance(s, t : String) : Integer; var

  i, j : Integer;

begin

  var d:=new Integer[Length(s)+1, Length(t)+1];
  for i:=0 to Length(s) do
     d[i, 0] := i;
  for j:=0 to Length(t) do
     d[0, j] := j;
  
  for j:=1 to Length(t) do
     for i:=1 to Length(s) do
        if s[i]=t[j] then
           d[i, j] := d[i-1, j-1] // no operation
        else d[i,j]:=MinInt(MinInt(
              d[i-1, j] +1 ,    // a deletion
              d[i, j-1] +1 ),   // an insertion
              d[i- 1,j-1] +1    // a substitution
              );
  Result:=d[Length(s), Length(t)];

end;

PrintLn(LevenshteinDistance('kitten', 'sitting')); </lang>

Euphoria

Code by Jeremy Cowgar from the Euphoria File Archive. <lang euphoria>function min(sequence s)

   atom m
   m = s[1]
   for i = 2 to length(s) do
       if s[i] < m then
           m = s[i]
       end if
   end for
   return m

end function

function levenshtein(sequence s1, sequence s2)

   integer n, m
   sequence d
   n = length(s1) + 1
   m = length(s2) + 1
   if n = 1  then
       return m-1
   elsif m = 1 then
       return n-1
   end if
   d = repeat(repeat(0, m), n)
   for i = 1 to n do
       d[i][1] = i-1
   end for
   for j = 1 to m do
       d[1][j] = j-1
   end for
   for i = 2 to n do
       for j = 2 to m do
           d[i][j] = min({
               d[i-1][j] + 1,
               d[i][j-1] + 1,
               d[i-1][j-1] + (s1[i-1] != s2[j-1])
           })
       end for
   end for
   return d[n][m]

end function

? levenshtein("kitten", "sitting") ? levenshtein("rosettacode", "raisethysword")</lang>

Output:

3
8

Frink

Frink has a built-in function to calculate the Levenshtein edit distance between two strings: <lang frink> println[editDistance["kitten","sitting"]] </lang>

Go

WP algorithm: <lang go>package main

import "fmt"

func main() {

   fmt.Println(ld("kitten", "sitting"))

}

func ld(s, t string) int {

   d := make([][]int, len(s)+1)
   for i := range d {
       d[i] = make([]int, len(t)+1)
   }
   for i := range d {
       d[i][0] = i
   }
   for j := range d[0] {
       d[0][j] = j
   }
   for j := 1; j <= len(t); j++ {
       for i := 1; i <= len(s); i++ {
           if s[i-1] == t[j-1] {
               d[i][j] = d[i-1][j-1]
           } else {
               min := d[i-1][j]
               if d[i][j-1] < min {
                   min = d[i][j-1]
               }
               if d[i-1][j-1] < min {
                   min = d[i-1][j-1]
               }
               d[i][j] = min + 1
           }
       }
   }
   return d[len(s)][len(t)]

}</lang> Output:

3
Translation of: C

<lang go>package main

import "fmt"

func levenshtein(s, t string) int {

   if s == "" { return len(t) }
   if t == "" { return len(s) }
   if s[0] == t[0] {
       return levenshtein(s[1:], t[1:])
   }
   a := levenshtein(s[1:], t[1:])
   b := levenshtein(s,     t[1:])
   c := levenshtein(s[1:], t)
   if a > b { a = b }
   if a > c { a = c }
   return a + 1

}

func main() {

   s1 := "rosettacode"
   s2 := "raisethysword"
   fmt.Printf("distance between %s and %s: %d\n", s1, s2,
       levenshtein(s1, s2))

}</lang> Output:

distance between rosettacode and raisethysword: 8

Haskell

<lang haskell>levenshtein s1 s2 = last $ foldl transform [0 .. length s1] s2

 where transform ns@(n:ns') c = scanl calc (n+1) $ zip3 s1 ns ns'
         where calc z (c', x, y) = minimum [y+1, z+1, x + fromEnum (c' /= c)]

main = print (levenshtein "kitten" "sitting")</lang> Output:

3

Icon and Unicon

<lang unicon>procedure main()

   every process(!&input)

end

procedure process(s)

   s ? (s1 := tab(upto(' \t')), s2 := (tab(many(' \t')), tab(0))) | fail
   write("'",s1,"' -> '",s2,"' = ", levenshtein(s1,s2))

end

procedure levenshtein(s, t)

   if (n := *s+1) = 1 then return *t
   if (m := *t+1) = 1 then return *s

   every !(d := list(n,0)) := list(m, 0)
   every i := 1 to max(n,m) do d[i,1] := d[1,i] := i
   every d[1(i := 2 to n, s_i := s[iM1 := i-1]), j := 2 to m] :=
            min(d[iM1,j], d[i,jM1:=j-1],
                d[iM1,jM1] + if s_i == t[jM1] then -1 else 0) + 1
   return d[n,m]-1

end</lang>

->leven
kitten  sitting
'kitten' -> 'sitting' = 3
->

J

One approach would be a literal transcription of the wikipedia implementation:

<lang j>levenshtein=:4 :0

 D=. x +/&i.&>:&# y
 for_i.1+i.#x do.
   for_j.1+i.#y do.
     if. ((<:i){x)=(<:j){y do.
       D=.(D {~<<:i,j) (<i,j)} D
     else.
       min=. 1+<./D{~(i,j) <@:-"1#:1 2 3
       D=. min (<i,j)} D
     end.
   end.
 end.
 {:{:D

)</lang>

However, this is a rather slow and bulky approach. Another alternative would be:

<lang j>levD=: i.@-@>:@#@] ,~ >:@i.@-@#@[ ,.~ ~:/ lev=: [: {. {."1@((1&{ ,~ (1 1 , {.) <./@:+ }.)@,/\.)@,./@levD</lang>

First, we setup up an matrix of costs, with 0 or 1 for unexplored cells (1 being where the character pair corresponding to that cell position has two different characters). Note that the "cost to reach the empty string" cells go on the bottom and the right, instead of the top and the left, because this works better with J's "reduce" operator.

Then we reduce the rows of that matrix using an operation that treats those two rows as columns and reduces the rows of this derived matrix with an operation that gives the (unexplored cell + the minumum of the other cells) followed by (the cell adjacent to the previously unexplored cell.

Example use:

<lang j> 'kitten' levenshtein 'sitting' 3

  'kitten' lev 'sitting'

3</lang>

Time and space use:

<lang j> ts=: 6!:2,7!:2

  ts kitten levenshtein sitting

0.00153132 12800

  ts kitten lev sitting

0.000132101 5376</lang>

(The J flavored variant winds up being about 10 times faster, in J, for this test case, than the explicit version.)

See the Levenshtein distance essay on the Jwiki for additional solutions.

Java

Based on the algorithm outlined in the Wikipedia article:

<lang java>public class LevenshteinDistance {

 public static int computeDistance(String s1, String s2)
 {
   s1 = s1.toLowerCase();
   s2 = s2.toLowerCase();
   
   int[] costs = new int[s2.length() + 1];
   for (int i = 0; i <= s1.length(); i++)
   {
     int lastValue = i;
     for (int j = 0; j <= s2.length(); j++)
     {
       if (i == 0)
         costs[j] = j;
       else
       {
         if (j > 0)
         {
           int newValue = costs[j - 1];
           if (s1.charAt(i - 1) != s2.charAt(j - 1))
             newValue = Math.min(Math.min(newValue, lastValue), costs[j]) + 1;
           costs[j - 1] = lastValue;
           lastValue = newValue;
         }
       }
     }
     if (i > 0)
       costs[s2.length()] = lastValue;
   }
   return costs[s2.length()];
 }
 public static void printDistance(String s1, String s2)
 {
   System.out.println(s1 + "-->" + s2 + ": " + computeDistance(s1, s2));
 }
 
 public static void main(String[] args)
 {
   printDistance("kitten", "sitting");
   printDistance("rosettacode", "raisethysword");
   printDistance(new StringBuilder("rosettacode").reverse().toString(), new StringBuilder("raisethysword").reverse().toString());
   for (int i = 1; i < args.length; i += 2)
     printDistance(args[i - 1], args[i]);
 }

} </lang>

Output:

kitten-->sitting: 3
rosettacode-->raisethysword: 8
edocattesor-->drowsyhtesiar: 8
Translation of: C

<lang java>public class Levenshtein{

   public static int levenshtein(String s, String t){
       /* if either string is empty, difference is inserting all chars 
        * from the other
        */
       if(s.length() == 0) return t.length();
       if(t.length() == 0) return s.length();
       /* if first letters are the same, the difference is whatever is
        * required to edit the rest of the strings
        */
       if(s.charAt(0) == t.charAt(0))
           return levenshtein(s.substring(1), t.substring(1));
       /* else try:
        *      changing first letter of s to that of t,
        *      remove first letter of s, or
        *      remove first letter of t
        */
       int a = levenshtein(s.substring(1), t.substring(1));
       int b = levenshtein(s, t.substring(1));
       int c = levenshtein(s.substring(1), t);
       if(a > b) a = b;
       if(a > c) a = c;
       //any of which is 1 edit plus editing the rest of the strings
       return a + 1;
   }
   public static void main(String[] args) {
       String s1 = "kitten";
       String s2 = "sitting";
       System.out.println("distance between '" + s1 + "' and '"
               + s2 + "': " + levenshtein(s1, s2));
       s1 = "rosettacode";
       s2 = "raisethysword";
       System.out.println("distance between '" + s1 + "' and '"
               + s2 + "': " + levenshtein(s1, s2));
       StringBuilder sb1 = new StringBuilder(s1);
       StringBuilder sb2 = new StringBuilder(s2);
       System.out.println("distance between '" + sb1.reverse() + "' and '"
               + sb2.reverse() + "': "
               + levenshtein(sb1.reverse().toString(), sb2.reverse().toString()));
   }

}</lang> Output:

distance between 'kitten' and 'sitting': 3
distance between 'rosettacode' and 'raisethysword': 8
distance between 'edocattesor' and 'drowsyhtesiar': 8

JavaScript

Based on the algorithm outlined in the Wikipedia article: <lang javascript>function levenshtein(str1, str2) {

   var m = str1.length,
       n = str2.length,
       d = [],
       i, j;
   if (!m) return n;
   if (!n) return m;
   for (i = 0; i <= m; i++) d[i] = [i];
   for (j = 0; j <= n; j++) d[0][j] = j;
   for (j = 1; j <= n; j++) {
       for (i = 1; i <= m; i++) {
           if (str1[i-1] == str2[j-1]) d[i][j] = d[i - 1][j - 1];
           else d[i][j] = Math.min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1;
       }
   }
   return d[m][n];

}

console.log(levenshtein("kitten", "sitting")); console.log(levenshtein("stop", "tops")); console.log(levenshtein("rosettacode", "raisethysword"));</lang> Output:

3
2
8

Liberty BASIC

<lang lb> 'Levenshtein Distance translated by Brandon Parker '08/19/10 'from http://www.merriampark.com/ld.htm#VB 'No credit was given to the Visual Basic Author on the site :-(

Print LevenshteinDistance("kitten", "sitting") End

'Get the minum of three values Function Minimum(a, b, c)

   Minimum = Min(a, Min(b, c))

End Function


'Compute the Levenshtein Distance Function LevenshteinDistance(string1$, string2$)

   n = Len(string1$)
   m = Len(string2$)
   If n = 0 Then
       LevenshteinDistance = m
       Exit Function
   End If
   If m = 0 Then
       LevenshteinDistance = n
       Exit Function
   End If
   Dim d(n, m)
   For i = 0 To n
       d(i, 0) = i
   Next i
   For i = 0 To m
       d(0, i) = i
   Next i
   For i = 1 To n
       si$ = Mid$(string1$, i, 1)
       For ii = 1 To m
           tj$ = Mid$(string2$, ii, 1)
           If si$ = tj$ Then
               cost = 0
           Else
               cost = 1
           End If
           d(i, ii) = Minimum((d(i - 1, ii) + 1), (d(i, ii - 1) + 1), (d(i - 1, ii - 1) + cost))
       Next ii
   Next i
   LevenshteinDistance = d(n, m)

End Function

</lang>

Lua

<lang lua>function Levenshtein_Distance( s1, s2 )

   if s1:len() == 0 then return s2:len() end
   if s2:len() == 0 then return s1:len() end

if s1:sub( -1, -1 ) == s2:sub( -1, -1 ) then return Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -2 ) ) end

local a = Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -2 ) ) local b = Levenshtein_Distance( s1:sub( 1, -1 ), s2:sub( 1, -2 ) ) local c = Levenshtein_Distance( s1:sub( 1, -2 ), s2:sub( 1, -1 ) )

if a > b then return b + 1 end if a > c then return c + 1 end return a + 1 end


print( Levenshtein_Distance( "kitten", "sitting" ) ) print( Levenshtein_Distance( "rosettacode", "raisethysword" ) )</lang>

3
8

Mathematica

<lang Mathematica>EditDistance["kitten","sitting"] ->3 EditDistance["rosettacode","raisethysword"] ->8</lang>

OCaml

Translation of the pseudo-code of the Wikipedia article:

<lang ocaml>let minimum a b c =

 min a (min b c)

let levenshtein_distance s t =

 let m = String.length s
 and n = String.length t in
 (* for all i and j, d.(i).(j) will hold the Levenshtein distance between
    the first i characters of s and the first j characters of t *)
 let d = Array.make_matrix (m+1) (n+1) 0 in
 for i = 0 to m do
   d.(i).(0) <- i  (* the distance of any first string to an empty second string *)
 done;
 for j = 0 to n do
   d.(0).(j) <- j  (* the distance of any second string to an empty first string *)
 done;
 for j = 1 to n do
   for i = 1 to m do
     if s.[i-1] = t.[j-1] then
       d.(i).(j) <- d.(i-1).(j-1)  (* no operation required *)
     else
       d.(i).(j) <- minimum
                      (d.(i-1).(j) + 1)   (* a deletion *)
                      (d.(i).(j-1) + 1)   (* an insertion *)
                      (d.(i-1).(j-1) + 1) (* a substitution *)
   done;
 done;
 d.(m).(n)

let test s t =

 Printf.printf " %s -> %s = %d\n" s t (levenshtein_distance s t);

let () =

 test "kitten" "sitting";
 test "rosettacode" "raisethysword";
</lang>

A recursive functional version

This could be made faster with memoization <lang OCaml>let levenshtein s t =

  let rec dist i j = match (i,j) with
     | (i,0) -> i
     | (0,j) -> j
     | (i,j) ->
        if s.[i-1] = t.[j-1] then dist (i-1) (j-1)
        else let d1, d2, d3 = dist (i-1) j, dist i (j-1), dist (i-1) (j-1) in
        1 + min d1 (min d2 d3)
  in
  dist (String.length s) (String.length t)

let test s t =

 Printf.printf " %s -> %s = %d\n" s t (levenshtein s t)

let () =

 test "kitten" "sitting";
 test "rosettacode" "raisethysword";</lang>

Output:

 kitten -> sitting = 3
 rosettacode -> raisethysword = 8

Pascal

A fairly direct translation of the wikipedia pseudo code: <lang pascal>Program LevenshteinDistanceDemo(output);

uses

 Math;

function LevenshteinDistance(s, t: string): longint;

 var
   d: array of array of integer;
   i, j, n, m: integer;
 begin
   n := length(t);
   m := length(s);
   setlength(d, n+1, m+1);
   
   for i := 0 to n do
     d[i,0] := i;
   for j := 0 to m do
     d[0,j] := j;
   for j := 1 to n do
     for i := 1 to m do
       if s[i] = t[j] then  
         d[i,j] := d[i-1,j-1]
       else
         d[i,j] := min(d[i-1,j] + 1, min(d[i,j-1] + 1, d[i-1,j-1] + 1));
   LevenshteinDistance := d[m,n];
 end;
 

var

 s1, s2: string;

begin

 s1 := 'kitten';
 s2 := 'sitting';
 writeln('The Levenshtein distance between "', s1, '" and "', s2, '" is: ', LevenshteinDistance(s1, s2));
 s1 := 'rosettacode';
 s2 := 'raisethysword';
 writeln('The Levenshtein distance between "', s1, '" and "', s2, '" is: ', LevenshteinDistance(s1, s2));

end.</lang> Output:

The Levenshtein distance between "kitten" and "sitting" is: 3
The Levenshtein distance between "rosettacode" and "raisethysword" is: 8

Perl

Recursive algorithm, as in the C sample. You are invited to comment out the line where it says so, and see the speed difference. By the way, there's the Memoize standard module, but it requires setting quite a few parameters to work right for this example, so I'm just showing the simple minded caching scheme here. <lang Perl>use List::Util 'min';

my %cache;

sub leven {

       my ($s, $t) = @_;
       return length($t) if !$s;
       return length($s) if !$t;
       $cache{"$s,$t"} //=          # try commenting out this line
       do {
               my ($s1, $t1) = (substr($s, 1), substr($t, 1));
               (substr($s, 0, 1) eq substr($t, 0, 1))
                       ? leven($s1, $t1)
                       : 1 + min(leven($s1, $t1),
                                 leven($s,  $t1),
                                 leven($s1, $t ));
       };

}

print leven('rosettacode', 'raisethysword'), "\n";</lang>

Perl 6

Implementation of the wikipedia algorithm. Since column 0 and row 0 are used for base distances, the original algorithm would require us to compare "@s[$i-1] eq @t[$j-1]", and reference the $m and $n separately. Prepending an unused value (undef) onto @s and @t makes their indices align with the $i,$j numbering of @d, and lets us use .end instead of $m,$n. <lang perl6>sub levenshtein_distance ( Str $s, Str $t --> Int ) {

   my @s = *, $s.comb;
   my @t = *, $t.comb;
   my @d;
   @d[$_][ 0] = $_ for ^@s.end;
   @d[ 0][$_] = $_ for ^@t.end;
   for 1..@s.end X 1..@t.end -> $i, $j {
       @d[$i][$j] = @s[$i] eq @t[$j]
           ??   @d[$i-1][$j-1]    # No operation required when eq
           !! ( @d[$i-1][$j  ],   # Deletion
                @d[$i  ][$j-1],   # Insertion
                @d[$i-1][$j-1],   # Substitution
              ).min + 1;
   }
   return @d[*-1][*-1];

}

my @a = [<kitten sitting>], [<saturday sunday>], [<rosettacode raisethysword>];

for @a -> [$s, $t] {

   say "levenshtein_distance('$s', '$t') == ", levenshtein_distance($s, $t);

}</lang>

Output:

levenshtein_distance('kitten', 'sitting') == 3
levenshtein_distance('saturday', 'sunday') == 3
levenshtein_distance('rosettacode', 'raisethysword') == 8

PicoLisp

Translation of the pseudo-code in the Wikipedia article: <lang PicoLisp>(de levenshtein (A B)

  (let D
     (cons
        (range 0 (length A))
        (mapcar
           '((I) (cons I (copy A)))
           (range 1 (length B)) ) )
     (for (J . Y) B
        (for (I . X) A
           (set
              (nth D (inc J) (inc I))
              (if (= X Y)
                 (get D J I)
                 (inc
                    (min
                       (get D J (inc I))
                       (get D (inc J) I)
                       (get D J I) ) ) ) ) ) ) ) )</lang>

or, using 'map' to avoid list indexing: <lang PicoLisp>(de levenshtein (A B)

  (let D
     (cons
        (range 0 (length A))
        (mapcar
           '((I) (cons I (copy A)))
           (range 1 (length B)) ) )
     (map
        '((B Y)
           (map
              '((A X P)
                 (set (cdr P)
                    (if (= (car A) (car B))
                       (car X)
                       (inc (min (cadr X) (car P) (car X))) ) ) )
              A
              (car Y)
              (cadr Y) ) )
        B
        D ) ) )</lang>

Output in both cases:

: (levenshtein (chop "kitten") (chop "sitting"))
-> 3

Prolog

Works with SWI-Prolog.
Based on Wikipedia's pseudocode. <lang Prolog>levenshtein(S, T, R) :- length(S, M), M1 is M+1, length(T, N), N1 is N+1, length(Lev, N1), maplist(init(M1), Lev), numlist(0, N, LN), maplist(init_n, LN, Lev), nth0(0, Lev, Lev0), numlist(0, M, Lev0),

% compute_levenshtein distance numlist(1, N, LN1), maplist(work_on_T(Lev, S), LN1, T), last(Lev, LevLast), last(LevLast, R).


work_on_T(Lev, S, J, TJ) :- length(S, M), numlist(1, M, LM), maplist(work_on_S(Lev, J, TJ), LM, S).

work_on_S(Lev, J, C, I, C) :- % same char !, I1 is I-1, J1 is J-1, nth0(J1, Lev, LevJ1), nth0(I1, LevJ1, V), nth0(J, Lev, LevJ), nth0(I, LevJ, V).


work_on_S(Lev, J, _C1, I, _C2) :- I1 is I-1, J1 is J - 1, % compute the value for deletion nth0(J, Lev, LevJ), nth0(I1, LevJ, VD0), VD is VD0 + 1,

% compute the value for insertion nth0(J1, Lev, LevJ1), nth0(I, LevJ1, VI0), VI is VI0 + 1,

% compute the value for substitution nth0(I1, LevJ1, VS0), VS is VS0 + 1,

% set the minimum value to cell(I,J) sort([VD, VI, VS], [V|_]),

nth0(I, LevJ, V).


init(Len, C) :- length(C, Len).

init_n(N, L) :- nth0(0, L, N). </lang> Output examples :

 ?- levenshtein("kitten", "sitting", R).
R = 3.

 ?- levenshtein("saturday", "sunday", R).
R = 3.

 ?- levenshtein("rosettacode", "raisethysword", R).
R = 8.

PureBasic

Based on Wikipedia's pseudocode. <lang PureBasic>Procedure LevenshteinDistance(A_string$, B_String$)

 Protected m, n, i, j, min, k, l
 m = Len(A_string$)
 n = Len(B_String$)
 Dim D(m, n)
 
 For i=0 To m: D(i,0)=i: Next
 For j=0 To n: D(0,j)=j: Next
 
 For j=1 To n
   For i=1 To m
     If Mid(A_string$,i,1) = Mid(B_String$,j,1)
       D(i,j) = D(i-1, j-1); no operation required
     Else
       min = D(i-1, j)+1   ; a deletion
       k   = D(i, j-1)+1   ; an insertion
       l   = D(i-1, j-1)+1 ; a substitution
       If k < min: min=k: EndIf
       If l < min: min=l: EndIf
       D(i,j) = min
     EndIf
   Next
 Next
 ProcedureReturn D(m,n)

EndProcedure

- Testing

n = LevenshteinDistance("kitten", "sitting") MessageRequester("Info","Levenshtein Distance= "+Str(n))</lang>

Python

Iterative

Implementation of the wikipedia algorithm, optimized for memory <lang python> def levenshteinDistance(s1,s2): if len(s1) > len(s2): s1,s2 = s2,s1 distances = range(len(s1) + 1) for index2,char2 in enumerate(s2): newDistances = [index2+1] for index1,char1 in enumerate(s1): if char1 == char2: newDistances.append(distances[index1]) else: newDistances.append(1 + min((distances[index1],distances[index1+1],newDistances[-1]))) distances = newDistances return distances[-1]

print levenshteinDistance("kitten","sitting"),levenshteinDistance("rosettacode","raisethysword") </lang>

Output:

3 8

Memoized recursive version

(Uses this cache from the standard library). <lang python>>>> from functools import lru_cache >>> @lru_cache(maxsize=4095) def ld(s, t): if not s: return len(t) if not t: return len(s) if s[0] == t[0]: return ld(s[1:], t[1:]) l1 = ld(s, t[1:]) l2 = ld(s[1:], t) l3 = ld(s[1:], t[1:]) return 1 + min(l1, l2, l3)

>>> print( ld("kitten","sitting"),ld("rosettacode","raisethysword") ) 3 8</lang>

Racket

A memoized recursive implementation. <lang scheme>#lang racket

(define (levenshtein a b)

 (define (ls0 a-index b-index)
   (cond [(or (= a-index -1) (= b-index -1)) (abs (- a-index b-index))]
         [else 
          (define a-char (string-ref a a-index))
          (define b-char (string-ref b b-index))
          (if (equal? a-char b-char)
              (ls (sub1 a-index) (sub1 b-index))
              (min (add1 (ls (sub1 a-index) b-index))
                   (add1 (ls a-index (sub1 b-index)))
                   (add1 (ls (sub1 a-index) (sub1 b-index)))))]))
 (define memo (make-hash))
 (define (ls a-i b-i)
   (hash-ref memo (cons a-i b-i) 
             ( () (hash-set! memo (cons a-i b-i) (ls0 a-i b-i)) (ls a-i b-i))))
 (ls (sub1 (string-length a)) (sub1 (string-length b))))</lang>

REXX

<lang rexx>/*REXX program to calculate the Levenshtein distance (between 2 strings)*/ call levenshtein 'kitten' , "sitting" call levenshtein 'Sunday' , "Saturday" call levenshtein 'Vladimir_Levenshtein[1965]' ,"Vladimir_Levenshtein[1965]" call levenshtein 'this_algorithm_is_similar_to' ,"Damerau-Levenshtein_distance" exit /*─────────────────────────────────────Levenshtein subroutine───────────*/ levenshtein: procedure; parse arg s,t; m=length(s); n=length(t); d.=0

                       say '          1st string =' s
                       say '          2nd string =' t
   do i=1 for m;  d.i.0=i;  end  /*i*/
   do j=1 for n;  d.0.j=j;  end  /*j*/
do j=1 for n;      tj=substr(t,j,1)
    do i=1 for m;  si=substr(s,i,1)
    if si==tj then d.i.j=d(i-1,j-1)
              else d.i.j=min(d(i-1,j),d(i,j-1),d(i-1,j-1))+1
    end  /*i*/
end      /*j*/

say 'Levenshtein distance =' d.m.n; say return d.m.n /*─────────────────────────────────────D subroutine─────────────────────*/ d: parse arg a,b; return d.a.b</lang> Output when using the input within the REXX program:

          1st string = kitten
          2nd string = sitting
Levenshtein distance = 3

          1st string = Sunday
          2nd string = Saturday
Levenshtein distance = 3

          1st string = Vladimir_Levenshtein[1965]
          2nd string = Vladimir_Levenshtein[1965]
Levenshtein distance = 0

          1st string = this_algorithm_is_similar_to
          2nd string = Damerau-Levenshtein_distance
Levenshtein distance = 24

Ruby

Implementation of the wikipedia algorithm <lang ruby> def levenshtein_distance(s, t)

 m = s.length
 n = t.length
 return m if n == 0
 return n if m == 0
 d = Array.new(m+1) {Array.new(n+1)}
 (0..m).each {|i| d[i][0] = i}
 (0..n).each {|j| d[0][j] = j}
 (1..n).each do |j|
   (1..m).each do |i|
     d[i][j] = if s[i-1] == t[j-1]  # adjust index into string
                 d[i-1][j-1]       # no operation required
               else
                 [ d[i-1][j]+1,    # deletion
                   d[i][j-1]+1,    # insertion
                   d[i-1][j-1]+1,  # substitution
                 ].min
               end
   end
 end
 d[m][n]

end

[ ['kitten','sitting'], ['saturday','sunday'], ["rosettacode", "raisethysword"] ].each do |s,t|

 puts "levenshtein_distance('#{s}', '#{t}') = #{levenshtein_distance(s, t)}"

end</lang>

output

levenshtein_distance('kitten', 'sitting') = 3
levenshtein_distance('saturday', 'sunday') = 3
levenshtein_distance('rosettacode', 'raisethysword') = 8

Scala

Based on Wikipedia algorithm. <lang scala>import scala.math._

object Levenshtein {

  def minimum(i1: Int, i2: Int, i3: Int)=min(min(i1, i2), i3)
  def distance(s1:String, s2:String)={
     val dist=Array.tabulate(s2.length+1, s1.length+1){(j,i)=>if(j==0) i else if (i==0) j else 0}
     for(j<-1 to s2.length; i<-1 to s1.length)
        dist(j)(i)=if(s2(j-1)==s1(i-1)) dist(j-1)(i-1)

else minimum(dist(j-1)(i)+1, dist(j)(i-1)+1, dist(j-1)(i-1)+1)

     dist(s2.length)(s1.length)
  }
 
  def main(args: Array[String]): Unit = {
     printDistance("kitten", "sitting")
     printDistance("rosettacode", "raisethysword")
  }
 
  def printDistance(s1:String, s2:String)=println("%s -> %s : %d".format(s1, s2, distance(s1, s2)))

}</lang> Output:

kitten -> sitting : 3
rosettacode -> raisethysword : 8

Seed7

<lang seed7>$ include "seed7_05.s7i";

const func integer: min (in integer: a, in integer: b) is func

 result
   var integer: min is 0;
 begin
   if a < b then
     min := a;
   else
     min := b;
   end if;
 end func;

const func integer: levenshteinDistance (in string: s, in string: t) is func

 result
   var integer: distance is 0;
 local
   var array array integer: d is 0 times 0 times 0;
   var integer: i is 0;
   var integer: j is 0;
 begin
   d := [0 .. length(s)] times [0 .. length(t)] times 0;
   for key i range s do
     d[i][0] := i;
   end for;
   for key j range t do
     d[0][j] := j;
     for key i range s do
       if s[i] = t[j] then
         d[i][j] := d[pred(i)][pred(j)];
       else
         d[i][j] := min(min(succ(d[pred(i)][j]), succ(d[i][pred(j)])), succ(d[pred(i)][pred(j)]));
       end if;
     end for;
   end for;
   distance := d[length(s)][length(t)];
 end func;

const proc: main is func

 begin
   writeln("kitten -> sitting: " <& levenshteinDistance("kitten", "sitting"));
   writeln("rosettacode -> raisethysword: " <& levenshteinDistance("rosettacode", "raisethysword"));
 end func;</lang>

Output:

kitten -> sitting: 3
rosettacode -> raisethysword: 8

Tcl

<lang tcl>proc levenshteinDistance {s t} {

   # Edge cases
   if {![set n [string length $t]]} {

return [string length $s]

   } elseif {![set m [string length $s]]} {

return $n

   }
   # Fastest way to initialize
   for {set i 0} {$i <= $m} {incr i} {

lappend d 0 lappend p $i

   }
   # Loop, computing the distance table (well, a moving section)
   for {set j 0} {$j < $n} {} {

set tj [string index $t $j] lset d 0 [incr j] for {set i 0} {$i < $m} {} { set a [expr {[lindex $d $i]+1}] set b [expr {[lindex $p $i]+([string index $s $i] ne $tj)}] set c [expr {[lindex $p [incr i]]+1}] # Faster than min($a,$b,$c) lset d $i [expr {$a<$b ? $c<$a ? $c : $a : $c<$b ? $c : $b}] } # Swap set nd $p; set p $d; set d $nd

   }
   # The score is at the end of the last-computed row
   return [lindex $p end]

}</lang> Usage: <lang tcl>puts [levenshteinDistance "kitten" "sitting"]; # Prints 3</lang>

TSE SAL

<lang TSE SAL>

// library: math: get: damerau: levenshtein <description></description> <version>1.0.0.0.23</version> <version control></version control> (filenamemacro=getmadle.s) [kn, ri, th, 08-09-2011 23:04:55] INTEGER PROC FNMathGetDamerauLevenshteinDistanceI( STRING s1, STRING s2 )

INTEGER L1 = Length( s1 )
INTEGER L2 = Length( s2 )
INTEGER substitutionCostI = 0
STRING h1[255] = ""
STRING h2[255] = ""
IF ( ( L1 == 0 ) OR ( L2 == 0 ) )
 // Trivial case: one string is 0-length
 RETURN( Max( L1, L2 ) )
ELSE
 // The cost of substituting the last character
 IF   ( ( s1[ L1 ] ) == ( s2[ L2 ] ) )
  substitutionCostI = 0
 ELSE
  substitutionCostI = 1
 ENDIF
 // h1 and h2 are s1 and s2 with the last character chopped off
 h1 = SubStr( s1, 1,  L1 - 1 )
 h2 = SubStr( s2, 1,  L2 - 1 )
 IF ( ( L1 > 1 ) AND  ( L2 > 1 ) AND  ( s1[ L1 - 0 ] == s2[ L2 - 1 ] ) AND ( s1[ L1 - 1 ] == s2[ L2 - 0 ] ) )
  RETURN( Min( Min( FNMathGetDamerauLevenshteinDistanceI( h1, s2 ) + 1, FNMathGetDamerauLevenshteinDistanceI( s1, h2 ) + 1 ), Min( FNMathGetDamerauLevenshteinDistanceI( h1 , h2 ) + substitutionCostI, FNMathGetDamerauLevenshteinDistanceI( SubStr( s1, 1,  L1 - 2 ), SubStr( s2, 1, L2 - 2 ) ) + 1 ) ) )
 ENDIF
 RETURN( Min( Min( FNMathGetDamerauLevenshteinDistanceI( h1, s2 ) + 1, FNMathGetDamerauLevenshteinDistanceI( s1, h2 ) + 1 ), FNMathGetDamerauLevenshteinDistanceI( h1 ,  h2 ) + substitutionCostI ) )
ENDIF

END

PROC Main() STRING s1[255] = "arcain" STRING s2[255] = "arcane" Warn( "Minimum amount of steps to convert ", s1, " to ", s2, " = ", FNMathGetDamerauLevenshteinDistanceI( s1, s2 ) ) // gives e.g. 2 s1 = "algorithm" s2 = "altruistic" Warn( "Minimum amount of steps to convert ", s1, " to ", s2, " = ", FNMathGetDamerauLevenshteinDistanceI( s1, s2 ) ) // gives e.g. 6 END

</lang>

Visual Basic .NET

<lang vbnet>

Function LevenshteinDistance(ByVal String1 As String, ByVal String2 As String) As Integer
       Dim Matrix(String1.Length, String2.Length) As Integer
       Dim Key As Integer
       For Key = 0 To String1.Length
           Matrix(Key, 0) = Key
       Next
       For Key = 0 To String2.Length
           Matrix(0, Key) = Key
       Next
       For Key1 As Integer = 1 To String2.Length
           For Key2 As Integer = 1 To String1.Length
               If String1(Key2 - 1) = String2(Key1 - 1) Then
                   Matrix(Key2, Key1) = Matrix(Key2 - 1, Key1 - 1)
               Else
                   Matrix(Key2, Key1) = Math.Min(Matrix(Key2 - 1, Key1) + 1, Math.Min(Matrix(Key2, Key1 - 1) + 1, Matrix(Key2 - 1, Key1 - 1) + 1))
               End If
           Next
       Next
       Return Matrix(String1.Length - 1, String2.Length - 1)
   End Function

</lang>