Longest common subsequence: Difference between revisions

Content deleted Content added

Inline

Revision as of 18:16, 11 May 2008

The longest common subsequence (or LCS) of groups A and B is the longest group of elements from A and B that are common between the two groups and in the same order in each group. For example, the sequences "1234" and "1224533324" have an LCS of "1234":

1234
1224533324

For a string example, consider the sequences "thisisatest" and "testing123testing". An LCS would be "tsitest":

thisisatest
testing123testing

In this puzzle, your code only needs to deal with strings. Write a function which returns an LCS of two strings (case-sensitive). You don't need to show multiple LCS's.

BASIC

Works with: QuickBasic version 4.5

Translation of: Java

<qbasic>FUNCTION lcs$ (a$, b$)

   IF LEN(a$) = 0 OR LEN(b$) = 0 THEN

lcs$ = ""

   ELSEIF RIGHT$(a$, 1) = RIGHT$(b$, 1) THEN

lcs$ = lcs$(LEFT$(a$, LEN(a$) - 1), LEFT$(b$, LEN(b$) - 1)) + RIGHT$(a$, 1)

   ELSE

x$ = lcs$(a$, LEFT$(b$, LEN(b$) - 1)) y$ = lcs$(LEFT$(a$, LEN(a$) - 1), b$) IF LEN(x$) > LEN(y$) THEN lcs$ = x$ ELSE lcs$ = y$ END IF

   END IF

END FUNCTION</qbasic>

Haskell

The Wikipedia solution translates directly into Haskell, with the only difference that equal characters are added in front:

longest xs ys = if length xs > length ys then xs else ys

lcs [] _ = []
lcs _ [] = []
lcs (x:xs) (y:ys) 
  | x == y    = x : lcs xs ys
  | otherwise = longest (lcs (x:xs) ys) (lcs xs (y:ys))

Memoization (aka dynamic programming) of that uses zip to make both the index and the character available:

import Data.Array

lcs xs ys = a!(0,0) where
  n = length xs
  m = length ys
  a = array ((0,0),(n,m)) $ l1 ++ l2 ++ l3
  l1 = [((i,m),[]) | i <- [0..n]]
  l2 = [((n,j),[]) | j <- [0..m]]
  l3 = [((i,j), f x y i j) | (x,i) <- zip xs [0..], (y,j) <- zip ys [0..]]
  f x y i j 
    | x == y    = x : a!(i+1,j+1)
    | otherwise = longest (a!(i,j+1)) (a!(i+1,j))

Both solutions work of course not only with strings, but also with any other list. Example:

*Main> lcs "thisisatest" "testing123testing"
"tsitest"

Java

Recursion

This is not a particularly fast algorithm, but it gets the job done eventually. The speed is a result of many recursive functions calls.

<java>public static String lcs(String a, String b){

   if(a.length() == 0 || b.length() == 0){
       return "";
   }else if(a.charAt(a.length()-1) == b.charAt(b.length()-1)){
       return lcs(a.substring(0,a.length()-1),b.substring(0,b.length()-1))
           + a.charAt(a.length()-1);
   }else{
       String x = lcs(a, b.substring(0,b.length()-1));
       String y = lcs(a.substring(0,a.length()-1), b);
       return (x.length() > y.length()) ? x : y;
   }

}</java>

Dynamic Programming

<java>public static String lcs(String a, String b) {

   int[][] lengths = new int[a.length()+1][b.length()+1];

   // row 0 and column 0 are initialized to 0 already

   for (int i = 0; i < a.length(); i++)
       for (int j = 0; j < b.length(); j++)
           if (a.charAt(i) == b.charAt(j))
               lengths[i+1][j+1] = lengths[i][j] + 1;
           else
               lengths[i+1][j+1] =
                   Math.max(lengths[i+1][j], lengths[i][j+1]);

   // read the substring out from the matrix
   StringBuffer sb = new StringBuffer();
   for (int x = a.length(), y = b.length();
        x != 0 && y != 0; ) {
       if (lengths[x][y] == lengths[x-1][y])
           x--;
       else if (lengths[x][y] == lengths[x][y-1])
           y--;
       else {
           assert a.charAt(x-1) == b.charAt(y-1);
           sb.append(a.charAt(x-1));
           x--;
           y--;
       }
   }

   return sb.reverse().toString();

}</java>

@@ Line 61: / Line 61: @@
 "tsitest"
 </pre>
-=={{header|J}}==
- lcs=: [: >./@, [: (* * 1 + >./\@:(_1&|.)@:|:^:2)^:_ [: ,&0@:,.&0 =/
-Program written by Ewart Shaw. Copied, with permission, from the [http://www.jsoftware.com/jwiki/Essays/Longest_Common_Subsequence J wiki essay] on this topic.
 =={{header|Java}}==