Longest common subsequence: Difference between revisions

Content added Content deleted
m (Found one more font face change to make.)
(Restored original string names of A and B, since coding samples were based on that assumption.)
Line 4: Line 4:
The '''Longest Common Subsequence''' (or [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem '''LCS''']) is a subsequence of maximum length common to two (or more) strings.
The '''Longest Common Subsequence''' (or [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem '''LCS''']) is a subsequence of maximum length common to two (or more) strings.


Let x = x[0]... x[m-1] and y = y[0]... y[n-1], m <= n be strings drawn from an alphabet '''Σ''' of size s, containing every distinct symbol in x + y.
Let A = A[0] A[m-1] and B = B[0] B[n-1], m <= n be strings drawn from an alphabet '''Σ''' of size s, containing every distinct symbol in A + B.


An ordered pair (i, j) will be called a match if x[i] == y[j], where 0 <= i < m and 0 <= j < n.
An ordered pair (i, j) will be called a match if A[i] == B[j], where 0 <= i < m and 0 <= j < n.


Define a strict Cartesian product-order (<) over these ordered pairs, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2.
Define a strict Cartesian product-order (<) over these ordered pairs, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2.
Line 14: Line 14:
Finding an '''LCS''' can then be stated as the problem of finding a chain of maximum cardinality over the set of matches '''M'''.
Finding an '''LCS''' can then be stated as the problem of finding a chain of maximum cardinality over the set of matches '''M'''.


This set of matches can be represented as an m*n bit matrix, where each bit '''M[i, j]''' is True iff there is a match at the corresponding positions of strings x and y.
This set of matches can also be visualized as an m*n bit matrix, where each bit '''M[i, j]''' is True iff there is a match at the corresponding positions of strings A and B.


Any chain '''C''' can then be visualized as a monotonically increasing curve through match bits which are set to True.
Then any chain '''C''' can be visualized as a monotonically increasing curve through those match bits which are set to True.


For example, the sequences "1234" and "1224533324" have an LCS of "1234":
For example, the sequences "1234" and "1224533324" have an LCS of "1234":