Longest common subsequence: Difference between revisions
Content deleted Content added
m Adjusted References, emphasizing the priority of Claus Rick. |
Clarified statement of the Longest Common Subsequence Problem |
||
Line 1:
{{task}}[[Category:Recursion]][[Category:Memoization]]
Define a subsequence to be any string obtained by deleting zero or more symbols from an input string.
The '''Longest Common Subsequence''' (or [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem '''LCS''']) is a subsequence of maximum length common to two (or more) strings.
Let x = x[0]... x[m-1] and y = y[0]... y[n-1], m <= n be strings drawn from an alphabet Sigma of size s, containing every distinct symbol in x + y.
An ordered pair (i, j) will be called a match if x[i] == y[j], where 0 <= i < m and 0 <= j < n.
Define a strict Cartesian product-order over these ordered pairs, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2.
Given such a product-order < over a set of matches M, a chain C is any subset of M where either p1 < p2 or p2 < p1, for distinct pairs p1 and p2 in C.
Finding the LCS can then be viewed as the problem of finding a chain of maximum cardinality over the set of matches M.
This set of matches can be represented as an m*n bit matrix, where each bit M[i, j] is True iff there is a match at the corresponding positions of strings x and y.
A chain can then be visualized as any monotonically increasing curve through those match bits which are set to True.
For example, the sequences "1234" and "1224533324" have an LCS of "1234":
'''<u>1234</u>'''
'''<u>12</u>'''245'''<u>3</u>'''332'''<u>4</u>'''
|