Longest common subsequence: Difference between revisions

Content added Content deleted
m (Revised interpretation of M as a relation.)
(Simplified discussion of the product-order, defining it to be non-strict in keeping with wider convention. Corrected legend.)
Line 6: Line 6:
The [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem '''Longest Common Subsequence'''] ('''LCS''') is a subsequence of maximum length common to two or more strings.
The [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem '''Longest Common Subsequence'''] ('''LCS''') is a subsequence of maximum length common to two or more strings.


Let ''A'' = ''A''[0]… ''A''[m - 1] and ''B'' = ''B''[0]… ''B''[n - 1], m ≤ n be strings drawn from an alphabet Σ of size s, containing every distinct symbol in A + B.
Let ''A'' ≡ ''A''[0]… ''A''[m - 1] and ''B'' ≡ ''B''[0]… ''B''[n - 1], m < n be strings drawn from an alphabet Σ of size s, containing every distinct symbol in A + B.


An ordered pair (i, j) will be called a match if ''A''[i] == ''B''[j], where 0 &le; i < m and 0 &le; j < n.
An ordered pair (i, j) will be referred to as a match if ''A''[i] = ''B''[j], where 0 &lt; i &le; m and 0 &lt; j &le; n.


Define the strict Cartesian [https://en.wikipedia.org/wiki/Product_order product-order] (<) over matches, such that (i1, j1) < (i2, j2) &hArr; i1 < i2 and j1 < j2. Defining (>) similarly, we can write m2 < m1 as m1 > m2.
Define the [https://en.wikipedia.org/wiki/Product_order product-order] (&le;) over ordered pairs, such that (i1, j1) &le; (i2, j2) &hArr; i1 &le; i2 and j1 &le; j2. Defining (&ge;) similarly, we can write m2 &le; m1 as m1 &ge; m2.


We write m1 <> m2 to mean that either m1 < m2 or m1 > m2 holds, ''i.e.'', that m1, m2 are ''comparable''.
We say that m1, m2 are ''comparable'' if either m1 &le; m2 or m1 &ge; m2 holds. If i1 &lt; i2 and j2 &lt; j1 (or i2 &lt; i1 and j1 &lt; j2) then neither m1 &le; m2 nor m1 &ge; m2 are possible; and we say that m1, m2 are ''incomparable''.


Given a product-order over the set of matches '''M''', a chain '''C''' is any subset of '''M''' in which every pair of distinct elements m1 and m2 are comparable. Similarly, an antichain '''D''' is any subset of '''M''' in which every pair of distinct elements m1 and m2 are incomparable.
If i1 &le; i2 and j2 &le; j1 (or i2 &le; i1 and j1 &le; j2) then neither m1 < m2 nor m1 > m2 are possible; and m1, m2 are ''incomparable''.


The set '''M''' represents a relation over match pairs: '''M'''[i, j] &hArr; (i, j) &isin; '''M'''. Any chain '''C''' can be visualized as a curve which strictly increases as it passes through each match pair in the m*n coordinate space.
Defining (#) to denote this case, we write m1 # m2. Because the underlying product-order is strict, m1 == m2 (''i.e.'', i1 == i2 and j1 == j2) implies m1 # m2. m1 <> m2 implies m1 &ne; m2, ''i.e.'', that the two tuples differ in some component. Thus, the (<>) operator is the inverse of (#).


Finding an LCS can be restated as the problem of finding a chain of maximum cardinality p over the set of matches '''M'''.
Given a product-order over the set of matches '''M''', a chain '''C''' is any subset of '''M''' where m1 <> m2 for every pair of distinct elements m1 and m2 of '''C'''. Similarly, an antichain '''D''' is any subset of '''M''' where m1 # m2 for every pair of distinct elements m1 and m2 of '''D'''.

Finding an LCS can then be restated as the problem of finding a chain of maximum cardinality p over the set of matches '''M'''.

The set '''M''' represents a relation over match pairs: (i, j) &isin; '''M''' &hArr; '''M'''[i, j]. Any chain '''C''' can be visualized as a strictly increasing curve which passes through each match pair in the m*n coordinate space.


According to [Dilworth 1950], this cardinality p equals the minimum number of disjoint antichains into which '''M''' can be decomposed. Note that such a decomposition into the minimal number p of disjoint antichains may not be unique.
According to [Dilworth 1950], this cardinality p equals the minimum number of disjoint antichains into which '''M''' can be decomposed. Note that such a decomposition into the minimal number p of disjoint antichains may not be unique.
Line 46: Line 42:
A, B are input strings of lengths m, n respectively
A, B are input strings of lengths m, n respectively
p is the length of the LCS
p is the length of the LCS
M is the set of match pairs (i, j) such that x[i] == y[j]
M is the set of match pairs (i, j) such that A[i] = B[j]
r is the magnitude of M
r is the magnitude of M
s is the magnitude of the alphabet of distinct symbols in x + y
s is the magnitude of the alphabet Σ of distinct symbols in A + B


'''References'''
'''References'''