Longest common subsequence: Difference between revisions
Corrected definitions made in the introduction.
(Removed discussion of Forward and Reverse Contours pending correction.) |
(Corrected definitions made in the introduction.) |
||
Line 8:
Let ''A'' ≡ ''A''[0]… ''A''[m - 1] and ''B'' ≡ ''B''[0]… ''B''[n - 1], m < n be strings drawn from an alphabet Σ of size s, containing every distinct symbol in A + B.
An ordered pair (i, j) will be referred to as a match if ''A''[i] = ''B''[j], where 0 &
The set of matches '''M''' defines a relation over matches: '''M'''[i, j] ⇔ (i, j) ∈ '''M'''.
Define a ''non-strict'' [https://en.wikipedia.org/wiki/Product_order product-order] (≤) over ordered pairs, such that (i1, j1) ≤ (i2, j2) ⇔ i1 ≤ i2 and j1 ≤ j2. We define (≥) similarly.
We say
Every Common Sequence of length ''q'' corresponds to a chain of cardinality ''q'', over the set of matches '''M'''. Thus, finding an LCS can be restated as the problem of finding a chain of maximum cardinality ''p''.
According to [Dilworth 1950], this cardinality ''p'' equals the minimum number of disjoint antichains into which '''M''' can be decomposed. Note that such a decomposition into the minimal number p of disjoint antichains may not be unique.
'''Background'''
Where the number of symbols appearing in matches is small relative to the length of the input strings, reuse of the symbols increases; and the number of matches will tend towards
The divide-and-conquer approach of [Hirschberg 1975] limits the space required to O(''n''). However, this approach requires O(''m*n'') time even in the best case.
Line 44 ⟶ 46:
A, B are input strings of lengths m, n respectively
p is the length of the LCS
M is the set of
r is the magnitude of M
s is the magnitude of the alphabet Σ of distinct symbols in A + B
|