Longest common subsequence: Difference between revisions

m
Specified the (<>) relation over matches.
(Undo revision 344177 by CNHume ([[Userp formatting)
m (Specified the (<>) relation over matches.)
Line 10:
Define the strict Cartesian product-order (<) over matches, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2. Defining (>) similarly, we can write m2 < m1 as m1 > m2.
 
If i1 <= j1 and i2 >= j2 (or if i1 >= i2 and i2 <= j2) then neither m1 < m2 nor m1 > m2 are possible; and m1, m2 are said to be ''incomparable''. Defining (#) to denote this case, we can write m1 # m2. '''Note:''' This includes the possibility that m1 == m2, i.e., that i1 == i2 and j1 == j2.
 
We write m1 <> m2 to mean that either m1 < m2 or m1 > m2. Thus, the (<>) operator is the inverse of (#). In this case, m1 != m2 holds, i.e., the two tuples differ in some component.
 
Given a product-order over the set of matches '''M''', a chain '''C''' is any subset of '''M''' where either m1 < m2 or m2 < m1 for every pair of distinct elements m1 and m2 of '''C'''.
Line 30 ⟶ 32:
In the application of comparing file revisions, records from the input files form a large symbol space; and the number of symbols may be reduced to the length of the LCS and the number of matches may tend towards linear, O(m+n) growth.
 
AIn this case, a binary search optimization due to Hunt and Szymanski can be applied into thisthe case,basic Dynamic Programming approach which results in expected performance of O(n log m). In the worst case, performance may degrade to O(m*n log m) time if the number of matches, and the space required to represent them, should grow to O(m*n).
 
Claus Rick has also described a linear-space algorithm with a time bound of O(n*s + min(p*m, p*(n-p))).
159

edits