Longest common subsequence: Difference between revisions

← Older edit

Longest common subsequence (view source)

Revision as of 15:17, 28 April 2024

8 bytes added , 1 month ago

m

→‎{{header|EasyLang}}

Chkas

2,021

edits

Revision as of 20:45, 27 August 2022 (view source) Thundergnat (talk \| contribs) m (syntax highlighting fixup automation) ← Older edit		Latest revision as of 15:17, 28 April 2024 (view source) Chkas (talk \| contribs) m (→‎{{header\|EasyLang}})
(16 intermediate revisions by 6 users not shown)
Line 8: Let ''A'' &equiv; ''A''[0]… ''A''[m - 1] and ''B'' &equiv; ''B''[0]… ''B''[n - 1], m < n be strings drawn from an alphabet Σ of size s, containing every distinct symbol in A + B. An ordered pair (i, j) will be referred to as a match if ''A''[i] = ''B''[j], where 0 &ltle; i &lelt; m and 0 &ltle; j &lelt; n. The set of matches '''M''' defines a relation over matches: '''M'''[i, j] ⇔ (i, j) ∈ '''M'''. Define a ''non-strict'' [https://en.wikipedia.org/wiki/Product_order product-order] (≤) over ordered pairs, such that (i1, j1) ≤ (i2, j2) ⇔ i1 ≤ i2 and j1 ≤ j2. We define (≥) similarly. We say ~~m1,~~ordered pairs p1 and m2p2 are ''comparable'' if either m1p1 ≤ m2p2 or m1p1 ≥ m2p2 holds. If i1 < i2 and j2 < j1 (or i2 < i1 and j1 < j2) then neither m1p1 ≤ m2p2 nor m1p1 ≥ m2p2 are possible;, and we say ~~m1,~~p1 and m2p2 are ''incomparable''. ~~We also define the ''strict'' product-order (<) over ordered pairs, such that (i1, j1) < (i2, j2) ⇔ i1 < i2 and j1 < j2. We define (>) similarly.~~ Given a set of matches '''M''', a chain '''C''' is a subset of '''M''' consisting of at least one element m; and where either m1 < m2 or m1 > m2 for every pair of distinct elements m1 and m2. An antichain '''D''' is any subset of '''M''' in which every pair of distinct elements m1 and m2 are incomparable. The set '''M''' represents a relation over match pairs: '''M'''[i, j] ⇔ (i, j) ∈ '''M'''. A chain '''C''' can be visualized as a curve which strictly increases as it passes through each match pair in the mn coordinate space. ~~Finding an LCS can be restated as the problem of finding a chain of maximum cardinality p over the set of matches '''M'''.~~ According to [Dilworth 1950], this cardinality p equals the minimum number of disjoint antichains into which '''M''' can be decomposed. Note that such a decomposition into the minimal number p of disjoint antichains may not be unique. ~~'''Contours'''~~ ~~Forward Contours FC[''k''] of ''class k'' are defined inductively, as follows:~~ ~~FC[0] consists of those elements m1 for which there exists no element m2 such that m2 < m1.~~ Define the ''strict'' product-order (<) over ordered pairs, such that (i1, j1) < (i2, j2) ⇔ i1 < i2 and j1 < j2. We define (>) similarly. ~~FC[''k''] consists of those elements m1 for which there exists no element m2 such that m2 < m1; and where neither m1 nor m2 are contained in FC[''l''] for any ''class l'' < ''k''.~~ A chain '''C''' is a subset of '''M''' consisting of at least one element m; and where either m1 < m2 or m1 > m2 for every pair of distinct elements m1 and m2. An antichain '''D''' is any subset of '''M''' in which every pair of distinct elements m1 and m2 are incomparable. ~~Reverse Contours RC[''k''] of ''class k'' are defined similarly.~~ A chain can be visualized as a strictly increasing curve that passes through matches (i, j) in the mn coordinate space of '''M'''[i, j]. ~~Members of the Meet (&and;), or ''Infimum'' of a Forward Contour are referred to as its Dominant Matches: those m1 for which there exists no m2 such that m2 < m1.~~ Every Common Sequence of length ''q'' corresponds to a chain of cardinality ''q'', over the set of matches '''M'''. Thus, finding an LCS can be restated as the problem of finding a chain of maximum cardinality ''p''. ~~Members of the Join (&or;), or ''Supremum'' of a Reverse Contour are referred to as its Dominant Matches: those m1 for which there exists no m2 such that m2 > m1.~~ According to [Dilworth 1950], this cardinality ''p'' equals the minimum number of disjoint antichains into which '''M''' can be decomposed. Note that such a decomposition into the minimal number p of disjoint antichains may not be unique. ~~Where multiple Dominant Matches exist within a Meet (or within a Join, respectively) the Dominant Matches will be incomparable to each other.~~ '''Background''' Where the number of symbols appearing in matches is small relative to the length of the input strings, reuse of the symbols increases; and the number of matches will tend towards ~~quadratic,~~ O(''mn'') quadratic growth. This occurs, for example, in the Bioinformatics application of nucleotide and protein sequencing. The divide-and-conquer approach of [Hirschberg 1975] limits the space required to O(''n''). However, this approach requires O(''mn'') time even in the best case. Line 60 ⟶ 46: A, B are input strings of lengths m, n respectively p is the length of the LCS M is the set of ~~match pairs~~matches (i, j) such that A[i] = B[j] r is the magnitude of M s is the magnitude of the alphabet Σ of distinct symbols in A + B Line 337 ⟶ 323: =={{header\|BASIC}}== ==={{header\|QuickBASIC}}=== {{works with\|QuickBasic\|4.5}} {{trans\|Java}} Line 354 ⟶ 341: END IF END FUNCTION</syntaxhighlight> =={{header\|BASIC256}}== Line 508 ⟶ 494: =={{header\|C++}}== '''Hunt and Szymanski algorithm''' <syntaxhighlight lang="cpp"~~>#include <stdint.h~~> #include <stdint.h> #include <string> #include <memory> // for shared_ptr<> #include <iostream> #include <deque> #include <unordered_map> //[C++11] #include <algorithm> // for lower_bound() #include <iterator> // for next() and prev() Line 521 ⟶ 508: class LCS { protected: // ~~This~~Instances of the Pair linked list class isare used to ~~trace~~recover the LCS ~~candidates~~: class Pair { public: Line 541 ⟶ 528: typedef deque<shared_ptr<Pair>> PAIRS; ~~typedef deque<uint32_t> THRESHOLD;~~ typedef deque<uint32_t> INDEXES; typedef unordered_map<char, INDEXES> CHAR_TO_INDEXES_MAP; typedef deque<INDEXES> MATCHES; static uint32_t FindLCS(~~MATCHES& indexesOf2MatchedByIndex1, shared_ptr<Pair> pairs) {~~ MATCHES& indexesOf2MatchedByIndex1, shared_ptr<Pair>* pairs) { auto traceLCS = pairs != nullptr; PAIRS chains; ~~THRESHOLD~~INDEXES ~~threshold~~prefixEnd; // //[Assert]After each index1 iteration ~~threshold~~prefixEnd[index3] is the least index2 // such that the LCS of s1[0:index1] and s2[0:index2] has length index3 + 1 // uint32_t index1 = 0; for (const auto& it1 : indexesOf2MatchedByIndex1) { ifauto ~~(!it1->empty())~~dq2 {= it1; auto ~~dq2~~limit = it1prefixEnd.end(); for (auto ~~limit~~it2 = ~~threshold~~dq2.~~end~~rbegin(); it2 != dq2.rend(); it2++) { // Each index1, index2 pair corresponds to a match ~~for (auto it2 = dq2.rbegin(); it2 != dq2.rend(); it2++) {~~ ~~// Each of the index1,~~auto index2 ~~pairs considered here correspond to a~~= ~~match~~it2; ~~auto index2 = it2;~~ // // Note: The reverse iterator it2 visits index2 values in descending order, // allowing ~~thresholds~~in-place toupdate beof ~~updated in-place~~prefixEnd[]. std::lower_bound() is used to // to perform a binary search. // limit = lower_bound(~~threshold~~prefixEnd.begin(), limit, index2); ~~auto index3 = distance(threshold.begin(), limit);~~ // // Look ahead to the next index2 value to optimize ~~space~~Pairs used inby the Hunt // and Szymanski algorithm. If the next index2 is also an improvement on // the value currently held in ~~threshold~~prefixEnd[index3], a new Pair will only be // superseded on the next index2 iteration. // // ~~Depending~~Verify that a onnext ~~match~~index2 ~~redundancy,~~value ~~the~~exists; ~~number~~and ofthat ~~Pair~~this ~~constructions~~value ~~may~~is begreater // ~~divided~~than bythe ~~factors~~final ~~ranging~~index2 ~~from~~value 2of upthe toLCS 10prefix orat ~~more.~~prev(limit): // auto ~~skipIndex2~~preferNextIndex2 = next(it2) != dq2.rend() && (limit == ~~threshold~~prefixEnd.begin() \|\| prev(limit) < next(it2)); ~~if (skipIndex2) continue;~~ // ~~if (limit == threshold.end()) {~~ // Depending on match redundancy, this optimization may reduce the number ~~// Insert Case~~ // of Pair allocations by factors ranging from 2 up to 10 or more. ~~threshold.push_back(index2);~~ // ~~Refresh limit iterator:~~ if (preferNextIndex2) continue; ~~limit = prev(threshold.end());~~ ~~if (traceLCS) {~~ auto ~~prefix~~index3 = ~~index3 > 0 ? chains[index3 - 1] :~~distance(prefixEnd.begin(), ~~nullptr~~limit); ~~auto last = make_shared<Pair>(index1, index2, prefix);~~ if (limit == ~~chains~~prefixEnd.~~push_back~~end(~~last~~);) { // Insert }Case prefixEnd.push_back(index2); // Refresh limit iterator: limit = prev(prefixEnd.end()); if (traceLCS) { chains.push_back(pushPair(chains, index3, index1, index2)); } ~~else if (index2 < limit) {~~} else if (index2 < //limit) ~~Update Case~~{ // Update ~~limit value:~~Case // Update limit ~~= index2;~~value: limit = ~~if (traceLCS) {~~index2; if (traceLCS) { ~~auto prefix = index3 > 0 ? chains[index3 - 1] : nullptr;~~ chains[index3] = ~~auto~~pushPair(chains, ~~last~~index3, ~~= make_shared<Pair>(~~index1, index2~~, prefix~~); ~~chains[index3] = last;~~ } } } ~~} // next index2~~ } // next index2 } index1++; Line 614 ⟶ 601: if (traceLCS) { // Return the LCS as a linked list of matched index pairs: auto last = chains.~~size~~empty() >? 0nullptr ?: chains.back() ~~: nullptr~~; // Reverse longest chain pairs = Pair::Reverse(last); } auto length = ~~threshold~~prefixEnd.size(); return length; } private: static shared_ptr<Pair> pushPair( PAIRS& chains, const ptrdiff_t& index3, uint32_t& index1, uint32_t& index2) { auto prefix = index3 > 0 ? chains[index3 - 1] : nullptr; return make_shared<Pair>(index1, index2, prefix); } protected: // // Match() avoids mn comparisons by using CHAR_TO_INDEXES_MAP to Line 631 ⟶ 626: // time will be O(log(m+n)), at most. // static void Match( ~~void Match(CHAR_TO_INDEXES_MAP& indexesOf2MatchedByChar, MATCHES& indexesOf2MatchedByIndex1,~~ CHAR_TO_INDEXES_MAP& indexesOf2MatchedByChar, MATCHES& indexesOf2MatchedByIndex1, const string& s1, const string& s2) { uint32_t index = 0; Line 643 ⟶ 639: } static string Select(shared_ptr<Pair> pairs, uint32_t length, bool right, const string& s1, const string& s2) { string buffer; Line 655 ⟶ 651: public: static string Correspondence(const string& s1, const string& s2) { CHAR_TO_INDEXES_MAP indexesOf2MatchedByChar; MATCHES indexesOf2MatchedByIndex1; // holds references into indexesOf2MatchedByChar Line 665 ⟶ 661: };</syntaxhighlight> '''Example''': <syntaxhighlight lang="cpp"> ~~LCS lcs;~~ auto s = ~~lcs.~~LCS::Correspondence(s1, s2); cout << s << endl;</syntaxhighlight> More fully featured examples are available at [https://github.com/CNHume/Samples/tree/master/C%2B%2B/LCS Samples/C++/LCS]. =={{header\|Clojure}}== Line 982 ⟶ 980: lcsRecursion('', 'x') = lcsRecursion('x', 'x') = x</pre> =={{header\|EasyLang}}== {{trans\|BASIC256}} <syntaxhighlight> func$ right a$ n . return substr a$ (len a$ - n + 1) n . func$ left a$ n . if n < 0 n = len a$ + n . return substr a$ 1 n . func$ lcs a$ b$ . if len a$ = 0 or len b$ = 0 return "" . if right a$ 1 = right b$ 1 return lcs left a$ -1 left b$ -1 & right a$ 1 . x$ = lcs a$ left b$ -1 y$ = lcs left a$ -1 b$ if len x$ > len y$ return x$ else return y$ . . print lcs "1234" "1224533324" print lcs "thisisatest" "testing123testing" </syntaxhighlight> {{out}} <pre> 1234 tsitest </pre> =={{header\|Egison}}== Line 2,160 ⟶ 2,195: ===Alternate letting regex do all the work=== <syntaxhighlight lang="perl">~~#!/usr/bin/perl~~use strict; ~~use strict; # https://rosettacode.org/wiki/Longest_common_subsequence~~ use warnings; use feature 'bitwise'; print "lcs is ", lcs('thisisatest', 'testing123testing'), "\n"; Line 2,170 ⟶ 2,204: { my ($c, $d) = @_; for my $len ( reverse 1 .. length($c &. $d) ) { "$c\n$d" =~ join '.*', ('(.)') x $len, "\n", map "\\$_", 1 .. $len and Line 2,178 ⟶ 2,212: }</syntaxhighlight> {{out}} <pre>lcs is ~~tsitest~~tastiest</pre> =={{header\|Phix}}== Line 3,176 ⟶ 3,210: <syntaxhighlight lang="ruby">func lcs(xstr, ystr) is cached { xstr.is_empty && return xstr; ystr.is_empty && return ystr; var(x, xs, y, ys) = (xstr.ftfirst(~~0,0~~1), xstr.ftslice(1), ystr.ftfirst(~~0,0~~1), ystr.ftslice(1)); if (x == y) { x + lcs(xs, ys) } else { [lcs(xstr, ys), lcs(xs, ystr)].max_by { .len }; } } say lcs("thisisatest", "testing123testing");</syntaxhighlight> {{out}} <pre> Line 3,367 ⟶ 3,401: =={{header\|Wren}}== {{trans\|Kotlin}} <syntaxhighlight lang="~~ecmascript~~wren">var lcs // recursive lcs = Fn.new { \|x, y\| if (x.count == 0 \|\| y.count == 0) return ""