Longest common substring: Difference between revisions
No edit summary |
|||
Line 925: | Line 925: | ||
<lang dyalect>func lComSubStr(w1, w2) { |
<lang dyalect>func lComSubStr(w1, w2) { |
||
var (len, end) = (0, 0) |
var (len, end) = (0, 0) |
||
var mat = Array. |
var mat = Array.Empty(w1.Length() + 1, () => Array.Empty(w2.Length() + 1, 0)) |
||
var (i, j) = (0, 0) |
var (i, j) = (0, 0) |
||
Line 934: | Line 934: | ||
mat[i + 1][j + 1] = curLen |
mat[i + 1][j + 1] = curLen |
||
if curLen > len { |
if curLen > len { |
||
len = curLen |
len = curLen |
||
end = i |
end = i |
||
} |
} |
||
} |
} |
||
Line 943: | Line 943: | ||
i += 1 |
i += 1 |
||
} |
} |
||
String(values: w1). |
String(values: w1).Substring((end + 1) - len, len) |
||
} |
} |
||
func comSubStr(w1, w2) { |
func comSubStr(w1, w2) { |
||
return String(lComSubStr(w1. |
return String(lComSubStr(w1.Iterate().ToArray(), w2.Iterate().ToArray())) |
||
} |
} |
||
Revision as of 18:52, 8 March 2022
You are encouraged to solve this task according to the task description, using any language you may know.
- Task
Write a function that returns the longest common substring of two strings.
Use it within a program that demonstrates sample output from the function, which will consist of the longest common substring between "thisisatest" and "testing123testing".
Note that substrings are consecutive characters within a string. This distinguishes them from subsequences, which is any sequence of characters within a string, even if there are extraneous characters in between them.
Hence, the longest common subsequence between "thisisatest" and "testing123testing" is "tsitest", whereas the longest common substring is just "test".
- Metrics
- Counting
- Word frequency
- Letter frequency
- Jewels and stones
- I before E except after C
- Bioinformatics/base count
- Count occurrences of a substring
- Count how many vowels and consonants occur in a string
- Remove/replace
- XXXX redacted
- Conjugate a Latin verb
- Remove vowels from a string
- String interpolation (included)
- Strip block comments
- Strip comments from a string
- Strip a set of characters from a string
- Strip whitespace from a string -- top and tail
- Strip control codes and extended characters from a string
- Anagrams/Derangements/shuffling
- Word wheel
- ABC problem
- Sattolo cycle
- Knuth shuffle
- Ordered words
- Superpermutation minimisation
- Textonyms (using a phone text pad)
- Anagrams
- Anagrams/Deranged anagrams
- Permutations/Derangements
- Find/Search/Determine
- ABC words
- Odd words
- Word ladder
- Semordnilap
- Word search
- Wordiff (game)
- String matching
- Tea cup rim text
- Alternade words
- Changeable words
- State name puzzle
- String comparison
- Unique characters
- Unique characters in each string
- Extract file extension
- Levenshtein distance
- Palindrome detection
- Common list elements
- Longest common suffix
- Longest common prefix
- Compare a list of strings
- Longest common substring
- Find common directory path
- Words from neighbour ones
- Change e letters to i in words
- Non-continuous subsequences
- Longest common subsequence
- Longest palindromic substrings
- Longest increasing subsequence
- Words containing "the" substring
- Sum of the digits of n is substring of n
- Determine if a string is numeric
- Determine if a string is collapsible
- Determine if a string is squeezable
- Determine if a string has all unique characters
- Determine if a string has all the same characters
- Longest substrings without repeating characters
- Find words which contains all the vowels
- Find words which contains most consonants
- Find words which contains more than 3 vowels
- Find words which first and last three letters are equals
- Find words which odd letters are consonants and even letters are vowels or vice_versa
- Formatting
- Substring
- Rep-string
- Word wrap
- String case
- Align columns
- Literals/String
- Repeat a string
- Brace expansion
- Brace expansion using ranges
- Reverse a string
- Phrase reversals
- Comma quibbling
- Special characters
- String concatenation
- Substring/Top and tail
- Commatizing numbers
- Reverse words in a string
- Suffixation of decimal numbers
- Long literals, with continuations
- Numerical and alphabetical suffixes
- Abbreviations, easy
- Abbreviations, simple
- Abbreviations, automatic
- Song lyrics/poems/Mad Libs/phrases
- Mad Libs
- Magic 8-ball
- 99 Bottles of Beer
- The Name Game (a song)
- The Old lady swallowed a fly
- The Twelve Days of Christmas
- Tokenize
- Text between
- Tokenize a string
- Word break problem
- Tokenize a string with escaping
- Split a character string based on change of character
- Sequences
- References
11l
<lang 11l>F longest_common_substring(s1, s2)
V ir = 0 V jr = -1 L(i1) 0 .< s1.len V? i2 = s2.find(s1[i1]) L i2 != N V (j1, j2) = (i1, i2) L j1 < s1.len & j2 < s2.len & s2[j2] == s1[j1] I j1 - i1 >= jr - ir (ir, jr) = (i1, j1) j1++ j2++ i2 = s2.find(s1[i1], i2 + 1) R s1[ir..jr]
print(longest_common_substring(‘thisisatest’, ‘testing123testing’))</lang>
- Output:
test
Action!
<lang Action!>BYTE Func Equals(CHAR ARRAY a,b)
BYTE i
IF a(0)#b(0) THEN RETURN (0) FI
FOR i=1 TO a(0) DO IF a(i)#b(i) THEN RETURN (0) FI OD
RETURN (1)
PROC Lcs(CHAR ARRAY a,b,res)
CHAR ARRAY t(100) BYTE i,j,len
IF a(0)<b(0) THEN len=a(0) ELSE len=b(0) FI
WHILE len>0 DO FOR i=1 to a(0)-len+1 DO SCopyS(res,a,i,i+len-1) FOR j=1 to b(0)-len+1 DO SCopyS(t,b,j,j+len-1) IF Equals(res,t) THEN RETURN FI OD OD len==-1 OD res(0)=0
RETURN
PROC Test(CHAR ARRAY a,b)
CHAR ARRAY res(100)
Lcs(a,b,res) PrintF("lcs(""%S"",""%S"")=""%S""%E",a,b,res)
RETURN
PROC Main()
Test("thisisatest","testing123testing")
RETURN</lang>
- Output:
Screenshot from Atari 8-bit computer
lcs("thisisatest","testing123testing")="test"
Ada
<lang Ada>with Ada.Text_Io;
procedure Longest_Common_Substring is
function Common (Left, Right: String) return String is Com : array (Left'Range, Right'Range) of Natural := (others => (others => 0)); Longest : Natural := 0; Last : Natural := 0; begin for L in Left'Range loop for R in Right'Range loop if Left (L) = Right (R) then if L > Left'First and R > Right'First then Com (L, R) := Com (L - 1, R - 1) + 1; else Com (L, R) := 1; end if; if Com (L, R) > Longest then Longest := Com (L, R); Last := L; end if; end if; end loop; end loop; return Left (Last - Longest + 1 .. Last); end Common;
begin
Ada.Text_Io.Put_Line (Common ("thisisatest", "testing123testing"));
end Longest_Common_Substring;</lang>
- Output:
test
Aime
<lang aime>void test_string(text &g, v, l) {
integer n;
n = prefix(v, l); if (~g < n) { g = cut(l, 0, n); }
}
longest(text u, v) {
record r; text g, l, s;
while (~u) { r[u] = 0; u = delete(u, 0); } while (~v) { if (rsk_lower(r, v, l)) { test_string(g, v, l); } if (rsk_upper(r, v, l)) { test_string(g, v, l); } v = delete(v, 0); }
g;
}</lang> <lang aime>o_(longest("thisisatest", "testing123testing"), "\n");</lang>
ALGOL 68
<lang algol68>BEGIN
# returns the longest common substring of s and t # PROC longest common substring = ( STRING s, t )STRING: BEGIN STRING s1 = s[ @ 1 ]; # normalise bounds to 1 : ... # STRING s2 = t[ @ 1 ]; STRING result := ""; INT result len := 0; FOR i TO UPB s1 DO FOR j TO UPB s2 DO IF s1[ i ] = s2[ j ] THEN INT k := 1; WHILE INT ik = i + k; INT jk = j + k; IF ik > UPB s1 OR jk > UPB s2 THEN FALSE ELSE s1[ ik ] = s2[ jk ] FI DO k +:= 1 OD; IF k > result len THEN # found a longer substring # result len := k; result := s1[ i : ( i + k ) - 1 ] FI FI OD OD; result END # longest common substring # ;
# task test case # print( ( longest common substring( "thisisatest", "testing123testing" ), newline ) )
END</lang>
- Output:
test
AppleScript
Iterative
This allows for the possibility of co-longest substrings, returning one instance of each. If either input string is empty, it's taken as meaning there are no common substrings.
<lang applescript>on LCS(a, b)
-- Identify the shorter string. The longest common substring won't be longer than it! set lengthA to a's length set lengthB to b's length if (lengthA < lengthB) then set {shorterString, shorterLength, longerString} to {a, lengthA, b} else set {shorterString, shorterLength, longerString} to {b, lengthB, a} end if set longestMatches to {} set longestMatchLength to 0 -- Find the longest matching substring starting at each character in the shorter string. repeat with i from 1 to shorterLength repeat with j from shorterLength to i by -1 set thisSubstring to text i thru j of shorterString if (longerString contains thisSubstring) then -- Match found. If it's longer than the previously found match, or a new string of the same length, remember it. set matchLength to j - i + 1 if (matchLength > longestMatchLength) then set longestMatches to {thisSubstring} set longestMatchLength to matchLength else if ((matchLength = longestMatchLength) and (thisSubstring is not in longestMatches)) then set end of longestMatches to thisSubstring end if -- Don't bother with the match's own substrings. exit repeat end if end repeat end repeat return longestMatches
end LCS</lang>
<lang applescript>LCS("thisisatest", "testing123testing")</lang>
- Output:
<lang applescript>{"test"}</lang>
Or: <lang applescript>LCS("thisisthebesttest", "besting123testing")</lang>
- Output:
<lang applescript>{"best", "test"}</lang>
Functional
Using library functions wherever possible, for better productivity, (and for more granular Rosetta comparison): <lang applescript>------------------ LONGEST COMMON SUBSTRING ----------------
-- longestCommon :: Eq a => [a] -> [a] -> [a] on longestCommon(a, b)
-- The longest common substring of two given strings. script substrings on |λ|(s) map(my concat, concatMap(my tails, rest of inits(s))) end |λ| end script set {xs, ys} to map(substrings, {a, b}) maximumBy(comparing(my |length|), intersect(xs, ys))
end longestCommon
TEST ---------------------------
on run
longestCommon("testing123testing", "thisisatest")
end run
GENERIC FUNCTIONS --------------------
-- comparing :: (a -> b) -> (a -> a -> Ordering) on comparing(f)
script on |λ|(a, b) tell mReturn(f) set fa to |λ|(a) set fb to |λ|(b) if fa < fb then -1 else if fa > fb then 1 else 0 end if end tell end |λ| end script
end comparing
-- concat :: [String] -> String
on concat(xs)
script go on |λ|(a, x) a & x end |λ| end script foldl(go, "", xs)
end concat
-- concatMap :: (a -> [b]) -> [a] -> [b]
on concatMap(f, xs)
set lng to length of xs set acc to {} tell mReturn(f) repeat with i from 1 to lng set acc to acc & (|λ|(item i of xs, i, xs)) end repeat end tell return acc
end concatMap
-- foldl :: (a -> b -> a) -> a -> [b] -> a
on foldl(f, startValue, xs)
tell mReturn(f) set v to startValue set lng to length of xs repeat with i from 1 to lng set v to |λ|(v, item i of xs, i, xs) end repeat return v end tell
end foldl
-- inits :: String -> [String]
on inits(xs)
script charInit on |λ|(_, i, xs) text 1 thru i of xs end |λ| end script {""} & map(charInit, xs)
end inits
-- intersect :: (Eq a) => [a] -> [a] -> [a]
on intersect(xs, ys)
if length of xs < length of ys then set {shorter, longer} to {xs, ys} else set {longer, shorter} to {xs, ys} end if if shorter ≠ {} then set lst to {} repeat with x in shorter if longer contains x then set end of lst to contents of x end repeat lst else {} end if
end intersect
-- length :: [a] -> Int
on |length|(xs)
set c to class of xs if list is c or string is c then length of xs else (2 ^ 29 - 1) -- (maxInt - simple proxy for non-finite) end if
end |length|
-- maximumBy :: (a -> a -> Ordering) -> [a] -> a
on maximumBy(f, xs)
set cmp to mReturn(f) script max on |λ|(a, b) if a is missing value or cmp's |λ|(a, b) < 0 then b else a end if end |λ| end script foldl(max, missing value, xs)
end maximumBy
-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
-- 2nd class handler function lifted into 1st class script wrapper. if script is class of f then f else script property |λ| : f end script end if
end mReturn
-- map :: (a -> b) -> [a] -> [b]
on map(f, xs)
-- The list obtained by applying f -- to each element of xs. tell mReturn(f) set lng to length of xs set lst to {} repeat with i from 1 to lng set end of lst to |λ|(item i of xs, i, xs) end repeat return lst end tell
end map
-- tails :: String -> [String]
on tails(xs)
set es to characters of xs script residue on |λ|(_, i) items i thru -1 of es end |λ| end script map(residue, es) & {""}
end tails</lang>
- Output:
"test"
Arturo
<lang rebol>lcs: function [a,b][
lengths: map 0..size a => [map 0..size b => 0] greatestLength: 0 result: "" loop.with:'i a 'x [ loop.with:'j b 'y [ if x=y [ if? or? i=0 j=0 -> lengths\[i]\[j]: 0 else -> lengths\[i]\[j]: 1 + lengths\[i-1]\[j-1]
if greatestLength < lengths\[i]\[j] [ greatestLength: lengths\[i]\[j] result: slice a (i-greatestLength)+1 i ] ] ] ] return result
]
print lcs "thisisatest", "testing123testing"</lang>
- Output:
test
AutoHotkey
Using Text Comparison
<lang AutoHotkey>LCS(a, b){ x := i := 1 while StrLen(x) Loop % StrLen(a) IfInString, b, % x := SubStr(a, i:=StrLen(x)=1 ? i+1 : i, n:=StrLen(a)+1-A_Index) res := StrLen(res) > StrLen(x) ? res : x return res }</lang> Examples:<lang AutoHotkey>MsgBox % LCS("thisisatest", "testing123testing")</lang>
Outputs:
test
Using RegEx
<lang AutoHotkey>LCS(a, b){ while pos := RegExMatch(a "`n" b, "(.+)(?=.*\R.*\1)", m, pos?pos+StrLen(m):1) res := StrLen(res) > StrLen(m1) ? res : m1 return res }</lang> Examples:<lang AutoHotkey>MsgBox % LCS("thisisatest", "testing123testing")</lang>
Outputs:
test
BaCon
<lang bacon>FUNCTION Common_Sub$(haystack$, needle$)
WHILE LEN(needle$) FOR x = LEN(needle$) DOWNTO 1 IF INSTR(haystack$, LEFT$(needle$, x)) THEN RETURN LEFT$(needle$, x) NEXT needle$ = MID$(needle$, 2) WEND EXIT
ENDFUNC
PRINT Common_Sub$("thisisatest", "testing123testing")</lang>
- Output:
test
BASIC256
<lang BASIC256>function LCS(a, b) if length(a) = 0 or length(b) = 0 then return "" while length(b) for j = length(b) to 1 step -1 if instr(a, left(b, j)) then return left(b, j) next j b = mid$(b, 2) end while end function
print LCS("thisisatest", "testing123testing") end</lang>
Bracmat
<lang Bracmat> ( lcs
= X a b x L . !arg:(?a.?b) & 0:?L & :?X & ( @( !a : ? ( ?x & @(!x:? [>!L) & @(!b:? !x ?) & @(!x:? [?L:?X) ) (?&~) ) | !X ) )
& out$(lcs$(thisisatest.testing123testing))</lang> Output
test
C
<lang C>#include <stdio.h>
void lcs(const char * const sa, const char * const sb, char ** const beg, char ** const end) {
size_t apos, bpos; ptrdiff_t len;
*beg = 0; *end = 0; len = 0;
for (apos = 0; sa[apos] != 0; ++apos) { for (bpos = 0; sb[bpos] != 0; ++bpos) { if (sa[apos] == sb[bpos]) { len = 1; while (sa[apos + len] != 0 && sb[bpos + len] != 0 && sa[apos + len] == sb[bpos + len]) { len++; } }
if (len > *end - *beg) { *beg = sa + apos; *end = *beg + len; len = 0; } } }
}
int main() {
char *s1 = "thisisatest"; char *s2 = "testing123testing"; char *beg, *end, *it;
lcs(s1, s2, &beg, &end);
for (it = beg; it != end; it++) { putchar(*it); } printf("\n");
return 0;
}</lang>
- Output:
test
C#
Using dynamic programming
<lang csharp>using System;
namespace LongestCommonSubstring {
class Program { static void Main(string[] args) { Console.WriteLine(lcs("thisisatest", "testing123testing")); Console.ReadKey(true); }
public static string lcs(string a, string b) { var lengths = new int[a.Length, b.Length]; int greatestLength = 0; string output = ""; for (int i = 0; i < a.Length; i++) { for (int j = 0; j < b.Length; j++) { if (a[i] == b[j]) { lengths[i, j] = i == 0 || j == 0 ? 1 : lengths[i - 1, j - 1] + 1; if (lengths[i, j] > greatestLength) { greatestLength = lengths[i, j]; output = a.Substring(i - greatestLength + 1, greatestLength); } } else { lengths[i, j] = 0; } } } return output; } }
}</lang>
- Output:
test
Searching for smaller substrings of a in b
<lang csharp>//C# program tests the LCSUBSTR (Longest Common Substring) subroutine. using System; namespace LongestCommonSubstring {
class Program { static void Main(string[] args) { string a = args.Length >= 1 ? args[0] : ""; /*get two arguments (strings). */ string b = args.Length == 2 ? args[1] : ""; if (a == "") a = "thisisatest"; /*use this string for a default. */ if (b == "") b = "testing123testing"; /* " " " " " " */ Console.WriteLine("string A = {0}", a); /*echo string A to screen. */ Console.WriteLine("string B = {0}", b); /*echo string B to screen. */ Console.WriteLine("LCsubstr = {0}", LCsubstr(a, b)); /*tell Longest Common Substring. */ Console.ReadKey(true); } /*stick a fork in it, we're done.*/
/*─────────────────────────────────LCSUBSTR subroutine─────────────────────────────────*/ public static string LCsubstr(string x, string y) /*Longest Common Substring. */ { string output = ""; int lenx = x.Length; /*shortcut for using the X length*/ for (int j = 0; j < lenx; j++) /*step through start points in X.*/ { for (int k = lenx - j; k > -1; k--) /*step through string lengths. */ { string common = x.Substring(j, k); /*extract a common substring. */ if (y.IndexOf(common) > -1 && common.Length > output.Length) output = common; /*longest?*/ } /*k*/ } /*j*/ return output; /*$ is "" if no common string. */ } }
}</lang> output when using the default inputs:
string A = thisisatest string B = testing123testing LCsubstr = test
Searching for smaller substrings of a in b (simplified)
<lang csharp>//C# program tests the LCS (Longest Common Substring) subroutine. using System; namespace LongestCommonSubstring {
class Program { static void Main(string[] args) { string a = args.Length >= 1 ? args[0] : ""; /*get two arguments (strings). */ string b = args.Length == 2 ? args[1] : ""; if (a == "") a = "thisisatest"; /*use this string for a default. */ if (b == "") b = "testing123testing"; /* " " " " " " */ Console.WriteLine("string A = {0}", a); /*echo string A to screen. */ Console.WriteLine("string B = {0}", b); /*echo string B to screen. */ Console.WriteLine("LCS = {0}", lcs(a, b)); /*tell Longest Common Substring. */ Console.ReadKey(true); } /*stick a fork in it, we're done.*/
/*─────────────────────────────────LCS subroutine─────────────────────────────────*/ private static string lcs(string a, string b) { if(b.Length<a.Length){ string t=a; a=b; b=t; } for (int n = a.Length; n > 0; n--) { for (int m = a.Length-n; m <= a.Length-n; m++) { string s=a.Substring(m,n); if(b.Contains(s)) return(s); } } return ""; } }
</lang> output when using the default inputs:
string A = thisisatest string B = testing123testing LCS = test
C++
<lang cpp>#include <string>
- include <algorithm>
- include <iostream>
- include <set>
- include <vector>
auto collectSubStrings( const std::string& s, int maxSubLength ) {
int l = s.length(); auto res = std::set<std::string>(); for ( int start = 0; start < l; start++ ) { int m = std::min( maxSubLength, l - start + 1 ); for ( int length = 1; length < m; length++ ) { res.insert( s.substr( start, length ) ); } } return res;
}
std::string lcs( const std::string& s0, const std::string& s1 ) {
// collect substring set auto maxSubLength = std::min( s0.length(), s1.length() ); auto set0 = collectSubStrings( s0, maxSubLength ); auto set1 = collectSubStrings( s1, maxSubLength );
// get commons into a vector auto common = std::vector<std::string>(); std::set_intersection( set0.begin(), set0.end(), set1.begin(), set1.end(), std::back_inserter( common ) );
// get the longest one std::nth_element( common.begin(), common.begin(), common.end(), []( const std::string& s1, const std::string& s2 ) { return s1.length() > s2.length(); } ); return *common.begin();
}
int main( int argc, char* argv[] ) {
auto s1 = std::string( "thisisatest" ); auto s2 = std::string( "testing123testing" ); std::cout << "The longest common substring of " << s1 << " and " << s2 << " is:\n"; std::cout << "\"" << lcs( s1, s2 ) << "\" !\n"; return 0;
}</lang>
- Output:
The longest common substring of thisisatest and testing123testing is: "test" !
Common Lisp
<lang lisp> (defun longest-common-substring (a b)
"Return the longest substring common to a and b" ;; Found at https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Common_Lisp (let ((L (make-array (list (length a) (length b)) :initial-element 0)) (z 0) (result '()) ) (dotimes (i (length a)) (dotimes (j (length b)) (when (char= (char a i) (char b j)) (setf (aref L i j) (if (or (zerop i) (zerop j)) 1 (1+ (aref L (1- i) (1- j))) )) (when (> (aref L i j) z) (setf z (aref L i j) result '() )) (when (= (aref L i j) z) (pushnew (subseq a (1+ (- i z)) (1+ i)) result :test #'equal ))))) result ))
</lang>
- Output:
(longest-common-substring "thisisatest" "testing123testing") => ("test")
D
<lang d>import std.stdio;
string lcs(string a, string b) {
int[][] lengths; lengths.length = a.length; for (int i=0; i<a.length; i++) { lengths[i].length = b.length; }
int greatestLength; string output; for (int i=0; i<a.length; i++) { for (int j=0; j<b.length; j++) { if (a[i]==b[j]) { lengths[i][j] = i==0 || j==0 ? 1 : lengths[i-1][j-1] + 1; if (lengths[i][j] > greatestLength) { greatestLength = lengths[i][j]; int start = i-greatestLength+1; output = a[start..start+greatestLength]; } } else { lengths[i][j] = 0; } } } return output;
}
void main() {
writeln(lcs("testing123testing", "thisisatest"));
}</lang>
- Output:
test
Delphi
<lang Delphi> program Longest_Common_Substring;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils;
function lcs(x, y: string): string; var
n, m, Alength: Integer; t, common: string; j: Integer; k: Integer;
begin
Result := ; Alength := x.Length;
for j := 0 to Alength - 1 do for k := Alength - j downto 0 do begin common := x.Substring(j, k); if (y.IndexOf(common) > -1) and (common.Length > Result.Length) then Result := common; end;
end;
var
a, b: string;
begin
a := 'thisisatest'; b := 'testing123testing'; if ParamCount = 2 then begin if not ParamStr(1).IsEmpty then a := ParamStr(1); if not ParamStr(2).IsEmpty then b := ParamStr(2); end;
Writeln('string A = ', a); Writeln('string B = ', b); Writeln('LCsubstr = ', lcs(a, b)); readln;
end. </lang>
- Output:
string A = thisisatest123 string B = testing123testing LCsubstr = test
Dyalect
<lang dyalect>func lComSubStr(w1, w2) {
var (len, end) = (0, 0) var mat = Array.Empty(w1.Length() + 1, () => Array.Empty(w2.Length() + 1, 0)) var (i, j) = (0, 0) for sLett in w1 { for tLett in w2 { if tLett == sLett { let curLen = mat[i][j] + 1 mat[i + 1][j + 1] = curLen if curLen > len { len = curLen end = i } } j += 1 } j = 0 i += 1 } String(values: w1).Substring((end + 1) - len, len)
}
func comSubStr(w1, w2) {
return String(lComSubStr(w1.Iterate().ToArray(), w2.Iterate().ToArray()))
}
comSubStr("thisisatest", "testing123testing") // "test"</lang>
Elixir
<lang elixir>defmodule LCS do
def longest_common_substring(a,b) do alist = to_charlist(a) |> Enum.with_index blist = to_charlist(b) |> Enum.with_index lengths = for i <- 0..length(alist)-1, j <- 0..length(blist), into: %{}, do: {{i,j},0} Enum.reduce(alist, {lengths,0,""}, fn {x,i},acc -> Enum.reduce(blist, acc, fn {y,j},{map,gleng,lcs} -> if x==y do len = if i==0 or j==0, do: 1, else: map[{i-1,j-1}]+1 map = %{map | {i,j} => len} if len > gleng, do: {map, len, String.slice(a, i - len + 1, len)}, else: {map, gleng, lcs} else {map, gleng, lcs} end end) end) |> elem(2) end
end
IO.puts LCS.longest_common_substring("thisisatest", "testing123testing")</lang>
- Output:
test
Factor
<lang factor>USING: io sequences.extras ;
"thisisatest" "testing123testing" longest-subseq print</lang>
- Output:
test
Fortran
<lang fortran> program main implicit none
call compare('testing123testingthing', 'thisis', 'thi') call compare('testing', 'sting', 'sting') call compare('thisisatest_stinger', 'testing123testingthing', 'sting') call compare('thisisatest_stinger', 'thisis', 'thisis') call compare('thisisatest', 'testing123testing', 'test') call compare('thisisatest', 'thisisatest', 'thisisatest')
contains subroutine compare(a,b,answer) character(len=*),intent(in) :: a, b, answer character(len=:),allocatable :: a2, match character(len=*),parameter :: g='(*(g0))' integer :: i
a2=a ! should really make a2 the shortest and b the longest match= do i=1,len(a2)-1 call compare_sub(a2,b,match) if(len(a2).lt.len(match))exit a2=a2(:len(a2)-1) enddo write(*,g) merge('(PASSED)','(FAILED)',answer.eq.match), & & ' longest match found: "',match,'"; expected "',answer,'"', & & ' comparing "',a,'" and "',b,'"'
end subroutine subroutine compare_sub(a,b,match) character(len=*),intent(in) :: a, b character(len=:),allocatable :: match integer :: left, foundat, len_a
len_a=len(a) do left=1,len_a foundat=index(b,a(left:)) if(foundat.ne.0.and.len(match).lt.len_a-left+1)then if(len(a(left:)).gt.len(match))then match=a(left:) exit endif endif enddo
end subroutine compare_sub end program main </lang>
- Output:
(PASSED) longest match found: "thi"; expected "thi" comparing "testing123testingthing" and "thisis" (PASSED) longest match found: "sting"; expected "sting" comparing "testing" and "sting" (PASSED) longest match found: "sting"; expected "sting" comparing "thisisatest_stinger" and "testing123testingthing" (PASSED) longest match found: "thisis"; expected "thisis" comparing "thisisatest_stinger" and "thisis" (PASSED) longest match found: "test"; expected "test" comparing "thisisatest" and "testing123testing" (PASSED) longest match found: "thisisatest"; expected "thisisatest" comparing "thisisatest" and "thisisatest"
FreeBASIC
<lang freebasic>Function LCS(a As String, b As String) As String
If Len(a) = 0 Or Len(b) = 0 Then Return "" While Len(b) For j As Integer = Len(b) To 1 Step -1 If Instr(a, Left(b, j)) Then Return Left(b, j) Next j b = Mid(b, 2) Wend
End Function
Print LCS("thisisatest", "testing123testing") Sleep</lang>
Go
<lang go>package main
import "fmt"
func lcs(a, b string) (output string) {
lengths := make([]int, len(a)*len(b)) greatestLength := 0 for i, x := range a { for j, y := range b { if x == y { if i == 0 || j == 0 { lengths[i*len(b)+j] = 1 } else { lengths[i*len(b)+j] = lengths[(i-1)*len(b)+j-1] + 1 } if lengths[i*len(b)+j] > greatestLength { greatestLength = lengths[i*len(b)+j] output = a[i-greatestLength+1 : i+1] } } } } return
}
func main() {
fmt.Println(lcs("thisisatest", "testing123testing"))
}</lang>
- Output:
test
Haskell
<lang Haskell>import Data.Ord (comparing) import Data.List (maximumBy, intersect)
subStrings :: [a] -> a subStrings s =
let intChars = length s in [ take n $ drop i s | i <- [0 .. intChars - 1] , n <- [1 .. intChars - i] ]
longestCommon :: Eq a => [a] -> [a] -> [a] longestCommon a b =
maximumBy (comparing length) (subStrings a `intersect` subStrings b)
main :: IO () main = putStrLn $ longestCommon "testing123testing" "thisisatest"</lang>
- Output:
test
Or, fusing subStrings as tail . inits <=< tails
<lang haskell>import Control.Monad ((<=<)) import Data.List (inits, intersect, maximumBy, tails) import Data.Ord (comparing)
LONGEST COMMON SUBSTRING ---------------
longestCommon :: Eq a => [a] -> [a] -> [a] longestCommon a b =
let pair [x, y] = (x, y) in maximumBy (comparing length) $ (uncurry intersect . pair) $ [tail . inits <=< tails] <*> [a, b]
TEST -------------------------
main :: IO () main =
putStrLn $ longestCommon "testing123testing" "thisisatest"</lang>
- Output:
test
J
This algorithm starts by comparing each character in the one string to each character in the other, and then iterates on this result until it finds the length of the longest common substring. So if Lx is the length of one argument string, Ly is the length of the other argument string, and Lr is the length of the result string, this algorithm uses space on the order of Lx*Ly and time on the order of Lx*Ly*Lr.
In other words: this can be suitable for small problems, but you might want something better if you're comparing gigabyte length strings with high commonality.
<lang J>lcstr=:4 :0
C=. ({.~ 1+$) x=/y M=. >./ (* * * >. * + (_1&|.)@:|:^:2)^:_ C N=. >./ M y {~ (M i. N)-i.-N
)</lang>
Intermedate results:
C shows which characters are in common between the two strings. M marks the length of the longest common substring ending at each position in the right argument N is the length of the longest common substring
Example use:
<lang J> 'thisisatest' lcs 'testing123testing' test</lang>
Java
<lang java>public class LongestCommonSubstring {
public static void main(String[] args) { System.out.println(lcs("testing123testing", "thisisatest")); System.out.println(lcs("test", "thisisatest")); System.out.println(lcs("testing", "sting")); System.out.println(lcs("testing", "thisisasting")); }
static String lcs(String a, String b) { if (a.length() > b.length()) return lcs(b, a);
String res = ""; for (int ai = 0; ai < a.length(); ai++) { for (int len = a.length() - ai; len > 0; len--) {
for (int bi = 0; bi <= b.length() - len; bi++) {
if (a.regionMatches(ai, b, bi, len) && len > res.length()) { res = a.substring(ai, ai + len); } } } } return res; }
}</lang>
test
test
sting
sting
JavaScript
<lang javascript>(() => {
'use strict';
// longestCommon :: String -> String -> String const longestCommon = (s1, s2) => maximumBy( comparing(length), intersect(...apList( [s => map( concat, concatMap(tails, compose(tail, inits)(s)) )], [s1, s2] )) );
// main :: IO () const main = () => console.log( longestCommon( "testing123testing", "thisisatest" ) );
// GENERIC FUNCTIONS ----------------------------
// Each member of a list of functions applied to each // of a list of arguments, deriving a list of new values.
// apList (<*>) :: [(a -> b)] -> [a] -> [b] const apList = (fs, xs) => // fs.reduce((a, f) => a.concat( xs.reduce((a, x) => a.concat([f(x)]), []) ), []);
// comparing :: (a -> b) -> (a -> a -> Ordering) const comparing = f => (x, y) => { const a = f(x), b = f(y); return a < b ? -1 : (a > b ? 1 : 0); };
// compose (<<<) :: (b -> c) -> (a -> b) -> a -> c const compose = (f, g) => x => f(g(x));
// concat :: a -> [a] // concat :: [String] -> String const concat = xs => 0 < xs.length ? (() => { const unit = 'string' !== typeof xs[0] ? ( [] ) : ; return unit.concat.apply(unit, xs); })() : [];
// concatMap :: (a -> [b]) -> [a] -> [b] const concatMap = (f, xs) => xs.reduce((a, x) => a.concat(f(x)), []);
// inits([1, 2, 3]) -> [[], [1], [1, 2], [1, 2, 3] // inits('abc') -> ["", "a", "ab", "abc"]
// inits :: [a] -> a // inits :: String -> [String] const inits = xs => [ [] ] .concat(('string' === typeof xs ? xs.split() : xs) .map((_, i, lst) => lst.slice(0, i + 1)));
// intersect :: (Eq a) => [a] -> [a] -> [a] const intersect = (xs, ys) => xs.filter(x => -1 !== ys.indexOf(x));
// Returns Infinity over objects without finite length. // This enables zip and zipWith to choose the shorter // argument when one is non-finite, like cycle, repeat etc
// length :: [a] -> Int const length = xs => (Array.isArray(xs) || 'string' === typeof xs) ? ( xs.length ) : Infinity;
// map :: (a -> b) -> [a] -> [b] const map = (f, xs) => xs.map(f);
// maximumBy :: (a -> a -> Ordering) -> [a] -> a const maximumBy = (f, xs) => 0 < xs.length ? ( xs.slice(1) .reduce((a, x) => 0 < f(x, a) ? x : a, xs[0]) ) : undefined;
// tail :: [a] -> [a] const tail = xs => 0 < xs.length ? xs.slice(1) : [];
// tails :: [a] -> a const tails = xs => { const es = ('string' === typeof xs) ? ( xs.split() ) : xs; return es.map((_, i) => es.slice(i)) .concat([ [] ]); };
// MAIN --- return main();
})();</lang>
- Output:
test
jq
Utility functions: <lang jq># Create an m x n matrix def matrix(m; n; init):
if m == 0 then [] elif m == 1 then [range(0;n) | init] elif m > 0 then matrix(1;n;init) as $row | [range(0;m) | $row ] else error("matrix\(m);_;_) invalid") end;
def set(i;j; value):
setpath([i,j]; value);</lang>
Longest Common Substring: <lang jq>def lcs(a; b):
matrix(a|length; b|length; 0) as $lengths # state: [ $lengths, greatestLength, answer ] | [$lengths, 0] | reduce range(0; a|length) as $i (.; reduce range(0; b|length) as $j (.; if a[$i:$i+1] == b[$j:$j+1] then (if $i == 0 or $j == 0 then 1 else .[0][$i-1][$j-1] + 1 end) as $x | .[0] |= set($i; $j; $x) | if $x > .[1] then .[1] = $x | .[2] = a[1+$i - $x : 1+$i] # output else . end else . end )) | .[2];</lang>
Example: <lang jq>lcs("thisisatest"; "testing123testing")</lang>
- Output:
<lang sh>$ jq -n -f Longest_common_substring.jq "test"</lang>
Julia
<lang julia>function lcs(s1::AbstractString, s2::AbstractString)
l, r = 1, 0 sub_len = 0 for i in 1:length(s1) for j in i:length(s1) if !contains(s2, SubString(s1, i, j)) break elseif sub_len < j - i l, r = i, j sub_len = j - i end end end s1[l:r]
end
@show lcs("thisisatest", "testing123testing")</lang>
Kotlin
<lang scala>// version 1.1.2
fun lcs(a: String, b: String): String {
if (a.length > b.length) return lcs(b, a) var res = "" for (ai in 0 until a.length) { for (len in a.length - ai downTo 1) { for (bi in 0 until b.length - len) { if (a.regionMatches(ai, b, bi,len) && len > res.length) { res = a.substring(ai, ai + len) } } } } return res
}
fun main(args: Array<String>) = println(lcs("testing123testing", "thisisatest"))</lang>
- Output:
test
Lobster
<lang Lobster>import std def lcs(a, b) -> string:
var out = "" let lengths = map(a.length * b.length): 0 var greatestLength = 0 for(a) x, i: for(b) y, j: if x == y: if i == 0 or j == 0: lengths[i * b.length + j] = 1 else: lengths[i * b.length + j] = lengths[(i-1) * b.length + j - 1] + 1 if lengths[i * b.length + j] > greatestLength: greatestLength = lengths[i * b.length + j] out = a.substring(i - greatestLength + 1, greatestLength) return out</lang>
<lang Lobster>import std def lcs2(a, b) -> string:
var out = "" let lengths = map(b.length): map(a.length): 0 var greatestLength = 0 for(a) x, i: for(b) y, j: if x == y: if i == 0 or j == 0: lengths[j][i] = 1 else: lengths[j][i] = lengths[j-1][i-1] + 1 if lengths[j][i] > greatestLength: greatestLength = lengths[j][i] out = a.substring(i - greatestLength + 1, greatestLength) return out</lang>
Maple
StringTools:-LongestCommonSubString()
returns the longest common substring of two strings.
StringTools:-CommonSubSequence()
returns the longest common subsequence() of two strings.
<lang Maple>StringTools:-LongestCommonSubString("thisisatest","testing123testing");</lang>
Mathematica/Wolfram Language
The function LongestCommonSubsequence
returns the longest common substring, and LongestCommonSequence
returns the longest common subsequence.
<lang Mathematica>Print[LongestCommonSubsequence["thisisatest", "testing123testing"]];</lang>
- Output:
test
Modula-2
<lang Modula2>MODULE LCS; FROM FormatString IMPORT FormatString; FROM Terminal IMPORT WriteString,WriteLn,Write,ReadChar;
PROCEDURE WriteSubstring(s : ARRAY OF CHAR; b,e : CARDINAL); VAR i : CARDINAL; BEGIN
IF b=e THEN RETURN END; IF e>HIGH(s) THEN e := HIGH(s) END;
FOR i:=b TO e DO Write(s[i]) END
END WriteSubstring;
TYPE
Pair = RECORD a,b : CARDINAL; END;
PROCEDURE lcs(sa,sb : ARRAY OF CHAR) : Pair; VAR
output : Pair; a,b,len : CARDINAL;
BEGIN
output := Pair{0,0};
FOR a:=0 TO HIGH(sa) DO FOR b:=0 TO HIGH(sb) DO IF (sa[a]#0C) AND (sb[b]#0C) AND (sa[a]=sb[b]) THEN len := 1; WHILE (a+len<HIGH(sa)) AND (b+len<HIGH(sb)) DO IF sa[a+len] = sb[b+len] THEN INC(len) ELSE BREAK END END; DEC(len);
IF len>output.b-output.a THEN output := Pair{a,a+len} END END END END;
RETURN output
END lcs;
VAR res : Pair; BEGIN
res := lcs("testing123testing", "thisisatest"); WriteSubstring("testing123testing", res.a, res.b); WriteLn;
ReadChar
END LCS.</lang>
Nim
<lang Nim># Longest common substring.
import sequtils
func lcs(a, b: string): string =
var lengths = newSeqWith(a.len, newSeq[int](b.len)) var greatestLength = 0 for i, x in a: for j, y in b: if x == y: lengths[i][j] = if i == 0 or j == 0: 1 else: lengths[i - 1][j - 1] + 1 if lengths[i][j] > greatestLength: greatestLength = lengths[i][j] result = a[(i - greatestLength + 1)..i]
echo lcs("thisisatest", "testing123testing")</lang>
- Output:
test
Pascal
using FreePascal
<lang Pascal> PROGRAM LongestCommonSubString.pas;
{$DEFINE DEBUGGING} {$IFDEF FPC}
{$mode objfpc}{$H+}{$J-}{R+}
{$ELSE}
{$APPTYPE CONSOLE}
{$ENDIF}
(*)
Free Pascal Compiler version 3.2.0 [2020/06/14] for x86_64 The free and readable alternative at C/C++ speeds compiles natively to almost any platform, including raspberry PI * Can run independently from DELPHI / Lazarus
(*)
FUNCTION lcss( S1, S2: string ) : string ; (*) LongestCommonSubString: plain version without extra libraries From https://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Longest_common_substring translated from java (*)
VAR i: integer = 0; j: integer = 0; x: integer = 0; LenS1: integer = 0; LenS2: integer = 0; Start: integer = 0; Max: integer = 0; BEGIN LenS1 := length ( S1 ) - 1 ; LenS2 := length ( S2 ) - 1 ; FOR i := 0 TO LenS1 DO BEGIN
FOR j := 0 TO LenS2 DO BEGIN
x := 0;
WHILE ( S1 [ i + x ] = S2 [ j + x ] ) DO
BEGIN Inc ( x ) ; IF ( ( ( i + x ) > LenS1 ) or ( ( j + x ) > LenS2 ) ) THEN BREAK ; END ;
IF ( x > Max ) THEN
BEGIN Max := x ; Start := i ; END ; END ; END ;
lcss := copy ( S1, Start, ( Start + Max ) ) ;
END ;
VAR
S1: string ; S2: string ;
BEGIN
S1 := 'thisisatest' ; S2 := 'testing123testing' ; WriteLn( Lcss ( S1, S2 ) ) ;
END.
</lang>JPD 2021/06/18 Output:
test
using FreePascal v.2
<lang Pascal> PROGRAM LongestCommonSubString.pas;
{$IFDEF FPC}
{$mode objfpc}{$H+}{$J-}{R+}
{$ELSE}
{$APPTYPE CONSOLE}
{$ENDIF}
(*)
Free Pascal Compiler version 3.2.0 [2020/06/14] for x86_64 The free and readable alternative at C/C++ speeds compiles natively to almost any platform, including raspberry PI * Can run independently from DELPHI / Lazarus
(*)
USES
SysUtils;
FUNCTION lcss( S1, S2: string ) : string ; (*) LongestCommonSubString: adaptation from Delphi
(*) VAR j: Integer = 0 ; k: Integer = 0 ; lenS1: integer = 0 ; S: string = ; BEGIN Result := S ; lenS1 := S1.Length ; FOR j := 0 TO lenS1 - 1 DO BEGIN FOR k := lenS1 - j DOWNTO 0 DO BEGIN S := S1.Substring ( j, k ) ; IF ( S2.IndexOf ( S ) > -1 ) AND ( S.Length > Result.Length ) THEN Result := S; END; END; END;
VAR
S1: string ; S2: string ;
BEGIN
S1 := 'thisisatest' ; S2 := 'testing123testing' ; IF ParamCount = 2 THEN BEGIN IF NOT ParamStr(1).IsEmpty THEN S1 := ParamStr(1); IF NOT ParamStr(2).IsEmpty THEN S2 := ParamStr(2); END; Writeln ('string A = ', S1) ; Writeln ('string B = ', S2) ; WriteLn ( Lcss ( S1, S2 ) ) ;
END. </lang>JPD 2021/06/18 Output:
string A = thisisatest
string B = testing123testing
test
Perl
<lang Perl>#!/usr/bin/perl use strict ; use warnings ;
sub longestCommonSubstr {
my $first = shift ; my $second = shift ; my %firstsubs = findSubstrings ( $first ); my %secondsubs = findSubstrings ( $second ) ; my @commonsubs ; foreach my $subst ( keys %firstsubs ) { if ( exists $secondsubs{ $subst } ) {
push ( @commonsubs , $subst ) ;
} } my @sorted = sort { length $b <=> length $a } @commonsubs ; return $sorted[0] ;
}
sub findSubstrings {
my $string = shift ; my %substrings ; my $l = length $string ; for ( my $start = 0 ; $start < $l ; $start++ ) { for ( my $howmany = 1 ; $howmany < $l - $start + 1 ; $howmany++) {
$substrings{substr( $string , $start , $howmany) } = 1 ;
} } return %substrings ;
}
my $longest = longestCommonSubstr( "thisisatest" ,"testing123testing" ) ; print "The longest common substring of <thisisatest> and <testing123testing> is $longest !\n" ; </lang>
- Output:
The longest common substring of <thisisatest> and <testing123testing> is test !
Alternate letting regex do the work
<lang perl>#!/usr/bin/perl
use strict; # https://rosettacode.org/wiki/Longest_Common_Substring use warnings;
my $one = 'thisisatest'; my $two = 'testing123testing';
my @best; "$one\n$two" =~ /(.+).*\n.*\1(?{ $best[length $1]{$1}++})(*FAIL)/; print "$_\n" for sort keys %{ $best[-1] };</lang>
- Output:
test
Phix
function lcs(string a, b) integer longest = 0 string best = "" for i=1 to length(a) do integer ch = a[i] for j=1 to length(b) do if ch=b[j] then integer n=1 while i+n<=length(a) and j+n<=length(b) and a[i+n]=b[j+n] do n += 1 end while if n>longest then longest = n best = a[i..i+n-1] end if end if end for end for return best end function ?lcs("thisisatest", "testing123testing") ?lcs("testing123testing","thisisatest")
- Output:
"test" "test"
PicoLisp
<lang PicoLisp>(de longestCommonSubstring (Str1 Str2)
(setq Str1 (chop Str1) Str2 (chop Str2)) (let Res NIL (map '((Lst1) (map '((Lst2) (let Len 0 (find '((A B) (nand (= A B) (inc 'Len))) Lst1 Lst2 ) (when (> Len (length Res)) (setq Res (head Len Lst1)) ) ) ) Str2 ) ) Str1 ) (pack Res) ) )</lang>
Test: <lang PicoLisp>: (longestCommonSubstring "thisisatest" "testing123testing") -> "test"</lang>
PowerShell
<lang PowerShell>function lcs([String]$a, [String]$b) {
if ([String]::IsNullOrEmpty($a) -or [String]::IsNullOrEmpty($b)) { return "" } $startIndex, $size = -1, -1 for ($k = 0; $k -lt $a.Length; ++$k) { for ($i, $j, $d = $k, 0, 0; ($i -lt $a.Length) -and ($j -lt $b.Length); ++$i, ++$j) { if ($a.Chars($i) -eq $b.Chars($j)) { $d += 1 if ($size -lt $d) { $startIndex = $i - $d + 1 $size = $d } } else { $d = 0 } } } for ($k = 1; $k -lt $b.Length; ++$k) { for ($i, $j, $d = 0, $k, 0; ($i -lt $a.Length) -and ($j -lt $b.Length); ++$i, ++$j) { if ($a.Chars($i) -eq $b.Chars($j)) { $d += 1 if ($size -lt $d) { $startIndex = $i - $d + 1 $size = $d } } else { $d = 0 } } } if ($size -lt 0) { return "" } return $a.Substring($startIndex, $size)
}
function Print-Lcs([String]$a, [String]$b) {
return "lcs $a $b = $(lcs $a $b)"
} Print-Lcs 'thisisatest' 'testing123testing' Print-Lcs 'testing' 'sting' Print-Lcs 'thisisatest_stinger' 'testing123testingthing' Print-Lcs 'thisisatest_stinger' 'thisis' Print-Lcs 'testing123testingthing' 'thisis' Print-Lcs 'thisisatest' 'thisisatest'</lang>
- Output:
lcs thisisatest testing123testing = test lcs testing sting = sting lcs thisisatest_stinger testing123testingthing = sting lcs thisisatest_stinger thisis = thisis lcs testing123testingthing thisis = thi lcs thisisatest thisisatest = thisisatest
Prolog
<lang Prolog>common_sublist(A, B, M) :- append(_, Ma, A), append(M, _, Ma), append(_, Mb, B), append(M, _, Mb).
longest_list([], L, _, L). longest_list([L|Ls], LongestList, LongestLength, Result) :- length(L, Len), Len >= LongestLength -> longest_list(Ls, L, Len, Result) ; longest_list(Ls, LongestList, LongestLength, Result).
longest_substring(A, B, Result) :- string_chars(A, AChars), string_chars(B, BChars), findall(SubString, ( dif(SubString, []), common_sublist(AChars, BChars, SubString) ), AllSubstrings), longest_list(AllSubstrings, [], 0, LongestSubString), string_chars(Result, LongestSubString).</lang>
- Output:
?- longest_substring("thisisatest", "testing123testing", Longest). Longest = "test".
PureBasic
<lang PureBasic>Procedure.s lcs(a$,b$)
If Len(a$)>Len(b$) : Swap a$,b$ : EndIf l=Len(a$) For c=1 To l For i=1 To 1+l-c If FindString(b$,Mid(a$,i,c)) res$=Mid(a$,i,c) EndIf Next Next ProcedureReturn res$
EndProcedure
t1$="testing123testing" t2$="thisisatest" Debug lcs(t1$,t2$)</lang>
- Output:
test
Python
Python: Idiomatic
Python: Using Indexes
<lang python>s1 = "thisisatest" s2 = "testing123testing" len1, len2 = len(s1), len(s2) ir, jr = 0, -1 for i1 in range(len1):
i2 = s2.find(s1[i1]) while i2 >= 0: j1, j2 = i1, i2 while j1 < len1 and j2 < len2 and s2[j2] == s1[j1]: if j1-i1 >= jr-ir: ir, jr = i1, j1 j1 += 1; j2 += 1 i2 = s2.find(s1[i1], i2+1)
print (s1[ir:jr+1])</lang>
- Output:
"test"
Python: Set of substrings
From my explanatory blog post. <lang python>def _set_of_substrings(s:str) -> set:
"_set_of_substrings('ABBA') == {'A', 'AB', 'ABB', 'ABBA', 'B', 'BA', 'BB', 'BBA'}" len_s = len(s) return {s[m: n] for m in range(len_s) for n in range(m+1, len_s +1)}
def _set_of_common_substrings(s:str, common: set) -> set:
"Substrings of s that are also in common" len_s = len(s) return {this for m in range(len_s) for n in range(m+1, len_s +1) if (this := s[m:n]) in common}
def lcs_ss(*strings):
"longest of the common substrings of all substrings of each string" strings_iter = (s for s in strings) common = _set_of_substrings(next(strings_iter)) # First string substrings for s in strings_iter: if not common: break common = _set_of_common_substrings(s, common) # Accumulate the common
return max(common, key= lambda x: len(x)) if common else
s0 = "thisisatest_stinger"
s1 = "testing123testingthing"
s2 = "thisis"
ans = lcs_ss(s0, s1) print(f"\n{repr(s0)}, {repr(s1)} ->> {repr(ans)}") ans = lcs_ss(s0, s2) print(f"\n{repr(s0)}, {repr(s2)} ->> {repr(ans)}") ans = lcs_ss(s1, s2) print(f"\n{repr(s1)}, {repr(s2)} ->> {repr(ans)}") ans = lcs_ss(s0, s1, s2) print(f"\n{repr(s0)}, {repr(s1)}, {repr(s2)} ->> {repr(ans)}") </lang>
- Output:
'thisisatest_stinger', 'testing123testingthing' ->> 'sting' 'thisisatest_stinger', 'thisis' ->> 'thisis' 'testing123testingthing', 'thisis' ->> 'thi' 'thisisatest_stinger', 'testing123testingthing', 'thisis' ->> 'thi'
Functional
Expressed as a composition of pure generic functions:
<lang python>Longest common substring
from itertools import accumulate, chain from functools import reduce
- longestCommon :: String -> String -> String
def longestCommon(s1):
The longest common substring of two given strings. def go(s2): return max(intersect( *map(lambda s: map( concat, concatMap(tails)( compose(tail, list, inits)(s) ) ), [s1, s2]) ), key=len) return go
- ------------------------- TEST -------------------------
def main():
Test print( longestCommon( "testing123testing" )( "thisisatest" ) )
- ----------------------- GENERIC ------------------------
- compose :: ((a -> a), ...) -> (a -> a)
def compose(*fs):
Composition, from right to left, of a series of functions. def go(f, g): def fg(x): return f(g(x)) return fg return reduce(go, fs, lambda x: x)
- concat :: [String] -> String
def concat(xs):
The concatenation of all the elements in a list or iterable. return .join(xs)
- concatMap :: (a -> [b]) -> [a] -> [b]
def concatMap(f):
A concatenated list over which a function has been mapped. The list monad can be derived by using a function f which wraps its output in a list, (using an empty list to represent computational failure). def go(xs): return chain.from_iterable(map(f, xs)) return go
- inits :: [a] -> a
def inits(xs):
all initial segments of xs, shortest first. return accumulate(chain([[]], xs), lambda a, x: a + [x])
- intersect :: [a] -> [a] -> [a]
def intersect(xs, ys):
The ordered intersection of xs and ys. intersect([1,2,2,3,4])([6,4,4,2]) -> [2,2,4] s = set(ys) return (x for x in xs if x in s)
- scanl :: (b -> a -> b) -> b -> [a] -> [b]
def scanl(f):
scanl is like reduce, but defines a succession of intermediate values, building from the left. def go(a): def g(xs): return accumulate(chain([a], xs), f) return g return go
- tail :: [a] -> [a]
- tail :: Gen [a] -> [a]
def tail(xs):
The elements following the head of a (non-empty) list. return xs[1:]
- tails :: [a] -> a
def tails(xs):
All final segments of xs, longest first. return [ xs[i:] for i in range(0, 1 + len(xs)) ]
- MAIN ---
main() </lang>
test
Racket
A chance to show off how to use HashTable
types in typed/racket
<lang racket>#lang typed/racket (: lcs (String String -> String)) (define (lcs a b)
(: all-substrings# (String -> (HashTable String Boolean))) (define (all-substrings# str) (define l (string-length str)) (for*/hash : (HashTable String Boolean) ((s (in-range 0 l)) (e (in-range (add1 s) (add1 l)))) (values (substring str s e) #t))) (define a# (all-substrings# a)) (define b# (all-substrings# b)) (define-values (s l) (for/fold : (Values String Nonnegative-Integer) ((s "") (l : Nonnegative-Integer 0)) ((a_ (in-hash-keys a#)) #:when (and (> (string-length a_) l) (hash-ref b# a_ #f))) (values a_ (string-length a_)))) s)
(module+ test
("thisisatest" . lcs . "testing123testing"))</lang>
- Output:
"test"
Raku
(formerly Perl 6) <lang perl6>sub createSubstrings( Str $word --> Array ) {
my $length = $word.chars ; my @substrings ; for (0..$length - 1) -> $start { for (1..$length - $start) -> $howmany {
@substrings.push( $word.substr( $start , $howmany ) ) ;
} } return @substrings ;
}
sub findLongestCommon( Str $first , Str $second --> Str ) {
my @substringsFirst = createSubstrings( $first ) ; my @substringsSecond = createSubstrings( $second ) ; my $firstset = set( @substringsFirst ) ; my $secondset = set( @substringsSecond ) ; my $common = $firstset (&) $secondset ; return $common.keys.sort({$^b.chars <=> $^a.chars})[0] ; # or: sort(-*.chars)[0]
}
sub MAIN( Str $first , Str $second ) {
say "The longest common substring of $first and $second is " ~ "{findLongestCommon( $first , $second ) } !" ;
}</lang>
- Output:
The longest common substring of thisisatest and testing123testing is test !
Functional
<lang perl6>sub substrings ($s) { (flat (0..$_ X 1..$_).grep:{$_ ≥ [+] @_}).map: { $s.substr($^a, $^b) } given $s.chars } sub infix:<LCS>($s1, $s2) { ([∩] ($s1, $s2)».&substrings).keys.sort(*.chars).tail }
my $first = 'thisisatest'; my $second = 'testing123testing'; say "The longest common substring between '$first' and '$second' is '{$first LCS $second}'.";</lang>
- Output:
The longest common substring between 'thisisatest' and 'testing123testing' is 'test'.
REXX
<lang rexx>/*REXX program determines the LCSUBSTR (Longest Common Substring) via a function. */ parse arg a b . /*obtain optional arguments from the CL*/ if a== then a= "thisisatest" /*Not specified? Then use the default.*/ if b== then b= "testing123testing" /* " " " " " " */ say ' string A =' a /*echo string A to the terminal screen.*/ say ' string B =' b /* " " B " " " " */ say ' LCsubstr =' LCsubstr(a, b) /*display the Longest Common Substring.*/ exit 0 /*stick a fork in it, we're all done. */ /*──────────────────────────────────────────────────────────────────────────────────────*/ LCsubstr: procedure; parse arg x,y,,$; #= 0 /*LCsubstr: Longest Common Substring. */
L= length(x); w= length(y) /*placeholders for string length of X,Y*/ if w<L then do; parse arg y,x; L= w /*switch X & Y if Y is shorter than X*/ end do j=1 for L while j<=L-# /*step through start points in string X*/ do k=L-j+1 to # by -1 /*step through string lengths. */ _= substr(x, j, k) /*extract a possible common substring. */ if pos(_, y)\==0 then if k># then do; $= _; #= k; end end /*k*/ /* [↑] determine if string _ is longer*/ end /*j*/ /*#: the current length of $ string.*/ return $ /*$: (null if there isn't common str.)*/</lang>
- output when using the default inputs:
string A = thisisatest string B = testing123testing LCsubstr = test
Ring
<lang ring>
- Project : Longest Common Substring
str1 = "testing123testing" str2 = "tsitest" see longest(str1, str2)
func longest(str1, str2) subarr = [] for n=1 to len(str1)
for m=1 to len(str1) sub = substr(str1, n, m) if substr(str2, sub) > 0 add(subarr, sub) ok next
next
temp = 0 for n=1 to len(subarr)
if len(subarr[n]) > temp temp = len(subarr[n]) subend = subarr[n] ok
next see subend + nl </lang> Output:
test
Ruby
<lang ruby>def longest_common_substring(a,b)
lengths = Array.new(a.length){Array.new(b.length, 0)} greatestLength = 0 output = "" a.each_char.with_index do |x,i| b.each_char.with_index do |y,j| next if x != y lengths[i][j] = (i.zero? || j.zero?) ? 1 : lengths[i-1][j-1] + 1 if lengths[i][j] > greatestLength greatestLength = lengths[i][j] output = a[i - greatestLength + 1, greatestLength] end end end output
end
p longest_common_substring("thisisatest", "testing123testing")</lang>
- Output:
"test"
Rust
<lang rust>fn longest_common_substring(s1: &str, s2: &str) -> String {
let s1_chars: Vec<char> = s1.chars().collect(); let s2_chars: Vec<char> = s2.chars().collect(); let mut lcs = "".to_string();
for i in 0..s1_chars.len() { for j in 0..s2_chars.len() { if s1_chars[i] == s2_chars[j] { let mut tmp_lcs = s2_chars[j].to_string(); let mut tmp_i = i + 1; let mut tmp_j = j + 1;
while tmp_i < s1_chars.len() && tmp_j < s2_chars.len() && s1_chars[tmp_i] == s2_chars[tmp_j] { tmp_lcs = format!("{}{}", tmp_lcs, s1_chars[tmp_i]); tmp_i += 1; tmp_j += 1; }
if tmp_lcs.len() > lcs.len() { lcs = tmp_lcs; } } } }
lcs
}
fn main() {
let s1 = "thisisatest"; let s2 = "testing123testing"; let lcs = longest_common_substring(s1, s2); println!("{}", lcs);
}</lang>
- Output:
"test"
Scala
The two algorithms below are Scala optimized versions of the Dynamic Programming algorithm pseudocode solution found on the "Longest Common Substring" Wikipedia page.
For a more in-depth look at the Scala solution space for this problem, please see this StackOverflow answer.
longestCommonSubstringsOptimizedPureFP
This algorithm sticks to "Pure" FP (Functional Programming) in that it uses tail recursion while avoiding any use of mutable collections or vars within the function's implementation.
Explore and experiment withing the online Scala playgrounds that run in your favorite browser: ScalaFiddle (ES a.k.a. JavaScript, non JVM) or Scastie (remote JVM). <lang scala>def longestCommonSubstringsOptimizedPureFP(left: String, right: String): Option[Set[String]] =
if (left.nonEmpty && right.nonEmpty) { val (shorter, longer) = if (left.length < right.length) (left, right) else (right, left)
@scala.annotation.tailrec def recursive( indexLonger: Int = 0, indexShorter: Int = 0, currentLongestLength: Int = 0, lengthsPrior: List[Int] = List.fill(shorter.length)(0), lengths: List[Int] = Nil, accumulator: List[Int] = Nil ): (Int, List[Int]) = if (indexLonger < longer.length) { val length = if (longer(indexLonger) != shorter(indexShorter)) 0 else lengthsPrior.head + 1 val newCurrentLongestLength = if (length > currentLongestLength) length else currentLongestLength val newAccumulator = if ((length < currentLongestLength) || (length == 0)) accumulator else { val entry = indexShorter - length + 1 if (length > currentLongestLength) List(entry) else entry :: accumulator } if (indexShorter < shorter.length - 1) recursive( indexLonger, indexShorter + 1, newCurrentLongestLength, lengthsPrior.tail, length :: lengths, newAccumulator ) else recursive( indexLonger + 1, 0, newCurrentLongestLength, 0 :: lengths.reverse, Nil, newAccumulator ) } else (currentLongestLength, accumulator)
val (length, indexShorters) = recursive() if (indexShorters.nonEmpty) Some( indexShorters .map { indexShorter => shorter.substring(indexShorter, indexShorter + length) } .toSet ) else None } else None
println(longestCommonSubstringsOptimizedPureFP("thisisatest", "testing123testing"))</lang>
- Output:
"Some(Set(test))"
longestCommonSubstringsOptimizedReferentiallyTransparentFP
While this algorithm remains consistent with the FP concept of referential transparency, it does use both a mutable collection and a var within the function's implementation which provide an almost three times performance improvement over the above longestCommonSubstringsOptimizedPureFP implementation.
Explore this visual diff to see the changes between the longestCommonSubstringsOptimizedPureFP (above) and longestCommonSubstringsOptimizedReferentiallyTransparentFP (below) implementations
Explore and experiment withing the online Scala playgrounds that run in your favorite browser: ScalaFiddle (ES a.k.a. JavaScript, non JVM) or Scastie (remote JVM). <lang scala>def longestCommonSubstringsOptimizedReferentiallyTransparentFP(left: String, right: String): Option[Set[String]] =
if (left.nonEmpty && right.nonEmpty) { val (shorter, longer) = if (left.length < right.length) (left, right) else (right, left) val lengths: Array[Int] = new Array(shorter.length) //mutable
@scala.annotation.tailrec def recursive( indexLonger: Int = 0, indexShorter: Int = 0, currentLongestLength: Int = 0, lastIterationLength: Int = 0, accumulator: List[Int] = Nil ): (Int, List[Int]) = if (indexLonger < longer.length) { val length = if (longer(indexLonger) != shorter(indexShorter)) 0 else if (indexShorter == 0) 1 else lastIterationLength + 1 val newLastIterationLength = lengths(indexShorter) lengths(indexShorter) = length //mutation val newCurrentLongestLength = if (length > currentLongestLength) length else currentLongestLength val newAccumulator = if ((length < currentLongestLength) || (length == 0)) accumulator else { val entry = indexShorter - length + 1 if (length > currentLongestLength) List(entry) else entry :: accumulator } if (indexShorter < shorter.length - 1) recursive( indexLonger, indexShorter + 1, newCurrentLongestLength, newLastIterationLength, newAccumulator ) else recursive( indexLonger + 1, 0, newCurrentLongestLength, newLastIterationLength, newAccumulator ) } else (currentLongestLength, accumulator)
val (length, indexShorters) = recursive() if (indexShorters.nonEmpty) Some( indexShorters .map { indexShorter => shorter.substring(indexShorter, indexShorter + length) } .toSet ) else None } else None
println(longestCommonSubstringsOptimizedReferentiallyTransparentFP("thisisatest", "testing123testing"))</lang>
- Output:
"Some(Set(test))"
Sidef
<lang ruby>func createSubstrings(String word) -> Array {
gather { combinations(word.len+1, 2, {|i,j| take(word.substr(i, j-i)) }) }
}
func findLongestCommon(String first, String second) -> String {
createSubstrings(first) & createSubstrings(second) -> max_by { .len }
}
say findLongestCommon("thisisatest", "testing123testing")</lang>
- Output:
test
Swift
<lang swift>func lComSubStr<
S0: Sliceable, S1: Sliceable, T: Equatable where S0.Generator.Element == T, S1.Generator.Element == T, S0.Index.Distance == Int, S1.Index.Distance == Int >(w1: S0, _ w2: S1) -> S0.SubSlice { var (len, end) = (0, 0) let empty = Array(Repeat(count: w2.count + 1, repeatedValue: 0)) var mat: Int = Array(Repeat(count: w1.count + 1, repeatedValue: empty)) for (i, sLett) in w1.enumerate() { for (j, tLett) in w2.enumerate() where tLett == sLett { let curLen = mat[i][j] + 1 mat[i + 1][j + 1] = curLen if curLen > len { len = curLen end = i } } } return w1[advance(w1.startIndex, (end + 1) - len)...advance(w1.startIndex, end)]
}
func lComSubStr(w1: String, _ w2: String) -> String {
return String(lComSubStr(w1.characters, w2.characters))
}</lang>
Output:
<lang swift>lComSubStr("thisisatest", "testing123testing") // "test"</lang>
VBA
<lang vb> Function Longest_common_substring(string1 As String, string2 As String) As String Dim i As Integer, j As Integer, temp As String, result As String
For i = 1 To Len(string1) For j = 1 To Len(string1) temp = Mid(string1, i, j) If InStr(string2, temp) Then If Len(temp) > Len(result) Then result = temp End If Next Next Longest_common_substring = result
End Function
Sub MainLCS()
Debug.Print Longest_common_substring("thisisatest", "testing123testing")
End Sub </lang>
- Output:
Invoke the script calling "MainLCS".
test
VBScript
<lang vb> Function lcs(string1,string2) For i = 1 To Len(string1) tlcs = tlcs & Mid(string1,i,1) If InStr(string2,tlcs) Then If Len(tlcs) > Len(lcs) Then lcs = tlcs End If Else tlcs = "" End If Next End Function
WScript.Echo lcs(WScript.Arguments(0),WScript.Arguments(1)) </lang>
- Output:
Invoke the script from a command prompt.
C:\>cscript.exe /nologo lcs.vbs "thisisatest" "testing123testing" test
Wren
<lang ecmascript>var lcs = Fn.new { |a, b|
var la = a.count var lb = b.count var lengths = List.filled(la * lb, 0) var greatestLength = 0 var output = "" var i = 0 for (x in a) { var j = 0 for (y in b) { if (x == y) { lengths[i*lb + j] = (i == 0 || j == 0) ? 1 : lengths[(i-1)*lb+j-1] + 1 } if (lengths[i*lb+j] > greatestLength) { greatestLength = lengths[i*lb+j] output = a[i-greatestLength+1..i] } j = j + 1 } i = i + 1 } return output
}
System.print(lcs.call("thisisatest", "testing123testing"))</lang>
- Output:
test
Yabasic
<lang Yabasic>sub LCS$(a$, b$)
if len(a$) = 0 or len(b$) = 0 then return "" : endif
while len(b$) for j = len(b$) to 1 step -1 if instr(a$, left$(b$, j)) then return left$(b$, j) : endif next j b$ = mid$(b$, 2) wend end sub
print LCS$("thisisatest", "testing123testing") end</lang>
zkl
<lang zkl>fcn lcd(a,b){
if(b.len()<a.len()){ t:=a; a=b; b=t; } foreach n,m in ([a.len()..1,-1],a.len()-n+1){ s:=a[m,n]; if(b.holds(s)) return(s); } ""
}</lang> <lang zkl>lcd("testing123testing","thisisatest").println();</lang>
- Output:
test
- Programming Tasks
- Solutions by Programming Task
- 11l
- Action!
- Ada
- Aime
- ALGOL 68
- AppleScript
- Arturo
- AutoHotkey
- BaCon
- BASIC256
- Bracmat
- C
- C sharp
- C++
- Common Lisp
- D
- Delphi
- Dyalect
- Elixir
- Factor
- Fortran
- FreeBASIC
- Go
- Haskell
- J
- Java
- JavaScript
- Jq
- Julia
- Kotlin
- Lobster
- Maple
- Mathematica
- Wolfram Language
- Modula-2
- Nim
- Pascal
- Perl
- Phix
- PicoLisp
- PowerShell
- Prolog
- PureBasic
- Python
- Racket
- Raku
- REXX
- Ring
- Ruby
- Rust
- Scala
- Sidef
- Swift
- VBA
- VBScript
- Wren
- Yabasic
- Zkl