Longest common substring: Difference between revisions

From Rosetta Code
Content added Content deleted
m (→‎{{header|REXX}}: changed a comment.)
(Changing the instructions to map more closely to what people clearly do around here anyway...)
Line 1: Line 1:
{{draft task}}Write a program that prompts the user to enter two different strings, and then output their longest common substring. Note that substrings are consecutive characters within a string. This distinguishes them from subsequences, which is any sequence of characters within a string, even if there are extraneous characters in between them. Hence, the longest common subsequence between "thisisatest" and "testing123testing" is "tsitest", whereas the longest common sub''string'' is just "test".
{{draft task}}Write a function returns the longest common substring of two strings. Use it within a program that demonstrates sample output from the function, which will consist of the longest common substring between "thisisatest" and "testing123testing". Note that substrings are consecutive characters within a string. This distinguishes them from subsequences, which is any sequence of characters within a string, even if there are extraneous characters in between them. Hence, the longest common subsequence between "thisisatest" and "testing123testing" is "tsitest", whereas the longest common sub''string'' is just "test".


References:
References:
Line 13: Line 13:
static void Main(string[] args)
static void Main(string[] args)
{
{
Console.Write("Enter word 1: ");
Console.WriteLine(lcs("thisisatest", "testing123testing"));
string word1 = Console.ReadLine();
Console.Write("Enter word 2: ");
string word2 = Console.ReadLine();
Console.WriteLine(lcs(word1, word2));
Console.ReadKey(true);
Console.ReadKey(true);
}
}


//This algorithm for finding the longest commmon substring uses dynamic programming.
//This algorithm for finding the longest common substring uses dynamic programming.
public static string lcs(string a, string b)
public static string lcs(string a, string b)
{
{
Line 64: Line 60:
}
}
}</lang>
}</lang>
{{out}}
<pre>
test
</pre>


=={{header|Go}}==
=={{header|Go}}==

Revision as of 20:13, 25 February 2015

Longest common substring is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Write a function returns the longest common substring of two strings. Use it within a program that demonstrates sample output from the function, which will consist of the longest common substring between "thisisatest" and "testing123testing". Note that substrings are consecutive characters within a string. This distinguishes them from subsequences, which is any sequence of characters within a string, even if there are extraneous characters in between them. Hence, the longest common subsequence between "thisisatest" and "testing123testing" is "tsitest", whereas the longest common substring is just "test".

References:

C#

<lang csharp>using System; namespace LongestCommonSubstring {

   class Program
   {
       static void Main(string[] args)
       {
           Console.WriteLine(lcs("thisisatest", "testing123testing"));
           Console.ReadKey(true);
       }
       //This algorithm for finding the longest common substring uses dynamic programming.
       public static string lcs(string a, string b)
       {
           var lengths = new int[a.Length, b.Length];
           int greatestLength = 0;
           string output = "";
           //Compare each consecutive character in string a with each consecutive character in string b.
           for (int i = 0; i < a.Length; i++)
           {
               for (int j = 0; j < b.Length; j++)
               {
                   //Keep a running count of the number of consecutive characters in each that are the same.
                   if (a[i] == b[j])
                   {
                       if (i == 0 || j == 0)
                       {
                           lengths[i, j] = 1;
                       }
                       else
                       {
                           lengths[i, j] = lengths[i - 1, j - 1] + 1;
                       }
                       //When you find the greatest count so far, note the substring of which it is. 
                       if (lengths[i, j] > greatestLength)
                       {
                           greatestLength = lengths[i, j];
                           output = a.Substring(i - greatestLength + 1, greatestLength);
                       }
                   }
                   else
                   {
                       lengths[i, j] = 0;
                   }
               }
           }
           //Output the substring you've noted.
           return output;
       }
   }

}</lang>

Output:
test

Go

Translation of: C#

<lang go>package main

import "fmt"

func lcs(a, b string) (output string) {

   lengths := make([]int, len(a)*len(b))
   greatestLength := 0
   for i, x := range a {
       for j, y := range b {
           if x == y {
               if i == 0 || j == 0 {
                   lengths[i*len(b)+j] = 1
               } else {
                   lengths[i*len(b)+j] = lengths[(i-1)*len(b)+j-1] + 1
               }
               if lengths[i*len(b)+j] > greatestLength {
                   greatestLength = lengths[i*len(b)+j]
                   output = a[i-greatestLength+1 : i+1]
               }
           }
       }
   }
   return

}

func main() {

   fmt.Println(lcs("thisisatest", "testing123testing"))

}</lang>

Output:
test

J

This algorithm starts by comparing each character in the one string to each character in the other, and then iterates on this result until it finds the length of the longest common substring. So if Lx is the length of one argument string, Ly is the length of the other argument string, and Lr is the length of the result string, this algorithm uses space on the order of Lx*Ly and time on the order of Lx*Ly*Lr.

In other words: this can be suitable for small problems, but you might want something better if you're comparing gigabyte length strings with high commonality.

<lang J>lcstr=:4 :0

 C=. ({.~ 1+$) x=/y
 M=. >./ (* * * >. * + (_1&|.)@:|:^:2)^:_  C
 N=. >./ M
 y {~ (M i. N)-i.-N

)</lang>

Intermedate results:

   C shows which characters are in common between the two strings.
   M marks the length of the longest common substring ending at each position in the right argument
   N is the length of the longest common substring

Example use:

<lang J> 'thisisatest' lcs 'testing123testing' test</lang>

REXX

<lang rexx>/*REXX program tests the LCSUBSTR (Longest Common Substring) subroutine.*/ parse arg a b . /*get two arguments (strings). */ if a== then a= "thisisatest" /*use this string for a default. */ if b== then b= "testing123testing" /* " " " " " " */ say ' string A =' a /*echo string A to screen. */ say ' string B =' b /*echo string B to screen. */ say ' LCsubstr =' LCsubstr(a,b) /*tell Longest Common Substring. */ exit /*stick a fork in it, we're done.*/ /*──────────────────────────────────LCSUBSTR subroutine─────────────────*/ LCsubstr: procedure; parse arg x,y,$ /*Longest Common Substring. */

  1. =length(x) /*shortcut for using the X length*/
           do   j=1         for #     /*step through start points in X.*/
             do k=#  by -1  for #     /*step through string lengths.   */
             _=strip(substr(x,j,k))   /*extract a common substring.    */
             if pos(_,y)\==0 & length(_)>length($)  then $=_ /*longest?*/
             end   /*j*/              /* [↑]  see if it is the longest.*/
           end     /*k*/

return $ /*$ is null if no common string.*/</lang> output when using the default inputs:

   string A = thisisatest
   string B = testing123testing
   LCsubstr = test