Talk:Longest common substring


It seems that this is a duplicate of Longest Common Subsequence?
If not, is it unique enough to co-exist? --Paddy3118 (talk) 13:56, 18 February 2015 (UTC)

It looks like this one only counts consecutive letters and doesn't allow the subsequence to be split. I vote too similar to co-exist. --Mwn3d (talk) 21:13, 18 February 2015 (UTC)
I know it's not a duplicate, because Longest Common Subsequence produces different results. For example, the longest common subsequence of "thisisatest" and "testing123testing" is "tsitest", whereas the longest common substring is just "test". I was going to use my code as an example of dynamic programming in Longest Common Subsequence until I noticed the difference in the algorithms. Longest Common Subsequence may skip characters in the middle of a string, whereas Longest Common Substring only considers consecutive characters. The two problems also have separate Wikipedia articles, for what it's worth.
All I know is, Longest Common Subsequence didn't solve my problem, but Longest Common Substring does. That's enough reason for me for it to co-exist. --Geoffhacker (talk) 21:17, 18 February 2015 (UTC)
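To make the distinction above concrete, here is a minimal Python sketch of both dynamic-programming recurrences. It is an illustration only, not the code from any task entry; the function names are hypothetical. The substring table resets to zero on a mismatch (the run of consecutive characters is broken), while the subsequence table carries the best earlier value forward (characters may be skipped).

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest run of consecutive characters found in both a and b."""
    best_len, best_end = 0, 0          # length and end index (in a) of the best run
    prev = [0] * (len(b) + 1)          # previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
            # On a mismatch curr[j] stays 0: the run is broken.
        prev = curr
    return a[best_end - best_len:best_end]


def longest_common_subsequence_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                # On a mismatch, keep the best result seen so far.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(longest_common_substring("thisisatest", "testing123testing"))
    print(longest_common_subsequence_length("thisisatest", "testing123testing"))
```

Both functions take time proportional to the product of the two string lengths and keep only two rows of the table in memory.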
I originally thought this was a dup, and changed my mind. Seems quite different to me, and of much more interest to bioinformaticians than the other algorithm. --TimToady (talk) 06:16, 19 February 2015 (UTC)
It is indeed a problem in a Coursera bioinformatics course that is currently running, and solutions are expected to use suffix trees. The strings for that exercise are around 30K characters long. I tried the current C# algorithm (coded in Go, though) on one of these data sets. It took 15 seconds or so but did produce an answer that passed the grader. --Sonia (talk) 21:05, 19 February 2015 (UTC)

Thanks for pointing out the difference, Geoffhacker. Keep those new tasks rollin' ...
--Paddy3118 (talk) 11:51, 19 February 2015 (UTC)

Generalized Suffix Trees

I have added a reference to generalized suffix trees, which give an O(n) time method for solving this, where n is the total length of the input strings. --Nigel Galloway (talk) 13:31, 19 February 2015 (UTC)

Yes, generalised suffix trees would certainly provide a more efficient algorithm for solving this problem. But from what I've seen, they also take longer to code than the time I happen to have available. Still, it'd be cool to see one implemented if anyone did have the time. --Geoffhacker (talk) 19:43, 24 February 2015 (UTC)
I see that you have created the first necessary subtask in that algorithm, building a suffix tree. Good idea, leverage the crowd! --Geoffhacker (talk) 19:53, 24 February 2015 (UTC)
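For readers who want to see the idea behind the generalized suffix tree approach without the full construction, here is a rough Python sketch using an uncompressed suffix trie. This is a deliberate simplification and an assumption on my part, not the linear-time method: it inserts every suffix of both strings character by character, so it needs quadratic time and memory and would not cope with 30K-character inputs. The function name and node layout are hypothetical. The principle is the same, though: the deepest node reached by suffixes of both strings spells the longest common substring.

```python
def lcs_via_suffix_trie(a: str, b: str) -> str:
    """Longest common substring via a naive (uncompressed) generalized suffix trie."""

    def new_node():
        # 'owners' records which input strings have a suffix passing through this node.
        return {"children": {}, "owners": set()}

    root = new_node()
    for owner, text in enumerate((a, b)):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                if ch not in node["children"]:
                    node["children"][ch] = new_node()
                node = node["children"][ch]
                node["owners"].add(owner)

    # Depth-first search restricted to nodes shared by both strings;
    # the deepest such node gives the answer.
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node["children"].items():
            if len(child["owners"]) == 2:
                candidate = path + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


if __name__ == "__main__":
    print(lcs_via_suffix_trie("thisisatest", "testing123testing"))
```

A real generalized suffix tree compresses chains of single-child nodes into edges and is built in linear time (for example with Ukkonen's algorithm), which is what makes the approach practical for long inputs.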