Longest common prefix: Difference between revisions
m (→{{header|J}}) |
No edit summary |
||
Line 3: | Line 3: | ||
Given a set of strings, ''R'', for a prefix ''S'', it should hold that: |
Given a set of strings, ''R'', for a prefix ''S'', it should hold that: |
||
:<math>\forall x\ \in\ R:\ S \le</math><sub>pref</sub> <math>\ x</math> |
:<math>\forall x\ \in\ R:\ S \le</math><sub>pref</sub> <math>\ x</math> ~ "for all members ''x'' of set ''R'', it hold true that string ''S'' is a prefix of ''x''" |
||
:(help here: ''does not express that ''S'' is the <u>longest</u> common prefix of ''x) |
:(help here: ''does not express that ''S'' is the <u>longest</u> common prefix of ''x) |
||
An example use case for this: given a set of phone numbers, identify a common dialing code. This can be accomplished by first determining the common prefix (if any), and then matching it against know dialing codes (iteratively dropping characters from rhs until a match is found, as the ''lcp'' function may match more than the dialing code). |
An example use case for this: given a set of phone numbers, identify a common dialing code. This can be accomplished by first determining the common prefix (if any), and then matching it against know dialing codes (iteratively dropping characters from rhs until a match is found, as the ''lcp'' function may match more than the dialing code). |
Revision as of 13:21, 20 March 2015
It is often useful to find the common prefix of a set of strings, that is, the longest initial portion of all strings that are identical.
Given a set of strings, R, for a prefix S, it should hold that:
- pref ~ "for all members x of set R, it hold true that string S is a prefix of x"
- (help here: does not express that S is the longest common prefix of x)
An example use case for this: given a set of phone numbers, identify a common dialing code. This can be accomplished by first determining the common prefix (if any), and then matching it against know dialing codes (iteratively dropping characters from rhs until a match is found, as the lcp function may match more than the dialing code).
- Test cases
For a function, lcp, accepting a list of strings, the following should hold true (the empty string, , is considered a prefix of all strings):
lcp("interspecies","interstelar","interstate") = "inters" lcp("throne","throne") = "throne" lcp("throne","dungeon") = lcp("cheese") = "cheese" lcp() = lcp() = lcp("prefix","suffix") =
Task inspired by this stackoverflow question: Find the longest common starting substring in a set of strings
J
<lang J>lcp=: {. {.~ 0 i.~ [: */2 =/\ ]</lang>
In other words: compare adjacent strings pair-wise, combine results logically, find first mismatch in any of them, take that many characters from the first of the strings.
As the number of adjacent pairs is O(n) where n is the number of strings, this approach could be faster in the limit cases than sorting.
Examples:
<lang J> lcp 'interspecies','interstellar',:'interstate' inters
lcp 'throne',:'throne'
throne
lcp 'throne',:'dungeon'
lcp ,:'cheese'
cheese
lcp ,:
lcp 0 0$
lcp 'prefix',:'suffix'
</lang>
Python
Python: Functional
To see if all the n'th characters are the same I sort and compare the first to the last in the lambda function. <lang python>from itertools import takewhile
def lcp(*s):
return .join(ch[0] for ch in takewhile(lambda x: x.sort() or not x or x[0] == x[-1],
(list(y) for y in zip(*s))))
assert lcp("interspecies","interstelar","interstate") == "inters" assert lcp("throne","throne") == "throne" assert lcp("throne","dungeon") == "" assert lcp("cheese") == "cheese" assert lcp("") == "" assert lcp("prefix","suffix") == ""</lang>
The above runs without output.
Scala
<lang scala>class TestLCP extends FunSuite {
test("shared start") { assert(lcp("interspecies","interstelar","interstate") === "inters") assert(lcp("throne","throne") === "throne") assert(lcp("throne","dungeon").isEmpty) assert(lcp("cheese") === "cheese") assert(lcp("").isEmpty) assert(lcp(Nil :_*).isEmpty) assert(lcp("prefix","suffix").isEmpty) }
def lcp(list: String*) = list.foldLeft("")((_,_) => (list.min.view,list.max.view).zipped.takeWhile(v => v._1 == v._2).unzip._1.mkString)
}</lang>
zkl
The string method prefix returns the number of common prefix characters. <lang zkl>fcn lcp(s,strings){ s[0,s.prefix(vm.pasteArgs(1))] }</lang> Or, without using prefix:
<lang zkl>fcn lcp(strings){
vm.arglist.reduce(fcn(prefix,s){ Utils.Helpers.zipW(prefix,s) // lazy zip .pump(String,fcn([(a,b)]){ a==b and a or Void.Stop }) })
}</lang> <lang zkl>tester:=TheVault.Test.UnitTester.UnitTester(); tester.testRun(lcp.fp("interspecies","interstelar","interstate"),Void,"inters",__LINE__); tester.testRun(lcp.fp("throne","throne"),Void,"throne",__LINE__); tester.testRun(lcp.fp("throne","dungeon"),Void,"",__LINE__); tester.testRun(lcp.fp("cheese"),Void,"cheese",__LINE__); tester.testRun(lcp.fp(""),Void,"",__LINE__); tester.testRun(lcp.fp("prefix","suffix"),Void,"",__LINE__); tester.stats();</lang> The fp (partial application) method is used to delay running lcp until the tester actually tests.
- Output:
===================== Unit Test 1 ===================== Test 1 passed! ===================== Unit Test 2 ===================== Test 2 passed! ===================== Unit Test 3 ===================== Test 3 passed! ===================== Unit Test 4 ===================== Test 4 passed! ===================== Unit Test 5 ===================== Test 5 passed! ===================== Unit Test 6 ===================== Test 6 passed! 6 tests completed. Passed test(s): 6 (of 6)