Talk:Count occurrences of a substring
More specific?
What happens when there are more than one way to match? Say you have string "aBaBaBa" and want to find non-overlapping "aBa"s, you could say both (aBa)B(aBa) and aB(aBa)Ba, where brackets denote the found matches, are valid. --Ledrug 17:07, 16 June 2011 (UTC)
- My guess would be to specify to match left-to-right, but that might be too restrictive of possible methods. Maybe say "give the highest possible number of non-overlapping matches"? Can you think of any situations where that isn't the same as the count when matching from left-to-right (or even right-to-left)? --Mwn3d 17:20, 16 June 2011 (UTC)
- Earliest match always produces most matches. Proof: suppose a string is matched in two ways:
- <lang>...(m1)...(more matches)...
....(m2)...(more)...</lang>
- where m1 and m2 overlap, and suppose both are giving highest possible number of matches at their starting location; further suppose second way matches more times: but this can't be, because we can then give up m2 and use m1 while keeping the rest, and it would have more matches than first way, which is a contradiction. Hence, highest num of matches can be obtained by starting earliest.
- Since reversing both pattern and string makes left-to-right problem into a right-to-left one, this also proves highest matches can be obtained from either end.
- That exercise aside, I just thought the task should be worded so that any given input would produce an unambiguous result. --Ledrug 17:45, 16 June 2011 (UTC)
- For completeness' sake (and because it wouldn't affect the existing examples), I added that clarification and referenced this proof. --Mwn3d 17:52, 16 June 2011 (UTC)
- That exercise aside, I just thought the task should be worded so that any given input would produce an unambiguous result. --Ledrug 17:45, 16 June 2011 (UTC)
Yes, the problem is stated ambiguously and was probably inspired by some particular language's implementation of a built in function. It would be more appropriate to count all occurrences of a substring. Counting the maximum number of non-overlapping substrings can turn into a very complicated problem requiring backtracking.
- Yes, some languages' BIF may treat it as caseless, still others may treat all whitespace equally (as blank or blanks). It would've been interesting to make a case-sensative and a caseless version. The whitespace issue can be more complicated. -- Gerard Schildberger 18:33, 12 June 2012 (UTC)