Boyer-Moore string search: Difference between revisions

Added Yabasic
(Added Yabasic)
 
(32 intermediate revisions by 13 users not shown)
Line 1:
{{draft task}}
 
;Task:
Line 5:
This algorithm is designed for pattern searching on certain types of devices which are backtracking-unfriendly such as [https://en.wikipedia.org/wiki/Tape_drive Tape drives] and [https://en.wikipedia.org/wiki/Hard_disk_drive Hard disks].
 
ItsIt works by first caching a segment of data from storage and match it against the pattern from the highest position backward to the lowest,. ifIf the matching fails, it will cache next segment of data, and move the start point forward to skip the portion of data which will definitely fail to match,. This continues until we successfully match the pattern or the pointer exceeds the data length.
 
[https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm Follow this link for more information about this algorithm].
 
To properly test this algorithm, it would be good to search for a string which contains repeated subsequences, such as <code>alfalfa</code> and the text being searched should not only include a match but that match should be preceded by words which partially match, such as <code>alfredo</code>, <code>behalf</code>, <code>calfskin</code>, <code>halfhearted</code>, <code>malfunction</code> or <code>severalfold</code>.
=={{header|11l}}==
{{trans|Python}}
 
<syntaxhighlight lang="11l">V ALPHABET_SIZE = 256
 
F alphabet_index(Char c) -> Int
‘Return the index of the given ASCII character.’
V val = c.code
assert(Int(val) C 0 .< :ALPHABET_SIZE)
R val
 
F match_length(String s, Int =idx1, Int =idx2) -> Int
‘Return the length of the match of the substrings of S beginning at idx1 and idx2.’
I idx1 == idx2
R s.len - idx1
V match_count = 0
L idx1 < s.len & idx2 < s.len & s[idx1] == s[idx2]
match_count++
idx1++
idx2++
R match_count
 
F fundamental_preprocess(String s) -> [Int]
‘Return Z, the Fundamental Preprocessing of S.
 
Z[i] is the length of the substring beginning at i which is also a prefix of S.
This pre-processing is done in O(n) time, where n is the length of S.’
I s.empty
R []
I s.len == 1
R [1]
V z = [0] * s.len
z[0] = s.len
z[1] = match_length(s, 0, 1)
L(i) 2 .< 1 + z[1]
z[i] = z[1] - i + 1
V l = 0
V r = 0
L(i) 2 + z[1] .< s.len
I i <= r
V k = i - l
V b = z[k]
V a = r - i + 1
I b < a
z[i] = b
E
z[i] = a + match_length(s, a, r + 1)
l = i
r = i + z[i] - 1
E
z[i] = match_length(s, 0, i)
I z[i] > 0
l = i
r = i + z[i] - 1
R z
 
F bad_character_table(String s) -> [[Int]]
Generates R for S, which is an array indexed by the position of some character c in the
ASCII table. At that index in R is an array of length |S|+1, specifying for each
index i in S (plus the index after S) the next location of character c encountered when
traversing S from right to left starting at i. This is used for a constant-time lookup
for the bad character rule in the Boyer-Moore string search algorithm, although it has
a much larger size than non-constant-time solutions.
I s.empty
R [[Int]()] * :ALPHABET_SIZE
V r = [[-1]] * :ALPHABET_SIZE
V alpha = [-1] * :ALPHABET_SIZE
L(c) s
alpha[alphabet_index(c)] = L.index
L(a) alpha
r[L.index].append(a)
R r
 
F good_suffix_table(String s) -> [Int]
Generates L for S, an array used in the implementation of the strong good suffix rule.
L[i] = k, the largest position in S such that S[i:] (the suffix of S starting at i) matches
a suffix of S[:k] (a substring in S ending at k). Used in Boyer-Moore, L gives an amount to
shift P relative to T such that no instances of P in T are skipped and a suffix of P[:L[i]]
matches the substring of T matched by a suffix of P in the previous match attempt.
Specifically, if the mismatch took place at position i-1 in P, the shift magnitude is given
by the equation len(P) - L[i]. In the case that L[i] = -1, the full shift table is used.
Since only proper suffixes matter, L[0] = -1.
V l = [-1] * s.len
V _n_ = fundamental_preprocess(reversed(s))
_n_.reverse()
L(j) 0 .< s.len - 1
V i = s.len - _n_[j]
I i != s.len
l[i] = j
R l
 
F full_shift_table(String s) -> [Int]
Generates F for S, an array used in a special case of the good suffix rule in the Boyer-Moore
string search algorithm. F[i] is the length of the longest suffix of S[i:] that is also a
prefix of S. In the cases it is used, the shift magnitude of the pattern P relative to the
text T is len(P) - F[i] for a mismatch occurring at i-1.
V f = [0] * s.len
V Z = fundamental_preprocess(s)
V longest = 0
L(zv) reversed(Z)
V i = L.index
longest = I zv == i + 1 {max(zv, longest)} E longest
f[(len)-i - 1] = longest
R f
 
F string_search(P, _t_) -> [Int]
Implementation of the Boyer-Moore string search algorithm. This finds all occurrences of P
in T, and incorporates numerous ways of pre-processing the pattern to determine the optimal
amount to shift the string and skip comparisons. In practice it runs in O(m) (and even
sublinear) time, where m is the length of T. This implementation performs a case-sensitive
search on ASCII characters. P must be ASCII as well.
I P.empty | _t_.empty | _t_.len < P.len
R []
 
[Int] matches
 
V r = bad_character_table(P)
V l = good_suffix_table(P)
V f = full_shift_table(P)
 
V k = P.len - 1
V previous_k = -1
L k < _t_.len
V i = P.len - 1
V h = k
L i >= 0 & h > previous_k & P[i] == _t_[h]
i--
h--
I i == -1 | h == previous_k
matches.append(k - P.len + 1)
k += I P.len > 1 {P.len - f[1]} E 1
E
V char_shift = i - r[alphabet_index(_t_[h])][i]
Int suffix_shift
I i + 1 == P.len
suffix_shift = 1
E I l[i + 1] == -1
suffix_shift = P.len - f[i + 1]
E
suffix_shift = P.len - 1 - l[i + 1]
V shift = max(char_shift, suffix_shift)
previous_k = I shift >= i + 1 {k} E previous_k
k += shift
R matches
 
V TEXT1 = ‘InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented’
V TEXT2 = ‘Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.’
V (PAT1, PAT2, PAT3) = (‘put’, ‘and’, ‘alfalfa’)
 
print(‘Found ’PAT1‘ at: ’string_search(PAT1, TEXT1))
print(‘Found ’PAT2‘ at: ’string_search(PAT2, TEXT1))
print(‘Found ’PAT3‘ at: ’string_search(PAT3, TEXT2))</syntaxhighlight>
 
{{out}}
<pre>
Found put at: [26, 90]
Found and at: [101, 128, 171]
Found alfalfa at: [33, 87]
</pre>
 
=={{header|C++}}==
<syntaxhighlight lang="c++">
#include <algorithm>
#include <cstdint>
#include <experimental/iterator>
#include <functional>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
 
void display(const std::vector<int32_t>& numbers) {
std::cout << "[";
std::copy(std::begin(numbers), std::end(numbers), std::experimental::make_ostream_joiner(std::cout, ", "));
std::cout << "]" << std::endl;
}
 
int32_t string_search_single(const std::string& haystack, const std::string& needle) {
auto it = std::search(haystack.begin(), haystack.end(), std::boyer_moore_searcher(needle.begin(), needle.end()));
 
if ( it != haystack.end() ) {
return std::distance(haystack.begin(), it);
}
return -1;
}
 
std::vector<int32_t> string_search(const std::string& haystack, const std::string& needle) {
std::vector<int32_t> result = {};
uint64_t start = 0;
int32_t index = 0;
while ( index >= 0 && start < haystack.length() ) {
std::string haystackReduced = haystack.substr(start);
index = string_search_single(haystackReduced, needle);
if ( index >= 0 ) {
result.push_back(start + index);
start += index + needle.length();
}
}
return result;
}
 
int main() {
const std::vector<std::string> texts = {
"GCTAGCTCTACGAGTCTA",
"GGCTATAATGCGTA",
"there would have been a time for such a word",
"needle need noodle needle",
"DKnuthusesandprogramsanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguages",
"Nearby farms grew an acre of alfalfa on the dairy's behalf, with bales of that alfalfa exchanged for milk."
};
 
const std::vector<std::string> patterns = { "TCTA", "TAATAAA", "word", "needle", "and", "alfalfa" };
 
for ( uint64_t i = 0; i < texts.size(); ++i ) {
std::cout << "text" << ( i + 1 ) << " = " << texts[i] << std::endl;
}
std::cout << std::endl;
 
for ( uint64_t i = 0; i < texts.size(); ++i ) {
std::vector<int32_t> indexes = string_search(texts[i], patterns[i]);
std::cout << "Found " << std::quoted(patterns[i]) << " in 'text" << ( i + 1 ) << "' at indexes ";
display(string_search(texts[i], patterns[i]));
}
}
</syntaxhighlight>
{{ out }}
<pre>
text1 = GCTAGCTCTACGAGTCTA
text2 = GGCTATAATGCGTA
text3 = there would have been a time for such a word
text4 = needle need noodle needle
text5 = DKnuthusesandprogramsanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguages
text6 = Nearby farms grew an acre of alfalfa on the dairy's behalf, with bales of that alfalfa exchanged for milk.
 
Found "TCTA" in 'text1' at indexes [6, 14]
Found "TAATAAA" in 'text2' at indexes []
Found "word" in 'text3' at indexes [40]
Found "needle" in 'text4' at indexes [0, 19]
Found "and" in 'text5' at indexes [10, 46, 73]
Found "alfalfa" in 'text6' at indexes [29, 79]
</pre>
 
=={{header|Emacs Lisp}}==
 
<langsyntaxhighlight lang="lisp">
 
;; Compile the pattern to a right most position map
(defun bm_compile_pattern (pattern)
"Compile the pattern to a right most position map"
(let ((patLen (length pattern))
(rightMap (make-vector 256 -1))
Line 24 ⟶ 274:
)
)
;;
;;(print (bm_compile_pattern "abcdb"))
(defun bm_make_suffix_table (text)
 
(let ((suffix-table (make-vector (length text) -1)) (textLen (length text))
(suffix-found nil)
)
(cl-loop for pos from (1- textLen) downto 1 do
(setq suffix-found nil)
(cl-loop for ptn from (- textLen 2) downto 0 while (not suffix-found) do
(let ((start1 pos) (end1 (1- textLen))
(start2 (- ptn (- (1- textLen) pos))) (end2 ptn)
(matched 't)
)
(if (< start2 0) (setq start2 0))
(cl-loop for idx1 from end1 downto start1 and idx2 from end2 downto start2 while matched do
(if (/= (elt text idx1) (elt text idx2))
(setq matched nil))
)
(when matched
(aset suffix-table pos start2)
(setq suffix-found 't) )
)
)
)
suffix-table
)
)
;;
;;
(defun bm_substring_search (pattern text)
"Boyer-Moore string search"
Line 32 ⟶ 308:
(startPos 0)
(result nil)
(rightMap (bm_compile_pattern pattern)))
(suffixTable (bm_make_suffix_table pattern)))
 
;; Continue this loop when no result and not exceed the text length
(while (and (not result) (<= (+ startPos patLen) txtLen))
 
(let ((idx patLen)
(suffixSkip 0)
(badCharSkip 0)
(skip 0))
(while (and (= 0 skip) (<= 0 (setq idx (1- idx))))
(setq suffixSkip 0)
(setq badCharSkip 0)
;; skip when the character at position idx is different
(when (/= (elt pattern idx) (elt text (+ startPos idx)))
(when (< idx (1- (length pattern)))
(setq suffixSkip (aref suffixTable (1+ idx))) )
(setq badCharSkip (- idx (aref rightMap (elt text (+ startPos idx)))))
;; looking up the right most position in pattern
(setq skip (max 1 (- idx (aref rightMap (elt text (+ startPos idx))))))
(setq skip (max 1 badCharSkip suffixSkip))
)
)
Line 55 ⟶ 339:
)
)
;;
 
 
(let ((pattern "alfalfa")
(full_text "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk."))
(bm_substring_search pattern full_text) )
 
(bm_substring_search pattern full_text)
)
</syntaxhighlight>
</lang>
 
outputs:
 
<pre>33</pre>
</pre>
 
=={{header|Fortran}}==
<syntaxhighlight lang="Fortran">
module bm_str
implicit none
private
public :: bmstr
!
! PARAMETER definitions
!
integer , private , parameter :: NO_OF_CHARS = 256, SIZEX=256
contains
pure subroutine badcharheuristic(Str,M,Badchar)
implicit none
! Dummy arguments
!
integer :: M
integer , dimension(NO_OF_CHARS) :: Badchar
character(1) , dimension(M) :: Str
intent (in) M , Str
intent (out) Badchar
!
! Local variables
!
integer :: i
! Code starts here
Badchar(NO_OF_CHARS) = -1
do i = 1 , M
Badchar(iachar(Str(i))) = i
enddo
return
end subroutine badcharheuristic
function bmstr(Pat,M,Str,N) result(found)
implicit none
!
! Dummy arguments
!
integer :: M
integer :: N
character(len=m) :: Pat
character(len=n) :: Str
intent (in) M , N , Pat , Str
!
! Local variables
!
integer , dimension(NO_OF_CHARS) :: badchar
integer :: found
integer :: i
integer :: j
integer :: s
! Code starts here
!
found = -1
if ( (M==0) .OR. (N==0) .OR. (M>N) ) return
badchar = 0
call badcharheuristic(Pat,M,badchar)
i = 0
s = 0
do while ( s<=(N-M) )
j = M
do j = m,1,-1
if((Pat(j:j) /= Str(s+j:s+j)) )exit ! Leave, the pattern doesn't match.
enddo
if ( j < 1 ) then ! Found, let's leave
found = s + 1
return
endif
i = badchar(iachar(Str(s+j:s+j)))
s = s + MAX(1,j-i)
enddo
found = -1
return
end function bmstr
end module bm_str
</syntaxhighlight>
</pre>
 
=={{header|FreeBASIC}}==
<syntaxhighlight lang="vb">#define max(a, b) iif((a) > (b), (a), (b))
 
Dim Shared As Integer suff(), bmBc(), bmGs()
 
Sub preBmBc(pat As String)
Dim As Integer m = Len(pat), i
Redim bmBc(m)
For i = 1 To m-1
bmBc(Cint(pat[i])) = m - i
Next
End Sub
 
Sub suffixes(pat As String)
Dim As Integer m = Len(pat), g = m, i, f
Redim suff(m)
suff(m) = m
For i = m-1 To 1 Step -1
If i > g And suff(i + m - f) < i - g Then
suff(i) = suff(i + m - f)
Else
If i < g Then g = i
f = i
While g >= 1 And pat[g] = pat[g + m - f]
g -= 1
Wend
suff(i) = f - g
End If
Next
End Sub
 
Sub preBmGs(pat As String)
Dim As Integer m = Len(pat), j = 1, i
Redim suff(m)
Redim bmGs(m)
For i = m To 1 Step -1
If suff(i) = i Then
While j < m - i
If bmGs(j) = m Then bmGs(j) = m - i
j += 1
Wend
End If
Next
For i = 1 To m-1
bmGs(m - suff(i)) = m - i
Next
End Sub
 
Sub BM(pat As String, s As String, case_insensitive As Boolean = False)
Dim As String pins = "'" & pat & "' in " & "'" & s & "'"
If case_insensitive Then
pat = Lcase(pat)
s = Lcase(s)
End If
' Preprocessing
preBmGs(pat)
preBmBc(pat)
' Searching
Dim As Integer j = 0, n = Len(s), m = Len(pat), i = m
While j <= n - m
i -= 1
If pat[i] <> s[i+j] Then Exit While
j += Iif(i < 1, bmGs(0), max(bmGs(i), bmBc(Len(s[i+j]) - m + i)))
Wend
Dim As Integer many = Instr(s, pat)
Dim As String tmp = ""
If Not many > 0 Then
Print "No "; pins
Else
Do While many > 0 'if not found loop will be skipped
tmp &= Str(many) & ","
many = Instr(many + 1, s, pat)
Loop
Print Using "Found & at indices [&]"; pins; tmp & Chr(8)
End If
End Sub
 
BM("GCAGAGAG","GCATCGCAGAGAGTATACAGTACG")
BM("TCTA","GCTAGCTCTACGAGTCTA")
BM("TAATAAA","GGCTATAATGCGTA")
BM("word","there would have been a time for such a word")
BM("needle","needle need noodle needle")
Const book = "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesley" + _
"DKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeand" + _
"assemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented"
BM("put",book)
BM("and",book)
Const farm = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with " + _
"bales of all that alfalfa exchanged for milk."
BM("alfalfa",farm)
 
Sleep</syntaxhighlight>
{{out}}
<pre>Found 'GCAGAGAG' in 'GCATCGCAGAGAGTATACAGTACG' at indices [6]
Found 'TCTA' in 'GCTAGCTCTACGAGTCTA' at indices [7,15]
No 'TAATAAA' in 'GGCTATAATGCGTA'
Found 'word' in 'there would have been a time for such a word' at indices [41]
Found 'needle' in 'needle need noodle needle' at indices [1,20]
Found 'put' in 'InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented' at indices [27,91]
Found 'and' in InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented' at indices [102,129,172]
Found 'alfalfa' in 'Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.' at indices [34,88]</pre>
 
=={{header|J}}==
{{trans|Emacs Lisp}}
<syntaxhighlight lang="j">bmsearch0=: {{
<lang J>bmsearch=: {{
startPos=. 0
rightMap=. (i.#x) (3 u: x)} 256#_1
Line 89 ⟶ 557:
end.
''
}}</langsyntaxhighlight>
 
Example use:
<lang Jpre> 'alfalfa' bmsearchbmsearch0 'On behalf of an alfredo calfskin malfunction we donate alfalfa.'
55</langpre>
 
Caution: this<tt>bmsearch0</tt> is an incomplete implementation of the algorithm. See talk page.
 
Here's a complete implementation of the algorithm:
 
<syntaxhighlight lang="j">Z=: {{ y {{ 1 i.~y~:(#y){.m }}\.y }}
 
bmsearch1=: {{
mx=. <:nx=. #x
my=. <:ny=. #y
R=. |:>./\_1,(i.@# ([`]`((256#_1)"0))}&> 3&u:) x
L=.{{
j=.I.*y
j ((#y)-y{~j)} _1#~#y }} Z&.|. x
F=. >./\.(*]=#-i.@#) Z x
k=. mx
k0=. _1
while. k < ny do.
i=. mx
h=. k
while. (0<:i) * (k0<h) * (i{x) = h{y do.
i=. i-1
h=. h-1
end.
if. (_1=i)+.k0=h do.
1+k-nx return.
else.
if. mx=i do.
suffix_shift=. 1
elseif. _1=L{~i1=. i+1 do.
suffix_shift=. nx-F{~i1
else.
suffix_shift=. mx-L{~i1
end.
shift=. suffix_shift >. i-R{~<(3 u: h{y),i
if. shift > i do. k0=. k end.
k=. k + shift
end.
end.
EMPTY
}}</syntaxhighlight>
 
<pre> 'alfalfa' bmsearch1 'On behalf of an alfredo calfskin malfunction we donate alfalfa.'
55</pre>
 
Testing performance on a relatively long text (and pattern):
 
<pre> text=: 'acgt'{~?.1e7#4
pat=. text{~9e6+i.1e5
pat bmsearch0 text
9000000
pat bmsearch1 text
9000000
'pat bmsearch0 text' %&timex 'pat bmsearch1 text'
2.33477</pre>
 
In other words: <tt>bmsearch0</tt> takes slightly over twice as long as <tt>bmsearch1</tt> for a somewhat randomly picked example search (though the results are the same).
 
=={{header|Java}}==
<syntaxhighlight lang="java">
 
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
 
/**
* An implementation of the Boyer-Moore string search algorithm.
* It finds all occurrences of a pattern in a text, performing a case-insensitive search on ASCII characters.
*
* For all full description of the algorithm visit:
* https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm
*/
public final class BoyerMooreStringSearch {
public static void main(String[] aArgs) {
List<String> texts = List.of(
"GCTAGCTCTACGAGTCTA",
"GGCTATAATGCGTA",
"there would have been a time for such a word",
"needle need noodle needle",
"DKnuthusesandshowsanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustrate",
"Farms grew an acre of alfalfa on the dairy's behalf, with bales of that alfalfa exchanged for milk.");
List<String> patterns = List.of( "TCTA", "TAATAAA", "word", "needle", "put", "and", "alfalfa" );
for ( int i = 0; i < texts.size(); i++ ) {
System.out.println("text" + ( i + 1 ) + " = " + texts.get(i));
}
System.out.println();
for ( int i = 0; i < patterns.size(); i++ ) {
final int j = ( i < 5 ) ? i : i - 1;
System.out.println("Found \"" + patterns.get(i) + "\" in 'text" + ( j + 1 ) + "' at indexes "
+ stringSearch(texts.get(j), patterns.get(i)));
}
}
/**
* Return a list of indexes at which the given pattern matches the given text.
*/
private static List<Integer> stringSearch(String aText, String aPattern) {
if ( aPattern.isEmpty() || aText.isEmpty() || aText.length() < aPattern.length() ) {
return Collections.emptyList();
}
 
List<Integer> matches = new ArrayList<Integer>();
 
// Preprocessing
List<List<Integer>> R = badCharacterTable(aPattern);
int[] L = goodSuffixTable(aPattern);
int[] F = fullShiftTable(aPattern);
 
int k = aPattern.length() - 1; // Represents the alignment of the end of aPattern relative to aText
int previousK = -1; // Represents the above alignment in the previous phase
while ( k < aText.length() ) {
int i = aPattern.length() - 1; // Index of the character to compare in aPattern
int h = k; // Index of the character to compare in aText
while ( i >= 0 && h > previousK && aPattern.charAt(i) == aText.charAt(h) ) {
i -= 1;
h -= 1;
}
if ( i == -1 || h == previousK ) { // Match has been found
matches.add(k - aPattern.length() + 1);
k += ( aPattern.length() > 1 ) ? aPattern.length() - F[1] : 1;
} else { // No match, shift by the maximum of the bad character and good suffix rules
int suffixShift;
int charShift = i - R.get(alphabetIndex(aText.charAt(h))).get(i);
if ( i + 1 == aPattern.length() ) { // Mismatch occurred on the first character
suffixShift = 1;
} else if ( L[i + 1] == -1 ) { // Matched suffix does not appear anywhere in aPattern
suffixShift = aPattern.length() - F[i + 1];
} else { // Matched suffix appears in aPattern
suffixShift = aPattern.length() - 1 - L[i + 1];
}
int shift = Math.max(charShift, suffixShift);
if ( shift >= i + 1 ) { // Galil's rule
previousK = k;
}
k += shift;
}
}
return matches;
}
/**
* Create the shift table, F, for the given text, which is an array such that
* F[i] is the length of the longest suffix of, aText.substring(i), which is also a prefix of the given text.
*
* Use case: If a mismatch occurs at character index i - 1 in the pattern,
* the magnitude of the shift of the pattern, P, relative to the text is: P.length() - F[i].
*/
private static int[] fullShiftTable(String aText) {
int[] F = new int[aText.length()];
int[] Z = fundamentalPreprocess(aText);
int longest = 0;
Collections.reverse(Arrays.asList(Z));
for ( int i = 0; i < Z.length; i++ ) {
int zv = Z[i];
if ( zv == i + 1 ) {
longest = Math.max(zv, longest);
}
F[F.length - i - 1] = longest;
}
return F;
}
/**
* Create the good suffix table, L, for the given text, which is an array such that
* L[i] = k, is the largest index in the given text, S,
* such that S.substring(i) matches a suffix of S.substring(0, k).
*
* Use case: If a mismatch of characters occurs at index i - 1 in the pattern, P,
* then a shift of magnitude, P.length() - L[i], is such that no instances of the pattern in the text are omitted.
* Furthermore, a suffix of P.substring(0, L[i]) matches the same substring of the text that was matched by a
* suffix in the pattern on the previous matching attempt.
* In the case that L[i] = -1, the full shift table must be used.
*/
private static int[] goodSuffixTable(String aText) {
int[] L = IntStream.generate( () -> -1 ).limit(aText.length()).toArray();
String reversed = new StringBuilder(aText).reverse().toString();
int[] N = fundamentalPreprocess(reversed);
Collections.reverse(Arrays.asList(N));
for ( int j = 0; j < aText.length() - 1; j++ ) {
int i = aText.length() - N[j];
if ( i != aText.length() ) {
L[i] = j;
}
}
return L;
}
/**
* Create the bad character table, R, for the given text,
* which is a list indexed by the ASCII value of a character, C, in the given text.
*
* Use case: The entry at index i of R is a list of size: 1 + length of the given text.
* This inner list gives at each index j the next position at which character C will be found
* while traversing the given text from right to left starting from index j.
*/
private static List<List<Integer>> badCharacterTable(String aText) {
if ( aText.isEmpty() ) {
return Collections.emptyList();
}
List<List<Integer>> R = IntStream.range(0, ALPHABET_SIZE).boxed()
.map( i -> new ArrayList<Integer>(Collections.nCopies(1,-1)) ).collect(Collectors.toList());
List<Integer> alpha = new ArrayList<Integer>(Collections.nCopies(ALPHABET_SIZE, -1));
for ( int i = 0; i < aText.length(); i++ ) {
alpha.set(alphabetIndex(aText.charAt(i)), i);
for ( int j = 0; j < alpha.size(); j++ ) {
R.get(j).add(alpha.get(j));
}
}
return R;
}
/**
* Create the fundamental preprocess, Z, of the given text, which is an array such that
* Z[i] is the length of the substring of the given text beginning at index i which is also a prefix of the text.
*/
private static int[] fundamentalPreprocess(String aText) {
if ( aText.isEmpty() ) {
return new int[0];
}
if ( aText.length() == 1 ) {
return new int[] { 1 };
}
int[] Z = new int[aText.length()];
Z[0] = aText.length();
Z[1] = matchLength(aText, 0, 1);
for ( int i = 2; i <= Z[1]; i++ ) {
Z[i] = Z[1] - i + 1;
}
// Define the left and right limits of the z-box
int left = 0;
int right = 0;
for ( int i = 2 + Z[1]; i < aText.length(); i++ ) {
if ( i <= right ) { // i falls within existing z-box
final int k = i - left;
final int b = Z[k];
final int a = right - i + 1;
if ( b < a ) { // b ends within existing z-box
Z[i] = b;
} else { // b ends at or after the end of the z-box,
// an explicit match to the right of the z-box is required
Z[i] = a + matchLength(aText, a, right + 1);
left = i;
right = i + Z[i] - 1;
}
} else { // i does not fall within existing z-box
Z[i] = matchLength(aText, 0, i);
if ( Z[i] > 0 ) {
left = i;
right = i + Z[i] - 1;
}
}
}
return Z;
}
/**
* Return the length of the match of the two substrings of the given text beginning at each of the given indexes.
*/
private static int matchLength(String aText, int aIndexOne, int aIndexTwo) {
if ( aIndexOne == aIndexTwo ) {
return aText.length() - aIndexOne;
}
int matchCount = 0;
while ( aIndexOne < aText.length() && aIndexTwo < aText.length()
&& aText.charAt(aIndexOne) == aText.charAt(aIndexTwo) ) {
matchCount += 1;
aIndexOne += 1;
aIndexTwo += 1;
}
return matchCount;
}
/**
* Return the ASCII index of the given character, if it is such, otherwise throw an illegal argument exception.
*/
private static int alphabetIndex(char aChar) {
final int result = (int) aChar;
if ( result >= ALPHABET_SIZE ) {
throw new IllegalArgumentException("Not an ASCII character:" + aChar);
}
return result;
}
 
/* The range of ASCII characters is 0..255, both inclusive. */
private static final int ALPHABET_SIZE = 256;
}
</syntaxhighlight>
<pre>
text1 = GCTAGCTCTACGAGTCTA
text2 = GGCTATAATGCGTA
text3 = there would have been a time for such a word
text4 = needle need noodle needle
text5 = DKnuthusesandshowsanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustrate
text6 = Farms grew an acre of alfalfa on the dairy's behalf, with bales of that alfalfa exchanged for milk.
 
Found "TCTA" in 'text1' at indexes [6, 14]
Found "TAATAAA" in 'text2' at indexes []
Found "word" in 'text3' at indexes [40]
Found "needle" in 'text4' at indexes [0, 19]
Found "put" in 'text5' at indexes [32]
Found "and" in 'text5' at indexes [10, 43, 70]
Found "alfalfa" in 'text6' at indexes [22, 72]
</pre>
 
=={{header|Julia}}==
Line 103 ⟶ 885:
"Algorithms on Strings, Trees, and Sequences", 1997, available online as a pdf in several publicly indexed
archives on the internet.
<langsyntaxhighlight rubylang="julia">""" Rosetta Code task at rosettacode.org/mw/index.php?title=Boyer-Moore_string_search """
 
const ASCIIchars = String([Char(i) for i = 0:255]) # any standard or extended ASCII char
Line 415 ⟶ 1,197:
occurrences, num_alignments, num_character_comparisons = boyer_moore_with_counts(p7, p_bm7, t7)
@show occurrences, num_alignments, num_character_comparisons
</langsyntaxhighlight>{{out}}
<pre>
(p_bm.amap, p_bm.bad_char) = (Dict('A' => 1, 'G' => 3, 'T' => 4, 'C' => 2), [[0, 0, 0, 0], [0, 0, 0, 1], [0, 2, 0, 1], [3, 2, 0, 1]])
Line 429 ⟶ 1,211:
(occurrences, num_alignments, num_character_comparisons) = ([33, 87], 20, 42)
</pre>
 
=={{header|Nim}}==
{{trans|Python}}
<syntaxhighlight lang = Nim>#[ This version uses ASCII for case-sensitive matching. For Unicode you may want to match in UTF-8
bytes instead of creating a 0x10FFFF-sized table.
]#
 
import std/[algorithm, sequtils, strutils]
 
const AlphabetSize = 256
 
func reversed(s: string): string =
## Return the reverse of an ASCII string.
for i in countdown(s.high, 0):
result.add s[i]
 
proc alphabetIndex(c: char): int =
## Return the index of the given ASCII character.
result = ord(c)
assert result in 0..<ALPHABET_SIZE
 
proc matchLength(s: string; idx1, idx2: int): int =
## Return the length of the match of the substrings of "s" beginning at "idx1" and "idx2".
if idx1 == idx2: return s.len - idx1
var idx1 = idx1
var idx2 = idx2
while idx1 < s.len and idx2 < s.len and s[idx1] == s[idx2]:
inc result
inc idx1
inc idx2
 
proc fundamentalPreprocess(s: string): seq[int] =
## Return "z", the Fundamental Preprocessing of "s".
 
# "z[i]" is the length of the substring beginning at "i" which is also a prefix of "s".
# This preprocessing is done in O(n) time, where n is the length of "s".
if s.len == 0: return
if s.len == 1: return @[1]
result = repeat(0, s.len)
result[0] = s.len
result[1] = s.matchLength(0, 1)
for i in 2..result[1]:
result[i] = result[1] - i + 1
# Defines lower and upper limits of z-box.
var l, r = 0
for i in (2 + result[1])..s.high:
if i <= r: # "i" falls within existing z-box.
let k = i - l
let b = result[k]
let a = r - i + 1
if b < a: # "b" ends within existing z-box.
result[i] = b
else: # "b" ends at or after the end of the z-box.
# We need to do an explicit match to the right of the z-box.
result[i] = a + s.matchLength(a, r + 1)
l = i
r = i + result[i] - 1
else: # "i" does not reside within existing z-box.
result[i] = s.matchLength(0, i)
if result[i] > 0:
l = i
r = i + result[i] - 1
 
proc badCharacterTable(s: string): seq[seq[int]] =
## Generates "R" for "s", which is an array indexed by the position of some character "c"
## in the ASCII table.
 
# At that index in "R" is an array of length |s|+1, specifying for each index "i" in "s"
# (plus the index after "s") the next location of character "c" encountered when traversing
# "S" from right to left starting at "i". This is used for a constant-time lookup for the bad
# character rule in the Boyer-Moore string search algorithm, although it has a much larger
# size than non-constant-time solutions.
if s.len == 0: return newSeqWith(AlphabetSize, newSeq[int]())
result = repeat(@[-1], AlphabetSize)
var alpha = repeat(-1, AlphabetSize)
for i, c in s:
alpha[alphabetIndex(c)] = i
for j, a in alpha:
result[j].add a
 
proc goodSuffixTable(s: string): seq[int] =
## Generates "L" for "s", an array used in the implementation of the strong good suffix rule.
 
# "L[i] = k", the largest position in S such that "s[i:]" (the suffix of "s" starting at "i")
# matches a suffix of "s[:k]" (a substring in "s" ending at "k"). Used in Boyer-Moore, "L"
# gives an amount to shift "P" relative to "T" such that no instances of "P" in "T" are skipped
# and a suffix of "P[:L[i]]" matches the substring of "T" matched by a suffix of "P" in the
# previous match attempt.
# Specifically, if the mismatch took place at position "i-1" in "P", the shift magnitude is
# given by the formula "len(P) - L[i]". In the case that "L[i] = -1", the full shift table
# is used. Since only proper suffixes matter, "L[0] = -1".
result =repeat(-1, s.len)
var n = fundamentalPreprocess(reversed(s))
n.reverse()
for j in 0..(s.len - 2):
let i = s.len - n[j]
if i != s.len:
result[i] = j
 
proc fullShiftTable(s: string): seq[int] =
## Generates "F" for "s", an array used in a special case of the good suffix rule in the
## Boyer-Moore string search algorithm.
 
# "F[i]" is the length of the longest suffix of "s[i:]" that is also a prefix of "s". In
# the cases it is used, the shift magnitude of the pattern "P" relative to the text "T" is
# "len(P) - F[i]" for a mismatch occurring at "i-1".
result = repeat(0, s.len)
let z = fundamentalPreprocess(s)
var longest = 0
for i, zv in reversed(z):
if zv == i + 1:
longest = max(zv, longest)
result[^(i + 1)] = longest
 
proc stringSearch(p, t: string): seq[int] =
## Implementation of the Boyer-Moore string search algorithm.
 
# This finds all occurrences of "p" in "t", and incorporates numerous ways of preprocessing
# the pattern to determine the optimal amount to shift the string and skip comparisons.
# In practice it runs in O(m) (and even sublinear) time, where "m" is the length of "t".
# This implementation performs a case-sensitive search on ASCII characters. "p" must be
# ASCII as well.
 
if p.len == 0 or t.len == 0 or t.len < p.len: return
 
# Preprocessing
let r = badCharacterTable(p)
let l = goodSuffixTable(p)
let f = fullShiftTable(p)
 
var k = p.len - 1 # Represents alignment of end of "p" relative to "t".
var prevk = -1 # Represents alignment in previous phase (Galil's rule).
while k < t.len:
var i = p.len - 1 # Character to compare in "p".
var h = k # Character to compare in "t".
while i >= 0 and h > prevk and p[i] == t[h]: # Matches starting from end of "p".
dec i
dec h
if i == -1 or h == prevk: # Match has been found (Galil's rule).
result.add k - p.len + 1
inc k, (if p.len > 1: p.len - f[1] else: 1)
else: # No match: shift by max of bad character and good suffix rules.
let charShift = i - r[alphabetIndex(t[h])][i]
let suffixShift = if i + 1 == p.len: # Mismatch happened on first attempt.
1
elif l[i + 1] == -1: # Matched suffix does not appear anywhere in "p".
p.len - f[i + 1]
else: # Matched suffix appears in "p".
p.len - 1 - l[i + 1]
let shift = max(charShift, suffixShift)
if shift >= i + 1: prevk = k # Galil's rule
inc k, shift
 
const
Text1 = "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesley" &
"DKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassembly" &
"languagestoillustratetheconceptsandalgorithmsastheyarepresented"
Text2 = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, " &
"with bales of all that alfalfa exchanged for milk."
(Pat1, Pat2, Pat3) = ("put", "and", "alfalfa")
 
echo "Found '", Pat1, "' at: ", Pat1.stringSearch(Text1).join(", ")
echo "Found '", Pat2, "' at: ", Pat2.stringSearch(Text1).join(", ")
echo "Found '", Pat3, "' at: ", Pat3.stringSearch(Text2).join(", ")
</syntaxhighlight>
 
{{out}}
<pre>Found 'put' at: 26, 90
Found 'and' at: 101, 128, 171
Found 'alfalfa' at: 33, 87
</pre>
 
=={{header|Pascal}}==
Works with FPC (currently only version 3.3.1).
<syntaxhighlight lang="pascal">
program BMTest;
{$mode objfpc}{$h+}
{$modeswitch functionreferences}
{$modeswitch anonymousfunctions}
uses
SysUtils, Math;
 
type
TSearchFun = reference to function(const s: rawbytestring): specialize TArray<SizeInt>;
 
{ returns a function that performs a case-sensitive search for all occurrences(1-based)
of the specified pattern using the Boyer-Moore algorithm with Galil optimization }
function BmgCreate(const aPattern: rawbytestring): TSearchFun;
var
BcTable: array[Byte] of Integer; //bad character shifts
p: PByte absolute aPattern;
procedure FillBcTable;
var
I: Integer;
begin
FillDWord(BcTable, Length(BcTable), DWord(Length(aPattern)));
for I := 0 to Length(aPattern) - 2 do
BcTable[p[I]] := Pred(Length(aPattern) - I);
end;
var
GsTable: array of Integer = nil; //good suffix shifts
procedure MakeGsTable;
function IsPrefix(aPos: Integer): Boolean;
var
I, SuffixLen: Integer;
begin
SuffixLen := Length(aPattern) - aPos;
for I := 0 to Pred(SuffixLen) do
if (p[I] <> p[aPos + I]) then exit(False);
Result := True;
end;
function GetSuffixLen(aPos: Integer): Integer;
begin
Result := 0;
while(Result < aPos)and(p[aPos - Result] = p[Pred(Length(aPattern) - Result)])do
Inc(Result);
end;
var
I, LastPrefix, SuffixLen: Integer;
begin
SetLength(GsTable, Length(aPattern));
LastPrefix := Pred(Length(aPattern));
for I := Pred(Length(aPattern)) downto 0 do begin
if IsPrefix(Succ(I)) then
LastPrefix := Succ(I);
GsTable[I] := LastPrefix + Length(aPattern) - Succ(I);
end;
for I := 0 to Length(aPattern) - 2 do begin
SuffixLen := GetSuffixLen(I);
if p[I - SuffixLen] <> p[Pred(Length(aPattern) - SuffixLen)] then
GsTable[Pred(Length(aPattern) - SuffixLen)] := Pred(Length(aPattern) + SuffixLen - I);
end;
end;
var
Needle: rawbytestring;
begin
if aPattern <> '' then begin
Needle := aPattern;
FillBcTable;
MakeGsTable;
end else
Needle := '';
{ returns an empty array if there are no matches or the pattern is empty }
Result := function(const aHaystack: rawbytestring): specialize TArray<SizeInt>
var
pNeedle: PByte absolute Needle;
pHaystack: PByte absolute aHaystack;
I, J, NeedleLast, OutPos, OldPfxEnd: SizeInt;
const
OutInitLen = 4;
begin
Result := nil;
if (Needle = '') or (Length(aHaystack) < Length(Needle)) then exit;
SetLength(Result, OutInitLen);
OutPos := 0;
NeedleLast := Pred(Length(Needle));
I := NeedleLast;
OldPfxEnd := 0;
while I < Length(aHaystack) do begin
J := NeedleLast;
while (J >= OldPfxEnd) and (pNeedle[J] = pHaystack[I]) do begin
Dec(J); Dec(I);
end;
if J < OldPfxEnd then begin
if OutPos = Length(Result) then SetLength(Result, OutPos * 2);
Result[OutPos] := I - OldPfxEnd + 2;
Inc(OutPos);
Inc(I, Succ(GsTable[0] - OldPfxEnd));
OldPfxEnd := Length(Needle)*2 - GsTable[0];
end else begin
Inc(I, Max(BcTable[pHaystack[I]], GsTable[J]));
OldPfxEnd := 0;
end;
end;
SetLength(Result, OutPos);
end;
end;
 
procedure WriteArray(const a: array of SizeInt);
var
I: SizeInt;
begin
Write('[');
for I := 0 to High(a) do
if I < High(a) then Write(a[I], ', ')
else Write(a[I]);
WriteLn(']');
end;
 
const
Text1 = 'Nearby farms grew a half acre of alfalfa on the dairy''s behalf, with bales of all that alfalfa exchanged for milk';
Text2 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
Text3 = 'CAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGT';
Text4 = 'AGGTGTGGAAACAAGCACCTAGATGTGCTGAACCCGGGGCACACGTTCAGTCAGCGACTC';
var
BmgSearch: TSearchFun;
begin
WriteArray(BmgCreate('alfalfa')(Text1));
WriteArray(BmgCreate('aaaaaaaaaaaaaaaaaaaa')(Text2));
BmgSearch := BmgCreate('CAGTCAG');
WriteArray(BmgSearch(Text3));
WriteArray(BmgSearch(Text4));
end.
</syntaxhighlight>
{{out}}
<pre>
[34, 88]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53]
[48]
</pre>
 
=={{header|Perl}}==
{{trans|Raku}}
<syntaxhighlight lang="perl" line>use v5.36;
use List::AllUtils <min max zip_by>;
 
sub suffixes ($m,@pat) {
my($f, $g) = (0, $m-1);
my @suff = (0) x $m-1; push @suff, $m;
for (my $i = $m-2; $i >= 0; --$i) {
if ($i > $g and $suff[$i + $m - 1 - $f] < $i - $g) {
$suff[$i] = $suff[$i + $m - 1 - $f]
} else {
$g = $i if $i < $g;
$f = $i;
while ($g >= 0 and $pat[$g] eq $pat[$g + $m - 1 - $f]) { $g-- }
$suff[$i] = $f - $g
}
}
@suff
}
 
sub preBmGs ($m,@pat) {
my @suff = suffixes($m,@pat);
my @bmGs = ($m) x $m;
 
for my $k (reverse 0..$m-2) {
if ($suff[$k] == $k + 1) {
for (my $j=0; $j < $m-1-$k; $j++) { $bmGs[$k]=$m-1-$k if $bmGs[$j] == $m }
}
}
for (0..$m-2) { $bmGs[$m - 1 - $suff[$_]] = $m - 1 - $_ }
@bmGs
}
 
sub BM ($txt,$pat) {
my @txt = split '', $txt;
my @pat = split '', $pat;
my ($m, $n, $j) = (length($pat), length($txt), 0);
my @bmGs = preBmGs($m,@pat);
 
my $x = min $m-1, $#pat;
my %bmBc = zip_by { @_ } [@pat[0..$x-1]], [reverse 1..$x];
 
my @I;
while ($j <= $n - $m) {
my $i = $m - 1;
for (; $i >= 0 and $pat[$i] eq $txt[$i + $j]; ) { $i-- }
if ($i < 0) {
push @I, $j;
$j += $bmGs[0]
} else {
$j += max $bmGs[$i], ($bmBc{$txt[$i + $j]} // $m)-$m+$i
}
}
@I
}
 
my @texts = (
"GCTAGCTCTACGAGTCTA",
"GGCTATAATGCGTA",
"there would have been a time for such a word",
"needle need noodle needle", "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented",
"Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.",
);
my @pats = <TCTA TAATAAA word needle put and alfalfa>;
 
say "text$_ = $texts[$_]" for 0..$#texts;
say '';
 
for (0.. $#pats) {
my $j = $_ < 5 ? $_ : $_-1 ; # for searching text4 twice
say "Found '$pats[$_]' in 'text$j' at indices " . join ', ', BM($texts[$j],$pats[$_]);
}</syntaxhighlight>
{{out}}
<pre>text0 = GCTAGCTCTACGAGTCTA
text1 = GGCTATAATGCGTA
text2 = there would have been a time for such a word
text3 = needle need noodle needle
text4 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented
text5 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.
 
Found 'TCTA' in 'text0' at indices 6, 14
Found 'TAATAAA' in 'text1' at indices
Found 'word' in 'text2' at indices 40
Found 'needle' in 'text3' at indices 0, 19
Found 'put' in 'text4' at indices 26, 90
Found 'and' in 'text4' at indices 101, 128, 171
Found 'alfalfa' in 'text5' at indices 33, 87</pre>
 
=={{header|Phix}}==
<!--<langsyntaxhighlight Phixlang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\BoyerMoore.exw
Line 538 ⟶ 1,720:
<span style="color: #000080;font-style:italic;">--BM("aLfAlfa",farm) -- none
--BM("aLfAlfa",farm,true) -- as -2</span>
<!--</langsyntaxhighlight>-->
{{out}}
Significantly lower character comparison counts than [[Knuth-Morris-Pratt_string_search#Phix]].
Line 554 ⟶ 1,736:
=={{header|Python}}==
Slightly modified from en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm#Python_implementation.
<langsyntaxhighlight lang="python">"""
This version uses ASCII for case-sensitive matching. For Unicode you may want to match in UTF-8
bytes instead of creating a 0x10FFFF-sized table.
Line 719 ⟶ 1,901:
print("Found", PAT2, "at:", string_search(PAT2, TEXT1))
print("Found", PAT3, "at:", string_search(PAT3, TEXT2))
</langsyntaxhighlight>{{out}}
<pre>
Found put at: [26, 90]
Line 725 ⟶ 1,907:
Found alfalfa at: [33, 87]
</pre>
=={{header|Raku}}==
{{trans|Phix}}
<syntaxhighlight lang="raku" line># 20220818 Raku programming solution
 
sub suffixes (@pat,\m) {
loop (my ($i,$f,$g,@suff)=m-2, 0, m-1, |flat 0 xx m-1,m; $i >= 0; --$i) {
if $i > $g and @suff[$i + m - 1 - $f] < $i - $g {
@suff[$i] = @suff[$i + m - 1 - $f]
} else {
$g = $i if $i < $g;
$f = $i;
while $g >= 0 and @pat[$g] eq @pat[$g + m - 1 - $f] { $g-- }
@suff[$i] = $f - $g
}
}
return @suff;
}
 
sub preBmGs (@pat,\m) {
my (@suff, @bmGs) := suffixes(@pat,m), [m xx m];
 
for m-1 ... 0 -> \k {
if @suff[k] == k + 1 {
loop (my $j=0; $j < m-1-k; $j++) { @bmGs[k]=m-1-k if @bmGs[$j] == m }
}
}
for ^(m-1) { @bmGs[m - 1 - @suff[$_]] = m - 1 - $_ }
return @bmGs;
}
 
sub BM (@txt,@pat) {
my (\m, \n, $j) = +@pat, +@txt, 0;
my (@bmGs, %bmBc) := preBmGs(@pat,m), hash @pat Z=> ( m-1 ... 1 );
 
return gather while $j <= n - m {
loop (my $i = m - 1; $i >= 0 and @pat[$i] eq @txt[$i + $j]; ) { $i-- }
if $i < 0 {
take $j;
$j += @bmGs[0]
} else {
$j += max @bmGs[$i], (%bmBc{@txt[$i + $j]} // m)-m+$i
}
}
}
 
my @texts = [
"GCTAGCTCTACGAGTCTA",
"GGCTATAATGCGTA",
"there would have been a time for such a word",
"needle need noodle needle",
"BharôtভাৰতBharôtভারতIndiaBhāratભારતBhāratभारतBhārataಭಾರತBhāratभारतBhāratamഭാരതംBhāratभारतBhāratभारतBharôtôଭାରତBhāratਭਾਰਤBhāratamभारतम्Bārataபாரதம்BhāratamഭാരതംBhāratadēsamభారతదేశం",
"InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented",
"Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.",
];
 
my @pats = [ "TCTA", "TAATAAA", "word", "needle", "ഭാരതം", "put", "and", "alfalfa"];
 
say "text$_ = @texts[$_]" for @texts.keys ;
say();
 
for @pats.keys {
my \j = $_ < 6 ?? $_ !! $_-1 ; # for searching text5 twice
say "Found '@pats[$_]' in 'text{j}' at indices ", BM @texts[j].comb, @pats[$_].comb
}</syntaxhighlight>
Output is the same as the [[Knuth-Morris-Pratt_string_search#Raku]] entry.
=={{header|Wren}}==
I've worked here from the Java code in the Wikipedia article though, as this version only finds the index of the first occurrence of the substring, I've added a routine to find all non-overlapping occurrences by repeatedly calling the ''BoyerMoore.indexOf'' method. Indices are zero-based.
 
The same examples have been used as in the Julia entry above.
<langsyntaxhighlight ecmascriptlang="wren">class BoyerMoore {
/**
* Returns the index within this string of the first occurrence of the
Line 857 ⟶ 2,103:
var j = (i < 5) ? i : i-1
System.print("Found '%(pats[i])' in 'text%(j+1)' at indices %(indicesOf.call(texts[j], pats[i]))")
}</langsyntaxhighlight>
 
{{out}}
Line 876 ⟶ 2,122:
Found 'alfalfa' in 'text6' at indices [33, 87]
</pre>
 
=={{header|Yabasic}}==
{{trans|FreeBASIC}}
<syntaxhighlight lang="vb">case_insensitive = false
 
sub preBmBc(pat$)
local m, i
 
m = len(pat$)
redim bmBc(m)
for i = 1 to m-1
bmBc(val(mid$(pat$, i, 1))) = m - i
next
end sub
 
sub suffixes(pat$)
local m, g, i, f
m = len(pat$)
g = m
redim suff(m)
suff(m) = m
for i = m-1 to 1 step -1
if i > g and suff(i + m - f) < i - g then
suff(i) = suff(i + m - f)
else
if i < g g = i
f = i
while g >= 1 and mid$(pat$, g, 1) = mid$(pat$, g + m - f, 1)
g = q - 1
wend
suff(i) = f - g
fi
next
end sub
 
sub preBmGs(pat$)
local m, j, i
m = len(pat$)
j = 1
redim suff(m)
redim bmGs(m)
for i = m to 1 step -1
if suff(i) = i then
while j < m - i
if bmGs(j) = m bmGs(j) = m - i
j = j + 1
wend
fi
next
for i = 1 to m-1
bmGs(m - suff(i)) = m - i
next
end sub
 
sub BM(pat$, s$, case_insensitive)
local pins
 
pins$ = "'" + pat$ + "' in " + "'" + s$ + "'"
if case_insensitive then
pat$ = lower$(pat$)
s$ = lower$(s$)
fi
//* Preprocessing *//
preBmGs(pat$)
preBmBc(pat$)
//* Searching *//
j = 0
n = len(s$)
m = len(pat$)
i = m
while j <= n - m
i = i - 1
if mid$(pat$,i,1) <> mid$(s$,i+j,1) break
if i < 1 then j = j + bmGs(0) else j = j + max(bmGs(i), bmBc(len(mid$(s$,i+j,1)) - m + i)) : fi
wend
many = instr(s$, pat$)
tmp$ = ""
if not many > 0 then
print "No ", pins$
else
while many > 0 //if not found loop will be skipped
tmp$ = tmp$ + str$(many) + ","
many = instr(s$, pat$, many + 1)
wend
print "Found ", pins$, " at indices [", tmp$, chr$(8), "]"
fi
end sub
 
BM("GCAGAGAG","GCATCGCAGAGAGTATACAGTACG")
BM("TCTA","GCTAGCTCTACGAGTCTA")
BM("TAATAAA","GGCTATAATGCGTA")
BM("word","there would have been a time for such a word")
BM("needle","needle need noodle needle")
book$ = "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented"
BM("put",book$)
BM("and",book$)
farm$ = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk."
BM("alfalfa",farm$)</syntaxhighlight>
{{out}}
<pre>Same as QBasic entry.</pre>
2,122

edits