Jump to content

Boyer-Moore string search

From Rosetta Code
(Redirected from Boyer-Moore String Search)
Task
Boyer-Moore string search
You are encouraged to solve this task according to the task description, using any language you may know.
Task


This algorithm is designed for pattern searching on certain types of devices which are backtracking-unfriendly such as Tape drives and Hard disks.

It works by first caching a segment of data from storage and match it against the pattern from the highest position backward to the lowest. If the matching fails, it will cache next segment of data and move the start point forward to skip the portion of data which will definitely fail to match. This continues until we successfully match the pattern or the pointer exceeds the data length.

Follow this link for more information about this algorithm.

To properly test this algorithm, it would be good to search for a string which contains repeated subsequences, such as alfalfa and the text being searched should not only include a match but that match should be preceded by words which partially match, such as alfredo, behalf, calfskin, halfhearted, malfunction or severalfold.

11l

Translation of: Python
V ALPHABET_SIZE = 256

F alphabet_index(Char c) -> Int
   ‘Return the index of the given ASCII character.’
   V val = c.code
   assert(Int(val) C 0 .< :ALPHABET_SIZE)
   R val

F match_length(String s, Int =idx1, Int =idx2) -> Int
   ‘Return the length of the match of the substrings of S beginning at idx1 and idx2.’
   I idx1 == idx2
      R s.len - idx1
   V match_count = 0
   L idx1 < s.len & idx2 < s.len & s[idx1] == s[idx2]
      match_count++
      idx1++
      idx2++
   R match_count

F fundamental_preprocess(String s) -> [Int]
   ‘Return Z, the Fundamental Preprocessing of S.

    Z[i] is the length of the substring beginning at i which is also a prefix of S.
    This pre-processing is done in O(n) time, where n is the length of S.’
   I s.empty
      R []
   I s.len == 1
      R [1]
   V z = [0] * s.len
   z[0] = s.len
   z[1] = match_length(s, 0, 1)
   L(i) 2 .< 1 + z[1]
      z[i] = z[1] - i + 1
   V l = 0
   V r = 0
   L(i) 2 + z[1] .< s.len
      I i <= r
         V k = i - l
         V b = z[k]
         V a = r - i + 1
         I b < a
            z[i] = b
         E
            z[i] = a + match_length(s, a, r + 1)
            l = i
            r = i + z[i] - 1
      E
         z[i] = match_length(s, 0, i)
         I z[i] > 0
            l = i
            r = i + z[i] - 1
   R z

F bad_character_table(String s) -> [[Int]]
   ‘
    Generates R for S, which is an array indexed by the position of some character c in the
    ASCII table. At that index in R is an array of length |S|+1, specifying for each
    index i in S (plus the index after S) the next location of character c encountered when
    traversing S from right to left starting at i. This is used for a constant-time lookup
    for the bad character rule in the Boyer-Moore string search algorithm, although it has
    a much larger size than non-constant-time solutions.
   ’
   I s.empty
      R [[Int]()] * :ALPHABET_SIZE
   V r = [[-1]] * :ALPHABET_SIZE
   V alpha = [-1] * :ALPHABET_SIZE
   L(c) s
      alpha[alphabet_index(c)] = L.index
      L(a) alpha
         r[L.index].append(a)
   R r

F good_suffix_table(String s) -> [Int]
   ‘
    Generates L for S, an array used in the implementation of the strong good suffix rule.
    L[i] = k, the largest position in S such that S[i:] (the suffix of S starting at i) matches
    a suffix of S[:k] (a substring in S ending at k). Used in Boyer-Moore, L gives an amount to
    shift P relative to T such that no instances of P in T are skipped and a suffix of P[:L[i]]
    matches the substring of T matched by a suffix of P in the previous match attempt.
    Specifically, if the mismatch took place at position i-1 in P, the shift magnitude is given
    by the equation len(P) - L[i]. In the case that L[i] = -1, the full shift table is used.
    Since only proper suffixes matter, L[0] = -1.
   ’
   V l = [-1] * s.len
   V _n_ = fundamental_preprocess(reversed(s))
   _n_.reverse()
   L(j) 0 .< s.len - 1
      V i = s.len - _n_[j]
      I i != s.len
         l[i] = j
   R l

F full_shift_table(String s) -> [Int]
   ‘
    Generates F for S, an array used in a special case of the good suffix rule in the Boyer-Moore
    string search algorithm. F[i] is the length of the longest suffix of S[i:] that is also a
    prefix of S. In the cases it is used, the shift magnitude of the pattern P relative to the
    text T is len(P) - F[i] for a mismatch occurring at i-1.
   ’
   V f = [0] * s.len
   V Z = fundamental_preprocess(s)
   V longest = 0
   L(zv) reversed(Z)
      V i = L.index
      longest = I zv == i + 1 {max(zv, longest)} E longest
      f[(len)-i - 1] = longest
   R f

F string_search(P, _t_) -> [Int]
   ‘
    Implementation of the Boyer-Moore string search algorithm. This finds all occurrences of P
    in T, and incorporates numerous ways of pre-processing the pattern to determine the optimal
    amount to shift the string and skip comparisons. In practice it runs in O(m) (and even
    sublinear) time, where m is the length of T. This implementation performs a case-sensitive
    search on ASCII characters. P must be ASCII as well.
   ’
   I P.empty | _t_.empty | _t_.len < P.len
      R []

   [Int] matches

   V r = bad_character_table(P)
   V l = good_suffix_table(P)
   V f = full_shift_table(P)

   V k = P.len - 1
   V previous_k = -1
   L k < _t_.len
      V i = P.len - 1
      V h = k
      L i >= 0 & h > previous_k & P[i] == _t_[h]
         i--
         h--
      I i == -1 | h == previous_k
         matches.append(k - P.len + 1)
         k += I P.len > 1 {P.len - f[1]} E 1
      E
         V char_shift = i - r[alphabet_index(_t_[h])][i]
         Int suffix_shift
         I i + 1 == P.len
            suffix_shift = 1
         E I l[i + 1] == -1
            suffix_shift = P.len - f[i + 1]
         E
            suffix_shift = P.len - 1 - l[i + 1]
         V shift = max(char_shift, suffix_shift)
         previous_k = I shift >= i + 1 {k} E previous_k
         k += shift
   R matches

V TEXT1 = ‘InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented’
V TEXT2 = ‘Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.’
V (PAT1, PAT2, PAT3) = (‘put’, ‘and’, ‘alfalfa’)

print(‘Found ’PAT1‘ at: ’string_search(PAT1, TEXT1))
print(‘Found ’PAT2‘ at: ’string_search(PAT2, TEXT1))
print(‘Found ’PAT3‘ at: ’string_search(PAT3, TEXT2))
Output:
Found put at: [26, 90]
Found and at: [101, 128, 171]
Found alfalfa at: [33, 87]

C++

#include <algorithm>
#include <cstdint>
#include <experimental/iterator>
#include <functional>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

void display(const std::vector<int32_t>& numbers) {
	std::cout << "[";
    std::copy(std::begin(numbers), std::end(numbers), std::experimental::make_ostream_joiner(std::cout, ", "));
    std::cout << "]" << std::endl;
}

int32_t string_search_single(const std::string& haystack, const std::string& needle) {
	auto it = std::search(haystack.begin(), haystack.end(), std::boyer_moore_searcher(needle.begin(), needle.end()));

	if ( it != haystack.end() ) {
		return std::distance(haystack.begin(), it);
	}
	return -1;
}

std::vector<int32_t> string_search(const std::string& haystack, const std::string& needle) {
	std::vector<int32_t> result = {};
	uint64_t start = 0;
	int32_t index = 0;
	while ( index >= 0 && start < haystack.length() ) {
		std::string haystackReduced = haystack.substr(start);
		index = string_search_single(haystackReduced, needle);
		if ( index >= 0 ) {
			result.push_back(start + index);
			start += index + needle.length();
		}
	}
	return result;
}

int main() {
	const std::vector<std::string> texts = {
		"GCTAGCTCTACGAGTCTA",
		"GGCTATAATGCGTA",
		"there would have been a time for such a word",
		"needle need noodle needle",
		"DKnuthusesandprogramsanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguages",
		"Nearby farms grew an acre of alfalfa on the dairy's behalf, with bales of that alfalfa exchanged for milk."
	};

	const std::vector<std::string> patterns = { "TCTA", "TAATAAA", "word", "needle", "and", "alfalfa" };

	for ( uint64_t i = 0; i < texts.size(); ++i ) {
		std::cout << "text" << ( i + 1 ) << " = " << texts[i] << std::endl;
	}
	std::cout << std::endl;

	for ( uint64_t i = 0; i < texts.size(); ++i ) {
        std::vector<int32_t> indexes = string_search(texts[i], patterns[i]);
		std::cout << "Found " << std::quoted(patterns[i]) << " in 'text" << ( i + 1 ) << "' at indexes ";
		display(string_search(texts[i], patterns[i]));
	}
}
Output:
text1 = GCTAGCTCTACGAGTCTA
text2 = GGCTATAATGCGTA
text3 = there would have been a time for such a word
text4 = needle need noodle needle
text5 = DKnuthusesandprogramsanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguages
text6 = Nearby farms grew an acre of alfalfa on the dairy's behalf, with bales of that alfalfa exchanged for milk.

Found "TCTA" in 'text1' at indexes [6, 14]
Found "TAATAAA" in 'text2' at indexes []
Found "word" in 'text3' at indexes [40]
Found "needle" in 'text4' at indexes [0, 19]
Found "and" in 'text5' at indexes [10, 46, 73]
Found "alfalfa" in 'text6' at indexes [29, 79]

Emacs Lisp

(defun bm_compile_pattern (pattern)
  "Compile the pattern to a right most position map"
  (let ((patLen (length pattern))
        (rightMap (make-vector 256 -1))
        (j -1))
    (while (> patLen (setq j (1+ j)))
      (aset rightMap (elt pattern j) j) )
    rightMap
    )
  )
;;
(defun bm_make_suffix_table (text)
  (let ((suffix-table (make-vector (length text) -1)) (textLen (length text))
	(suffix-found nil)
	)
    (cl-loop for pos from (1- textLen) downto 1 do
	     (setq suffix-found nil)
	     (cl-loop for ptn from (- textLen 2) downto 0 while (not suffix-found) do
		      (let ((start1 pos) (end1 (1- textLen))
			    (start2 (- ptn (- (1- textLen) pos))) (end2 ptn)
			    (matched 't)
			    )
			(if (< start2 0) (setq start2 0))
			(cl-loop for idx1 from end1 downto start1 and idx2 from end2 downto start2 while matched do
				 (if (/= (elt text idx1) (elt text idx2))
				     (setq matched nil))
				 )
			(when matched
			  (aset suffix-table pos start2)
			  (setq suffix-found 't) )
			)
		      )
	     )
    suffix-table
    )
  )
;;
;;
(defun bm_substring_search (pattern text)
  "Boyer-Moore string search"
  (let ((patLen (length pattern))
        (txtLen (length text))
        (startPos 0)
        (result nil)
        (rightMap (bm_compile_pattern pattern))
	(suffixTable (bm_make_suffix_table pattern)))
    ;; Continue this loop when no result and not exceed the text length
    (while (and (not result) (<= (+ startPos patLen) txtLen))

      (let ((idx patLen)
	    (suffixSkip 0)
	    (badCharSkip 0)
            (skip 0))
        (while (and (= 0 skip) (<= 0 (setq idx (1- idx))))
	  (setq suffixSkip 0)
	  (setq badCharSkip 0)
          ;; skip when the character at position idx is different
          (when (/= (elt pattern idx) (elt text (+ startPos idx)))
	    (when (< idx (1- (length pattern)))
	      (setq suffixSkip (aref suffixTable (1+ idx))) )
	    (setq badCharSkip (- idx (aref rightMap (elt text (+ startPos idx)))))
            ;; looking up the right most position in pattern
	    
            (setq skip (max 1 badCharSkip suffixSkip))
            )
          )
        (if (< 0 skip)
            (setq startPos (+ startPos skip))
          (setq result startPos)
          )
        )
      )
    result
    )
  )
;;
(let ((pattern "alfalfa")
      (full_text "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk."))
  (bm_substring_search pattern full_text) )

)

outputs:

33

Fortran

       module bm_str
       implicit none
       private
       public  ::  bmstr
!
! PARAMETER definitions
!
       integer , private , parameter  ::  NO_OF_CHARS = 256, SIZEX=256
       contains
       pure subroutine badcharheuristic(Str,M,Badchar)
       implicit none
! Dummy arguments
!
       integer  ::  M
       integer , dimension(NO_OF_CHARS)  ::  Badchar
       character(1) , dimension(M)  ::  Str
       intent (in) M , Str
       intent (out) Badchar
!
! Local variables
!
       integer  ::  i
! Code starts here
        Badchar(NO_OF_CHARS) = -1 
       do i = 1 , M
           Badchar(iachar(Str(i))) = i
       enddo
       return
       end subroutine badcharheuristic
 
        function bmstr(Pat,M,Str,N) result(found)
       implicit none
!
! Dummy arguments
!
       integer  ::  M
       integer  ::  N
       character(len=m)   ::  Pat
       character(len=n)   ::  Str
       intent (in) M , N , Pat , Str
!
! Local variables
!
       integer , dimension(NO_OF_CHARS)  ::  badchar
       integer  ::  found
       integer  ::  i
       integer  ::  j
       integer  ::  s
! Code starts here
!
       found = -1
       if ( (M==0) .OR. (N==0) .OR. (M>N) ) return
       badchar = 0
       call badcharheuristic(Pat,M,badchar)
       i = 0
       s = 0
       do while ( s<=(N-M) )
           j = M
            do j = m,1,-1
               if((Pat(j:j) /= Str(s+j:s+j)) )exit ! Leave, the pattern doesn't match.
           enddo
           if ( j < 1 ) then ! Found, let's leave
               found = s + 1
               return
           endif
           i = badchar(iachar(Str(s+j:s+j)))
           s = s + MAX(1,j-i)
       enddo
       found = -1
       return
       end function bmstr
       end module bm_str

FreeBASIC

#define max(a, b) iif((a) > (b), (a), (b))

Dim Shared As Integer suff(), bmBc(), bmGs()

Sub preBmBc(pat As String)
    Dim As Integer m = Len(pat), i
    Redim bmBc(m)
    For i = 1 To m-1
        bmBc(Cint(pat[i])) = m - i
    Next 
End Sub

Sub suffixes(pat As String)
    Dim As Integer m = Len(pat), g = m, i, f 
    Redim suff(m)
    suff(m) = m
    For i = m-1 To 1 Step -1
        If i > g And suff(i + m - f) < i - g Then
            suff(i) = suff(i + m - f)
        Else
            If i < g Then g = i
            f = i
            While g >= 1 And pat[g] = pat[g + m - f]
                g -= 1
            Wend
            suff(i) = f - g
        End If
    Next 
End Sub

Sub preBmGs(pat As String)
    Dim As Integer m = Len(pat), j = 1, i
    Redim suff(m)
    Redim bmGs(m)
    For i = m To 1 Step -1
        If suff(i) = i Then
            While j < m - i
                If bmGs(j) = m Then bmGs(j) = m - i
                j += 1
            Wend
        End If
    Next 
    For i = 1 To m-1
        bmGs(m - suff(i)) = m - i
    Next 
End Sub

Sub BM(pat As String, s As String, case_insensitive As Boolean = False)
    Dim As String pins = "'" & pat & "' in " & "'" & s & "'"
    If case_insensitive Then
        pat = Lcase(pat)
        s = Lcase(s) 
    End If
    
    ' Preprocessing
    preBmGs(pat)
    preBmBc(pat)
    
    ' Searching
    Dim As Integer j = 0, n = Len(s), m = Len(pat), i = m
    
    While j <= n - m
        i -= 1
        If pat[i] <> s[i+j] Then Exit While
        j += Iif(i < 1, bmGs(0), max(bmGs(i), bmBc(Len(s[i+j]) - m + i)))
    Wend
    
    Dim As Integer many = Instr(s, pat)
    Dim As String tmp = ""
    
    If Not many > 0 Then
        Print "No "; pins
    Else
        Do While many > 0    'if not found loop will be skipped
            tmp &= Str(many) & ","
            many = Instr(many + 1, s, pat)
        Loop
        Print Using "Found & at indices [&]"; pins; tmp & Chr(8)
    End If
End Sub

BM("GCAGAGAG","GCATCGCAGAGAGTATACAGTACG")
BM("TCTA","GCTAGCTCTACGAGTCTA")
BM("TAATAAA","GGCTATAATGCGTA")
BM("word","there would have been a time for such a word")
BM("needle","needle need noodle needle")
Const book = "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesley" + _ 
"DKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeand" + _ 
"assemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented"
BM("put",book)
BM("and",book)
Const farm = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with " + _ 
"bales of all that alfalfa exchanged for milk."
BM("alfalfa",farm)

Sleep
Output:
Found 'GCAGAGAG' in 'GCATCGCAGAGAGTATACAGTACG' at indices [6]
Found 'TCTA' in 'GCTAGCTCTACGAGTCTA' at indices [7,15]
No 'TAATAAA' in 'GGCTATAATGCGTA'
Found 'word' in 'there would have been a time for such a word' at indices [41]
Found 'needle' in 'needle need noodle needle' at indices [1,20]
Found 'put' in 'InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented' at indices [27,91]
Found 'and' in InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented' at indices [102,129,172]
Found 'alfalfa' in 'Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.' at indices [34,88]

J

Translation of: Emacs Lisp
bmsearch0=: {{
   startPos=. 0
   rightMap=. (i.#x) (3 u: x)} 256#_1
   while. (#y) >: (#x)+startPos do.
     skip=. 0
     idx=. #x
     while. 0 <: idx=. idx-1 do.
       if. (idx{x) ~: (startPos+idx) { y do.
         skip=. 1 >. idx - rightMap {~ 3 u: y {~ startPos + idx
         break.
       end.
     end.
     if. skip do.
       startPos=. startPos+skip
     else.
       startPos return.
     end.
   end.
   ''
}}

Example use:

   'alfalfa' bmsearch0 'On behalf of an alfredo calfskin malfunction we donate alfalfa.'
55

Caution: bmsearch0 is an incomplete implementation of the algorithm. See talk page.

Here's a complete implementation of the algorithm:

Z=: {{ y {{ 1 i.~y~:(#y){.m }}\.y }}

bmsearch1=: {{
  mx=. <:nx=. #x
  my=. <:ny=. #y
  R=. |:>./\_1,(i.@# ([`]`((256#_1)"0))}&> 3&u:) x
  L=.{{ 
    j=.I.*y
    j ((#y)-y{~j)} _1#~#y }} Z&.|. x
  F=. >./\.(*]=#-i.@#) Z x
  k=. mx
  k0=. _1
  while. k < ny do.
    i=. mx
    h=. k
    while. (0<:i) * (k0<h) * (i{x) = h{y do.
      i=. i-1
      h=. h-1
    end.
    if. (_1=i)+.k0=h do.
      1+k-nx return.
    else.
      if. mx=i do.
        suffix_shift=. 1
      elseif. _1=L{~i1=. i+1 do.
        suffix_shift=. nx-F{~i1
      else.
        suffix_shift=. mx-L{~i1
      end.
      shift=. suffix_shift >. i-R{~<(3 u: h{y),i
      if. shift > i do. k0=. k end.
      k=. k + shift
    end.
  end.
  EMPTY
}}
   'alfalfa' bmsearch1 'On behalf of an alfredo calfskin malfunction we donate alfalfa.'
55

Testing performance on a relatively long text (and pattern):

   text=: 'acgt'{~?.1e7#4
   pat=. text{~9e6+i.1e5
   pat bmsearch0 text
9000000
   pat bmsearch1 text
9000000
   'pat bmsearch0 text' %&timex 'pat bmsearch1 text'
2.33477

In other words: bmsearch0 takes slightly over twice as long as bmsearch1 for a somewhat randomly picked example search (though the results are the same).

Java

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * An implementation of the Boyer-Moore string search algorithm. 
 * It finds all occurrences of a pattern in a text, performing a case-insensitive search on ASCII characters.
 * 
 * For all full description of the algorithm visit:
 * https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm
 */
public final class BoyerMooreStringSearch {
	
	public static void main(String[] aArgs) {		
		List<String> texts = List.of(
			"GCTAGCTCTACGAGTCTA",
		    "GGCTATAATGCGTA",
		    "there would have been a time for such a word",
		    "needle need noodle needle",
		    "DKnuthusesandshowsanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustrate",
		    "Farms grew an acre of alfalfa on the dairy's behalf, with bales of that alfalfa exchanged for milk.");
		         
		List<String> patterns = List.of( "TCTA", "TAATAAA", "word", "needle", "put", "and", "alfalfa" );
		
		for ( int i = 0; i < texts.size(); i++ ) {
			System.out.println("text" + ( i + 1 ) + " = " + texts.get(i));
		}
		System.out.println();
		
		for ( int i = 0; i < patterns.size(); i++ ) {
			final int j = ( i < 5 ) ? i : i - 1;		
			System.out.println("Found \"" + patterns.get(i) + "\" in 'text" + ( j + 1 ) + "' at indexes "
				+ stringSearch(texts.get(j), patterns.get(i)));
		}
    }
	
	/**
	 * Return a list of indexes at which the given pattern matches the given text.
	 */	
	private static List<Integer> stringSearch(String aText, String aPattern) {	    
	    if ( aPattern.isEmpty() || aText.isEmpty() || aText.length() < aPattern.length() ) {
	        return Collections.emptyList();
	    }

	    List<Integer> matches = new ArrayList<Integer>();

	    // Preprocessing
	    List<List<Integer>> R = badCharacterTable(aPattern);
	    int[] L = goodSuffixTable(aPattern);
	    int[] F = fullShiftTable(aPattern);

	    int k = aPattern.length() - 1; // Represents the alignment of the end of aPattern relative to aText
	    int previousK = -1; // Represents the above alignment in the previous phase
	    while ( k < aText.length() ) {
	        int i = aPattern.length() - 1; // Index of the character to compare in aPattern
	        int h = k; // Index of the character to compare in aText
	        while ( i >= 0 && h > previousK && aPattern.charAt(i) == aText.charAt(h) ) {
	            i -= 1;
	            h -= 1;
	        }
	        if ( i == -1 || h == previousK ) { // Match has been found
	            matches.add(k - aPattern.length() + 1);
	            k += ( aPattern.length() > 1 ) ? aPattern.length() - F[1] : 1;
	        } else { // No match, shift by the maximum of the bad character and good suffix rules
	        	int suffixShift;
	            int charShift = i - R.get(alphabetIndex(aText.charAt(h))).get(i);
	            if ( i + 1 == aPattern.length() ) { // Mismatch occurred on the first character
	                suffixShift = 1;
	            } else if ( L[i + 1] == -1 ) { // Matched suffix does not appear anywhere in aPattern
	                suffixShift = aPattern.length() - F[i + 1];
	            } else { // Matched suffix appears in aPattern
	                suffixShift = aPattern.length() - 1 - L[i + 1];
	            }
	            int shift = Math.max(charShift, suffixShift);
	            if ( shift >= i + 1 ) { // Galil's rule
	            	previousK = k;
	            }
	            k += shift;
	        }
	    }
	    return matches;
	}
	
    /**
     * Create the shift table, F, for the given text, which is an array such that
     * F[i] is the length of the longest suffix of, aText.substring(i), which is also a prefix of the given text.
     * 
     * Use case: If a mismatch occurs at character index i - 1 in the pattern,
     * the magnitude of the shift of the pattern, P, relative to the text is: P.length() - F[i].
     */ 
	private static int[] fullShiftTable(String aText) {	    
	    int[] F = new int[aText.length()];
	    int[] Z = fundamentalPreprocess(aText);
	    int longest = 0;
	    Collections.reverse(Arrays.asList(Z));
	    for ( int i = 0; i < Z.length; i++ ) {
	    	int zv = Z[i];	
	    	if ( zv == i + 1 ) {
	    		longest = Math.max(zv, longest);
	    	}
	        F[F.length - i - 1] = longest;
	    }
	    return F;
	}
	
	/**
     * Create the good suffix table, L, for the given text, which is an array such that
     * L[i] = k, is the largest index in the given text, S,
     * such that S.substring(i) matches a suffix of S.substring(0, k).
     * 
     * Use case: If a mismatch of characters occurs at index i - 1 in the pattern, P,
     * then a shift of magnitude, P.length() - L[i], is such that no instances of the pattern in the text are omitted. 
     * Furthermore, a suffix of P.substring(0, L[i]) matches the same substring of the text that was matched by a
     * suffix in the pattern on the previous matching attempt.
     * In the case that L[i] = -1, the full shift table must be used.
     */
	private static int[] goodSuffixTable(String aText) {
		int[] L = IntStream.generate( () -> -1 ).limit(aText.length()).toArray();
	    String reversed = new StringBuilder(aText).reverse().toString();
	    int[] N = fundamentalPreprocess(reversed);
	    Collections.reverse(Arrays.asList(N));
	    for ( int j = 0; j < aText.length() - 1; j++ ) {
	        int i = aText.length() - N[j];
	        if ( i != aText.length() ) {
	            L[i] = j;
	        }
	    }
	    return L;
	}
	
	/**
     * Create the bad character table, R, for the given text,
     * which is a list indexed by the ASCII value of a character, C, in the given text.
     * 
     * Use case: The entry at index i of R is a list of size: 1 + length of the given text.
     * This inner list gives at each index j the next position at which character C will be found
     * while traversing the given text from right to left starting from index j.
     */
	private static List<List<Integer>> badCharacterTable(String aText) {	    
	    if ( aText.isEmpty() ) {
	        return Collections.emptyList();
	    }
	    
	    List<List<Integer>> R = IntStream.range(0, ALPHABET_SIZE).boxed()
	    	.map( i -> new ArrayList<Integer>(Collections.nCopies(1,-1)) ).collect(Collectors.toList());
	    List<Integer> alpha = new ArrayList<Integer>(Collections.nCopies(ALPHABET_SIZE, -1));
	    
	    for ( int i = 0; i < aText.length(); i++ ) {
	        alpha.set(alphabetIndex(aText.charAt(i)), i);
	        for ( int j = 0; j < alpha.size(); j++ ) {
	            R.get(j).add(alpha.get(j));
	        }
	    }
	    return R;
	}
	
	/**
	 * Create the fundamental preprocess, Z, of the given text, which is an array such that
     * Z[i] is the length of the substring of the given text beginning at index i which is also a prefix of the text.
     */
	private static int[] fundamentalPreprocess(String aText) {	    
	    if ( aText.isEmpty() ) {
	        return new int[0];
	    }
	    if ( aText.length() == 1 ) {
	        return new int[] { 1 };
	    }
	    
	    int[] Z = new int[aText.length()];	
	    Z[0] = aText.length();
	    Z[1] = matchLength(aText, 0, 1);
	    for ( int i = 2; i <= Z[1]; i++ ) {  
	        Z[i] = Z[1] - i + 1;
	    }
	    
	    // Define the left and right limits of the z-box
	    int left = 0;
	    int right = 0;
	    for ( int i = 2 + Z[1]; i < aText.length(); i++ ) {
	        if ( i <= right ) { // i falls within existing z-box
	            final int k = i - left;
	            final int b = Z[k];
	            final int a = right - i + 1;
	            if ( b < a ) { // b ends within existing z-box
	                Z[i] = b;
	            } else { // b ends at or after the end of the z-box,
	            		 // an explicit match to the right of the z-box is required
	                Z[i] = a + matchLength(aText, a, right + 1);
	                left = i;
	                right = i + Z[i] - 1;
	            }
	        } else { // i does not fall within existing z-box
	            Z[i] = matchLength(aText, 0, i);
	            if ( Z[i] > 0 ) {
	                left = i;
	                right = i + Z[i] - 1;
	            }
	        }
	    }	    
	    return Z;
	}
	
	/**
	 * Return the length of the match of the two substrings of the given text beginning at each of the given indexes.
	 */ 
	private static int matchLength(String aText, int aIndexOne, int aIndexTwo) {	    
	    if ( aIndexOne == aIndexTwo ) {
	        return aText.length() - aIndexOne;
	    }
	    
	    int matchCount = 0;
	    while ( aIndexOne < aText.length() && aIndexTwo < aText.length() 
	    	    && aText.charAt(aIndexOne) == aText.charAt(aIndexTwo) ) {
	        matchCount += 1;
	        aIndexOne += 1;
	        aIndexTwo += 1;
	    }
	    return matchCount;
	}	
	
	/**
	 * Return the ASCII index of the given character, if it is such, otherwise throw an illegal argument exception.
	 */ 
	private static int alphabetIndex(char aChar) {		
		final int result = (int) aChar;
		if ( result >= ALPHABET_SIZE ) {
			throw new IllegalArgumentException("Not an ASCII character:" + aChar);
		}
		return result;
	} 	

	/* The range of ASCII characters is 0..255, both inclusive. */
	private static final int ALPHABET_SIZE = 256;
	
}
text1 = GCTAGCTCTACGAGTCTA
text2 = GGCTATAATGCGTA
text3 = there would have been a time for such a word
text4 = needle need noodle needle
text5 = DKnuthusesandshowsanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustrate
text6 = Farms grew an acre of alfalfa on the dairy's behalf, with bales of that alfalfa exchanged for milk.

Found "TCTA" in 'text1' at indexes [6, 14]
Found "TAATAAA" in 'text2' at indexes []
Found "word" in 'text3' at indexes [40]
Found "needle" in 'text4' at indexes [0, 19]
Found "put" in 'text5' at indexes [32]
Found "and" in 'text5' at indexes [10, 43, 70]
Found "alfalfa" in 'text6' at indexes [22, 72]

Julia

Some of this code is derived from Python code by Ben Langmead. Because of that ancestry, it returns results that are 0-based in indexing, though Julia native 1-based indexing is used within the individual functions. The theorem references in the comments are to the definitive work of Daniel M. Gusfield, "Algorithms on Strings, Trees, and Sequences", 1997, available online as a pdf in several publicly indexed archives on the internet.

""" Rosetta Code task at rosettacode.org/mw/index.php?title=Boyer-Moore_string_search """

const ASCIIchars = String([Char(i) for i = 0:255]) # any standard or extended ASCII char
const NAChars = String(['A', 'C', 'G', 'T', 'U', 'N'])  # RNA and DNA bases
const lowercase_alphabet = "abcdefghijklmnopqrstuvwxyz " # a-z and space

""" Use Z algorithm to preprocess s """
function z_array(s)
    len = length(s)
    @assert len > 1
    z = [len; zeros(Int, len - 1)]
    # Initial comparison of s[1:] with prefix
    for i = 1:len-1
        if s[i+1] == s[i]
            z[2] += 1
        else
            break
        end
    end
    r, l = 0, 0
    if z[2] > 0
        r, l = z[2], 1
    end
    for k = 2:len-1
        @assert z[k+1] == 0
        if k > r
            # Case 1
            for i = k:len-1
                if s[i+1] == s[i-k+1]
                    z[k+1] += 1
                else
                    break
                end
            end
            r, l = k + z[k+1] - 1, k
        else
            # Case 2
            # Calculate length of beta
            nbeta = r - k + 1
            zkp = z[k-l+1]
            if nbeta > zkp
                # Case 2a: Zkp wins
                z[k+1] = zkp
            else
                # Case 2b: Compare characters just past r
                nmatch = 0
                for i = r+1:len-1
                    if s[i+1] == s[i-k+1]
                        nmatch += 1
                    else
                        break
                    end
                end
                l, r = k, r + nmatch
                z[k+1] = r - k + 1
            end
        end
    end
    return z
end

""" Compile the N array (Gusfield theorem 2.2.2) from the Z array """
function n_array(s)
    return reverse(z_array(reverse(s)))
end

"""
Compile L' array (Gusfield theorem 2.2.2) using p and N array.
L'[i] = largest index j less than n such that N[j] = |P[i:]|
"""
function big_l_prime_array(p, n)
    len = length(p)
    lp = zeros(Int, len)
    for j = 1:len-1
        i = len - n[j]
        if i < len
            lp[i+1] = j + 1
        end
    end
    return lp
end

"""
Compile L array (Gusfield theorem 2.2.2) using p and L' array.
L[i] = largest index j less than n such that N[j] >= |P[i:]|
"""
function big_l_array(p, lp)
    l = zeros(Int, length(p))
    l[2] = lp[2]
    for i = 3:length(p)
        l[i] = max(l[i-1], lp[i])
    end
    return l
end

""" Compile lp' array (Gusfield theorem 2.2.4) using N array. """
function small_l_prime_array(narray)
    len = length(narray)
    small_lp = zeros(Int, len)
    for i in eachindex(narray)
        if narray[i] == i  # prefix matching a suffix
            small_lp[len-i+1] = i
        end
    end
    for i = len-1:-1:1  # "smear" them out to the left
        if small_lp[i] == 0
            small_lp[i] = small_lp[i+1]
        end
    end
    return small_lp
end

""" Return tables needed to apply good suffix rule. """
function good_suffix_table(p)
    n = n_array(p)
    lp = big_l_prime_array(p, n)
    return lp, big_l_array(p, lp), small_l_prime_array(n)
end

"""
Given a mismatch at offset i, and given L/L' and l' arrays,
return amount to shift as determined by good suffix rule.
"""
function good_suffix_mismatch(i, big_l_prime, small_l_prime)
    len = length(big_l_prime)
    @assert i < len
    if i == len - 1
        return 0
    end
    i += 1  # i points to leftmost matching position of P
    if big_l_prime[i] > 0
        return len - big_l_prime[i]
    end
    return len - small_l_prime[i]
end

""" Given a full match of P to T, return amount to shift as determined by good suffix rule. """
good_suffix_match(small_l_prime) = length(small_l_prime) - small_l_prime[2]

"""
Given pattern string and list with ordered alphabet characters, create and return
a dense bad character table.  Table is indexed by offset then by character position in alphabet.
"""
function dense_bad_char_tab(p, amap)
    tab = Vector{Int}[]
    nxt = zeros(Int, length(amap))
    for i in eachindex(p)
        c = p[i]
        @assert haskey(amap, c)
        push!(tab, nxt[:])
        nxt[amap[c]] = i
    end
    return tab
end

""" Encapsulates pattern and associated Boyer-Moore preprocessing. """
struct BoyerMoore
    pat::String
    alphabet::String
    amap::Dict{Char,Int}
    bad_char::Vector{Vector{Int}}
    big_l::Vector{Int}
    small_l_prime::Vector{Int}
end

function BoyerMoore(p, alphabet = "ACGT")
    # Create map from alphabet characters to integers
    amap = Dict(alphabet[i] => i for i in eachindex(alphabet))
    # Make bad character rule table
    bad_char = dense_bad_char_tab(p, amap)
    # Create good suffix rule table
    _, big_l, small_l_prime = good_suffix_table(p)
    return BoyerMoore(p, alphabet, amap, bad_char, big_l, small_l_prime)
end

""" Return # skips given by bad character rule at offset i """
function bad_character_rule(bm, i, c)
    @assert haskey(bm.amap, c)
    ci = bm.amap[c]
    @assert i > bm.bad_char[i+1][ci] - 1
    return i - (bm.bad_char[i+1][ci] - 1)
end

""" Given a mismatch at offset i, return amount to shift per (weak) good suffix rule. """
function good_suffix_rule(bm, i)
    len = length(bm.big_l)
    @assert i < len
    if i == len - 1
        return 0
    end
    i += 1  # i points to leftmost matching position of P
    if bm.big_l[i+1] > 0
        return len - bm.big_l[i+1]
    end
    return len - bm.small_l_prime[i+1]
end

""" Return amount to shift in case where P matches T """
match_skip(bm) = length(bm.small_l_prime) - bm.small_l_prime[2]

#Let's make sure our rules give the expected results.
# GCTAGCTCTACGAGTCTA
p = "TCAA"
p_bm = BoyerMoore(p, "ACGT")
@show p_bm.amap, p_bm.bad_char
@show bad_character_rule(p_bm, 2, 'T') # 2

# GCTAGCTCTACGAGTCTA
# ACTA
p = "ACTA"
p_bm = BoyerMoore(p, "ACGT")
@show good_suffix_rule(p_bm, 0) # 3

# ACACGCTCTACGAGTCTA
# ACAC
p = "ACAC"
p_bm = BoyerMoore(p, "ACGT")
@show match_skip(p_bm) # 2

""" Do Boyer-Moore matching """
function boyer_moore(p, p_bm, t)
    i = 0
    occurrences = Int[]
    while i < length(t) - length(p) + 1
        shift = 1
        mismatched = false
        for j = length(p)-1:-1:0
            if p[j+1] != t[i+j+1]
                skip_bc = bad_character_rule(p_bm, j, t[i+j+1])
                skip_gs = good_suffix_rule(p_bm, j)
                shift = max(shift, skip_bc, skip_gs)
                mismatched = true
                break
            end
        end
        if !mismatched
            push!(occurrences, i)
            skip_gs = match_skip(p_bm)
            shift = max(shift, skip_gs)
        end
        i += shift
    end
    return occurrences
end

""" Do Boyer-Moore matching counts """
function boyer_moore_with_counts(p, p_bm, t)
    i = 0
    occurrences = Int[]
    alignments_tried, comparison_count = 0, 0
    while i < length(t) - length(p) + 1
        alignments_tried += 1
        shift = 1
        mismatched = false
        for j = length(p)-1:-1:0
            comparison_count += 1
            if p[j+1] != t[i+j+1]
                skip_bc = bad_character_rule(p_bm, j, t[i+j+1])
                skip_gs = good_suffix_rule(p_bm, j)
                shift = max(shift, skip_bc, skip_gs)
                mismatched = true
                break
            end
        end
        if !mismatched
            push!(occurrences, i)
            skip_gs = match_skip(p_bm)
            shift = max(shift, skip_gs)
        end
        i += shift
    end
    return occurrences, alignments_tried, comparison_count
end

const t1 = "GCTAGCTCTACGAGTCTA"
const p1 = "TCTA"
const p_bm1 = BoyerMoore(p1, "ACGT")
@show boyer_moore(p1, p_bm1, t1)

const t2 = "GGCTATAATGCGTA"
const p2 = "TAATAAA"
const p_bm2 = BoyerMoore(p2, "ACGT")
@show bad_character_rule(p_bm2, 1, 'T')

const p3 = "word"
const t3 = "there would have been a time for such a word"
const p_bm3 = BoyerMoore(p3, lowercase_alphabet)
occurrences, num_alignments, num_character_comparisons = boyer_moore_with_counts(p3, p_bm3, t3)
@show occurrences, num_alignments, num_character_comparisons

const p4 = "needle"
const t4 = "needle need noodle needle"
const p_bm4 = BoyerMoore(p4, lowercase_alphabet)
occurrences, num_alignments, num_character_comparisons = boyer_moore_with_counts(p4, p_bm4, t4)
@show occurrences, num_alignments, num_character_comparisons

const p5 = "put"
const t5 = "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented"
const p_bm5 = BoyerMoore(p5, ASCIIchars)
occurrences, num_alignments, num_character_comparisons = boyer_moore_with_counts(p5, p_bm5, t5)
@show occurrences, num_alignments, num_character_comparisons

p6 = "and"
p_bm6 = BoyerMoore(p6, ASCIIchars)
occurrences, num_alignments, num_character_comparisons = boyer_moore_with_counts(p6, p_bm6, t5)
@show occurrences, num_alignments, num_character_comparisons

p7 = "alfalfa"
t7 = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk."
p_bm7 = BoyerMoore(p7, ASCIIchars)
occurrences, num_alignments, num_character_comparisons = boyer_moore_with_counts(p7, p_bm7, t7)
@show occurrences, num_alignments, num_character_comparisons
Output:
(p_bm.amap, p_bm.bad_char) = (Dict('A' => 1, 'G' => 3, 'T' => 4, 'C' => 2), [[0, 0, 0, 0], [0, 0, 0, 1], [0, 2, 0, 1], [3, 2, 0, 1]])
bad_character_rule(p_bm, 2, 'T') = 2
good_suffix_rule(p_bm, 0) = 3       
match_skip(p_bm) = 2
boyer_moore(p1, p_bm1, t1) = [6, 14] 
bad_character_rule(p_bm2, 1, 'T') = 1
(occurrences, num_alignments, num_character_comparisons) = ([40], 12, 15)
(occurrences, num_alignments, num_character_comparisons) = ([0, 19], 5, 18)
(occurrences, num_alignments, num_character_comparisons) = ([26, 90], 70, 78)       
(occurrences, num_alignments, num_character_comparisons) = ([101, 128, 171], 71, 79)
(occurrences, num_alignments, num_character_comparisons) = ([33, 87], 20, 42)  

Nim

Translation of: Python
#[ This version uses ASCII for case-sensitive matching. For Unicode you may want to match in UTF-8
   bytes instead of creating a 0x10FFFF-sized table.
]#

import std/[algorithm, sequtils, strutils]

const AlphabetSize = 256

func reversed(s: string): string =
  ## Return the reverse of an ASCII string.
  for i in countdown(s.high, 0):
    result.add s[i]

proc alphabetIndex(c: char): int =
  ## Return the index of the given ASCII character.
  result = ord(c)
  assert result in 0..<ALPHABET_SIZE

proc matchLength(s: string; idx1, idx2: int): int =
  ## Return the length of the match of the substrings of "s" beginning at "idx1" and "idx2".
  if idx1 == idx2: return s.len - idx1
  var idx1 = idx1
  var idx2 = idx2
  while idx1 < s.len and idx2 < s.len and s[idx1] == s[idx2]:
    inc result
    inc idx1
    inc idx2

proc fundamentalPreprocess(s: string): seq[int] =
  ## Return "z", the Fundamental Preprocessing of "s".

  # "z[i]" is the length of the substring beginning at "i" which is also a prefix of "s".
  # This preprocessing is done in O(n) time, where n is the length of "s".
  if s.len == 0: return
  if s.len == 1: return @[1]
  result = repeat(0, s.len)
  result[0] = s.len
  result[1] = s.matchLength(0, 1)
  for i in 2..result[1]:
    result[i] = result[1] - i + 1
  # Defines lower and upper limits of z-box.
  var l, r = 0
  for i in (2 + result[1])..s.high:
    if i <= r:    # "i" falls within existing z-box.
      let k = i - l
      let b = result[k]
      let a = r - i + 1
      if b < a:   # "b" ends within existing z-box.
        result[i] = b
      else:  # "b" ends at or after the end of the z-box.
        # We need to do an explicit match to the right of the z-box.
        result[i] = a + s.matchLength(a, r + 1)
        l = i
        r = i + result[i] - 1
    else:  # "i" does not reside within existing z-box.
      result[i] = s.matchLength(0, i)
      if result[i] > 0:
        l = i
        r = i + result[i] - 1

proc badCharacterTable(s: string): seq[seq[int]] =
  ## Generates "R" for "s", which is an array indexed by the position of some character "c"
  ## in the ASCII table.

  # At that index in "R" is an array of length |s|+1, specifying for each index "i" in "s"
  # (plus the index after "s") the next location of character "c" encountered when traversing
  # "S" from right to left starting at "i". This is used for a constant-time lookup for the bad
  # character rule in the Boyer-Moore string search algorithm, although it has a much larger
  # size than non-constant-time solutions.
  if s.len == 0: return newSeqWith(AlphabetSize, newSeq[int]())
  result = repeat(@[-1], AlphabetSize)
  var alpha = repeat(-1, AlphabetSize)
  for i, c in s:
    alpha[alphabetIndex(c)] = i
    for j, a in alpha:
      result[j].add a

proc goodSuffixTable(s: string): seq[int] =
  ## Generates "L" for "s", an array used in the implementation of the strong good suffix rule.

  # "L[i] = k", the largest position in S such that "s[i:]" (the suffix of "s" starting at "i")
  # matches a suffix of "s[:k]" (a substring in "s" ending at "k"). Used in Boyer-Moore, "L"
  # gives an amount to shift "P" relative to "T" such that no instances of "P" in "T" are skipped
  # and a suffix of "P[:L[i]]" matches the substring of "T" matched by a suffix of "P" in the
  # previous match attempt.
  # Specifically, if the mismatch took place at position "i-1" in "P", the shift magnitude is
  # given by the formula "len(P) - L[i]". In the case that "L[i] = -1", the full shift table
  # is used. Since only proper suffixes matter, "L[0] = -1".
  result =repeat(-1, s.len)
  var n = fundamentalPreprocess(reversed(s))
  n.reverse()
  for j in 0..(s.len - 2):
    let i = s.len - n[j]
    if i != s.len:
      result[i] = j

proc fullShiftTable(s: string): seq[int] =
  ## Generates "F" for "s", an array used in a special case of the good suffix rule in the
  ## Boyer-Moore string search algorithm.

  # "F[i]" is the length of the longest suffix of "s[i:]" that is also a prefix of "s". In
  # the cases it is used, the shift magnitude of the pattern "P" relative to the text "T" is
  # "len(P) - F[i]" for a mismatch occurring at "i-1".
  result = repeat(0, s.len)
  let z = fundamentalPreprocess(s)
  var longest = 0
  for i, zv in reversed(z):
    if zv == i + 1:
      longest = max(zv, longest)
      result[^(i + 1)] = longest

proc stringSearch(p, t: string): seq[int] =
  ## Implementation of the Boyer-Moore string search algorithm.

  # This finds all occurrences of "p" in "t", and incorporates numerous ways of preprocessing
  # the pattern to determine the optimal amount to shift the string and skip comparisons.
  # In practice it runs in O(m) (and even sublinear) time, where "m" is the length of "t".
  # This implementation performs a case-sensitive search on ASCII characters. "p" must be
  # ASCII as well.

  if p.len == 0 or t.len == 0 or t.len < p.len: return

  # Preprocessing
  let r = badCharacterTable(p)
  let l = goodSuffixTable(p)
  let f = fullShiftTable(p)

  var k = p.len - 1   # Represents alignment of end of "p" relative to "t".
  var prevk = -1      # Represents alignment in previous phase (Galil's rule).
  while k < t.len:
    var i = p.len - 1  # Character to compare in "p".
    var h = k          # Character to compare in "t".
    while i >= 0 and h > prevk and p[i] == t[h]:  # Matches starting from end of "p".
      dec i
      dec h
    if i == -1 or h == prevk:  # Match has been found (Galil's rule).
      result.add k - p.len + 1
      inc k, (if p.len > 1: p.len - f[1] else: 1)
    else:  # No match: shift by max of bad character and good suffix rules.
      let charShift = i - r[alphabetIndex(t[h])][i]
      let suffixShift = if i + 1 == p.len:    # Mismatch happened on first attempt.
                          1
                        elif l[i + 1] == -1:  # Matched suffix does not appear anywhere in "p".
                          p.len - f[i + 1]
                        else:                 # Matched suffix appears in "p".
                          p.len - 1 - l[i + 1]
      let shift = max(charShift, suffixShift)
      if shift >= i + 1: prevk = k  # Galil's rule
      inc k, shift

const
  Text1 = "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesley" &
          "DKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassembly" &
          "languagestoillustratetheconceptsandalgorithmsastheyarepresented"
  Text2 = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, " &
          "with bales of all that alfalfa exchanged for milk."
  (Pat1, Pat2, Pat3) = ("put", "and", "alfalfa")

echo "Found '", Pat1, "' at: ", Pat1.stringSearch(Text1).join(", ")
echo "Found '", Pat2, "' at: ", Pat2.stringSearch(Text1).join(", ")
echo "Found '", Pat3, "' at: ", Pat3.stringSearch(Text2).join(", ")
Output:
Found 'put' at: 26, 90
Found 'and' at: 101, 128, 171
Found 'alfalfa' at: 33, 87

Pascal

Works with FPC (currently only version 3.3.1).

program BMTest;
{$mode objfpc}{$h+}
{$modeswitch functionreferences}
{$modeswitch anonymousfunctions}
uses
  SysUtils, Math;

type
  TSearchFun = reference to function(const s: rawbytestring): specialize TArray<SizeInt>;

{ returns a function that performs a case-sensitive search for all occurrences(1-based)
  of the specified pattern using the Boyer-Moore algorithm with Galil optimization }
function BmgCreate(const aPattern: rawbytestring): TSearchFun;
var
  BcTable: array[Byte] of Integer; //bad character shifts
  p: PByte absolute aPattern;
  procedure FillBcTable;
  var
    I: Integer;
  begin
    FillDWord(BcTable, Length(BcTable), DWord(Length(aPattern)));
    for I := 0 to Length(aPattern) - 2 do
      BcTable[p[I]] := Pred(Length(aPattern) - I);
  end;
var
  GsTable: array of Integer = nil; //good suffix shifts
  procedure MakeGsTable;
    function IsPrefix(aPos: Integer): Boolean;
    var
      I, SuffixLen: Integer;
    begin
      SuffixLen := Length(aPattern) - aPos;
      for I := 0 to Pred(SuffixLen) do
        if (p[I] <> p[aPos + I]) then exit(False);
      Result := True;
    end;
    function GetSuffixLen(aPos: Integer): Integer;
    begin
      Result := 0;
      while(Result < aPos)and(p[aPos - Result] = p[Pred(Length(aPattern) - Result)])do
        Inc(Result);
    end;
  var
    I, LastPrefix, SuffixLen: Integer;
  begin
    SetLength(GsTable, Length(aPattern));
    LastPrefix := Pred(Length(aPattern));
    for I := Pred(Length(aPattern)) downto 0 do begin
      if IsPrefix(Succ(I)) then
        LastPrefix := Succ(I);
      GsTable[I] := LastPrefix + Length(aPattern) - Succ(I);
    end;
    for I := 0 to Length(aPattern) - 2 do begin
      SuffixLen := GetSuffixLen(I);
      if p[I - SuffixLen] <> p[Pred(Length(aPattern) - SuffixLen)] then
        GsTable[Pred(Length(aPattern) - SuffixLen)] := Pred(Length(aPattern) + SuffixLen - I);
    end;
  end;
var
  Needle: rawbytestring;
begin
  if aPattern <> '' then begin
    Needle := aPattern;
    FillBcTable;
    MakeGsTable;
  end else
    Needle := '';
{ returns an empty array if there are no matches or the pattern is empty }
  Result := function(const aHaystack: rawbytestring): specialize TArray<SizeInt>
  var
    pNeedle: PByte absolute Needle;
    pHaystack: PByte absolute aHaystack;
    I, J, NeedleLast, OutPos, OldPfxEnd: SizeInt;
  const
    OutInitLen = 4;
  begin
    Result := nil;
    if (Needle = '') or (Length(aHaystack) < Length(Needle)) then exit;
    SetLength(Result, OutInitLen);
    OutPos := 0;
    NeedleLast := Pred(Length(Needle));
    I := NeedleLast;
    OldPfxEnd := 0;
    while I < Length(aHaystack) do begin
      J := NeedleLast;
      while (J >= OldPfxEnd) and (pNeedle[J] = pHaystack[I]) do begin
        Dec(J); Dec(I);
      end;
      if J < OldPfxEnd then begin
        if OutPos = Length(Result) then SetLength(Result, OutPos * 2);
        Result[OutPos] := I - OldPfxEnd + 2;
        Inc(OutPos);
        Inc(I, Succ(GsTable[0] - OldPfxEnd));
        OldPfxEnd := Length(Needle)*2 - GsTable[0];
      end else begin
        Inc(I, Max(BcTable[pHaystack[I]], GsTable[J]));
        OldPfxEnd := 0;
      end;
    end;
    SetLength(Result, OutPos);
  end;
end;

procedure WriteArray(const a: array of SizeInt);
var
  I: SizeInt;
begin
  Write('[');
  for I := 0 to High(a) do
    if I < High(a) then Write(a[I], ', ')
    else Write(a[I]);
  WriteLn(']');
end;

const
  Text1 = 'Nearby farms grew a half acre of alfalfa on the dairy''s behalf, with bales of all that alfalfa exchanged for milk';
  Text2 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
  Text3 = 'CAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGT';
  Text4 = 'AGGTGTGGAAACAAGCACCTAGATGTGCTGAACCCGGGGCACACGTTCAGTCAGCGACTC';
var
  BmgSearch: TSearchFun;
begin
  WriteArray(BmgCreate('alfalfa')(Text1));
  WriteArray(BmgCreate('aaaaaaaaaaaaaaaaaaaa')(Text2));
  BmgSearch := BmgCreate('CAGTCAG');
  WriteArray(BmgSearch(Text3));
  WriteArray(BmgSearch(Text4));
end.
Output:
[34, 88]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53]
[48]

Perl

Translation of: Raku
use v5.36;
use List::AllUtils <min max zip_by>;

sub suffixes ($m,@pat) {
   my($f, $g) = (0, $m-1);
   my @suff = (0) x $m-1; push @suff, $m;
   for (my $i = $m-2; $i >= 0; --$i) {
      if ($i > $g and $suff[$i + $m - 1 - $f] < $i - $g) {
         $suff[$i] = $suff[$i + $m - 1 - $f]
      } else {
         $g = $i if $i < $g;
         $f = $i;
         while ($g >= 0 and $pat[$g] eq $pat[$g + $m - 1 - $f]) { $g-- }
         $suff[$i] = $f - $g
      }
   }
   @suff
}

sub preBmGs ($m,@pat) {
   my @suff = suffixes($m,@pat);
   my @bmGs = ($m) x $m;

   for my $k (reverse 0..$m-2) {
      if ($suff[$k] == $k + 1) {
         for (my $j=0; $j < $m-1-$k; $j++) { $bmGs[$k]=$m-1-$k if $bmGs[$j] == $m }
      }
   }
   for (0..$m-2) { $bmGs[$m - 1 - $suff[$_]] = $m - 1 - $_ }
   @bmGs
}

sub BM ($txt,$pat) {
    my @txt = split '', $txt;
    my @pat = split '', $pat;
    my ($m, $n, $j)    = (length($pat), length($txt), 0);
    my @bmGs = preBmGs($m,@pat);

    my $x = min $m-1, $#pat;
    my %bmBc = zip_by { @_ } [@pat[0..$x-1]], [reverse 1..$x];

    my @I;
    while ($j <= $n - $m) {
        my $i = $m - 1;
        for (; $i >= 0 and $pat[$i] eq $txt[$i + $j]; ) { $i-- }
        if ($i < 0) {
            push @I, $j;
            $j += $bmGs[0]
        } else {
            $j += max $bmGs[$i], ($bmBc{$txt[$i + $j]} // $m)-$m+$i
        }
    }
    @I
}

my @texts = (
   "GCTAGCTCTACGAGTCTA",
   "GGCTATAATGCGTA",
   "there would have been a time for such a word",
   "needle need noodle needle",   "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented",
   "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.",
);
my @pats = <TCTA TAATAAA word needle put and alfalfa>;

say "text$_ = $texts[$_]" for 0..$#texts;
say '';

for (0.. $#pats) {
   my $j = $_ < 5 ? $_ : $_-1 ; # for searching text4 twice
   say "Found '$pats[$_]' in 'text$j' at indices " . join ', ', BM($texts[$j],$pats[$_]);
}
Output:
text0 = GCTAGCTCTACGAGTCTA
text1 = GGCTATAATGCGTA
text2 = there would have been a time for such a word
text3 = needle need noodle needle
text4 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented
text5 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.

Found 'TCTA' in 'text0' at indices 6, 14
Found 'TAATAAA' in 'text1' at indices
Found 'word' in 'text2' at indices 40
Found 'needle' in 'text3' at indices 0, 19
Found 'put' in 'text4' at indices 26, 90
Found 'and' in 'text4' at indices 101, 128, 171
Found 'alfalfa' in 'text5' at indices 33, 87

Phix

--
-- demo\rosetta\BoyerMoore.exw
-- ===========================
--
-- based on https://www-igm.univ-mlv.fr/~lecroq/string/node14.html
--
with javascript_semantics
function preBmBc(string pat)
    integer m = length(pat)
    sequence bmBc = repeat(m,255)
    for i=1 to m-1 do
      bmBc[pat[i]] = m - i
    end for
    return bmBc
end function
 
function suffixes(string pat)
    integer m = length(pat), g = m, f
    sequence suff = repeat(0,m)
    suff[m] = m;
    for i=m-1 to 1 by -1 do
        if i > g and suff[i + m - f] < i - g then
            suff[i] = suff[i + m - f]
        else
            if i < g then g = i end if
            f = i
            while g >= 1 and pat[g] == pat[g + m - f] do
                g -= 1;
            end while
            suff[i] = f - g
        end if
    end for
    return suff
end function
 
function preBmGs(string pat)
    integer m = length(pat), j = 1
    sequence suff = suffixes(pat),
             bmGs = repeat(m,m)
    for i=m to 1 by -1 do
        if (suff[i] == i) then
            while j < m - i do
                if (bmGs[j] == m) then
                    bmGs[j] = m - i;
                end if
                j += 1
            end while
        end if
    end for
    for i=1 to m-1 do
        bmGs[m - suff[i]] = m - i;
    end for
    return bmGs
end function
 
procedure BM(string pat, s, bool case_insensitive = false)
    string pins = sprintf("`%s` in `%s`",{pat,shorten(s,"characters",10)})
    if case_insensitive then
        pat = lower(pat)
        s = lower(s) 
    end if
    /* Preprocessing */
    sequence bmGs = preBmGs(pat),
             bmBc = preBmBc(pat)
    /* Searching */
    sequence res = {}
    integer i, j = 0,
            n = length(s),
            m = length(pat),
            cc = 0
    while j <= n - m do
        for i=m to 1 by -1 do
            cc += 1
            if pat[i]!=s[i+j] then exit end if
        end for
        if i<1 then
            res &= j+1
            j += bmGs[1];
        else
            j += max(bmGs[i], bmBc[s[i + j]] - m + i);
        end if
    end while
    string ccs = sprintf("(%d character comparisons)",cc)
    if length(res) then
        string many = ordinant(length(res))
        printf(1,"Found %s %s at %v %s\n",{pins,many,res,ccs})
    else
        printf(1,"No %s %s\n",{pins,ccs})
    end if
end procedure
   
BM("GCAGAGAG","GCATCGCAGAGAGTATACAGTACG")
BM("TCTA","GCTAGCTCTACGAGTCTA")
BM("TAATAAA","GGCTATAATGCGTA")
BM("word","there would have been a time for such a word")
BM("needle","needle need noodle needle")
constant book = "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesley"&
                "DKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeand"&
                "assemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented"
BM("put",book)
BM("and",book)
constant farm = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with "&
                "bales of all that alfalfa exchanged for milk."
BM("alfalfa",farm)
--BM("aLfAlfa",farm)        -- none
--BM("aLfAlfa",farm,true)   -- as -2
Output:

Significantly lower character comparison counts than Knuth-Morris-Pratt_string_search#Phix.

Found `GCAGAGAG` in `GCATCGCAGAGAGTATACAGTACG` once at {6} (17 character comparisons)
Found `TCTA` in `GCTAGCTCTACGAGTCTA` twice at {7,15} (14 character comparisons)
No `TAATAAA` in `GGCTATAATGCGTA` (4 character comparisons)
Found `word` in `there woul...uch a word (44 characters)` once at {41} (15 character comparisons)
Found `needle` in `needle need noodle needle` twice at {1,20} (18 character comparisons)
Found `put` in `Inhisbooks...epresented (202 characters)` twice at {27,91} (78 character comparisons)
Found `and` in `Inhisbooks...epresented (202 characters)` three times at {102,129,172} (79 character comparisons)
Found `alfalfa` in `Nearby far... for milk. (114 characters)` twice at {34,88} (42 character comparisons)

Python

Slightly modified from en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm#Python_implementation.

"""
This version uses ASCII for case-sensitive matching. For Unicode you may want to match in UTF-8
bytes instead of creating a 0x10FFFF-sized table.
"""

from typing import List

ALPHABET_SIZE = 256

def alphabet_index(c: str) -> int:
    """Return the index of the given ASCII character. """
    val = ord(c)
    assert 0 <= val < ALPHABET_SIZE
    return val

def match_length(S: str, idx1: int, idx2: int) -> int:
    """Return the length of the match of the substrings of S beginning at idx1 and idx2."""
    if idx1 == idx2:
        return len(S) - idx1
    match_count = 0
    while idx1 < len(S) and idx2 < len(S) and S[idx1] == S[idx2]:
        match_count += 1
        idx1 += 1
        idx2 += 1
    return match_count

def fundamental_preprocess(S: str) -> List[int]:
    """Return Z, the Fundamental Preprocessing of S.

    Z[i] is the length of the substring beginning at i which is also a prefix of S.
    This pre-processing is done in O(n) time, where n is the length of S.
    """
    if len(S) == 0:  # Handles case of empty string
        return []
    if len(S) == 1:  # Handles case of single-character string
        return [1]
    z = [0 for x in S]
    z[0] = len(S)
    z[1] = match_length(S, 0, 1)
    for i in range(2, 1 + z[1]):  # Optimization from exercise 1-5
        z[i] = z[1] - i + 1
    # Defines lower and upper limits of z-box
    l = 0
    r = 0
    for i in range(2 + z[1], len(S)):
        if i <= r:  # i falls within existing z-box
            k = i - l
            b = z[k]
            a = r - i + 1
            if b < a:  # b ends within existing z-box
                z[i] = b
            else:  # b ends at or after the end of the z-box, we need to do an explicit match to the right of the z-box
                z[i] = a + match_length(S, a, r + 1)
                l = i
                r = i + z[i] - 1
        else:  # i does not reside within existing z-box
            z[i] = match_length(S, 0, i)
            if z[i] > 0:
                l = i
                r = i + z[i] - 1
    return z

def bad_character_table(S: str) -> List[List[int]]:
    """
    Generates R for S, which is an array indexed by the position of some character c in the
    ASCII table. At that index in R is an array of length |S|+1, specifying for each
    index i in S (plus the index after S) the next location of character c encountered when
    traversing S from right to left starting at i. This is used for a constant-time lookup
    for the bad character rule in the Boyer-Moore string search algorithm, although it has
    a much larger size than non-constant-time solutions.
    """
    if len(S) == 0:
        return [[] for a in range(ALPHABET_SIZE)]
    R = [[-1] for a in range(ALPHABET_SIZE)]
    alpha = [-1 for a in range(ALPHABET_SIZE)]
    for i, c in enumerate(S):
        alpha[alphabet_index(c)] = i
        for j, a in enumerate(alpha):
            R[j].append(a)
    return R

def good_suffix_table(S: str) -> List[int]:
    """
    Generates L for S, an array used in the implementation of the strong good suffix rule.
    L[i] = k, the largest position in S such that S[i:] (the suffix of S starting at i) matches
    a suffix of S[:k] (a substring in S ending at k). Used in Boyer-Moore, L gives an amount to
    shift P relative to T such that no instances of P in T are skipped and a suffix of P[:L[i]]
    matches the substring of T matched by a suffix of P in the previous match attempt.
    Specifically, if the mismatch took place at position i-1 in P, the shift magnitude is given
    by the equation len(P) - L[i]. In the case that L[i] = -1, the full shift table is used.
    Since only proper suffixes matter, L[0] = -1.
    """
    L = [-1 for c in S]
    N = fundamental_preprocess(S[::-1])  # S[::-1] reverses S
    N.reverse()
    for j in range(0, len(S) - 1):
        i = len(S) - N[j]
        if i != len(S):
            L[i] = j
    return L

def full_shift_table(S: str) -> List[int]:
    """
    Generates F for S, an array used in a special case of the good suffix rule in the Boyer-Moore
    string search algorithm. F[i] is the length of the longest suffix of S[i:] that is also a
    prefix of S. In the cases it is used, the shift magnitude of the pattern P relative to the
    text T is len(P) - F[i] for a mismatch occurring at i-1.
    """
    F = [0 for c in S]
    Z = fundamental_preprocess(S)
    longest = 0
    for i, zv in enumerate(reversed(Z)):
        longest = max(zv, longest) if zv == i + 1 else longest
        F[-i - 1] = longest
    return F

def string_search(P, T) -> List[int]:
    """
    Implementation of the Boyer-Moore string search algorithm. This finds all occurrences of P
    in T, and incorporates numerous ways of pre-processing the pattern to determine the optimal
    amount to shift the string and skip comparisons. In practice it runs in O(m) (and even
    sublinear) time, where m is the length of T. This implementation performs a case-sensitive
    search on ASCII characters. P must be ASCII as well.
    """
    if len(P) == 0 or len(T) == 0 or len(T) < len(P):
        return []

    matches = []

    # Preprocessing
    R = bad_character_table(P)
    L = good_suffix_table(P)
    F = full_shift_table(P)

    k = len(P) - 1      # Represents alignment of end of P relative to T
    previous_k = -1     # Represents alignment in previous phase (Galil's rule)
    while k < len(T):
        i = len(P) - 1  # Character to compare in P
        h = k           # Character to compare in T
        while i >= 0 and h > previous_k and P[i] == T[h]:  # Matches starting from end of P
            i -= 1
            h -= 1
        if i == -1 or h == previous_k:  # Match has been found (Galil's rule)
            matches.append(k - len(P) + 1)
            k += len(P) - F[1] if len(P) > 1 else 1
        else:  # No match, shift by max of bad character and good suffix rules
            char_shift = i - R[alphabet_index(T[h])][i]
            if i + 1 == len(P):  # Mismatch happened on first attempt
                suffix_shift = 1
            elif L[i + 1] == -1:  # Matched suffix does not appear anywhere in P
                suffix_shift = len(P) - F[i + 1]
            else:               # Matched suffix appears in P
                suffix_shift = len(P) - 1 - L[i + 1]
            shift = max(char_shift, suffix_shift)
            previous_k = k if shift >= i + 1 else previous_k  # Galil's rule
            k += shift
    return matches

TEXT1 = 'InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented'
TEXT2 = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk."
PAT1, PAT2, PAT3 = 'put', 'and', 'alfalfa'

print("Found", PAT1, "at:", string_search(PAT1, TEXT1))
print("Found", PAT2, "at:", string_search(PAT2, TEXT1))
print("Found", PAT3, "at:", string_search(PAT3, TEXT2))
Output:
Found put at: [26, 90]
Found and at: [101, 128, 171]
Found alfalfa at: [33, 87]

Raku

Translation of: Phix
# 20220818 Raku programming solution

sub suffixes (@pat,\m) {
   loop (my ($i,$f,$g,@suff)=m-2, 0, m-1, |flat 0 xx m-1,m; $i >= 0; --$i) {
      if $i > $g and @suff[$i + m - 1 - $f] < $i - $g {
         @suff[$i] = @suff[$i + m - 1 - $f]
      } else {
	 $g = $i if $i < $g;
         $f = $i;
	 while $g >= 0 and @pat[$g] eq @pat[$g + m - 1 - $f] { $g-- }
         @suff[$i] = $f - $g
      }
   }
   return @suff;
}

sub preBmGs (@pat,\m) {
   my (@suff, @bmGs) := suffixes(@pat,m), [m xx m]; 

   for m-1 ... 0 -> \k {
      if @suff[k] == k + 1 {
         loop (my $j=0; $j < m-1-k; $j++) { @bmGs[k]=m-1-k if @bmGs[$j] == m }
      }
   }
   for ^(m-1) { @bmGs[m - 1 - @suff[$_]] = m - 1 - $_ }
   return @bmGs;
}

sub BM (@txt,@pat) {
   my (\m, \n, $j)    = +@pat, +@txt, 0;
   my (@bmGs, %bmBc) := preBmGs(@pat,m), hash @pat Z=> ( m-1 ... 1 );

   return gather while $j <= n - m {
      loop (my $i = m - 1; $i >= 0 and @pat[$i] eq @txt[$i + $j]; ) { $i-- }
      if $i < 0 {
	 take $j;
         $j += @bmGs[0]
      } else {
         $j += max @bmGs[$i], (%bmBc{@txt[$i + $j]} // m)-m+$i
      }
   }
}

my @texts = [
   "GCTAGCTCTACGAGTCTA",
   "GGCTATAATGCGTA",
   "there would have been a time for such a word",
   "needle need noodle needle",
   "BharôtভাৰতBharôtভারতIndiaBhāratભારતBhāratभारतBhārataಭಾರತBhāratभारतBhāratamഭാരതംBhāratभारतBhāratभारतBharôtôଭାରତBhāratਭਾਰਤBhāratamभारतम्Bārataபாரதம்BhāratamഭാരതംBhāratadēsamభారతదేశం",
   "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented",
   "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.",
];

my @pats = [ "TCTA", "TAATAAA", "word", "needle", "ഭാരതം", "put", "and", "alfalfa"];

say "text$_ = @texts[$_]" for @texts.keys ;
say();

for @pats.keys {
   my \j = $_ < 6 ?? $_ !! $_-1 ; # for searching text5 twice
   say "Found '@pats[$_]' in 'text{j}' at indices ", BM @texts[j].comb, @pats[$_].comb
}

Output is the same as the Knuth-Morris-Pratt_string_search#Raku entry.

Wren

I've worked here from the Java code in the Wikipedia article though, as this version only finds the index of the first occurrence of the substring, I've added a routine to find all non-overlapping occurrences by repeatedly calling the BoyerMoore.indexOf method. Indices are zero-based.

The same examples have been used as in the Julia entry above.

class BoyerMoore {
    /**
     * Returns the index within this string of the first occurrence of the
     * specified substring. If it is not a substring, return -1.
     *
     * There is no Galil because it only generates one match.
     *
     * @param haystack The string to be scanned
     * @param needle The target string to search
     * @return The start index of the substring
     */
    static indexOf(haystack, needle) {
        haystack = haystack.bytes.toList
        needle = needle.bytes.toList
        var nc = needle.count
        if (nc == 0) return 0
        var charTable = makeCharTable_(needle)
        var offsetTable = makeOffsetTable_(needle)
        var i = nc - 1
        while (i < haystack.count) {
            var j = nc - 1
            while (needle[j] == haystack[i]) {
                if (j == 0) return i
                i = i - 1
                j = j - 1
            }
            i = i + offsetTable[nc - 1 - j].max(charTable[haystack[i]])
        }
        return -1
    }

    /**
     * Makes the jump table based on the mismatched character information.
     */
    static makeCharTable_(needle) {
        var ALPHABET_SIZE = 256 // use bytes rather than codepoints
        var nc = needle.count
        var table = List.filled(ALPHABET_SIZE, nc)
        for (i in 0...nc) table[needle[i]] = nc - 1 - i
        return table
    }

    /**
     * Makes the jump table based on the scan offset which mismatch occurs.
     * (bad character rule).
     */
    static makeOffsetTable_(needle) {
        var nc = needle.count
        var table = List.filled(nc, 0)
        var lastPrefixPosition = nc
        for (i in nc...0) {
            if (isPrefix_(needle, i)) lastPrefixPosition = i
            table[nc-1] = lastPrefixPosition - i - nc
        }
        for (i in 0...nc-1) {
            var slen = suffixLength_(needle, i)
            table[slen] = nc - 1 - i + slen
        }
        return table
    }

    /**
     * Is needle[p..-1] a prefix of needle?
     */
    static isPrefix_(needle, p) {
        var i = p
        var j = 0
        var nc = needle.count
        while (i < nc) {
            if (needle[i] != needle[j]) return false
            i = i + 1
            j = j + 1
        }
        return true
    }

    /**
     * Returns the maximum length of the substring ends at p and is a suffix.
     * (good suffix rule)
     */
   static suffixLength_(needle, p) {
        var len = 0
        var nc = needle.count
        var i = p
        var j = nc - 1
        while (i >= 0 && needle[i] == needle[j]) {
            len = len + 1
            i = i - 1
            j = j - 1
        }
        return len
    }
}

/*
 * Uses the BoyerMoore class to find the indices of ALL non-overlapping matches of the specified substring
 * and return a list of them. Returns an empty list if it's not a substring.
 */
var indicesOf = Fn.new { |haystack, needle|
    var indices = []
    var hc = haystack.bytes.count
    var bc = needle.bytes.count
    var start = 0
    while (true) {
        var haystack2 = haystack[start..-1]
        var index = BoyerMoore.indexOf(haystack2, needle)
        if (index == -1) return indices
        indices.add(start + index)
        start = start + index + bc
        if (start >= hc) return indices
    }
}

var texts = [ 
    "GCTAGCTCTACGAGTCTA",
    "GGCTATAATGCGTA",
    "there would have been a time for such a word",
    "needle need noodle needle",
"InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented",
    "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk."
]
var pats = ["TCTA", "TAATAAA", "word", "needle", "put", "and", "alfalfa"]
for (i in 0...texts.count) System.print("text%(i+1) = %(texts[i])")
System.print()
for (i in 0...pats.count) {
    var j = (i < 5) ? i : i-1
    System.print("Found '%(pats[i])' in 'text%(j+1)' at indices %(indicesOf.call(texts[j], pats[i]))")
}
Output:
text1 = GCTAGCTCTACGAGTCTA
text2 = GGCTATAATGCGTA
text3 = there would have been a time for such a word
text4 = needle need noodle needle
text5 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented
text6 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.

Found 'TCTA' in 'text1' at indices [6, 14]
Found 'TAATAAA' in 'text2' at indices []
Found 'word' in 'text3' at indices [40]
Found 'needle' in 'text4' at indices [0, 19]
Found 'put' in 'text5' at indices [26, 90]
Found 'and' in 'text5' at indices [101, 128, 171]
Found 'alfalfa' in 'text6' at indices [33, 87]

Yabasic

Translation of: FreeBASIC
case_insensitive = false

sub preBmBc(pat$)
    local m, i

    m = len(pat$)
    redim bmBc(m)
    for i = 1 to m-1
        bmBc(val(mid$(pat$, i, 1))) = m - i
    next 
end sub

sub suffixes(pat$)
    local m, g, i, f
	
	m = len(pat$)
    g = m
    redim suff(m)
    suff(m) = m
    for i = m-1 to 1 step -1
        if i > g and suff(i + m - f) < i - g then
            suff(i) = suff(i + m - f)
        else
            if i < g  g = i
            f = i
            while g >= 1 and mid$(pat$, g, 1) = mid$(pat$, g + m - f, 1)
                g = q - 1
            wend
            suff(i) = f - g
        fi
    next 
end sub

sub preBmGs(pat$)
    local m, j, i
	
	m = len(pat$)
    j = 1
    redim suff(m)
    redim bmGs(m)
    for i = m to 1 step -1
        if suff(i) = i then
            while j < m - i
                if bmGs(j) = m  bmGs(j) = m - i
                j = j + 1
            wend
        fi
    next 
    for i = 1 to m-1
        bmGs(m - suff(i)) = m - i
    next 
end sub

sub BM(pat$, s$, case_insensitive)
    local pins

    pins$ = "'" + pat$ + "' in " + "'" + s$ + "'"
    if case_insensitive then
        pat$ = lower$(pat$)
        s$ = lower$(s$) 
    fi
    
    //* Preprocessing *//
    preBmGs(pat$)
    preBmBc(pat$)
    
    //* Searching *//
    j = 0
	n = len(s$)
	m = len(pat$)
	i = m
    
    while j <= n - m
        i = i - 1
        if mid$(pat$,i,1) <> mid$(s$,i+j,1)  break
        
        if i < 1 then j = j + bmGs(0) else j = j + max(bmGs(i), bmBc(len(mid$(s$,i+j,1)) - m + i)) : fi
    wend
    
    many = instr(s$, pat$)
    tmp$ = ""
    
    if not many > 0 then
        print "No ", pins$
    else
        while many > 0    //if not found loop will be skipped
            tmp$ = tmp$ + str$(many) + ","
            many = instr(s$, pat$, many + 1)
        wend
        print "Found ", pins$, " at indices [", tmp$, chr$(8), "]"
    fi
end sub

BM("GCAGAGAG","GCATCGCAGAGAGTATACAGTACG")
BM("TCTA","GCTAGCTCTACGAGTCTA")
BM("TAATAAA","GGCTATAATGCGTA")
BM("word","there would have been a time for such a word")
BM("needle","needle need noodle needle")
book$ = "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented"
BM("put",book$)
BM("and",book$)
farm$ = "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk."
BM("alfalfa",farm$)
Output:
Same as QBasic entry.
Cookies help us deliver our services. By using our services, you agree to our use of cookies.