Boyer-Moore string search: Difference between revisions

m
(Added 11l)
Line 16:
 
F alphabet_index(Char c) -> Int
‘Return the index of the given ASCII character.
V val = c.code
assert(Int(val) C 0 .< :ALPHABET_SIZE)
R val
 
F match_length(String _s_s, Int =idx1, Int =idx2) -> Int
‘Return the length of the match of the substrings of S beginning at idx1 and idx2.’
I idx1 == idx2
R _s_s.len - idx1
V match_count = 0
L idx1 < _s_s.len & idx2 < _s_s.len & _s_s[idx1] == _s_s[idx2]
match_count++
idx1++
Line 32:
R match_count
 
F fundamental_preprocess(String _s_s) -> [Int]
‘Return Z, the Fundamental Preprocessing of S.
 
Z[i] is the length of the substring beginning at i which is also a prefix of S.
This pre-processing is done in O(n) time, where n is the length of S.
I s.empty
I _s_.empty
R []
I _s_s.len == 1
R [1]
V z = _s_s.map(x -> 0)
z[0] = _s_s.len
z[1] = match_length(_s_s, 0, 1)
L(i) 2 .< 1 + z[1]
z[i] = z[1] - i + 1
V l = 0
V r = 0
L(i) 2 + z[1] .< _s_s.len
I i <= r
V k = i - l
Line 57 ⟶ 56:
z[i] = b
E
z[i] = a + match_length(_s_s, a, r + 1)
l = i
r = i + z[i] - 1
E
z[i] = match_length(_s_s, 0, i)
I z[i] > 0
l = i
Line 67 ⟶ 66:
R z
 
F bad_character_table(String _s_s) -> [[Int]]
Generates R for S, which is an array indexed by the position of some character c in the
Line 75 ⟶ 74:
for the bad character rule in the Boyer-Moore string search algorithm, although it has
a much larger size than non-constant-time solutions.
I _s_s.empty
R [[Int]()] * :ALPHABET_SIZE
V _r_r = (0 .< :ALPHABET_SIZE).map(a -> [-1])
V alpha = (0 .< :ALPHABET_SIZE).map(a -> -1)
L(c) _s_s
V ialpha[alphabet_index(c)] = L.index
alpha[alphabet_index(c)] = i
L(a) alpha
V j = r[L.index].append(a)
R _r_r
_r_[j].append(a)
R _r_
 
F good_suffix_table(String _s_s) -> [Int]
Generates L for S, an array used in the implementation of the strong good suffix rule.
Line 98 ⟶ 95:
by the equation len(P) - L[i]. In the case that L[i] = -1, the full shift table is used.
Since only proper suffixes matter, L[0] = -1.
V _l_l = _s_s.map(c -> -1)
V _n_ = fundamental_preprocess(reversed(_s_s))
_n_.reverse()
L(j) 0 .< _s_s.len - 1
V i = _s_s.len - _n_[j]
I i != _s_s.len
_l_l[i] = j
R _l_l
 
F full_shift_table(String _s_s) -> [Int]
Generates F for S, an array used in a special case of the good suffix rule in the Boyer-Moore
Line 114 ⟶ 111:
prefix of S. In the cases it is used, the shift magnitude of the pattern P relative to the
text T is len(P) - F[i] for a mismatch occurring at i-1.
V _f_f = _s_s.map(c -> 0)
V Z = fundamental_preprocess(_s_s)
V longest = 0
L(zv) reversed(Z)
V i = L.index
longest = I zv == i + 1 {max(zv, longest)} E longest
_f_f[(len)-i - 1] = longest
R _f_f
 
F string_search(P, _t_) -> [Int]
Line 131 ⟶ 128:
sublinear) time, where m is the length of T. This implementation performs a case-sensitive
search on ASCII characters. P must be ASCII as well.
I P.empty | _t_.empty | _t_.len < P.len
R []
Line 137 ⟶ 134:
[Int] matches
 
V _r_r = bad_character_table(P)
V _l_l = good_suffix_table(P)
V _f_f = full_shift_table(P)
 
V k = P.len - 1
Line 151 ⟶ 148:
I i == -1 | h == previous_k
matches.append(k - P.len + 1)
k += I P.len > 1 {P.len - _f_f[1]} E 1
E
V char_shift = i - _r_r[alphabet_index(_t_[h])][i]
Int suffix_shift
I i + 1 == P.len
suffix_shift = 1
E I _l_l[i + 1] == -1
suffix_shift = P.len - _f_f[i + 1]
E
suffix_shift = P.len - 1 - _l_l[i + 1]
V shift = max(char_shift, suffix_shift)
previous_k = I shift >= i + 1 {k} E previous_k
1,480

edits