Boyer-Moore string search: Difference between revisions
m
→{{header|11l}}
Alextretyak (talk | contribs) (Added 11l) |
Alextretyak (talk | contribs) m (→{{header|11l}}) |
||
Line 16:
F alphabet_index(Char c) -> Int
‘Return the index of the given ASCII character.
V val = c.code
assert(Int(val) C 0 .< :ALPHABET_SIZE)
R val
F match_length(String
‘Return the length of the match of the substrings of S beginning at idx1 and idx2.’
I idx1 == idx2
R
V match_count = 0
L idx1 <
match_count++
idx1++
Line 32:
R match_count
F fundamental_preprocess(String
‘Return Z, the Fundamental Preprocessing of S.
Z[i] is the length of the substring beginning at i which is also a prefix of S.
This pre-processing is done in O(n) time, where n is the length of S.’
I
R []
I
R [1]
V z =
z[0] =
z[1] = match_length(
L(i) 2 .< 1 + z[1]
z[i] = z[1] - i + 1
V l = 0
V r = 0
L(i) 2 + z[1] .<
I i <= r
V k = i - l
Line 57 ⟶ 56:
z[i] = b
E
z[i] = a + match_length(
l = i
r = i + z[i] - 1
E
z[i] = match_length(
I z[i] > 0
l = i
Line 67 ⟶ 66:
R z
F bad_character_table(String
‘
Generates R for S, which is an array indexed by the position of some character c in the
Line 75 ⟶ 74:
for the bad character rule in the Boyer-Moore string search algorithm, although it has
a much larger size than non-constant-time solutions.
I
R [[Int]()] * :ALPHABET_SIZE
V
V alpha = (0 .< :ALPHABET_SIZE).map(a -> -1)
L(c)
L(a) alpha
▲ R _r_
F good_suffix_table(String
‘
Generates L for S, an array used in the implementation of the strong good suffix rule.
Line 98 ⟶ 95:
by the equation len(P) - L[i]. In the case that L[i] = -1, the full shift table is used.
Since only proper suffixes matter, L[0] = -1.
V
V _n_ = fundamental_preprocess(reversed(
_n_.reverse()
L(j) 0 .<
V i =
I i !=
R
F full_shift_table(String
‘
Generates F for S, an array used in a special case of the good suffix rule in the Boyer-Moore
Line 114 ⟶ 111:
prefix of S. In the cases it is used, the shift magnitude of the pattern P relative to the
text T is len(P) - F[i] for a mismatch occurring at i-1.
V
V Z = fundamental_preprocess(
V longest = 0
L(zv) reversed(Z)
V i = L.index
longest = I zv == i + 1 {max(zv, longest)} E longest
R
F string_search(P, _t_) -> [Int]
Line 131 ⟶ 128:
sublinear) time, where m is the length of T. This implementation performs a case-sensitive
search on ASCII characters. P must be ASCII as well.
I P.empty | _t_.empty | _t_.len < P.len
R []
Line 137 ⟶ 134:
[Int] matches
V
V
V
V k = P.len - 1
Line 151 ⟶ 148:
I i == -1 | h == previous_k
matches.append(k - P.len + 1)
k += I P.len > 1 {P.len -
E
V char_shift = i -
Int suffix_shift
I i + 1 == P.len
suffix_shift = 1
E I
suffix_shift = P.len -
E
suffix_shift = P.len - 1 -
V shift = max(char_shift, suffix_shift)
previous_k = I shift >= i + 1 {k} E previous_k
|