Knuth-Morris-Pratt string search: Difference between revisions

Content added Content deleted
(Add Python)
(added Raku programming solution)
Line 385: Line 385:
Found `and` in `Inhisbooks...` 3 times at [101, 128, 171].
Found `and` in `Inhisbooks...` 3 times at [101, 128, 171].
Found `alfalfa` in `Nearby far...` 2 times at [33, 87].
Found `alfalfa` in `Nearby far...` 2 times at [33, 87].
</pre>

=={{header|Raku}}==
Based on pseudocode found in WP ([https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#Description_of_pseudocode_for_the_search_algorithm kmp_search ] & [https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#Description_of_pseudocode_for_the_table-building_algorithm kmp_table]). Data and output format mostly follows the Wren entry.
<lang perl6># 20220810 Raku programming solution

sub kmp_search (@S,@W) {

sub kmp_table (@W) {
loop (my ($pos,$cnd,@T) = 1,0,-1 ; $pos < @W.elems ; ($pos, $cnd)>>.++) {
if @W[$pos] eq @W[$cnd] {
@T[$pos] = @T[$cnd]
} else {
@T[$pos] = $cnd;
while $cnd ≥ 0 and @W[$pos] ne @W[$cnd] { $cnd = @T[$cnd] }
}
}
@T[$pos] = $cnd;
return @T
}

my ($j,$k,@T) = 0,0, |kmp_table @W;

return gather while $j < @S.elems {
if @W[$k] eq @S[$j] {
($j, $k)>>.++;
if $k == @W.elems {
take $j - $k;
$k = @T[$k]
}
} else {
($j, $k)>>.++ if ($k = @T[$k]) < 0
}
}
}

my @texts = [
"GCTAGCTCTACGAGTCTA",
"GGCTATAATGCGTA",
"there would have been a time for such a word",
"needle need noodle needle",
"BharôtভাৰতBharôtভারতIndiaBhāratભારતBhāratभारतBhārataಭಾರತBhāratभारतBhāratamഭാരതംBhāratभारतBhāratभारतBharôtôଭାରତBhāratਭਾਰਤBhāratamभारतम्Bārataபாரதம்BhāratamഭാരതംBhāratadēsamభారతదేశం",
"InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented",
"Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.",
];

my @pats = ["TCTA", "TAATAAA", "word", "needle", "ഭാരതം", "put", "and", "alfalfa"];

say "text$_ = @texts[$_]" for @texts.keys ;
say();

for @pats.keys {
my \j = $_ < 6 ?? $_ !! $_-1 ; # for searching text5 twice
say "Found '@pats[$_]' in 'text{j}' at indices ", kmp_search @texts[j].comb, @pats[$_].comb
}</lang>
{{out}}
<pre>
text0 = GCTAGCTCTACGAGTCTA
text1 = GGCTATAATGCGTA
text2 = there would have been a time for such a word
text3 = needle need noodle needle
text4 = BharôtভাৰতBharôtভারতIndiaBhāratભારતBhāratभारतBhārataಭಾರತBhāratभारतBhāratamഭാരതംBhāratभारतBhāratभारतBharôtôଭାରତBhāratਭਾਰਤBhāratamभारतम्Bārataபாரதம்BhāratamഭാരതംBhāratadēsamభారతదేశం
text5 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented
text6 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.

Found 'TCTA' in 'text0' at indices (6 14)
Found 'TAATAAA' in 'text1' at indices ()
Found 'word' in 'text2' at indices (40)
Found 'needle' in 'text3' at indices (0 19)
Found 'ഭാരതം' in 'text4' at indices (68 138)
Found 'put' in 'text5' at indices (26 90)
Found 'and' in 'text5' at indices (101 128 171)
Found 'alfalfa' in 'text6' at indices (33 87)
</pre>
</pre>