Knuth-Morris-Pratt string search: Difference between revisions

← Older edit

Knuth-Morris-Pratt string search (view source)

Revision as of 11:53, 12 December 2023

22,200 bytes added , 5 months ago

m

→‎{{header|Wren}}: Changed to Wren S/H

PureFox

9,476

edits

Revision as of 17:55, 12 August 2022 (view source) Hkdtam (talk \| contribs) m (→‎{{header\|Raku}}: insignificant changes) ← Older edit		Latest revision as of 11:53, 12 December 2023 (view source) PureFox (talk \| contribs) m (→‎{{header\|Wren}}: Changed to Wren S/H)
(11 intermediate revisions by 9 users not shown)
Line 3: [https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm < About Knuth-Morris-Pratt String Search Algorithm >] =={{header\|~~Emacs Lisp~~11l}}== {{trans\|Python}} <syntaxhighlight lang="11l">F kmp_table(String word) -> [Int] V position = 1 V candidate = 0 V table = [0] * (word.len + 1) ~~<lang lisp>~~ table[0] = -1 L position < word.len I word[position] == word[candidate] table[position] = table[candidate] E table[position] = candidate L candidate >= 0 & word[position] != word[candidate] candidate = table[candidate] position++ candidate++ table[position] = candidate R table F kmp_search(String text, word) V text_position = 0 V word_position = 0 [Int] positions V number_of_positions = 0 V table = kmp_table(word) L text_position < text.len I word[word_position] == text[text_position] text_position++ word_position++ I word_position == word.len positions.append(text_position - word_position) number_of_positions++ word_position = table[word_position] E word_position = table[word_position] I word_position < 0 text_position++ word_position++ R (positions, number_of_positions) V TEST_CASES = [(‘GCTAGCTCTACGAGTCTA’, ‘TCTA’), (‘GGCTATAATGCGTA’, ‘TAATAAA’), (‘there would have been a time for such a word’, ‘word’), (‘needle need noodle needle’, ‘needle’), (‘InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesley’"" ‘DKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeand’"" ‘assemblylanguagestoillustratetheconceptsandalgorithmsastheyare’"" ‘presented’, ‘put’), (‘InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesley’"" ‘DKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeand’"" ‘assemblylanguagestoillustratetheconceptsandalgorithmsastheyare’"" ‘presented’, ‘and’), (‘Nearby farms grew a half acre of alfalfa on the dairy's behalf, ’"" ‘with bales of all that alfalfa exchanged for milk.’, ‘alfalfa’)] L(text, word) TEST_CASES V (positions, number_of_positions) = kmp_search(text, word) I number_of_positions == 0 print(‘`’word‘` not found in `’text[0.<10]‘...`’) E I number_of_positions == 1 print(‘Found `’word‘` in `’text[0.<10]‘...` once at ’positions‘.’) E print(‘Found `’word‘` in `’text[0.<10]‘...` ’number_of_positions‘ times at ’positions‘.’)</syntaxhighlight> {{out}} <pre>Found `TCTA` in `GCTAGCTCTA...` 2 times at [6, 14]. `TAATAAA` not found in `GGCTATAATG...` Found `word` in `there woul...` once at [40]. Found `needle` in `needle nee...` 2 times at [0, 19]. Found `put` in `Inhisbooks...` 2 times at [26, 90]. Found `and` in `Inhisbooks...` 3 times at [101, 128, 171]. Found `alfalfa` in `Nearby far...` 2 times at [33, 87].</pre> =={{header\|Emacs Lisp}}== <syntaxhighlight lang="lisp"> (defun kmp_compile_pattern (pattern) "Compile pattern to DFA." Line 44 ⟶ 122: (setq textPos (1+ textPos)) ) (if (= patPos M) (- textPos M) N ) ) ) )</syntaxhighlight> ~~</lang>~~ =={{header\|JFortran}}== <syntaxhighlight lang="Fortran"> module kmp_mod use iso_fortran_env implicit none private public :: kmpsearch contains !This subroutine creates the "fail" table used for the KMP algorithm pure subroutine createtable(pattern,patlen,table) implicit none ! ! Dummy arguments ! integer :: patlen character() :: pattern integer , dimension(:) :: table intent (in) patlen , pattern intent (inout) table ! ! Local variables ! integer :: counter integer :: i integer :: j !Code counter=2 i=1 j=0 !Determine the table/fail value for each letter in the pattern do i=1 , patlen j=i do ! If j is equal to zero, then there is no offset/fail value required. Therefore, just set the fail value to 0 if( j==1 )then table(counter)=1 counter=counter+1 exit end if ! If a match is made, the fail value can be imcremented by one (based off the fail value of the previous character in the pattern) if( pattern(table(j):table(j))==pattern(i:i) )then table(counter)=table(j)+1 counter=counter+1 exit end if ! If neither if statement is true, restart the fail value counter j=table(j) end do end do end subroutine createtable ! This subroutine executes the string search using the KMP algorithm (fail table) pure function kmpsearch(pattern,patlen,line,linelen) & & result(foundmatch) implicit none ! ! Dummy arguments ! character() :: line integer :: linelen integer :: patlen character() :: pattern intent (in) line , linelen , patlen , pattern ! ! Local variables ! integer :: foundmatch integer :: indice integer :: match integer , dimension(256) :: table !Code if( (patlen==0) .or. (linelen==0) .or. (linelen<patlen) )then foundmatch=-1 return end if indice=1 match=0 foundmatch=-1 call createtable(pattern,patlen,table) !Search the entire string do while ( indice+match<=linelen ) if( match+1<patlen+1 )then ! If the character matches the character we are expecting (based on where in the pattern we have already matched characters with) increment the match indice if( line(indice+match:indice+match)==pattern(match+1:match+1) )then match=match+1 ! If the match indice is equal to the length of the pattern, that means we have matched the entire pattern if( match==patlen )then foundmatch=indice !-1 exit !Found end if !Look at the table to determine what indice to check next ! If no match was made on the first letter of the pattern, then just increment the indice counter by one and check again else if( match==0 )then indice=indice+1 ! Check how many characters to skip ahead (the point of the KMP algorithm) else indice=indice+match+1-table(match+1) match=table(match+1)-1 end if ! If no match was made after scanning the length of the pattern, need increment the indice counter and check again else if( match==0 )then indice=indice+1 !Not even a partial match, try next character else !Check how many characters to skip ahead (the point of the KMP algorithm) indice=indice+match+1-table(match+1) match=table(match+1)-1 end if end do ! 'No match found' return end function kmpsearch end module kmp_mod</syntaxhighlight> =={{header\|FreeBASIC}}== ~~Implementation:<lang J>kmp_table=: {{~~ '''Adapted from [[#Wren\|Wren]]''' <syntaxhighlight lang="vb">Dim Shared As Integer t() Dim Shared As Integer indices() Sub Tabla(patron As String) Dim As Integer nc = Len(patron) Redim As Integer t(nc) Dim As Integer idx = 1 ' index into table Dim As Integer lon = 0 ' length of previous longest prefix While idx < nc If patron[idx] = patron[lon] Then lon += 1 t(idx) = lon idx += 1 Elseif lon <> 0 Then lon = t(lon-1) Else t(idx) = 0 idx += 1 End If Wend End Sub Sub BusquedaKMP(texto As String, patron As String) Dim As Integer hc = Len(texto) Dim As Integer nc = Len(patron) Dim As Integer i = 0 ' index into haystack Dim As Integer j = 0 ' index into needle Dim As Integer cont = 1 Tabla(patron) While i < hc If patron[j] = texto[i] Then i += 1 j += 1 End If If j = nc Then Redim Preserve indices(cont) indices(cont) = (i - j) cont += 1 j = t(j-1) Elseif i < hc And patron[j] <> texto[i] Then If j <> 0 Then j = t(j-1) Else i += 1 End If Wend End Sub Dim As Integer i, j, k Dim As String cadenas(1 To 6) = {"GCTAGCTCTACGAGTCTA", "GGCTATAATGCGTA", _ "there would have been a time for such a word", "needle need noodle needle", _ "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented", _ "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk."} Dim As String palabra(1 To 7) = { _ "TCTA", "TAATAAA", "word", "needle", "put", "and", "alfalfa"} For i = 1 To Ubound(cadenas) Print Using "text# = &"; i; cadenas(i) Next Print For i = 1 To Ubound(palabra) j = Iif(i < 6, i, i-1) Print Using "Found '&' in 'text#' at indices [ "; palabra(i); j; BusquedaKMP(cadenas(j), palabra(i)) If Ubound(indices) > 0 Then For k = 1 To Ubound(indices) Print indices(k) & ", "; Next Else Print " "; End If Print Chr(8) & Chr(8) & " ]" Erase indices Next Sleep</syntaxhighlight> {{out}} <pre>text1 = GCTAGCTCTACGAGTCTA text2 = GGCTATAATGCGTA text3 = there would have been a time for such a word text4 = needle need noodle needle text5 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented text6 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk. Found 'TCTA' in 'text1' at indices [ 6, 14 ] Found 'TAATAAA' in 'text2' at indices [ ] Found 'word' in 'text3' at indices [ 40 ] Found 'needle' in 'text4' at indices [ 0, 19 ] Found 'put' in 'text5' at indices [ 26, 90 ] Found 'and' in 'text5' at indices [ 101, 128, 171 ] Found 'alfalfa' in 'text6' at indices [ 33, 87 ]</pre> =={{header\|J}}== Implementation:<syntaxhighlight lang="j">kmp_table=: {{ j=. 0 T=. _1 Line 81 ⟶ 370: end. I. b }}</~~lang~~syntaxhighlight> Task examples:<~~lang~~syntaxhighlight Jlang="j"> text1=: 'InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented' text2=: 'Nearby farms grew a half acre of alfalfa on the dairy''s behalf, with bales of all that alfalfa exchanged for milk.' 'put' kmp_search text1 Line 92 ⟶ 381: 'alfalfa' kmp_search text2 33 87</~~lang~~syntaxhighlight> (J uses index 0 for the first character in a sequence of characters.) =={{header\|jq}}== '''Adapted from [[#Wren\|Wren]]''' {{Works with\|jq}} The program will also work with gojq once the prefix "KMP::" is omitted in the two lines in which it occurs. <syntaxhighlight lang=jq># search for the string needle in the string haystack def KMP::search(haystack; needle): def table_($needle): ($needle\|length) as $nc \| {t: [range(0; $nc) \| 0], i: 1, # index into table len: 0 # length of previous longest prefix } \| until(.i >= .nc; if $needle[.i] == $needle[.len] then .len += 1 \| .t[.i] = .len \| .i += 1 elif .len != 0 then .len = .t[.len-1] else .t[.i] = 0 \| .i += 1 end ) \| .t ; { haystack: (haystack\|explode), needle: (needle\|explode), hc: (haystack\|length), nc: (needle\|length), indices: [], i: 0, # index into haystack j: 0 # index into needle } \| table_(.needle) as $t \| until (.i >= .hc; if .needle[.j] == .haystack[.i] then .i += 1 \| .j += 1 else . end \| if .j == .nc then .indices = .indices + [.i - .j] \| .j = $t[.j-1] elif (.i < .hc and .needle[.j] != .haystack[.i]) then if .j != 0 then .j = $t[.j-1] else .i += 1 end else . end ) \| .indices ; # Examples def texts: [ "GCTAGCTCTACGAGTCTA", "GGCTATAATGCGTA", "there would have been a time for such a word", "needle need noodle needle", "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented", "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk." ]; def pats: ["TCTA", "TAATAAA", "word", "needle", "put", "and", "alfalfa"]; def tasks($texts; $pats): range (0; $texts\|length) \| "text\(.+1) = \($texts[.])", "", (range (0; $pats\|length) as $i \| (if $i < 5 then $i else $i-1 end) as $j \| "Found '\($pats[$i])' in text\($j+1) at indices \(KMP::search($texts[$j]; $pats[$i]) )" ) ; tasks(texts; pats)</syntaxhighlight> '''Invocation''': jq -nr -f kmp-string-search.jq {{output}} <pre>text1 = GCTAGCTCTACGAGTCTA text2 = GGCTATAATGCGTA text3 = there would have been a time for such a word text4 = needle need noodle needle text5 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented text6 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk. Found 'TCTA' in text1 at indices [6, 14] Found 'TAATAAA' in text2 at indices [] Found 'word' in text3 at indices [40] Found 'needle' in text4 at indices [0, 19] Found 'put' in text5 at indices [26, 90] Found 'and' in text5 at indices [101, 128, 171] Found 'alfalfa' in text6 at indices [33, 87]</pre> =={{header\|Julia}}== <syntaxhighlight lang="julia">""" ~~<lang ruby>"""~~ function kmp_table(W) Line 173 ⟶ 553: println("Found <$pat2> at: (one-based indices) ", kmp_search(text1, pat2)) println("Found <$pat3> at: (one-based indices) ", kmp_search(text2, pat3)) </~~lang~~syntaxhighlight>{{out}} <pre>Found <put> at: (one-based indices) [27, 91] ~~<pre>~~ ~~Found <put> at: (one-based indices) [27, 91]~~ Found <and> at: (one-based indices) [102, 129, 172] Found <alfalfa> at: (one-based indices) [34, 88]</pre> ~~</pre>~~ =={{header\|Nim}}== This is a translation of pseudocode in Wikipedia page. We use the examples of Wren solution. <syntaxhighlight lang="Nim">import std/[strformat, strutils] func kmpTable(w: string): seq[int] = ## Build the partial match table for "w". result.setLen w.len + 1 var pos = 1 var cnd = 0 result[0] = -1 while pos < w.len: if w[pos] == w[cnd]: result[pos] = result[cnd] else: result[pos] = cnd while cnd >= 0 and w[pos] != w[cnd]: cnd = result[cnd] inc pos inc cnd result[pos] = cnd func kmpSearch(s, w: string): seq[int] = ## Return the positions of "w" ins "s". var j, k = 0 let t = kmpTable(w) while j < s.len: if w[k] == s[j]: inc j inc k if k == w.len: result.add j - k k = t[k] else: k = t[k] if k < 0: inc j inc k const Texts = ["GCTAGCTCTACGAGTCTA", "GGCTATAATGCGTA", "there would have been a time for such a word", "needle need noodle needle", "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuth" & "usesanimaginarycomputertheMIXanditsassociatedmachinecodeandassembly" & "languagestoillustratetheconceptsandalgorithmsastheyarepresented", "Nearby farms grew a half acre of alfalfa on the dairy's behalf, " & "with bales of all that alfalfa exchanged for milk." ] # Couples (pattern, index of text to use). Patterns = {"TCTA": 0, "TAATAAA": 1, "word": 2, "needle": 3, "put": 4, "and": 4, "alfalfa": 5} for i, text in Texts: echo &"Text{i+1} = {text}" echo() for i, (pattern, j) in Patterns: let indices = Texts[j].kmpSearch(pattern) if indices.len > 0: echo &"Found “{pattern}” in “Text{j+1}” at indices {indices.join(\", \")}." else: echo &"Not found “{pattern}” in “Text{j+1}”." </syntaxhighlight> {{out}} <pre>Text1 = GCTAGCTCTACGAGTCTA Text2 = GGCTATAATGCGTA Text3 = there would have been a time for such a word Text4 = needle need noodle needle Text5 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented Text6 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk. Found “TCTA” in “Text1” at indices 6, 14. Not found “TAATAAA” in “Text2”. Found “word” in “Text3” at indices 40. Found “needle” in “Text4” at indices 0, 19. Found “put” in “Text5” at indices 26, 90. Found “and” in “Text5” at indices 101, 128, 171. Found “alfalfa” in “Text6” at indices 33, 87.</pre> =={{header\|Perl}}== {{trans\|Raku}} <syntaxhighlight lang="perl" line>use v5.36; sub kmp_search ($S, $W) { my @S = split '', $S; my @W = split '', $W; sub kmp_table (@W) { my($pos,$cnd,@T) = (1,0,-1); for (; $pos < @W ; $pos++, $cnd++) { if ($W[$pos] eq $W[$cnd]) { $T[$pos] = $T[$cnd] } else { $T[$pos] = $cnd; while ($cnd >= 0 and $W[$pos] ne $W[$cnd]) { $cnd = $T[$cnd] } } } $T[$pos] = $cnd; @T } my @I; my ($k,@T) = (0,kmp_table @W); for (my $j=0; $j < @S;) { if ($W[$k] eq $S[$j]) { $j++; $k++; if ($k == @W) { push @I, $j - $k; $k = $T[$k] } } else { ($j++, $k++) if ($k = $T[$k]) < 0 } } @I } my @texts = ( "GCTAGCTCTACGAGTCTA", "GGCTATAATGCGTA", "there would have been a time for such a word", "needle need noodle needle", "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented", "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk.", ); my @pats = <TCTA TAATAAA word needle put and alfalfa>; say "text$_ = $texts[$_]" for 0..$#texts; say ''; for (0.. $#pats) { my $j = $_ < 5 ? $_ : $_-1 ; # for searching text4 twice say "Found '$pats[$_]' in 'text$j' at indices " . join ', ', kmp_search($texts[$j],$pats[$_]); }</syntaxhighlight> {{out}} <pre>text0 = GCTAGCTCTACGAGTCTA text1 = GGCTATAATGCGTA text2 = there would have been a time for such a word text3 = needle need noodle needle text4 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented text5 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk. Found 'TCTA' in 'text0' at indices 6, 14 Found 'TAATAAA' in 'text1' at indices Found 'word' in 'text2' at indices 40 Found 'needle' in 'text3' at indices 0, 19 Found 'put' in 'text4' at indices 26, 90 Found 'and' in 'text4' at indices 101, 128, 171 Found 'alfalfa' in 'text5' at indices 33, 87</pre> =={{header\|Phix}}== <!--<~~lang~~syntaxhighlight ~~Phix~~lang="phix">(phixonline)--> <span style="color: #000080;font-style:italic;">-- -- demo\rosetta\KnuthMorrisPratt.exw Line 257 ⟶ 787: <span style="color: #000080;font-style:italic;">--KMP("aLfAlfa",farm) -- none --KMP("aLfAlfa",farm,true) -- as -2</span> <!--</~~lang~~syntaxhighlight>--> {{out}} Significantly higher character comparison counts than [[Boyer-Moore_string_search#Phix]].<br> Also higher than those on the above (lecroq) link, probably because this carries on for all matches. <pre>Found `GCAGAGAG` in `GCATCGCAGAGAGTATACAGTACG` once at {6} (26 character comparisons) ~~<pre>~~ ~~Found `GCAGAGAG` in `GCATCGCAGAGAGTATACAGTACG` once at {6} (26 character comparisons)~~ Found `TCTA` in `GCTAGCTCTACGAGTCTA` twice at {7,15} (19 character comparisons) No `TAATAAA` in `GGCTATAATGCGTA` (16 character comparisons) Line 269 ⟶ 798: Found `put` in `Inhisbooks...epresented (202 characters)` twice at {27,91} (205 character comparisons) Found `and` in `Inhisbooks...epresented (202 characters)` three times at {102,129,172} (216 character comparisons) Found `alfalfa` in `Nearby far... for milk. (114 characters)` twice at {34,88} (125 character comparisons)</pre> ~~</pre>~~ =={{header\|Python}}== Based on pseudocode found in the [https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm Wikipedia article]. Uses test cases from the [[#Wren]] solution. <~~lang~~syntaxhighlight lang="python">"""Knuth-Morris-Pratt string search. Requires Python >= 3.7.""" from typing import List from typing import Tuple Line 374 ⟶ 902: if __name__ == "__main__": main() </syntaxhighlight> ~~</lang>~~ {{out}} <pre>Found `TCTA` in `GCTAGCTCTA...` 2 times at [6, 14]. ~~<pre>~~ ~~Found `TCTA` in `GCTAGCTCTA...` 2 times at [6, 14].~~ `TAATAAA` not found in `GGCTATAATG...` Found `word` in `there woul...` once at [40]. Line 384 ⟶ 910: Found `put` in `Inhisbooks...` 2 times at [26, 90]. Found `and` in `Inhisbooks...` 3 times at [101, 128, 171]. Found `alfalfa` in `Nearby far...` 2 times at [33, 87].</pre> ~~</pre>~~ =={{header\|Raku}}== Based on pseudocode found in WP ([https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#Description_of_pseudocode_for_the_search_algorithm kmp_search ] & [https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#Description_of_pseudocode_for_the_table-building_algorithm kmp_table]). Data and output format mostly follows the Wren entry. <syntaxhighlight lang="raku" ~~perl6~~line># 20220810 Raku programming solution sub kmp_search (@S where .Bool, @W where .Bool) { Line 437 ⟶ 962: my \j = $_ < 6 ?? $_ !! $_-1 ; # for searching text5 twice say "Found '@pats[$_]' in 'text{j}' at indices ", kmp_search @texts[j].comb, @pats[$_].comb }</~~lang~~syntaxhighlight> {{out}} <pre>text0 = GCTAGCTCTACGAGTCTA ~~<pre>~~ ~~text0 = GCTAGCTCTACGAGTCTA~~ text1 = GGCTATAATGCGTA text2 = there would have been a time for such a word Line 455 ⟶ 979: Found 'put' in 'text5' at indices (26 90) Found 'and' in 'text5' at indices (101 128 171) Found 'alfalfa' in 'text6' at indices (33 87)</pre> ~~</pre>~~ =={{header\|Sidef}}== {{trans\|Raku}} <syntaxhighlight lang="ruby">func kmp_search (String S, String W) { S.graphemes! W.graphemes! func kmp_table { var (pos, cnd, T) = (1, 0, -1) for (nil; pos < W.len; (++pos; ++cnd)) { if (W[pos] == W[cnd]) { T[pos] = T[cnd] } else { T[pos] = cnd while ((cnd >= 0) && (W[pos] != W[cnd])) { cnd = T[cnd] } } } T[pos] = cnd return T } var (k, T, *I) = (0, kmp_table()) for (var j = 0; j < S.len; nil) { if (W[k] == S[j]) { ++j ++k if (k == W.len) { I << (j - k) k = T[k] } } elsif ((k = T[k]) < 0) { ++j ++k } } return I } var texts = [ "GCTAGCTCTACGAGTCTA", "GGCTATAATGCGTA", "there would have been a time for such a word", "needle need noodle needle", "BharôtভাৰতBharôtভারতIndiaBhāratભારતBhāratभारतBhārataಭಾರತBhāratभारतBhārat"+ "amഭാരതംBhāratभारतBhāratभारतBharôtôଭାରତBhāratਭਾਰਤBhāratamभारतम्Bārataபாரத"+ "ம்BhāratamഭാരതംBhāratadēsamభారతదేశం", "InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnu"+ "thusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblyl"+ "anguagestoillustratetheconceptsandalgorithmsastheyarepresented", "Nearby farms grew a half acre of alfalfa on the dairy's behalf, with ba"+ "les of all that alfalfa exchanged for milk.", ] var pats = ["TCTA", "TAATAAA", "word", "needle", "ഭാരതം", "put", "and", "alfalfa"] texts.each_kv {\|k,v\| say "text#{k} = #{v}" }; say '' pats.each_kv {\|k,v\| var j = (k < 6 ? k : k-1) say ("Found '#{v}' in 'text#{j}' at indices ", kmp_search(texts[j], v)) }</syntaxhighlight> {{out}} <pre>text0 = GCTAGCTCTACGAGTCTA text1 = GGCTATAATGCGTA text2 = there would have been a time for such a word text3 = needle need noodle needle text4 = BharôtভাৰতBharôtভারতIndiaBhāratભારતBhāratभारतBhārataಭಾರತBhāratभारतBhāratamഭാരതംBhāratभारतBhāratभारतBharôtôଭାରତBhāratਭਾਰਤBhāratamभारतम्Bārataபாரதம்BhāratamഭാരതംBhāratadēsamభారతదేశం text5 = InhisbookseriesTheArtofComputerProgrammingpublishedbyAddisonWesleyDKnuthusesanimaginarycomputertheMIXanditsassociatedmachinecodeandassemblylanguagestoillustratetheconceptsandalgorithmsastheyarepresented text6 = Nearby farms grew a half acre of alfalfa on the dairy's behalf, with bales of all that alfalfa exchanged for milk. Found 'TCTA' in 'text0' at indices [6, 14] Found 'TAATAAA' in 'text1' at indices [] Found 'word' in 'text2' at indices [40] Found 'needle' in 'text3' at indices [0, 19] Found 'ഭാരതം' in 'text4' at indices [68, 138] Found 'put' in 'text5' at indices [26, 90] Found 'and' in 'text5' at indices [101, 128, 171] Found 'alfalfa' in 'text6' at indices [33, 87]</pre> =={{header\|Wren}}== This is based on the code [https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/ here]. The examples used are the same as in the [[Boyer-Moore_string_search#Wren]] task. <~~lang~~syntaxhighlight ~~ecmascript~~lang="wren">class KMP { static search(haystack, needle) { haystack = haystack.bytes.toList Line 524 ⟶ 1,132: var j = (i < 5) ? i : i-1 System.print("Found '%(pats[i])' in 'text%(j+1)' at indices %(KMP.search(texts[j], pats[i]))") }</~~lang~~syntaxhighlight> {{out}} <pre>text1 = GCTAGCTCTACGAGTCTA ~~<pre>~~ ~~text1 = GCTAGCTCTACGAGTCTA~~ text2 = GGCTATAATGCGTA text3 = there would have been a time for such a word Line 541 ⟶ 1,147: Found 'put' in 'text5' at indices [26, 90] Found 'and' in 'text5' at indices [101, 128, 171] Found 'alfalfa' in 'text6' at indices [33, 87]</pre> ~~</pre>~~