Aho–Corasick algorithm
The Aho–Corasick algorithm is an efficient string-searching algorithm designed to locate all occurrences of a finite set of pattern strings (a dictionary) within an input text. Developed by Alfred V. Aho and Margaret J. Corasick in 1975, it is widely used in applications requiring multiple pattern matching, such as text processing, bioinformatics, and network security.
- Problem Description
Given a text string T of length n and a set of k pattern strings P = {P₁, P₂, ..., Pₖ} with a total length m (sum of lengths of all patterns), the Aho–Corasick algorithm finds all occurrences of any pattern Pᵢ in T. Specifically, it reports the starting positions in T where each pattern appears as a substring. The algorithm is particularly efficient when searching for multiple patterns simultaneously, as it avoids scanning the text repeatedly for each pattern.
The problem can be formalized as:
- Input: A text T and a set of patterns P = {P₁, P₂, ..., Pₖ}.
- Output: All pairs (i, j) such that Pᵢ matches T starting at position j (i.e., T[j:j+|Pᵢ|] = Pᵢ).
- Performance
The Aho–Corasick algorithm is highly efficient:
- Preprocessing Time: Constructing the trie takes O(m) time, where m is the total length of all patterns. Computing failure and output links also takes O(m) using a breadth-first search on the trie.
- Search Time: Scanning the text takes O(n + z) time, where n is the length of the text and z is the total number of pattern occurrences (matches reported).
- Space Complexity: The trie and automaton require O(m) space to store nodes, transitions, and links.
Thus, the total time complexity is O(m + n + z), which is optimal for reporting all matches in a single pass.
- Applications
The Aho–Corasick algorithm is used in:
- Full-text search systems to find multiple keywords in documents.
- Intrusion detection systems to match signatures of malicious patterns in network traffic.
- Bioinformatics for searching DNA sequences for multiple motifs.
- Data compression and natural language processing for pattern-based analysis.
C++
#include <cstdint>
#include <deque>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>
uint32_t max_state_count; // The maximum number of states required to process the 'targets' words
// 'results' stores the indexes of all the 'target' words in 'targets' that end in the current state.
// Multiple indexes are stored in a single value by bit-mapping each index, that is, multiplying by a power of two.
std::vector<uint32_t> results;
// 'failures' stores the index of the failure link state in the trie for each index of a 'target' word in 'targets'
std::vector<uint32_t> failures;
std::vector<std::vector<uint32_t>>trie; // The trie containing the characters of the 'target' words
constexpr uint32_t ALPHABET_SIZE = 26;
template <typename T>
void print_vector(const std::vector<T>& vec) {
std::cout << "[";
for ( uint32_t i = 0; i < vec.size(); ++i ) {
std::cout << vec[i];
if ( i < vec.size() - 1 ) {
std::cout << ", ";
}
}
std::cout << "]" << std::endl;
}
// Throw an exception if the given 'word' contains a character which is not a lower case alphabetic letter
void validate_lower_case_letters(const std::string& word) {
for ( const char& ch : word ) {
if ( ch < 'a' || ch > 'z' ) {
throw std::invalid_argument("Invalid character in pattern: " + ch);
}
}
}
// Return the next state to which the matching machine will transition
uint32_t find_next_state(uint32_t current_state, const char& next_character) {
const uint32_t ch = next_character - 'a';
// Follow the links to the first state not undefined
while ( trie[current_state][ch] == 0 ) {
current_state = failures[current_state];
}
return trie[current_state][ch];
}
uint32_t build_matching_machine(const std::vector<std::string>& targets) {
// Initialisation
failures.assign(max_state_count, 0);
results.assign(max_state_count, 0);
trie = { max_state_count, std::vector<uint32_t>(ALPHABET_SIZE, 0) };
uint32_t next_state_id = 1; // Initially there is only the empty state
// Build the trie
for ( uint32_t i = 0; i < targets.size(); ++i ) {
std::string target = targets[i];
validate_lower_case_letters(target);
uint32_t current_state = 0;
// Process all characters of the current target
for ( const char& ch : target ) {
const uint32_t char_code = ch - 'a';
// Allocate a new state for 'char_code', if one does not already exist
if ( trie[current_state][char_code] == 0 ) {
trie[current_state][char_code] = next_state_id++;
}
current_state = trie[current_state][char_code];
}
// Bit-map the index of the 'target' word in 'targets' to the 'results' list,
// indicating that it ends in the 'current_state'
results[current_state] |= ( 1 << i );
}
// Determine 'failures' values using a breadth first search
std::deque<uint32_t> queue;
// If an alphabetic character has a positive state, its failure value is zero
for ( uint32_t ch = 0; ch < ALPHABET_SIZE; ++ch ) {
if ( trie[0][ch] > 0 ) {
failures[trie[0][ch]] = 0;
queue.emplace_back(trie[0][ch]);
}
}
while ( ! queue.empty() ) {
const uint32_t state = queue.front();
queue.pop_front();
// Determine the 'failures' value for all alphabetic characters with an undefined state
for ( uint32_t ch = 0; ch < ALPHABET_SIZE; ++ch ) {
if ( trie[state][ch] != 0 ) {
uint32_t failure = failures[state];
while ( trie[failure][ch] == 0 ) {
failure = failures[failure];
}
failure = trie[failure][ch];
failures[trie[state][ch]] = failure;
// Merge 'results' values
results[trie[state][ch]] |= results[failure];
// Insert the next level state into the queue
queue.emplace_back(trie[state][ch]);
}
};
}
return next_state_id; // 'next_state_id' is also the total number of states created
}
// Display all occurrences of the target words in the given text
void search_for_targets(const std::string& text, const std::vector<std::string>& targets) {
validate_lower_case_letters(text);
// Initialisation
std::map<std::string, std::vector<uint32_t>> printable_results;
for ( const std::string& target : targets ) {
printable_results[target] = std::vector<uint32_t>();
}
uint32_t current_state = 0;
// Process the characters of 'text' through the matching machine
for ( uint32_t i = 0; i < text.length(); ++i ) {
current_state = find_next_state(current_state, text[i]);
// If a match is not found, move to next character of 'text'
if ( results[current_state] == 0 ) {
continue;
}
// A match has been found, so store the start index of the target word in the 'printable_results' map
for ( uint32_t j = 0; j < targets.size(); ++j ) {
if ( ( results[current_state] & ( 1 << j ) ) > 0 ) {
printable_results[targets[j]].emplace_back(i + 1 - targets[j].length());
}
}
}
// Print the results of the search
for ( std::pair<std::string, std::vector<uint32_t>> pair : printable_results ) {
std::cout << "The word \"" << pair.first << "\" appears in \"" << text << "\" starting at indexes ";
print_vector(pair.second);
}
}
int main() {
std::string text = "abaaabaa";
std::vector<std::string> targets = { "a", "bb", "aa", "abaa", "abaaa" };
max_state_count = 0;
for ( const std::string& target : targets ) {
max_state_count += target.length();
}
build_matching_machine(targets);
search_for_targets(text, targets);
}
- Output:
The word "a" appears in "abaaabaa" starting at indexes [0, 2, 3, 4, 6, 7] The word "aa" appears in "abaaabaa" starting at indexes [2, 3, 6] The word "abaa" appears in "abaaabaa" starting at indexes [0, 4] The word "abaaa" appears in "abaaabaa" starting at indexes [0] The word "bb" appears in "abaaabaa" starting at indexes []
FreeBASIC
' Constants for array sizes
Const MAX_NODES = 200005
Const ALPHABET_SIZE = 26
' Node structure for the AC Automaton
Type Node
As Integer son(25) ' Links to child nodes
As Integer ans ' Answer count
As Integer fail ' Failure link
As Integer du ' Degree (for topological sort)
As Integer idx ' Pattern index
End Type
' AC Automaton structure
Type ACAutomaton
As Node tr(MAX_NODES) ' Array of nodes
As Integer tot ' Total number of nodes
As Integer final_ans(MAX_NODES) ' Final answers
As Integer pidx ' Pattern index counter
End Type
' Declare the shared automaton
Dim Shared ac As ACAutomaton
' Initialize the automaton
Sub ac_init()
ac.tot = 0
ac.pidx = 0
' Clear all nodes
For i As Integer = 0 To MAX_NODES
For j As Integer = 0 To ALPHABET_SIZE - 1
ac.tr(i).son(j) = 0
Next j
ac.tr(i).ans = 0
ac.tr(i).fail = 0
ac.tr(i).du = 0
ac.tr(i).idx = 0
ac.final_ans(i) = 0
Next
End Sub
' Insert a pattern into the trie
Function ac_insert(pattern As String) As Integer
Dim As Integer i, u, char_code
u = 0
For i = 0 To Len(pattern) - 1
char_code = Asc(Mid(pattern, i + 1, 1)) - Asc("a")
If ac.tr(u).son(char_code) = 0 Then
ac.tot += 1
ac.tr(u).son(char_code) = ac.tot
End If
u = ac.tr(u).son(char_code)
Next
If ac.tr(u).idx = 0 Then
ac.pidx += 1
ac.tr(u).idx = ac.pidx
End If
Return ac.tr(u).idx
End Function
' Build the AC Automaton (construct failure links)
Sub ac_build()
Dim As Integer i, u
' Queue for BFS
Dim As Integer queue(MAX_NODES)
Dim As Integer front = 0, rear = 0
' Push the first level nodes to the queue
For i = 0 To ALPHABET_SIZE - 1
If ac.tr(0).son(i) <> 0 Then
queue(rear) = ac.tr(0).son(i)
rear += 1
End If
Next
' BFS to build failure links
While front < rear
u = queue(front)
front += 1
For i = 0 To ALPHABET_SIZE - 1
Dim As Integer son_node_idx = ac.tr(u).son(i)
Dim As Integer fail_node_idx = ac.tr(u).fail
If son_node_idx <> 0 Then
ac.tr(son_node_idx).fail = ac.tr(fail_node_idx).son(i)
ac.tr(ac.tr(son_node_idx).fail).du += 1
queue(rear) = son_node_idx
rear += 1
Else
ac.tr(u).son(i) = ac.tr(fail_node_idx).son(i)
End If
Next
Wend
End Sub
' Query the automaton with a text
Sub ac_query(text As String)
Dim As Integer i, u, char_code
u = 0
For i = 0 To Len(text) - 1
char_code = Asc(Mid(text, i + 1, 1)) - Asc("a")
u = ac.tr(u).son(char_code)
ac.tr(u).ans += 1
Next
End Sub
' Calculate final answers using topological sort
Sub ac_calculate_final_answers()
Dim As Integer i, u, v, node_idx
' Queue for topological sort
Dim As Integer queue(MAX_NODES)
Dim As Integer front = 0, rear = 0
' Find nodes with no incoming edges
For i = 0 To ac.tot
If ac.tr(i).du = 0 Then
queue(rear) = i
rear += 1
End If
Next
' Process the queue
While front < rear
u = queue(front)
front += 1
node_idx = ac.tr(u).idx
If node_idx <> 0 Then
ac.final_ans(node_idx) = ac.tr(u).ans
End If
v = ac.tr(u).fail
ac.tr(v).ans += ac.tr(u).ans
ac.tr(v).du -= 1
If ac.tr(v).du = 0 Then
queue(rear) = v
rear += 1
End If
Wend
End Sub
' Main program
Sub Main()
ac_init()
' Sample data from the Python code
Dim As Integer i, n
n = 5
Dim pattern_end_node_ids(n) As Integer
Dim patterns(n-1) As String
patterns(0) = "a"
patterns(1) = "bb"
patterns(2) = "aa"
patterns(3) = "abaa"
patterns(4) = "abaaa"
Dim text As String = "abaaabaa"
Print "Inserting patterns..."
For i = 1 To n
pattern_end_node_ids(i) = ac_insert(patterns(i-1))
Next
Print "Building failure links..."
ac_build()
Print "Querying text..."
ac_query(text)
Print "Calculating final answers..."
ac_calculate_final_answers()
Print !"\nResults:"
For i = 1 To n
Print "Pattern """; patterns(i-1); """ count:"; ac.final_ans(pattern_end_node_ids(i))
Next
End Sub
main()
Sleep
- Output:
Inserting patterns... Building failure links... Querying text... Calculating final answers... Results: Pattern "a" count: 6 Pattern "bb" count: 0 Pattern "aa" count: 3 Pattern "abaa" count: 2 Pattern "abaaa" count: 1
Go
package main
import (
"bufio"
"fmt"
"os"
)
type Node struct {
son [26]int
ans int
fail int
du int
idx int
}
type ACAutomaton struct {
tr []Node
tot int
finalAns []int
pidx int
}
func NewACAutomaton(maxNodes int) *ACAutomaton {
ac := &ACAutomaton{
tr: make([]Node, maxNodes),
tot: 0,
finalAns: []int{},
pidx: 0,
}
ac.tr[0] = Node{}
return ac
}
func (ac *ACAutomaton) Init() {
ac.tr = make([]Node, len(ac.tr))
ac.tr[0] = Node{}
ac.tot = 0
ac.pidx = 0
ac.finalAns = []int{}
}
func (ac *ACAutomaton) Insert(pattern string) int {
u := 0
for _, char := range pattern {
charCode := int(char - 'a')
if ac.tr[u].son[charCode] == 0 {
ac.tot++
ac.tr[u].son[charCode] = ac.tot
}
u = ac.tr[u].son[charCode]
}
if ac.tr[u].idx == 0 {
ac.pidx++
ac.tr[u].idx = ac.pidx
}
return ac.tr[u].idx
}
func (ac *ACAutomaton) Build() {
q := []int{}
for i := 0; i < 26; i++ {
if ac.tr[0].son[i] != 0 {
q = append(q, ac.tr[0].son[i])
}
}
for len(q) > 0 {
u := q[0]
q = q[1:]
for i := 0; i < 26; i++ {
sonNodeIdx := ac.tr[u].son[i]
failNodeIdx := ac.tr[u].fail
if sonNodeIdx != 0 {
ac.tr[sonNodeIdx].fail = ac.tr[failNodeIdx].son[i]
ac.tr[ac.tr[sonNodeIdx].fail].du++
q = append(q, sonNodeIdx)
} else {
ac.tr[u].son[i] = ac.tr[failNodeIdx].son[i]
}
}
}
}
func (ac *ACAutomaton) Query(text string) {
u := 0
for _, char := range text {
charCode := int(char - 'a')
u = ac.tr[u].son[charCode]
ac.tr[u].ans++
}
}
func (ac *ACAutomaton) CalculateFinalAnswers() {
ac.finalAns = make([]int, ac.pidx+1)
q := []int{}
for i := 0; i <= ac.tot; i++ {
if ac.tr[i].du == 0 {
q = append(q, i)
}
}
for len(q) > 0 {
u := q[0]
q = q[1:]
nodeIdx := ac.tr[u].idx
if nodeIdx != 0 {
ac.finalAns[nodeIdx] = ac.tr[u].ans
}
v := ac.tr[u].fail
ac.tr[v].ans += ac.tr[u].ans
ac.tr[v].du--
if ac.tr[v].du == 0 {
q = append(q, v)
}
}
}
func fastInput() string {
scanner := bufio.NewScanner(os.Stdin)
scanner.Scan()
return scanner.Text()
}
func main() {
MAX_NODES := 200000 + 6
n := 5
ac := NewACAutomaton(MAX_NODES)
patternEndNodeIds := make([]int, n+1)
myInput := []string{"a", "bb", "aa", "abaa", "abaaa"}
text := "abaaabaa"
for i := 1; i <= n; i++ {
pattern := myInput[i-1]
patternEndNodeIds[i] = ac.Insert(pattern)
}
ac.Build()
ac.Query(text)
ac.CalculateFinalAnswers()
for i := 1; i <= n; i++ {
uniqueID := patternEndNodeIds[i]
fmt.Println(ac.finalAns[uniqueID])
}
}
- Output:
6 0 3 2 1
Java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
public final class AhoCorasickAlgorithm {
public static void main(String[] args) {
String text = "abaaabaa";
List<String> targets = List.of( "a", "bb", "aa", "abaa", "abaaa" );
maxStateCount = targets.stream().mapToInt( s -> Integer.valueOf(s.length()) ).sum();
buildMatchingMachine(targets);
searchForTargets(text, targets);
}
// Build a pattern matching machine and return the number of states created
private static int buildMatchingMachine(List<String> targets) {
// Initialisation
failures = new ArrayList<Integer>(Collections.nCopies(maxStateCount, 0));
results = new ArrayList<Integer>(Collections.nCopies(maxStateCount, 0));
trie = IntStream.range(0, maxStateCount)
.mapToObj( i -> new ArrayList<Integer>(Collections.nCopies(ALPHABET_SIZE, 0)) )
.collect(Collectors.toList());
int nextStateID = 1; // Initially there is only the empty state
// Build the trie
for ( int i = 0; i < targets.size(); i++ ) {
String target = targets.get(i);
validateLowerCaseLetters(target);
int currentState = 0;
// Process all characters of the current target
for ( char ch : target.toCharArray() ) {
final int charCode = ch - 'a';
// Allocate a new state for 'charCode', if one does not already exist
if ( trie.get(currentState).get(charCode) == 0) {
trie.get(currentState).set(charCode, nextStateID++);
}
currentState = trie.get(currentState).get(charCode);
}
// Bit-map the index of the 'target' word in 'targets' to the 'results' list,
// indicating that it ends in the 'currentState'
results.set(currentState, results.get(currentState) | ( 1 << i ));
}
// Determine 'failures' values using a breadth first search
Queue<Integer> queue = new ArrayDeque<Integer>();
// If an alphabetic character has a positive state, its failure value is zero
IntStream.range(0, ALPHABET_SIZE).forEach( ch -> {
if ( trie.getFirst().get(ch) > 0 ) {
failures.set(trie.getFirst().get(ch), 0);
queue.offer(trie.getFirst().get(ch));
}
} );
while ( ! queue.isEmpty() ) {
final int state = queue.poll();
// Determine the 'failures' value for all alphabetic characters with an undefined state
IntStream.range(0, ALPHABET_SIZE).forEach( ch -> {
if ( trie.get(state).get(ch) != 0 ) {
int failure = failures.get(state);
while ( trie.get(failure).get(ch) == 0 ) {
failure = failures.get(failure);
}
failure = trie.get(failure).get(ch);
failures.set(trie.get(state).get(ch), failure);
// Merge 'results' values
results.set(trie.get(state).get(ch), results.get(trie.get(state).get(ch)) | results.get(failure));
// Insert the next level state into the queue
queue.offer(trie.get(state).get(ch));
}
} );
}
return nextStateID; // 'nextStateID' is also the total number of states created
}
// Display all occurrences of the target words in the given text
private static void searchForTargets(String text, List<String> targets) {
validateLowerCaseLetters(text);
// Initialisation
Map<String, List<Integer>> printableResults = targets.stream().collect(Collectors.toMap(
Function.identity(), v -> new ArrayList<Integer>(), (previous, next) -> next, LinkedHashMap::new));
int currentState = 0;
// Process the characters of 'text' through the matching machine
for ( int i = 0; i < text.length(); i++ ) {
currentState = findNextState(currentState, text.charAt(i));
// If a match is not found, move to next character of 'text'
if ( results.get(currentState) == 0 ) {
continue;
}
// A match has been found, so store the start index of the target word in the 'printableResults' map
for ( int j = 0; j < targets.size(); j++ ) {
if ( ( results.get(currentState) & ( 1 << j ) ) > 0 ) {
printableResults.get(targets.get(j)).addLast(i + 1 - targets.get(j).length());
}
}
}
// Print the results of the search
for ( Map.Entry<String, List<Integer>> entry : printableResults.entrySet() ) {
System.out.println("The word \"" + entry.getKey() + "\" appears in \"" + text
+ "\" starting at indexes " + entry.getValue());
}
}
// Return the next state to which the matching machine will transition
private static int findNextState(int currentState, char nextCharacter) {
final int ch = nextCharacter - 'a';
// Follow the links to the first state not undefined
while ( trie.get(currentState).get(ch) == 0 ) {
currentState = failures.get(currentState);
}
return trie.get(currentState).get(ch);
}
// Throw an exception if the given 'word' contains a character which is not a lower case alphabetic letter
private static void validateLowerCaseLetters(String word) {
for ( char ch : word.toCharArray() ) {
if ( ch < 'a' || ch > 'z' ) {
throw new IllegalArgumentException("Invalid character in pattern: " + ch);
}
}
}
private static int maxStateCount; // The maximum number of states required to process the 'targets' words
// 'results' stores the indexes of all the 'target' words in 'targets' that end in the current state.
// Multiple indexes are stored in a single value by bit-mapping each index, that is, multiplying by a power of two.
private static List<Integer> results;
// 'failures' stores the index of the failure link state in the trie for each index of a 'target' word in 'targets'
private static List<Integer> failures;
private static List<List<Integer>> trie; // The trie containing the characters of the 'target' words
private static final int ALPHABET_SIZE = 26;
}
- Output:
The word "a" appears in "abaaabaa" starting at indexes [0, 2, 3, 4, 6, 7] The word "bb" appears in "abaaabaa" starting at indexes [] The word "aa" appears in "abaaabaa" starting at indexes [2, 3, 6] The word "abaa" appears in "abaaabaa" starting at indexes [0, 4] The word "abaaa" appears in "abaaabaa" starting at indexes [0]
JavaScript
// Node class representing a node in the Trie/Automaton
class Node {
constructor() {
// son: Children nodes for 'a' through 'z'. Stores index in the main 'tr' array.
this.son = new Array(26).fill(0);
// ans: Count of times this node is visited during query (intermediate count)
this.ans = 0;
// fail: Index of the failure link node in the 'tr' array.
this.fail = 0;
// du: In-degree for the topological sort based on failure links.
this.du = 0;
// idx: Unique ID assigned if this node marks the end of a pattern (0 otherwise).
this.idx = 0;
}
}
// ACAutomaton class implementing the Aho-Corasick algorithm
class ACAutomaton {
constructor(maxNodes) {
// tr: Array storing all Node objects. tr[0] is the root.
this.tr = new Array(maxNodes);
for (let i = 0; i < maxNodes; i++) {
this.tr[i] = new Node(); // Pre-allocate nodes
}
// tot: Total number of nodes created (index for the next new node).
this.tot = 0; // Root is node 0, next node will be 1
// final_ans: Array to store final counts for each pattern (indexed by pattern ID).
this.final_ans = [];
// pidx: Counter for assigning unique IDs to patterns.
this.pidx = 0;
}
// Optional: Method to reset the automaton (if reusing the same instance)
init() {
const maxNodes = this.tr.length;
this.tr = new Array(maxNodes);
for (let i = 0; i < maxNodes; i++) {
this.tr[i] = new Node();
}
this.tot = 0;
this.pidx = 0;
this.final_ans = [];
}
// Inserts a pattern into the Trie structure
insert(pattern) {
let u = 0; // Start at the root node
for (let i = 0; i < pattern.length; i++) {
const char = pattern[i];
const charCode = char.charCodeAt(0) - 'a'.charCodeAt(0);
// If child node for this character doesn't exist, create it
if (this.tr[u].son[charCode] === 0) {
this.tot++; // Increment total node count
this.tr[u].son[charCode] = this.tot; // Link parent to new child node index
// Note: Node object for this.tr[this.tot] was already created in constructor
}
// Move to the child node
u = this.tr[u].son[charCode];
}
// Mark the end node of the pattern with a unique ID if it doesn't have one
if (this.tr[u].idx === 0) {
this.pidx++; // Increment pattern ID counter
this.tr[u].idx = this.pidx; // Assign the ID
}
// Return the unique ID associated with this pattern
return this.tr[u].idx;
}
// Builds the failure links using BFS
build() {
const q = []; // Use array as a queue (push -> enqueue, shift -> dequeue)
// Initialize queue with children of the root
for (let i = 0; i < 26; i++) {
if (this.tr[0].son[i] !== 0) {
q.push(this.tr[0].son[i]);
// Optional: Nodes directly under root have root (0) as fail link (already default)
// this.tr[this.tr[0].son[i]].fail = 0;
}
}
while (q.length > 0) {
const u = q.shift(); // Dequeue node index
// Iterate through all possible characters ('a' to 'z')
for (let i = 0; i < 26; i++) {
const sonNodeIdx = this.tr[u].son[i];
const failNodeIdx = this.tr[u].fail;
if (sonNodeIdx !== 0) {
// If a direct child exists:
// Its failure link is the node reached by following the parent's
// failure link and then taking the same character transition.
this.tr[sonNodeIdx].fail = this.tr[failNodeIdx].son[i];
// Increment the in-degree of the node pointed to by the fail link
// (for the final calculation step)
this.tr[this.tr[sonNodeIdx].fail].du++;
// Enqueue the child node
q.push(sonNodeIdx);
} else {
// If a direct child doesn't exist (son is 0):
// Create a "virtual" transition by pointing this character's slot
// to the node reached by following the parent's failure link
// and taking the same character transition. This optimizes the query phase.
this.tr[u].son[i] = this.tr[failNodeIdx].son[i];
}
}
}
}
// Queries the text against the built automaton
query(text) {
let u = 0; // Start at the root
for (let i = 0; i < text.length; i++) {
const char = text[i];
const charCode = char.charCodeAt(0) - 'a'.charCodeAt(0);
// Follow the transitions (or failure links implicitly via build step)
u = this.tr[u].son[charCode];
// Increment the count for the current node
this.tr[u].ans++;
// Note: The original python code only increments tr[u].ans here.
// If we needed to count all matches including suffixes found via fail links
// *during* the query, we'd need another loop here:
// let temp = u;
// while (temp !== 0) {
// this.tr[temp].ans++; // Or increment a separate count
// temp = this.tr[temp].fail;
// }
// However, the provided Python code does the aggregation in calculate_final_answers.
}
}
// Calculates the final counts for each pattern using failure links
calculate_final_answers() {
// Initialize final answer array based on the number of unique patterns found
this.final_ans = new Array(this.pidx + 1).fill(0);
const q = []; // Queue for topological sort
// Find all nodes with an in-degree of 0 (start points for the sort)
// Iterate from 0 (root) up to the last created node index
for (let i = 0; i <= this.tot; i++) {
if (this.tr[i].du === 0) {
q.push(i);
}
}
// Perform topological sort on the reversed failure link graph
while (q.length > 0) {
const u = q.shift(); // Dequeue node index
// If this node represents the end of a pattern, store its accumulated count
const nodeIdx = this.tr[u].idx;
if (nodeIdx !== 0) {
this.final_ans[nodeIdx] = this.tr[u].ans;
}
// Propagate the count to the node pointed to by the failure link
const v = this.tr[u].fail;
if (v !== undefined) { // Check if fail link exists (root's fail is 0)
this.tr[v].ans += this.tr[u].ans;
// Decrease the in-degree of the fail link node
this.tr[v].du--;
// If the fail link node's in-degree becomes 0, enqueue it
if (this.tr[v].du === 0) {
q.push(v);
}
}
}
}
}
// --- Main Execution ---
const MAX_NODES = 200000 + 6;
// const MAX_N = 200000 + 6; // Not strictly needed with dynamic arrays
const ac = new ACAutomaton(MAX_NODES);
const n = 5;
const pattern_end_node_ids = new Array(n + 1).fill(0); // Using 1-based indexing like Python example
// Input data (hardcoded as per the example)
const my_input = ["a", "bb", "aa", "abaa", "abaaa"];
const text = "abaaabaa";
console.log("Inserting patterns...");
// Insert patterns and store their unique IDs
for (let i = 1; i <= n; i++) {
const pattern = my_input[i - 1]; // Adjust index for 0-based my_input
pattern_end_node_ids[i] = ac.insert(pattern);
// console.log(`Inserted "${pattern}", assigned ID: ${pattern_end_node_ids[i]}`);
}
console.log("Building failure links...");
ac.build();
console.log("Querying text...");
ac.query(text);
console.log("Calculating final answers...");
ac.calculate_final_answers();
console.log("Results:");
// Print the final counts for each pattern
for (let i = 1; i <= n; i++) {
const unique_id = pattern_end_node_ids[i];
console.log(`Pattern "${my_input[i-1]}" count: ${ac.final_ans[unique_id]}`);
}
- Output:
Inserting patterns... Building failure links... Querying text... Calculating final answers... Results: Pattern "a" count: 6 Pattern "bb" count: 0 Pattern "aa" count: 3 Pattern "abaa" count: 2 Pattern "abaaa" count: 1
Lua
-- Start of helper functions section --
local function stringToTable(str)
local t = {}
for i = 1, #str do
t[i] = str:sub(i, i)
end
return t
end
local function tableEmpty(t)
return #t == 0
end
local function popLeft(t)
local value = t[1]
for i = 1, #t - 1 do
t[i] = t[i+1]
end
t[#t] = nil
return value
end
local function tableWithDupl(value, n)
local out = {}
for i = 1, n do
out[i] = value
end
return out
end
-- End of helper functions section --
-- Node definition
local Node = {}
Node.__index = Node
function Node.new()
local self = setmetatable({}, Node)
self.son = {}
for i = 1, 26 do
self.son[i] = 0
end
self.ans = 0
self.fail = 0
self.du = 0
self.idx = 0
return self
end
-- Automaton definition
local Automaton = {}
Automaton.__index = Automaton
function Automaton.new(maxNodes)
local self = setmetatable({}, Automaton)
-- Preallocate self.tr with dummy values.
-- We use 0-indexed node IDs; however, nodes are stored in self.tr[u+1].
self.tr = tableWithDupl(0, maxNodes)
self.tr[1] = Node.new() -- root node, with id = 0
self.tot = 0 -- number of new nodes allocated (excluding root)
self.finalAnswer = {}
self.pidx = 0 -- used to assign an id to pattern endpoints
return self
end
function Automaton:reset()
self.tr = tableWithDupl(0, #self.tr)
self.tr[1] = Node.new()
self.tot = 0
self.finalAnswer = {}
self.pidx = 0
end
function Automaton:insert(pattern)
local u = 0 -- using 0-index for nodes; stored in self.tr[u+1]
for i, char in ipairs(stringToTable(pattern)) do
local charCode = string.byte(char) - string.byte("a") + 1 -- map 'a'-'z' to 1-26
if self.tr[u+1].son[charCode] == 0 then
self.tot = self.tot + 1
self.tr[u+1].son[charCode] = self.tot
self.tr[self.tot+1] = Node.new() -- allocate new node at index tot
end
u = self.tr[u+1].son[charCode]
end
if self.tr[u+1].idx == 0 then
self.pidx = self.pidx + 1
self.tr[u+1].idx = self.pidx
end
return self.tr[u+1].idx
end
function Automaton:build()
local queue = {}
-- Initialise the queue with direct children of the root node.
for i = 1, 26 do
if self.tr[1].son[i] ~= 0 then
table.insert(queue, self.tr[1].son[i])
else
-- Set missing transitions at root to point back to root (node 0)
self.tr[1].son[i] = 0
end
end
while not tableEmpty(queue) do
local u = popLeft(queue)
for i = 1, 26 do
local sonNodeIndex = self.tr[u+1].son[i]
local failNodeIndex = self.tr[u+1].fail
if sonNodeIndex ~= 0 then
self.tr[sonNodeIndex+1].fail = self.tr[failNodeIndex+1].son[i]
self.tr[self.tr[sonNodeIndex+1].fail+1].du = self.tr[self.tr[sonNodeIndex+1].fail+1].du + 1
table.insert(queue, sonNodeIndex)
else
self.tr[u+1].son[i] = self.tr[failNodeIndex+1].son[i]
end
end
end
end
function Automaton:query(text)
local u = 0
for i, char in ipairs(stringToTable(text)) do
local charCode = string.byte(char) - string.byte("a") + 1 -- adjust for 1-indexing
u = self.tr[u+1].son[charCode]
self.tr[u+1].ans = self.tr[u+1].ans + 1
end
end
function Automaton:calculateFinalAnswers()
self.finalAnswer = tableWithDupl(0, self.pidx+1)
local queue = {}
-- Iterate over all nodes: nodes are 0-indexed; stored at self.tr[u+1] for u = 0...tot
for u = 0, self.tot do
if self.tr[u+1].du == 0 then
table.insert(queue, u)
end
end
while not tableEmpty(queue) do
local u = popLeft(queue)
local nodeIndex = self.tr[u+1].idx
if nodeIndex ~= 0 then
self.finalAnswer[nodeIndex+1] = self.tr[u+1].ans
end
local v = self.tr[u+1].fail
self.tr[v+1].ans = self.tr[v+1].ans + self.tr[u+1].ans
self.tr[v+1].du = self.tr[v+1].du - 1
if self.tr[v+1].du == 0 then
table.insert(queue, v)
end
end
end
-- main code --
local MAX_NODES = 200000 + 6
local ac = Automaton.new(MAX_NODES)
local n = 5
local patternEndNoteIds = tableWithDupl(0, n+1)
local sampleInput = {"a", "bb", "aa", "abaa", "abaaa", "abaaabaa"}
local text = "abaaabaa"
for i = 1, n do
local pattern = sampleInput[i]
patternEndNoteIds[i+1] = ac:insert(pattern)
end
ac:build()
ac:query(text)
ac:calculateFinalAnswers()
for i = 1, n do
local uniqueId = patternEndNoteIds[i+1]
print(ac.finalAnswer[uniqueId+1])
end
Python
import sys
from collections import deque
def fast_input():
return sys.stdin.readline().strip()
class Node:
def __init__(self):
self.son = [0] * 26
self.ans = 0
self.fail = 0
self.du = 0
self.idx = 0
class ACAutomaton:
def __init__(self, max_nodes):
self.tr = [Node() for _ in range(max_nodes)]
self.tr[0] = Node()
self.tot = 0
self.final_ans = []
self.pidx = 0
def init(self):
self.tr = [Node() for _ in range(len(self.tr))]
self.tr[0] = Node()
self.tot = 0
self.pidx = 0
self.final_ans = []
def insert(self, pattern):
u = 0
for char in pattern:
char_code = ord(char) - ord('a')
if self.tr[u].son[char_code] == 0:
self.tot += 1
self.tr[u].son[char_code] = self.tot
u = self.tr[u].son[char_code]
if self.tr[u].idx == 0:
self.pidx += 1
self.tr[u].idx = self.pidx
return self.tr[u].idx
def build(self):
q = deque()
for i in range(26):
if self.tr[0].son[i] != 0:
q.append(self.tr[0].son[i])
while q:
u = q.popleft()
for i in range(26):
son_node_idx = self.tr[u].son[i]
fail_node_idx = self.tr[u].fail
if son_node_idx != 0:
self.tr[son_node_idx].fail = self.tr[fail_node_idx].son[i]
self.tr[self.tr[son_node_idx].fail].du += 1
q.append(son_node_idx)
else:
self.tr[u].son[i] = self.tr[fail_node_idx].son[i]
def query(self, text):
u = 0
for char in text:
char_code = ord(char) - ord('a')
u = self.tr[u].son[char_code]
self.tr[u].ans += 1
def calculate_final_answers(self):
self.final_ans = [0] * (self.pidx + 1)
q = deque()
for i in range(self.tot + 1):
if self.tr[i].du == 0:
q.append(i)
while q:
u = q.popleft()
node_idx = self.tr[u].idx
if node_idx != 0:
self.final_ans[node_idx] = self.tr[u].ans
v = self.tr[u].fail
self.tr[v].ans += self.tr[u].ans
self.tr[v].du -= 1
if self.tr[v].du == 0:
q.append(v)
if __name__ == "__main__":
MAX_NODES = 200000 + 6
MAX_N = 200000 + 6
ac = ACAutomaton(MAX_NODES)
n = 5
pattern_end_node_ids = [0] * (n + 1)
my_input=["a","bb","aa","abaa","abaaa","abaaabaa"]
text = "abaaabaa"
for i in range(1, n+1):
pattern = my_input[i-1]
pattern_end_node_ids[i] = ac.insert(pattern)
ac.build()
ac.query(text)
ac.calculate_final_answers()
for i in range(1, n + 1):
unique_id = pattern_end_node_ids[i]
print(ac.final_ans[unique_id])
- Output:
6 0 3 2 1
Rust
use std::collections::VecDeque;
use std::io::{self, BufRead};
// Define the Node struct, similar to the Python class
#[derive(Clone, Default, Debug)] // Derive Default for easy initialization, Clone for vec! macro
struct Node {
son: [usize; 26], // Child node indices (0 means no child)
ans: usize, // Counter for matches ending/passing through here
fail: usize, // Failure link node index
du: usize, // In-degree in the failure graph (for topological sort)
idx: usize, // Unique ID if this node marks the end of a pattern (0 otherwise)
}
// Define the ACAutomaton struct
struct ACAutomaton {
tr: Vec<Node>, // Vector storing all nodes
tot: usize, // Total number of nodes created (next available index)
final_ans: Vec<usize>, // Stores final counts for each pattern ID
pidx: usize, // Counter for assigning unique pattern IDs
}
impl ACAutomaton {
// Constructor
fn new(max_nodes: usize) -> Self {
ACAutomaton {
// Initialize with max_nodes + 1 capacity (index 0 is the root)
// Using vec![Node::default(); max_nodes] pre-allocates and default-initializes
tr: vec![Node::default(); max_nodes],
tot: 0, // Root node is at index 0, next node will be 1
final_ans: Vec::new(),
pidx: 0, // Pattern IDs start from 1
}
}
// Resets the automaton (optional, similar to Python's init)
// Note: In Rust, often you'd just create a new instance instead of resetting.
fn init(&mut self, max_nodes: usize) {
self.tr = vec![Node::default(); max_nodes];
self.tot = 0;
self.pidx = 0;
self.final_ans.clear();
}
// Inserts a pattern into the Trie
fn insert(&mut self, pattern: &str) -> usize {
let mut u = 0; // Start at the root node (index 0)
for char in pattern.chars() {
// Ensure character is a lowercase letter
if !('a'..='z').contains(&char) {
// Handle invalid characters if necessary, here we panic
panic!("Invalid character in pattern: {}", char);
}
let char_code = (char as u8 - b'a') as usize;
// If child node doesn't exist, create it
if self.tr[u].son[char_code] == 0 {
self.tot += 1;
// Check if we exceed allocated size
if self.tot >= self.tr.len() {
// In a real scenario, you might resize self.tr or return an error
panic!("Exceeded maximum number of nodes");
}
self.tr[u].son[char_code] = self.tot;
}
// Move to the child node
u = self.tr[u].son[char_code];
}
// If this node hasn't been marked as an end node yet, assign a new ID
if self.tr[u].idx == 0 {
self.pidx += 1;
self.tr[u].idx = self.pidx;
}
// Return the unique ID for the pattern ending at this node
self.tr[u].idx
}
// Builds the failure links using BFS
fn build(&mut self) {
let mut q = VecDeque::new();
// Initialize queue with direct children of the root
for i in 0..26 {
if self.tr[0].son[i] != 0 {
q.push_back(self.tr[0].son[i]);
// Failure link for root's children is the root itself (index 0)
// Their in-degree (du) remains 0 as root doesn't contribute
}
}
while let Some(u) = q.pop_front() {
for i in 0..26 {
let son_node_idx = self.tr[u].son[i];
// Use a temporary variable to avoid borrowing issues
let fail_node_idx_for_u = self.tr[u].fail;
if son_node_idx != 0 {
// Find failure link for the child: follow parent's fail link
// and see if it has a child for the same character `i`.
self.tr[son_node_idx].fail = self.tr[fail_node_idx_for_u].son[i];
// Increment the in-degree (du) of the node pointed to by the failure link
// This is needed for the final answer calculation (topological sort)
let target_fail_node_idx = self.tr[son_node_idx].fail;
self.tr[target_fail_node_idx].du += 1;
// Add the child node to the queue for processing
q.push_back(son_node_idx);
} else {
// If child doesn't exist for character `i`, make the current node's
// `son[i]` point to the same node that its failure node points to for `i`.
// This optimizes the query process by pre-calculating transitions.
self.tr[u].son[i] = self.tr[fail_node_idx_for_u].son[i];
}
}
}
}
// Queries the automaton with the given text
fn query(&mut self, text: &str) {
let mut u = 0; // Start at the root
for char in text.chars() {
// Ensure character is a lowercase letter
if !('a'..='z').contains(&char) {
// Handle invalid characters if necessary, here we skip
// Or you could panic: panic!("Invalid character in text: {}", char);
continue;
}
let char_code = (char as u8 - b'a') as usize;
// Transition to the next state using the trie links
// (which also incorporate failure links thanks to build())
u = self.tr[u].son[char_code];
// Increment the answer count for the current node
// This node represents the longest suffix of the text processed so far
// that is also a prefix in the dictionary.
self.tr[u].ans += 1;
// Note: The final propagation of counts happens in calculate_final_answers
}
}
// Calculates final answers by propagating counts up the failure links
// using a topological sort based on the failure graph.
fn calculate_final_answers(&mut self) {
// Initialize final_ans vector (size is based on the number of unique patterns)
// +1 because pattern IDs (pidx) are 1-based.
self.final_ans = vec![0; self.pidx + 1];
let mut q = VecDeque::new();
// Initialize queue with nodes having in-degree 0 in the failure graph
// These are the nodes that are not pointed to by any failure link (or only by root's children initially)
// Iterate from 0 to self.tot (inclusive) as node indices go up to self.tot
for i in 0..=self.tot {
if self.tr[i].du == 0 {
q.push_back(i);
}
}
// Process nodes in topological order
while let Some(u) = q.pop_front() {
let node_idx = self.tr[u].idx;
let current_ans = self.tr[u].ans; // Store ans before modifying fail node
// If this node corresponds to the end of a pattern, record its count
if node_idx != 0 {
if node_idx > self.pidx {
// Should not happen if logic is correct
panic!("Invalid node_idx encountered: {}", node_idx);
}
self.final_ans[node_idx] = current_ans;
}
// Propagate the count of this node to its failure node
let v = self.tr[u].fail;
self.tr[v].ans += current_ans; // Add counts from u to its failure node v
// Decrease the in-degree of the failure node
self.tr[v].du -= 1;
// If the failure node's in-degree becomes 0, add it to the queue
if self.tr[v].du == 0 {
q.push_back(v);
}
}
}
// Helper function to get the final answer for a specific pattern ID
fn get_ans(&self, pattern_id: usize) -> usize {
if pattern_id > 0 && pattern_id < self.final_ans.len() {
self.final_ans[pattern_id]
} else {
// Handle invalid ID, return 0 or panic/error
0
}
}
}
fn main() -> io::Result<()> {
// Constants similar to Python
const MAX_NODES: usize = 200000 + 6;
// MAX_N is implicitly handled by vector size if reading from input
// Create the automaton
let mut ac = ACAutomaton::new(MAX_NODES);
// --- Example Usage (Matching Python Code) ---
let n = 5;
let mut pattern_end_node_ids = vec![0; n + 1]; // Use vec! macro for initialization
let my_input = ["a", "bb", "aa", "abaa", "abaaa"]; // Removed "abaaabaa" as it was the text
let text = "abaaabaa";
println!("Inserting patterns:");
for i in 0..n {
let pattern = my_input[i];
println!(" - \"{}\"", pattern);
// Store the unique ID returned by insert. Note the index shift (i+1).
pattern_end_node_ids[i + 1] = ac.insert(pattern);
}
println!("Total nodes after insert: {}", ac.tot);
println!("Pattern IDs assigned: {:?}", &pattern_end_node_ids[1..=n]);
println!("\nBuilding failure links...");
ac.build();
println!("Build complete.");
println!("\nQuerying text: \"{}\"", text);
ac.query(text);
println!("Query complete.");
println!("\nCalculating final answers...");
ac.calculate_final_answers();
println!("Calculation complete.");
println!("\nResults:");
for i in 1..=n {
let unique_id = pattern_end_node_ids[i];
let count = ac.get_ans(unique_id); // Use helper or direct access
// let count = ac.final_ans[unique_id]; // Direct access also works if calculate_final_answers was called
println!("Pattern \"{}\" (ID {}): {}", my_input[i-1], unique_id, count);
}
println!();
Ok(())
}
- Output:
Inserting patterns: - "a" - "bb" - "aa" - "abaa" - "abaaa" Total nodes after insert: 8 Pattern IDs assigned: [1, 2, 3, 4, 5] Building failure links... Build complete. Querying text: "abaaabaa" Query complete. Calculating final answers... Calculation complete. Results: Pattern "a" (ID 1): 6 Pattern "bb" (ID 2): 0 Pattern "aa" (ID 3): 3 Pattern "abaa" (ID 4): 2 Pattern "abaaa" (ID 5): 1
Wren
import "./dynamic" for Struct
import "./queue" for Queue
/* Node class representing a node in the Trie/Automaton:
son: Children nodes for 'a' through 'z'. Stores index in the main 'tr' list.
ans: Count of times this node is visited during query (intermediate count).
fail: Index of the failure link node in the 'tr' list.
du: In-degree for the topological sort based on failure links.
idx: Unique ID assigned if this node marks the end of a pattern (0 otherwise).
*/
var Node_t = Struct.create("Node", ["son", "ans", "fail", "du", "idx"])
class Node is Node_t {
construct new() {
super(List.filled(26, 0), 0, 0, 0, 0)
}
}
/* ACAutomaton class implementing the Aho-Corasick algorithm. */
class ACAutomaton {
construct new(maxNodes) {
init_(maxNodes)
}
// Private method to initialize fields.
init_(maxNodes) {
// tr: List storing all Node objects. tr[0] is the root.
_tr = List.filled(maxNodes, null)
for (i in 0...maxNodes) _tr[i] = Node.new() // pre-allocate nodes
// tot: Total number of nodes created (index for the next new node).
_tot = 0 // root is node 0, next node will be 1
// finalAns: List to store final counts for each pattern (indexed by pattern ID).
_finalAns = []
// pidx: Counter for assigning unique IDs to patterns.
_pidx = 0
}
// Optional: Method to reset the automaton (if reusing the same instance)
reset() {
init_(_tr.count)
}
// Inserts a pattern into the Trie structure.
insert(pattern) {
var u = 0 // start at the root node
for (cp in pattern.codePoints) {
// Ensure cp represents a lowercase letter ASCII letter.
if (cp < 97 || cp > 122) {
Fiber.abort("Invalid character \"%(String.fromCodePoint(cp))\" in pattern.")
}
var charCode = cp - 97 // subtract code for 'a'
// If child node for this character doesn't exist, create it.
if (_tr[u].son[charCode] == 0) {
_tot = _tot + 1 // increment total node count
// Link parent to new child node index.
// Note: Node object for _tr[_tot] was already created in constructor.
_tr[u].son[charCode] = _tot
}
// Move to the child node.
u = _tr[u].son[charCode]
}
// Mark the end node of the pattern with a unique ID if it doesn't have one.
if (_tr[u].idx == 0) {
_pidx = _pidx + 1 // increment pattern ID counter
_tr[u].idx = _pidx // assign the ID
}
// Return the unique ID associated with this pattern.
return _tr[u].idx
}
// Property getter for 'finalAns'.
finalAns { _finalAns }
// Builds the failure links using BFS.
build() {
var q = Queue.new()
// Initialize queue with children of the root.
for (i in 0..25) {
if (_tr[0].son[i] != 0) {
q.push(_tr[0].son[i])
// Optional: Nodes directly under root have root (0)
// as fail link (already default).
// _tr[_tr[0].son[i]].fail = 0
}
}
while (!q.isEmpty) {
var u = q.pop() // dequeue node index
// Iterate through all possible characters ('a' to 'z').
for (i in 0..25) {
var sonNodeIdx = _tr[u].son[i]
var failNodeIdx = _tr[u].fail
if (sonNodeIdx != 0) {
// If a direct child exists:
// Its failure link is the node reached by following the parent's
// failure link and then taking the same character transition.
_tr[sonNodeIdx].fail = _tr[failNodeIdx].son[i]
// Increment the in-degree of the node pointed to by the fail link
// (for the final calculation step).
_tr[_tr[sonNodeIdx].fail].du = _tr[_tr[sonNodeIdx].fail].du + 1
// Enqueue the child node.
q.push(sonNodeIdx)
} else {
// If a direct child doesn't exist (son is 0):
// Create a "virtual" transition by pointing this character's slot
// to the node reached by following the parent's failure link
// and taking the same character transition. This optimizes the query phase.
_tr[u].son[i] = _tr[failNodeIdx].son[i]
}
}
}
}
// Queries the text against the built automaton.
query(text) {
var u = 0 // start at the root
for (cp in text.codePoints) {
// Ensure cp represents a lowercase letter ASCII letter.
if (cp < 97 || cp > 122) {
Fiber.abort("Invalid character \"%(String.fromCodePoint(cp))\" in text.")
}
var charCode = cp - 97 // subtract the code for 'a'
// Follow the transitions (or failure links implicitly via build step).
u = _tr[u].son[charCode]
// Increment the count for the current node.
_tr[u].ans = _tr[u].ans + 1
}
}
// Calculates the final counts for each pattern using failure links.
calculateFinalAnswers() {
// Initialize final answer array based on the number of unique patterns found.
_finalAns = List.filled(_pidx + 1, 0)
var q = Queue.new() // queue for topological sort
// Find all nodes with an in-degree of 0 (start points for the sort)
// Iterate from 0 (root) up to the last created node index.
for (i in 0.._tot) {
if (_tr[i].du == 0) q.push(i)
}
// Perform topological sort on the reversed failure link graph.
while (!q.isEmpty) {
var u = q.pop() // dequeue node index
// If this node represents the end of a pattern, store its accumulated count.
var nodeIdx = _tr[u].idx
if (nodeIdx != 0) _finalAns[nodeIdx] = _tr[u].ans
// Propagate the count to the node pointed to by the failure link.
var v = _tr[u].fail
if (v) { // Check if fail link exists (root's fail is 0)
_tr[v].ans = _tr[v].ans + _tr[u].ans
// Decrease the in-degree of the fail link node.
_tr[v].du = _tr[v].du - 1
// If the fail link node's in-degree becomes 0, enqueue it.
if (_tr[v].du == 0) q.push(v)
}
}
}
}
var MAX_NODES = 200000 + 6
var ac = ACAutomaton.new(MAX_NODES)
var n = 5
var patternEndNodeIds = List.filled(n + 1, 0) // using 1-based indexing
// Input data (hardcoded as per the example)
var myInput = ["a", "bb", "aa", "abaa", "abaaa"]
var text = "abaaabaa"
System.print("Inserting patterns...")
// Insert patterns and store their unique IDs
for (i in 1..n) {
var pattern = myInput[i - 1] // adjust index for 0-based myInput
patternEndNodeIds[i] = ac.insert(pattern)
// System.print("Inserted \"%(pattern)\", assigned ID: %(patternEndNodeIds[i])")
}
System.print("Building failure links...")
ac.build()
System.print("Querying text...")
ac.query(text)
System.print("Calculating final answers...")
ac.calculateFinalAnswers()
System.print("\nResults:")
// Print the final counts for each pattern.
for (i in 1..n) {
var uniqueId = patternEndNodeIds[i]
System.print("Pattern \"%(myInput[i-1])\" count: %(ac.finalAns[uniqueId])")
}
- Output:
Inserting patterns... Building failure links... Querying text... Calculating final answers... Results: Pattern "a" count: 6 Pattern "bb" count: 0 Pattern "aa" count: 3 Pattern "abaa" count: 2 Pattern "abaaa" count: 1