Execute a Markov algorithm
This page uses content from Wikipedia. The original article was at Markov_algorithm. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance) |
You are encouraged to solve this task according to the task description, using any language you may know.
Create an interpreter for a Markov Algorithm. Rules have the syntax:
<comment> ::= # {<any character>} <rule> ::= <pattern> <whitespace> -> <whitespace> [.] <replacement> <whitespace> ::= (<tab> | <space>) [<whitespace>]
There is one rule per line. If there is a . present before the <replacement>, then this is a terminating rule in which case the interpreter must halt execution.
Use the following three tests on entries:
Ruleset 1:
# This rules file is extracted from Wikipedia: # http://en.wikipedia.org/wiki/Markov_Algorithm A -> apple B -> bag S -> shop T -> the the shop -> my brother a never used -> .terminating rule
Sample text of:
"I bought a B of As from T S."
Should generate the output:
"I bought a bag of apples from my brother."
Ruleset 2:
A test of the terminating rule
# Slightly modified from the rules on Wikipedia A -> apple B -> bag S -> .shop T -> the the shop -> my brother a never used -> .terminating rule
Sample text of:
"I bought a B of As from T S."
Should generate:
"I bought a bag of apples from T shop."
Ruleset 3:
A stretch goal. This tests the for correct substitution order and may trap simple regexp based replacement routines if special regexp characters are not escaped
# BNF Syntax testing rules A -> apple WWWW -> with Bgage -> ->.* B -> bag ->.* -> money W -> WW S -> .shop T -> the the shop -> my brother a never used -> .terminating rule
Sample text of:
"I bought a B of As W my Bgage from T S.""
Should generate:
"I bought a bag of apples with my money from T shop."
J
Solution:<lang j>require'strings regex'
NB. Lex a Markov program markovLexer =: verb define rules =. LF cut TAB&=`(,:&' ')}y rules =. a: -.~ (dltb@:{.~ i:&'#')&.> rules 0 _1 {"1 '\s+->\s+' (rxmatch rxcut ])S:0 rules )
NB. Given ruleset and target string, output NB. result. markov =: markovLexer@[ stringreplace^:_ ]
NB. Same as above, but output all intermediate NB. evaluations markovDebug =: markovLexer@[ stringreplace^:a: ]</lang>
Example:<lang j> m1 =. noun define # This rules file is extracted from Wikipedia: # http://en.wikipedia.org/wiki/Markov_Algorithm A -> apple B -> bag S -> shop T -> the the shop -> my brother a never used -> .terminating rule )
m1 markov 'I bought a B of As from T S.'
I bought a bag of apples from my brother.
m1 markovDebug 'I bought a B of As from T S.'
I bought a B of As from T S. I bought a bag of apples from the shop. I bought a bag of apples from my brother.</lang> Discussion: This solution implemented in 20 seconds and doesn't fully implement a Markov algorithm. More details on the talk page.
Python
The example uses a regexp to parse the syntax of the grammar. This regexp is multi-line and verbose and uses named groups to aid in understanding the regexp and to allow more meaningful group names to be used when extracting the replacement data from the grammars in function extractreplacements
.
<lang python>import re
syntaxre = r"""(?mx) ^(?:
(?: (?P<comment> \# .* ) ) | (?: (?P<blank> \s* ) (?: \n | $ ) ) | (?: (?P<rule> (?P<pat> .+? ) \s+ -> \s+ (?P<term> \.)? (?P<repl> .+) ) )
)$ """
grammar1 = """\
- This rules file is extracted from Wikipedia:
- http://en.wikipedia.org/wiki/Markov_Algorithm
A -> apple B -> bag S -> shop T -> the the shop -> my brother a never used -> .terminating rule """
grammar2 = \
- Slightly modified from the rules on Wikipedia
A -> apple B -> bag S -> .shop T -> the the shop -> my brother a never used -> .terminating rule
text1 = "I bought a B of As from T S."
def extractreplacements(grammar):
return [ (matchobj.group('pat'), matchobj.group('repl'), bool(matchobj.group('term'))) for matchobj in re.finditer(syntaxre, grammar) if matchobj.groupdict()['rule']]
def replace(text, replacements):
terminate = False while not terminate: for (pat, repl, isterm) in replacements: if pat in text: text = text.replace(pat, repl, 1) terminate = isterm break else: terminate = True return text
if __name__ == '__main__':
assert replace(text1, extractreplacements(grammar1)) \ == 'I bought a bag of apples from my brother.' assert replace(text1, extractreplacements(grammar2)) \ == 'I bought a bag of apples from T shop.'</lang>
Ruby
<lang Ruby>raise "Please input an input code file, an input data file, and an output file." if ARGV.size < 3
rules = File.readlines(ARGV[0]).inject([]) do |rules, line|
if line =~ /^\s*#/ rules elsif line =~ /^(.+)\s+->\s+(\.?)(.*)$/ rules << [$1, $3, $2 != ""] else raise "Syntax error: #{line}" end
end
File.open(ARGV[2], "w") do |file|
file.write(File.read(ARGV[1]).tap { |input_data| while (matched = rules.find { |match, replace, term| input_data[match] and input_data.sub!(match, replace) }) and !matched[2] end })
end</lang>
Tcl
<lang tcl>package require Tcl 8.5 if {$argc < 3} {error "usage: $argv0 ruleFile inputFile outputFile"} lassign $argv ruleFile inputFile outputFile
- Read the file of rules
set rules {} set f [open $ruleFile] foreach line [split [read $f] \n[close $f]] {
if {[string match "#*" $line] || $line eq ""} continue if {[regexp {^(.+)\s+->\s+(\.?)(.*)$} $line -> from final to]} {
lappend rules $from $to [string equal "." $to] [string length $from]
} else {
error "Syntax error: \"$line\""
}
}
- Apply the rules
set f [open $inputFile] set out [open $outputFile w] foreach line [split [read $f] \n[close $f]] {
set any 1 while {$any} {
set any 0 foreach {from to stop fl} $rules { # If we match the 'from' pattern... if {[set idx [string first $from $line]] < 0} { continue }
# Change for the 'to' replacement set line [string replace $line $idx [expr {$idx+$fl-1}] $to]
# Stop if we terminate, otherwise note that we've more work to do if {$stop} { set any 0 break } else { set any 1 } }
}
# Output the processed line puts $out $line
} close $out</lang> In the case where there are no terminating rules and no overlapping issues, the following is an alternative: <lang tcl>package require Tcl 8.5 if {$argc < 3} {error "usage: $argv0 ruleFile inputFile outputFile"} lassign $argv ruleFile inputFile outputFile
- Read the file of rules
set rules {} set f [open $ruleFile] foreach line [split [read $f] \n[close $f]] {
if {[string match "#*" $line] || $line eq ""} continue if {[regexp {^(.+)\s+->\s+(.*)$} $line -> from to]} { dict set rules $from $to } else {
error "Syntax error: \"$line\""
}
}
- Apply the rules in a simplistic manner
set in [open $inputFile] set out [open $outputFile w] set data [read $in] close $in while 1 {
set newData [string map $rules $data] if {$newData eq $data} break set data $newData
} puts $out $data close $out</lang>