Execute a Markov algorithm: Difference between revisions

Content added Content deleted

Inline

Revision as of 18:40, 16 December 2009

This page uses content from Wikipedia. The original article was at Markov_algorithm. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)

Create an interpreter for a Markov Algorithm. Rules have the syntax:

<ruleset> ::= ((<comment> | <rule>) <newline>+)*
<comment> ::= # {<any character>}
<rule> ::= <pattern> <whitespace> -> <whitespace> [.] <replacement>
<whitespace> ::= (<tab> | <space>) [<whitespace>]

There is one rule per line. If there is a . present before the <replacement>, then this is a terminating rule in which case the interpreter must halt execution. A ruleset consists of a sequence of rules, with optional comments.

In order to promote flexibility, the interpreter should load the set of rules from one file, take the string to operate on from a second file, and write the output to a third.

Use the following three tests on entries:

Ruleset 1:

# This rules file is extracted from Wikipedia:
# http://en.wikipedia.org/wiki/Markov_Algorithm
A -> apple
B -> bag
S -> shop
T -> the
the shop -> my brother
a never used -> .terminating rule

Sample text of:

   "I bought a B of As from T S."

Should generate the output:

   "I bought a bag of apples from my brother."

Ruleset 2:
A test of the terminating rule

# Slightly modified from the rules on Wikipedia
A -> apple
B -> bag
S -> .shop
T -> the
the shop -> my brother
a never used -> .terminating rule

Sample text of:

   "I bought a B of As from T S."

Should generate:

   "I bought a bag of apples from T shop."

Ruleset 3:
A stretch goal. This tests for correct substitution order and may trap simple regexp based replacement routines if special regexp characters are not escaped.

# BNF Syntax testing rules
A -> apple
WWWW -> with
Bgage -> ->.*
B -> bag
->.* -> money
W -> WW
S -> .shop
T -> the
the shop -> my brother
a never used -> .terminating rule

Sample text of:

    "I bought a B of As W my Bgage from T S."

Should generate:

    "I bought a bag of apples with my money from T shop."

C++

Note: Non-use of iswhite is intentional, since depending on the locale, other chars besides space and tab might be detected by that function. <lang cpp>

include <cstdlib>
include <iostream>
include <fstream>
include <vector>
include <string>

struct rule {

 std::string pattern;
 std::string replacement;
 bool terminal;
 rule(std::string pat, std::string rep, bool term):
   pattern(pat),
   replacement(rep),
   terminal(term)
 {
 }

};

std::string const whitespace = " \t"; std::string::size_type const npos = std::string::npos;

bool is_whitespace(char c) {

 return whitespace.find(c) != npos;

}

int main(int argc, char* argv[]) {

 if (argc != 3)
 {
   std::cout << "usage:\n " << argv[0] << " rulefile text\n";
   return EXIT_FAILURE;
 }

 std::ifstream rulefile(argv[1]);
 std::vector<rule> rules;
 std::string line;
 while (std::getline(rulefile, line))
 {
   std::string::size_type pos;

   // remove comments
   pos = line.find('#');
   if (pos != npos)
     line.resize(pos);

   // ignore lines consisting only of whitespace
   if (line.find_first_not_of(whitespace) == npos)
     continue;

   // find "->" surrounded by whitespace
   pos = line.find("->");
   while (pos != npos && (pos == 0 || !is_whitespace(line[pos-1])))
     pos = line.find("->", pos+1);

   if (pos == npos || line.length() < pos+3 || !is_whitespace(line[pos+2]))
   {
     std::cerr << "invalid rule: " << line << "\n";
     return EXIT_FAILURE;
   }

   std::string pattern = line.substr(0, pos-1);
   std::string replacement = line.substr(pos+3);

   // remove additional separating whitespace
   pattern.erase(pattern.find_last_not_of(whitespace)+1);
   replacement.erase(0, replacement.find_first_not_of(whitespace));

   // test for terminal rule
   bool terminal = !replacement.empty() && replacement[0] == '.';
   if (terminal)
     replacement.erase(0,1);

   rules.push_back(rule(pattern, replacement, terminal));
 }

 std::string text = argv[2];
 std::vector<rule>::iterator iter = rules.begin();
 while (iter != rules.end())
 {
   std::string::size_type pos = text.find(iter->pattern);
   if (pos != npos)
   {
     text.replace(pos, iter->pattern.length(), iter->replacement);
     if (iter->terminal)
       break;
     iter = rules.begin();
   }
   ++iter;
 }

 std::cout << text << "\n";

} </lang>

Haskell

This program expects a source file as an argument and uses the standard input and output devices for the algorithm's I/O.

<lang haskell>import Data.List (isPrefixOf) import Data.Maybe (catMaybes) import Control.Monad import Text.ParserCombinators.Parsec import System.IO import System.Environment (getArgs)

main = do

  args <- getArgs
  unless (length args == 1) $
      fail "Please provide exactly one source file as an argument."
  let sourcePath = head args
  source <- readFile sourcePath
  input <- getContents
  case parse markovParser sourcePath source of
      Right rules -> putStrLn $ runMarkov rules input
      Left  err   -> hPutStrLn stderr $ "Parse error at " ++ show err

data Rule = Rule

  {from :: String, terminating :: Bool, to :: String}

markovParser :: Parser [Rule] markovParser = liftM catMaybes $

   (comment <|> rule) `sepEndBy` many1 newline
 where comment = char '#' >> skipMany nonnl >> return Nothing
       rule = liftM Just $ liftM3 Rule
           (manyTill (nonnl <?> "pattern character") $ try arrow)
           (succeeds $ char '.')
           (many nonnl)
       arrow = ws >> string "->" >> ws <?> "whitespace-delimited arrow"
       nonnl = noneOf "\n"
       ws = many1 $ oneOf " \t"
       succeeds p = option False $ p >> return True

runMarkov :: [Rule] -> String -> String runMarkov rules s = f rules s

 where f []                              s = s
       f (Rule from terminating to : rs) s = g "" s
         where g _      ""    = f rs s
               g before ahead@(a : as) = if from `isPrefixOf` ahead
                 then let new = reverse before ++ to ++ drop (length from) ahead
                      in if terminating then new else f rules new
                 else g (a : before) as</lang>

J

Solution:<lang j>require'strings regex'

markovLexer =: verb define

 rules =.  LF cut TAB&=`(,:&' ')}y
 rules =.  a: -.~ (dltb@:{.~ i:&'#')&.> rules
 rules =.  0 _1 {"1 '\s+->\s+' (rxmatch rxcut ])S:0 rules
 (,. ] (}.&.>~ ,. ]) ('.'={.)&.>)/ |: rules

)

replace =: dyad define

 'index patternLength replacement'=.  x
 'head tail' =.  index split y
 head, replacement, patternLength }. tail

)

matches =: E. i. 1:

markovStrict =: dyad define

 ruleIdx =. 0 [ rules =.  markovLexer x
 while. ruleIdx < #rules do.
   'pattern replacement terminating' =. ruleIdx { rules
   if. (#y) > index =. pattern matches y do.
     y =. (index ; (#pattern) ; replacement) replace y
     if. terminating do.  ruleIdx =. #rules 
     else.                ruleIdx =. 0 end.
   else.
     ruleIdx =. 1 + ruleIdx
   end.
 end.
 y

)</lang>

Example:<lang j> m1 =. noun define # This rules file is extracted from Wikipedia: # http://en.wikipedia.org/wiki/Markov_Algorithm A -> apple B -> bag S -> shop T -> the the shop -> my brother a never used -> .terminating rule )

  m1 markov 'I bought a B of As from T S.'

I bought a bag of apples from my brother. </lang> Discussion: Reasons for the explicit while. loop and more details are available on the the talk page.

Perl

This program expects a source file as an argument and uses the standard input and output devices for the algorithm's I/O.

<lang perl>@ARGV == 1 or die "Please provide exactly one source file as an argument.\n"; open my $source, '<', $ARGV[0] or die qq[I couldn't open "$ARGV[0]" for reading. ($!.)\n]; my @rules; while (<$source>)

  {/\A#/ and next;
   my @a = /(.*?)\s+->\s+(\.?)(.*)/ or die "Syntax error: $_";
   push @rules, \@a;}

close $source;

my $input = do {local $/; <>;};

OUTER:

  {foreach (@rules)
      {my ($from, $terminating, $to) = @$_;
       $input =~ s/\Q$from\E/$to/
           and ($terminating ? last OUTER : redo OUTER);}}

print $input;</lang>

Python

The example uses a regexp to parse the syntax of the grammar. This regexp is multi-line, verbose and uses named groups to aid in understanding the regexp and to allow more meaningful group names to be used when extracting the replacement data from the grammars in function extractreplacements.

<lang python>import re

syntaxre = r"""(?mx) ^(?:

 (?: (?P<comment> \# .* ) ) |
 (?: (?P<blank>   \s*  ) (?: \n | $ )  ) |
 (?: (?P<rule>    (?P<pat> .+? ) \s+ -> \s+ (?P<term> \.)? (?P<repl> .+) ) )

)$ """

grammar1 = """\

This rules file is extracted from Wikipedia:
http://en.wikipedia.org/wiki/Markov_Algorithm

A -> apple B -> bag S -> shop T -> the the shop -> my brother a never used -> .terminating rule """

grammar2 = \

Slightly modified from the rules on Wikipedia

A -> apple B -> bag S -> .shop T -> the the shop -> my brother a never used -> .terminating rule

grammar3 = \

BNF Syntax testing rules

A -> apple WWWW -> with Bgage -> ->.* B -> bag ->.* -> money W -> WW S -> .shop T -> the the shop -> my brother a never used -> .terminating rule

text1 = "I bought a B of As from T S."

text2 = "I bought a B of As W my Bgage from T S."

def extractreplacements(grammar):

   return [ (matchobj.group('pat'), matchobj.group('repl'), bool(matchobj.group('term')))
               for matchobj in re.finditer(syntaxre, grammar)
               if matchobj.group('rule')]

def replace(text, replacements):

   while True:
       for pat, repl, term in replacements:
           if pat in text:
               text = text.replace(pat, repl, 1)
               if term:
                   return text
               break
       else:
           return text

if __name__ == '__main__':

   assert replace(text1, extractreplacements(grammar1)) \
          == 'I bought a bag of apples from my brother.'
   assert replace(text1, extractreplacements(grammar2)) \
          == 'I bought a bag of apples from T shop.'
   # Stretch goal
   assert replace(text2, extractreplacements(grammar3)) \
          == 'I bought a bag of apples with my money from T shop.'</lang>

Ruby

Works with: Ruby version 1.8.7

<lang Ruby>raise "Please input an input code file, an input data file, and an output file." if ARGV.size < 3

rules = File.readlines(ARGV[0]).inject([]) do |rules, line|

 if line =~ /^\s*#/
   rules
 elsif line =~ /^(.+)\s+->\s+(\.?)(.*)$/
   rules << [$1, $3, $2 != ""]
 else
   raise "Syntax error: #{line}"
 end

end

File.open(ARGV[2], "w") do |file|

 file.write(File.read(ARGV[1]).tap { |input_data|
   while (matched = rules.find { |match, replace, term|
     input_data[match] and input_data.sub!(match, replace)
   }) and !matched[2]
   end
 })

end</lang>

Tcl

Works with: Tcl version 8.5

<lang tcl>package require Tcl 8.5 if {$argc < 3} {error "usage: $argv0 ruleFile inputFile outputFile"} lassign $argv ruleFile inputFile outputFile

Read the file of rules

set rules {} set f [open $ruleFile] foreach line [split [read $f] \n[close $f]] {

   if {[string match "#*" $line] || $line eq ""} continue
   if {[regexp {^(.+)\s+->\s+(\.?)(.*)$} $line -> from final to]} {

lappend rules $from $to [string equal "." $to] [string length $from]

   } else {

error "Syntax error: \"$line\""

}

Apply the rules

set f [open $inputFile] set out [open $outputFile w] foreach line [split [read $f] \n[close $f]] {

   set any 1
   while {$any} {

set any 0 foreach {from to stop fl} $rules { # If we match the 'from' pattern... if {[set idx [string first $from $line]] < 0} { continue }

# Change for the 'to' replacement set line [string replace $line $idx [expr {$idx+$fl-1}] $to]

# Stop if we terminate, otherwise note that we've more work to do if {$stop} { set any 0 break } else { set any 1 } }

   # Output the processed line
   puts $out $line

} close $out</lang> In the case where there are no terminating rules and no overlapping issues, the following is an alternative: <lang tcl>package require Tcl 8.5 if {$argc < 3} {error "usage: $argv0 ruleFile inputFile outputFile"} lassign $argv ruleFile inputFile outputFile

Read the file of rules

set rules {} set f [open $ruleFile] foreach line [split [read $f] \n[close $f]] {

   if {[string match "#*" $line] || $line eq ""} continue
   if {[regexp {^(.+)\s+->\s+(.*)$} $line -> from to]} {
       dict set rules $from $to
   } else {

error "Syntax error: \"$line\""

}

Apply the rules in a simplistic manner

set in [open $inputFile] set out [open $outputFile w] set data [read $in] close $in while 1 {

   set newData [string map $rules $data]
   if {$newData eq $data} break
   set data $newData

} puts $out $data close $out</lang>