Syntax Analyzer

Task
Compiler/syntax analyzer
You are encouraged to solve this task according to the task description, using any language you may know.

A Syntax analyzer transforms a token stream (from the Lexical analyzer) into a Syntax tree, based on a grammar.

Take the output from the Lexical analyzer task, and convert it to an Abstract Syntax Tree (AST), based on the grammar below. The output should be in a flattened format.

The program should read input from a file and/or stdin, and write output to a file and/or stdout. If the language being used has a parser module/library/class, it would be great if two versions of the solution are provided: One without the parser module, and one with.

Grammar

The simple programming language to be analyzed is more or less a (very tiny) subset of C. The formal grammar in Extended Backus-Naur Form (EBNF):

<lang EBNF>

   stmt_list           =   {stmt} ;
   stmt                =   ';'
                         | Identifier '=' expr ';'
                         | 'while' paren_expr stmt
                         | 'if' paren_expr stmt ['else' stmt]
                         | 'print' '(' prt_list ')' ';'
                         | 'putc' paren_expr ';'
                         | '{' stmt_list '}'
                         ;
   paren_expr          =   '(' expr ')' ;
   prt_list            =   (string | expr) {',' (String | expr)} ;
   expr                =   and_expr            {'||' and_expr} ;
   and_expr            =   equality_expr       {'&&' equality_expr} ;
   equality_expr       =   relational_expr     [('==' | '!=') relational_expr] ;
   relational_expr     =   addition_expr       [('<' | '<=' | '>' | '>=') addition_expr] ;
   addition_expr       =   multiplication_expr {('+' | '-') multiplication_expr} ;
   multiplication_expr =   primary             {('*' | '/' | '%') primary } ;
   primary             =   Identifier
                         | Integer
                         | '(' expr ')'
                         | ('+' | '-' | '!') primary
                         ;</lang>

The resulting AST should be formulated as a Binary Tree.

Example - given the simple program (below), stored in a file called while.t, create the list of tokens, using one of the Lexical analyzer solutions
lex < while.t > while.lex
Run one of the Syntax analyzer solutions
parse < while.lex > while.ast
The following table shows the input to lex, lex output, and the AST produced by the parser
Input to lex Output from lex, input to parse Output from parse

<lang c>count = 1;

while (count < 10) {
    print("count is: ", count, "\n");
    count = count + 1;
}</lang>
    1      1 Identifier      count
    1      7 Op_assign
    1      9 Integer             1
    1     10 Semicolon
    2      1 Keyword_while
    2      7 LeftParen
    2      8 Identifier      count
    2     14 Op_less
    2     16 Integer            10
    2     18 RightParen
    2     20 LeftBrace
    3      5 Keyword_print
    3     10 LeftParen
    3     11 String          "count is: "
    3     23 Comma
    3     25 Identifier      count
    3     30 Comma
    3     32 String          "\n"
    3     36 RightParen
    3     37 Semicolon
    4      5 Identifier      count
    4     11 Op_assign
    4     13 Identifier      count
    4     19 Op_add
    4     21 Integer             1
    4     22 Semicolon
    5      1 RightBrace
    6      1 End_of_input
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
While
Less
Identifier    count
Integer       10
Sequence
Sequence
;
Sequence
Sequence
Sequence
;
Prts
String        "count is: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
Specifications
List of node type names
Identifier String Integer Sequence If Prtc Prts Prti While Assign Negate Not Multiply Divide Mod
Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or

In the text below, Null/Empty nodes are represented by ";".

Non-terminal (internal) nodes

For Operators, the following nodes should be created:

Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or

For each of the above nodes, the left and right sub-nodes are the operands of the respective operation.

In pseudo S-Expression format:

(Operator expression expression)

Negate, Not

For these node types, the left node is the operand, and the right node is null.

(Operator expression ;)

Sequence - sub-nodes are either statements or Sequences.

If - left node is the expression, the right node is If node, with it's left node being the if-true statement part, and the right node being the if-false (else) statement part.

(If expression (If statement else-statement))

If there is not an else, the tree becomes:

(If expression (If statement ;))

Prtc

(Prtc (expression) ;)

Prts

(Prts (String "the string") ;)

Prti

(Prti (Integer 12345) ;)

While - left node is the expression, the right node is the statement.

(While expression statement)

Assign - left node is the left-hand side of the assignment, the right node is the right-hand side of the assignment.

(Assign Identifier expression)

Terminal (leaf) nodes:

Identifier: (Identifier ident_name)
Integer:    (Integer 12345)
String:     (String "Hello World!")
";":        Empty node
Some simple examples

Sequences denote a list node; they are used to represent a list. semicolon's represent a null node, e.g., the end of this path.

This simple program:

   a=11;

Produces the following AST, encoded as a binary tree:

Under each non-leaf node are two '|' lines. The first represents the left sub-node, the second represents the right sub-node:

   (1) Sequence
   (2)     |-- ;
   (3)     |-- Assign
   (4)         |-- Identifier: a
   (5)         |-- Integer: 11

In flattened form:

   (1) Sequence
   (2) ;
   (3) Assign
   (4) Identifier    a
   (5) Integer       11


This program:

   a=11;
   b=22;
   c=33;

Produces the following AST:

   ( 1) Sequence
   ( 2)     |-- Sequence
   ( 3)     |   |-- Sequence
   ( 4)     |   |   |-- ;
   ( 5)     |   |   |-- Assign
   ( 6)     |   |       |-- Identifier: a
   ( 7)     |   |       |-- Integer: 11
   ( 8)     |   |-- Assign
   ( 9)     |       |-- Identifier: b
   (10)     |       |-- Integer: 22
   (11)     |-- Assign
   (12)         |-- Identifier: c
   (13)         |-- Integer: 33

In flattened form:

   ( 1) Sequence
   ( 2) Sequence
   ( 3) Sequence
   ( 4) ;
   ( 5) Assign
   ( 6) Identifier    a
   ( 7) Integer       11
   ( 8) Assign
   ( 9) Identifier    b
   (10) Integer       22
   (11) Assign
   (12) Identifier    c
   (13) Integer       33
Pseudo-code for the parser.

Uses Precedence Climbing for expression parsing, and Recursive Descent for statement parsing. The AST is also built:

<lang python>def expr(p)

   if tok is "("
       x = paren_expr()
   elif tok in ["-", "+", "!"]
       gettok()
       y = expr(precedence of operator)
       if operator was "+"
           x = y
       else
           x = make_node(operator, y)
   elif tok is an Identifier
       x = make_leaf(Identifier, variable name)
       gettok()
   elif tok is an Integer constant
       x = make_leaf(Integer, integer value)
       gettok()
   else
       error()
   while tok is a binary operator and precedence of tok >= p
       save_tok = tok
       gettok()
       q = precedence of save_tok
       if save_tok is not right associative
           q += 1
       x = make_node(Operator save_tok represents, x, expr(q))
   return x

def paren_expr()

   expect("(")
   x = expr(0)
   expect(")")
   return x

def stmt()

   t = NULL
   if accept("if")
       e = paren_expr()
       s = stmt()
       t = make_node(If, e, make_node(If, s, accept("else") ? stmt() : NULL))
   elif accept("putc")
       t = make_node(Prtc, paren_expr())
       expect(";")
   elif accept("print")
       expect("(")
       repeat
           if tok is a string
               e = make_node(Prts, make_leaf(String, the string))
               gettok()
           else
               e = make_node(Prti, expr(0))
           t = make_node(Sequence, t, e)
       until not accept(",")
       expect(")")
       expect(";")
   elif tok is ";"
       gettok()
   elif tok is an Identifier
       v = make_leaf(Identifier, variable name)
       gettok()
       expect("=")
       t = make_node(Assign, v, expr(0))
       expect(";")
   elif accept("while")
       e = paren_expr()
       t = make_node(While, e, stmt()
   elif accept("{")
       while tok not equal "}" and tok not equal end-of-file
           t = make_node(Sequence, t, stmt())
       expect("}")
   elif tok is end-of-file
       pass
   else
       error()
   return t

def parse()

   t = NULL
   gettok()
   repeat
       t = make_node(Sequence, t, stmt())
   until tok is end-of-file
   return t</lang>
Once the AST is built, it should be output in a flattened format. This can be as simple as the following

<lang python>def prt_ast(t)

   if t == NULL
       print(";\n")
   else
       print(t.node_type)
       if t.node_type in [Identifier, Integer, String]     # leaf node
           print the value of the Ident, Integer or String, "\n"
       else
           print("\n")
           prt_ast(t.left)
           prt_ast(t.right)</lang>
If the AST is correctly built, loading it into a subsequent program should be as simple as

<lang python>def load_ast()

   line = readline()
   # Each line has at least one token
   line_list = tokenize the line, respecting double quotes
   text = line_list[0] # first token is always the node type
   if text == ";"   # a terminal node
       return NULL
   node_type = text # could convert to internal form if desired
   # A line with two tokens is a leaf node
   # Leaf nodes are: Identifier, Integer, String
   # The 2nd token is the value
   if len(line_list) > 1
       return make_leaf(node_type, line_list[1])
   left = load_ast()
   right = load_ast()
   return make_node(node_type, left, right)</lang>

Finally, the AST can also be tested by running it against one of the AST Interpreter solutions.

Test program, assuming this is in a file called prime.t
lex <prime.t | parse
Input to lex Output from lex, input to parse Output from parse

<lang c>/*

Simple prime number generator
*/

count = 1; n = 1; limit = 100; while (n < limit) {

   k=3;
   p=1;
   n=n+2;
   while ((k*k<=n) && (p)) {
       p=n/k*k!=n;
       k=k+2;
   }
   if (p) {
       print(n, " is prime\n");
       count = count + 1;
   }

} print("Total primes found: ", count, "\n");</lang>

    4      1 Identifier      count
    4      7 Op_assign
    4      9 Integer             1
    4     10 Semicolon
    5      1 Identifier      n
    5      3 Op_assign
    5      5 Integer             1
    5      6 Semicolon
    6      1 Identifier      limit
    6      7 Op_assign
    6      9 Integer           100
    6     12 Semicolon
    7      1 Keyword_while
    7      7 LeftParen
    7      8 Identifier      n
    7     10 Op_less
    7     12 Identifier      limit
    7     17 RightParen
    7     19 LeftBrace
    8      5 Identifier      k
    8      6 Op_assign
    8      7 Integer             3
    8      8 Semicolon
    9      5 Identifier      p
    9      6 Op_assign
    9      7 Integer             1
    9      8 Semicolon
   10      5 Identifier      n
   10      6 Op_assign
   10      7 Identifier      n
   10      8 Op_add
   10      9 Integer             2
   10     10 Semicolon
   11      5 Keyword_while
   11     11 LeftParen
   11     12 LeftParen
   11     13 Identifier      k
   11     14 Op_multiply
   11     15 Identifier      k
   11     16 Op_lessequal
   11     18 Identifier      n
   11     19 RightParen
   11     21 Op_and
   11     24 LeftParen
   11     25 Identifier      p
   11     26 RightParen
   11     27 RightParen
   11     29 LeftBrace
   12      9 Identifier      p
   12     10 Op_assign
   12     11 Identifier      n
   12     12 Op_divide
   12     13 Identifier      k
   12     14 Op_multiply
   12     15 Identifier      k
   12     16 Op_notequal
   12     18 Identifier      n
   12     19 Semicolon
   13      9 Identifier      k
   13     10 Op_assign
   13     11 Identifier      k
   13     12 Op_add
   13     13 Integer             2
   13     14 Semicolon
   14      5 RightBrace
   15      5 Keyword_if
   15      8 LeftParen
   15      9 Identifier      p
   15     10 RightParen
   15     12 LeftBrace
   16      9 Keyword_print
   16     14 LeftParen
   16     15 Identifier      n
   16     16 Comma
   16     18 String          " is prime\n"
   16     31 RightParen
   16     32 Semicolon
   17      9 Identifier      count
   17     15 Op_assign
   17     17 Identifier      count
   17     23 Op_add
   17     25 Integer             1
   17     26 Semicolon
   18      5 RightBrace
   19      1 RightBrace
   20      1 Keyword_print
   20      6 LeftParen
   20      7 String          "Total primes found: "
   20     29 Comma
   20     31 Identifier      count
   20     36 Comma
   20     38 String          "\n"
   20     42 RightParen
   20     43 Semicolon
   21      1 End_of_input
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
Assign
Identifier    n
Integer       1
Assign
Identifier    limit
Integer       100
While
Less
Identifier    n
Identifier    limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    k
Integer       3
Assign
Identifier    p
Integer       1
Assign
Identifier    n
Add
Identifier    n
Integer       2
While
And
LessEqual
Multiply
Identifier    k
Identifier    k
Identifier    n
Identifier    p
Sequence
Sequence
;
Assign
Identifier    p
NotEqual
Multiply
Divide
Identifier    n
Identifier    k
Identifier    k
Identifier    n
Assign
Identifier    k
Add
Identifier    k
Integer       2
If
Identifier    p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier    n
;
Prts
String        " is prime\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
;
Sequence
Sequence
Sequence
;
Prts
String        "Total primes found: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;
Additional examples

Your solution should pass all the test cases above and the additional tests found Here.

Reference

The C and Python versions can be considered reference implementations.

Related Tasks

ALGOL W

<lang algolw>begin % syntax analyser %

   % parse tree nodes %
   record node( integer         type
              ; reference(node) left, right
              ; integer         iValue % nString/nIndentifier number or nInteger value %
              );
   integer     nIdentifier, nString, nInteger, nSequence, nIf,   nPrtc, nPrts
         ,     nPrti,       nWhile,  nAssign,  nNegate,   nNot,  nMultiply
         ,     nDivide,     nMod,    nAdd,     nSubtract, nLess, nLessEqual
         ,     nGreater,    nGreaterEqual,     nEqual,    nNotEqual,    nAnd, nOr
         ;
   string(14) array ndName ( 1 :: 25 );
   % tokens - names must match those output by the lexical analyser %
   integer     tkType, tkLine, tkColumn, tkLength, tkIntegerValue;
   integer     tOp_multiply   , tOp_divide        , tOp_mod       , tOp_add
         ,     tOp_subtract   , tOp_negate        , tOp_less      , tOp_lessequal
         ,     tOp_greater    , tOp_greaterequal  , tOp_equal     , tOp_notequal
         ,     tOp_not        , tOp_assign        , tOp_and       , tOp_or
         ,     tLeftParen     , tRightParen       , tLeftBrace    , tRightBrace
         ,     tSemicolon     , tComma            , tKeyword_if   , tKeyword_else
         ,     tKeyword_while , tKeyword_print    , tKeyword_putc , tIdentifier
         ,     tInteger       , tString           , tEnd_of_input
         ,     MAX_TOKEN_TYPE, PRIMARY_PREC
         ;
   string(16)  array tkName         ( 1 :: 31 );
   integer     array tkPrec, tkNode ( 1 :: 31 );
   % string literals and identifiers - uses a linked list - a hash table might be better... %
   string(1)   array text ( 0 :: 4095 );
   integer     textNext, TEXT_MAX;
   record textElement ( integer start, length; reference(textElement) next );
   reference(textElement) idList, stList;
   % returns a new node with left and right branches %
   reference(node) procedure opNode ( integer value opType; reference(node) value opLeft, opRight ) ; begin
       node( opType, opLeft, opRight, 0 )
   end opNode ;
   % returns a new operand node %
   reference(node) procedure operandNode ( integer value opType, opValue ) ; begin
       node( opType, null, null, opValue )
   end operandNode ;
   % reports an error %
   procedure synError( integer value line, column; string(80) value message ); begin
       integer errorPos;
       write( i_w := 1, s_w := 0, "**** Error at(", line, ",", column, "): " );
       errorPos := 0;
       while errorPos < 80 and message( errorPos // 1 ) not = "." do begin
           writeon( s_w := 0, message( errorPos // 1 ) );
           errorPos := errorPos + 1
       end while_not_at_end_of_message ;
       writeon( s_w := 0, "." )
   end synError ;
   % reports an error and stops %
   procedure fatalError( integer value line, column; string(80) value message ); begin
       synError( line, column, message );
       assert( false )
   end fatalError ;
   % prints a node and its sub-nodes %
   procedure writeNode( reference(node) value n ) ; begin
       % prints an identifier or string from text %
       procedure writeOnText( reference(textElement) value txHead; integer value txNumber ) ;
       begin
           reference(textElement) txPos;
           integer                count;
           txPos := txHead;
           count := 1;
           while count < txNumber and txPos not = null do begin
               txPos := next(txPos);
               count := count + 1
           end while_text_element_not_found ;
           if txPos = null then fatalError( 0, txNumber, "INTERNAL ERROR: text not found." )
           else for cPos := 0 until length(txPos) - 1 do writeon( text( start(txPos) + cPos ) );
           if text( start(txPos) ) = """" then writeon( """" );
       end writeOnText ;
       if n = null then write( ";" )
       else begin
           write( ndName( type(n) ) );
           if      type(n) = nInteger    then writeon( iValue(n) )
           else if type(n) = nIdentifier then writeOnText( idList, iValue(n) )
           else if type(n) = nString     then writeOnText( stList, iValue(n) )
           else begin
               writeNode(  left(n) );
               writeNode( right(n) )
           end
       end
   end writeNode ;
   % reads a token from standard input %
   procedure readToken ; begin
       % parses a string from line and stores it in a string in the text array %
       % - if it is not already present in the specified textElement list.     %
       % returns the position of the string in the text array                  %
       integer procedure readString ( reference(textElement) value result txList; string(1) value terminator ) ; begin
           string(256) str;
           integer     sLen, sPos, ePos;
           logical     found;
           reference(textElement) txPos, txLastPos;
           % get the text of the string %
           str  := " ";
           sLen := 0;
           str( sLen // 1 ) := line( lPos // 1 );
           sLen := sLen + 1;
           lPos := lPos + 1;
           while lPos <= 255 and line( lPos // 1 ) not = terminator do begin
               str( sLen // 1 ) := line( lPos // 1 );
               sLen := sLen + 1;
               lPos := lPos + 1
           end while_more_string ;
           if lPos > 255 then fatalError( tkLine, tkColumn, "Unterminated String in token file." );
           % attempt to find the text in the list of strings/identifiers %
           txLastPos := txPos := txList;
           found := false;
           ePos := 0;
           while not found and txPos not = null do begin
               ePos  := ePos + 1;
               found := ( length(txPos) = sLen );
               sPos  := 0;
               while found and sPos < sLen do begin
                   found := str( sPos // 1 ) = text( start(txPos) + sPos );
                   sPos  := sPos + 1
               end while_not_found ;
               txLastPos := txPos;
               if not found then txPos := next(txPos)
           end while_string_not_found ;
           if not found then begin
               % the string/identifier is not in the list - add it %
               ePos := ePos + 1;
               if txList = null then txList := textElement( textNext, sLen, null )
                                else next(txLastPos) := textElement( textNext, sLen, null );
               if textNext + sLen > TEXT_MAX then fatalError( tkLine, tkColumn, "Text space exhausted." )
               else begin
                   for cPos := 0 until sLen - 1 do begin
                       text( textNext ) := str( cPos // 1 );
                       textNext := textNext + 1
                   end for_cPos
               end
           end if_not_found ;
           ePos
       end readString ;
       % gets an integer from the line - no checks for valid digits %
       integer procedure readInteger ; begin
           integer n;
           while line( lPos // 1 ) = " " do lPos := lPos + 1;
           n := 0;
           while line( lPos // 1 ) not = " " do begin
               n    := ( n * 10 ) + ( decode( line( lPos // 1 ) ) - decode( "0" ) );
               lPos := lPos + 1
           end while_not_end_of_integer ;
           n
       end readInteger ;
       string(256) line;
       string(16)  name;
       integer     lPos, tPos;
       tPos := lPos := 0;
       readcard( line );
       % get the line and column numbers %
       tkLine   := readInteger;
       tkColumn := readInteger;
       % get the token name %
       while line( lPos // 1 ) = " " do lPos := lPos + 1;
       name := "";
       while lPos < 256 and line( lPos // 1 ) not = " " do begin
           name( tPos // 1 ) := line( lPos // 1 );
           lPos := lPos + 1;
           tPos := tPos + 1
       end  while_more_name ;
       % determine the token type %
       tkType         := 1;
       tkIntegerValue := 0;
       while tkType <= MAX_TOKEN_TYPE and name not = tkName( tkType ) do tkType := tkType + 1;
       if tkType > MAX_TOKEN_TYPE then fatalError( tkLine, tkColumn, "Malformed token" );
       % handle the additional parameter for identifier/string/integer %
       if tkType = tInteger or tkType = tIdentifier or tkType = tString then begin
           while line( lPos // 1 ) = " " do lPos := lPos + 1;
           if      tkType = tInteger    then tkIntegerValue := readInteger
           else if tkType = tIdentifier then tkIntegerValue := readString( idList, " "  )
           else  % tkType = tString     %    tkIntegerValue := readString( stList, """" )
       end if_token_with_additional_parameter ;
   end readToken ;
   % parses a statement %
   reference(node) procedure parseStatement ; begin
       reference(node) stmtNode, stmtExpr;
       % skips the current token if it is expectedToken,              %
       % returns true if the token was expectedToken, false otherwise %
       logical procedure have ( integer value expectedToken ) ; begin
           logical haveExpectedToken;
           haveExpectedToken := ( tkType = expectedToken );
           if haveExpectedToken and tkType not = tEnd_of_input then readToken;
           haveExpectedToken
       end have ;
       % issues an error message and skips past the next semi-colon or to end of input %
       procedure skipStatement ( string(80) value message ) ; begin
           synError( tkLine, tkColumn, message );
           while tkType not = tEnd_of_input and not have( tSemicolon ) do readToken
       end skipStatement ;
       % checks we have a semicolon, issues an error and skips the statement if not %
       procedure mustBeEndOfStatement ; begin
           if not have( tSemicolon ) then skipStatement( """;"" expected." )
       end mustBeEndOfStatement ;
       % skips the current token if it is "(" and issues an error if it isn't %
       procedure mustBeLeftParen ; begin
           if not have( tLeftParen ) then synError( tkLine, tkColumn, """("" expected." )
       end % mustBeLeftParen % ;
       % skips the current token if it is ")" and issues an error if it isn't %
       procedure mustBeRightParen ; begin
           if not have( tRightParen ) then synError( tkLine, tkColumn, """)"" expected." )
       end % mustBeRightParen % ;
       % gets the next token and parses an expression with the specified precedence %
       reference(node) procedure nextAndparseExpr ( integer value precedence ) ; begin
           readToken;
           parseExpr( precedence )
       end nextAndParseExpr ;
       % parses an expression with the specified precedence %
       % all operators are assumed to be left-associative %
       reference(node) procedure parseExpr ( integer value precedence ) ; begin
           % handles a single token primary %
           reference(node) procedure simplePrimary ( integer value primaryNodeType ) ; begin
               reference(node) primaryNode;
               primaryNode := operandNode( primaryNodeType, tkIntegerValue );
               readToken;
               primaryNode
           end simplePrimary ;
           reference(node) exprNode;
           if precedence < PRIMARY_PREC  then begin
               exprNode := parseExpr( precedence + 1 );
               while tkPrec( tkType ) = precedence do begin
                   integer op;
                   op := tkNode( tkType );
                   exprNode := opNode( op, exprNode, nextAndParseExpr( precedence + 1 ) )
               end while_op_at_this_precedence_level
               end
           else if tkType = tIdentifier  then exprNode := simplePrimary( nIdentifier )
           else if tkType = tInteger     then exprNode := simplePrimary( nInteger    )
           else if tkType = nString      then begin
               synError( tkLine, tkColumn, "Unexpected string literal." );
               exprNode := simplePrimary( nInteger )
               end
           else if tkType = tLeftParen   then exprNode := parseParenExpr
           else if tkType = tOp_add      then exprNode := nextAndParseExpr( precedence )
           else if tkType = tOp_subtract then exprNode := opNode( nNegate, nextAndParseExpr( precedence ), null )
           else if tkType = tOp_not      then exprNode := opNode( nNot,    nextAndParseExpr( precedence ), null )
           else begin
               synError( tkLine, tkColumn, "Syntax error in expression." );
               exprNode := simplePrimary( nInteger )
           end;
           exprNode
       end parseExpr ;
       % parses a preenthesised expression %
       reference(node) procedure parseParenExpr ; begin
           reference(node) exprNode;
           mustBeLeftParen;
           exprNode := parseExpr( 0 );
           mustBeRightParen;
           exprNode
       end parseParenExpr ;
       % parse statement depending on it's first token %
       if      tkType = tIdentifier then begin % assignment statement %
           stmtExpr := operandNode( nIdentifier, tkIntegerValue );
           % skip the identifier and check for "=" %
           readToken;
           if not have( tOp_Assign ) then synError( tkLine, tkColumn, "Expected ""="" in assignment statement." );
           stmtNode := opNode( nAssign, stmtExpr, parseExpr( 0 ) );
           mustBeEndOfStatement
           end
       else if have( tKeyword_while ) then begin
           stmtExpr := parseParenExpr;
           stmtNode := opNode( nWhile, stmtExpr, parseStatement )
           end        
       else if have( tkeyword_if ) then begin
           stmtExpr := parseParenExpr;
           stmtNode := opNode( nIf, stmtExpr, opNode( nIf, parseStatement, null ) );
           if have( tKeyword_else ) then % have an "else" part % right(right(stmtNode)) := parseStatement
           end
       else if have( tKeyword_Print ) then begin
           mustBeLeftParen;
           stmtNode := null;
           while begin
               if tkType = tString then begin
                   stmtNode  := opNode( nSequence, stmtNode, opNode( nPrts, operandNode( nString, tkIntegerValue ), null ) );
                   readToken
                   end
               else stmtNode := opNode( nSequence, stmtNode, opNode( nPrti, parseExpr( 0 ), null ) );
               have( tComma )
           end do begin end;
           mustBeRightparen;
           mustBeEndOfStatement;
           end
       else if have( tKeyword_Putc ) then begin
           stmtNode := opNode( nPrtc, parseParenExpr, null );
           mustBeEndOfStatement
           end
       else if have( tLeftBrace ) then begin % block %
           stmtNode := parseStatementList( tRightBrace );
           if not have( tRightBrace ) then synError( tkLine, tkColumn, "Expected ""}""." );
           end
       else if have( tSemicolon ) then stmtNode := null
       else begin % unrecognised statement %
           skipStatement( "Unrecognised statement." );
           stmtNode := null
       end if_various_tokens ;
       stmtNode
   end parseStatement ;
   % parses a statement list ending with the specified terminator %
   reference(node) procedure parseStatementList ( integer value terminator ) ; begin
       reference(node) listNode;
       listNode := null;
       while tkType not = terminator and tkType not = tEnd_of_input do listNode := opNode( nSequence, listNode, parseStatement );
       listNode
   end parseStatementList ;
   nIdentifier      :=  1; ndName( nIdentifier      ) := "Identifier";   nString    :=  2; ndName( nString   ) := "String";
   nInteger         :=  3; ndName( nInteger         ) := "Integer";      nSequence  :=  4; ndName( nSequence ) := "Sequence";
   nIf              :=  5; ndName( nIf              ) := "If";           nPrtc      :=  6; ndName( nPrtc     ) := "Prtc";
   nPrts            :=  7; ndName( nPrts            ) := "Prts";         nPrti      :=  8; ndName( nPrti     ) := "Prti";
   nWhile           :=  9; ndName( nWhile           ) := "While";        nAssign    := 10; ndName( nAssign   ) := "Assign";
   nNegate          := 11; ndName( nNegate          ) := "Negate";       nNot       := 12; ndName( nNot      ) := "Not";
   nMultiply        := 13; ndName( nMultiply        ) := "Multiply";     nDivide    := 14; ndName( nDivide   ) := "Divide";
   nMod             := 15; ndName( nMod             ) := "Mod";          nAdd       := 16; ndName( nAdd      ) := "Add";
   nSubtract        := 17; ndName( nSubtract        ) := "Subtract";     nLess      := 18; ndName( nLess     ) := "Less";
   nLessEqual       := 19; ndName( nLessEqual       ) := "LessEqual"  ;  nGreater   := 20; ndName( nGreater  ) := "Greater";
   nGreaterEqual    := 21; ndName( nGreaterEqual    ) := "GreaterEqual"; nEqual     := 22; ndName( nEqual    ) := "Equal";
   nNotEqual        := 23; ndName( nNotEqual        ) := "NotEqual";     nAnd       := 24; ndName( nAnd      ) := "And";
   nOr              := 25; ndName( nOr              ) := "Or";
   tOp_multiply     :=  1; tkName( tOp_multiply     ) := "Op_multiply";     tkPrec( tOp_multiply     ) :=  5;
   tOp_divide       :=  2; tkName( tOp_divide       ) := "Op_divide";       tkPrec( tOp_divide       ) :=  5;
   tOp_mod          :=  3; tkName( tOp_mod          ) := "Op_mod";          tkPrec( tOp_mod          ) :=  5;
   tOp_add          :=  4; tkName( tOp_add          ) := "Op_add";          tkPrec( tOp_add          ) :=  4;
   tOp_subtract     :=  5; tkName( tOp_subtract     ) := "Op_subtract";     tkPrec( tOp_subtract     ) :=  4;
   tOp_negate       :=  6; tkName( tOp_negate       ) := "Op_negate";       tkPrec( tOp_negate       ) := -1;
   tOp_less         :=  7; tkName( tOp_less         ) := "Op_less";         tkPrec( tOp_less         ) :=  3;
   tOp_lessequal    :=  8; tkName( tOp_lessequal    ) := "Op_lessequal";    tkPrec( tOp_lessequal    ) :=  3;
   tOp_greater      :=  9; tkName( tOp_greater      ) := "Op_greater";      tkPrec( tOp_greater      ) :=  3;
   tOp_greaterequal := 10; tkName( tOp_greaterequal ) := "Op_greaterequal"; tkPrec( tOp_greaterequal ) :=  3;
   tOp_equal        := 11; tkName( tOp_equal        ) := "Op_equal";        tkPrec( tOp_equal        ) :=  2;
   tOp_notequal     := 12; tkName( tOp_notequal     ) := "Op_notequal";     tkPrec( tOp_notequal     ) :=  2;
   tOp_not          := 13; tkName( tOp_not          ) := "Op_not";          tkPrec( tOp_not          ) := -1;
   tOp_assign       := 14; tkName( tOp_assign       ) := "Op_assign";       tkPrec( tOp_assign       ) := -1;
   tOp_and          := 15; tkName( tOp_and          ) := "Op_and";          tkPrec( tOp_and          ) :=  1;
   tOp_or           := 16; tkName( tOp_or           ) := "Op_or";           tkPrec( tOp_or           ) :=  0;
   tLeftParen       := 17; tkName( tLeftParen       ) := "LeftParen";       tkPrec( tLeftParen       ) := -1;
   tRightParen      := 18; tkName( tRightParen      ) := "RightParen";      tkPrec( tRightParen      ) := -1;
   tLeftBrace       := 19; tkName( tLeftBrace       ) := "LeftBrace";       tkPrec( tLeftBrace       ) := -1;
   tRightBrace      := 20; tkName( tRightBrace      ) := "RightBrace";      tkPrec( tRightBrace      ) := -1;
   tSemicolon       := 21; tkName( tSemicolon       ) := "Semicolon";       tkPrec( tSemicolon       ) := -1;
   tComma           := 22; tkName( tComma           ) := "Comma";           tkPrec( tComma           ) := -1;
   tKeyword_if      := 23; tkName( tKeyword_if      ) := "Keyword_if";      tkPrec( tKeyword_if      ) := -1;
   tKeyword_else    := 24; tkName( tKeyword_else    ) := "Keyword_else";    tkPrec( tKeyword_else    ) := -1;
   tKeyword_while   := 25; tkName( tKeyword_while   ) := "Keyword_while";   tkPrec( tKeyword_while   ) := -1;
   tKeyword_print   := 26; tkName( tKeyword_print   ) := "Keyword_print";   tkPrec( tKeyword_print   ) := -1;
   tKeyword_putc    := 27; tkName( tKeyword_putc    ) := "Keyword_putc";    tkPrec( tKeyword_putc    ) := -1;
   tIdentifier      := 28; tkName( tIdentifier      ) := "Identifier";      tkPrec( tIdentifier      ) := -1;
   tInteger         := 29; tkName( tInteger         ) := "Integer";         tkPrec( tInteger         ) := -1;
   tString          := 30; tkName( tString          ) := "String";          tkPrec( tString          ) := -1;
   tEnd_of_input    := 31; tkName( tEnd_of_input    ) := "End_of_input";    tkPrec( tEnd_of_input    ) := -1;
   MAX_TOKEN_TYPE   := 31; TEXT_MAX := 4095; textNext := 0; PRIMARY_PREC := 6;
   for tkPos := 1 until MAX_TOKEN_TYPE do tkNode( tkPos ) := - tkPos;
   tkNode( tOp_multiply     ) := nMultiply;  tkNode( tOp_divide   ) := nDivide;   tkNode( tOp_mod          ) := nMod;
   tkNode( tOp_add          ) := nAdd;       tkNode( tOp_subtract ) := nSubtract; tkNode( tOp_less         ) := nLess;
   tkNode( tOp_lessequal    ) := nLessEqual; tkNode( tOp_greater  ) := nGreater;  tkNode( tOp_greaterequal ) := nGreaterEqual;
   tkNode( tOp_equal        ) := nEqual;     tkNode( tOp_notequal ) := nNotEqual; tkNode( tOp_not          ) := nNot;
   tkNode( tOp_and          ) := nAnd;       tkNode( tOp_or       ) := nOr;
   stList := idList := null;
   % parse the output from the lexical analyser and output the linearised parse tree %
   readToken;
   writeNode( parseStatementList( tEnd_of_input ) )

end.</lang>

Output:

Output from parsing the Prime Numbers example program.

Sequence      
Sequence      
Sequence      
Sequence      
Sequence      
;
Assign        
Identifier    count
Integer                    1  
Assign        
Identifier    n
Integer                    1  
Assign        
Identifier    limit
Integer                  100  
While         
Less          
Identifier    n
Identifier    limit
Sequence      
Sequence      
Sequence      
Sequence      
Sequence      
;
Assign        
Identifier    k
Integer                    3  
Assign        
Identifier    p
Integer                    1  
Assign        
Identifier    n
Add           
Identifier    n
Integer                    2  
While         
And           
LessEqual     
Multiply      
Identifier    k
Identifier    k
Identifier    n
Identifier    p
Sequence      
Sequence      
;
Assign        
Identifier    p
NotEqual      
Multiply      
Divide        
Identifier    n
Identifier    k
Identifier    k
Identifier    n
Assign        
Identifier    k
Add           
Identifier    k
Integer                    2  
If            
Identifier    p
If            
Sequence      
Sequence      
;
Sequence      
Sequence      
;
Prti          
Identifier    n
;
Prts          
String        " is prime\n"
;
Assign        
Identifier    count
Add           
Identifier    count
Integer                    1  
;
Sequence      
Sequence      
Sequence      
;
Prts          
String        "Total primes found: "
;
Prti          
Identifier    count
;
Prts          
String        "\n"
;

ATS

<lang ATS>(********************************************************************) (* Usage: parse [INPUTFILE [OUTPUTFILE]]

  If INPUTFILE or OUTPUTFILE is "-" or missing, then standard input
  or standard output is used, respectively. *)
  1. include "share/atspre_staload.hats"

staload UN = "prelude/SATS/unsafe.sats"

  1. define NIL list_nil ()
  2. define :: list_cons

%{^ /* alloca(3) is needed for ATS exceptions. */

  1. include <alloca.h>

%}

(********************************************************************)

  1. define NUM_TOKENS 31
  1. define TOKEN_ELSE 0
  2. define TOKEN_IF 1
  3. define TOKEN_PRINT 2
  4. define TOKEN_PUTC 3
  5. define TOKEN_WHILE 4
  6. define TOKEN_MULTIPLY 5
  7. define TOKEN_DIVIDE 6
  8. define TOKEN_MOD 7
  9. define TOKEN_ADD 8
  10. define TOKEN_SUBTRACT 9
  11. define TOKEN_NEGATE 10
  12. define TOKEN_LESS 11
  13. define TOKEN_LESSEQUAL 12
  14. define TOKEN_GREATER 13
  15. define TOKEN_GREATEREQUAL 14
  16. define TOKEN_EQUAL 15
  17. define TOKEN_NOTEQUAL 16
  18. define TOKEN_NOT 17
  19. define TOKEN_ASSIGN 18
  20. define TOKEN_AND 19
  21. define TOKEN_OR 20
  22. define TOKEN_LEFTPAREN 21
  23. define TOKEN_RIGHTPAREN 22
  24. define TOKEN_LEFTBRACE 23
  25. define TOKEN_RIGHTBRACE 24
  26. define TOKEN_SEMICOLON 25
  27. define TOKEN_COMMA 26
  28. define TOKEN_IDENTIFIER 27
  29. define TOKEN_INTEGER 28
  30. define TOKEN_STRING 29
  31. define TOKEN_END_OF_INPUT 30

typedef token_t =

 [i : int | TOKEN_ELSE <= i; i <= TOKEN_END_OF_INPUT] 
 int i

typedef tokentuple_t = (token_t, String, ullint, ullint)

fn token_text (tok : token_t) : String =

 case+ tok of
 | TOKEN_ELSE          => "else"
 | TOKEN_IF            => "if"
 | TOKEN_PRINT         => "print"
 | TOKEN_PUTC          => "putc"
 | TOKEN_WHILE         => "while"
 | TOKEN_MULTIPLY      => "*"
 | TOKEN_DIVIDE        => "/"
 | TOKEN_MOD           => "%"
 | TOKEN_ADD           => "+"
 | TOKEN_SUBTRACT      => "-"
 | TOKEN_NEGATE        => "-"
 | TOKEN_LESS          => "<"
 | TOKEN_LESSEQUAL     => "<="
 | TOKEN_GREATER       => ">"
 | TOKEN_GREATEREQUAL  => ">="
 | TOKEN_EQUAL         => "=="
 | TOKEN_NOTEQUAL      => "!="
 | TOKEN_NOT           => "!"
 | TOKEN_ASSIGN        => "="
 | TOKEN_AND           => "&&"
 | TOKEN_OR            => "||"
 | TOKEN_LEFTPAREN     => "("
 | TOKEN_RIGHTPAREN    => ")"
 | TOKEN_LEFTBRACE     => "{"
 | TOKEN_RIGHTBRACE    => "}"
 | TOKEN_SEMICOLON     => ";"
 | TOKEN_COMMA         => ","
 | TOKEN_IDENTIFIER    => "Ident"
 | TOKEN_INTEGER       => "Integer literal"
 | TOKEN_STRING        => "String literal"
 | TOKEN_END_OF_INPUT  => "EOI"

(********************************************************************) (* A perfect hash for the lexical token names.

  This hash was generated by GNU gperf and then translated to
  reasonable ATS by hand. Note, though, that one could have embedded
  the generated C code directly and used it. *)
  1. define MIN_WORD_LENGTH 5
  2. define MAX_WORD_LENGTH 15
  3. define MIN_HASH_VALUE 5
  4. define MAX_HASH_VALUE 64
  5. define HASH_TABLE_SIZE 65

local

 extern castfn u : {n : nat | n < 256} int n -<> uint8 n

in

 vtypedef asso_values_vt = @[[n : nat | n < 256] uint8 n][256]
 var asso_values =
   @[[n : nat | n < 256] uint8 n][256]
     (u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 10, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u  0, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u  0, u 65, u 25,
      u  5, u  5, u  0, u 15, u 65, u  0, u 65, u 65, u 10, u 65,
      u 30, u  0, u 65, u  5, u 10, u 10, u  0, u 15, u 65, u 65,
      u 65, u  5, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
      u 65, u 65, u 65, u 65, u 65, u 65)

end

fn get_asso_value {i : nat | i < 256}

              (i : uint i) :<>
   [n : nat | n < 256] uint n =
 let
   extern castfn u8ui : {n : nat} uint8 n -<> uint n
   extern castfn mk_asso_values :<>
     {p : addr} ptr p -<> (asso_values_vt @ p | ptr p)
   val asso_values_tup = mk_asso_values (addr@ asso_values)
   macdef asso_values = !(asso_values_tup.1)
   val retval = asso_values[i]
   val _ = $UN.castvwtp0{void} asso_values_tup
 in
   u8ui retval
 end

fn hash {n : int | MIN_WORD_LENGTH <= n; n <= MAX_WORD_LENGTH}

    (str : string n,
     len : size_t n) :<>
   [key : nat] uint key =
 let
   extern castfn uc2ui : {n : nat} uchar n -<> uint n
   val c1 = uc2ui (c2uc str[4])
   val c2 = uc2ui (c2uc str[pred len])
 in
   sz2u len + get_asso_value c1 + get_asso_value c2
 end

typedef wordlist_vt = @[(String, token_t)][HASH_TABLE_SIZE]

var wordlist =

 @[(String, token_t)][HASH_TABLE_SIZE]
   (("", 0), ("", 0), ("", 0), ("", 0), ("", 0),
    ("Comma", 26),
    ("Op_not", 17),
    ("", 0), ("", 0), ("", 0),
    ("Keyword_if", 1),
    ("Op_mod", 7),
    ("End_of_input", 30),
    ("Keyword_print", 2),
    ("Op_divide", 6),
    ("RightBrace", 24),
    ("Op_add", 8),
    ("Keyword_else", 0),
    ("Keyword_while", 4),
    ("Op_negate", 10),
    ("Identifier", 27),
    ("Op_notequal", 16),
    ("Op_less", 11),
    ("Op_equal", 15),
    ("LeftBrace", 23),
    ("Op_or", 20),
    ("Op_subtract", 9),
    ("Op_lessequal", 12),
    ("", 0), ("", 0),
    ("Op_greater", 13),
    ("Op_multiply", 5 ),
    ("Integer", 28),
    ("", 0), ("", 0),
    ("Op_greaterequal", 14),
    ("", 0),
    ("Keyword_putc", 3),
    ("", 0),
    ("LeftParen", 21),
    ("RightParen", 22),
    ("Op_and", 19),
    ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0),
    ("Op_assign", 18),
    ("", 0),
    ("String", 29),
    ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0),
    ("", 0), ("", 0), ("", 0), ("", 0), ("", 0),
    ("Semicolon", 25))

fn get_wordlist_entry

         {n   : nat | n <= MAX_HASH_VALUE}
         (key : uint n) :<> (String, token_t) =
 let
   extern castfn mk_wordlist_tup :<>
     {p : addr} ptr p -<> (wordlist_vt @ p | ptr p)
   val wordlist_tup = mk_wordlist_tup (addr@ wordlist)
   macdef wordlist = !(wordlist_tup.1)
   val retval = wordlist[key]
   val _ = $UN.castvwtp0{void} wordlist_tup
 in
   retval
 end

fn string2token_t_opt

         {n   : int}
         (str : string n) :<>
   Option token_t =
 let
   val len = string_length str
 in
   if len < i2sz MIN_WORD_LENGTH then
     None ()
   else if i2sz MAX_WORD_LENGTH < len then
     None ()
   else
     let
       val key = hash (str, len)
     in
       if i2u MAX_HASH_VALUE < key then
         None ()
       else
         let
           val (s, tok) = get_wordlist_entry (key)
         in
           if str <> s then
             None ()
           else
             Some tok
         end
     end
 end

(********************************************************************)

exception bad_lex_integer of (String) exception bad_lex_token_name of (String) exception bad_string_literal of (String)

extern fun {} skip_something$pred : char -<> bool fn {} skip_something {n : nat}

              {i : nat | i <= n}
              (s : string n,
               n : size_t n,
               i : size_t i) :<>
   [j : nat | i <= j; j <= n]
   size_t j =
 let
   fun
   loop {k : nat | i <= k; k <= n} .<n - k>.
        (k : size_t k) :<>
       [j : nat | i <= j; j <= n]
       size_t j =
     if k = n then
       k
     else if ~(skip_something$pred<> s[k]) then
       k
     else
       loop (succ k)
 in
   loop i
 end

fn skip_space {n : nat}

          {i : nat | i <= n}
          (s : string n,
           n : size_t n,
           i : size_t i) :<>
   [j : nat | i <= j; j <= n]
   size_t j =
 let
   implement skip_something$pred<> (c) = isspace c
 in
   skip_something (s, n, i)
 end

fn skip_nonspace {n : nat}

             {i : nat | i <= n}
             (s : string n,
              n : size_t n,
              i : size_t i) :<>
   [j : nat | i <= j; j <= n]
   size_t j =
 let
   implement skip_something$pred<> (c) = ~isspace c
 in
   skip_something (s, n, i)
 end

fn skip_nonquote {n : nat}

             {i : nat | i <= n}
             (s : string n,
              n : size_t n,
              i : size_t i) :<>
   [j : nat | i <= j; j <= n]
   size_t j =
 let
   implement skip_something$pred<> (c) = c <> '"'
 in
   skip_something (s, n, i)
 end

fn skip_string_literal

         {n : nat}
         {i : nat | i <= n}
         (s : string n,
          n : size_t n,
          i : size_t i) :<>
   [j : nat | i <= j; j <= n]
   size_t j =
 if i = n then
   i
 else if s[i] <> '"' then
   i
 else
   let
     val j = skip_nonquote (s, n, succ i)
   in
     if j = n then
       i
     else
       succ j
   end

fn get_substr {n, i, j : nat | i <= j; j <= n}

          (s : string n,
           i : size_t i,
           j : size_t j) :
   [m : int | m == j - i] string m =
 let
   val s = string_make_substring (s, i, j - i)
 in
   strnptr2string s
 end

fn string2ullint

         {n : nat}
         (s : string n) : ullint =
 let
   val n = string_length s
 in
   if n = i2sz 0 then
     $raise bad_lex_integer ("")
   else
     let
       extern castfn u2ull : uint -<> ullint
       fun
       evaluate {k : nat | k <= n} .<n - k>.
                (k : size_t k,
                 v : ullint) : ullint =
         if k = n then
           v
         else if ~isdigit s[k] then
           $raise bad_lex_integer (s)
         else
           let
             val d = char2ui s[k] - char2ui '0'
           in
             evaluate (succ k, (10ULL * v) + u2ull d)
           end
     in
       evaluate (i2sz 0, 0ULL)
     end
 end

fn string2token {n  : int}

            (str : string n) : token_t =
 case+ string2token_t_opt str of
 | None () => $raise bad_lex_token_name (str)
 | Some tok => tok

fn read_lex_file (inpf : FILEref) : List0 tokentuple_t =

 (* Convert the output of "lex" to a list of tokens. *)
 (* This routine could stand to do more validation of the input. *)
 let
   fun
   loop (lst : List0 tokentuple_t) : List0 tokentuple_t =
     if fileref_is_eof inpf then
       lst
     else
       let
         val s = strptr2string (fileref_get_line_string inpf)
         val n = string_length s
         prval _ = lemma_g1uint_param n
         val i0_line_no = skip_space (s, n, i2sz 0)
       in
         if i0_line_no = n then
           (* Skip any blank lines, including end of file. *)
           loop lst
         else
           let
             val i1_line_no = skip_nonspace (s, n, i0_line_no)
             val s_line_no = get_substr (s, i0_line_no, i1_line_no)
             val line_no = string2ullint s_line_no
             val i0_column_no = skip_space (s, n, i1_line_no)
             val i1_column_no = skip_nonspace (s, n, i0_column_no)
             val s_column_no = get_substr (s, i0_column_no,
                                           i1_column_no)
             val column_no = string2ullint s_column_no
             val i0_tokname = skip_space (s, n, i1_column_no)
             val i1_tokname = skip_nonspace (s, n, i0_tokname)
             val tokname = get_substr (s, i0_tokname, i1_tokname)
             val tok = string2token tokname
           in
             case+ tok of
             | TOKEN_INTEGER =>
               let
                 val i0 = skip_space (s, n, i1_tokname)
                 val i1 = skip_nonspace (s, n, i0)
                 val arg = get_substr (s, i0, i1)
                 val toktup = (tok, arg, line_no, column_no)
               in
                 loop (toktup :: lst)
               end
             | TOKEN_IDENTIFIER =>
               let
                 val i0 = skip_space (s, n, i1_tokname)
                 val i1 = skip_nonspace (s, n, i0)
                 val arg = get_substr (s, i0, i1)
                 val toktup = (tok, arg, line_no, column_no)
               in
                 loop (toktup :: lst)
               end
             | TOKEN_STRING =>
               let
                 val i0 = skip_space (s, n, i1_tokname)
                 val i1 = skip_string_literal (s, n, i0)
                 val arg = get_substr (s, i0, i1)
                 val toktup = (tok, arg, line_no, column_no)
               in
                 loop (toktup :: lst)
               end
             | _ =>
               let
                 val toktup = (tok, "", line_no, column_no)
               in
                 loop (toktup :: lst)
               end
           end
       end
 in
   list_vt2t (list_reverse (loop NIL))
 end

(********************************************************************)

exception truncated_lexical of () exception unexpected_token of (tokentuple_t, token_t) exception unexpected_primary of (tokentuple_t) exception unterminated_statement_block of (ullint, ullint) exception expected_a_statement of (tokentuple_t)

datatype node_t = | node_t_nil of () | node_t_leaf of (String, String) | node_t_cons of (String, node_t, node_t)

fn right_assoc (tok : token_t) : bool =

 (* None of the currently supported operators is right
    associative. *)
 false

fn binary_op (tok : token_t) : bool =

 case+ tok of
 | TOKEN_ADD => true
 | TOKEN_SUBTRACT => true
 | TOKEN_MULTIPLY => true
 | TOKEN_DIVIDE => true
 | TOKEN_MOD => true
 | TOKEN_LESS => true
 | TOKEN_LESSEQUAL => true
 | TOKEN_GREATER => true
 | TOKEN_GREATEREQUAL => true
 | TOKEN_EQUAL => true
 | TOKEN_NOTEQUAL => true
 | TOKEN_AND => true
 | TOKEN_OR => true
 | _ => false

fn precedence (tok : token_t) : int =

 case+ tok of
 | TOKEN_MULTIPLY => 13
 | TOKEN_DIVIDE => 13
 | TOKEN_MOD => 13
 | TOKEN_ADD => 12
 | TOKEN_SUBTRACT => 12
 | TOKEN_NEGATE => 14
 | TOKEN_NOT => 14
 | TOKEN_LESS => 10
 | TOKEN_LESSEQUAL => 10
 | TOKEN_GREATER => 10
 | TOKEN_GREATEREQUAL => 10
 | TOKEN_EQUAL => 9
 | TOKEN_NOTEQUAL => 9
 | TOKEN_AND => 5
 | TOKEN_OR => 4
 | _ => ~1

fn opname (tok : token_t) : String =

 case- tok of
 | TOKEN_MULTIPLY => "Multiply"
 | TOKEN_DIVIDE => "Divide"
 | TOKEN_MOD => "Mod"
 | TOKEN_ADD => "Add"
 | TOKEN_SUBTRACT => "Subtract"
 | TOKEN_NEGATE => "Negate"
 | TOKEN_NOT => "Not"
 | TOKEN_LESS => "Less"
 | TOKEN_LESSEQUAL => "LessEqual"
 | TOKEN_GREATER => "Greater"
 | TOKEN_GREATEREQUAL => "GreaterEqual"
 | TOKEN_EQUAL => "Equal"
 | TOKEN_NOTEQUAL => "NotEqual"
 | TOKEN_AND => "And"
 | TOKEN_OR => "Or"

fn parse (lex : List0 tokentuple_t) : node_t =

 let
   typedef toktups_t (n : int) = list (tokentuple_t, n)
   typedef toktups_t = [n : nat] toktups_t n
   fn
   expect (expected : token_t,
           lex      : toktups_t) : toktups_t =
     case+ lex of
     | NIL => $raise truncated_lexical ()
     | toktup :: tail =>
       if toktup.0 = expected then
         tail
       else
         $raise unexpected_token (toktup, expected)
   fn
   peek {n : int} (lex : toktups_t n) : [1 <= n] token_t =
     case+ lex of
     | NIL => $raise truncated_lexical ()
     | (tok, _, _, _) :: _ => tok
   fun
   stmt (lex : toktups_t) : (node_t, toktups_t) =
     case+ lex of
     | NIL => $raise truncated_lexical ()
     | (TOKEN_IF, _, _, _) :: lex =>
       let
         val (e, lex) = paren_expr lex
         val (s, lex) = stmt lex
       in
         case+ lex of
         | (TOKEN_ELSE, _, _, _) :: lex =>
           let
             val (t, lex) = stmt lex
           in
             (node_t_cons ("If", e, node_t_cons ("If", s, t)), lex)
           end
         | _ =>
           let
             (* There is no 'else' clause. *)
             val t = node_t_nil ()
           in
             (node_t_cons ("If", e, node_t_cons ("If", s, t)), lex)
           end
       end
     | (TOKEN_PUTC, _, _, _) :: lex =>
       let
         val (subtree, lex) = paren_expr lex
         val subtree = node_t_cons ("Prtc", subtree, node_t_nil ())
         val lex = expect (TOKEN_SEMICOLON, lex)
       in
         (subtree, lex)
       end
     | (TOKEN_PRINT, _, _, _) :: lex =>
       let
         val lex = expect (TOKEN_LEFTPAREN, lex)
         fun
         loop_over_args (subtree : node_t,
                         lex     : toktups_t) : (node_t, toktups_t) =
           case+ lex of
           | (TOKEN_STRING, arg, _, _) ::
               (TOKEN_COMMA, _, _, _) :: lex =>
             let
               val leaf = node_t_leaf ("String", arg)
               val e = node_t_cons ("Prts", leaf, node_t_nil ())
             in
               loop_over_args
                 (node_t_cons ("Sequence", subtree, e), lex)
             end
           | (TOKEN_STRING, arg, _, _) :: lex =>
             let
               val lex = expect (TOKEN_RIGHTPAREN, lex)
               val lex = expect (TOKEN_SEMICOLON, lex)
               val leaf = node_t_leaf ("String", arg)
               val e = node_t_cons ("Prts", leaf, node_t_nil ())
             in
               (node_t_cons ("Sequence", subtree, e), lex)
             end
           | _ :: _ =>
             let
               val (x, lex) = expr (0, lex)
               val e = node_t_cons ("Prti", x, node_t_nil ())
               val subtree = node_t_cons ("Sequence", subtree, e)
             in
               case+ peek lex of
               | TOKEN_COMMA =>
                 let
                   val lex = expect (TOKEN_COMMA, lex)
                 in
                   loop_over_args (subtree, lex)
                 end
               | _ =>
                 let
                   val lex = expect (TOKEN_RIGHTPAREN, lex)
                   val lex = expect (TOKEN_SEMICOLON, lex)
                 in
                   (subtree, lex)
                 end
             end
           | NIL => $raise truncated_lexical ()
       in
         loop_over_args (node_t_nil (), lex)
       end
     | (TOKEN_SEMICOLON, _, _, _) :: lex => (node_t_nil (), lex)
     | (TOKEN_IDENTIFIER, arg, _, _) :: lex =>
       let
         val v = node_t_leaf ("Identifier", arg)
         val lex = expect (TOKEN_ASSIGN, lex)
         val (subtree, lex) = expr (0, lex)
         val t = node_t_cons ("Assign", v, subtree)
         val lex = expect (TOKEN_SEMICOLON, lex)
       in
         (t, lex)
       end
     | (TOKEN_WHILE, _, _, _) :: lex =>
       let
         val (e, lex) = paren_expr lex
         val (t, lex) = stmt lex
       in
         (node_t_cons ("While", e, t), lex)
       end
     | (TOKEN_LEFTBRACE, _, _, _) :: lex =>
       let
         fun
         loop_over_stmts (subtree : node_t,
                          lex     : toktups_t) :
             (node_t, toktups_t) =
           case+ lex of
           | (TOKEN_RIGHTBRACE, _, _, _) :: lex => (subtree, lex)
           | (TOKEN_END_OF_INPUT, _, line_no, column_no) :: _ =>
             $raise unterminated_statement_block (line_no, column_no)
           | _ =>
             let
               val (e, lex) = stmt lex
             in
               loop_over_stmts
                 (node_t_cons ("Sequence", subtree, e), lex)
             end
       in
         loop_over_stmts (node_t_nil (), lex)
       end
     | (TOKEN_END_OF_INPUT, _, _, _) :: lex => (node_t_nil (), lex)
     | toktup :: _ => $raise expected_a_statement (toktup)
   and
   expr (prec : int,
         lex  : toktups_t) : (node_t, toktups_t) =
     case+ lex of
     | (TOKEN_LEFTPAREN, _, _, _) :: _ =>
       (* '(' expr ')' *)
       let
         val (subtree, lex) = paren_expr lex
       in
         prec_climb (prec, subtree, lex)
       end
     | (TOKEN_ADD, _, _, _) :: lex =>
        (* '+' expr *)
       let
         val (subtree, lex) = expr (precedence TOKEN_ADD, lex)
       in
         prec_climb (prec, subtree, lex)
       end
     | (TOKEN_SUBTRACT, _, _, _) :: lex =>
       (* '-' expr *)
       let
         val (subtree, lex) = expr (precedence TOKEN_NEGATE, lex)
         val subtree = node_t_cons ("Negate", subtree, node_t_nil ())
       in
         prec_climb (prec, subtree, lex)
       end
     | (TOKEN_NOT, _, _, _) :: lex =>
       (* '!' expr *)
       let
         val (subtree, lex) = expr (precedence TOKEN_NOT, lex)
         val subtree = node_t_cons ("Not", subtree, node_t_nil ())
       in
         prec_climb (prec, subtree, lex)
       end
     | (TOKEN_IDENTIFIER, arg, _, _) :: lex =>
       let
         val leaf = node_t_leaf ("Identifier", arg)
       in
         prec_climb (prec, leaf, lex)
       end
     | (TOKEN_INTEGER, arg, _, _) :: lex =>
       let
         val leaf = node_t_leaf ("Integer", arg)
       in
         prec_climb (prec, leaf, lex)
       end
     | toktup :: lex =>
       $raise unexpected_primary (toktup)
     | NIL =>
       $raise truncated_lexical ()
   and
   prec_climb (prec    : int,
               subtree : node_t,
               lex     : toktups_t) : (node_t, toktups_t) =
     case+ peek lex of
     | tokval =>
       if ~binary_op tokval then
         (subtree, lex)
       else if precedence tokval < prec then
         (subtree, lex)
       else
         case+ lex of
         | toktup :: lex =>
           let
             val q =
               if right_assoc (toktup.0) then
                 precedence tokval
               else
                 succ (precedence tokval)
             val (e, lex) = expr (q, lex)
             val subtree1 =
               node_t_cons (opname (toktup.0), subtree, e)
           in
             prec_climb (prec, subtree1, lex)
           end
   and
   paren_expr (lex : toktups_t) : (node_t, toktups_t) =
     (* '(' expr ')' *)
     let
       val lex = expect (TOKEN_LEFTPAREN, lex)
       val (subtree, lex) = expr (0, lex)
       val lex = expect (TOKEN_RIGHTPAREN, lex)
     in
       (subtree, lex)
     end
   fun
   main_loop (subtree : node_t,
              lex     : toktups_t) : node_t =
     case+ peek lex of
     | TOKEN_END_OF_INPUT => subtree
     | _ =>
       let
         val (x, lex) = stmt lex
       in
         main_loop (node_t_cons ("Sequence", subtree, x), lex)
       end
 in
   main_loop (node_t_nil (), lex)
 end

fn print_ast (outf : FILEref,

          ast  : node_t) : void =
 let
   fun
   traverse (ast : node_t) : void =
     case+ ast of
     | node_t_nil () => fprintln! (outf, ";")
     | node_t_leaf (str, arg) => fprintln! (outf, str, " ", arg)
     | node_t_cons (str, left, right) =>
       begin
         fprintln! (outf, str);
         traverse left;
         traverse right
       end
 in
   traverse ast
 end  

(********************************************************************)

fn main_program (inpf : FILEref,

             outf : FILEref) : int =
 let
   val toklst = read_lex_file inpf
   val ast = parse toklst
   val () = print_ast (outf, ast)
 in
   0
 end

fn error_start (line_no  : ullint,

            column_no : ullint) : void =
 print! ("(", line_no, ", ", column_no, ") error: ")

implement main (argc, argv) =

 let
   val inpfname =
     if 2 <= argc then
       $UN.cast{string} argv[1]
     else
       "-"
   val outfname =
     if 3 <= argc then
       $UN.cast{string} argv[2]
     else
       "-"
 in
   try
     let
       val inpf =
         if (inpfname : string) = "-" then
           stdin_ref
         else
           fileref_open_exn (inpfname, file_mode_r)
       val outf =
         if (outfname : string) = "-" then
           stdout_ref
         else
           fileref_open_exn (outfname, file_mode_w)
     in
       main_program (inpf, outf)
     end
   with
   | ~ unexpected_primary @(tok, _, line_no, column_no) =>
     begin
       error_start (line_no, column_no);
       println! ("Expecting a primary, found: ", token_text tok);
       1
     end
   | ~ unexpected_token (@(tok, _, line_no, column_no), expected) =>
     begin
       error_start (line_no, column_no);
       println! ("Expecting '", token_text expected,
                 "', found '", token_text tok, "'");
       1
     end
   | ~ expected_a_statement @(tok, _, line_no, column_no) =>
     begin
       error_start (line_no, column_no);
       println! ("expecting start of statement, found '",
                 token_text tok, "'");
       1
     end
   | ~ unterminated_statement_block (line_no, column_no) =>
     begin
       error_start (line_no, column_no);
       println! ("unterminated statement block");
       1
     end
   | ~ truncated_lexical () =>
     begin
       println! ("truncated input token stream");
       2
     end
   | ~ bad_lex_integer (s) =>
     begin
       println! ("bad integer literal in the token stream: '",
                 s, "'");
       2
     end
   | ~ bad_string_literal (s) =>
     begin
       println! ("bad string literal in the token stream: '",
                 s, "'");
       2
     end
   | ~ bad_lex_token_name (s) =>
     begin
       println! ("bad token name in the token stream: '",
                 s, "'");
       2
     end

end

(********************************************************************)</lang>


Output:
$ patscc -O2 -DATS_MEMALLOC_GCBDW -o parse parse-in-ATS.dats -latslib -lgc && ./lex compiler-tests/primes.t | ./parse
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier count
Integer 1
Assign
Identifier n
Integer 1
Assign
Identifier limit
Integer 100
While
Less
Identifier n
Identifier limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier k
Integer 3
Assign
Identifier p
Integer 1
Assign
Identifier n
Add
Identifier n
Integer 2
While
And
LessEqual
Multiply
Identifier k
Identifier k
Identifier n
Identifier p
Sequence
Sequence
;
Assign
Identifier p
NotEqual
Multiply
Divide
Identifier n
Identifier k
Identifier k
Identifier n
Assign
Identifier k
Add
Identifier k
Integer 2
If
Identifier p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier n
;
Prts
String " is prime\n"
;
Assign
Identifier count
Add
Identifier count
Integer 1
;
Sequence
Sequence
Sequence
;
Prts
String "Total primes found: "
;
Prti
Identifier count
;
Prts
String "\n"
;

AWK

Tested with gawk 4.1.1 and mawk 1.3.4. <lang AWK> function Token_assign(tk, attr, attr_array, n, i) {

 n=split(attr, attr_array)
 for(i=1; i<=n; i++)
   Tokens[tk,i-1] = attr_array[i]

}

        • show error and exit

function error(msg) {

 printf("(%s, %s) %s\n", err_line, err_col, msg)
 exit(1)

}

function gettok( line, n, i) {

 getline line
 if (line == "")
   error("empty line")
 n=split(line, line_list)
 # line col Ident var_name
 # 1    2   3     4
 err_line = line_list[1]
 err_col  = line_list[2]
 tok_text = line_list[3]
 tok = all_syms[tok_text]
 for (i=5; i<=n; i++)
   line_list[4] = line_list[4] " " line_list[i]
 if (tok == "")
   error("Unknown token " tok_text)
 tok_other = ""
 if (tok == "tk_Integer" || tok == "tk_Ident" || tok =="tk_String")
   tok_other = line_list[4]

}

function make_node(oper, left, right, value) {

 node_type [next_free_node_index] = oper
 node_left [next_free_node_index] = left
 node_right[next_free_node_index] = right
 node_value[next_free_node_index] = value
 return next_free_node_index ++

}

function make_leaf(oper, n) {

 return make_node(oper, 0, 0, n)

}

function expect(msg, s) {

 if (tok == s) {
   gettok()
   return
 }
 error(msg ": Expecting '" Tokens[s,TK_NAME] "', found '" Tokens[tok,TK_NAME] "'")

}

function expr(p, x, op, node) {

 x = 0
 if (tok == "tk_Lparen") {
   x = paren_expr()
 } else if (tok == "tk_Sub" || tok == "tk_Add") {
   if (tok == "tk_Sub")
     op = "tk_Negate"
   else
     op = "tk_Add"
   gettok()
   node = expr(Tokens["tk_Negate",TK_PRECEDENCE]+0)
   if (op == "tk_Negate")
     x = make_node("nd_Negate", node)
   else
     x = node
 } else if (tok == "tk_Not") {
   gettok()
   x = make_node("nd_Not", expr(Tokens["tk_Not",TK_PRECEDENCE]+0))
 } else if (tok == "tk_Ident") {
   x = make_leaf("nd_Ident", tok_other)
   gettok()
 } else if (tok == "tk_Integer") {
   x = make_leaf("nd_Integer", tok_other)
   gettok()
 } else {
   error("Expecting a primary, found: " Tokens[tok,TK_NAME])
 }
 while (((Tokens[tok,TK_IS_BINARY]+0) > 0) && ((Tokens[tok,TK_PRECEDENCE]+0) >= p)) {
   op = tok
   gettok()
   q = Tokens[op,TK_PRECEDENCE]+0
   if (! (Tokens[op,TK_RIGHT_ASSOC]+0 > 0))
     q += 1
   node = expr(q)
   x = make_node(Tokens[op,TK_NODE], x, node)
 }
 return x

}

function paren_expr( node) {

 expect("paren_expr", "tk_Lparen")
 node = expr(0)
 expect("paren_expr", "tk_Rparen")
 return node

}

function stmt( t, e, s, s2, v) {

 t = 0
 if (tok == "tk_If") {
   gettok()
   e = paren_expr()
   s = stmt()
   s2 = 0
   if (tok == "tk_Else") {
     gettok()
     s2 = stmt()
   }
   t = make_node("nd_If", e, make_node("nd_If", s, s2))
 } else if (tok == "tk_Putc") {
   gettok()
   e = paren_expr()
   t = make_node("nd_Prtc", e)
   expect("Putc", "tk_Semi")
 } else if (tok == "tk_Print") {
   gettok()
   expect("Print", "tk_Lparen")
   while (1) {
     if (tok == "tk_String") {
       e = make_node("nd_Prts", make_leaf("nd_String", tok_other))
       gettok()
     } else {
       e = make_node("nd_Prti", expr(0))
     }
     t = make_node("nd_Sequence", t, e)
     if (tok != "tk_Comma")
       break
     gettok()
   }
   expect("Print", "tk_Rparen")
   expect("Print", "tk_Semi")
 } else if (tok == "tk_Semi") {
   gettok()
 } else if (tok == "tk_Ident") {
   v = make_leaf("nd_Ident", tok_other)
   gettok()
   expect("assign", "tk_Assign")
   e = expr(0)
   t = make_node("nd_Assign", v, e)
   expect("assign", "tk_Semi")
 } else if (tok == "tk_While") {
   gettok()
   e = paren_expr()
   s = stmt()
   t = make_node("nd_While", e, s)
 } else if (tok == "tk_Lbrace") {
   gettok()
   while (tok != "tk_Rbrace" && tok != "tk_EOI")
     t = make_node("nd_Sequence", t, stmt())
   expect("Lbrace", "tk_Rbrace")
 } else if (tok == "tk_EOI") {
 } else {
   error("Expecting start of statement, found: " Tokens[tok,TK_NAME])
 }
 return t

}

function parse( t) {

 t = 0   # None
 gettok()
 while (1) {
   t = make_node("nd_Sequence", t, stmt())
   if (tok == "tk_EOI" || t == 0)
     break
 }
 return t

}

function prt_ast(t) {

 if (t == 0) {
   print(";")
 } else {
   printf("%-14s", Display_nodes[node_type[t]])
   if ((node_type[t] == "nd_Ident") || (node_type[t] == "nd_Integer"))
     printf("%s\n", node_value[t])
   else if (node_type[t] == "nd_String") {
     printf("%s\n", node_value[t])
   } else {
     print("")
     prt_ast(node_left[t])
     prt_ast(node_right[t])
   }
 }

}

BEGIN {

 all_syms["End_of_input"    ] = "tk_EOI"
 all_syms["Op_multiply"     ] = "tk_Mul"
 all_syms["Op_divide"       ] = "tk_Div"
 all_syms["Op_mod"          ] = "tk_Mod"
 all_syms["Op_add"          ] = "tk_Add"
 all_syms["Op_subtract"     ] = "tk_Sub"
 all_syms["Op_negate"       ] = "tk_Negate"
 all_syms["Op_not"          ] = "tk_Not"
 all_syms["Op_less"         ] = "tk_Lss"
 all_syms["Op_lessequal"    ] = "tk_Leq"
 all_syms["Op_greater"      ] = "tk_Gtr"
 all_syms["Op_greaterequal" ] = "tk_Geq"
 all_syms["Op_equal"        ] = "tk_Eq"
 all_syms["Op_notequal"     ] = "tk_Neq"
 all_syms["Op_assign"       ] = "tk_Assign"
 all_syms["Op_and"          ] = "tk_And"
 all_syms["Op_or"           ] = "tk_Or"
 all_syms["Keyword_if"      ] = "tk_If"
 all_syms["Keyword_else"    ] = "tk_Else"
 all_syms["Keyword_while"   ] = "tk_While"
 all_syms["Keyword_print"   ] = "tk_Print"
 all_syms["Keyword_putc"    ] = "tk_Putc"
 all_syms["LeftParen"       ] = "tk_Lparen"
 all_syms["RightParen"      ] = "tk_Rparen"
 all_syms["LeftBrace"       ] = "tk_Lbrace"
 all_syms["RightBrace"      ] = "tk_Rbrace"
 all_syms["Semicolon"       ] = "tk_Semi"
 all_syms["Comma"           ] = "tk_Comma"
 all_syms["Identifier"      ] = "tk_Ident"
 all_syms["Integer"         ] = "tk_Integer"
 all_syms["String"          ] = "tk_String"
 Display_nodes["nd_Ident"   ] = "Identifier"
 Display_nodes["nd_String"  ] = "String"
 Display_nodes["nd_Integer" ] = "Integer"
 Display_nodes["nd_Sequence"] = "Sequence"
 Display_nodes["nd_If"      ] = "If"
 Display_nodes["nd_Prtc"    ] = "Prtc"
 Display_nodes["nd_Prts"    ] = "Prts"
 Display_nodes["nd_Prti"    ] = "Prti"
 Display_nodes["nd_While"   ] = "While"
 Display_nodes["nd_Assign"  ] = "Assign"
 Display_nodes["nd_Negate"  ] = "Negate"
 Display_nodes["nd_Not"     ] = "Not"
 Display_nodes["nd_Mul"     ] = "Multiply"
 Display_nodes["nd_Div"     ] = "Divide"
 Display_nodes["nd_Mod"     ] = "Mod"
 Display_nodes["nd_Add"     ] = "Add"
 Display_nodes["nd_Sub"     ] = "Subtract"
 Display_nodes["nd_Lss"     ] = "Less"
 Display_nodes["nd_Leq"     ] = "LessEqual"
 Display_nodes["nd_Gtr"     ] = "Greater"
 Display_nodes["nd_Geq"     ] = "GreaterEqual"
 Display_nodes["nd_Eql"     ] = "Equal"
 Display_nodes["nd_Neq"     ] = "NotEqual"
 Display_nodes["nd_And"     ] = "And"
 Display_nodes["nd_Or"      ] = "Or"
 TK_NAME         =          0
 TK_RIGHT_ASSOC  =                   1
 TK_IS_BINARY    =                     2
 TK_IS_UNARY     =                       3
 TK_PRECEDENCE   =                          4
 TK_NODE         =                             5
 Token_assign("tk_EOI"    , "EOI     0 0 0 -1 -1        ")
 Token_assign("tk_Mul"    , "*       0 1 0 13 nd_Mul    ")
 Token_assign("tk_Div"    , "/       0 1 0 13 nd_Div    ")
 Token_assign("tk_Mod"    , "%       0 1 0 13 nd_Mod    ")
 Token_assign("tk_Add"    , "+       0 1 0 12 nd_Add    ")
 Token_assign("tk_Sub"    , "-       0 1 0 12 nd_Sub    ")
 Token_assign("tk_Negate" , "-       0 0 1 14 nd_Negate ")
 Token_assign("tk_Not"    , "!       0 0 1 14 nd_Not    ")
 Token_assign("tk_Lss"    , "<       0 1 0 10 nd_Lss    ")
 Token_assign("tk_Leq"    , "<=      0 1 0 10 nd_Leq    ")
 Token_assign("tk_Gtr"    , ">       0 1 0 10 nd_Gtr    ")
 Token_assign("tk_Geq"    , ">=      0 1 0 10 nd_Geq    ")
 Token_assign("tk_Eql"    , "==      0 1 0  9 nd_Eql    ")
 Token_assign("tk_Neq"    , "!=      0 1 0  9 nd_Neq    ")
 Token_assign("tk_Assign" , "=       0 0 0 -1 nd_Assign ")
 Token_assign("tk_And"    , "&&      0 1 0  5 nd_And    ")
 Token_assign("tk_Or"     , "||      0 1 0  4 nd_Or     ")
 Token_assign("tk_If"     , "if      0 0 0 -1 nd_If     ")
 Token_assign("tk_Else"   , "else    0 0 0 -1 -1        ")
 Token_assign("tk_While"  , "while   0 0 0 -1 nd_While  ")
 Token_assign("tk_Print"  , "print   0 0 0 -1 -1        ")
 Token_assign("tk_Putc"   , "putc    0 0 0 -1 -1        ")
 Token_assign("tk_Lparen" , "(       0 0 0 -1 -1        ")
 Token_assign("tk_Rparen" , ")       0 0 0 -1 -1        ")
 Token_assign("tk_Lbrace" , "{       0 0 0 -1 -1        ")
 Token_assign("tk_Rbrace" , "}       0 0 0 -1 -1        ")
 Token_assign("tk_Semi"   , ";       0 0 0 -1 -1        ")
 Token_assign("tk_Comma"  , ",       0 0 0 -1 -1        ")
 Token_assign("tk_Ident"  , "Ident   0 0 0 -1 nd_Ident  ")
 Token_assign("tk_Integer", "Integer 0 0 0 -1 nd_Integer")
 Token_assign("tk_String" , "String  0 0 0 -1 nd_String ")
 input_file = "-"
 err_line   = 0
 err_col    = 0
 tok        = ""
 tok_text   = ""
 next_free_node_index = 1
 if (ARGC > 1)
   input_file = ARGV[1]
 t = parse()
 prt_ast(t)

} </lang>

Output  —  count:

Sequence      
Sequence      
;
Assign        
Identifier    count
Integer       1
While         
Less          
Identifier    count
Integer       10
Sequence      
Sequence      
;
Sequence      
Sequence      
Sequence      
;
Prts          
String        "count is: "
;
Prti          
Identifier    count
;
Prts          
String        "\n"
;
Assign        
Identifier    count
Add           
Identifier    count
Integer       1

C

Tested with gcc 4.81 and later, compiles warning free with -Wall -Wextra <lang C>#include <stdio.h>

  1. include <stdlib.h>
  2. include <string.h>
  3. include <stdarg.h>
  4. include <stdbool.h>
  5. include <ctype.h>
  1. define NELEMS(arr) (sizeof(arr) / sizeof(arr[0]))

typedef enum {

   tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr,
   tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print,
   tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident,
   tk_Integer, tk_String

} TokenType;

typedef enum {

   nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While,
   nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq,
   nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or

} NodeType;

typedef struct {

   TokenType tok;
   int err_ln;
   int err_col;
   char *text;             /* ident or string literal or integer value */

} tok_s;

typedef struct Tree {

   NodeType node_type;
   struct Tree *left;
   struct Tree *right;
   char *value;

} Tree;

// dependency: Ordered by tok, must remain in same order as TokenType enum struct {

   char       *text, *enum_text;
   TokenType   tok;
   bool        right_associative, is_binary, is_unary;
   int         precedence;
   NodeType    node_type;

} atr[] = {

   {"EOI",             "End_of_input"   , tk_EOI,     false, false, false, -1, -1        },
   {"*",               "Op_multiply"    , tk_Mul,     false, true,  false, 13, nd_Mul    },
   {"/",               "Op_divide"      , tk_Div,     false, true,  false, 13, nd_Div    },
   {"%",               "Op_mod"         , tk_Mod,     false, true,  false, 13, nd_Mod    },
   {"+",               "Op_add"         , tk_Add,     false, true,  false, 12, nd_Add    },
   {"-",               "Op_subtract"    , tk_Sub,     false, true,  false, 12, nd_Sub    },
   {"-",               "Op_negate"      , tk_Negate,  false, false, true,  14, nd_Negate },
   {"!",               "Op_not"         , tk_Not,     false, false, true,  14, nd_Not    },
   {"<",               "Op_less"        , tk_Lss,     false, true,  false, 10, nd_Lss    },
   {"<=",              "Op_lessequal"   , tk_Leq,     false, true,  false, 10, nd_Leq    },
   {">",               "Op_greater"     , tk_Gtr,     false, true,  false, 10, nd_Gtr    },
   {">=",              "Op_greaterequal", tk_Geq,     false, true,  false, 10, nd_Geq    },
   {"==",              "Op_equal"       , tk_Eql,     false, true,  false,  9, nd_Eql    },
   {"!=",              "Op_notequal"    , tk_Neq,     false, true,  false,  9, nd_Neq    },
   {"=",               "Op_assign"      , tk_Assign,  false, false, false, -1, nd_Assign },
   {"&&",              "Op_and"         , tk_And,     false, true,  false,  5, nd_And    },
   {"||",              "Op_or"          , tk_Or,      false, true,  false,  4, nd_Or     },
   {"if",              "Keyword_if"     , tk_If,      false, false, false, -1, nd_If     },
   {"else",            "Keyword_else"   , tk_Else,    false, false, false, -1, -1        },
   {"while",           "Keyword_while"  , tk_While,   false, false, false, -1, nd_While  },
   {"print",           "Keyword_print"  , tk_Print,   false, false, false, -1, -1        },
   {"putc",            "Keyword_putc"   , tk_Putc,    false, false, false, -1, -1        },
   {"(",               "LeftParen"      , tk_Lparen,  false, false, false, -1, -1        },
   {")",               "RightParen"     , tk_Rparen,  false, false, false, -1, -1        },
   {"{",               "LeftBrace"      , tk_Lbrace,  false, false, false, -1, -1        },
   {"}",               "RightBrace"     , tk_Rbrace,  false, false, false, -1, -1        },
   {";",               "Semicolon"      , tk_Semi,    false, false, false, -1, -1        },
   {",",               "Comma"          , tk_Comma,   false, false, false, -1, -1        },
   {"Ident",           "Identifier"     , tk_Ident,   false, false, false, -1, nd_Ident  },
   {"Integer literal", "Integer"        , tk_Integer, false, false, false, -1, nd_Integer},
   {"String literal",  "String"         , tk_String,  false, false, false, -1, nd_String },

};

char *Display_nodes[] = {"Identifier", "String", "Integer", "Sequence", "If", "Prtc",

   "Prts", "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod",
   "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal",
   "NotEqual", "And", "Or"};

static tok_s tok; static FILE *source_fp, *dest_fp;

Tree *paren_expr();

void error(int err_line, int err_col, const char *fmt, ... ) {

   va_list ap;
   char buf[1000];
   va_start(ap, fmt);
   vsprintf(buf, fmt, ap);
   va_end(ap);
   printf("(%d, %d) error: %s\n", err_line, err_col, buf);
   exit(1);

}

char *read_line(int *len) {

   static char *text = NULL;
   static int textmax = 0;
   for (*len = 0; ; (*len)++) {
       int ch = fgetc(source_fp);
       if (ch == EOF || ch == '\n') {
           if (*len == 0)
               return NULL;
           break;
       }
       if (*len + 1 >= textmax) {
           textmax = (textmax == 0 ? 128 : textmax * 2);
           text = realloc(text, textmax);
       }
       text[*len] = ch;
   }
   text[*len] = '\0';
   return text;

}

char *rtrim(char *text, int *len) { // remove trailing spaces

   for (; *len > 0 && isspace(text[*len - 1]); --(*len))
       ;
   text[*len] = '\0';
   return text;

}

TokenType get_enum(const char *name) { // return internal version of name

   for (size_t i = 0; i < NELEMS(atr); i++) {
       if (strcmp(atr[i].enum_text, name) == 0)
           return atr[i].tok;
   }
   error(0, 0, "Unknown token %s\n", name);
   return 0;

}

tok_s gettok() {

   int len;
   tok_s tok;
   char *yytext = read_line(&len);
   yytext = rtrim(yytext, &len);
   // [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional
   // get line and column
   tok.err_ln  = atoi(strtok(yytext, " "));
   tok.err_col = atoi(strtok(NULL, " "));
   // get the token name
   char *name = strtok(NULL, " ");
   tok.tok = get_enum(name);
   // if there is extra data, get it
   char *p = name + strlen(name);
   if (p != &yytext[len]) {
       for (++p; isspace(*p); ++p)
           ;
       tok.text = strdup(p);
   }
   return tok;

}

Tree *make_node(NodeType node_type, Tree *left, Tree *right) {

   Tree *t = calloc(sizeof(Tree), 1);
   t->node_type = node_type;
   t->left = left;
   t->right = right;
   return t;

}

Tree *make_leaf(NodeType node_type, char *value) {

   Tree *t = calloc(sizeof(Tree), 1);
   t->node_type = node_type;
   t->value = strdup(value);
   return t;

}

void expect(const char msg[], TokenType s) {

   if (tok.tok == s) {
       tok = gettok();
       return;
   }
   error(tok.err_ln, tok.err_col, "%s: Expecting '%s', found '%s'\n", msg, atr[s].text, atr[tok.tok].text);

}

Tree *expr(int p) {

   Tree *x = NULL, *node;
   TokenType op;
   switch (tok.tok) {
       case tk_Lparen:
           x = paren_expr();
           break;
       case tk_Sub: case tk_Add:
           op = tok.tok;
           tok = gettok();
           node = expr(atr[tk_Negate].precedence);
           x = (op == tk_Sub) ? make_node(nd_Negate, node, NULL) : node;
           break;
       case tk_Not:
           tok = gettok();
           x = make_node(nd_Not, expr(atr[tk_Not].precedence), NULL);
           break;
       case tk_Ident:
           x = make_leaf(nd_Ident, tok.text);
           tok = gettok();
           break;
       case tk_Integer:
           x = make_leaf(nd_Integer, tok.text);
           tok = gettok();
           break;
       default:
           error(tok.err_ln, tok.err_col, "Expecting a primary, found: %s\n", atr[tok.tok].text);
   }
   while (atr[tok.tok].is_binary && atr[tok.tok].precedence >= p) {
       TokenType op = tok.tok;
       tok = gettok();
       int q = atr[op].precedence;
       if (!atr[op].right_associative)
           q++;
       node = expr(q);
       x = make_node(atr[op].node_type, x, node);
   }
   return x;

}

Tree *paren_expr() {

   expect("paren_expr", tk_Lparen);
   Tree *t = expr(0);
   expect("paren_expr", tk_Rparen);
   return t;

}

Tree *stmt() {

   Tree *t = NULL, *v, *e, *s, *s2;
   switch (tok.tok) {
       case tk_If:
           tok = gettok();
           e = paren_expr();
           s = stmt();
           s2 = NULL;
           if (tok.tok == tk_Else) {
               tok = gettok();
               s2 = stmt();
           }
           t = make_node(nd_If, e, make_node(nd_If, s, s2));
           break;
       case tk_Putc:
           tok = gettok();
           e = paren_expr();
           t = make_node(nd_Prtc, e, NULL);
           expect("Putc", tk_Semi);
           break;
       case tk_Print: /* print '(' expr {',' expr} ')' */
           tok = gettok();
           for (expect("Print", tk_Lparen); ; expect("Print", tk_Comma)) {
               if (tok.tok == tk_String) {
                   e = make_node(nd_Prts, make_leaf(nd_String, tok.text), NULL);
                   tok = gettok();
               } else
                   e = make_node(nd_Prti, expr(0), NULL);
               t = make_node(nd_Sequence, t, e);
               if (tok.tok != tk_Comma)
                   break;
           }
           expect("Print", tk_Rparen);
           expect("Print", tk_Semi);
           break;
       case tk_Semi:
           tok = gettok();
           break;
       case tk_Ident:
           v = make_leaf(nd_Ident, tok.text);
           tok = gettok();
           expect("assign", tk_Assign);
           e = expr(0);
           t = make_node(nd_Assign, v, e);
           expect("assign", tk_Semi);
           break;
       case tk_While:
           tok = gettok();
           e = paren_expr();
           s = stmt();
           t = make_node(nd_While, e, s);
           break;
       case tk_Lbrace:         /* {stmt} */
           for (expect("Lbrace", tk_Lbrace); tok.tok != tk_Rbrace && tok.tok != tk_EOI;)
               t = make_node(nd_Sequence, t, stmt());
           expect("Lbrace", tk_Rbrace);
           break;
       case tk_EOI:
           break;
       default: error(tok.err_ln, tok.err_col, "expecting start of statement, found '%s'\n", atr[tok.tok].text);
   }
   return t;

}

Tree *parse() {

   Tree *t = NULL;
   tok = gettok();
   do {
       t = make_node(nd_Sequence, t, stmt());
   } while (t != NULL && tok.tok != tk_EOI);
   return t;

}

void prt_ast(Tree *t) {

   if (t == NULL)
       printf(";\n");
   else {
       printf("%-14s ", Display_nodes[t->node_type]);
       if (t->node_type == nd_Ident || t->node_type == nd_Integer || t->node_type == nd_String) {
           printf("%s\n", t->value);
       } else {
           printf("\n");
           prt_ast(t->left);
           prt_ast(t->right);
       }
   }

}

void init_io(FILE **fp, FILE *std, const char mode[], const char fn[]) {

   if (fn[0] == '\0')
       *fp = std;
   else if ((*fp = fopen(fn, mode)) == NULL)
       error(0, 0, "Can't open %s\n", fn);

}

int main(int argc, char *argv[]) {

   init_io(&source_fp, stdin,  "r",  argc > 1 ? argv[1] : "");
   init_io(&dest_fp,   stdout, "wb", argc > 2 ? argv[2] : "");
   prt_ast(parse());

}</lang>

Output  —  prime numbers AST:

Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier     count
Integer        1
Assign
Identifier     n
Integer        1
Assign
Identifier     limit
Integer        100
While
Less
Identifier     n
Identifier     limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier     k
Integer        3
Assign
Identifier     p
Integer        1
Assign
Identifier     n
Add
Identifier     n
Integer        2
While
And
LessEqual
Multiply
Identifier     k
Identifier     k
Identifier     n
Identifier     p
Sequence
Sequence
;
Assign
Identifier     p
NotEqual
Multiply
Divide
Identifier     n
Identifier     k
Identifier     k
Identifier     n
Assign
Identifier     k
Add
Identifier     k
Integer        2
If
Identifier     p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier     n
;
Prts
String         " is prime\n"
;
Assign
Identifier     count
Add
Identifier     count
Integer        1
;
Sequence
Sequence
Sequence
;
Prts
String         "Total primes found: "
;
Prti
Identifier     count
;
Prts
String         "\n"
;

COBOL

Code by Steve Williams. Tested with GnuCOBOL 2.2.

<lang cobol> >>SOURCE FORMAT IS FREE identification division.

  • > this code is dedicated to the public domain
  • > (GnuCOBOL) 2.3-dev.0
  • > for extra credit, generate this program directly from the EBNF

program-id. parser. environment division. configuration section. repository. function all intrinsic. input-output section. file-control.

   select input-file assign using input-name
       status is input-status
       organization is line sequential.

data division. file section. fd input-file global. 01 input-record global.

   03  input-line pic zzzz9.
   03  input-column pic zzzzzz9.
   03  filler pic x(3).
   03  input-token pic x(16).
   03  input-value pic x(48).

working-storage section. 01 program-name pic x(32) value spaces global. 01 input-name pic x(32) value spaces global. 01 input-status pic xx global.

01 line-no pic 999 value 0. 01 col-no pic 99 value 0.

01 error-record global.

   03  error-line-no pic zzzz9.
   03  error-col-no pic zzzzzz9.
   03  filler pic x value space.
   03  error-message pic x(64) value spaces.

01 token global.

   03  token-type pic x(16).
   03  token-line pic 999.
   03  token-column pic 99.
   03  token-value pic x(48).

01 parse-stack global.

   03  p pic 999 value 0.
   03  p-lim pic 999 value 200.
   03  p-zero pic 999 value 0.
   03  parse-entry occurs 200.
       05  parse-name pic x(24).
       05  parse-token pic x(16).
       05  parse-left pic 999.
       05  parse-right pic 999.
       05  parse-work pic 999.
       05  parse-work1 pic 999.

01 abstract-syntax-tree global.

   03  t pic 999 value 0.
   03  t1 pic 999.
   03  t-lim pic 999 value 998.
   03  filler occurs 998.
       05  leaf.
           07  leaf-type pic x(14).
           07  leaf-value pic x(48).
       05  node redefines leaf.
           07  node-type pic x(14).
           07  node-left pic 999.
           07  node-right pic 999.

01 indent pic x(200) value all '| ' global.

procedure division chaining program-name. start-parser.

   if program-name <> spaces
       string program-name delimited by space '.lex' into input-name
       open input input-file
       if input-status <> '00'
           string 'in parser ' trim(input-name) ' open status ' input-status
               into error-message
           call 'reporterror'
       end-if
   end-if
   call 'gettoken'
   call 'stmt_list'
   if input-name <> spaces
       close input-file
   end-if
   call 'printast' using t
   >>d perform dump-ast
   stop run
   .

dump-ast.

   display '==========' upon syserr
   display 'ast:' upon syserr
   display 't=' t upon syserr
   perform varying t1 from 1 by 1 until t1 > t
       if leaf-type(t1) = 'Identifier' or 'Integer' or 'String'
           display t1 space trim(leaf-type(t1)) space trim(leaf-value(t1)) upon syserr
       else
           display t1 space node-left(t1) space node-right(t1) space trim(node-type(t1))
               upon syserr 
       end-if
   end-perform
   .

identification division. program-id. stmt_list common recursive. data division. procedure division. start-stmt_list.

   call 'push' using module-id
   move p-zero to parse-left(p)
   perform forever
       call 'stmt'
       move return-code to parse-right(p)
       call 'makenode' using 'Sequence' parse-left(p) parse-right(p)
       move return-code to parse-left(p)
       if parse-right(p) = 0
       or token-type = 'End_of_input'
           exit perform
       end-if
   end-perform
   call 'pop'
   .

end program stmt_list.

identification division. program-id. stmt common recursive. procedure division. start-stmt.

   call 'push' using module-id
   move p-zero to parse-left(p)
   evaluate token-type
   when 'Semicolon'
       call 'gettoken'
   when 'Identifier'
       *>Identifier '=' expr ';'
       call 'makeleaf' using 'Identifier' token-value
       move return-code to parse-left(p)
       call 'gettoken'
       call 'expect' using 'Op_assign'
       call 'expr'
       move return-code to parse-right(p)
       call 'expect' using 'Semicolon'
       call 'makenode' using 'Assign' parse-left(p) parse-right(p)
       move return-code to parse-left(p)
   when 'Keyword_while'
       *>'while' paren_expr '{' stmt '}'
       call 'gettoken'
       call 'paren_expr'
       move return-code to parse-work(p)
       call 'stmt'
       move return-code to parse-right(p)
       call 'makenode' using 'While' parse-work(p) parse-right(p)
       move return-code to parse-left(p)
   when 'Keyword_if'
       *>'if' paren_expr stmt ['else' stmt]
       call 'gettoken'
       call 'paren_expr'
       move return-code to parse-left(p)
       call 'stmt'
       move return-code to parse-work(p)
       move p-zero to parse-work1(p)
       if token-type = 'Keyword_else'
           call 'gettoken'
           call 'stmt'
           move return-code to parse-work1(p)
       end-if
       call 'makenode' using 'If' parse-work(p) parse-work1(p)
       move return-code to parse-right(p)
       call 'makenode' using 'If' parse-left(p) parse-right(p)
       move return-code to parse-left(p)
   when 'Keyword_print'
       *>'print' '(' prt_list ')' ';'
       call 'gettoken'
       call 'expect' using 'LeftParen'
       call 'prt_list'
       move return-code to parse-left(p)
       call 'expect' using 'RightParen'
       call 'expect' using 'Semicolon'
   when 'Keyword_putc'
       *>'putc' paren_expr ';'
       call 'gettoken'
       call 'paren_expr'
       move return-code to parse-left(p)
       call 'makenode' using 'Prtc' parse-left(p) p-zero
       move return-code to parse-left(p)
       call 'expect' using 'Semicolon'
   when 'LeftBrace'
       *>'{' stmt '}'
       call 'gettoken'
       move p-zero to parse-left(p)
       perform until token-type = 'RightBrace' or 'End_of_input'
           call 'stmt'
           move return-code to parse-right(p)
           call 'makenode' using 'Sequence' parse-left(p) parse-right(p)  
           move return-code to parse-left(p)
       end-perform
       if token-type <> 'End_of_input'
           call 'gettoken'
       end-if
   when other
       move 0 to parse-left(p)
   end-evaluate
   move parse-left(p) to return-code
   call 'pop'
   .

end program stmt.

identification division. program-id. paren_expr common recursive. procedure division. start-paren_expr.

   *>'(' expr ')' ;
   call 'push' using module-id
   call 'expect' using 'LeftParen'
   call 'expr'
   call 'expect' using 'RightParen'
   call 'pop'
   .

end program paren_expr.

identification division. program-id. prt_list common. procedure division. start-prt_list.

   *>(string | expr) {',' (String | expr)} ;
   call 'push' using module-id
   move p-zero to parse-work(p)
   perform prt_entry
   perform until token-type <> 'Comma'
       call 'gettoken'
       perform prt_entry
   end-perform
   call 'pop'
   exit program
   .

prt_entry.

   if token-type = 'String'
       call 'makeleaf' using token-type token-value
       move return-code to parse-left(p)
       call 'makenode' using 'Prts' parse-left(p) p-zero
       call 'gettoken'
   else
       call 'expr'
       move return-code to parse-left(p)
       call 'makenode' using 'Prti' parse-left(p) p-zero
   end-if
   move return-code to parse-right(p)
   call 'makenode' using 'Sequence' parse-work(p) parse-right(p)
   move return-code to parse-work(p)
   .

end program prt_list.

identification division. program-id. expr common recursive. procedure division. start-expr.

   *>and_expr {'||' and_expr} ;
   call 'push' using module-id
   call 'and_expr'
   move return-code to parse-left(p)
   perform forever
      if token-type <> 'Op_or'
          exit perform
      end-if
      call 'gettoken'
      call 'and_expr'
      move return-code to parse-right(p)
      call 'makenode' using 'Or' parse-left(p) parse-right(p)
      move return-code to parse-left(p)
   end-perform
   move parse-left(p) to return-code
   call 'pop'
   .

end program expr.

identification division. program-id. and_expr common recursive. procedure division. start-and_expr.

   *>equality_expr {'&&' equality_expr} ;
   call 'push' using module-id
   call 'equality_expr'
   move return-code to parse-left(p)
   perform forever
       if token-type <> 'Op_and'
           exit perform
       end-if
       call 'gettoken'
       call 'equality_expr'
       move return-code to parse-right(p)
       call 'makenode' using 'And' parse-left(p) parse-right(p)
       move return-code to parse-left(p)
   end-perform
   call 'pop'
   .

end program and_expr.

identification division. program-id. equality_expr common recursive. procedure division. start-equality_expr.

   *>relational_expr [('==' | '!=') relational_expr] ;
   call 'push' using module-id
   call 'relational_expr'
   move return-code to parse-left(p)
   evaluate token-type
   when 'Op_equal'
       move 'Equal' to parse-token(p)
   when 'Op_notequal'
       move 'NotEqual' to parse-token(p)
   end-evaluate
   if parse-token(p) <> spaces
       call 'gettoken'
       call 'relational_expr'
       move return-code to parse-right(p)
       call 'makenode' using parse-token(p) parse-left(p) parse-right(p)
       move return-code to parse-left(p)
   end-if
   call 'pop'
   .

end program equality_expr.

identification division. program-id. relational_expr common recursive. procedure division. start-relational_expr.

   *>addition_expr [('<' | '<=' | '>' | '>=') addition_expr] ;
   call 'push' using module-id
   call 'addition_expr'
   move return-code to parse-left(p)
   evaluate token-type
   when 'Op_less'
       move 'Less' to parse-token(p)
   when 'Op_lessequal'
       move 'LessEqual' to parse-token(p)
   when 'Op_greater'
       move 'Greater' to parse-token(p)
   when 'Op_greaterequal'
       move 'GreaterEqual' to parse-token(p)
   end-evaluate
   if parse-token(p) <> spaces
       call 'gettoken'
       call 'addition_expr'
       move return-code to parse-right(p)
       call 'makenode' using parse-token(p) parse-left(p) parse-right(p)
       move return-code to parse-left(p)
   end-if
   call 'pop'
   .

end program relational_expr.

identification division. program-id. addition_expr common recursive. procedure division. start-addition_expr.

   *>multiplication_expr {('+' | '-') multiplication_expr} ;
   call 'push' using module-id
   call 'multiplication_expr'
   move return-code to parse-left(p)
   perform forever
       evaluate token-type
       when 'Op_add'
           move 'Add' to parse-token(p)
       when 'Op_subtract'
           move 'Subtract' to parse-token(p)
       when other
           exit perform
       end-evaluate
       call 'gettoken'
       call 'multiplication_expr'
       move return-code to parse-right(p)
       call 'makenode' using parse-token(p) parse-left(p) parse-right(p)
       move return-code to parse-left(p)
   end-perform
   call 'pop'
   .

end program addition_expr.

identification division. program-id. multiplication_expr common recursive. procedure division. start-multiplication_expr.

   *>primary {('*' | '/' | '%') primary } ;
   call 'push' using module-id
   call 'primary'
   move return-code to parse-left(p)
   perform forever
       evaluate token-type
       when 'Op_multiply'
           move 'Multiply' to parse-token(p)
       when 'Op_divide'
           move 'Divide' to parse-token(p)
       when 'Op_mod'
           move 'Mod' to parse-token(p)
       when other
           exit perform
       end-evaluate
       call 'gettoken'
       call 'primary'
       move return-code to parse-right(p)
       call 'makenode' using parse-token(p) parse-left(p) parse-right(p)
       move return-code to parse-left(p)
   end-perform
   call 'pop'
   .

end program multiplication_expr.

identification division. program-id. primary common recursive. procedure division. start-primary.

   *>  Identifier
   *>| Integer
   *>| 'LeftParen' expr 'RightParen'
   *>| ('+' | '-' | '!') primary
   *>;
   call 'push' using module-id
   evaluate token-type
   when 'Identifier'
       call 'makeleaf' using 'Identifier' token-value
       call 'gettoken'
   when 'Integer'
       call 'makeleaf' using 'Integer' token-value
       call 'gettoken'
   when 'LeftParen'
       call 'gettoken'
       call 'expr'
       call 'expect' using 'RightParen'
       move t to return-code
   when 'Op_add'
       call 'gettoken'
       call 'primary'
   when 'Op_subtract'
       call 'gettoken'
       call 'primary'
       move return-code to parse-left(p)
       call 'makenode' using 'Negate' parse-left(p) p-zero
   when 'Op_not'
       call 'gettoken'
       call 'primary'
       move return-code to parse-left(p)
       call 'makenode' using 'Not' parse-left(p) p-zero
   when other
       move 0 to return-code
   end-evaluate  
   call 'pop'
   .

end program primary.

program-id. reporterror common. procedure division. start-reporterror. report-error.

   move token-line to error-line-no
   move token-column to error-col-no
   display error-record upon syserr
   stop run with error status -1
   .

end program reporterror.

identification division. program-id. gettoken common. procedure division. start-gettoken.

   if program-name = spaces
       move '00' to input-status
       accept input-record on exception move '10' to input-status end-accept
   else
       read input-file
   end-if
   evaluate input-status
   when '00'
       move input-token to token-type
       move input-value to token-value
       move numval(input-line) to token-line
       move numval(input-column) to token-column
       >>d display indent(1:min(4 * p,length(indent))) 'new token: ' token-type upon syserr
   when '10'
       string 'in parser ' trim(input-name) ' unexpected end of input'
           into error-message
       call 'reporterror'
   when other
       string 'in parser ' trim(input-name) ' unexpected input-status ' input-status
           into error-message
       call 'reporterror'
   end-evaluate
   .

end program gettoken.

identification division. program-id. expect common. data division. linkage section. 01 what any length. procedure division using what. start-expect.

   if token-type <> what
       string 'in parser expected ' what ' found ' token-type into error-message
       call 'reporterror'
   end-if
   >>d display indent(1:min(4 * p,length(indent))) 'match: ' token-type upon syserr
   call 'gettoken'
   .

end program expect.

identification division. program-id. push common. data division. linkage section. 01 what any length. procedure division using what. start-push.

   >>d display indent(1:min(4 * p,length(indent))) 'push ' what upon syserr
   if p >= p-lim
       move 'in parser stack overflow' to error-message
       call 'reporterror'
   end-if
   add 1 to p
   initialize parse-entry(p)
   move what to parse-name(p)
   .

end program push.

identification division. program-id. pop common. procedure division. start-pop.

   if p < 1
       move 'in parser stack underflow' to error-message
       call 'reporterror'
   end-if
   >>d display indent(1:4 * p - 4) 'pop ' parse-name(p) upon syserr
   subtract 1 from p
   .

end program pop.

identification division. program-id. makenode common. data division. linkage section. 01 parm-type any length. 01 parm-left pic 999. 01 parm-right pic 999. procedure division using parm-type parm-left parm-right. start-makenode.

   if t >= t-lim
       string 'in parser makenode tree index t exceeds ' t-lim into error-message
       call 'reporterror'
   end-if
   add 1 to t
   move parm-type to node-type(t)
   move parm-left to node-left(t)
   move parm-right to node-right(t)
   move t to return-code
   .

end program makenode.

identification division. program-id. makeleaf common. data division. linkage section. 01 parm-type any length. 01 parm-value pic x(48). procedure division using parm-type parm-value. start-makeleaf.

   if t >= t-lim
       string 'in parser makeleaf tree index t exceeds ' t-lim into error-message
       call 'reporterror'
   end-if
   add 1 to t
   move parm-type to leaf-type(t)
   move parm-value to leaf-value(t)
   move t to return-code
   .

end program makeleaf.

identification division. program-id. printast recursive. data division. linkage section. 01 n pic 999. procedure division using n. start-printast.

   if n = 0
       display ';'
       exit program
   end-if
   evaluate leaf-type(n)
   when 'Identifier'
   when 'Integer'
   when 'String'
       display leaf-type(n) trim(leaf-value(n))
   when other
       display node-type(n)
       call 'printast' using node-left(n)
       call 'printast' using node-right(n)
   end-evaluate
   .

end program printast. end program parser.</lang>

Output  —  Primes:
prompt$ ./lexer <testcases/Primes | ./parser
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
Assign
Identifier    n
Integer       1
Assign
Identifier    limit
Integer       100
While
Less
Identifier    n
Identifier    limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    k
Integer       3
Assign
Identifier    p
Integer       1
Assign
Identifier    n
Add
Identifier    n
Integer       2
While
And
LessEqual
Multiply
Identifier    k
Identifier    k
Identifier    n
Identifier    p
Sequence
Sequence
;
Assign
Identifier    p
NotEqual
Multiply
Divide
Identifier    n
Identifier    k
Identifier    k
Identifier    n
Assign
Identifier    k
Add
Identifier    k
Integer       2
If
Identifier    p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier    n
;
Prts
String        " is prime\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
;
Sequence
Sequence
Sequence
;
Prts
String        "Total primes found: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;

Forth

Tested with Gforth 0.7.3. <lang Forth>CREATE BUF 0 , \ single-character look-ahead buffer

PEEK BUF @ 0= IF KEY BUF ! THEN BUF @ ;
GETC PEEK 0 BUF ! ;
SPACE? DUP BL = SWAP 9 14 WITHIN OR ;
>SPACE BEGIN PEEK SPACE? WHILE GETC DROP REPEAT ;
DIGIT? 48 58 WITHIN ;
GETINT >SPACE 0
  BEGIN  PEEK DIGIT?
  WHILE  GETC [CHAR] 0 -  SWAP 10 * +
  REPEAT ;
GETNAM >SPACE PAD 1+
  BEGIN PEEK SPACE? INVERT
  WHILE GETC OVER C! CHAR+
  REPEAT  PAD TUCK - 1-  PAD C! ;
GETSTR >SPACE PAD 1+ GETC DROP \ skip leading "
  BEGIN GETC  DUP [CHAR] " <>
  WHILE OVER C! CHAR+
  REPEAT  DROP  PAD TUCK - 1-  PAD C! ;
INTERN HERE SWAP DUP C@ 1+ BOUNDS DO I C@ C, LOOP ALIGN ;

CREATE #TK 0 ,

TK: CREATE #TK @ , 1 #TK +! DOES> @ ;

TK: End_of_input TK: Keyword_if TK: Keyword_else TK: Keyword_while TK: Keyword_print TK: Keyword_putc TK: String TK: Integer TK: Identifier TK: LeftParen TK: RightParen TK: LeftBrace TK: RightBrace TK: Semicolon TK: Comma TK: Op_assign TK: Op_not

(BINARY?) [ #TK @ ] literal >= ;

TK: Op_subtract TK: Op_add TK: Op_mod TK: Op_multiply TK: Op_divide TK: Op_equal TK: Op_notequal TK: Op_less TK: Op_lessequal TK: Op_greater TK: Op_greaterequal TK: Op_and TK: Op_or CREATE TOKEN 0 , 0 , 0 , 0 ,

TOKEN-TYPE TOKEN 2 CELLS + @ ;
TOKEN-VALUE TOKEN 3 CELLS + @ ;
GETTOK GETINT GETINT TOKEN 2!
          GETNAM FIND DROP EXECUTE

DUP Integer = IF GETINT ELSE DUP String = IF GETSTR INTERN ELSE DUP Identifier = IF GETNAM INTERN ELSE 0 THEN THEN THEN TOKEN 3 CELLS + ! TOKEN 2 CELLS + ! ;

BINARY? TOKEN-TYPE (BINARY?) ;

CREATE PREC #TK @ CELLS ALLOT PREC #TK @ CELLS -1 FILL

PREC! CELLS PREC + ! ;

14 Op_not PREC! 13 Op_multiply PREC! 13 Op_divide PREC! 13 Op_mod PREC! 12 Op_add PREC! 12 Op_subtract PREC! 10 Op_less PREC! 10 Op_greater PREC! 10 Op_lessequal PREC! 10 Op_greaterequal PREC!

9 Op_equal        PREC!   9 Op_notequal     PREC!
5 Op_and          PREC!   4 Op_or           PREC!
PREC@ CELLS PREC + @ ;

\ Each AST Node is a sequence of cells in data space consisting \ of the execution token of a printing word, followed by that \ node's data. Each printing word receives the address of the \ node's data, and is responsible for printing that data \ appropriately.

DEFER .NODE

.NULL DROP ." ;" CR ;

CREATE $NULL ' .NULL ,

.IDENTIFIER ." Identifier " @ COUNT TYPE CR ;
$IDENTIFIER ( a-addr --) HERE SWAP ['] .IDENTIFIER , , ;
.INTEGER ." Integer " @ . CR ;
$INTEGER ( n --) HERE SWAP ['] .INTEGER , , ;
"TYPE" [CHAR] " EMIT TYPE [CHAR] " EMIT ;
.STRING ." String " @ COUNT "TYPE" CR ;
$STRING ( a-addr --) HERE SWAP ['] .STRING , , ;
.LEAF DUP @ COUNT TYPE CR CELL+ @ .NODE 0 .NULL ;
LEAF CREATE HERE CELL+ , BL WORD INTERN .
         DOES> HERE >R ['] .LEAF ,  @ , ,  R> ;

LEAF $PRTC Prtc LEAF $PRTS Prts LEAF $PRTI Prti LEAF $NOT Not LEAF $NEGATE Negate

.BINARY DUP @ COUNT TYPE CR
           CELL+ DUP @ .NODE  CELL+ @ .NODE ;
BINARY CREATE HERE CELL+ , BL WORD INTERN .
          DOES> HERE >R ['] .BINARY ,  @ ,  SWAP 2,  R> ;

BINARY $SEQUENCE Sequence BINARY $ASSIGN Assign BINARY $WHILE While BINARY $IF If BINARY $SUBTRACT Subtract BINARY $ADD Add BINARY $MOD Mod BINARY $MULTIPLY Multiply BINARY $DIVIDE Divide BINARY $LESS Less BINARY $LESSEQUAL LessEqual BINARY $GREATER Greater BINARY $GREATEREQUAL GreaterEqual BINARY $EQUAL Equal BINARY $NOTEQUAL NotEqual BINARY $AND And BINARY $OR Or

TOK-CONS ( x* -- node-xt) TOKEN-TYPE CASE
  Op_subtract     OF ['] $SUBTRACT     ENDOF
  Op_add          OF ['] $ADD          ENDOF
  op_mod          OF ['] $MOD          ENDOF
  op_multiply     OF ['] $MULTIPLY     ENDOF
  Op_divide       OF ['] $DIVIDE       ENDOF
  Op_equal        OF ['] $EQUAL        ENDOF
  Op_notequal     OF ['] $NOTEQUAL     ENDOF
  Op_less         OF ['] $LESS         ENDOF
  Op_lessequal    OF ['] $LESSEQUAL    ENDOF
  Op_greater      OF ['] $GREATER      ENDOF
  Op_greaterequal OF ['] $GREATEREQUAL ENDOF
  Op_and          OF ['] $AND          ENDOF
  Op_or           OF ['] $OR           ENDOF
  ENDCASE ;
(.NODE) DUP CELL+ SWAP @ EXECUTE ;

' (.NODE) IS .NODE

.- ( n --) 0 <# #S #> TYPE ;
EXPECT ( tk --) DUP TOKEN-TYPE <>
  IF CR ." stdin:" TOKEN 2@ SWAP .- ." :" .-
    ." : unexpected token, expecting " . CR BYE
  THEN  DROP GETTOK ;
'(' LeftParen EXPECT ;
')' RightParen EXPECT ;
'}' RightBrace EXPECT ;
';' Semicolon EXPECT ;
',' Comma EXPECT ;
'=' Op_assign EXPECT ;

DEFER *EXPR DEFER EXPR DEFER STMT

PAREN-EXPR '(' EXPR ')' ;
PRIMARY
  TOKEN-TYPE LeftParen   = IF PAREN-EXPR              EXIT THEN
  TOKEN-TYPE Op_add      = IF GETTOK 12 *EXPR         EXIT THEN
  TOKEN-TYPE Op_subtract = IF GETTOK 14 *EXPR $NEGATE EXIT THEN
  TOKEN-TYPE Op_not      = IF GETTOK 14 *EXPR $NOT    EXIT THEN
  TOKEN-TYPE Identifier  = IF TOKEN-VALUE $IDENTIFIER      ELSE
  TOKEN-TYPE Integer     = IF TOKEN-VALUE $INTEGER    THEN THEN
  GETTOK ;
(*EXPR) ( n -- node)
  PRIMARY ( n node)
  BEGIN OVER TOKEN-TYPE PREC@ SWAP OVER <=  BINARY?  AND
  WHILE ( n node prec) 1+ TOK-CONS SWAP GETTOK *EXPR SWAP EXECUTE
  REPEAT ( n node prec) DROP NIP ( node) ;
(EXPR) 0 *EXPR ;
-)? TOKEN-TYPE RightParen <> ;
-}? TOKEN-TYPE RightBrace <> ;
(STMT)
  TOKEN-TYPE Semicolon = IF GETTOK STMT EXIT THEN
  TOKEN-TYPE Keyword_while =
    IF GETTOK  PAREN-EXPR STMT $WHILE  EXIT THEN
  TOKEN-TYPE Keyword_if =
    IF GETTOK  PAREN-EXPR STMT
      TOKEN-TYPE Keyword_else = IF GETTOK STMT ELSE $NULL THEN
      $IF $IF EXIT
    THEN
  TOKEN-TYPE Keyword_putc =
    IF GETTOK  PAREN-EXPR ';' $PRTC  EXIT THEN
  TOKEN-TYPE Keyword_print =
    IF GETTOK  '(' $NULL
       BEGIN TOKEN-TYPE String =
          IF TOKEN-VALUE $STRING $PRTS  GETTOK
          ELSE EXPR $PRTI THEN  $SEQUENCE  -)?
       WHILE ',' REPEAT  ')' ';'  EXIT THEN
  TOKEN-TYPE Identifier =
    IF TOKEN-VALUE $IDENTIFIER GETTOK '=' EXPR ';' $ASSIGN
       EXIT THEN
  TOKEN-TYPE LeftBrace =
    IF $NULL GETTOK BEGIN -}? WHILE STMT $SEQUENCE REPEAT
       '}' EXIT THEN
  TOKEN-TYPE End_of_input = IF EXIT THEN  EXPR ;

' (*EXPR) IS *EXPR ' (EXPR) IS EXPR ' (STMT) IS STMT

-EOI? TOKEN-TYPE End_of_input <> ;
PARSE $NULL GETTOK BEGIN -EOI? WHILE STMT $SEQUENCE REPEAT ;

PARSE .NODE</lang>

Output  —  Count AST:

Sequence
Sequence
;
Assign
Identifier count
Integer 1
While
Less
Identifier count
Integer 10
Sequence
Sequence
;
Sequence
Sequence
Sequence
;
Prts
String "count is: "
;
Prti
Identifier count
;
Prts
String "\n"
;
Assign
Identifier count
Add
Identifier count
Integer 1


Fortran

Works with: gfortran version 11.2.1

The following is Fortran 2008/2018 code with C preprocessing directives. If you call the program source ‘parse.F90’, with a capital ‘F’, then gfortran will know to run the C preprocessor. <lang fortran>!!! !!! An implementation of the Rosetta Code parser task: !!! https://rosettacode.org/wiki/Compiler/syntax_analyzer !!! !!! The implementation is based on the published pseudocode. !!!

module compiler_type_kinds

 use, intrinsic :: iso_fortran_env, only: int32
 use, intrinsic :: iso_fortran_env, only: int64
 implicit none
 private
 ! Synonyms.
 integer, parameter, public :: size_kind = int64
 integer, parameter, public :: length_kind = size_kind
 integer, parameter, public :: nk = size_kind
 ! Synonyms for character capable of storing a Unicode code point.
 integer, parameter, public :: unicode_char_kind = selected_char_kind ('ISO_10646')
 integer, parameter, public :: ck = unicode_char_kind
 ! Synonyms for integers capable of storing a Unicode code point.
 integer, parameter, public :: unicode_ichar_kind = int32
 integer, parameter, public :: ick = unicode_ichar_kind

end module compiler_type_kinds

module string_buffers

 use, intrinsic :: iso_fortran_env, only: error_unit
 use, intrinsic :: iso_fortran_env, only: int64
 use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
 implicit none
 private
 public :: strbuf_t
 type :: strbuf_t
    integer(kind = nk), private :: len = 0
    !
    ! ‘chars’ is made public for efficient access to the individual
    ! characters.
    !
    character(1, kind = ck), allocatable, public :: chars(:)
  contains
    procedure, pass, private :: ensure_storage => strbuf_t_ensure_storage
    procedure, pass :: to_unicode_full_string => strbuf_t_to_unicode_full_string
    procedure, pass :: to_unicode_substring => strbuf_t_to_unicode_substring
    procedure, pass :: length => strbuf_t_length
    procedure, pass :: set => strbuf_t_set
    procedure, pass :: append => strbuf_t_append
    generic :: to_unicode => to_unicode_full_string
    generic :: to_unicode => to_unicode_substring
    generic :: assignment(=) => set
 end type strbuf_t

contains

 function strbuf_t_to_unicode_full_string (strbuf) result (s)
   class(strbuf_t), intent(in) :: strbuf
   character(:, kind = ck), allocatable :: s
   !
   ! This does not actually ensure that the string is valid Unicode;
   ! any 31-bit ‘character’ is supported.
   !
   integer(kind = nk) :: i
   allocate (character(len = strbuf%len, kind = ck) :: s)
   do i = 1, strbuf%len
      s(i:i) = strbuf%chars(i)
   end do
 end function strbuf_t_to_unicode_full_string
 function strbuf_t_to_unicode_substring (strbuf, i, j) result (s)
   !
   ! ‘Extreme’ values of i and j are allowed, as shortcuts for ‘from
   ! the beginning’, ‘up to the end’, or ‘empty substring’.
   !
   class(strbuf_t), intent(in) :: strbuf
   integer(kind = nk), intent(in) :: i, j
   character(:, kind = ck), allocatable :: s
   !
   ! This does not actually ensure that the string is valid Unicode;
   ! any 31-bit ‘character’ is supported.
   !
   integer(kind = nk) :: i1, j1
   integer(kind = nk) :: n
   integer(kind = nk) :: k
   i1 = max (1_nk, i)
   j1 = min (strbuf%len, j)
   n = max (0_nk, (j1 - i1) + 1_nk)
   allocate (character(n, kind = ck) :: s)
   do k = 1, n
      s(k:k) = strbuf%chars(i1 + (k - 1_nk))
   end do
 end function strbuf_t_to_unicode_substring
 elemental function strbuf_t_length (strbuf) result (n)
   class(strbuf_t), intent(in) :: strbuf
   integer(kind = nk) :: n
   n = strbuf%len
 end function strbuf_t_length
 elemental function next_power_of_two (x) result (y)
   integer(kind = nk), intent(in) :: x
   integer(kind = nk) :: y
   !
   ! It is assumed that no more than 64 bits are used.
   !
   ! The branch-free algorithm is that of
   ! https://archive.is/nKxAc#RoundUpPowerOf2
   !
   ! Fill in bits until one less than the desired power of two is
   ! reached, and then add one.
   !
   y = x - 1
   y = ior (y, ishft (y, -1))
   y = ior (y, ishft (y, -2))
   y = ior (y, ishft (y, -4))
   y = ior (y, ishft (y, -8))
   y = ior (y, ishft (y, -16))
   y = ior (y, ishft (y, -32))
   y = y + 1
 end function next_power_of_two
 elemental function new_storage_size (length_needed) result (size)
   integer(kind = nk), intent(in) :: length_needed
   integer(kind = nk) :: size
   ! Increase storage by orders of magnitude.
   if (2_nk**32 < length_needed) then
      size = huge (1_nk)
   else
      size = next_power_of_two (length_needed)
   end if
 end function new_storage_size
 subroutine strbuf_t_ensure_storage (strbuf, length_needed)
   class(strbuf_t), intent(inout) :: strbuf
   integer(kind = nk), intent(in) :: length_needed
   integer(kind = nk) :: new_size
   type(strbuf_t) :: new_strbuf
   if (.not. allocated (strbuf%chars)) then
      ! Initialize a new strbuf%chars array.
      new_size = new_storage_size (length_needed)
      allocate (strbuf%chars(1:new_size))
   else if (ubound (strbuf%chars, 1) < length_needed) then
      ! Allocate a new strbuf%chars array, larger than the current
      ! one, but containing the same characters.
      new_size = new_storage_size (length_needed)
      allocate (new_strbuf%chars(1:new_size))
      new_strbuf%chars(1:strbuf%len) = strbuf%chars(1:strbuf%len)
      call move_alloc (new_strbuf%chars, strbuf%chars)
   end if
 end subroutine strbuf_t_ensure_storage
 subroutine strbuf_t_set (dst, src)
   class(strbuf_t), intent(inout) :: dst
   class(*), intent(in) :: src
   integer(kind = nk) :: n
   integer(kind = nk) :: i
   select type (src)
   type is (character(*, kind = ck))
      n = len (src, kind = nk)
      call dst%ensure_storage(n)
      do i = 1, n
         dst%chars(i) = src(i:i)
      end do
      dst%len = n
   type is (character(*))
      n = len (src, kind = nk)
      call dst%ensure_storage(n)
      do i = 1, n
         dst%chars(i) = src(i:i)
      end do
      dst%len = n
   class is (strbuf_t)
      n = src%len
      call dst%ensure_storage(n)
      dst%chars(1:n) = src%chars(1:n)
      dst%len = n
   class default
      error stop
   end select
 end subroutine strbuf_t_set
 subroutine strbuf_t_append (dst, src)
   class(strbuf_t), intent(inout) :: dst
   class(*), intent(in) :: src
   integer(kind = nk) :: n_dst, n_src, n
   integer(kind = nk) :: i
   select type (src)
   type is (character(*, kind = ck))
      n_dst = dst%len
      n_src = len (src, kind = nk)
      n = n_dst + n_src
      call dst%ensure_storage(n)
      do i = 1, n_src
         dst%chars(n_dst + i) = src(i:i)
      end do
      dst%len = n
   type is (character(*))
      n_dst = dst%len
      n_src = len (src, kind = nk)
      n = n_dst + n_src
      call dst%ensure_storage(n)
      do i = 1, n_src
         dst%chars(n_dst + i) = src(i:i)
      end do
      dst%len = n
   class is (strbuf_t)
      n_dst = dst%len
      n_src = src%len
      n = n_dst + n_src
      call dst%ensure_storage(n)
      dst%chars((n_dst + 1):n) = src%chars(1:n_src)
      dst%len = n
   class default
      error stop
   end select
 end subroutine strbuf_t_append

end module string_buffers

module reading_one_line_from_a_stream

 use, intrinsic :: iso_fortran_env, only: input_unit
 use, intrinsic :: iso_fortran_env, only: error_unit
 use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
 use, non_intrinsic :: string_buffers
 implicit none
 private
 ! get_line_from_stream: read an entire input line from a stream into
 ! a strbuf_t.
 public :: get_line_from_stream
 character(1, kind = ck), parameter :: linefeed_char = char (10, kind = ck)
 ! The following is correct for Unix and its relatives.
 character(1, kind = ck), parameter :: newline_char = linefeed_char

contains

 subroutine get_line_from_stream (unit_no, eof, no_newline, strbuf)
   integer, intent(in) :: unit_no
   logical, intent(out) :: eof ! End of file?
   logical, intent(out) :: no_newline ! There is a line but it has no
                                      ! newline? (Thus eof also must
                                      ! be .true.)
   class(strbuf_t), intent(inout) :: strbuf
   character(1, kind = ck) :: ch
   strbuf = 
   call get_ch (unit_no, eof, ch)
   do while (.not. eof .and. ch /= newline_char)
      call strbuf%append (ch)
      call get_ch (unit_no, eof, ch)
   end do
   no_newline = eof .and. (strbuf%length() /= 0)
 end subroutine get_line_from_stream
 subroutine get_ch (unit_no, eof, ch)
   !
   ! Read a single code point from the stream.
   !
   ! Currently this procedure simply inputs ‘ASCII’ bytes rather than
   ! Unicode code points.
   !
   integer, intent(in) :: unit_no
   logical, intent(out) :: eof
   character(1, kind = ck), intent(out) :: ch
   integer :: stat
   character(1) :: c = '*'
   eof = .false.
   if (unit_no == input_unit) then
      call get_input_unit_char (c, stat)
   else
      read (unit = unit_no, iostat = stat) c
   end if
   if (stat < 0) then
      ch = ck_'*'
      eof = .true.
   else if (0 < stat) then
      write (error_unit, '("Input error with status code ", I0)') stat
      stop 1
   else
      ch = char (ichar (c, kind = ick), kind = ck)
   end if
 end subroutine get_ch

!!! !!! If you tell gfortran you want -std=f2008 or -std=f2018, you likely !!! will need to add also -fall-intrinsics or -U__GFORTRAN__ !!! !!! The first way, you get the FGETC intrinsic. The latter way, you !!! get the C interface code that uses getchar(3). !!!

  1. ifdef __GFORTRAN__
 subroutine get_input_unit_char (c, stat)
   !
   ! The following works if you are using gfortran.
   !
   ! (FGETC is considered a feature for backwards compatibility with
   ! g77. However, I know of no way to reconfigure input_unit as a
   ! Fortran 2003 stream, for use with ordinary ‘read’.)
   !
   character, intent(inout) :: c
   integer, intent(out) :: stat
   call fgetc (input_unit, c, stat)
 end subroutine get_input_unit_char
  1. else
 subroutine get_input_unit_char (c, stat)
   !
   ! An alternative implementation of get_input_unit_char. This
   ! actually reads input from the C standard input, which might not
   ! be the same as input_unit.
   !
   use, intrinsic :: iso_c_binding, only: c_int
   character, intent(inout) :: c
   integer, intent(out) :: stat
   interface
      !
      ! Use getchar(3) to read characters from standard input. This
      ! assumes there is actually such a function available, and that
      ! getchar(3) does not exist solely as a macro. (One could write
      ! one’s own getchar() if necessary, of course.)
      !
      function getchar () result (c) bind (c, name = 'getchar')
        use, intrinsic :: iso_c_binding, only: c_int
        integer(kind = c_int) :: c
      end function getchar
   end interface
   integer(kind = c_int) :: i_char
   i_char = getchar ()
   !
   ! The C standard requires that EOF have a negative value. If the
   ! value returned by getchar(3) is not EOF, then it will be
   ! representable as an unsigned char. Therefore, to check for end
   ! of file, one need only test whether i_char is negative.
   !
   if (i_char < 0) then
      stat = -1
   else
      stat = 0
      c = char (i_char)
   end if
 end subroutine get_input_unit_char
  1. endif

end module reading_one_line_from_a_stream

module lexer_token_facts

 implicit none
 private
 integer, parameter, public :: tk_EOI = 0
 integer, parameter, public :: tk_Mul = 1
 integer, parameter, public :: tk_Div = 2
 integer, parameter, public :: tk_Mod = 3
 integer, parameter, public :: tk_Add = 4
 integer, parameter, public :: tk_Sub = 5
 integer, parameter, public :: tk_Negate = 6
 integer, parameter, public :: tk_Not = 7
 integer, parameter, public :: tk_Lss = 8
 integer, parameter, public :: tk_Leq = 9
 integer, parameter, public :: tk_Gtr = 10
 integer, parameter, public :: tk_Geq = 11
 integer, parameter, public :: tk_Eq = 12
 integer, parameter, public :: tk_Neq = 13
 integer, parameter, public :: tk_Assign = 14
 integer, parameter, public :: tk_And = 15
 integer, parameter, public :: tk_Or = 16
 integer, parameter, public :: tk_If = 17
 integer, parameter, public :: tk_Else = 18
 integer, parameter, public :: tk_While = 19
 integer, parameter, public :: tk_Print = 20
 integer, parameter, public :: tk_Putc = 21
 integer, parameter, public :: tk_Lparen = 22
 integer, parameter, public :: tk_Rparen = 23
 integer, parameter, public :: tk_Lbrace = 24
 integer, parameter, public :: tk_Rbrace = 25
 integer, parameter, public :: tk_Semi = 26
 integer, parameter, public :: tk_Comma = 27
 integer, parameter, public :: tk_Ident = 28
 integer, parameter, public :: tk_Integer = 29
 integer, parameter, public :: tk_String = 30
 integer, parameter, public :: tk_Positive = 31
 character(16), parameter, public :: lexer_token_string(0:31) = &
      (/ "EOI             ",   &
      &  "*               ",   &
      &  "/               ",   &
      &  "%               ",   &
      &  "+               ",   &
      &  "-               ",   &
      &  "-               ",   &
      &  "!               ",   &
      &  "<               ",   &
      &  "<=              ",   &
      &  ">               ",   &
      &  ">=              ",   &
      &  "==              ",   &
      &  "!=              ",   &
      &  "=               ",   &
      &  "&&              ",   &
      &  "||              ",   &
      &  "if              ",   &
      &  "else            ",   &
      &  "while           ",   &
      &  "print           ",   &
      &  "putc            ",   &
      &  "(               ",   &
      &  ")               ",   &
      &  "{               ",   &
      &  "}               ",   &
      &  ";               ",   &
      &  ",               ",   &
      &  "Ident           ",   &
      &  "Integer literal ",   &
      &  "String literal  ",   &
      &  "+               " /)
 integer, parameter, public :: lexer_token_arity(0:31) = &
      & (/ -1,                   & ! EOI
      &    2, 2, 2, 2, 2,        & ! * / % + -
      &    1, 1,                 & ! negate !
      &    2, 2, 2, 2, 2, 2,     & ! < <= > >= == !=
      &    -1,                   & ! =
      &    2, 2,                 & ! && ||
      &    -1, -1, -1, -1, -1,   & !
      &    -1, -1, -1, -1, -1,   & !
      &    -1, -1, -1, -1,       & !
      &    1 /)                    ! positive
 integer, parameter, public :: lexer_token_precedence(0:31) = &
      & (/ -1,                   & ! EOI
      &    13, 13, 13,           & ! * / %
      &    12, 12,               & ! + -
      &    14, 14,               & ! negate !
      &    10, 10, 10, 10,       & ! < <= > >=
      &    9, 9,                 & ! == !=
      &    -1,                   & ! =
      &    5,                    & ! &&
      &    4,                    & ! ||
      &    -1, -1, -1, -1, -1,   & !
      &    -1, -1, -1, -1, -1,   & !
      &    -1, -1, -1, -1,       & !
      &    14 /)                   ! positive
 integer, parameter, public :: left_associative = 0
 integer, parameter, public :: right_associative = 1
 ! All current operators are left associative. (The values in the
 ! array for things that are not operators are unimportant.)
 integer, parameter, public :: lexer_token_associativity(0:31) = left_associative

end module lexer_token_facts

module reading_of_lexer_tokens

 use, intrinsic :: iso_fortran_env, only: error_unit
 use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
 use, non_intrinsic :: string_buffers
 use, non_intrinsic :: reading_one_line_from_a_stream
 use, non_intrinsic :: lexer_token_facts
 implicit none
 private  
 public :: lexer_token_t
 public :: get_lexer_token
 character(1, kind = ck), parameter :: horizontal_tab_char = char (9, kind = ck)
 character(1, kind = ck), parameter :: linefeed_char = char (10, kind = ck)
 character(1, kind = ck), parameter :: vertical_tab_char = char (11, kind = ck)
 character(1, kind = ck), parameter :: formfeed_char = char (12, kind = ck)
 character(1, kind = ck), parameter :: carriage_return_char = char (13, kind = ck)
 character(1, kind = ck), parameter :: space_char = ck_' '
 type :: lexer_token_t
    integer :: token_no = -(huge (1))
    character(:, kind = ck), allocatable :: val
    integer(nk) :: line_no = -(huge (1_nk))
    integer(nk) :: column_no = -(huge (1_nk))
 end type lexer_token_t

contains

 subroutine get_lexer_token (unit_no, lex_line_no, eof, token)
   !
   ! Lines that are empty or contain only whitespace are tolerated.
   !
   ! Also tolerated are comment lines, whose first character is a
   ! '!'. It is convenient for debugging to be able to comment out
   ! lines.
   !
   ! A last line be without a newline is *not* tolerated, unless it
   ! contains only whitespace.
   !
   ! Letting there be some whitespace is partly for the sake of
   ! reading cut-and-paste from a browser display.
   !
   integer, intent(in) :: unit_no
   integer(kind = nk), intent(inout) :: lex_line_no
   logical, intent(out) :: eof
   type(lexer_token_t), intent(out) :: token
   type(strbuf_t) :: strbuf
   logical :: no_newline
   logical :: input_found
   ! Let a negative setting initialize the line number.
   lex_line_no = max (0_nk, lex_line_no)
   strbuf = 
   eof = .false.
   input_found = .false.
   do while (.not. eof .and. .not. input_found)
      call get_line_from_stream (unit_no, eof, no_newline, strbuf)
      if (eof) then
         if (no_newline) then
            lex_line_no = lex_line_no + 1
            if (.not. strbuf_is_all_whitespace (strbuf)) then
               call start_error_message (lex_line_no)
               write (error_unit, '("lexer line ends without a newline")')
               stop 1
            end if
         end if
      else
         lex_line_no = lex_line_no + 1
         input_found = .true.
         if (strbuf_is_all_whitespace (strbuf)) then
            ! A blank line.
            input_found = .false.
         else if (0 < strbuf%length()) then
            if (strbuf%chars(1) == ck_'!') then
               ! A comment line.
               input_found = .false.
            end if
         end if
      end if
   end do
   token = lexer_token_t ()
   if (.not. eof) then
      token = strbuf_to_token (lex_line_no, strbuf)
   end if
 end subroutine get_lexer_token
 function strbuf_to_token (lex_line_no, strbuf) result (token)
   integer(kind = nk), intent(in) :: lex_line_no
   class(strbuf_t), intent(in) :: strbuf
   type(lexer_token_t) :: token
   character(:, kind = ck), allocatable :: line_no
   character(:, kind = ck), allocatable :: column_no
   character(:, kind = ck), allocatable :: token_name
   character(:, kind = ck), allocatable :: val_string
   integer :: stat
   integer(kind = nk) :: n
   call split_line (lex_line_no, strbuf, line_no, column_no, token_name, val_string)
   read (line_no, *, iostat = stat) token%line_no
   if (stat /= 0) then
      call start_error_message (lex_line_no)
      write (error_unit, '("line number field is unreadable or too large")')
      stop 1
   end if
   read (column_no, *, iostat = stat) token%column_no
   if (stat /= 0) then
      call start_error_message (lex_line_no)
      write (error_unit, '("column number field is unreadable or too large")')
      stop 1
   end if
   token%token_no = token_name_to_token_no (lex_line_no, token_name)
   select case (token%token_no)
   case (tk_Ident)
      ! I do no checking of identifier names.
      allocate (token%val, source = val_string)
   case (tk_Integer)
      call check_is_all_digits (lex_line_no, val_string)
      allocate (token%val, source = val_string)
   case (tk_String)
      n = len (val_string, kind = nk)
      if (n < 2) then
         call string_literal_missing_or_no_good
      else if (val_string(1:1) /= ck_'"' .or. val_string(n:n) /= ck_'"') then
         call string_literal_missing_or_no_good
      else
         allocate (token%val, source = val_string)
      end if
   case default
      if (len (val_string, kind = nk) /= 0) then
         call start_error_message (lex_line_no)
         write (error_unit, '("token should not have a value")')
         stop 1
      end if
   end select
 contains
   subroutine string_literal_missing_or_no_good
     call start_error_message (lex_line_no)
     write (error_unit, '("""String"" token requires a string literal")')
     stop 1
   end subroutine string_literal_missing_or_no_good
 end function strbuf_to_token
 subroutine split_line (lex_line_no, strbuf, line_no, column_no, token_name, val_string)
   integer(kind = nk), intent(in) :: lex_line_no
   class(strbuf_t), intent(in) :: strbuf
   character(:, kind = ck), allocatable, intent(out) :: line_no
   character(:, kind = ck), allocatable, intent(out) :: column_no
   character(:, kind = ck), allocatable, intent(out) :: token_name
   character(:, kind = ck), allocatable, intent(out) :: val_string
   integer(kind = nk) :: i, j
   i = skip_whitespace (strbuf, 1_nk)
   j = skip_non_whitespace (strbuf, i)
   line_no = strbuf%to_unicode(i, j - 1)
   call check_is_all_digits (lex_line_no, line_no)
   i = skip_whitespace (strbuf, j)
   j = skip_non_whitespace (strbuf, i)
   column_no = strbuf%to_unicode(i, j - 1)
   call check_is_all_digits (lex_line_no, column_no)
   i = skip_whitespace (strbuf, j)
   j = skip_non_whitespace (strbuf, i)
   token_name = strbuf%to_unicode(i, j - 1)
   i = skip_whitespace (strbuf, j)
   if (strbuf%length() < i) then
      val_string = ck_
   else if (strbuf%chars(i) == ck_'"') then
      j = skip_whitespace_backwards (strbuf, strbuf%length())
      if (strbuf%chars(j) == ck_'"') then
         val_string = strbuf%to_unicode(i, j)
      else
         call start_error_message (lex_line_no)
         write (error_unit, '("string literal does not end in a double quote")')
         stop 1
      end if
   else
      j = skip_non_whitespace (strbuf, i)
      val_string = strbuf%to_unicode(i, j - 1)
      i = skip_whitespace (strbuf, j)
      if (i <= strbuf%length()) then
         call start_error_message (lex_line_no)
         write (error_unit, '("token line contains unexpected text")')
         stop 1
      end if
   end if
 end subroutine split_line
 function token_name_to_token_no (lex_line_no, token_name) result (token_no)
   integer(kind = nk), intent(in) :: lex_line_no
   character(*, kind = ck), intent(in) :: token_name
   integer :: token_no
   !!
   !! This implementation is not optimized in any way, unless the
   !! Fortran compiler can optimize the SELECT CASE.
   !!
   select case (token_name)
   case (ck_"End_of_input")
      token_no = tk_EOI
   case (ck_"Op_multiply")
      token_no = tk_Mul
   case (ck_"Op_divide")
      token_no = tk_Div
   case (ck_"Op_mod")
      token_no = tk_Mod
   case (ck_"Op_add")
      token_no = tk_Add
   case (ck_"Op_subtract")
      token_no = tk_Sub
   case (ck_"Op_negate")
      token_no = tk_Negate
   case (ck_"Op_not")
      token_no = tk_Not
   case (ck_"Op_less")
      token_no = tk_Lss
   case (ck_"Op_lessequal    ")
      token_no = tk_Leq
   case (ck_"Op_greater")
      token_no = tk_Gtr
   case (ck_"Op_greaterequal")
      token_no = tk_Geq
   case (ck_"Op_equal")
      token_no = tk_Eq
   case (ck_"Op_notequal")
      token_no = tk_Neq
   case (ck_"Op_assign")
      token_no = tk_Assign
   case (ck_"Op_and")
      token_no = tk_And
   case (ck_"Op_or")
      token_no = tk_Or
   case (ck_"Keyword_if")
      token_no = tk_If
   case (ck_"Keyword_else")
      token_no = tk_Else
   case (ck_"Keyword_while")
      token_no = tk_While
   case (ck_"Keyword_print")
      token_no = tk_Print
   case (ck_"Keyword_putc")
      token_no = tk_Putc
   case (ck_"LeftParen")
      token_no = tk_Lparen
   case (ck_"RightParen")
      token_no = tk_Rparen
   case (ck_"LeftBrace")
      token_no = tk_Lbrace
   case (ck_"RightBrace")
      token_no = tk_Rbrace
   case (ck_"Semicolon")
      token_no = tk_Semi
   case (ck_"Comma")
      token_no = tk_Comma
   case (ck_"Identifier")
      token_no = tk_Ident
   case (ck_"Integer")
      token_no = tk_Integer
   case (ck_"String")
      token_no = tk_String
   case default
      call start_error_message (lex_line_no)
      write (error_unit, '("unrecognized token name: ", A)') token_name
      stop 1
   end select
 end function token_name_to_token_no
 function skip_whitespace (strbuf, i) result (j)
   class(strbuf_t), intent(in) :: strbuf
   integer(kind = nk), intent(in) :: i
   integer(kind = nk) :: j
   logical :: done
   j = i
   done = .false.
   do while (.not. done)
      if (at_end_of_line (strbuf, j)) then
         done = .true.
      else if (.not. isspace (strbuf%chars(j))) then
         done = .true.
      else
         j = j + 1
      end if
   end do
 end function skip_whitespace
 function skip_non_whitespace (strbuf, i) result (j)
   class(strbuf_t), intent(in) :: strbuf
   integer(kind = nk), intent(in) :: i
   integer(kind = nk) :: j
   logical :: done
   j = i
   done = .false.
   do while (.not. done)
      if (at_end_of_line (strbuf, j)) then
         done = .true.
      else if (isspace (strbuf%chars(j))) then
         done = .true.
      else
         j = j + 1
      end if
   end do
 end function skip_non_whitespace
 function skip_whitespace_backwards (strbuf, i) result (j)
   class(strbuf_t), intent(in) :: strbuf
   integer(kind = nk), intent(in) :: i
   integer(kind = nk) :: j
   logical :: done
   j = i
   done = .false.
   do while (.not. done)
      if (j == -1) then
         done = .true.
      else if (.not. isspace (strbuf%chars(j))) then
         done = .true.
      else
         j = j - 1
      end if
   end do
 end function skip_whitespace_backwards
 function at_end_of_line (strbuf, i) result (bool)
   class(strbuf_t), intent(in) :: strbuf
   integer(kind = nk), intent(in) :: i
   logical :: bool
   bool = (strbuf%length() < i)
 end function at_end_of_line
 elemental function strbuf_is_all_whitespace (strbuf) result (bool)
   class(strbuf_t), intent(in) :: strbuf
   logical :: bool
   integer(kind = nk) :: n
   integer(kind = nk) :: i
   n = strbuf%length()
   if (n == 0) then
      bool = .true.
   else
      i = 1
      bool = .true.
      do while (bool .and. i /= n + 1)
         bool = isspace (strbuf%chars(i))
         i = i + 1
      end do
   end if
 end function strbuf_is_all_whitespace
 elemental function isspace (ch) result (bool)
   character(1, kind = ck), intent(in) :: ch
   logical :: bool
   bool = (ch == horizontal_tab_char) .or.  &
        & (ch == linefeed_char) .or.        &
        & (ch == vertical_tab_char) .or.    &
        & (ch == formfeed_char) .or.        &
        & (ch == carriage_return_char) .or. &
        & (ch == space_char)
 end function isspace
 elemental function isdigit (ch) result (bool)
   character(1, kind = ck), intent(in) :: ch
   logical :: bool
   integer(kind = ick), parameter :: zero = ichar (ck_'0', kind = ick)
   integer(kind = ick), parameter :: nine = ichar (ck_'9', kind = ick)
   integer(kind = ick) :: i_ch
   i_ch = ichar (ch, kind = ick)
   bool = (zero <= i_ch .and. i_ch <= nine)
 end function isdigit
 subroutine check_is_all_digits (lex_line_no, str)
   integer(kind = nk), intent(in) :: lex_line_no
   character(*, kind = ck), intent(in) :: str
   integer(kind = nk) :: n
   integer(kind = nk) :: i
   n = len (str, kind = nk)
   if (n == 0_nk) then
      call start_error_message (lex_line_no)
      write (error_unit, '("a required field is missing")')
      stop 1
   else
      do i = 1, n
         if (.not. isdigit (str(i:i))) then
            call start_error_message (lex_line_no)
            write (error_unit, '("a numeric field contains a non-digit")')
            stop 1
         end if
      end do
   end if
 end subroutine check_is_all_digits
 subroutine start_error_message (lex_line_no)
   integer(kind = nk), intent(in) :: lex_line_no
   write (error_unit, '("Token stream error at line ", I0, ": ")', advance = 'no') &
        &    lex_line_no
 end subroutine start_error_message

end module reading_of_lexer_tokens

module syntactic_analysis

 use, intrinsic :: iso_fortran_env, only: input_unit
 use, intrinsic :: iso_fortran_env, only: output_unit
 use, intrinsic :: iso_fortran_env, only: error_unit
 use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
 use, non_intrinsic :: string_buffers
 use, non_intrinsic :: lexer_token_facts
 use, non_intrinsic :: reading_of_lexer_tokens
 implicit none
 private
 public :: ast_node_t
 public :: ast_t
 public :: parse_token_stream
 public :: output_ast_flattened
 integer, parameter, public :: tk_start_of_statement = -1
 integer, parameter, public :: tk_primary = -2
 integer, parameter :: node_Identifier = 1
 integer, parameter :: node_String = 2
 integer, parameter :: node_Integer = 3
 integer, parameter :: node_Sequence = 4
 integer, parameter :: node_If = 5
 integer, parameter :: node_Prtc = 6
 integer, parameter :: node_Prts = 7
 integer, parameter :: node_Prti = 8
 integer, parameter :: node_While = 9
 integer, parameter :: node_Assign = 10
 integer, parameter :: node_Negate = 11
 integer, parameter :: node_Not = 12
 integer, parameter :: node_Multiply = 13
 integer, parameter :: node_Divide = 14
 integer, parameter :: node_Mod = 15
 integer, parameter :: node_Add = 16
 integer, parameter :: node_Subtract = 17
 integer, parameter :: node_Less = 18
 integer, parameter :: node_LessEqual = 19
 integer, parameter :: node_Greater = 20
 integer, parameter :: node_GreaterEqual = 21
 integer, parameter :: node_Equal = 22
 integer, parameter :: node_NotEqual = 23
 integer, parameter :: node_And = 24
 integer, parameter :: node_Or = 25
 character(16), parameter :: node_variety_string(1:25) = &
      (/ "Identifier      ",    &
      &  "String          ",    &
      &  "Integer         ",    &
      &  "Sequence        ",    &
      &  "If              ",    &
      &  "Prtc            ",    &
      &  "Prts            ",    &
      &  "Prti            ",    &
      &  "While           ",    &
      &  "Assign          ",    &
      &  "Negate          ",    &
      &  "Not             ",    &
      &  "Multiply        ",    &
      &  "Divide          ",    &
      &  "Mod             ",    &
      &  "Add             ",    &
      &  "Subtract        ",    &
      &  "Less            ",    &
      &  "LessEqual       ",    &
      &  "Greater         ",    &
      &  "GreaterEqual    ",    &
      &  "Equal           ",    &
      &  "NotEqual        ",    &
      &  "And             ",    &
      &  "Or              " /)
 type :: ast_node_t
    integer :: node_variety
    character(:, kind = ck), allocatable :: val
    type(ast_node_t), pointer :: left => null ()
    type(ast_node_t), pointer :: right => null ()
  contains
    procedure, pass :: assign => ast_node_t_assign
    generic :: assignment(=) => assign
    final :: ast_node_t_finalize
 end type ast_node_t
 ! ast_t phases.
 integer, parameter :: building = 1
 integer, parameter :: completed = 2
 type :: ast_t
    !
    ! This type is used to build the subtrees, as well as for the
    ! completed AST. The difference is in the setting of ‘phase’.
    !
    type(ast_node_t), pointer :: node => null ()
    integer, private :: phase = building
  contains
    procedure, pass :: assign => ast_t_assign
    generic :: assignment(=) => assign
    final :: ast_t_finalize
 end type ast_t
 type(ast_t), parameter :: ast_nil = ast_t (null ())

contains

 recursive subroutine ast_node_t_assign (node, other)
   class(ast_node_t), intent(out) :: node
   class(*), intent(in) :: other
   select type (other)
   class is (ast_node_t)
      node%node_variety = other%node_variety
      if (allocated (other%val)) allocate (node%val, source = other%val)
      if (associated (other%left)) allocate (node%left, source = other%left)
      if (associated (other%right)) allocate (node%right, source = other%right)
   class default
      ! This branch should never be reached.
      error stop
   end select
 end subroutine ast_node_t_assign
 recursive subroutine ast_node_t_finalize (node)
   type(ast_node_t), intent(inout) :: node
   if (associated (node%left)) deallocate (node%left)
   if (associated (node%right)) deallocate (node%right)
 end subroutine ast_node_t_finalize
 recursive subroutine ast_t_assign (ast, other)
   class(ast_t), intent(out) :: ast
   class(*), intent(in) :: other
   select type (other)
   class is (ast_t)
      if (associated (other%node)) allocate (ast%node, source = other%node)
      !
      ! Whether it is better to set phase to ‘building’ or to set it
      ! to ‘other%phase’ is unclear to me. Probably ‘building’ is the
      ! better choice. Which variable controls memory recovery is
      ! clear and unchanging, in that case: it is the original,
      ! ‘other’, that does.
      !
      ast%phase = building
   class default
      ! This should not happen.
      error stop
   end select
 end subroutine ast_t_assign
 subroutine ast_t_finalize (ast)
   type(ast_t), intent(inout) :: ast
   !
   ! When we are building the tree, the tree’s nodes should not be
   ! deallocated when the ast_t variable temporarily holding them
   ! goes out of scope.
   !
   ! However, once the AST is completed, we do want the memory
   ! recovered when the variable goes out of scope.
   !
   ! (Elsewhere I have written a primitive garbage collector for
   ! Fortran programs, but in this case it would be a lot of overhead
   ! for little gain. In fact, we could reasonably just let the
   ! memory leak, in this program.
   !
   ! Fortran runtimes *are* allowed by the standard to have garbage
   ! collectors built in. To my knowledge, at the time of this
   ! writing, only NAG Fortran has a garbage collector option.)
   !
   if (ast%phase == completed) then
      if (associated (ast%node)) deallocate (ast%node)
   end if
 end subroutine ast_t_finalize
 function parse_token_stream (unit_no) result (ast)
   integer, intent(in) :: unit_no
   type(ast_t) :: ast
   integer(kind = nk) :: lex_line_no
   type(ast_t) :: statement
   type(lexer_token_t) :: token
   lex_line_no = -1_nk
   call get_token (unit_no, lex_line_no, token)
   call parse_statement (unit_no, lex_line_no, token, statement)
   ast = make_internal_node (node_Sequence, ast, statement)
   do while (token%token_no /= tk_EOI)
      call parse_statement (unit_no, lex_line_no, token, statement)
      ast = make_internal_node (node_Sequence, ast, statement)
   end do
   ast%phase = completed
 end function parse_token_stream
 recursive subroutine parse_statement (unit_no, lex_line_no, token, ast)
   integer, intent(in) :: unit_no
   integer(kind = nk), intent(inout) :: lex_line_no
   type(lexer_token_t), intent(inout) :: token
   type(ast_t), intent(out) :: ast
   ast = ast_nil
   select case (token%token_no)
   case (tk_If)
      call parse_ifelse_construct
   case (tk_Putc)
      call parse_putc
   case (tk_Print)
      call parse_print
   case (tk_Semi)
      call get_token (unit_no, lex_line_no, token)
   case (tk_Ident)
      call parse_identifier
   case (tk_While)
      call parse_while_construct
   case (tk_Lbrace)
      call parse_lbrace_construct
   case (tk_EOI)
      continue
   case default
      call syntax_error_message ("", tk_start_of_statement, token)
      stop 1
   end select
 contains
   recursive subroutine parse_ifelse_construct
     type(ast_t) :: predicate
     type(ast_t) :: statement_for_predicate_true
     type(ast_t) :: statement_for_predicate_false
     call expect_token ("If", tk_If, token)
     call get_token (unit_no, lex_line_no, token)
     call parse_parenthesized_expression (unit_no, lex_line_no, token, predicate)
     call parse_statement (unit_no, lex_line_no, token, statement_for_predicate_true)
     if (token%token_no == tk_Else) then
        call get_token (unit_no, lex_line_no, token)
        call parse_statement (unit_no, lex_line_no, token, statement_for_predicate_false)
        ast = make_internal_node (node_If, statement_for_predicate_true, &
             &                    statement_for_predicate_false)
     else
        ast = make_internal_node (node_If, statement_for_predicate_true, ast_nil)
     end if
     ast = make_internal_node (node_If, predicate, ast)
   end subroutine parse_ifelse_construct
   recursive subroutine parse_putc
     type(ast_t) :: arguments
     call expect_token ("Putc", tk_Putc, token)
     call get_token (unit_no, lex_line_no, token)
     call parse_parenthesized_expression (unit_no, lex_line_no, token, arguments)
     ast = make_internal_node (node_Prtc, arguments, ast_nil)
     call expect_token ("Putc", tk_Semi, token)
     call get_token (unit_no, lex_line_no, token)
   end subroutine parse_putc
   recursive subroutine parse_print
     logical :: done
     type(ast_t) :: arg
     type(ast_t) :: printer
     call expect_token ("Print", tk_Print, token)
     call get_token (unit_no, lex_line_no, token)
     call expect_token ("Print", tk_Lparen, token)
     done = .false.
     do while (.not. done)
        call get_token (unit_no, lex_line_no, token)
        select case (token%token_no)
        case (tk_String)
           arg = make_leaf_node (node_String, token%val)
           printer = make_internal_node (node_Prts, arg, ast_nil)
           call get_token (unit_no, lex_line_no, token)
        case default
           call parse_expression (unit_no, 0, lex_line_no, token, arg)
           printer = make_internal_node (node_Prti, arg, ast_nil)
        end select
        ast = make_internal_node (node_Sequence, ast, printer)
        done = (token%token_no /= tk_Comma)
     end do
     call expect_token ("Print", tk_Rparen, token)
     call get_token (unit_no, lex_line_no, token)
     call expect_token ("Print", tk_Semi, token)
     call get_token (unit_no, lex_line_no, token)
   end subroutine parse_print
   recursive subroutine parse_identifier
     type(ast_t) :: left_side
     type(ast_t) :: right_side
     left_side = make_leaf_node (node_Identifier, token%val)
     call get_token (unit_no, lex_line_no, token)
     call expect_token ("assign", tk_Assign, token)
     call get_token (unit_no, lex_line_no, token)
     call parse_expression (unit_no, 0, lex_line_no, token, right_side)
     ast = make_internal_node (node_Assign, left_side, right_side)
     call expect_token ("assign", tk_Semi, token)
     call get_token (unit_no, lex_line_no, token)
   end subroutine parse_identifier
   recursive subroutine parse_while_construct
     type(ast_t) :: predicate
     type(ast_t) :: statement_to_be_repeated
     call expect_token ("While", tk_While, token)
     call get_token (unit_no, lex_line_no, token)
     call parse_parenthesized_expression (unit_no, lex_line_no, token, predicate)
     call parse_statement (unit_no, lex_line_no, token, statement_to_be_repeated)
     ast = make_internal_node (node_While, predicate, statement_to_be_repeated)
   end subroutine parse_while_construct
   recursive subroutine parse_lbrace_construct
     type(ast_t) :: statement
     call expect_token ("Lbrace", tk_Lbrace, token)
     call get_token (unit_no, lex_line_no, token)
     do while (token%token_no /= tk_Rbrace .and. token%token_no /= tk_EOI)
        call parse_statement (unit_no, lex_line_no, token, statement)
        ast = make_internal_node (node_Sequence, ast, statement)
     end do
     call expect_token ("Lbrace", tk_Rbrace, token)
     call get_token (unit_no, lex_line_no, token)
   end subroutine parse_lbrace_construct
 end subroutine parse_statement
 recursive subroutine parse_expression (unit_no, p, lex_line_no, token, ast)
   integer, intent(in) :: unit_no
   integer, intent(in) :: p
   integer(kind = nk), intent(inout) :: lex_line_no
   type(lexer_token_t), intent(inout) :: token
   type(ast_t), intent(out) :: ast
   integer :: precedence
   type(ast_t) :: expression
   select case (token%token_no)
   case (tk_Lparen)
      call parse_parenthesized_expression (unit_no, lex_line_no, token, ast)
   case (tk_Sub)
      token%token_no = tk_Negate
      precedence = lexer_token_precedence(token%token_no)
      call get_token (unit_no, lex_line_no, token)
      call parse_expression (unit_no, precedence, lex_line_no, token, expression)
      ast = make_internal_node (node_Negate, expression, ast_nil)
   case (tk_Add)
      token%token_no = tk_Positive
      precedence = lexer_token_precedence(token%token_no)
      call get_token (unit_no, lex_line_no, token)
      call parse_expression (unit_no, precedence, lex_line_no, token, expression)
      ast = expression
   case (tk_Not)
      precedence = lexer_token_precedence(token%token_no)
      call get_token (unit_no, lex_line_no, token)
      call parse_expression (unit_no, precedence, lex_line_no, token, expression)
      ast = make_internal_node (node_Not, expression, ast_nil)
   case (tk_Ident)
      ast = make_leaf_node (node_Identifier, token%val)
      call get_token (unit_no, lex_line_no, token)
   case (tk_Integer)
      ast = make_leaf_node (node_Integer, token%val)
      call get_token (unit_no, lex_line_no, token)
   case default
      call syntax_error_message ("", tk_primary, token)
      stop 1
   end select
   do while (lexer_token_arity(token%token_no) == 2 .and. &
        &    p <= lexer_token_precedence(token%token_no))
      block
        type(ast_t) :: right_expression
        integer :: q
        integer :: node_variety
        if (lexer_token_associativity(token%token_no) == right_associative) then
           q = lexer_token_precedence(token%token_no)
        else
           q = lexer_token_precedence(token%token_no) + 1
        end if
        node_variety = binary_operator_node_variety (token%token_no)
        call get_token (unit_no, lex_line_no, token)
        call parse_expression (unit_no, q, lex_line_no, token, right_expression)
        ast = make_internal_node (node_variety, ast, right_expression)
      end block
   end do
 end subroutine parse_expression
 recursive subroutine parse_parenthesized_expression (unit_no, lex_line_no, token, ast)
   integer, intent(in) :: unit_no
   integer(kind = nk), intent(inout) :: lex_line_no
   type(lexer_token_t), intent(inout) :: token
   type(ast_t), intent(out) :: ast
   call expect_token ("paren_expr", tk_Lparen, token)
   call get_token (unit_no, lex_line_no, token)
   call parse_expression (unit_no, 0, lex_line_no, token, ast)
   call expect_token ("paren_expr", tk_Rparen, token)
   call get_token (unit_no, lex_line_no, token)
 end subroutine parse_parenthesized_expression
 elemental function binary_operator_node_variety (token_no) result (node_variety)
   integer, intent(in) :: token_no
   integer :: node_variety
   select case (token_no)
   case (tk_Mul)
      node_variety = node_Multiply
   case (tk_Div)
      node_variety = node_Divide
   case (tk_Mod)
      node_variety = node_Mod
   case (tk_Add)
      node_variety = node_Add
   case (tk_Sub)
      node_variety = node_Subtract
   case (tk_Lss)
      node_variety = node_Less
   case (tk_Leq)
      node_variety = node_LessEqual
   case (tk_Gtr)
      node_variety = node_Greater
   case (tk_Geq)
      node_variety = node_GreaterEqual
   case (tk_Eq)
      node_variety = node_Equal
   case (tk_Neq)
      node_variety = node_NotEqual
   case (tk_And)
      node_variety = node_And
   case (tk_Or)
      node_variety = node_Or
   case default
      ! This branch should never be reached.
      error stop
   end select
 end function binary_operator_node_variety
 function make_internal_node (node_variety, left, right) result (ast)
   integer, intent(in) :: node_variety
   class(ast_t), intent(in) :: left, right
   type(ast_t) :: ast
   type(ast_node_t), pointer :: node
   allocate (node)
   node%node_variety = node_variety
   node%left => left%node
   node%right => right%node
   ast%node => node
 end function make_internal_node
 function make_leaf_node (node_variety, val) result (ast)
   integer, intent(in) :: node_variety
   character(*, kind = ck), intent(in) :: val
   type(ast_t) :: ast
   type(ast_node_t), pointer :: node
   allocate (node)
   node%node_variety = node_variety
   node%val = val
   ast%node => node
 end function make_leaf_node
 subroutine get_token (unit_no, lex_line_no, token)
   integer, intent(in) :: unit_no
   integer(kind = nk), intent(inout) :: lex_line_no
   type(lexer_token_t), intent(out) :: token
   logical :: eof
   call get_lexer_token (unit_no, lex_line_no, eof, token)
   if (eof) then
      write (error_unit, '("Parser error: the stream of input tokens is incomplete")')
      stop 1
   end if
 end subroutine get_token
 subroutine expect_token (message, token_no, token)
   character(*), intent(in) :: message
   integer, intent (in) :: token_no
   class(lexer_token_t), intent(in) :: token
   if (token%token_no /= token_no) then
      call syntax_error_message (message, token_no, token)
      stop 1
   end if
 end subroutine expect_token
 subroutine syntax_error_message (message, expected_token_no, token)
   character(*), intent(in) :: message
   integer, intent(in) :: expected_token_no
   class(lexer_token_t), intent(in) :: token
   ! Write a message to an output unit dedicated to printing
   ! errors. The message could, of course, be more detailed than what
   ! we are doing here.
   write (error_unit, '("Syntax error at ", I0, ".", I0)') &
        &    token%line_no, token%column_no
   !
   ! For the sake of the exercise, also write, to output_unit, a
   ! message in the style of the C reference program.
   !
   write (output_unit, '("(", I0, ", ", I0, ") error: ")', advance = 'no') &
        &    token%line_no, token%column_no
   select case (expected_token_no)
   case (tk_start_of_statement)
      write (output_unit, '("expecting start of statement, found ", 1A, "")') &
           &    trim (lexer_token_string(token%token_no))
   case (tk_primary)
      write (output_unit, '("Expecting a primary, found ", 1A, "")') &
           &    trim (lexer_token_string(token%token_no))
   case default
      write (output_unit, '(1A, ": Expecting ", 1A, ", found ", 1A, "")') &
           &    trim (message), trim (lexer_token_string(expected_token_no)), &
           &    trim (lexer_token_string(token%token_no))
   end select
 end subroutine syntax_error_message
 subroutine output_ast_flattened (unit_no, ast)
   integer, intent(in) :: unit_no
   type(ast_t), intent(in) :: ast
   call output_ast_node_flattened (unit_no, ast%node)
 end subroutine output_ast_flattened
 recursive subroutine output_ast_node_flattened (unit_no, node)
   integer, intent(in) :: unit_no
   type(ast_node_t), pointer, intent(in) :: node
     if (.not. associated (node)) then
        write (unit_no, '(";")')
     else
        if (allocated (node%val)) then
           write (unit_no, '(1A16, 2X, 1A)') &
                &   node_variety_string(node%node_variety), node%val
        else
           write (unit_no, '(1A)') &
                &   trim (node_variety_string(node%node_variety))
           call output_ast_node_flattened (unit_no, node%left)
           call output_ast_node_flattened (unit_no, node%right)
        end if
     end if
   end subroutine output_ast_node_flattened

end module syntactic_analysis

program parse

 use, intrinsic :: iso_fortran_env, only: input_unit
 use, intrinsic :: iso_fortran_env, only: output_unit
 use, intrinsic :: iso_fortran_env, only: error_unit
 use, non_intrinsic :: syntactic_analysis
 implicit none
 integer, parameter :: inp_unit_no = 100
 integer, parameter :: outp_unit_no = 101
 integer :: arg_count
 character(200) :: arg
 integer :: inp
 integer :: outp
 arg_count = command_argument_count ()
 if (3 <= arg_count) then
    call print_usage
 else
    if (arg_count == 0) then
       inp = input_unit
       outp = output_unit
    else if (arg_count == 1) then
       call get_command_argument (1, arg)
       inp = open_for_input (trim (arg))
       outp = output_unit
    else if (arg_count == 2) then
       call get_command_argument (1, arg)
       inp = open_for_input (trim (arg))
       call get_command_argument (2, arg)
       outp = open_for_output (trim (arg))
    end if
    block
      type(ast_t) :: ast
      ast = parse_token_stream (inp)
      call output_ast_flattened (outp, ast)
    end block
 end if

contains

 function open_for_input (filename) result (unit_no)
   character(*), intent(in) :: filename
   integer :: unit_no
   integer :: stat
   open (unit = inp_unit_no, file = filename, status = 'old', &
        & action = 'read', access = 'stream', form = 'unformatted',  &
        & iostat = stat)
   if (stat /= 0) then
      write (error_unit, '("Error: failed to open ", 1A, " for input")') filename
      stop 1
   end if
   unit_no = inp_unit_no
 end function open_for_input
 function open_for_output (filename) result (unit_no)
   character(*), intent(in) :: filename
   integer :: unit_no
   integer :: stat
   open (unit = outp_unit_no, file = filename, action = 'write', iostat = stat)
   if (stat /= 0) then
      write (error_unit, '("Error: failed to open ", 1A, " for output")') filename
      stop 1
   end if
   unit_no = outp_unit_no
 end function open_for_output
 subroutine print_usage
   character(200) :: progname
   call get_command_argument (0, progname)
   write (output_unit, '("Usage: ", 1A, " [INPUT_FILE [OUTPUT_FILE]]")') &
        &      trim (progname)
 end subroutine print_usage
 

end program parse</lang>

Output:

Prime numbers example:

Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier        count
Integer           1
Assign
Identifier        n
Integer           1
Assign
Identifier        limit
Integer           100
While
Less
Identifier        n
Identifier        limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier        k
Integer           3
Assign
Identifier        p
Integer           1
Assign
Identifier        n
Add
Identifier        n
Integer           2
While
And
LessEqual
Multiply
Identifier        k
Identifier        k
Identifier        n
Identifier        p
Sequence
Sequence
;
Assign
Identifier        p
NotEqual
Multiply
Divide
Identifier        n
Identifier        k
Identifier        k
Identifier        n
Assign
Identifier        k
Add
Identifier        k
Integer           2
If
Identifier        p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier        n
;
Prts
String            " is prime\n"
;
Assign
Identifier        count
Add
Identifier        count
Integer           1
;
Sequence
Sequence
Sequence
;
Prts
String            "Total primes found: "
;
Prti
Identifier        count
;
Prts
String            "\n"
;

Go

Translation of: C

<lang go>package main

import (

   "bufio"
   "fmt"
   "log"
   "os"
   "strconv"
   "strings"

)

type TokenType int

const (

   tkEOI TokenType = iota
   tkMul
   tkDiv
   tkMod
   tkAdd
   tkSub
   tkNegate
   tkNot
   tkLss
   tkLeq
   tkGtr
   tkGeq
   tkEql
   tkNeq
   tkAssign
   tkAnd
   tkOr
   tkIf
   tkElse
   tkWhile
   tkPrint
   tkPutc
   tkLparen
   tkRparen
   tkLbrace
   tkRbrace
   tkSemi
   tkComma
   tkIdent
   tkInteger
   tkString

)

type NodeType int

const (

   ndIdent NodeType = iota
   ndString
   ndInteger
   ndSequence
   ndIf
   ndPrtc
   ndPrts
   ndPrti
   ndWhile
   ndAssign
   ndNegate
   ndNot
   ndMul
   ndDiv
   ndMod
   ndAdd
   ndSub
   ndLss
   ndLeq
   ndGtr
   ndGeq
   ndEql
   ndNeq
   ndAnd
   ndOr

)

type tokS struct {

   tok    TokenType
   errLn  int
   errCol int
   text   string // ident or string literal or integer value

}

type Tree struct {

   nodeType NodeType
   left     *Tree
   right    *Tree
   value    string

}

// dependency: Ordered by tok, must remain in same order as TokenType consts type atr struct {

   text             string
   enumText         string
   tok              TokenType
   rightAssociative bool
   isBinary         bool
   isUnary          bool
   precedence       int
   nodeType         NodeType

}

var atrs = []atr{

   {"EOI", "End_of_input", tkEOI, false, false, false, -1, -1},
   {"*", "Op_multiply", tkMul, false, true, false, 13, ndMul},
   {"/", "Op_divide", tkDiv, false, true, false, 13, ndDiv},
   {"%", "Op_mod", tkMod, false, true, false, 13, ndMod},
   {"+", "Op_add", tkAdd, false, true, false, 12, ndAdd},
   {"-", "Op_subtract", tkSub, false, true, false, 12, ndSub},
   {"-", "Op_negate", tkNegate, false, false, true, 14, ndNegate},
   {"!", "Op_not", tkNot, false, false, true, 14, ndNot},
   {"<", "Op_less", tkLss, false, true, false, 10, ndLss},
   {"<=", "Op_lessequal", tkLeq, false, true, false, 10, ndLeq},
   {">", "Op_greater", tkGtr, false, true, false, 10, ndGtr},
   {">=", "Op_greaterequal", tkGeq, false, true, false, 10, ndGeq},
   {"==", "Op_equal", tkEql, false, true, false, 9, ndEql},
   {"!=", "Op_notequal", tkNeq, false, true, false, 9, ndNeq},
   {"=", "Op_assign", tkAssign, false, false, false, -1, ndAssign},
   {"&&", "Op_and", tkAnd, false, true, false, 5, ndAnd},
   {"||", "Op_or", tkOr, false, true, false, 4, ndOr},
   {"if", "Keyword_if", tkIf, false, false, false, -1, ndIf},
   {"else", "Keyword_else", tkElse, false, false, false, -1, -1},
   {"while", "Keyword_while", tkWhile, false, false, false, -1, ndWhile},
   {"print", "Keyword_print", tkPrint, false, false, false, -1, -1},
   {"putc", "Keyword_putc", tkPutc, false, false, false, -1, -1},
   {"(", "LeftParen", tkLparen, false, false, false, -1, -1},
   {")", "RightParen", tkRparen, false, false, false, -1, -1},
   {"{", "LeftBrace", tkLbrace, false, false, false, -1, -1},
   {"}", "RightBrace", tkRbrace, false, false, false, -1, -1},
   {";", "Semicolon", tkSemi, false, false, false, -1, -1},
   {",", "Comma", tkComma, false, false, false, -1, -1},
   {"Ident", "Identifier", tkIdent, false, false, false, -1, ndIdent},
   {"Integer literal", "Integer", tkInteger, false, false, false, -1, ndInteger},
   {"String literal", "String", tkString, false, false, false, -1, ndString},

}

var displayNodes = []string{

   "Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts", "Prti",
   "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add",
   "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal",
   "NotEqual", "And", "Or",

}

var (

   err     error
   token   tokS
   scanner *bufio.Scanner

)

func reportError(errLine, errCol int, msg string) {

   log.Fatalf("(%d, %d) error : %s\n", errLine, errCol, msg)

}

func check(err error) {

   if err != nil {
       log.Fatal(err)
   }

}

func getEum(name string) TokenType { // return internal version of name#

   for _, atr := range atrs {
       if atr.enumText == name {
           return atr.tok
       }
   }
   reportError(0, 0, fmt.Sprintf("Unknown token %s\n", name))
   return tkEOI

}

func getTok() tokS {

   tok := tokS{}
   if scanner.Scan() {
       line := strings.TrimRight(scanner.Text(), " \t")
       fields := strings.Fields(line)
       // [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional
       tok.errLn, err = strconv.Atoi(fields[0])
       check(err)
       tok.errCol, err = strconv.Atoi(fields[1])
       check(err)
       tok.tok = getEum(fields[2])
       le := len(fields)
       if le == 4 {
           tok.text = fields[3]
       } else if le > 4 {
           idx := strings.Index(line, `"`)
           tok.text = line[idx:]
       }
   }
   check(scanner.Err())
   return tok

}

func makeNode(nodeType NodeType, left *Tree, right *Tree) *Tree {

   return &Tree{nodeType, left, right, ""}

}

func makeLeaf(nodeType NodeType, value string) *Tree {

   return &Tree{nodeType, nil, nil, value}

}

func expect(msg string, s TokenType) {

   if token.tok == s {
       token = getTok()
       return
   }
   reportError(token.errLn, token.errCol,
       fmt.Sprintf("%s: Expecting '%s', found '%s'\n", msg, atrs[s].text, atrs[token.tok].text))

}

func expr(p int) *Tree {

   var x, node *Tree
   switch token.tok {
   case tkLparen:
       x = parenExpr()
   case tkSub, tkAdd:
       op := token.tok
       token = getTok()
       node = expr(atrs[tkNegate].precedence)
       if op == tkSub {
           x = makeNode(ndNegate, node, nil)
       } else {
           x = node
       }
   case tkNot:
       token = getTok()
       x = makeNode(ndNot, expr(atrs[tkNot].precedence), nil)
   case tkIdent:
       x = makeLeaf(ndIdent, token.text)
       token = getTok()
   case tkInteger:
       x = makeLeaf(ndInteger, token.text)
       token = getTok()
   default:
       reportError(token.errLn, token.errCol,
           fmt.Sprintf("Expecting a primary, found: %s\n", atrs[token.tok].text))
   }
   for atrs[token.tok].isBinary && atrs[token.tok].precedence >= p {
       op := token.tok
       token = getTok()
       q := atrs[op].precedence
       if !atrs[op].rightAssociative {
           q++
       }
       node = expr(q)
       x = makeNode(atrs[op].nodeType, x, node)
   }
   return x

}

func parenExpr() *Tree {

   expect("parenExpr", tkLparen)
   t := expr(0)
   expect("parenExpr", tkRparen)
   return t

}

func stmt() *Tree {

   var t, v, e, s, s2 *Tree
   switch token.tok {
   case tkIf:
       token = getTok()
       e = parenExpr()
       s = stmt()
       s2 = nil
       if token.tok == tkElse {
           token = getTok()
           s2 = stmt()
       }
       t = makeNode(ndIf, e, makeNode(ndIf, s, s2))
   case tkPutc:
       token = getTok()
       e = parenExpr()
       t = makeNode(ndPrtc, e, nil)
       expect("Putc", tkSemi)
   case tkPrint: // print '(' expr {',' expr} ')'
       token = getTok()
       for expect("Print", tkLparen); ; expect("Print", tkComma) {
           if token.tok == tkString {
               e = makeNode(ndPrts, makeLeaf(ndString, token.text), nil)
               token = getTok()
           } else {
               e = makeNode(ndPrti, expr(0), nil)
           }
           t = makeNode(ndSequence, t, e)
           if token.tok != tkComma {
               break
           }
       }
       expect("Print", tkRparen)
       expect("Print", tkSemi)
   case tkSemi:
       token = getTok()
   case tkIdent:
       v = makeLeaf(ndIdent, token.text)
       token = getTok()
       expect("assign", tkAssign)
       e = expr(0)
       t = makeNode(ndAssign, v, e)
       expect("assign", tkSemi)
   case tkWhile:
       token = getTok()
       e = parenExpr()
       s = stmt()
       t = makeNode(ndWhile, e, s)
   case tkLbrace: // {stmt}
       for expect("Lbrace", tkLbrace); token.tok != tkRbrace && token.tok != tkEOI; {
           t = makeNode(ndSequence, t, stmt())
       }
       expect("Lbrace", tkRbrace)
   case tkEOI:
       // do nothing
   default:
       reportError(token.errLn, token.errCol,
           fmt.Sprintf("expecting start of statement, found '%s'\n", atrs[token.tok].text))
   }
   return t

}

func parse() *Tree {

   var t *Tree
   token = getTok()
   for {
       t = makeNode(ndSequence, t, stmt())
       if t == nil || token.tok == tkEOI {
           break
       }
   }
   return t

}

func prtAst(t *Tree) {

   if t == nil {
       fmt.Print(";\n")
   } else {
       fmt.Printf("%-14s ", displayNodes[t.nodeType])
       if t.nodeType == ndIdent || t.nodeType == ndInteger || t.nodeType == ndString {
           fmt.Printf("%s\n", t.value)
       } else {
           fmt.Println()
           prtAst(t.left)
           prtAst(t.right)
       }
   }

}

func main() {

   source, err := os.Open("source.txt")
   check(err)
   defer source.Close()
   scanner = bufio.NewScanner(source)
   prtAst(parse())

}</lang>

Output:

Prime Numbers example:

Sequence       
Sequence       
Sequence       
Sequence       
Sequence       
;
Assign         
Identifier     count
Integer        1
Assign         
Identifier     n
Integer        1
Assign         
Identifier     limit
Integer        100
While          
Less           
Identifier     n
Identifier     limit
Sequence       
Sequence       
Sequence       
Sequence       
Sequence       
;
Assign         
Identifier     k
Integer        3
Assign         
Identifier     p
Integer        1
Assign         
Identifier     n
Add            
Identifier     n
Integer        2
While          
And            
LessEqual      
Multiply       
Identifier     k
Identifier     k
Identifier     n
Identifier     p
Sequence       
Sequence       
;
Assign         
Identifier     p
NotEqual       
Multiply       
Divide         
Identifier     n
Identifier     k
Identifier     k
Identifier     n
Assign         
Identifier     k
Add            
Identifier     k
Integer        2
If             
Identifier     p
If             
Sequence       
Sequence       
;
Sequence       
Sequence       
;
Prti           
Identifier     n
;
Prts           
String         " is prime\n"
;
Assign         
Identifier     count
Add            
Identifier     count
Integer        1
;
Sequence       
Sequence       
Sequence       
;
Prts           
String         "Total primes found: "
;
Prti           
Identifier     count
;
Prts           
String         "\n"
;

Icon

Works with: Icon version 9.5.20i


I use co-expressions in a way that could easily be done differently, but I prefer to use the co-expressions. (These can be sluggish or fast, depending on what sort of Icon you are running. In this case, the speed differences are of little concern.)


<lang Icon>#

  1. The Rosetta Code Tiny-Language Parser, in Icon.
  2. This implementation is based closely on the pseudocode and the C
  3. reference implementation.
  1. ximage from the IPL is useful for debugging. Use "xdump(x)" to
  2. pretty-print x.
  3. link ximage

record token_record (line_no, column_no, tok, tokval) record token_getter (nxt, curr)

procedure main (args)

 local inpf_name, outf_name
 local inpf, outf
 local nexttok, currtok, current_token, gettok
 local ast
 inpf_name := "-"
 outf_name := "-"
 if 1 <= *args then inpf_name := args[1]
 if 2 <= *args then outf_name := args[2]
 inpf :=
     if inpf_name == "-" then
         &input
     else
         (open(inpf_name, "r") |
          stop("failed to open \"" || inpf_name || "\" for input"))
 outf :=
     if outf_name == "-" then
         &output
     else
         (open(outf_name, "w") |
          stop("failed to open \"" || outf_name || "\" for output"))
 current_token := [&null]
 nexttok := create generate_tokens(inpf, current_token)
 currtok := create get_current_token (current_token)
 gettok := token_getter(nexttok, currtok)
 ast := parse(gettok)
 prt_ast(outf, ast)
 close(inpf)
 close(outf)

end

procedure prt_ast (outf, ast)

 if *ast = 0 then {
   write(outf, ";")
 } else {
   writes(outf, ast[1])
   if ast[1] == ("Identifier" | "Integer" | "String") then {
     write(outf, " ", ast[2])
   } else {
     write(outf)
     prt_ast(outf, ast[2])
     prt_ast(outf, ast[3])
   }
 }

end

procedure generate_tokens (inpf, current_token)

 local s
 while s := read(inpf) do {
   if trim(s) ~== "" then {
     current_token[1] := string_to_token_record(s)
     suspend current_token[1]
   }
 }

end

procedure get_current_token (current_token)

 repeat (suspend current_token[1])

end

procedure string_to_token_record (s)

 local line_no, column_no, tok, tokval
 static spaces
 initial {
   spaces := ' \t\f\v\r\n'
 }
 trim(s) ? {
   tab(many(spaces))
   line_no := integer(tab(many(&digits)))
   tab(many(spaces))
   column_no := integer(tab(many(&digits)))
   tab(many(spaces))
   tok := tab(many(&letters ++ '_'))
   tab(many(spaces))
   tokval := tab(0)
 }
 return token_record(line_no, column_no, tok, tokval)

end

procedure parse (gettok)

 local tok
 local t
 t := []
 @gettok.nxt
 tok := "Not End_of_input"
 while tok ~== "End_of_input" do {
   t := ["Sequence", t, stmt(gettok)]
   tok := (@gettok.curr).tok
 }
 return t

end

procedure stmt (gettok)

 local e, s, t, v
 local tok
 local done
 t := []
 if accept(gettok, "Keyword_if") then {
   e := paren_expr(gettok)
   s := stmt(gettok)
   t := ["If", e, ["If", s, 
                   if accept(gettok, "Keyword_else")
                   then stmt(gettok) else []]]
 } else if accept(gettok, "Keyword_putc") then {
   t := ["Prtc", paren_expr(gettok), []]
   expect(gettok, "Putc", "Semicolon")
 } else if accept(gettok, "Keyword_print") then {
   expect(gettok, "Print", "LeftParen")
   done := 0
   while done = 0 do {
     tok := @gettok.curr
     if tok.tok == "String" then {
       e := ["Prts", ["String", tok.tokval], []]
       @gettok.nxt
     } else {
       e := ["Prti", expr(gettok, 0), []]
     }
     t := ["Sequence", t, e]
     accept(gettok, "Comma") | (done := 1)
   }
   expect(gettok, "Print", "RightParen")
   expect(gettok, "Print", "Semicolon")
 } else if (@gettok.curr).tok == "Semicolon" then {
   @gettok.nxt
 } else if (@gettok.curr).tok == "Identifier" then {
   v := ["Identifier", (@gettok.curr).tokval]
   @gettok.nxt
   expect(gettok, "assign", "Op_assign")
   t := ["Assign", v, expr(gettok, 0)]
   expect(gettok, "assign", "Semicolon")
 } else if accept(gettok, "Keyword_while") then {
   e := paren_expr(gettok)
   t := ["While", e, stmt(gettok)]
 } else if accept(gettok, "LeftBrace") then {
   until (@gettok.curr).tok == ("RightBrace" | "End_of_input") do {
     t := ["Sequence", t, stmt(gettok)]
   }
   expect(gettok, "Lbrace", "RightBrace")
 } else if (@gettok.curr).tok ~== "End_of_input" then {
   tok := @gettok.curr
   error(tok, ("expecting start of statement, found '" ||
               text(tok.tok) || "'"))
 }
 return t

end

procedure paren_expr (gettok)

 local x
 expect(gettok, "paren_expr", "LeftParen");
 x := expr(gettok, 0);
 expect(gettok, "paren_expr", "RightParen");
 return x

end

procedure expr (gettok, p)

 local tok, save_tok
 local x, y
 local q
 tok := @gettok.curr
 case tok.tok of {
   "LeftParen" : {
     x := paren_expr(gettok)
   }
   "Op_subtract" : {
     @gettok.nxt
     y := expr(gettok, precedence("Op_negate"))
     x := ["Negate", y, []]
   }
   "Op_add" : {
     @gettok.nxt
     x := expr(gettok, precedence("Op_negate"))
   }
   "Op_not" : {
     @gettok.nxt
     y := expr(gettok, precedence("Op_not"))
     x := ["Not", y, []]
   }
   "Identifier" : {
     x := ["Identifier", tok.tokval]
     @gettok.nxt
   }
   "Integer" : {
     x := ["Integer", tok.tokval]
     @gettok.nxt
   }
   default : {
     error(tok, "Expecting a primary, found: " || text(tok.tok))
   }
 }
 while (tok := @gettok.curr &
        is_binary(tok.tok) &
        p <= precedence(tok.tok)) do
     {
       save_tok := tok
       @gettok.nxt
       q := precedence(save_tok.tok)
       if not is_right_associative(save_tok.tok) then q +:= 1
       x := [operator(save_tok.tok), x, expr(gettok, q)]
     }
 return x

end

procedure accept (gettok, tok)

 local nxt
 if (@gettok.curr).tok == tok then nxt := @gettok.nxt else fail
 return nxt

end

procedure expect (gettok, msg, tok)

 if (@gettok.curr).tok ~== tok then {
   error(@gettok.curr,
         msg || ": Expecting '" || tok || "', found '" ||
         (@gettok.curr).tok || "'")
 }
 return @gettok.nxt

end

procedure error (token, msg)

 write("(", token.line_no, ", ", token.column_no, ") error: ", msg)
 exit(1)

end

procedure precedence (tok)

 local p
 case tok of {
   "Op_multiply" : p := 13
   "Op_divide" : p := 13
   "Op_mod" : p := 13
   "Op_add" : p := 12
   "Op_subtract" : p := 12
   "Op_negate" : p := 14
   "Op_not" : p := 14
   "Op_less" : p := 10
   "Op_lessequal" : p := 10
   "Op_greater" : p := 10
   "Op_greaterequal" : p := 10
   "Op_equal" : p := 9
   "Op_notequal" : p := 9
   "Op_and" : p := 5
   "Op_or" : p := 4
   default : p := -1
 }
 return p

end

procedure is_binary (tok)

 return ("Op_add" |
         "Op_subtract" |
         "Op_multiply" |
         "Op_divide" |
         "Op_mod" |
         "Op_less" |
         "Op_lessequal" |
         "Op_greater" |
         "Op_greaterequal" |
         "Op_equal" |
         "Op_notequal" |
         "Op_and" |
         "Op_or") == tok
 fail

end

procedure is_right_associative (tok)

 # None of the current operators is right associative.
 fail

end

procedure operator (tok)

 local s
 case tok of {
   "Op_multiply" : s := "Multiply"
   "Op_divide" : s := "Divide"
   "Op_mod" : s := "Mod"
   "Op_add" : s := "Add"
   "Op_subtract" : s := "Subtract"
   "Op_negate" : s := "Negate"
   "Op_not" : s := "Not"
   "Op_less" : s := "Less"
   "Op_lessequal" : s := "LessEqual"
   "Op_greater" : s := "Greater"
   "Op_greaterequal" : s := "GreaterEqual"
   "Op_equal" : s := "Equal"
   "Op_notequal" : s := "NotEqual"
   "Op_and" : s := "And"
   "Op_or" : s := "Or"
 }
 return s

end

procedure text (tok)

 local s
 case tok of {
   "Keyword_else"    :  s := "else"
   "Keyword_if"      :  s := "if"
   "Keyword_print"   :  s := "print"
   "Keyword_putc"    :  s := "putc"
   "Keyword_while"   :  s := "while"
   "Op_multiply"     :  s := "*"
   "Op_divide"       :  s := "/"
   "Op_mod"          :  s := "%"
   "Op_add"          :  s := "+"
   "Op_subtract"     :  s := "-"
   "Op_negate"       :  s := "-"
   "Op_less"         :  s := "<"
   "Op_lessequal"    :  s := "<="
   "Op_greater"      :  s := ">"
   "Op_greaterequal" :  s := ">="
   "Op_equal"        :  s := "=="
   "Op_notequal"     :  s := "!="
   "Op_not"          :  s := "!"
   "Op_assign"       :  s := "="
   "Op_and"          :  s := "&&"
   "Op_or"           :  s := "||"
   "LeftParen"       :  s := "("
   "RightParen"      :  s := ")"
   "LeftBrace"       :  s := "{"
   "RightBrace"      :  s := "}"
   "Semicolon"       :  s := ";"
   "Comma"           :  s := ","
   "Identifier"      :  s := "Ident"
   "Integer"         :  s := "Integer literal"
   "String"          :  s := "String literal"
   "End_of_input"    :  s := "EOI"
 }
 return s

end</lang>

Output:
$ icont -s -u parse.icn && ./parse compiler-tests/primes.lex
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier count
Integer 1
Assign
Identifier n
Integer 1
Assign
Identifier limit
Integer 100
While
Less
Identifier n
Identifier limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier k
Integer 3
Assign
Identifier p
Integer 1
Assign
Identifier n
Add
Identifier n
Integer 2
While
And
LessEqual
Multiply
Identifier k
Identifier k
Identifier n
Identifier p
Sequence
Sequence
;
Assign
Identifier p
NotEqual
Multiply
Divide
Identifier n
Identifier k
Identifier k
Identifier n
Assign
Identifier k
Add
Identifier k
Integer 2
If
Identifier p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier n
;
Prts
String " is prime\n"
;
Assign
Identifier count
Add
Identifier count
Integer 1
;
Sequence
Sequence
Sequence
;
Prts
String "Total primes found: "
;
Prti
Identifier count
;
Prts
String "\n"
;



J

Implementation:

<lang J>require'format/printf'

tkref=: tokenize 'End_of_input*/%+--<<=>>===!=!&&||print=print(if{else}while;,putc)a""0' tkref,. (tknames)=: tknames=:;: {{)n

End_of_input Op_multiply Op_divide Op_mod Op_add Op_subtract Op_negate Op_less
Op_lessequal Op_greater Op_greaterequal Op_equal Op_notequal Op_not Op_and
Op_or Keyword_print Op_assign Keyword_print LeftParen Keyword_if LeftBrace Keyword_else RightBrace
Keyword_while Semicolon Comma Keyword_putc RightParen
Identifier String Integer

}}-.LF

tkV=: 2 (tkref i.tokenize '*/%+-<<=>>===!=&&||')} (#tktyp)#0 tkV=: 1 (1 0+tkref i.tokenize '-!')} tkV tkPrec=: 13 13 13 12 12 10 10 10 10 9 9 5 4 (tkref i.tokenize'*/%+-<<=>>==!=&&||')} tkV<._1 tkPrec=: 14 (1 0+tkref i.tokenize'-!')} tkPrec NB. proofread |:(<"1 tkV,.tkPrec),tkref,:tknames

tkref,.(ndDisp)=: ndDisp=:;:{{)n

Sequence Multiply Divide Mod Add Subtract Negate Less LessEqual Greater
GreaterEqual Equal NotEqual Not And Or Prts Assign Prti x If x x x While
x x Prtc x Identifier String Integer

}}-.LF NB. proofread |:tkref,:ndDisp

gettoken=: {{

 'tok_ln tok_col'=: (0;ndx){::x
 'tok_name tok_value'=: (1;ndx){::x
 if. 'Error'-:tok_name do.
   error 'invalid word ',":tok_value
 end.
 ind=. tknames i.<tok_name
 tok_text=: ind{::tkref
 tok_valence=: ind{::tkV
 tok_precedence=: ind{::tkPrec
 ndx=:ndx+1
 node_display=: ind{::ndDisp

}}

parse=: {{

 ndx=: tok_ln=: tok_col=: 0
 gettok=: y&gettoken
 gettok
 t=.a:
 whilst.-.(a:-:t)+.tok_name-:End_of_input do.
   t=. Sequence make_node t stmt
 end.

}}

stmt=:{{)v

 t=. a:
 select.tok_name
   case.Keyword_if do.
     s=. stmt e=. paren_expr gettok
     if.Keyword_else-:tok_name
     do.   S=. stmt gettok
     else. S=. a: end.
     t=. If make_node e If make_node s S
   case.Keyword_putc do.
     e=. paren_expr gettok
     t=. Prtc make_node e a:
     Prtc expect Semicolon
   case.Keyword_print do.gettok
     'Print' expect LeftParen
     while.do.
       if.String-:tok_name
       do. gettok e=. Prts make_node (String make_leaf tok_value) a:
       else. e=. Prti make_node (expr 0) a: end.
       t=. Sequence make_node t e
       if.Comma-:tok_name
       do.Comma expect Comma
       else.break.end.
     end.
     'Print' expect RightParen
     'Print' expect Semicolon      
   case.Semicolon do.gettok
   case.Identifier do.
     gettok v=. Identifier make_leaf tok_value
     Assign expect Op_assign
     t=. Assign make_node v e=. expr 0
     Assign expect Semicolon
   case.Keyword_while do.
     t=. While make_node e s=. stmt e=. paren_expr gettok
   case.LeftBrace do.
     'LeftBrace' expect LeftBrace
     while.-.(<tok_name) e.  RightBrace;End_of_input do.
       t=. Sequence make_node t stmt
     end.
     'LeftBrace' expect RightBrace
   case.End_of_input do.
   case.do. error 'Expecting start of statement, found %s'sprintf<tok_text
 end.
 t

}}

paren_expr=: {{

 'paren_expr' expect LeftParen
 t=. expr 0
 'paren_expr' expect RightParen
 t

}}

not_prec=: tkPrec{~tknames i.<Op_not expr=: {{

 select.tok_name
   case.LeftParen do.e=. paren_expr
   case.Op_add do.gettok
     e=. expr not_prec
   case.Op_subtract do.gettok
     e=. Negate make_node (expr not_prec) a:
   case.Op_not do.gettok
     e=. Not make_node (expr not_prec) a:
   case.Identifier do.
     gettok e=. Identifier make_leaf tok_value
   case.Integer do.
     gettok e=. Integer make_leaf tok_value
   case.do. error 'Expecting a primary, found %s'sprintf<tok_text
 end.
 while.(2=tok_valence)*tok_precedence>:y do.
   q=. 1+tok_precedence [ op=. node_display NB. no right associative operators
   gettok
   node=. expr q
   e=. op make_node e node
 end.
 e

}}

expect=: {{

 if.y-:tok_name do. gettok return.end.
 error '%s: Expecting "%s", found "%s"'sprintf x;(tkref{::~tknames i.<y);tok_text

}}

make_leaf=: Template:X;y

make_node=: {{

 m;n;<y

}}

error=: Template:Echo 'Error: line %d, column %d: %s\n'sprintf tok ln;tok col;y throw.


syntax=: Template:;(flatAST parse y),each LF

flatAST=: {{

 assert.*L.y
 select.#y
   case.1 do.<';' assert.y-:a:
   case.2 do.<;:inv ":each y
   case.3 do.({.y),(flatAST 1{::y),flatAST 2{::y
   case.do.assert.0
 end.

}} </lang>

Some quirks worth noting:

(1) '+' appears in the productions for 'primary' and 'addition_expr' but has only one node type (because we do not represent its appearance in 'primary' with a node.

(2) '-' and 'print' do have two node types (which we sort out on the fly).

(3) In this implementation, we require a 1:1 mapping between the data structure representing token types and the data structure representing node types. This means two token entries for both - and print (the second instance of both gets ignored by the lexer).

(4) Because the data structure produced by the lexer is independent of any type system implementation, we can use the same type system for the lexer or a different type system for the lexer and either way works (as long as the implementations are consistent with the spec).

(5) In this context parallel constant arrays represent token and node types.

Task example:

<lang J> primes=: {{)n /*

Simple prime number generator
*/

count = 1; n = 1; limit = 100; while (n < limit) {

   k=3;
   p=1;
   n=n+2;
   while ((k*k<=n) && (p)) {
       p=n/k*k!=n;
       k=k+2;
   }
   if (p) {
       print(n, " is prime\n");
       count = count + 1;
   }

} print("Total primes found: ", count, "\n"); }}

  syntax lex primes

Sequence Sequence Sequence Sequence Sequence

Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence

Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence

Assign Identifier p NotEqual Divide Identifier n Multiply Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence

Sequence Sequence

Prti Identifier n

Prts String " is prime\n"

Assign Identifier count Add Identifier count Integer 1

Sequence Sequence Sequence

Prts String "Total primes found: "

Prti Identifier count

Prts String "\n"

</lang>

Java

Usage: java Parser infile [>outfile]

Translation of: Python

<lang java> import java.io.File; import java.io.FileNotFoundException; import java.util.Scanner; import java.util.StringTokenizer; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.util.HashMap;

class Parser { private List<Token> source; private Token token; private int position;

static class Node { public NodeType nt; public Node left, right; public String value;

Node() { this.nt = null; this.left = null; this.right = null; this.value = null; } Node(NodeType node_type, Node left, Node right, String value) { this.nt = node_type; this.left = left; this.right = right; this.value = value; } public static Node make_node(NodeType nodetype, Node left, Node right) { return new Node(nodetype, left, right, ""); } public static Node make_node(NodeType nodetype, Node left) { return new Node(nodetype, left, null, ""); } public static Node make_leaf(NodeType nodetype, String value) { return new Node(nodetype, null, null, value); } }

static class Token { public TokenType tokentype; public String value; public int line; public int pos;

Token(TokenType token, String value, int line, int pos) { this.tokentype = token; this.value = value; this.line = line; this.pos = pos; } @Override public String toString() { return String.format("%5d %5d %-15s %s", this.line, this.pos, this.tokentype, this.value); } }

static enum TokenType { End_of_input(false, false, false, -1, NodeType.nd_None), Op_multiply(false, true, false, 13, NodeType.nd_Mul), Op_divide(false, true, false, 13, NodeType.nd_Div), Op_mod(false, true, false, 13, NodeType.nd_Mod), Op_add(false, true, false, 12, NodeType.nd_Add), Op_subtract(false, true, false, 12, NodeType.nd_Sub), Op_negate(false, false, true, 14, NodeType.nd_Negate), Op_not(false, false, true, 14, NodeType.nd_Not), Op_less(false, true, false, 10, NodeType.nd_Lss), Op_lessequal(false, true, false, 10, NodeType.nd_Leq), Op_greater(false, true, false, 10, NodeType.nd_Gtr), Op_greaterequal(false, true, false, 10, NodeType.nd_Geq), Op_equal(false, true, true, 9, NodeType.nd_Eql), Op_notequal(false, true, false, 9, NodeType.nd_Neq), Op_assign(false, false, false, -1, NodeType.nd_Assign), Op_and(false, true, false, 5, NodeType.nd_And), Op_or(false, true, false, 4, NodeType.nd_Or), Keyword_if(false, false, false, -1, NodeType.nd_If), Keyword_else(false, false, false, -1, NodeType.nd_None), Keyword_while(false, false, false, -1, NodeType.nd_While), Keyword_print(false, false, false, -1, NodeType.nd_None), Keyword_putc(false, false, false, -1, NodeType.nd_None), LeftParen(false, false, false, -1, NodeType.nd_None), RightParen(false, false, false, -1, NodeType.nd_None), LeftBrace(false, false, false, -1, NodeType.nd_None), RightBrace(false, false, false, -1, NodeType.nd_None), Semicolon(false, false, false, -1, NodeType.nd_None), Comma(false, false, false, -1, NodeType.nd_None), Identifier(false, false, false, -1, NodeType.nd_Ident), Integer(false, false, false, -1, NodeType.nd_Integer), String(false, false, false, -1, NodeType.nd_String);

private final int precedence; private final boolean right_assoc; private final boolean is_binary; private final boolean is_unary; private final NodeType node_type;

TokenType(boolean right_assoc, boolean is_binary, boolean is_unary, int precedence, NodeType node) { this.right_assoc = right_assoc; this.is_binary = is_binary; this.is_unary = is_unary; this.precedence = precedence; this.node_type = node; } boolean isRightAssoc() { return this.right_assoc; } boolean isBinary() { return this.is_binary; } boolean isUnary() { return this.is_unary; } int getPrecedence() { return this.precedence; } NodeType getNodeType() { return this.node_type; } } static enum NodeType { nd_None(""), nd_Ident("Identifier"), nd_String("String"), nd_Integer("Integer"), nd_Sequence("Sequence"), nd_If("If"), nd_Prtc("Prtc"), nd_Prts("Prts"), nd_Prti("Prti"), nd_While("While"), nd_Assign("Assign"), nd_Negate("Negate"), nd_Not("Not"), nd_Mul("Multiply"), nd_Div("Divide"), nd_Mod("Mod"), nd_Add("Add"), nd_Sub("Subtract"), nd_Lss("Less"), nd_Leq("LessEqual"), nd_Gtr("Greater"), nd_Geq("GreaterEqual"), nd_Eql("Equal"), nd_Neq("NotEqual"), nd_And("And"), nd_Or("Or");

private final String name;

NodeType(String name) { this.name = name; }

@Override public String toString() { return this.name; } } static void error(int line, int pos, String msg) { if (line > 0 && pos > 0) { System.out.printf("%s in line %d, pos %d\n", msg, line, pos); } else { System.out.println(msg); } System.exit(1); } Parser(List<Token> source) { this.source = source; this.token = null; this.position = 0; } Token getNextToken() { this.token = this.source.get(this.position++); return this.token; } Node expr(int p) { Node result = null, node; TokenType op; int q;

if (this.token.tokentype == TokenType.LeftParen) { result = paren_expr(); } else if (this.token.tokentype == TokenType.Op_add || this.token.tokentype == TokenType.Op_subtract) { op = (this.token.tokentype == TokenType.Op_subtract) ? TokenType.Op_negate : TokenType.Op_add; getNextToken(); node = expr(TokenType.Op_negate.getPrecedence()); result = (op == TokenType.Op_negate) ? Node.make_node(NodeType.nd_Negate, node) : node; } else if (this.token.tokentype == TokenType.Op_not) { getNextToken(); result = Node.make_node(NodeType.nd_Not, expr(TokenType.Op_not.getPrecedence())); } else if (this.token.tokentype == TokenType.Identifier) { result = Node.make_leaf(NodeType.nd_Ident, this.token.value); getNextToken(); } else if (this.token.tokentype == TokenType.Integer) { result = Node.make_leaf(NodeType.nd_Integer, this.token.value); getNextToken(); } else { error(this.token.line, this.token.pos, "Expecting a primary, found: " + this.token.tokentype); }

while (this.token.tokentype.isBinary() && this.token.tokentype.getPrecedence() >= p) { op = this.token.tokentype; getNextToken(); q = op.getPrecedence(); if (!op.isRightAssoc()) { q++; } node = expr(q); result = Node.make_node(op.getNodeType(), result, node); } return result; } Node paren_expr() { expect("paren_expr", TokenType.LeftParen); Node node = expr(0); expect("paren_expr", TokenType.RightParen); return node; } void expect(String msg, TokenType s) { if (this.token.tokentype == s) { getNextToken(); return; } error(this.token.line, this.token.pos, msg + ": Expecting '" + s + "', found: '" + this.token.tokentype + "'"); } Node stmt() { Node s, s2, t = null, e, v; if (this.token.tokentype == TokenType.Keyword_if) { getNextToken(); e = paren_expr(); s = stmt(); s2 = null; if (this.token.tokentype == TokenType.Keyword_else) { getNextToken(); s2 = stmt(); } t = Node.make_node(NodeType.nd_If, e, Node.make_node(NodeType.nd_If, s, s2)); } else if (this.token.tokentype == TokenType.Keyword_putc) { getNextToken(); e = paren_expr(); t = Node.make_node(NodeType.nd_Prtc, e); expect("Putc", TokenType.Semicolon); } else if (this.token.tokentype == TokenType.Keyword_print) { getNextToken(); expect("Print", TokenType.LeftParen); while (true) { if (this.token.tokentype == TokenType.String) { e = Node.make_node(NodeType.nd_Prts, Node.make_leaf(NodeType.nd_String, this.token.value)); getNextToken(); } else { e = Node.make_node(NodeType.nd_Prti, expr(0), null); } t = Node.make_node(NodeType.nd_Sequence, t, e); if (this.token.tokentype != TokenType.Comma) { break; } getNextToken(); } expect("Print", TokenType.RightParen); expect("Print", TokenType.Semicolon); } else if (this.token.tokentype == TokenType.Semicolon) { getNextToken(); } else if (this.token.tokentype == TokenType.Identifier) { v = Node.make_leaf(NodeType.nd_Ident, this.token.value); getNextToken(); expect("assign", TokenType.Op_assign); e = expr(0); t = Node.make_node(NodeType.nd_Assign, v, e); expect("assign", TokenType.Semicolon); } else if (this.token.tokentype == TokenType.Keyword_while) { getNextToken(); e = paren_expr(); s = stmt(); t = Node.make_node(NodeType.nd_While, e, s); } else if (this.token.tokentype == TokenType.LeftBrace) { getNextToken(); while (this.token.tokentype != TokenType.RightBrace && this.token.tokentype != TokenType.End_of_input) { t = Node.make_node(NodeType.nd_Sequence, t, stmt()); } expect("LBrace", TokenType.RightBrace); } else if (this.token.tokentype == TokenType.End_of_input) { } else { error(this.token.line, this.token.pos, "Expecting start of statement, found: " + this.token.tokentype); } return t; } Node parse() { Node t = null; getNextToken(); while (this.token.tokentype != TokenType.End_of_input) { t = Node.make_node(NodeType.nd_Sequence, t, stmt()); } return t; } void printAST(Node t) { int i = 0; if (t == null) { System.out.println(";"); } else { System.out.printf("%-14s", t.nt); if (t.nt == NodeType.nd_Ident || t.nt == NodeType.nd_Integer || t.nt == NodeType.nd_String) { System.out.println(" " + t.value); } else { System.out.println(); printAST(t.left); printAST(t.right); } } } public static void main(String[] args) { if (args.length > 0) { try { String value, token; int line, pos; Token t; boolean found; List<Token> list = new ArrayList<>(); Map<String, TokenType> str_to_tokens = new HashMap<>();

str_to_tokens.put("End_of_input", TokenType.End_of_input); str_to_tokens.put("Op_multiply", TokenType.Op_multiply); str_to_tokens.put("Op_divide", TokenType.Op_divide); str_to_tokens.put("Op_mod", TokenType.Op_mod); str_to_tokens.put("Op_add", TokenType.Op_add); str_to_tokens.put("Op_subtract", TokenType.Op_subtract); str_to_tokens.put("Op_negate", TokenType.Op_negate); str_to_tokens.put("Op_not", TokenType.Op_not); str_to_tokens.put("Op_less", TokenType.Op_less); str_to_tokens.put("Op_lessequal", TokenType.Op_lessequal); str_to_tokens.put("Op_greater", TokenType.Op_greater); str_to_tokens.put("Op_greaterequal", TokenType.Op_greaterequal); str_to_tokens.put("Op_equal", TokenType.Op_equal); str_to_tokens.put("Op_notequal", TokenType.Op_notequal); str_to_tokens.put("Op_assign", TokenType.Op_assign); str_to_tokens.put("Op_and", TokenType.Op_and); str_to_tokens.put("Op_or", TokenType.Op_or); str_to_tokens.put("Keyword_if", TokenType.Keyword_if); str_to_tokens.put("Keyword_else", TokenType.Keyword_else); str_to_tokens.put("Keyword_while", TokenType.Keyword_while); str_to_tokens.put("Keyword_print", TokenType.Keyword_print); str_to_tokens.put("Keyword_putc", TokenType.Keyword_putc); str_to_tokens.put("LeftParen", TokenType.LeftParen); str_to_tokens.put("RightParen", TokenType.RightParen); str_to_tokens.put("LeftBrace", TokenType.LeftBrace); str_to_tokens.put("RightBrace", TokenType.RightBrace); str_to_tokens.put("Semicolon", TokenType.Semicolon); str_to_tokens.put("Comma", TokenType.Comma); str_to_tokens.put("Identifier", TokenType.Identifier); str_to_tokens.put("Integer", TokenType.Integer); str_to_tokens.put("String", TokenType.String);

Scanner s = new Scanner(new File(args[0])); String source = " "; while (s.hasNext()) { String str = s.nextLine(); StringTokenizer st = new StringTokenizer(str); line = Integer.parseInt(st.nextToken()); pos = Integer.parseInt(st.nextToken()); token = st.nextToken(); value = ""; while (st.hasMoreTokens()) { value += st.nextToken() + " "; } found = false; if (str_to_tokens.containsKey(token)) { found = true; list.add(new Token(str_to_tokens.get(token), value, line, pos)); } if (found == false) { throw new Exception("Token not found: '" + token + "'"); } } Parser p = new Parser(list); p.printAST(p.parse()); } catch (FileNotFoundException e) { error(-1, -1, "Exception: " + e.getMessage()); } catch (Exception e) { error(-1, -1, "Exception: " + e.getMessage()); } } else { error(-1, -1, "No args"); } } } </lang>

Julia

Julia tends to discourage large numbers of global variables, so this direct port from the Python reference implementation moves the globals into a function wrapper.

Translation of: Python

<lang julia>struct ASTnode

   nodetype::Int
   left::Union{Nothing, ASTnode}
   right::Union{Nothing, ASTnode}
   value::Union{Nothing, Int, String}

end

function syntaxanalyzer(inputfile)

   tkEOI, tkMul, tkDiv, tkMod, tkAdd, tkSub, tkNegate, tkNot, tkLss, tkLeq, tkGtr, tkGeq,
   tkEql, tkNeq, tkAssign, tkAnd, tkOr, tkIf, tkElse, tkWhile, tkPrint, tkPutc, tkLparen, tkRparen,
   tkLbrace, tkRbrace, tkSemi, tkComma, tkIdent, tkInteger, tkString = collect(1:31)
   ndIdent, ndString, ndInteger, ndSequence, ndIf, ndPrtc, ndPrts, ndPrti, ndWhile,
   ndAssign, ndNegate, ndNot, ndMul, ndDiv, ndMod, ndAdd, ndSub, ndLss, ndLeq,
   ndGtr, ndGeq, ndEql, ndNeq, ndAnd, ndOr = collect(1:25)
   TK_NAME, TK_RIGHT_ASSOC, TK_IS_BINARY, TK_IS_UNARY, TK_PRECEDENCE, TK_NODE = collect(1:6) # label Token columns
   Tokens = [
   ["EOI"             , false, false, false, -1, -1       ],
   ["*"               , false, true,  false, 13, ndMul    ],
   ["/"               , false, true,  false, 13, ndDiv    ],
   ["%"               , false, true,  false, 13, ndMod    ],
   ["+"               , false, true,  false, 12, ndAdd    ],
   ["-"               , false, true,  false, 12, ndSub    ],
   ["-"               , false, false, true,  14, ndNegate ],
   ["!"               , false, false, true,  14, ndNot    ],
   ["<"               , false, true,  false, 10, ndLss    ],
   ["<="              , false, true,  false, 10, ndLeq    ],
   [">"               , false, true,  false, 10, ndGtr    ],
   [">="              , false, true,  false, 10, ndGeq    ],
   ["=="              , false, true,  false,  9, ndEql    ],
   ["!="              , false, true,  false,  9, ndNeq    ],
   ["="               , false, false, false, -1, ndAssign ],
   ["&&"              , false, true,  false,  5, ndAnd    ],
   ["||"              , false, true,  false,  4, ndOr     ],
   ["if"              , false, false, false, -1, ndIf     ],
   ["else"            , false, false, false, -1, -1       ],
   ["while"           , false, false, false, -1, ndWhile  ],
   ["print"           , false, false, false, -1, -1       ],
   ["putc"            , false, false, false, -1, -1       ],
   ["("               , false, false, false, -1, -1       ],
   [")"               , false, false, false, -1, -1       ],
   ["{"               , false, false, false, -1, -1       ],
   ["}"               , false, false, false, -1, -1       ],
   [";"               , false, false, false, -1, -1       ],
   [","               , false, false, false, -1, -1       ],
   ["Ident"           , false, false, false, -1, ndIdent  ],
   ["Integer literal" , false, false, false, -1, ndInteger],
   ["String literal"  , false, false, false, -1, ndString ]]
   allsyms = Dict(
       "End_of_input" => tkEOI, "Op_multiply" => tkMul, "Op_divide" => tkDiv,
       "Op_mod" => tkMod, "Op_add" => tkAdd, "Op_subtract" => tkSub,
       "Op_negate" => tkNegate, "Op_not" => tkNot, "Op_less" => tkLss,
       "Op_lessequal" => tkLeq, "Op_greater" => tkGtr, "Op_greaterequal" => tkGeq,
       "Op_equal" => tkEql, "Op_notequal" => tkNeq, "Op_assign" => tkAssign,
       "Op_and" => tkAnd, "Op_or" => tkOr, "Keyword_if" => tkIf, "Keyword_else" => tkElse,
       "Keyword_while" => tkWhile, "Keyword_print" => tkPrint, "Keyword_putc" => tkPutc,
       "LeftParen" => tkLparen, "RightParen" => tkRparen, "LeftBrace" => tkLbrace,
       "RightBrace" => tkRbrace, "Semicolon" => tkSemi, "Comma" => tkComma,
       "Identifier" => tkIdent, "Integer" => tkInteger, "String" => tkString)
   displaynodes = ["Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts", "Prti", "While",
                    "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add", "Subtract", "Less", 
                    "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual", "And", "Or"]
   errline, errcol, tok, toktext = fill("", 4)  
   error(msg) = throw("Error in syntax: $msg.")
   nilnode = ASTnode(0, nothing, nothing, nothing)
   tokother = "" 
  
   function gettok()
       s = readline(inputfile)
       if length(s) == 0
           error("empty line")
       end
       linelist = split(strip(s), r"\s+", limit = 4)
       # line col Ident varname
       # 0    1   2     3
       errline, errcol, toktext = linelist[1:3]
       if !haskey(allsyms, toktext)
           error("Unknown token $toktext")
       end
       tok = allsyms[toktext]
       tokother = (tok in [tkInteger, tkIdent, tkString]) ? linelist[4] : ""
    end   
   makenode(oper, left, right = nilnode) = ASTnode(oper, left, right, nothing)
   makeleaf(oper, n::Int) = ASTnode(oper, nothing, nothing, n) 
   makeleaf(oper, n) = ASTnode(oper, nothing, nothing, string(n)) 
   expect(msg, s) = if tok != s error("msg: Expecting $(Tokens[s][TK_NAME]), found $(Tokens[tok][TK_NAME])") else gettok() end
   function expr(p)
       x = nilnode
       if tok == tkLparen
           x = parenexpr()
       elseif tok in [tkSub, tkAdd]
           op = tok == tkSub ? tkNegate : tkAdd
           gettok()
           node = expr(Tokens[tkNegate][TK_PRECEDENCE])
           x = (op == tkNegate) ? makenode(ndNegate, node) : node
       elseif tok == tkNot
           gettok()
           x = makenode(ndNot, expr(Tokens[tkNot][TK_PRECEDENCE]))
       elseif tok == tkIdent
           x = makeleaf(ndIdent, tokother)
           gettok()
       elseif tok == tkInteger
           x = makeleaf(ndInteger, tokother)
           gettok()
       else
           error("Expecting a primary, found: $(Tokens[tok][TK_NAME])")
       end
       while Tokens[tok][TK_IS_BINARY] && (Tokens[tok][TK_PRECEDENCE] >= p)
           op = tok
           gettok()
           q = Tokens[op][TK_PRECEDENCE]
           if !Tokens[op][TK_RIGHT_ASSOC]
               q += 1
           end
           node = expr(q)
           x = makenode(Tokens[op][TK_NODE], x, node)
       end
       x
   end
   parenexpr() = (expect("paren_expr", tkLparen); node = expr(0); expect("paren_expr", tkRparen); node)
   function stmt()
       t = nilnode
       if tok == tkIf
           gettok()
           e = parenexpr()
           s = stmt()
           s2 = nilnode
           if tok == tkElse
               gettok()
               s2 = stmt()
           end
           t = makenode(ndIf, e, makenode(ndIf, s, s2))
       elseif tok == tkPutc
           gettok()
           e = parenexpr()
           t = makenode(ndPrtc, e)
           expect("Putc", tkSemi)
       elseif tok == tkPrint
           gettok()
           expect("Print", tkLparen)
           while true
               if tok == tkString
                   e = makenode(ndPrts, makeleaf(ndString, tokother))
                   gettok()
               else
                   e = makenode(ndPrti, expr(0))
               end
               t = makenode(ndSequence, t, e)
               if tok != tkComma
                   break
               end
               gettok()
           end
           expect("Print", tkRparen)
           expect("Print", tkSemi)
       elseif tok == tkSemi
           gettok()
       elseif tok == tkIdent
           v = makeleaf(ndIdent, tokother)
           gettok()
           expect("assign", tkAssign)
           e = expr(0)
           t = makenode(ndAssign, v, e)
           expect("assign", tkSemi)
       elseif tok == tkWhile
           gettok()
           e = parenexpr()
           s = stmt()
           t = makenode(ndWhile, e, s)
       elseif tok == tkLbrace
           gettok()
           while (tok != tkRbrace) && (tok != tkEOI)
               t = makenode(ndSequence, t, stmt())
           end
           expect("Lbrace", tkRbrace)
       elseif tok != tkEOI
           error("Expecting start of statement, found: $(Tokens[tok][TK_NAME])")
       end
       return t
   end
   function parse()
       t = nilnode
       gettok()
       while true
           t = makenode(ndSequence, t, stmt())
           if (tok == tkEOI) || (t == nilnode)
               break
           end
       end
       t
   end
   
   function prtASTnode(t)
       if t == nothing
           return
       elseif t == nilnode 
           println(";")
       elseif t.nodetype in [ndIdent, ndInteger, ndString]
           println(rpad(displaynodes[t.nodetype], 14), t.value)
       else
           println(rpad(displaynodes[t.nodetype], 14))
       end
       prtASTnode(t.left)
       prtASTnode(t.right)
   end
   # runs the function
   prtASTnode(parse())

end

testtxt = """

   1      1 Identifier      count
   1      7 Op_assign
   1      9 Integer             1
   1     10 Semicolon
   2      1 Keyword_while
   2      7 LeftParen
   2      8 Identifier      count
   2     14 Op_less
   2     16 Integer            10
   2     18 RightParen
   2     20 LeftBrace
   3      5 Keyword_print
   3     10 LeftParen
   3     11 String          \"count is: \"
   3     23 Comma
   3     25 Identifier      count
   3     30 Comma
   3     32 String          \"\\n\"
   3     36 RightParen
   3     37 Semicolon
   4      5 Identifier      count
   4     11 Op_assign
   4     13 Identifier      count
   4     19 Op_add
   4     21 Integer             1
   4     22 Semicolon
   5      1 RightBrace
   6      1 End_of_input           """

syntaxanalyzer(IOBuffer(testtxt)) # for isolated testing

  1. syntaxanalyzer(length(ARGS) > 1 ? ARGS[1] : stdin) # for use as in the Python code

</lang>

M2000 Interpreter

This program written without functions. Subs use the current stack of values (a feature from interpreter) to return arrays. Subs run on same scope as the module or function which called. We use Local to make local variables (erased at return). Sub prt_ast() called first time without passing parameter, because parameter already exist in stack of values. Interpreter when call a module, a function, a subroutine always pass values to stack of values. Functions called in an expression, always have own stack of values. Modules call other modules passing the same stack of values. Threads are parts of modules, with same scope in module where belong, but have own stack and static variables, and they rub in time intervals.

A (1,2,3) is an auto array or tuple. We can assign a tuple in a variable, in a item in another tuple. A tuple is a reference type, but here we don't use a second pointer (we say references variables which references to other variables - reference or value type-, so we say pointer the reference who hold an object alive. We can read 2nd item (expected string) from alfa, a pointer to array, using array$(alfa,1) or alfa#val$(1). The second variation can be used multiple times if a tuple has another tulple so alfa#val(2)#val$(1) return a string from 3rd item, which expect a tuple from 2nd item. The other variation array$(array(alfa,2),1) for the same result.


<lang M2000 Interpreter> Module syntax_analyzer(b$){ enum tokens { Op_add, Op_subtract, Op_not=5, Op_multiply=10, Op_divide, Op_mod, Op_negate, Op_less, Op_lessequal, Op_greater, Op_greaterequal, Op_equal, Op_notequal, Op_and, Op_or, Op_assign=100, Keyword_if=110, Keyword_else, Keyword_while, Keyword_print, Keyword_putc, LeftParen, RightParen, LeftBrace, RightBrace, Semicolon, Comma, Identifier, Integer, String, End_of_input }

Inventory precedence=Op_multiply:=13, Op_divide:=13, Op_mod:=13, Op_add:=12, Op_subtract:=12 Append precedence, Op_negate:=14, Op_not:=14, Op_less:=10, Op_lessequal:=10, Op_greater:=10 Append precedence, Op_greaterequal:=10, Op_equal:=9, Op_notequal:=9, Op_assign:=-1, Op_and:=5 Append precedence, Op_or:=4

Inventory symbols=Op_multiply:="Multiply", Op_divide:="Divide", Op_mod:="Mod", Op_add:="Add" Append symbols, Op_negate:="Negate", Op_not:="Not", Op_less:="Less", Op_subtract:="Subtract" Append symbols, Op_lessequal:="LessEqual", Op_greater:="Greater", Op_greaterequal:="GreaterEqual" Append symbols, Op_equal:="Equal", Op_notequal:="NotEqual", Op_and:="And", Op_or:="Or"

def lineNo, ColumnNo, m, line$, a, lim, cur=-1 const nl$=chr$(13)+chr$(10), Ansi=3 Dim lex$() lex$()=piece$(b$,chr$(13)+chr$(10)) lim=dimension(lex$(),1)-1 op=End_of_input flush k=0 Try { push (,) ' Null getone(&op) repeat stmt(&op) shift 2 ' swap two top items push ("Sequence", array, array) k++ until op=End_of_Input } er$=error$ if er$<>"" then print er$ : flush: break Print "Ast" Document Output$ prt_ast() clipboard Output$ Save.Doc Output$, "parse.t", Ansi document parse$ Load.Doc parse$,"parse.t", Ansi Report parse$

sub prt_ast(t) if len(t)<1 then Output$=";"+nl$ else.if len(t)=3 then Output$=t#val$(0) +nl$ prt_ast(t#val(1)) : prt_ast(t#val(2)) else Output$=t#val$(0) +nl$ end if end sub sub expr(p) ' only a number local x=(,), prev=op if op>=Identifier then x=(line$,) getone(&op) else.if op=LeftParen then paren_exp() x=array else.if op<10 then getone(&op) expr(precedence(int(Op_negate))) read local y if prev=Op_add then x=y else if prev=Op_subtract then prev=Op_negate x=(symbols(prev), y,(,)) End if else {error "??? "+eval$(op)} end if local prec while exist(precedence, int(op)) prev=op : prec=eval(precedence) if prec<14 and prec>=p else exit getone(&op) expr(prec+1) ' all operators are left associative (use prec for right a.) x=(symbols(int(prev)), x, array) End While Push x end sub sub paren_exp() expected(LeftParen) getone(&op) expr(0) expected(RightParen) getone(&op) end sub sub stmt(&op) local t=(,) if op=Identifier then t=(line$) getone(&op) expected(Op_assign) getone(&op) expr(0) read local rightnode Push ("Assign",t,rightnode) expected(Semicolon) getone(&op) else.if op=Semicolon then getone(&op) Push (";",) else.if op=Keyword_print then getone(&op) expected(LeftParen) repeat getone(&op) if op=String then Push ("Prts",(line$,),(,)) getone(&op) else expr(0) Push ("Prti", array,(,)) end if t=("Sequence", t, array) until op<>Comma expected(RightParen) getone(&op) expected(Semicolon) getone(&op) push t else.if op=Keyword_while then getone(&op) paren_exp() stmt(&op) shift 2 Push ("While",array, array) else.if op=Keyword_if then getone(&op) paren_exp() stmt(&op) local s2=(,) if op=Keyword_else then getone(&op) stmt(&op) read s2 end if shift 2 Push ("If",array ,("If",array,s2)) else.if op=Keyword_putc then getone(&op) paren_exp() Push ("Prtc",array,t) expected(Semicolon) getone(&op) else.if op=LeftBrace then Brace() else error "Unkown Op" end if end sub Sub Brace() getone(&op) while op<>RightBrace and op<>End_of_input stmt(&op) t=("Sequence", t, array) end while expected(RightBrace) getone(&op) push t End Sub Sub expected(what) if not op=what then {Error "Expected "+eval$(what)+str$(LineNo)+","+Str$(ColumnNo)} End Sub sub getone(&op) op=End_of_input while cur<lim cur++ line$=trim$(lex$(cur)) if line$<>"" then exit end while if cur=lim then exit sub LineNo=Val(line$,"int",m) line$=mid$(line$, m) ColumnNo=Val(line$,"int",m) line$=trim$(mid$(line$, m)) Rem : Print LineNo, ColumnNo m=instr(line$," ") if m>0 then op=Eval("."+leftpart$(line$, " ")) else op=Eval("."+line$) end sub }

syntax_analyzer {

        1         1 LeftBrace
        5         5 Identifier left_edge
        5        17 Op_assign
        5        19 Op_subtract
        5        20 Integer 420
        5        23 Semicolon
        6         5 Identifier right_edge
        6        17 Op_assign
        6        20 Integer 300
        6        23 Semicolon
        7         5 Identifier top_edge
        7        17 Op_assign
        7        20 Integer 300
        7        23 Semicolon
        8         5 Identifier bottom_edge
        8        17 Op_assign
        8        19 Op_subtract
        8        20 Integer 300
        8        23 Semicolon
        9         5 Identifier x_step
        9        17 Op_assign
        9        22 Integer 7
        9        23 Semicolon
       10         5 Identifier y_step
       10        17 Op_assign
       10        21 Integer 15
       10        23 Semicolon
       12         5 Identifier max_iter
       12        17 Op_assign
       12        20 Integer 200
       12        23 Semicolon
       14         5 Identifier y0
       14         8 Op_assign
       14        10 Identifier top_edge
       14        18 Semicolon
       15         5 Keyword_while
       15        11 LeftParen
       15        12 Identifier y0
       15        15 Op_greater
       15        17 Identifier bottom_edge
       15        28 RightParen
       15        30 LeftBrace
       16         9 Identifier x0
       16        12 Op_assign
       16        14 Identifier left_edge
       16        23 Semicolon
       17         9 Keyword_while
       17        15 LeftParen
       17        16 Identifier x0
       17        19 Op_less
       17        21 Identifier right_edge
       17        31 RightParen
       17        33 LeftBrace
       18        13 Identifier y
       18        15 Op_assign
       18        17 Integer 0
       18        18 Semicolon
       19        13 Identifier x
       19        15 Op_assign
       19        17 Integer 0
       19        18 Semicolon
       20        13 Identifier the_char
       20        22 Op_assign
       20        24 Integer 32
       20        27 Semicolon
       21        13 Identifier i
       21        15 Op_assign
       21        17 Integer 0
       21        18 Semicolon
       22        13 Keyword_while
       22        19 LeftParen
       22        20 Identifier i
       22        22 Op_less
       22        24 Identifier max_iter
       22        32 RightParen
       22        34 LeftBrace
       23        17 Identifier x_x
       23        21 Op_assign
       23        23 LeftParen
       23        24 Identifier x
       23        26 Op_multiply
       23        28 Identifier x
       23        29 RightParen
       23        31 Op_divide
       23        33 Integer 200
       23        36 Semicolon
       24        17 Identifier y_y
       24        21 Op_assign
       24        23 LeftParen
       24        24 Identifier y
       24        26 Op_multiply
       24        28 Identifier y
       24        29 RightParen
       24        31 Op_divide
       24        33 Integer 200
       24        36 Semicolon
       25        17 Keyword_if
       25        20 LeftParen
       25        21 Identifier x_x
       25        25 Op_add
       25        27 Identifier y_y
       25        31 Op_greater
       25        33 Integer 800
       25        37 RightParen
       25        39 LeftBrace
       26        21 Identifier the_char
       26        30 Op_assign
       26        32 Integer 48
       26        36 Op_add
       26        38 Identifier i
       26        39 Semicolon
       27        21 Keyword_if
       27        24 LeftParen
       27        25 Identifier i
       27        27 Op_greater
       27        29 Integer 9
       27        30 RightParen
       27        32 LeftBrace
       28        25 Identifier the_char
       28        34 Op_assign
       28        36 Integer 64
       28        39 Semicolon
       29        21 RightBrace
       30        21 Identifier i
       30        23 Op_assign
       30        25 Identifier max_iter
       30        33 Semicolon
       31        17 RightBrace
       32        17 Identifier y
       32        19 Op_assign
       32        21 Identifier x
       32        23 Op_multiply
       32        25 Identifier y
       32        27 Op_divide
       32        29 Integer 100
       32        33 Op_add
       32        35 Identifier y0
       32        37 Semicolon
       33        17 Identifier x
       33        19 Op_assign
       33        21 Identifier x_x
       33        25 Op_subtract
       33        27 Identifier y_y
       33        31 Op_add
       33        33 Identifier x0
       33        35 Semicolon
       34        17 Identifier i
       34        19 Op_assign
       34        21 Identifier i
       34        23 Op_add
       34        25 Integer 1
       34        26 Semicolon
       35        13 RightBrace
       36        13 Keyword_putc
       36        17 LeftParen
       36        18 Identifier the_char
       36        26 RightParen
       36        27 Semicolon
       37        13 Identifier x0
       37        16 Op_assign
       37        18 Identifier x0
       37        21 Op_add
       37        23 Identifier x_step
       37        29 Semicolon
       38         9 RightBrace
       39         9 Keyword_putc
       39        13 LeftParen
       39        14 Integer 10
       39        18 RightParen
       39        19 Semicolon
       40         9 Identifier y0
       40        12 Op_assign
       40        14 Identifier y0
       40        17 Op_subtract
       40        19 Identifier y_step
       40        25 Semicolon
       41         5 RightBrace
       42         1 RightBrace
       43         1 End_of_Input

} </lang>

Output:
Sequence
;
Sequence
Sequence
Sequence
Sequence
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier left_edge
Negate
Integer 420
;
Assign
Identifier right_edge
Integer 300
Assign
Identifier top_edge
Integer 300
Assign
Identifier bottom_edge
Negate
Integer 300
;
Assign
Identifier x_step
Integer 7
Assign
Identifier y_step
Integer 15
Assign
Identifier max_iter
Integer 200
Assign
Identifier y0
Identifier top_edge
While
Greater
Identifier y0
Identifier bottom_edge
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier x0
Identifier left_edge
While
Less
Identifier x0
Identifier right_edge
Sequence
Sequence
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier y
Integer 0
Assign
Identifier x
Integer 0
Assign
Identifier the_char
Integer 32
Assign
Identifier i
Integer 0
While
Less
Identifier i
Identifier max_iter
Sequence
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier x_x
Divide
Multiply
Identifier x
Identifier x
Integer 200
Assign
Identifier y_y
Divide
Multiply
Identifier y
Identifier y
Integer 200
If
Greater
Add
Identifier x_x
Identifier y_y
Integer 800
if
Sequence
Sequence
Sequence
;
Assign
Identifier the_char
Add
Integer 48
Identifier i
If
Greater
Identifier i
Integer 9
if
Sequence
;
Assign
Identifier the_char
Integer 64
;
Assign
Identifier i
Identifier max_iter
;
Assign
Identifier y
Add
Divide
Multiply
Identifier x
Identifier y
Integer 100
Identifier y0
Assign
Identifier x
Add
Subtract
Identifier x_x
Identifier y_y
Identifier x0
Assign
Identifier i
Add
Identifier i
Integer 1
Putc
Identifier the_char
;
Assign
Identifier x0
Add
Identifier x0
Identifier x_step
Putc
Integer 10
;
Assign
Identifier y0
Subtract
Identifier y0
Identifier y_step

Nim

Using the third version of Nim lexer.

<lang Nim>import ast_lexer

type NodeKind* = enum

                   nIdentifier = "Identifier"
                   nString = "String"
                   nInteger = "Integer"
                   nSequence = "Sequence"
                   nIf = "If"
                   nPrtc = "Prtc"
                   nPrts = "Prts"
                   nPrti = "Prti"
                   nWhile = "While"
                   nAssign = "Assign"
                   nNegate = "Negate"
                   nNot = "Not"
                   nMultiply = "Multiply"
                   nDivide = "Divide"
                   nMod = "Mod"
                   nAdd = "Add"
                   nSubtract = "Subtract"
                   nLess = "Less"
                   nLessEqual = "LessEqual"
                   nGreater = "Greater"
                   nGreaterEqual = "GreaterEqual"
                   nEqual = "Equal"
                   nNotEqual = "NotEqual"
                   nAnd = "And"
                   nOr = "Or"

type Node* = ref object

 left*: Node
 right*: Node
 case kind*: NodeKind
 of nString: stringVal*: string
 of nInteger: intVal*: int
 of nIdentifier: name*: string
 else: nil

type Operator = range[tokMult..tokOr]

const

 Precedences: array[Operator, int] = [13,  # tokMult
                                      13,  # tokDiv
                                      13,  # tokMod
                                      12,  # tokAdd
                                      12,  # tokSub
                                      10,  # tokLess
                                      10,  # tokLessEq
                                      10,  # tokGreater
                                      10,  # tokGreaterEq
                                       9,  # tokEq
                                       9,  # tokNeq
                                      14,  # tokNot
                                      -1,  # tokAssign
                                       5,  # tokAnd
                                       4]  # tokOr
 UnaryPrecedence = 14
 BinaryOperators = {tokMult, tokDiv, tokMod, tokAdd, tokSub, tokLess, tokLessEq,
                   tokGreater, tokGreaterEq, tokEq, tokNotEq, tokAnd, tokOr}
 # Mapping of operators from TokenKind to NodeKind.
 NodeKinds: array[Operator, NodeKind] = [nMultiply, nDivide, nMod, nAdd, nSubtract,
                                         nLess, nLessEqual, nGreater, nGreaterEqual,
                                         nEqual, nNotEqual, nNot, nAssign, nAnd, nOr]

type SyntaxError* = object of CatchableError


template expect(token: Token; expected: TokenKind; errmsg: string) =

 ## Check if a token is of the expected kind.
 ## Raise a SyntaxError if this is not the case.
 if token.kind != expected:
   raise newException(SyntaxError, "line " & $token.ln & ": " & errmsg)
 token = lexer.next()
  1. ---------------------------------------------------------------------------------------------------

proc newNode*(kind: NodeKind; left: Node; right: Node = nil): Node =

 ## Create a new node with given left and right children.
 result = Node(kind: kind, left: left, right: right)
  1. ---------------------------------------------------------------------------------------------------
  1. Forward reference.

proc parExpr(lexer: var Lexer; token: var Token): Node

  1. ---------------------------------------------------------------------------------------------------

proc expr(lexer: var Lexer; token: var Token; p: int): Node =

 ## Parse an expression.
 case token.kind
 of tokLPar:
   result = parExpr(lexer, token)
 of tokAdd, tokSub, tokNot:
   # Unary operators.
   let savedToken = token
   token = lexer.next()
   let e = expr(lexer, token, UnaryPrecedence)
   if savedToken.kind == tokAdd:
     result = e
   else:
     result = newNode(if savedToken.kind == tokSub: nNegate else: nNot, e)
 of tokIdent:
   result = Node(kind: nIdentifier, name: token.ident)
   token = lexer.next()
 of tokInt:
   result = Node(kind:nInteger, intVal: token.intVal)
   token = lexer.next()
 of tokChar:
   result = Node(kind:nInteger, intVal: ord(token.charVal))
   token = lexer.next()
 else:
   raise newException(SyntaxError, "Unexpected symbol at line " & $token.ln)
 # Process the binary operators in the expression.
 while token.kind in BinaryOperators and Precedences[token.kind] >= p:
   let savedToken = token
   token = lexer.next()
   let q = Precedences[savedToken.kind] + 1  # No operator is right associative.
   result = newNode(NodeKinds[savedToken.kind], result, expr(lexer, token, q))
  1. ---------------------------------------------------------------------------------------------------

proc parExpr(lexer: var Lexer; token: var Token): Node =

 ## Parse a parenthetized expression.
 token.expect(tokLPar, "'(' expected")
 result = expr(lexer, token, 0)
 token.expect(tokRPar, "')' expected")
  1. ---------------------------------------------------------------------------------------------------

proc stmt(lexer: var Lexer; token: var Token): Node =

 ## Parse a statement.
 case token.kind:
 of tokIf:
   token = lexer.next()
   let e = parExpr(lexer, token)
   let thenNode = stmt(lexer, token)
   var elseNode: Node = nil
   if token.kind == tokElse:
     token = lexer.next()
     elseNode = stmt(lexer, token)
   result = newNode(nIf, e, newNode(nIf, thenNode, elseNode))
 of tokPutc:
   token = lexer.next()
   result = newNode(nPrtc, parExpr(lexer, token))
   token.expect(tokSemi, "';' expected")
 of tokPrint:
   token = lexer.next()
   token.expect(tokLPar, "'(' expected")
   while true:
     var e: Node
     if token.kind == tokString:
       e = newNode(nPrts, Node(kind: nString, stringVal: token.stringVal))
       token = lexer.next()
     else:
       e = newNode(nPrti, expr(lexer, token, 0))
     result = newNode(nSequence, result, e)
     if token.kind == tokComma:
       token = lexer.next()
     else:
       break
   token.expect(tokRPar, "')' expected")
   token.expect(tokSemi, "';' expected")
 of tokSemi:
   token = lexer.next()
 of tokIdent:
   let v = Node(kind: nIdentifier, name: token.ident)
   token = lexer.next()
   token.expect(tokAssign, "'=' expected")
   result = newNode(nAssign, v, expr(lexer, token, 0))
   token.expect(tokSemi, "';' expected")
 of tokWhile:
   token = lexer.next()
   let e = parExpr(lexer, token)
   result = newNode(nWhile, e, stmt(lexer, token))
 of tokLBrace:
   token = lexer.next()
   while token.kind notin {tokRBrace, tokEnd}:
     result = newNode(nSequence, result, stmt(lexer, token))
   token.expect(tokRBrace, "'}' expected")
 of tokEnd:
   discard
 else:
   raise newException(SyntaxError, "Unexpected symbol at line " & $token.ln)
  1. ---------------------------------------------------------------------------------------------------

proc parse*(code: string): Node =

 ## Parse the code provided.
 var lexer = initLexer(code)
 var token = lexer.next()
 while true:
   result = newNode(nSequence, result, stmt(lexer, token))
   if token.kind == tokEnd:
     break
  1. ———————————————————————————————————————————————————————————————————————————————————————————————————

when isMainModule:

 import os, strformat, strutils
 proc printAst(node: Node) =
   ## Print tha AST in linear form.
   if node.isNil:
     echo ';'
   else:
     stdout.write &"{$node.kind:<14}"
     case node.kind
     of nIdentifier:
       echo node.name
     of nInteger:
       echo node.intVal
     of nString:
       # Need to escape and to replace hexadecimal \x0A by \n.
       echo escape(node.stringVal).replace("\\x0A", "\\n")
     else:
       echo ""
       node.left.printAst()
       node.right.printAst()


 let code = if paramCount() < 1: stdin.readAll() else: paramStr(1).readFile()
 let tree = parse(code)
 tree.printAst()</lang>
Output:

Prime number program AST.

Sequence      
Sequence      
Sequence      
Sequence      
Sequence      
;
Assign        
Identifier    count
Integer       1
Assign        
Identifier    n
Integer       1
Assign        
Identifier    limit
Integer       100
While         
Less          
Identifier    n
Identifier    limit
Sequence      
Sequence      
Sequence      
Sequence      
Sequence      
;
Assign        
Identifier    k
Integer       3
Assign        
Identifier    p
Integer       1
Assign        
Identifier    n
Add           
Identifier    n
Integer       2
While         
And           
LessEqual     
Multiply      
Identifier    k
Identifier    k
Identifier    n
Identifier    p
Sequence      
Sequence      
;
Assign        
Identifier    p
NotEqual      
Multiply      
Divide        
Identifier    n
Identifier    k
Identifier    k
Identifier    n
Assign        
Identifier    k
Add           
Identifier    k
Integer       2
If            
Identifier    p
If            
Sequence      
Sequence      
;
Sequence      
Sequence      
;
Prti          
Identifier    n
;
Prts          
String        " is prime\n"
;
Assign        
Identifier    count
Add           
Identifier    count
Integer       1
;
Sequence      
Sequence      
Sequence      
;
Prts          
String        "Total primes found: "
;
Prti          
Identifier    count
;
Prts          
String        "\n"
;

Perl

Tested on perl v5.26.1 <lang Perl>#!/usr/bin/perl

use strict; # parse.pl - inputs lex, outputs flattened ast use warnings; # http://www.rosettacode.org/wiki/Compiler/syntax_analyzer

my $h = qr/\G\s*\d+\s+\d+\s+/; # header of each line

sub error { die "*** Expected @_ at " . (/\G(.*\n)/ ?

 $1 =~ s/^\s*(\d+)\s+(\d+)\s+/line $1 character $2 got /r : "EOF\n") }

sub want { /$h \Q$_[1]\E.*\n/gcx ? shift : error "'$_[1]'" }

local $_ = join , <>; print want stmtlist(), 'End_of_input';

sub stmtlist

 {
 /(?=$h (RightBrace|End_of_input))/gcx and return ";\n";
 my ($stmt, $stmtlist) = (stmt(), stmtlist());
 $stmtlist eq ";\n" ? $stmt : "Sequence\n$stmt$stmtlist";
 }

sub stmt

 {
 /$h Semicolon\n/gcx ? ";\n" :
   /$h Identifier \s+ (\w+) \n/gcx ? want("Assign\nIdentifier\t$1\n",
     'Op_assign') . want expr(0), 'Semicolon' :
   /$h Keyword_while \n/gcx ? "While\n" . parenexp() . stmt() :
   /$h Keyword_if \n/gcx ?  "If\n" . parenexp() . "If\n" . stmt() .
     (/$h Keyword_else \n/gcx ? stmt() : ";\n") :
   /$h Keyword_print \n/gcx ? want(, 'LeftParen') .
     want want(printlist(), 'RightParen'), 'Semicolon' :
   /$h Keyword_putc \n/gcx ? want "Prtc\n" . parenexp() . ";\n", 'Semicolon' :
   /$h LeftBrace \n/gcx ? want stmtlist(), 'RightBrace' :
   error 'A STMT';
 }

sub parenexp { want(, 'LeftParen') . want expr(0), 'RightParen' } # (expr)

sub printlist

 {
 my $ast = /$h String \s+ (".*") \n/gcx ?
   "Prts\nString\t\t$1\n;\n" : "Prti\n" . expr(0) . ";\n";
 /$h Comma \n/gcx ? "Sequence\n$ast" . printlist() : $ast;
 }

sub expr # (sort of EBNF) expr = operand { operator expr }

 {
 my $ast =                                        # operand
   /$h Integer \s+ (\d+) \n/gcx ? "Integer\t\t$1\n" :
   /$h Identifier \s+ (\w+) \n/gcx ? "Identifier\t$1\n" :
   /$h LeftParen \n/gcx ? want expr(0), 'RightParen' :
   /$h Op_(negate|subtract) \n/gcx ? "Negate\n" . expr(8) . ";\n" :
   /$h Op_not \n/gcx ? "Not\n" . expr(8) . ";\n" :
   /$h Op_add \n/gcx ? expr(8) :
   error "A PRIMARY";
 $ast =                                           # { operator expr }
   $_[0] <= 7 && /$h Op_multiply \n/gcx ? "Multiply\n$ast" . expr(8) :
   $_[0] <= 7 && /$h Op_divide \n/gcx ? "Divide\n$ast" . expr(8) :
   $_[0] <= 7 && /$h Op_mod \n/gcx ? "Mod\n$ast" . expr(8) :
   $_[0] <= 6 && /$h Op_add \n/gcx ? "Add\n$ast" . expr(7) :
   $_[0] <= 6 && /$h Op_subtract \n/gcx ? "Subtract\n$ast" . expr(7) :
   $_[0] == 5 && /(?=$h Op_(less|greater)(equal)? \n)/gcx ? error 'NO ASSOC' :
   $_[0] <= 5 && /$h Op_lessequal \n/gcx ? "LessEqual\n$ast" . expr(5) :
   $_[0] <= 5 && /$h Op_less \n/gcx ? "Less\n$ast" . expr(5) :
   $_[0] <= 5 && /$h Op_greater \n/gcx ? "Greater\n$ast" . expr(5) :
   $_[0] <= 5 && /$h Op_greaterequal \n/gcx ?  "GreaterEqual\n$ast" . expr(5) :
   $_[0] == 3 && /(?=$h Op_(not)?equal \n)/gcx ? error 'NO ASSOC' :
   $_[0] <= 3 && /$h Op_equal \n/gcx ? "Equal\n$ast" . expr(3) :
   $_[0] <= 3 && /$h Op_notequal \n/gcx ? "NotEqual\n$ast" . expr(3) :
   $_[0] <= 1 && /$h Op_and \n/gcx ? "And\n$ast" . expr(2) :
   $_[0] <= 0 && /$h Op_or \n/gcx ? "Or\n$ast" . expr(1) :
   return $ast while 1;
 }</lang>
Output  —  Count AST:

Sequence
Assign
Identifier      count
Integer         1
While
Less
Identifier      count
Integer         10
Sequence
Sequence
Prts
String          "count is: "
;
Sequence
Prti
Identifier      count
;
Prts
String          "\n"
;
Assign
Identifier      count
Add
Identifier      count
Integer         1

Phix

Reusing lex.e (and core.e) from the Lexical Analyzer task, and again written as a reusable module.

--
-- demo\rosetta\Compiler\parse.e
-- =============================
--
--  The reusable part of parse.exw
--
with javascript_semantics
include lex.e

sequence tok

procedure errd(sequence msg, sequence args={})
    {tok_line,tok_col} = tok
    error(msg,args)
end procedure

global sequence toks
integer next_tok = 1

function get_tok()
    sequence tok = toks[next_tok]
    next_tok += 1
    return tok
end function

procedure expect(string msg, integer s)
    integer tk = tok[3]
    if tk!=s then
        errd("%s: Expecting '%s', found '%s'\n", {msg, tkNames[s], tkNames[tk]})
    end if
    tok = get_tok()
end procedure

function expr(integer p)
    object x = NULL, node
    integer op = tok[3] 

    switch op do
        case tk_LeftParen:
            tok = get_tok()
            x = expr(0)
            expect("expr",tk_RightParen)
        case tk_sub: 
        case tk_add:
            tok = get_tok()
            node = expr(precedences[tk_neg]);
            x = iff(op==tk_sub?{tk_neg, node, NULL}:node)
        case tk_not:
            tok = get_tok();
            x = {tk_not, expr(precedences[tk_not]), NULL}
        case tk_Identifier:
            x = {tk_Identifier, tok[4]}
            tok = get_tok();
        case tk_Integer:
            x = {tk_Integer, tok[4]}
            tok = get_tok();
        default:
            errd("Expecting a primary, found: %s\n", tkNames[op])
    end switch
 
    op = tok[3]
    while narys[op]=BINARY 
      and precedences[op]>=p do
        tok = get_tok()
        x = {op, x, expr(precedences[op]+1)}
        op = tok[3]
    end while
    return x;
end function

function paren_expr(string msg)
    expect(msg, tk_LeftParen);
    object t = expr(0)
    expect(msg, tk_RightParen);
    return t
end function

function stmt()
    object t = NULL, e, s
 
    switch tok[3] do
        case tk_if:
            tok = get_tok();
            object condition = paren_expr("If-cond");
            object ifblock = stmt();
            object elseblock = NULL;
            if tok[3] == tk_else then
                tok = get_tok();
                elseblock = stmt();
            end if
            t = {tk_if, condition, {tk_if, ifblock, elseblock}}
        case tk_putc:
            tok = get_tok();
            e = paren_expr("Prtc")
            t = {tk_putc, e, NULL}
            expect("Putc", tk_Semicolon);
        case tk_print:
            tok = get_tok();
            expect("Print",tk_LeftParen)
            while 1 do
                if tok[3] == tk_String then
                    e = {tk_Prints, {tk_String, tok[4]}, NULL}
                    tok = get_tok();
                else
                    e = {tk_Printi, expr(0), NULL}
                end if
                t = {tk_Sequence, t, e}
                if tok[3]!=tk_Comma then exit end if
                expect("Print", tk_Comma)
            end while
            expect("Print", tk_RightParen);
            expect("Print", tk_Semicolon);
        case tk_Semicolon:
            tok = get_tok();
        case tk_Identifier:
            object v
            v = {tk_Identifier, tok[4]}
            tok = get_tok();
            expect("assign", tk_assign);
            e = expr(0);
            t = {tk_assign, v, e}
            expect("assign", tk_Semicolon);
        case tk_while:
            tok = get_tok();
            e = paren_expr("while");
            s = stmt();
            t = {tk_while, e, s}
        case tk_LeftBrace:      /* {stmt} */
            expect("LeftBrace", tk_LeftBrace)
            while not find(tok[3],{tk_RightBrace,tk_EOI}) do
                t = {tk_Sequence, t, stmt()}
            end while
            expect("LeftBrace", tk_RightBrace);
            break;
        case tk_EOI:
            break;
        default: 
            errd("expecting start of statement, found '%s'\n", tkNames[tok[3]]);
    end switch
    return t
end function

global function parse()
    object t = NULL
    tok = get_tok()
    while 1 do
        object s = stmt()
        if s=NULL then exit end if
        t = {tk_Sequence, t, s}
    end while
    return t
end function

And a simple test driver for the specific task:

--
-- demo\rosetta\Compiler\parse.exw
-- ===============================
--
with javascript_semantics
include parse.e

procedure print_ast(object t)
    if t == NULL then
        printf(output_file,";\n")
    else
        integer ttype = t[1]
        printf(output_file,tkNames[ttype])
        if ttype=tk_Identifier then
            printf(output_file," %s\n",t[2])
        elsif ttype=tk_Integer then
            printf(output_file," %d\n",t[2])
        elsif ttype=tk_String then
            printf(output_file," %s\n",enquote(t[2]))
        else
            printf(output_file,"\n")
            print_ast(t[2])
            print_ast(t[3])
        end if
    end if
end procedure

function ptree(object t)
    if sequence(t) then
        integer t1 = t[1]
        t = deep_copy(t)
        t[1] = tkNames[t1]
        if not find(t1,{tk_Identifier,tk_String}) then
            for i=2 to length(t) do
                if t1=tk_Sequence and t[i]=NULL then
                    t[i] = "NULL"
                else
                    t[i] = ptree(t[i])
                end if
            end for
        end if
    end if
    return t
end function

procedure main(sequence cl)
    open_files(cl)
    toks = lex()
    object t = parse()
    print_ast(t)
pp(ptree(t),{pp_Nest,10,pp_Pause,0,pp_IntCh,false})
    close_files()
end procedure

--main(command_line())
main({0,0,"test3.c"}) -- not parseable!
--main({0,0,"primes.c"})    -- as Algol, C, Python (apart from spacing)
--main({0,0,"count.c"})     -- as AWK              (       ""         )
Output:
Line 5 column 40:
Print: Expecting 'LeftParen', found 'Op_subtract'

Python

Tested with Python 2.7 and 3.x <lang Python>from __future__ import print_function import sys, shlex, operator

tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr, \ tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print, \ tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident, \ tk_Integer, tk_String = range(31)

nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, \ nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, \ nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or = range(25)

  1. must have same order as above

Tokens = [

   ["EOI"             , False, False, False, -1, -1        ],
   ["*"               , False, True,  False, 13, nd_Mul    ],
   ["/"               , False, True,  False, 13, nd_Div    ],
   ["%"               , False, True,  False, 13, nd_Mod    ],
   ["+"               , False, True,  False, 12, nd_Add    ],
   ["-"               , False, True,  False, 12, nd_Sub    ],
   ["-"               , False, False, True,  14, nd_Negate ],
   ["!"               , False, False, True,  14, nd_Not    ],
   ["<"               , False, True,  False, 10, nd_Lss    ],
   ["<="              , False, True,  False, 10, nd_Leq    ],
   [">"               , False, True,  False, 10, nd_Gtr    ],
   [">="              , False, True,  False, 10, nd_Geq    ],
   ["=="              , False, True,  False,  9, nd_Eql    ],
   ["!="              , False, True,  False,  9, nd_Neq    ],
   ["="               , False, False, False, -1, nd_Assign ],
   ["&&"              , False, True,  False,  5, nd_And    ],
   ["||"              , False, True,  False,  4, nd_Or     ],
   ["if"              , False, False, False, -1, nd_If     ],
   ["else"            , False, False, False, -1, -1        ],
   ["while"           , False, False, False, -1, nd_While  ],
   ["print"           , False, False, False, -1, -1        ],
   ["putc"            , False, False, False, -1, -1        ],
   ["("               , False, False, False, -1, -1        ],
   [")"               , False, False, False, -1, -1        ],
   ["{"               , False, False, False, -1, -1        ],
   ["}"               , False, False, False, -1, -1        ],
   [";"               , False, False, False, -1, -1        ],
   [","               , False, False, False, -1, -1        ],
   ["Ident"           , False, False, False, -1, nd_Ident  ],
   ["Integer literal" , False, False, False, -1, nd_Integer],
   ["String literal"  , False, False, False, -1, nd_String ]
   ]

all_syms = {"End_of_input"  : tk_EOI, "Op_multiply"  : tk_Mul,

           "Op_divide"      : tk_Div,     "Op_mod"         : tk_Mod,
           "Op_add"         : tk_Add,     "Op_subtract"    : tk_Sub,
           "Op_negate"      : tk_Negate,  "Op_not"         : tk_Not,
           "Op_less"        : tk_Lss,     "Op_lessequal"   : tk_Leq,
           "Op_greater"     : tk_Gtr,     "Op_greaterequal": tk_Geq,
           "Op_equal"       : tk_Eql,     "Op_notequal"    : tk_Neq,
           "Op_assign"      : tk_Assign,  "Op_and"         : tk_And,
           "Op_or"          : tk_Or,      "Keyword_if"     : tk_If,
           "Keyword_else"   : tk_Else,    "Keyword_while"  : tk_While,
           "Keyword_print"  : tk_Print,   "Keyword_putc"   : tk_Putc,
           "LeftParen"      : tk_Lparen,  "RightParen"     : tk_Rparen,
           "LeftBrace"      : tk_Lbrace,  "RightBrace"     : tk_Rbrace,
           "Semicolon"      : tk_Semi,    "Comma"          : tk_Comma,
           "Identifier"     : tk_Ident,   "Integer"        : tk_Integer,
           "String"         : tk_String}

Display_nodes = ["Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts",

   "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add",
   "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual",
   "And", "Or"]

TK_NAME = 0 TK_RIGHT_ASSOC = 1 TK_IS_BINARY = 2 TK_IS_UNARY = 3 TK_PRECEDENCE = 4 TK_NODE = 5

input_file = None err_line = None err_col = None tok = None tok_text = None

        • show error and exit

def error(msg):

   print("(%d, %d) %s" % (int(err_line), int(err_col), msg))
   exit(1)

def gettok():

   global err_line, err_col, tok, tok_text, tok_other
   line = input_file.readline()
   if len(line) == 0:
       error("empty line")
   line_list = shlex.split(line, False, False)
   # line col Ident var_name
   # 0    1   2     3
   err_line = line_list[0]
   err_col  = line_list[1]
   tok_text = line_list[2]
   tok = all_syms.get(tok_text)
   if tok == None:
       error("Unknown token %s" % (tok_text))
   tok_other = None
   if tok in [tk_Integer, tk_Ident, tk_String]:
       tok_other = line_list[3]

class Node:

   def __init__(self, node_type, left = None, right = None, value = None):
       self.node_type  = node_type
       self.left  = left
       self.right = right
       self.value = value

def make_node(oper, left, right = None):

   return Node(oper, left, right)

def make_leaf(oper, n):

   return Node(oper, value = n)

def expect(msg, s):

   if tok == s:
       gettok()
       return
   error("%s: Expecting '%s', found '%s'" % (msg, Tokens[s][TK_NAME], Tokens[tok][TK_NAME]))

def expr(p):

   x = None
   if tok == tk_Lparen:
       x = paren_expr()
   elif tok in [tk_Sub, tk_Add]:
       op = (tk_Negate if tok == tk_Sub else tk_Add)
       gettok()
       node = expr(Tokens[tk_Negate][TK_PRECEDENCE])
       x = (make_node(nd_Negate, node) if op == tk_Negate else node)
   elif tok == tk_Not:
       gettok()
       x = make_node(nd_Not, expr(Tokens[tk_Not][TK_PRECEDENCE]))
   elif tok == tk_Ident:
       x = make_leaf(nd_Ident, tok_other)
       gettok()
   elif tok == tk_Integer:
       x = make_leaf(nd_Integer, tok_other)
       gettok()
   else:
       error("Expecting a primary, found: %s" % (Tokens[tok][TK_NAME]))
   while Tokens[tok][TK_IS_BINARY] and Tokens[tok][TK_PRECEDENCE] >= p:
       op = tok
       gettok()
       q = Tokens[op][TK_PRECEDENCE]
       if not Tokens[op][TK_RIGHT_ASSOC]:
           q += 1
       node = expr(q)
       x = make_node(Tokens[op][TK_NODE], x, node)
   return x

def paren_expr():

   expect("paren_expr", tk_Lparen)
   node = expr(0)
   expect("paren_expr", tk_Rparen)
   return node

def stmt():

   t = None
   if tok == tk_If:
       gettok()
       e = paren_expr()
       s = stmt()
       s2 = None
       if tok == tk_Else:
           gettok()
           s2 = stmt()
       t = make_node(nd_If, e, make_node(nd_If, s, s2))
   elif tok == tk_Putc:
       gettok()
       e = paren_expr()
       t = make_node(nd_Prtc, e)
       expect("Putc", tk_Semi)
   elif tok == tk_Print:
       gettok()
       expect("Print", tk_Lparen)
       while True:
           if tok == tk_String:
               e = make_node(nd_Prts, make_leaf(nd_String, tok_other))
               gettok()
           else:
               e = make_node(nd_Prti, expr(0))
           t = make_node(nd_Sequence, t, e)
           if tok != tk_Comma:
               break
           gettok()
       expect("Print", tk_Rparen)
       expect("Print", tk_Semi)
   elif tok == tk_Semi:
       gettok()
   elif tok == tk_Ident:
       v = make_leaf(nd_Ident, tok_other)
       gettok()
       expect("assign", tk_Assign)
       e = expr(0)
       t = make_node(nd_Assign, v, e)
       expect("assign", tk_Semi)
   elif tok == tk_While:
       gettok()
       e = paren_expr()
       s = stmt()
       t = make_node(nd_While, e, s)
   elif tok == tk_Lbrace:
       gettok()
       while tok != tk_Rbrace and tok != tk_EOI:
           t = make_node(nd_Sequence, t, stmt())
       expect("Lbrace", tk_Rbrace)
   elif tok == tk_EOI:
       pass
   else:
       error("Expecting start of statement, found: %s" % (Tokens[tok][TK_NAME]))
   return t

def parse():

   t = None
   gettok()
   while True:
       t = make_node(nd_Sequence, t, stmt())
       if tok == tk_EOI or t == None:
           break
   return t

def prt_ast(t):

   if t == None:
       print(";")
   else:
       print("%-14s" % (Display_nodes[t.node_type]), end=)
       if t.node_type in [nd_Ident, nd_Integer]:
           print("%s" % (t.value))
       elif t.node_type == nd_String:
           print("%s" %(t.value))
       else:
           print("")
           prt_ast(t.left)
           prt_ast(t.right)
        • main driver

input_file = sys.stdin if len(sys.argv) > 1:

   try:
       input_file = open(sys.argv[1], "r", 4096)
   except IOError as e:
       error(0, 0, "Can't open %s" % sys.argv[1])

t = parse() prt_ast(t)</lang>

Output  —  prime numbers AST:

Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
Assign
Identifier    n
Integer       1
Assign
Identifier    limit
Integer       100
While
Less
Identifier    n
Identifier    limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    k
Integer       3
Assign
Identifier    p
Integer       1
Assign
Identifier    n
Add
Identifier    n
Integer       2
While
And
LessEqual
Multiply
Identifier    k
Identifier    k
Identifier    n
Identifier    p
Sequence
Sequence
;
Assign
Identifier    p
NotEqual
Multiply
Divide
Identifier    n
Identifier    k
Identifier    k
Identifier    n
Assign
Identifier    k
Add
Identifier    k
Integer       2
If
Identifier    p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier    n
;
Prts
String        " is prime\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
;
Sequence
Sequence
Sequence
;
Prts
String        "Total primes found: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;

Scala

The complete implementation for the compiler tasks can be found in a GitHub repository at github.com/edadma/rosettacodeCompiler which includes full unit testing for the samples given in Compiler/Sample programs.

The following code implements a configurable (from a symbol map provided as a parameter) Precedence Climbing parser for the output of the lexer. The recursive descent language parser is closely based on the pseudo code given in the task description.

<lang scala> package xyz.hyperreal.rosettacodeCompiler

import scala.io.Source

object SyntaxAnalyzer {

 val symbols =
   Map[String, (PrefixOperator, InfixOperator)](
     "Op_or"           -> (null, InfixOperator(10, LeftAssoc, BranchNode("Or", _, _))),
     "Op_and"          -> (null, InfixOperator(20, LeftAssoc, BranchNode("And", _, _))),
     "Op_equal"        -> (null, InfixOperator(30, LeftAssoc, BranchNode("Equal", _, _))),
     "Op_notequal"     -> (null, InfixOperator(30, LeftAssoc, BranchNode("NotEqual", _, _))),
     "Op_less"         -> (null, InfixOperator(40, LeftAssoc, BranchNode("Less", _, _))),
     "Op_lessequal"    -> (null, InfixOperator(40, LeftAssoc, BranchNode("LessEqual", _, _))),
     "Op_greater"      -> (null, InfixOperator(40, LeftAssoc, BranchNode("Greater", _, _))),
     "Op_greaterequal" -> (null, InfixOperator(40, LeftAssoc, BranchNode("GreaterEqual", _, _))),
     "Op_add"          -> (PrefixOperator(30, identity), InfixOperator(50, LeftAssoc, BranchNode("Add", _, _))),
     "Op_minus" -> (PrefixOperator(70, BranchNode("Negate", _, TerminalNode)), InfixOperator(
       50,
       LeftAssoc,
       BranchNode("Subtract", _, _))),
     "Op_multiply" -> (null, InfixOperator(60, LeftAssoc, BranchNode("Multiply", _, _))),
     "Op_divide"   -> (null, InfixOperator(60, LeftAssoc, BranchNode("Divide", _, _))),
     "Op_mod"      -> (null, InfixOperator(60, RightAssoc, BranchNode("Mod", _, _))),
     "Op_not"      -> (PrefixOperator(70, BranchNode("Not", _)), null),
     "LeftParen"   -> null,
     "RightParen"  -> null
   )
 def apply = new SyntaxAnalyzer(symbols)
 abstract class Node
 case class LeafNode(name: String, value: String)                            extends Node
 case class BranchNode(name: String, left: Node, right: Node = TerminalNode) extends Node
 case object TerminalNode                                                    extends Node
 abstract class Assoc
 case object LeftAssoc  extends Assoc
 case object RightAssoc extends Assoc
 abstract class Operator
 case class InfixOperator(prec: Int, assoc: Assoc, compute: (Node, Node) => Node) extends Operator
 case class PrefixOperator(prec: Int, compute: Node => Node)                      extends Operator

}

class SyntaxAnalyzer(symbols: Map[String, (SyntaxAnalyzer.PrefixOperator, SyntaxAnalyzer.InfixOperator)]) {

 import SyntaxAnalyzer.{BranchNode, InfixOperator, LeafNode, LeftAssoc, Node, PrefixOperator, TerminalNode}
 def fromStdin = fromSource(Source.stdin)
 def fromString(src: String) = fromSource(Source.fromString(src))
 def fromSource(s: Source) = {
   val tokens = ((s.getLines map (_.trim.split(" +", 4)) map {
     case Array(line, col, name) =>
       symbols get name match {
         case None | Some(null) => SimpleToken(line.toInt, col.toInt, name)
         case Some(operators)   => OperatorToken(line.toInt, col.toInt, name, operators)
       }
     case Array(line, col, name, value) => ValueToken(line.toInt, col.toInt, name, value)
   }) toStream)
   flatten(parse(tokens))
 }
 def flatten(n: Node): Unit =
   n match {
     case TerminalNode          => println(";")
     case LeafNode(name, value) => println(s"$name $value")
     case BranchNode(name, left, right) =>
       println(name)
       flatten(left)
       flatten(right)
   }
 def parse(toks: Stream[Token]) = {
   var cur = toks
   def next = cur = cur.tail
   def token = cur.head
   def consume = {
     val res = token
     next
     res
   }
   def accept(name: String) =
     if (token.name == name) {
       next
       true
     } else
       false
   def expect(name: String, error: String = null) =
     if (token.name != name)
       sys.error(if (error eq null) s"expected $name, found ${token.name}" else s"$error: $token")
     else
       next
   def expression(minPrec: Int): Node = {
     def infixOperator = token.asInstanceOf[OperatorToken].operators._2
     def isInfix = token.isInstanceOf[OperatorToken] && infixOperator != null
     var result =
       consume match {
         case SimpleToken(_, _, "LeftParen") =>
           val result = expression(0)
           expect("RightParen", "expected closing parenthesis")
           result
         case ValueToken(_, _, name, value)                         => LeafNode(name, value)
         case OperatorToken(_, _, _, (prefix, _)) if prefix ne null => prefix.compute(expression(prefix.prec))
         case OperatorToken(_, _, _, (_, infix)) if infix ne null =>
           sys.error(s"expected a primitive expression, not an infix operator: $token")
       }
     while (isInfix && infixOperator.prec >= minPrec) {
       val InfixOperator(prec, assoc, compute) = infixOperator
       val nextMinPrec                         = if (assoc == LeftAssoc) prec + 1 else prec
       next
       result = compute(result, expression(nextMinPrec))
     }
     result
   }
   def parenExpression = {
     expect("LeftParen")
     val e = expression(0)
     expect("RightParen")
     e
   }
   def statement: Node = {
     var stmt: Node = TerminalNode
     if (accept("Keyword_if"))
       stmt = BranchNode("If",
                         parenExpression,
                         BranchNode("If", statement, if (accept("Keyword_else")) statement else TerminalNode))
     else if (accept("Keyword_putc")) {
       stmt = BranchNode("Prtc", parenExpression)
       expect("Semicolon")
     } else if (accept("Keyword_print")) {
       expect("LeftParen")
       do {
         val e =
           if (token.name == "String")
             BranchNode("Prts", LeafNode("String", consume.asInstanceOf[ValueToken].value))
           else
             BranchNode("Prti", expression(0))
         stmt = BranchNode("Sequence", stmt, e)
       } while (accept("Comma"))
       expect("RightParen")
       expect("Semicolon")
     } else if (token.name == "Semicolon")
       next
     else if (token.name == "Identifier") {
       val ident = LeafNode("Identifier", consume.asInstanceOf[ValueToken].value)
       expect("Op_assign")
       stmt = BranchNode("Assign", ident, expression(0))
       expect("Semicolon")
     } else if (accept("Keyword_while"))
       stmt = BranchNode("While", parenExpression, statement)
     else if (accept("LeftBrace")) {
       while (token.name != "RightBrace" && token.name != "End_of_input") {
         stmt = BranchNode("Sequence", stmt, statement)
       }
       expect("RightBrace")
     } else
       sys.error(s"syntax error: $token")
     stmt
   }
   var tree: Node = TerminalNode
   do {
     tree = BranchNode("Sequence", tree, statement)
   } while (token.name != "End_of_input")
   expect("End_of_input")
   tree
 }
 abstract class Token {
   val line: Int;
   val col: Int;
   val name: String
 }
 case class SimpleToken(line: Int, col: Int, name: String)                                               extends Token
 case class ValueToken(line: Int, col: Int, name: String, value: String)                                 extends Token
 case class OperatorToken(line: Int, col: Int, name: String, operators: (PrefixOperator, InfixOperator)) extends Token

} </lang>

Scheme

Code implements a recursive descent parser based on the given grammar. Tested against all programs in Compiler/Sample programs.

<lang scheme> (import (scheme base)

       (scheme process-context)
       (scheme write))

(define *names* (list (cons 'Op_add 'Add)

                     (cons 'Op_subtract 'Subtract)
                     (cons 'Op_multiply 'Multiply)
                     (cons 'Op_divide 'Divide)
                     (cons 'Op_mod 'Mod)
                     (cons 'Op_not 'Not)
                     (cons 'Op_equal 'Equal)
                     (cons 'Op_notequal 'NotEqual)
                     (cons 'Op_or 'Or)
                     (cons 'Op_and 'And)
                     (cons 'Op_less 'Less)
                     (cons 'Op_lessequal 'LessEqual)
                     (cons 'Op_greater 'Greater)
                     (cons 'Op_greaterequal 'GreaterEqual)))

(define (retrieve-name type)

 (let ((res (assq type *names*)))
   (if res
     (cdr res)
     (error "Unknown type name"))))
takes a vector of tokens

(define (parse tokens) ; read statements, until hit end of tokens

 (define posn 0)
 (define (peek-token)
   (vector-ref tokens posn))
 (define (get-token)
   (set! posn (+ 1 posn))
   (vector-ref tokens (- posn 1)))
 (define (match type)
   (if (eq? (car (vector-ref tokens posn)) type)
     (set! posn (+ 1 posn))
     (error "Could not match token type" type)))
 ; make it easier to read token parts
 (define type car)
 (define value cadr)
 ;
 ;; left associative read of one or more items with given separators
 (define (read-one-or-more reader separators)
   (let loop ((lft (reader)))
     (let ((next (peek-token)))
       (if (memq (type next) separators)
         (begin (match (type next))
                (loop (list (retrieve-name (type next)) lft (reader))))
         lft))))
 ;
 ;; read one or two items with given separator
 (define (read-one-or-two reader separators)
   (let* ((lft (reader))
          (next (peek-token)))
     (if (memq (type next) separators)
       (begin (match (type next))
              (list (retrieve-name (type next)) lft (reader)))
       lft)))
 ;
 (define (read-primary)
   (let ((next (get-token)))
     (case (type next)
       ((Identifier Integer)
        next)
       ((LeftParen)
        (let ((v (read-expr)))
          (match 'RightParen)
          v))
       ((Op_add) ; + sign is ignored 
        (read-primary))
       ((Op_not)
        (list 'Not (read-primary) '()))
       ((Op_subtract)
        (list 'Negate (read-primary) '()))
       (else
         (error "Unknown primary type")))))
 ;
 (define (read-multiplication-expr) ; *
   (read-one-or-more read-primary '(Op_multiply Op_divide Op_mod)))
 ;
 (define (read-addition-expr) ; *
   (read-one-or-more read-multiplication-expr '(Op_add Op_subtract)))
 ;
 (define (read-relational-expr) ; ?
   (read-one-or-two read-addition-expr 
                    '(Op_less Op_lessequal Op_greater Op_greaterequal)))
 ;
 (define (read-equality-expr) ; ?
   (read-one-or-two read-relational-expr '(Op_equal Op_notequal)))
 ;
 (define (read-and-expr) ; *
   (read-one-or-more read-equality-expr '(Op_and)))
 ;
 (define (read-expr) ; *
   (read-one-or-more read-and-expr '(Op_or)))
 ;
 (define (read-prt-list)
   (define (read-print-part)
     (if (eq? (type (peek-token)) 'String)
       (list 'Prts (get-token) '())
       (list 'Prti (read-expr) '())))
   ;
   (do ((tok (read-print-part) (read-print-part))
        (rec '() (list 'Sequence rec tok)))
     ((not (eq? (type (peek-token)) 'Comma))
      (list 'Sequence rec tok))
     (match 'Comma)))
 ;
 (define (read-paren-expr)
   (match 'LeftParen)
   (let ((v (read-expr)))
     (match 'RightParen)
     v))
 ;
 (define (read-stmt)
   (case (type (peek-token))
     ((SemiColon)
      '())
     ((Identifier)
      (let ((id (get-token)))
        (match 'Op_assign)
        (let ((ex (read-expr)))
          (match 'Semicolon)
          (list 'Assign id ex))))
     ((Keyword_while)
      (match 'Keyword_while)
      (let* ((expr (read-paren-expr))
             (stmt (read-stmt)))
        (list 'While expr stmt)))
     ((Keyword_if)
      (match 'Keyword_if)
      (let* ((expr (read-paren-expr))
             (then-part (read-stmt))
             (else-part (if (eq? (type (peek-token)) 'Keyword_else)
                          (begin (match 'Keyword_else)
                                 (read-stmt))
                          '())))
        (list 'If expr (list 'If then-part else-part))))
     ((Keyword_print)
      (match 'Keyword_print)
      (match 'LeftParen)
      (let ((v (read-prt-list)))
        (match 'RightParen)
        (match 'Semicolon)
        v))
     ((Keyword_putc)
      (match 'Keyword_putc)
      (let ((v (read-paren-expr)))
        (match 'Semicolon)
        (list 'Putc v '())))
     ((LeftBrace) 
      (match 'LeftBrace)
      (let ((v (read-stmts)))
        (match 'RightBrace)
        v))
     (else
       (error "Unknown token type for statement" (type (peek-token))))))
 ;
 (define (read-stmts)
   (do ((sequence (list 'Sequence '() (read-stmt)) 
                  (list 'Sequence sequence (read-stmt))))
     ((memq (type (peek-token)) '(End_of_input RightBrace))
      sequence)))
 ;
 (let ((res (read-stmts)))
   (match 'End_of_input)
   res))
reads tokens from file, parses and returns the AST

(define (parse-file filename)

 (define (tokenise line)
   (let ((port (open-input-string line)))
     (read port) ; discard line
     (read port) ; discard col
     (let* ((type (read port)) ; read type as symbol
            (val (read port))) ; check for optional value
       (if (eof-object? val)
         (list type)
         (list type val)))))
 ;
 (with-input-from-file 
   filename
   (lambda () 
     (do ((line (read-line) (read-line))
          (toks '() (cons (tokenise line) toks)))
       ((eof-object? line) 
        (parse (list->vector (reverse toks))))))))
Output the AST in flattened format

(define (display-ast ast)

 (cond ((null? ast)
        (display ";\n"))
       ((= 2 (length ast))
        (display (car ast))
        (display #\tab)
        (write (cadr ast)) ; use write to preserve " " on String
        (newline))
       (else
         (display (car ast)) (newline)
         (display-ast (cadr ast))
         (display-ast (cadr (cdr ast))))))
read from filename passed on command line

(if (= 2 (length (command-line)))

 (display-ast (parse-file (cadr (command-line))))
 (display "Error: provide program filename\n"))

</lang>

Wren

Translation of: Go
Library: Wren-dynamic
Library: Wren-fmt
Library: wren-ioutil

<lang ecmascript>import "/dynamic" for Enum, Struct, Tuple import "/fmt" for Fmt import "/ioutil" for FileUtil

var tokens = [

   "EOI",
   "Mul",
   "Div",
   "Mod",
   "Add",
   "Sub",
   "Negate",
   "Not",
   "Lss",
   "Leq",
   "Gtr",
   "Geq",
   "Eql",
   "Neq",
   "Assign",
   "And",
   "Or",
   "If",
   "Else",
   "While",
   "Print",
   "Putc",
   "Lparen",
   "Rparen",
   "Lbrace",
   "Rbrace",
   "Semi",
   "Comma",
   "Ident",
   "Integer",
   "String"

]

var Token = Enum.create("Token", tokens)

var nodes = [

   "Ident",
   "String",
   "Integer",
   "Sequence",
   "If",
   "Prtc",
   "Prts",
   "Prti",
   "While",
   "Assign",
   "Negate",
   "Not",
   "Mul",
   "Div",
   "Mod",
   "Add",
   "Sub",
   "Lss",
   "Leq",
   "Gtr",
   "Geq",
   "Eql",
   "Neq",
   "And",
   "Or"

]

var Node = Enum.create("Node", nodes)

// 'text' field represents ident ot string literal or integer value var TokS = Struct.create("TokS", ["tok", "errLn", "errCol", "text"])

var Tree = Struct.create("Tree", ["nodeType", "left", "right", "value"])

// dependency: Ordered by tok, must remain in same order as Token enum constants var Atr = Tuple.create("Atr", ["text", "enumText", "tok", "rightAssociative", "isBinary",

                              "isUnary", "precedence", "nodeType"])

var atrs = [

   Atr.new("EOI", "End_of_input", Token.EOI, false, false, false, -1, -1),
   Atr.new("*", "Op_multiply", Token.Mul, false, true, false, 13, Node.Mul),
   Atr.new("/", "Op_divide", Token.Div, false, true, false, 13, Node.Div),
   Atr.new("\%", "Op_mod", Token.Mod, false, true, false, 13, Node.Mod),
   Atr.new("+", "Op_add", Token.Add, false, true, false, 12, Node.Add),
   Atr.new("-", "Op_subtract", Token.Sub, false, true, false, 12, Node.Sub),
   Atr.new("-", "Op_negate", Token.Negate, false, false, true, 14, Node.Negate),
   Atr.new("!", "Op_not", Token.Not, false, false, true, 14, Node.Not),
   Atr.new("<", "Op_less", Token.Lss, false, true, false, 10, Node.Lss),
   Atr.new("<=", "Op_lessequal", Token.Leq, false, true, false, 10, Node.Leq),
   Atr.new(">", "Op_greater", Token.Gtr, false, true, false, 10, Node.Gtr),
   Atr.new(">=", "Op_greaterequal", Token.Geq, false, true, false, 10, Node.Geq),
   Atr.new("==", "Op_equal", Token.Eql, false, true, false, 9, Node.Eql),
   Atr.new("!=", "Op_notequal", Token.Neq, false, true, false, 9, Node.Neq),
   Atr.new("=", "Op_assign", Token.Assign, false, false, false, -1, Node.Assign),
   Atr.new("&&", "Op_and", Token.And, false, true, false, 5, Node.And),
   Atr.new("||", "Op_or", Token.Or, false, true, false, 4, Node.Or),
   Atr.new("if", "Keyword_if", Token.If, false, false, false, -1, Node.If),
   Atr.new("else", "Keyword_else", Token.Else, false, false, false, -1, -1),
   Atr.new("while", "Keyword_while", Token.While, false, false, false, -1, Node.While),
   Atr.new("print", "Keyword_print", Token.Print, false, false, false, -1, -1),
   Atr.new("putc", "Keyword_putc", Token.Putc, false, false, false, -1, -1),
   Atr.new("(", "LeftParen", Token.Lparen, false, false, false, -1, -1),
   Atr.new(")", "RightParen", Token.Rparen, false, false, false, -1, -1),
   Atr.new("{", "LeftBrace", Token.Lbrace, false, false, false, -1, -1),
   Atr.new("}", "RightBrace", Token.Rbrace, false, false, false, -1, -1),
   Atr.new(";", "Semicolon", Token.Semi, false, false, false, -1, -1),
   Atr.new(",", "Comma", Token.Comma, false, false, false, -1, -1),
   Atr.new("Ident", "Identifier", Token.Ident, false, false, false, -1, Node.Ident),
   Atr.new("Integer literal", "Integer", Token.Integer, false, false, false, -1, Node.Integer),
   Atr.new("String literal", "String", Token.String, false, false, false, -1, Node.String),

]

var displayNodes = [

   "Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts", "Prti",
   "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add",
   "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal",
   "NotEqual", "And", "Or"

]

var token = TokS.new(0, 0, 0, "")

var reportError = Fn.new { |eline, ecol, msg| Fiber.abort("(%(eline):%(ecol)) error : %(msg)") }

// return internal version of name var getEnum = Fn.new { |name|

   for (atr in atrs) {
       if (atr.enumText == name) return atr.tok
   }
   reportError.call(0, 0, "Unknown token %(name)")

}

var lines = [] var lineCount = 0 var lineNum = 0

var getTok = Fn.new {

   var tok = TokS.new(0, 0, 0, "")
   if (lineNum < lineCount) {
       var line = lines[lineNum].trimEnd(" \t")
       lineNum = lineNum + 1
       var fields = line.split(" ").where { |s| s != "" }.toList
       // [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional
       tok.errLn = Num.fromString(fields[0])
       tok.errCol = Num.fromString(fields[1])
       tok.tok = getEnum.call(fields[2])
       var le = fields.count
       if (le == 4) {
           tok.text = fields[3]
       } else if (le > 4) {
           var idx = line.indexOf("\"")
           tok.text = line[idx..-1]
       }
   }
   return tok

}

var makeNode = Fn.new { |nodeType, left, right| Tree.new(nodeType, left, right, "") }

var makeLeaf = Fn.new { |nodeType, value| Tree.new(nodeType, null, null, value) }

var expect = Fn.new { |msg, s|

   if (token.tok == s) {
       token = getTok.call()
       return
   }
   reportError.call(token.errLn, token.errCol,
       Fmt.swrite("$s: Expecting '$s', found '$s'", msg, atrs[s].text, atrs[token.tok].text))

}

var parenExpr // forward reference

var expr // recursive function expr = Fn.new { |p|

   var x
   var node
   var t = token.tok
   if (t == Token.Lparen) {
       x = parenExpr.call()
   } else if (t == Token.Sub || t == Token.Add) {
       var op = t
       token = getTok.call()
       node = expr.call(atrs[Token.Negate].precedence)
       if (op == Token.Sub) {
           x = makeNode.call(Node.negate, node, null)
       } else {
           x = node
       }
   } else if (t == Token.Not) {
       token = getTok.call()
       x = makeNode.call(Node.Not, expr.call(atrs[Token.Not].precedence), null)
   } else if (t == Token.Ident) {
       x = makeLeaf.call(Node.Ident, token.text)
       token = getTok.call()
   } else if (t == Token.Integer) {
       x = makeLeaf.call(Node.Integer, token.text)
       token = getTok.call()
   } else {
       reportError.call(token.errLn, token.errCol,
           Fmt.swrite("Expecting a primary, found: $s", atrs[token.tok].text))
   }
   while (atrs[token.tok].isBinary && atrs[token.tok].precedence >= p) {
       var op = token.tok
       token = getTok.call()
       var q = atrs[op].precedence
       if (!atrs[op].rightAssociative) q = q + 1
       node = expr.call(q)
       x = makeNode.call(atrs[op].nodeType, x, node)
   }
   return x

}

parenExpr = Fn.new {

   expect.call("parenExpr", Token.Lparen)
   var t = expr.call(0)
   expect.call("parenExpr", Token.Rparen)
   return t

}

var stmt // recursive function stmt = Fn.new {

   var t
   var v
   var e
   var s
   var s2
   var tt = token.tok
   if (tt == Token.If) {
       token = getTok.call()
       e = parenExpr.call()
       s = stmt.call()
       s2 = null
       if (token.tok == Token.Else) {
           token = getTok.call()
           s2 = stmt.call()
       }
       t = makeNode.call(Node.If, e, makeNode.call(Node.If, s, s2))
   } else if (tt == Token.Putc) {
       token = getTok.call()
       e = parenExpr.call()
       t = makeNode.call(Node.Prtc, e, null)
       expect.call("Putc", Token.Semi)
   } else if (tt == Token.Print) { // print '(' expr {',' expr} ')'
       token = getTok.call()
       expect.call("Print", Token.Lparen)
       while (true) {
           if (token.tok == Token.String) {
               e = makeNode.call(Node.Prts, makeLeaf.call(Node.String, token.text), null)
               token = getTok.call()
           } else {
               e = makeNode.call(Node.Prti, expr.call(0), null)
           }
           t = makeNode.call(Node.Sequence, t, e)
           if (token.tok != Token.Comma) break
           expect.call("Print", Token.Comma)
       }
       expect.call("Print", Token.Rparen)
       expect.call("Print", Token.Semi)
   } else if (tt == Token.Semi) {
       token = getTok.call()
   } else if (tt == Token.Ident) {
       v = makeLeaf.call(Node.Ident, token.text)
       token = getTok.call()
       expect.call("assign", Token.Assign)
       e = expr.call(0)
       t = makeNode.call(Node.Assign, v, e)
       expect.call("assign", Token.Semi)
   } else if (tt == Token.While) {
       token = getTok.call()
       e = parenExpr.call()
       s = stmt.call()
       t = makeNode.call(Node.While, e, s)
   } else if (tt == Token.Lbrace) { // {stmt}
       expect.call("Lbrace", Token.Lbrace)
       while (token.tok != Token.Rbrace && token.tok != Token.EOI) {
           t = makeNode.call(Node.Sequence, t, stmt.call())
       }
       expect.call("Lbrace", Token.Rbrace)
   } else if (tt == Token.EOI) {
       // do nothing
   } else {
       reportError.call(token.errLn, token.errCol,
           Fmt.Swrite("expecting start of statement, found '$s'", atrs[token.tok].text))
   }
   return t

}

var parse = Fn.new {

   var t
   token = getTok.call()
   while (true) {
       t = makeNode.call(Node.Sequence, t, stmt.call())
       if (!t || token.tok == Token.EOI) break
   }
   return t

}

var prtAst // recursive function prtAst = Fn.new { |t|

   if (!t) {
       System.print(";")
   } else {
       Fmt.write("$-14s ", displayNodes[t.nodeType])
       if (t.nodeType == Node.Ident || t.nodeType == Node.Integer || t.nodeType == Node.String) {
           System.print(t.value)
       } else {
           System.print()
           prtAst.call(t.left)
           prtAst.call(t.right)
       }
   }

}

lines = FileUtil.readLines("source.txt") lineCount = lines.count prtAst.call(parse.call())</lang>

Output:
Sequence       
Sequence       
Sequence       
Sequence       
Sequence       
;
Assign         
Identifier     count
Integer        1
Assign         
Identifier     n
Integer        1
Assign         
Identifier     limit
Integer        100
While          
Less           
Identifier     n
Identifier     limit
Sequence       
Sequence       
Sequence       
Sequence       
Sequence       
;
Assign         
Identifier     k
Integer        3
Assign         
Identifier     p
Integer        1
Assign         
Identifier     n
Add            
Identifier     n
Integer        2
While          
And            
LessEqual      
Multiply       
Identifier     k
Identifier     k
Identifier     n
Identifier     p
Sequence       
Sequence       
;
Assign         
Identifier     p
NotEqual       
Multiply       
Divide         
Identifier     n
Identifier     k
Identifier     k
Identifier     n
Assign         
Identifier     k
Add            
Identifier     k
Integer        2
If             
Identifier     p
If             
Sequence       
Sequence       
;
Sequence       
Sequence       
;
Prti           
Identifier     n
;
Prts           
String         " is prime\n"
;
Assign         
Identifier     count
Add            
Identifier     count
Integer        1
;
Sequence       
Sequence       
Sequence       
;
Prts           
String         "Total primes found: "
;
Prti           
Identifier     count
;
Prts           
String         "\n"
;

Zig

<lang zig> const std = @import("std");

pub const NodeValue = union(enum) {

   integer: i32,
   string: []const u8,
   fn fromToken(token: Token) ?NodeValue {
       if (token.value) |value| {
           switch (value) {
               .integer => |int| return NodeValue{ .integer = int },
               .string => |str| return NodeValue{ .string = str },
           }
       } else {
           return null;
       }
   }

};

pub const Tree = struct {

   left: ?*Tree,
   right: ?*Tree,
   typ: NodeType,
   value: ?NodeValue = null,

};

pub const ParserError = error{

   OutOfMemory,
   ExpectedNotFound,

} || std.fmt.ParseIntError;

pub const Parser = struct {

   token_it: LexerOutputTokenizer,
   curr: Token,
   allocator: std.mem.Allocator,
   const Self = @This();
   pub fn init(allocator: std.mem.Allocator, str: []const u8) Self {
       return Self{
           .token_it = LexerOutputTokenizer.init(str),
           .curr = Token{ .line = 0, .col = 0, .typ = .unknown },
           .allocator = allocator,
       };
   }
   fn makeNode(self: *Self, typ: NodeType, left: ?*Tree, right: ?*Tree) !*Tree {
       const result = try self.allocator.create(Tree);
       result.* = Tree{ .left = left, .right = right, .typ = typ };
       return result;
   }
   fn makeLeaf(self: *Self, typ: NodeType, value: ?NodeValue) !*Tree {
       const result = try self.allocator.create(Tree);
       result.* = Tree{ .left = null, .right = null, .typ = typ, .value = value };
       return result;
   }
   pub fn parse(self: *Self) ParserError!?*Tree {
       try self.next();
       var result: ?*Tree = null;
       while (true) {
           const stmt = try self.parseStmt();
           result = try self.makeNode(.sequence, result, stmt);
           if (self.curr.typ == .eof) break;
       }
       return result;
   }
   /// Classic "Recursive descent" statement parser.
   fn parseStmt(self: *Self) ParserError!?*Tree {
       var result: ?*Tree = null;
       switch (self.curr.typ) {
           .kw_print => {
               try self.next();
               try self.expect(.left_paren);
               // Parse each print's argument as an expression delimited by commas until we reach
               // a closing parens.
               while (true) {
                   var expr: ?*Tree = null;
                   if (self.curr.typ == .string) {
                       expr = try self.makeNode(
                           .prts,
                           try self.makeLeaf(.string, NodeValue.fromToken(self.curr)),
                           null,
                       );
                       try self.next();
                   } else {
                       expr = try self.makeNode(.prti, try self.parseExpr(0), null);
                   }
                   result = try self.makeNode(.sequence, result, expr);
                   if (self.curr.typ != .comma) break;
                   try self.next();
               }
               try self.expect(.right_paren);
               try self.expect(.semicolon);
           },
           .kw_putc => {
               try self.next();
               result = try self.makeNode(.prtc, try self.parseParenExpr(), null);
               try self.expect(.semicolon);
           },
           .kw_while => {
               try self.next();
               const expr = try self.parseParenExpr();
               result = try self.makeNode(.kw_while, expr, try self.parseStmt());
           },
           .kw_if => {
               try self.next();
               const expr = try self.parseParenExpr();
               const if_stmt = try self.parseStmt();
               const else_stmt = blk: {
                   if (self.curr.typ == .kw_else) {
                       try self.next();
                       break :blk try self.parseStmt();
                   } else {
                       break :blk null;
                   }
               };
               const stmt_node = try self.makeNode(.kw_if, if_stmt, else_stmt);
               // If-statement uses `.kw_if` node for both first node with `expr` on the left
               // and statements on the right and also `.kw_if` node which goes to the right
               // and contains both if-branch and else-branch.
               result = try self.makeNode(.kw_if, expr, stmt_node);
           },
           .left_brace => {
               try self.next();
               while (self.curr.typ != .right_brace and self.curr.typ != .eof) {
                   result = try self.makeNode(.sequence, result, try self.parseStmt());
               }
               try self.expect(.right_brace);
           },
           .identifier => {
               const identifer = try self.makeLeaf(.identifier, NodeValue.fromToken(self.curr));
               try self.next();
               try self.expect(.assign);
               const expr = try self.parseExpr(0);
               result = try self.makeNode(.assign, identifer, expr);
               try self.expect(.semicolon);
           },
           .semicolon => try self.next(),
           else => {
               std.debug.print("\nSTMT: UNKNOWN {}\n", .{self.curr});
               std.os.exit(1);
           },
       }
       return result;
   }
   /// "Precedence climbing" expression parser.
   fn parseExpr(self: *Self, precedence: i8) ParserError!?*Tree {
       var result: ?*Tree = null;
       switch (self.curr.typ) {
           .left_paren => {
               result = try self.parseParenExpr();
           },
           .subtract => {
               try self.next();
               const metadata = NodeMetadata.find(.negate);
               const expr = try self.parseExpr(metadata.precedence);
               result = try self.makeNode(.negate, expr, null);
           },
           .not => {
               try self.next();
               const metadata = NodeMetadata.find(.not);
               const expr = try self.parseExpr(metadata.precedence);
               result = try self.makeNode(.not, expr, null);
           },
           .add => {
               try self.next();
               result = try self.parseExpr(precedence);
           },
           .integer, .identifier => {
               const node_type = NodeMetadata.find(self.curr.typ).node_type;
               result = try self.makeLeaf(node_type, NodeValue.fromToken(self.curr));
               try self.next();
           },
           else => {
               std.debug.print("\nEXPR: UNKNOWN {}\n", .{self.curr});
               std.os.exit(1);
           },
       }
       var curr_metadata = NodeMetadata.find(self.curr.typ);
       while (curr_metadata.binary and curr_metadata.precedence >= precedence) {
           const new_precedence =
               if (curr_metadata.right_associative)
               curr_metadata.precedence
           else
               curr_metadata.precedence + 1;
           try self.next();
           const sub_expr = try self.parseExpr(new_precedence);
           result = try self.makeNode(curr_metadata.node_type, result, sub_expr);
           curr_metadata = NodeMetadata.find(self.curr.typ);
       }
       return result;
   }
   fn parseParenExpr(self: *Self) ParserError!?*Tree {
       try self.expect(.left_paren);
       const result = try self.parseExpr(0);
       try self.expect(.right_paren);
       return result;
   }
   fn next(self: *Self) ParserError!void {
       const token = try self.token_it.next();
       if (token) |tok| {
           self.curr = tok;
       } else {
           self.curr = Token{ .line = 0, .col = 0, .typ = .unknown };
       }
   }
   fn expect(self: *Self, token_type: TokenType) ParserError!void {
       if (self.curr.typ != token_type) {
           const expected_str = NodeMetadata.find(token_type).token_str;
           const found_str = NodeMetadata.find(self.curr.typ).token_str;
           std.debug.print(
               "({d}, {d}) error: Expecting '{s}', found '{s}'\n",
               .{ self.curr.line, self.curr.col, expected_str, found_str },
           );
           return ParserError.ExpectedNotFound;
       }
       try self.next();
   }

};

pub fn parse(allocator: std.mem.Allocator, str: []const u8) !?*Tree {

   var parser = Parser.init(allocator, str);
   return try parser.parse();

}

pub fn main() !void {

   var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
   defer arena.deinit();
   const allocator = arena.allocator();
   var arg_it = std.process.args();
   _ = try arg_it.next(allocator) orelse unreachable; // program name
   const file_name = arg_it.next(allocator);
   // We accept both files and standard input.
   var file_handle = blk: {
       if (file_name) |file_name_delimited| {
           const fname: []const u8 = try file_name_delimited;
           break :blk try std.fs.cwd().openFile(fname, .{});
       } else {
           break :blk std.io.getStdIn();
       }
   };
   defer file_handle.close();
   const input_content = try file_handle.readToEndAlloc(allocator, std.math.maxInt(usize));
   const result: ?*Tree = try parse(allocator, input_content);
   const result_str = try astToFlattenedString(allocator, result);
   _ = try std.io.getStdOut().write(result_str);

}

const NodeMetadata = struct {

   token_type: TokenType,
   right_associative: bool,
   binary: bool,
   unary: bool,
   precedence: i8,
   node_type: NodeType,
   token_str: []const u8,
   const self = [_]NodeMetadata{
       .{ .token_type = .multiply, .right_associative = false, .binary = true, .unary = false, .precedence = 13, .node_type = .multiply, .token_str = "*" },
       .{ .token_type = .divide, .right_associative = false, .binary = true, .unary = false, .precedence = 13, .node_type = .divide, .token_str = "/" },
       .{ .token_type = .mod, .right_associative = false, .binary = true, .unary = false, .precedence = 13, .node_type = .mod, .token_str = "%" },
       .{ .token_type = .add, .right_associative = false, .binary = true, .unary = false, .precedence = 12, .node_type = .add, .token_str = "+" },
       .{ .token_type = .subtract, .right_associative = false, .binary = true, .unary = false, .precedence = 12, .node_type = .subtract, .token_str = "-" },
       .{ .token_type = .negate, .right_associative = false, .binary = false, .unary = true, .precedence = 14, .node_type = .negate, .token_str = "-" },
       .{ .token_type = .less, .right_associative = false, .binary = true, .unary = false, .precedence = 10, .node_type = .less, .token_str = "<" },
       .{ .token_type = .less_equal, .right_associative = false, .binary = true, .unary = false, .precedence = 10, .node_type = .less_equal, .token_str = "<=" },
       .{ .token_type = .greater, .right_associative = false, .binary = true, .unary = false, .precedence = 10, .node_type = .greater, .token_str = ">" },
       .{ .token_type = .greater_equal, .right_associative = false, .binary = true, .unary = false, .precedence = 10, .node_type = .greater_equal, .token_str = ">=" },
       .{ .token_type = .equal, .right_associative = false, .binary = true, .unary = false, .precedence = 9, .node_type = .equal, .token_str = "=" },
       .{ .token_type = .not_equal, .right_associative = false, .binary = true, .unary = false, .precedence = 9, .node_type = .not_equal, .token_str = "!=" },
       .{ .token_type = .not, .right_associative = false, .binary = false, .unary = true, .precedence = 14, .node_type = .not, .token_str = "!" },
       .{ .token_type = .assign, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .assign, .token_str = "=" },
       .{ .token_type = .bool_and, .right_associative = false, .binary = true, .unary = false, .precedence = 5, .node_type = .bool_and, .token_str = "&&" },
       .{ .token_type = .bool_or, .right_associative = false, .binary = true, .unary = false, .precedence = 4, .node_type = .bool_or, .token_str = "||" },
       .{ .token_type = .left_paren, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "(" },
       .{ .token_type = .right_paren, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = ")" },
       .{ .token_type = .left_brace, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "{" },
       .{ .token_type = .right_brace, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "}" },
       .{ .token_type = .semicolon, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = ";" },
       .{ .token_type = .comma, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "," },
       .{ .token_type = .kw_if, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .kw_if, .token_str = "if" },
       .{ .token_type = .kw_else, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "else" },
       .{ .token_type = .kw_while, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .kw_while, .token_str = "while" },
       .{ .token_type = .kw_print, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "print" },
       .{ .token_type = .kw_putc, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "putc" },
       .{ .token_type = .identifier, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .identifier, .token_str = "Identifier" },
       .{ .token_type = .integer, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .integer, .token_str = "Integer literal" },
       .{ .token_type = .string, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .string, .token_str = "String literal" },
       .{ .token_type = .eof, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "End of line" },
   };
   pub fn find(token_type: TokenType) NodeMetadata {
       for (self) |metadata| {
           if (metadata.token_type == token_type) return metadata;
       } else {
           unreachable;
       }
   }

};

pub const NodeType = enum {

   unknown,
   identifier,
   string,
   integer,
   sequence,
   kw_if,
   prtc,
   prts,
   prti,
   kw_while,
   assign,
   negate,
   not,
   multiply,
   divide,
   mod,
   add,
   subtract,
   less,
   less_equal,
   greater,
   greater_equal,
   equal,
   not_equal,
   bool_and,
   bool_or,
   pub fn toString(self: NodeType) []const u8 {
       return switch (self) {
           .unknown => "UNKNOWN",
           .identifier => "Identifier",
           .string => "String",
           .integer => "Integer",
           .sequence => "Sequence",
           .kw_if => "If",
           .prtc => "Prtc",
           .prts => "Prts",
           .prti => "Prti",
           .kw_while => "While",
           .assign => "Assign",
           .negate => "Negate",
           .not => "Not",
           .multiply => "Multiply",
           .divide => "Divide",
           .mod => "Mod",
           .add => "Add",
           .subtract => "Subtract",
           .less => "Less",
           .less_equal => "LessEqual",
           .greater => "Greater",
           .greater_equal => "GreaterEqual",
           .equal => "Equal",
           .not_equal => "NotEqual",
           .bool_and => "And",
           .bool_or => "Or",
       };
   }

};

fn astToFlattenedString(allocator: std.mem.Allocator, tree: ?*Tree) ![]const u8 {

   var result = std.ArrayList(u8).init(allocator);
   var writer = result.writer();
   try treeToString(allocator, writer, tree);
   return result.items;

}

pub const TokenType = enum {

   unknown,
   multiply,
   divide,
   mod,
   add,
   subtract,
   negate,
   less,
   less_equal,
   greater,
   greater_equal,
   equal,
   not_equal,
   not,
   assign,
   bool_and,
   bool_or,
   left_paren,
   right_paren,
   left_brace,
   right_brace,
   semicolon,
   comma,
   kw_if,
   kw_else,
   kw_while,
   kw_print,
   kw_putc,
   identifier,
   integer,
   string,
   eof,
   const from_string_map = std.ComptimeStringMap(TokenType, .{
       .{ "Op_multiply", .multiply },
       .{ "Op_divide", .divide },
       .{ "Op_mod", .mod },
       .{ "Op_add", .add },
       .{ "Op_subtract", .subtract },
       .{ "Op_negate", .negate },
       .{ "Op_less", .less },
       .{ "Op_lessequal", .less_equal },
       .{ "Op_greater", .greater },
       .{ "Op_greaterequal", .greater_equal },
       .{ "Op_equal", .equal },
       .{ "Op_notequal", .not_equal },
       .{ "Op_not", .not },
       .{ "Op_assign", .assign },
       .{ "Op_and", .bool_and },
       .{ "Op_or", .bool_or },
       .{ "LeftParen", .left_paren },
       .{ "RightParen", .right_paren },
       .{ "LeftBrace", .left_brace },
       .{ "RightBrace", .right_brace },
       .{ "Semicolon", .semicolon },
       .{ "Comma", .comma },
       .{ "Keyword_if", .kw_if },
       .{ "Keyword_else", .kw_else },
       .{ "Keyword_while", .kw_while },
       .{ "Keyword_print", .kw_print },
       .{ "Keyword_putc", .kw_putc },
       .{ "Identifier", .identifier },
       .{ "Integer", .integer },
       .{ "String", .string },
       .{ "End_of_input", .eof },
   });
   pub fn fromString(str: []const u8) TokenType {
       return from_string_map.get(str).?;
   }

};

pub const TokenValue = union(enum) {

   integer: i32,
   string: []const u8,

};

pub const Token = struct {

   line: usize,
   col: usize,
   typ: TokenType = .unknown,
   value: ?TokenValue = null,

};

const TreeToStringError = error{OutOfMemory};

fn treeToString(

   allocator: std.mem.Allocator,
   writer: std.ArrayList(u8).Writer,
   tree: ?*Tree,

) TreeToStringError!void {

   if (tree) |t| {
       _ = try writer.write(try std.fmt.allocPrint(
           allocator,
           "{s}",
           .{t.typ.toString()},
       ));
       switch (t.typ) {
           .string, .identifier => _ = try writer.write(try std.fmt.allocPrint(
               allocator,
               "   {s}\n",
               .{t.value.?.string},
           )),
           .integer => _ = try writer.write(try std.fmt.allocPrint(
               allocator,
               "   {d}\n",
               .{t.value.?.integer},
           )),
           else => {
               _ = try writer.write(try std.fmt.allocPrint(
                   allocator,
                   "\n",
                   .{},
               ));
               try treeToString(allocator, writer, t.left);
               try treeToString(allocator, writer, t.right);
           },
       }
   } else {
       _ = try writer.write(try std.fmt.allocPrint(
           allocator,
           ";\n",
           .{},
       ));
   }

}

pub const LexerOutputTokenizer = struct {

   it: std.mem.SplitIterator(u8),
   const Self = @This();
   pub fn init(str: []const u8) Self {
       return Self{ .it = std.mem.split(u8, str, "\n") };
   }
   pub fn next(self: *Self) std.fmt.ParseIntError!?Token {
       if (self.it.next()) |line| {
           if (line.len == 0) return null;
           var tokens_it = std.mem.tokenize(u8, line, " ");
           const lineNumber = try std.fmt.parseInt(usize, tokens_it.next().?, 10);
           const colNumber = try std.fmt.parseInt(usize, tokens_it.next().?, 10);
           const typ_text = tokens_it.next().?;
           const typ = TokenType.fromString(typ_text);
           const pre_value_index = tokens_it.index;
           const value = tokens_it.next();
           var token = Token{ .line = lineNumber, .col = colNumber, .typ = typ };
           if (value) |val| {
               const token_value = blk: {
                   switch (typ) {
                       .string, .identifier => {
                           tokens_it.index = pre_value_index;
                           break :blk TokenValue{ .string = tokens_it.rest() };
                       },
                       .integer => break :blk TokenValue{ .integer = try std.fmt.parseInt(i32, val, 10) },
                       else => unreachable,
                   }
               };
               token.value = token_value;
           }
           return token;
       } else {
           return null;
       }
   }

};

fn stringToTokenList(allocator: std.mem.Allocator, str: []const u8) !std.ArrayList(Token) {

   var result = std.ArrayList(Token).init(allocator);
   var lexer_output_it = LexerOutputTokenizer.init(str);
   while (try lexer_output_it.next()) |token| {
       try result.append(token);
   }
   return result;

} </lang>