Compiler/syntax analyzer

From Rosetta Code
Revision as of 00:46, 28 September 2016 by Ed Davis (talk | contribs) (Complimentary task to the lexical analyzer.)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Compiler/syntax analyzer is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

A Syntax analyzer transforms a token stream (from the Lexical analyzer) into a Syntax tree, based on a grammar.

Take the output from the Lexical analyzer task, and convert it to an Abstract Syntax Tree (AST), based on the grammar below.

The program should read input from a file and/or stdin, and write output to a file and/or stdout. If the language being used has a parser module/library/class, it would be great if two versions of the solution are provided: One without the parser module, and one with.

Grammar

The simple programming language to be analyzed is more or less a (very tiny) subset of C. The formal grammar in Extended Backus-Naur Form (EBNF):

<lang EBNF>

   stmt_list           =   {stmt} ;
   stmt                =   ';'
                         | Identifier '=' expr ';'
                         | 'while' paren_expr stmt
                         | 'if' paren_expr stmt ['else' stmt]
                         | 'print' '(' prt_list ')' ';'
                         | 'putc' paren_expr ';'
                         | '{' stmt_list '}'
                         ;
   paren_expr          =   '(' expr ')' ;
   prt_list            =   '(' string | expr {',' String | expr} ')' ;
   expr                =   and_expr            {'||' and_expr} ;
   and_expr            =   equality_expr       {'&&' equality_expr} ;
   equality_expr       =   relational_expr     [('==' | '!=') relational_expr] ;
   relational_expr     =   addition_expr       [('<' | '<=' | '>' | '>=') addition_expr] ;
   addition_expr       =   multiplication_expr {('+' | '-') multiplication_expr] ;
   multiplication_expr =   primary             {('*' | '/' | '%') primary } ;
   primary             =   Identifier
                         | Integer
                         | '(' expr ')'
                         | ('+' | '-' | '!') expr
                         ;</lang>

The resulting AST should be formulated as a Binary Tree, that is traversable via Post-Order Depth-first search.

Example - given the simple program (below), stored in a file called while.t, create the list of tokens, using one of the Lexical analyzer solutions
lex < while.t > while.lex
Run one of the Syntax analyzer solutions
parse < while.lex > while.ast
The following table shows the input to lex, lex output, and the AST produced by the parser
Input to lex Output from lex, input to parse Output from parse

<lang c>count = 1;

while (count < 10) {
    print("count is: ", count, "\n");
    count = count + 1;
}</lang>
    1      1 Identifier      count
    1      7 Op_assign
    1      9 Integer             1
    1     10 Semicolon
    2      1 Keyword_while
    2      7 LeftParen
    2      8 Identifier      count
    2     14 Op_less
    2     16 Integer            10
    2     18 RightParen
    2     20 LeftBrace
    3      5 Keyword_print
    3     10 LeftParen
    3     11 String          "count is: "
    3     23 Comma
    3     25 Identifier      count
    3     30 Comma
    3     32 String          "\n"
    3     36 RightParen
    3     37 Semicolon
    4      5 Identifier      count
    4     11 Op_assign
    4     13 Identifier      count
    4     19 Op_add
    4     21 Integer             1
    4     22 Semicolon
    5      1 RightBrace
    6      1 End_of_input
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
While
Less
Identifier    count
Integer       10
Sequence
Sequence
;
Sequence
Sequence
Sequence
;
Prts
String        "count is: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
Specifications
List of node type names
Identifier String Integer Sequence If Prtc Prts Prti While Assign Negate Not Multiply Divide Mod
Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or

In the text below, Null/Empty nodes are represented by ";".

Non-terminal (internal) nodes

For Operators, the following nodes should be created:

Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or

For each of the above nodes, the left and right sub-nodes are the operands of the respective operation.

In pseudo S-Expression format:

(Operator expression expression)

Negate, Not

For these node types, the left node is the operand, and the right node is null.

(Operator expression ;)

Sequence - sub-nodes are either statements or Sequences.

If - left node is the expression, the right node is If node, with it's left node being the if-true statement part, and the right node being the if-false (else) statement part.

(If expression (If statement else-statement))

If there is not an else, the tree becomes:

(If expression (If statement ;))

Prtc

(Prtc (expression) ;)

Prts

(Prts (String "the string") ;)

Prti

(Prti (Integer 12345) ;)

While - left node is the expression, the right node is the statement.

(While expression statement)

Assign - left node is the left-hand side of the assignment, the right node is the right-hand side of the assignment.

(Assign Identifier expression)

Terminal (leaf) nodes:

Identifier: (Identifier ident_name)
Integer:    (Integer 12345)
String:     (String "Hello World!")
";":        Empty node

Pseudo-code for the parser. Uses Precedence Climbing for expression parsing, and Recursive Descent for statement parsing. The AST is also built:

<lang python>def expr(p)

   if tok is "("
       x = paren_expr()
   elif tok in ["-", "+", "!"]
       gettok()
       y = expr(precedence of operator)
       if operator was "+"
           x = y
       else
           x = make_node(operator, y)
   elif tok is an Identifier
       x = make_leaf(Identifier, variable name)
       gettok()
   elif tok is an Integer constant
       x = make_leaf(Integer, integer value)
       gettok()
   else
       error()
   while tok is a binary operator and precedence of tok >= p
       save_tok = tok
       gettok()
       q = precedence of save_tok
       if save_tok is not right associative
           q += 1
       x = make_node(Operator save_tok represents, x, expr(q))
   return x

def paren_expr()

   expect("(")
   x = expr(0)
   expect(")")
   return x

def stmt()

   t = None
   if accept("if")
       e = paren_expr()
       s = stmt()
       t = make_node(If, e, make_node(If, s, accept("else") ? stmt() : None))
   elif accept("putc")
       t = make_node(Prtc, paren_expr())
       expect(";")
   elif accept("print")
       expect("(")
       repeat
           if tok is a string
               e = make_node(Prts, make_leaf(String, the string))
               gettok()
           else
               e = make_node(Prti, expr(0))
           t = make_node(Sequence, t, e)
       until not accept(",")
       expect(")")
       expect(";")
   elif tok is ";"
       gettok()
   elif tok is an Identifier
       v = make_leaf(Identifier, variable name)
       gettok()
       expect("=")
       t = make_node(Assign, v, expr(0))
       expect(";")
   elif accept("while")
       e = paren_expr()
       t = make_node(While, e, stmt()
   elif accept("{")
       while tok not equal "}" and tok not equal end-of-file
           t = make_node(Sequence, t, stmt())
       expect("}")
   elif tok is end-of-file
       pass
   else
       error()
   return t

def parse()

   t = None
   gettok()
   repeat
       t = make_node(Sequence, t, stmt())
   until tok is end-of-file
   return t</lang>
Once the AST is built, outputting it is as simple as

<lang python>def prt_ast(t)

   if t == None
       print(";\n")
   else
       print(t.node_type)
       if t.node_type in [Identifier, Integer, String]     # leaf node
           print the value of the Ident, Integer or String, "\n"
       else
           print("\n")
           prt_ast(t.left)
           prt_ast(t.right)</lang>
If the AST is correctly built, loading it into a subsequent program should be as simple as

<lang python>def load_ast()

   line = readline()
   # Each line has at least one token
   line_list = tokenize the line, respecting double quotes
   text = line_list[0] # first token is always the node type
   if text == ";"   # a terminal node
       return None
   node_type = text # could convert to internal form if desired
   # A line with two tokens is a leaf node
   # Leaf nodes are: Identifier, Integer, String
   # The 2nd token is the value
   if len(line_list) > 1
       return make_leaf(node_type, line_list[1])
   left = load_ast()
   right = load_ast()
   return make_node(node_type, left, right)</lang>

Finally, the AST can also be tested by running it against one of the AST Interpreter solutions.

Test program, assuming this is in a file called prime.t
lex <prime.t | parse
Input to lex Output from lex, input to parse Output from parse

<lang c>/*

Simple prime number generator
*/

count = 1; n = 1; limit = 100; while (n < limit) {

   k=3;
   p=1;
   n=n+2;
   while ((k*k<=n) && (p)) {
       p=n/k*k!=n;
       k=k+2;
   }
   if (p) {
       print(n, " is prime\n");
       count = count + 1;
   }

} print("Total primes found: ", count, "\n");</lang>

    4      1 Identifier      count
    4      7 Op_assign
    4      9 Integer             1
    4     10 Semicolon
    5      1 Identifier      n
    5      3 Op_assign
    5      5 Integer             1
    5      6 Semicolon
    6      1 Identifier      limit
    6      7 Op_assign
    6      9 Integer           100
    6     12 Semicolon
    7      1 Keyword_while
    7      7 LeftParen
    7      8 Identifier      n
    7     10 Op_less
    7     12 Identifier      limit
    7     17 RightParen
    7     19 LeftBrace
    8      5 Identifier      k
    8      6 Op_assign
    8      7 Integer             3
    8      8 Semicolon
    9      5 Identifier      p
    9      6 Op_assign
    9      7 Integer             1
    9      8 Semicolon
   10      5 Identifier      n
   10      6 Op_assign
   10      7 Identifier      n
   10      8 Op_add
   10      9 Integer             2
   10     10 Semicolon
   11      5 Keyword_while
   11     11 LeftParen
   11     12 LeftParen
   11     13 Identifier      k
   11     14 Op_multiply
   11     15 Identifier      k
   11     16 Op_lessequal
   11     18 Identifier      n
   11     19 RightParen
   11     21 Op_and
   11     24 LeftParen
   11     25 Identifier      p
   11     26 RightParen
   11     27 RightParen
   11     29 LeftBrace
   12      9 Identifier      p
   12     10 Op_assign
   12     11 Identifier      n
   12     12 Op_divide
   12     13 Identifier      k
   12     14 Op_multiply
   12     15 Identifier      k
   12     16 Op_notequal
   12     18 Identifier      n
   12     19 Semicolon
   13      9 Identifier      k
   13     10 Op_assign
   13     11 Identifier      k
   13     12 Op_add
   13     13 Integer             2
   13     14 Semicolon
   14      5 RightBrace
   15      5 Keyword_if
   15      8 LeftParen
   15      9 Identifier      p
   15     10 RightParen
   15     12 LeftBrace
   16      9 Keyword_print
   16     14 LeftParen
   16     15 Identifier      n
   16     16 Comma
   16     18 String          " is prime\n"
   16     31 RightParen
   16     32 Semicolon
   17      9 Identifier      count
   17     15 Op_assign
   17     17 Identifier      count
   17     23 Op_add
   17     25 Integer             1
   17     26 Semicolon
   18      5 RightBrace
   19      1 RightBrace
   20      1 Keyword_print
   20      6 LeftParen
   20      7 String          "Total primes found: "
   20     29 Comma
   20     31 Identifier      count
   20     36 Comma
   20     38 String          "\n"
   20     42 RightParen
   20     43 Semicolon
   21      1 End_of_input
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
Assign
Identifier    n
Integer       1
Assign
Identifier    limit
Integer       100
While
Less
Identifier    n
Identifier    limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    k
Integer       3
Assign
Identifier    p
Integer       1
Assign
Identifier    n
Add
Identifier    n
Integer       2
While
And
LessEqual
Multiply
Identifier    k
Identifier    k
Identifier    n
Identifier    p
Sequence
Sequence
;
Assign
Identifier    p
NotEqual
Multiply
Divide
Identifier    n
Identifier    k
Identifier    k
Identifier    n
Assign
Identifier    k
Add
Identifier    k
Integer       2
If
Identifier    p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier    n
;
Prts
String        " is prime\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
;
Sequence
Sequence
Sequence
;
Prts
String        "Total primes found: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;

C

Tested with gcc 4.81 and later, compiles warning free with -Wall -Wextra <lang C>#include <stdio.h>

  1. include <stdlib.h>
  2. include <string.h>
  3. include <stdarg.h>
  4. include <stdbool.h>
  5. include <ctype.h>
  1. define NELEMS(arr) (sizeof(arr) / sizeof(arr[0]))

/*#define make_sequence(t, s) if (t == NULL) t = s = make_node(nd_Sequence, stmt(), NULL); \

                           else s = s->right = make_node(nd_Sequence, stmt(), NULL)*/
  1. define make_sequence(t, s) t = make_node(nd_Sequence, t, stmt())

typedef enum {

   tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr,
   tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print,
   tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident,
   tk_Integer, tk_String

} TokenType;

typedef enum {

   nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While,
   nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq,
   nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or

} NodeType;

typedef struct {

   TokenType tok;
   int err_ln;
   int err_col;
   char *text;             /* ident or string literal or integer value */

} tok_s;

typedef struct Tree {

   NodeType node_type;
   struct Tree *left;
   struct Tree *right;
   char *value;

} Tree;

// dependency: Ordered by tok, must remain in same order as TokenType enum struct {

   char       *text, *enum_text;
   TokenType   tok;
   bool        right_associative, is_binary, is_unary;
   int         precedence;
   NodeType    node_type;

} atr[] = {

   {"EOI",             "End_of_input"   , tk_EOI,     false, false, false, -1, -1        },
   {"*",               "Op_multiply"    , tk_Mul,     false, true,  false, 13, nd_Mul    },
   {"/",               "Op_divide"      , tk_Div,     false, true,  false, 13, nd_Div    },
   {"%",               "Op_mod"         , tk_Mod,     false, true,  false, 13, nd_Mod    },
   {"+",               "Op_add"         , tk_Add,     false, true,  false, 12, nd_Add    },
   {"-",               "Op_subtract"    , tk_Sub,     false, true,  false, 12, nd_Sub    },
   {"-",               "Op_negate"      , tk_Negate,  false, false, true,  14, nd_Negate },
   {"!",               "Op_not"         , tk_Not,     false, false, true,  14, nd_Negate },
   {"<",               "Op_less"        , tk_Lss,     false, true,  false, 10, nd_Lss    },
   {"<=",              "Op_lessequal"   , tk_Leq,     false, true,  false, 10, nd_Leq    },
   {">",               "Op_greater"     , tk_Gtr,     false, true,  false, 10, nd_Gtr    },
   {">=",              "Op_greaterequal", tk_Geq,     false, true,  false, 10, nd_Gtr    },
   {"==",              "Op_equal"       , tk_Eql,     false, true,  false,  9, nd_Eql    },
   {"!=",              "Op_notequal"    , tk_Neq,     false, true,  false,  9, nd_Neq    },
   {"=",               "Op_assign"      , tk_Assign,  false, false, false, -1, nd_Assign },
   {"&&",              "Op_and"         , tk_And,     false, true,  false,  5, nd_And    },
   {"||",              "Op_or"          , tk_Or,      false, true,  false,  4, nd_And    },
   {"if",              "Keyword_if"     , tk_If,      false, false, false, -1, nd_If     },
   {"else",            "Keyword_else"   , tk_Else,    false, false, false, -1, -1        },
   {"while",           "Keyword_while"  , tk_While,   false, false, false, -1, nd_While  },
   {"print",           "Keyword_print"  , tk_Print,   false, false, false, -1, -1        },
   {"putc",            "Keyword_putc"   , tk_Putc,    false, false, false, -1, -1        },
   {"(",               "LeftParen"      , tk_Lparen,  false, false, false, -1, -1        },
   {")",               "RightParen"     , tk_Rparen,  false, false, false, -1, -1        },
   {"{",               "LeftBrace"      , tk_Lbrace,  false, false, false, -1, -1        },
   {"}",               "RightBrace"     , tk_Rbrace,  false, false, false, -1, -1        },
   {";",               "Semicolon"      , tk_Semi,    false, false, false, -1, -1        },
   {",",               "Comma"          , tk_Comma,   false, false, false, -1, -1        },
   {"Ident",           "Identifier"     , tk_Ident,   false, false, false, -1, nd_Ident  },
   {"Integer literal", "Integer"        , tk_Integer, false, false, false, -1, nd_Integer},
   {"String literal",  "String"         , tk_String,  false, false, false, -1, nd_String },

};

char *Display_nodes[] = {"Identifier", "String", "Integer", "Sequence", "If", "Prtc",

   "Prts", "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod",
   "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal",
   "NotEqual", "And", "Or"};

static tok_s tok; static FILE *source_fp, *dest_fp;

Tree *paren_expr();

void error(int err_line, int err_col, const char *fmt, ... ) {

   va_list ap;
   char buf[1000];
   va_start(ap, fmt);
   vsprintf(buf, fmt, ap);
   va_end(ap);
   printf("(%d, %d) error: %s\n", err_line, err_col, buf);
   exit(1);

}

char *read_line(int *len) {

   static char *text = NULL;
   static int textmax = 0;
   for (*len = 0; ; (*len)++) {
       int ch = fgetc(source_fp);
       if (ch == EOF || ch == '\n') {
           if (*len == 0)
               return NULL;
           break;
       }
       if (*len + 1 >= textmax) {
           textmax = (textmax == 0 ? 128 : textmax * 2);
           text = realloc(text, textmax);
       }
       text[*len] = ch;
   }
   text[*len] = '\0';
   return text;

}

char *rtrim(char *text, int *len) { // remove trailing spaces

   for (; *len > 0 && isspace(text[*len - 1]); --(*len))
       ;
   text[*len] = '\0';
   return text;

}

TokenType get_enum(const char *name) { // return internal version of name

   for (size_t i = 0; i < NELEMS(atr); i++) {
       if (strcmp(atr[i].enum_text, name) == 0)
           return atr[i].tok;
   }
   error(0, 0, "Unknown token %s\n", name);
   return 0;

}

tok_s gettok() {

   int len;
   tok_s tok;
   char *yytext = read_line(&len);
   yytext = rtrim(yytext, &len);
   // [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional
   // get line and column
   tok.err_ln  = atoi(strtok(yytext, " "));
   tok.err_col = atoi(strtok(NULL, " "));
   // get the token name
   char *name = strtok(NULL, " ");
   tok.tok = get_enum(name);
   // if there is extra data, get it
   char *p = name + strlen(name);
   if (p != &yytext[len]) {
       for (++p; isspace(*p); ++p)
           ;
       tok.text = strdup(p);
   }
   return tok;

}

Tree *make_node(NodeType node_type, Tree *left, Tree *right) {

   Tree *t = calloc(sizeof(Tree), 1);
   t->node_type = node_type;
   t->left = left;
   t->right = right;
   return t;

}

Tree *make_leaf(NodeType node_type, char *value) {

   Tree *t = calloc(sizeof(Tree), 1);
   t->node_type = node_type;
   t->value = strdup(value);
   return t;

}

void expect(const char msg[], TokenType s) {

   if (tok.tok == s) {
       tok = gettok();
       return;
   }
   error(tok.err_ln, tok.err_col, "%s: Expecting '%s', found '%s'\n", msg, atr[s].text, atr[tok.tok].text);

}

Tree *expr(int p) {

   Tree *x = NULL, *node;
   TokenType op;
   switch (tok.tok) {
       case tk_Lparen:
           x = paren_expr();
           break;
       case tk_Sub: case tk_Add:
           op = tok.tok;
           tok = gettok();
           node = expr(atr[tk_Negate].precedence);
           x = (op == tk_Sub) ? make_node(nd_Negate, node, NULL) : node;
           break;
       case tk_Not:
           tok = gettok();
           x = make_node(nd_Not, expr(atr[tk_Not].precedence), NULL);
           break;
       case tk_Ident:
           x = make_leaf(nd_Ident, tok.text);
           tok = gettok();
           break;
       case tk_Integer:
           x = make_leaf(nd_Integer, tok.text);
           tok = gettok();
           break;
       default:
           error(tok.err_ln, tok.err_col, "Expecting a primary, found: %s\n", atr[tok.tok].text);
   }
   while (atr[tok.tok].is_binary && atr[tok.tok].precedence >= p) {
       TokenType op = tok.tok;
       tok = gettok();
       int q = atr[op].precedence;
       if (!atr[op].right_associative)
           q++;
       node = expr(q);
       x = make_node(atr[op].node_type, x, node);
   }
   return x;

}

Tree *paren_expr() {

   expect("paren_expr", tk_Lparen);
   Tree *t = expr(0);
   expect("paren_expr", tk_Rparen);
   return t;

}

Tree *stmt() {

   Tree *t = NULL, *v, *e, *s, *s2;
   switch (tok.tok) {
       case tk_If:
           tok = gettok();
           e = paren_expr();
           s = stmt();
           s2 = NULL;
           if (tok.tok == tk_Else) {
               tok = gettok();
               s2 = stmt();
           }
           t = make_node(nd_If, e, make_node(nd_If, s, s2));
           break;
       case tk_Putc:
           tok = gettok();
           e = paren_expr();
           t = make_node(nd_Prtc, e, NULL);
           expect("Putc", tk_Semi);
           break;
       case tk_Print: /* print '(' expr {',' expr} ')' */
           tok = gettok();
           for (expect("Print", tk_Lparen); ; expect("Print", tk_Comma)) {
               if (tok.tok == tk_String) {
                   e = make_node(nd_Prts, make_leaf(nd_String, tok.text), NULL);
                   tok = gettok();
               } else
                   e = make_node(nd_Prti, expr(0), NULL);
               t = make_node(nd_Sequence, t, e);
               if (tok.tok != tk_Comma)
                   break;
           }
           expect("Print", tk_Rparen);
           expect("Print", tk_Semi);
           break;
       case tk_Semi:
           tok = gettok();
           break;
       case tk_Ident:
           v = make_leaf(nd_Ident, tok.text);
           tok = gettok();
           expect("assign", tk_Assign);
           e = expr(0);
           t = make_node(nd_Assign, v, e);
           expect("assign", tk_Semi);
           break;
       case tk_While:
           tok = gettok();
           e = paren_expr();
           s = stmt();
           t = make_node(nd_While, e, s);
           break;
       case tk_Lbrace:         /* {stmt} */
           for (expect("Lbrace", tk_Lbrace); tok.tok != tk_Rbrace && tok.tok != tk_EOI;)
               t = make_node(nd_Sequence, t, stmt());
           expect("Lbrace", tk_Rbrace);
           break;
       case tk_EOI:
           break;
       default: error(tok.err_ln, tok.err_col, "expecting start of statement, found '%s'\n", atr[tok.tok].text);
   }
   return t;

}

Tree *parse() {

   Tree *t = NULL;
   tok = gettok();
   do {
       t = make_node(nd_Sequence, t, stmt());
   } while (t != NULL && tok.tok != tk_EOI);
   return t;

}

void prt_ast(Tree *t) {

   if (t == NULL)
       printf(";\n");
   else {
       printf("%-14s ", Display_nodes[t->node_type]);
       if (t->node_type == nd_Ident || t->node_type == nd_Integer || t->node_type == nd_String) {
           printf("%s\n", t->value);
       } else {
           printf("\n");
           prt_ast(t->left);
           prt_ast(t->right);
       }
   }

}

void init_io(FILE **fp, FILE *std, const char mode[], const char fn[]) {

   if (fn[0] == '\0')
       *fp = std;
   else if ((*fp = fopen(fn, mode)) == NULL)
       error(0, 0, "Can't open %s\n", fn);

}

int main(int argc, char *argv[]) {

   init_io(&source_fp, stdin,  "r",  argc > 1 ? argv[1] : "");
   init_io(&dest_fp,   stdout, "wb", argc > 2 ? argv[2] : "");
   prt_ast(parse());

}</lang>

Output  —  prime numbers AST:

Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier     count
Integer        1
Assign
Identifier     n
Integer        1
Assign
Identifier     limit
Integer        100
While
Less
Identifier     n
Identifier     limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier     k
Integer        3
Assign
Identifier     p
Integer        1
Assign
Identifier     n
Add
Identifier     n
Integer        2
While
And
LessEqual
Multiply
Identifier     k
Identifier     k
Identifier     n
Identifier     p
Sequence
Sequence
;
Assign
Identifier     p
NotEqual
Multiply
Divide
Identifier     n
Identifier     k
Identifier     k
Identifier     n
Assign
Identifier     k
Add
Identifier     k
Integer        2
If
Identifier     p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier     n
;
Prts
String         " is prime\n"
;
Assign
Identifier     count
Add
Identifier     count
Integer        1
;
Sequence
Sequence
Sequence
;
Prts
String         "Total primes found: "
;
Prti
Identifier     count
;
Prts
String         "\n"
;

Python

Tested with Python 2.7 and 3.x <lang Python>from __future__ import print_function import sys, shlex, operator

tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr, \ tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print, \ tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident, \ tk_Integer, tk_String = range(31)

nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, \ nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, \ nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or = range(25)

  1. must have same order as above

Tokens = [

   ["EOI"             , False, False, False, -1, -1        ],
   ["*"               , False, True,  False, 13, nd_Mul    ],
   ["/"               , False, True,  False, 13, nd_Div    ],
   ["%"               , False, True,  False, 13, nd_Mod    ],
   ["+"               , False, True,  False, 12, nd_Add    ],
   ["-"               , False, True,  False, 12, nd_Sub    ],
   ["-"               , False, False, True,  14, nd_Negate ],
   ["!"               , False, False, True,  14, nd_Not    ],
   ["<"               , False, True,  False, 10, nd_Lss    ],
   ["<="              , False, True,  False, 10, nd_Leq    ],
   [">"               , False, True,  False, 10, nd_Gtr    ],
   [">="              , False, True,  False, 10, nd_Geq    ],
   ["=="              , False, True,  False,  9, nd_Eql    ],
   ["!="              , False, True,  False,  9, nd_Neq    ],
   ["="               , False, False, False, -1, nd_Assign ],
   ["&&"              , False, True,  False,  5, nd_And    ],
   ["||"              , False, True,  False,  4, nd_Or     ],
   ["if"              , False, False, False, -1, nd_If     ],
   ["else"            , False, False, False, -1, -1        ],
   ["while"           , False, False, False, -1, nd_While  ],
   ["print"           , False, False, False, -1, -1        ],
   ["putc"            , False, False, False, -1, -1        ],
   ["("               , False, False, False, -1, -1        ],
   [")"               , False, False, False, -1, -1        ],
   ["{"               , False, False, False, -1, -1        ],
   ["}"               , False, False, False, -1, -1        ],
   [";"               , False, False, False, -1, -1        ],
   [","               , False, False, False, -1, -1        ],
   ["Ident"           , False, False, False, -1, nd_Ident  ],
   ["Integer literal" , False, False, False, -1, nd_Integer],
   ["String literal"  , False, False, False, -1, nd_String ]
   ]

all_syms = {"End_of_input"  : tk_EOI, "Op_multiply"  : tk_Mul,

           "Op_divide"      : tk_Div,     "Op_mod"         : tk_Mod,
           "Op_add"         : tk_Add,     "Op_subtract"    : tk_Sub,
           "Op_negate"      : tk_Negate,  "Op_not"         : tk_Not,
           "Op_less"        : tk_Lss,     "Op_lessequal"   : tk_Leq,
           "Op_greater"     : tk_Gtr,     "Op_greaterequal": tk_Geq,
           "Op_equal"       : tk_Eql,     "Op_notequal"    : tk_Neq,
           "Op_assign"      : tk_Assign,  "Op_and"         : tk_And,
           "Op_or"          : tk_Or,      "Keyword_if"     : tk_If,
           "Keyword_else"   : tk_Else,    "Keyword_while"  : tk_While,
           "Keyword_print"  : tk_Print,   "Keyword_putc"   : tk_Putc,
           "LeftParen"      : tk_Lparen,  "RightParen"     : tk_Rparen,
           "LeftBrace"      : tk_Lbrace,  "RightBrace"     : tk_Rbrace,
           "Semicolon"      : tk_Semi,    "Comma"          : tk_Comma,
           "Identifier"     : tk_Ident,   "Integer"        : tk_Integer,
           "String"         : tk_String}

Display_nodes = ["Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts",

   "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add",
   "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual",
   "And", "Or"]

TK_NAME = 0 TK_RIGHT_ASSOC = 1 TK_IS_BINARY = 2 TK_IS_UNARY = 3 TK_PRECEDENCE = 4 TK_NODE = 5

input_file = None err_line = None err_col = None tok = None tok_text = None

        • show error and exit

def error(msg):

   print("(%d, %d) %s" % (int(err_line), int(err_col), msg))
   exit(1)

def gettok():

   global err_line, err_col, tok, tok_text, tok_other
   line = input_file.readline()
   if len(line) == 0:
       error("empty line")
   line_list = shlex.split(line)
   # line col Ident var_name
   # 0    1   2     3
   err_line = line_list[0]
   err_col  = line_list[1]
   tok_text = line_list[2]
   tok = all_syms.get(tok_text)
   if tok == None:
       error("Unknown token %s" % (tok_text))
   tok_other = None
   if tok in [tk_Integer, tk_Ident, tk_String]:
       tok_other = line_list[3]

class Node:

   def __init__(self, node_type, left = None, right = None, value = None):
       self.node_type  = node_type
       self.left  = left
       self.right = right
       self.value = value

def make_node(oper, left, right = None):

   return Node(oper, left, right)

def make_leaf(oper, n):

   return Node(oper, value = n)

def expect(msg, s):

   if tok == s:
       gettok()
       return
   error("%s: Expecting '%s', found '%s'" % (msg, Tokens[s][TK_NAME], Tokens[tok][TK_NAME]))

def expr(p):

   x = None
   if tok == tk_Lparen:
       x = paren_expr()
   elif tok in [tk_Sub, tk_Add]:
       op = (tk_Negate if tok == tk_Sub else tk_Add)
       gettok()
       node = expr(Tokens[tk_Negate][TK_PRECEDENCE])
       x = (make_node(nd_Negate, node) if op == tk_Negate else node)
   elif tok == tk_Not:
       gettok()
       x = make_node(nd_Not, expr(Tokens[tk_Not][TK_PRECEDENCE]))
   elif tok == tk_Ident:
       x = make_leaf(nd_Ident, tok_other)
       gettok()
   elif tok == tk_Integer:
       x = make_leaf(nd_Integer, tok_other)
       gettok()
   else:
       error("Expecting a primary, found: %s" % (Tokens[tok][TK_NAME]))
   while Tokens[tok][TK_IS_BINARY] and Tokens[tok][TK_PRECEDENCE] >= p:
       op = tok
       gettok()
       q = Tokens[op][TK_PRECEDENCE]
       if not Tokens[op][TK_RIGHT_ASSOC]:
           q += 1
       node = expr(q)
       x = make_node(Tokens[op][TK_NODE], x, node)
   return x

def paren_expr():

   expect("paren_expr", tk_Lparen)
   node = expr(0)
   expect("paren_expr", tk_Rparen)
   return node

def stmt():

   t = None
   if tok == tk_If:
       gettok()
       e = paren_expr()
       s = stmt()
       s2 = None
       if tok == tk_Else:
           gettok()
           s2 = stmt()
       t = make_node(nd_If, e, make_node(nd_If, s, s2))
   elif tok == tk_Putc:
       gettok()
       e = paren_expr()
       t = make_node(nd_Prtc, e)
       expect("Putc", tk_Semi)
   elif tok == tk_Print:
       gettok()
       expect("Print", tk_Lparen)
       while True:
           if tok == tk_String:
               e = make_node(nd_Prts, make_leaf(nd_String, tok_other))
               gettok()
           else:
               e = make_node(nd_Prti, expr(0))
           t = make_node(nd_Sequence, t, e)
           if tok != tk_Comma:
               break
           gettok()
       expect("Print", tk_Rparen)
       expect("Print", tk_Semi)
   elif tok == tk_Semi:
       gettok()
   elif tok == tk_Ident:
       v = make_leaf(nd_Ident, tok_other)
       gettok()
       expect("assign", tk_Assign)
       e = expr(0)
       t = make_node(nd_Assign, v, e)
       expect("assign", tk_Semi)
   elif tok == tk_While:
       gettok()
       e = paren_expr()
       s = stmt()
       t = make_node(nd_While, e, s)
   elif tok == tk_Lbrace:
       gettok()
       while tok != tk_Rbrace and tok != tk_EOI:
           t = make_node(nd_Sequence, t, stmt())
       expect("Lbrace", tk_Rbrace)
   elif tok == tk_EOI:
       pass
   else:
       error("Expecting start of statement, found: %s" % (Tokens[tok][TK_NAME]))
   return t

def parse():

   t = None
   gettok()
   while True:
       t = make_node(nd_Sequence, t, stmt())
       if tok == tk_EOI or t == None:
           break
   return t

def prt_ast(t):

   if t == None:
       print(";")
   else:
       print("%-14s" % (Display_nodes[t.node_type]), end=)
       if t.node_type in [nd_Ident, nd_Integer]:
           print("%s" % (t.value))
       elif t.node_type == nd_String:
           print("\"%s\"" %(t.value))
       else:
           print("")
           prt_ast(t.left)
           prt_ast(t.right)
        • main driver

input_file = sys.stdin if len(sys.argv) > 1:

   try:
       input_file = open(sys.argv[1], "r", 4096)
   except IOError as e:
       error(0, 0, "Can't open %s" % sys.argv[1])

t = parse() prt_ast(t)</lang>

Output  —  prime numbers AST:

Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
Assign
Identifier    n
Integer       1
Assign
Identifier    limit
Integer       100
While
Less
Identifier    n
Identifier    limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    k
Integer       3
Assign
Identifier    p
Integer       1
Assign
Identifier    n
Add
Identifier    n
Integer       2
While
And
LessEqual
Multiply
Identifier    k
Identifier    k
Identifier    n
Identifier    p
Sequence
Sequence
;
Assign
Identifier    p
NotEqual
Multiply
Divide
Identifier    n
Identifier    k
Identifier    k
Identifier    n
Assign
Identifier    k
Add
Identifier    k
Integer       2
If
Identifier    p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier    n
;
Prts
String        " is prime\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
;
Sequence
Sequence
Sequence
;
Prts
String        "Total primes found: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;