Compiler/syntax analyzer
A Syntax analyzer transforms a token stream (from the Lexical analyzer) into a Syntax tree, based on a grammar.
Take the output from the Lexical analyzer task, and convert it to an Abstract Syntax Tree (AST), based on the grammar below.
The program should read input from a file and/or stdin, and write output to a file and/or stdout. If the language being used has a parser module/library/class, it would be great if two versions of the solution are provided: One without the parser module, and one with.
The simple programming language to be analyzed is more or less a (very tiny) subset of C. The formal grammar in Extended Backus-Naur Form (EBNF):
<lang EBNF>
stmt_list = {stmt} ;
stmt = ';' | Identifier '=' expr ';' | 'while' paren_expr stmt | 'if' paren_expr stmt ['else' stmt] | 'print' '(' prt_list ')' ';' | 'putc' paren_expr ';' | '{' stmt_list '}' ;
paren_expr = '(' expr ')' ;
prt_list = '(' string | expr {',' String | expr} ')' ;
expr = and_expr {'||' and_expr} ; and_expr = equality_expr {'&&' equality_expr} ; equality_expr = relational_expr [('==' | '!=') relational_expr] ; relational_expr = addition_expr [('<' | '<=' | '>' | '>=') addition_expr] ; addition_expr = multiplication_expr {('+' | '-') multiplication_expr] ; multiplication_expr = primary {('*' | '/' | '%') primary } ; primary = Identifier | Integer | '(' expr ')' | ('+' | '-' | '!') expr ;</lang>
The resulting AST should be formulated as a Binary Tree, that is traversable via Post-Order Depth-first search.
- Example - given the simple program (below), stored in a file called while.t, create the list of tokens, using one of the Lexical analyzer solutions
lex < while.t > while.lex
- Run one of the Syntax analyzer solutions
parse < while.lex > while.ast
- The following table shows the input to lex, lex output, and the AST produced by the parser
Input to lex | Output from lex, input to parse | Output from parse |
---|---|---|
<lang c>count = 1; while (count < 10) { print("count is: ", count, "\n"); count = count + 1; }</lang> |
1 1 Identifier count 1 7 Op_assign 1 9 Integer 1 1 10 Semicolon 2 1 Keyword_while 2 7 LeftParen 2 8 Identifier count 2 14 Op_less 2 16 Integer 10 2 18 RightParen 2 20 LeftBrace 3 5 Keyword_print 3 10 LeftParen 3 11 String "count is: " 3 23 Comma 3 25 Identifier count 3 30 Comma 3 32 String "\n" 3 36 RightParen 3 37 Semicolon 4 5 Identifier count 4 11 Op_assign 4 13 Identifier count 4 19 Op_add 4 21 Integer 1 4 22 Semicolon 5 1 RightBrace 6 1 End_of_input |
Sequence Sequence ; Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence ; Sequence Sequence Sequence ; Prts String "count is: " ; Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1 |
- Specifications
- List of node type names
Identifier String Integer Sequence If Prtc Prts Prti While Assign Negate Not Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or
In the text below, Null/Empty nodes are represented by ";".
- Non-terminal (internal) nodes
For Operators, the following nodes should be created:
Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or
For each of the above nodes, the left and right sub-nodes are the operands of the respective operation.
In pseudo S-Expression format:
(Operator expression expression)
Negate, Not
For these node types, the left node is the operand, and the right node is null.
(Operator expression ;)
Sequence - sub-nodes are either statements or Sequences.
If - left node is the expression, the right node is If node, with it's left node being the if-true statement part, and the right node being the if-false (else) statement part.
(If expression (If statement else-statement))
If there is not an else, the tree becomes:
(If expression (If statement ;))
Prtc
(Prtc (expression) ;)
Prts
(Prts (String "the string") ;)
Prti
(Prti (Integer 12345) ;)
While - left node is the expression, the right node is the statement.
(While expression statement)
Assign - left node is the left-hand side of the assignment, the right node is the right-hand side of the assignment.
(Assign Identifier expression)
Terminal (leaf) nodes:
Identifier: (Identifier ident_name) Integer: (Integer 12345) String: (String "Hello World!") ";": Empty node
Pseudo-code for the parser. Uses Precedence Climbing for expression parsing, and Recursive Descent for statement parsing. The AST is also built:
<lang python>def expr(p)
if tok is "(" x = paren_expr() elif tok in ["-", "+", "!"] gettok() y = expr(precedence of operator) if operator was "+" x = y else x = make_node(operator, y) elif tok is an Identifier x = make_leaf(Identifier, variable name) gettok() elif tok is an Integer constant x = make_leaf(Integer, integer value) gettok() else error()
while tok is a binary operator and precedence of tok >= p save_tok = tok gettok() q = precedence of save_tok if save_tok is not right associative q += 1 x = make_node(Operator save_tok represents, x, expr(q))
return x
def paren_expr()
expect("(") x = expr(0) expect(")") return x
def stmt()
t = None if accept("if") e = paren_expr() s = stmt() t = make_node(If, e, make_node(If, s, accept("else") ? stmt() : None)) elif accept("putc") t = make_node(Prtc, paren_expr()) expect(";") elif accept("print") expect("(") repeat if tok is a string e = make_node(Prts, make_leaf(String, the string)) gettok() else e = make_node(Prti, expr(0))
t = make_node(Sequence, t, e) until not accept(",") expect(")") expect(";") elif tok is ";" gettok() elif tok is an Identifier v = make_leaf(Identifier, variable name) gettok() expect("=") t = make_node(Assign, v, expr(0)) expect(";") elif accept("while") e = paren_expr() t = make_node(While, e, stmt() elif accept("{") while tok not equal "}" and tok not equal end-of-file t = make_node(Sequence, t, stmt()) expect("}") elif tok is end-of-file pass else error() return t
def parse()
t = None gettok() repeat t = make_node(Sequence, t, stmt()) until tok is end-of-file return t</lang>
- Once the AST is built, outputting it is as simple as
<lang python>def prt_ast(t)
if t == None print(";\n") else print(t.node_type) if t.node_type in [Identifier, Integer, String] # leaf node print the value of the Ident, Integer or String, "\n" else print("\n") prt_ast(t.left) prt_ast(t.right)</lang>
- If the AST is correctly built, loading it into a subsequent program should be as simple as
<lang python>def load_ast()
line = readline() # Each line has at least one token line_list = tokenize the line, respecting double quotes
text = line_list[0] # first token is always the node type
if text == ";" # a terminal node return None
node_type = text # could convert to internal form if desired
# A line with two tokens is a leaf node # Leaf nodes are: Identifier, Integer, String # The 2nd token is the value if len(line_list) > 1 return make_leaf(node_type, line_list[1])
left = load_ast() right = load_ast() return make_node(node_type, left, right)</lang>
Finally, the AST can also be tested by running it against one of the AST Interpreter solutions.
- Test program, assuming this is in a file called prime.t
- lex <prime.t | parse
Input to lex | Output from lex, input to parse | Output from parse |
---|---|---|
<lang c>/* Simple prime number generator */ count = 1; n = 1; limit = 100; while (n < limit) { k=3; p=1; n=n+2; while ((k*k<=n) && (p)) { p=n/k*k!=n; k=k+2; } if (p) { print(n, " is prime\n"); count = count + 1; } } print("Total primes found: ", count, "\n");</lang> |
4 1 Identifier count 4 7 Op_assign 4 9 Integer 1 4 10 Semicolon 5 1 Identifier n 5 3 Op_assign 5 5 Integer 1 5 6 Semicolon 6 1 Identifier limit 6 7 Op_assign 6 9 Integer 100 6 12 Semicolon 7 1 Keyword_while 7 7 LeftParen 7 8 Identifier n 7 10 Op_less 7 12 Identifier limit 7 17 RightParen 7 19 LeftBrace 8 5 Identifier k 8 6 Op_assign 8 7 Integer 3 8 8 Semicolon 9 5 Identifier p 9 6 Op_assign 9 7 Integer 1 9 8 Semicolon 10 5 Identifier n 10 6 Op_assign 10 7 Identifier n 10 8 Op_add 10 9 Integer 2 10 10 Semicolon 11 5 Keyword_while 11 11 LeftParen 11 12 LeftParen 11 13 Identifier k 11 14 Op_multiply 11 15 Identifier k 11 16 Op_lessequal 11 18 Identifier n 11 19 RightParen 11 21 Op_and 11 24 LeftParen 11 25 Identifier p 11 26 RightParen 11 27 RightParen 11 29 LeftBrace 12 9 Identifier p 12 10 Op_assign 12 11 Identifier n 12 12 Op_divide 12 13 Identifier k 12 14 Op_multiply 12 15 Identifier k 12 16 Op_notequal 12 18 Identifier n 12 19 Semicolon 13 9 Identifier k 13 10 Op_assign 13 11 Identifier k 13 12 Op_add 13 13 Integer 2 13 14 Semicolon 14 5 RightBrace 15 5 Keyword_if 15 8 LeftParen 15 9 Identifier p 15 10 RightParen 15 12 LeftBrace 16 9 Keyword_print 16 14 LeftParen 16 15 Identifier n 16 16 Comma 16 18 String " is prime\n" 16 31 RightParen 16 32 Semicolon 17 9 Identifier count 17 15 Op_assign 17 17 Identifier count 17 23 Op_add 17 25 Integer 1 17 26 Semicolon 18 5 RightBrace 19 1 RightBrace 20 1 Keyword_print 20 6 LeftParen 20 7 String "Total primes found: " 20 29 Comma 20 31 Identifier count 20 36 Comma 20 38 String "\n" 20 42 RightParen 20 43 Semicolon 21 1 End_of_input |
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ; |
C
Tested with gcc 4.81 and later, compiles warning free with -Wall -Wextra <lang C>#include <stdio.h>
- include <stdlib.h>
- include <string.h>
- include <stdarg.h>
- include <stdbool.h>
- include <ctype.h>
- define NELEMS(arr) (sizeof(arr) / sizeof(arr[0]))
/*#define make_sequence(t, s) if (t == NULL) t = s = make_node(nd_Sequence, stmt(), NULL); \
else s = s->right = make_node(nd_Sequence, stmt(), NULL)*/
- define make_sequence(t, s) t = make_node(nd_Sequence, t, stmt())
typedef enum {
tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr, tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print, tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident, tk_Integer, tk_String
} TokenType;
typedef enum {
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or
} NodeType;
typedef struct {
TokenType tok; int err_ln; int err_col; char *text; /* ident or string literal or integer value */
} tok_s;
typedef struct Tree {
NodeType node_type; struct Tree *left; struct Tree *right; char *value;
} Tree;
// dependency: Ordered by tok, must remain in same order as TokenType enum struct {
char *text, *enum_text; TokenType tok; bool right_associative, is_binary, is_unary; int precedence; NodeType node_type;
} atr[] = {
{"EOI", "End_of_input" , tk_EOI, false, false, false, -1, -1 }, {"*", "Op_multiply" , tk_Mul, false, true, false, 13, nd_Mul }, {"/", "Op_divide" , tk_Div, false, true, false, 13, nd_Div }, {"%", "Op_mod" , tk_Mod, false, true, false, 13, nd_Mod }, {"+", "Op_add" , tk_Add, false, true, false, 12, nd_Add }, {"-", "Op_subtract" , tk_Sub, false, true, false, 12, nd_Sub }, {"-", "Op_negate" , tk_Negate, false, false, true, 14, nd_Negate }, {"!", "Op_not" , tk_Not, false, false, true, 14, nd_Negate }, {"<", "Op_less" , tk_Lss, false, true, false, 10, nd_Lss }, {"<=", "Op_lessequal" , tk_Leq, false, true, false, 10, nd_Leq }, {">", "Op_greater" , tk_Gtr, false, true, false, 10, nd_Gtr }, {">=", "Op_greaterequal", tk_Geq, false, true, false, 10, nd_Gtr }, {"==", "Op_equal" , tk_Eql, false, true, false, 9, nd_Eql }, {"!=", "Op_notequal" , tk_Neq, false, true, false, 9, nd_Neq }, {"=", "Op_assign" , tk_Assign, false, false, false, -1, nd_Assign }, {"&&", "Op_and" , tk_And, false, true, false, 5, nd_And }, {"||", "Op_or" , tk_Or, false, true, false, 4, nd_And }, {"if", "Keyword_if" , tk_If, false, false, false, -1, nd_If }, {"else", "Keyword_else" , tk_Else, false, false, false, -1, -1 }, {"while", "Keyword_while" , tk_While, false, false, false, -1, nd_While }, {"print", "Keyword_print" , tk_Print, false, false, false, -1, -1 }, {"putc", "Keyword_putc" , tk_Putc, false, false, false, -1, -1 }, {"(", "LeftParen" , tk_Lparen, false, false, false, -1, -1 }, {")", "RightParen" , tk_Rparen, false, false, false, -1, -1 }, {"{", "LeftBrace" , tk_Lbrace, false, false, false, -1, -1 }, {"}", "RightBrace" , tk_Rbrace, false, false, false, -1, -1 }, {";", "Semicolon" , tk_Semi, false, false, false, -1, -1 }, {",", "Comma" , tk_Comma, false, false, false, -1, -1 }, {"Ident", "Identifier" , tk_Ident, false, false, false, -1, nd_Ident }, {"Integer literal", "Integer" , tk_Integer, false, false, false, -1, nd_Integer}, {"String literal", "String" , tk_String, false, false, false, -1, nd_String },
};
char *Display_nodes[] = {"Identifier", "String", "Integer", "Sequence", "If", "Prtc",
"Prts", "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual", "And", "Or"};
static tok_s tok; static FILE *source_fp, *dest_fp;
Tree *paren_expr();
void error(int err_line, int err_col, const char *fmt, ... ) {
va_list ap; char buf[1000];
va_start(ap, fmt); vsprintf(buf, fmt, ap); va_end(ap); printf("(%d, %d) error: %s\n", err_line, err_col, buf); exit(1);
}
char *read_line(int *len) {
static char *text = NULL; static int textmax = 0;
for (*len = 0; ; (*len)++) { int ch = fgetc(source_fp); if (ch == EOF || ch == '\n') { if (*len == 0) return NULL; break; } if (*len + 1 >= textmax) { textmax = (textmax == 0 ? 128 : textmax * 2); text = realloc(text, textmax); } text[*len] = ch; } text[*len] = '\0'; return text;
}
char *rtrim(char *text, int *len) { // remove trailing spaces
for (; *len > 0 && isspace(text[*len - 1]); --(*len)) ;
text[*len] = '\0'; return text;
}
TokenType get_enum(const char *name) { // return internal version of name
for (size_t i = 0; i < NELEMS(atr); i++) { if (strcmp(atr[i].enum_text, name) == 0) return atr[i].tok; } error(0, 0, "Unknown token %s\n", name); return 0;
}
tok_s gettok() {
int len; tok_s tok; char *yytext = read_line(&len); yytext = rtrim(yytext, &len);
// [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional
// get line and column tok.err_ln = atoi(strtok(yytext, " ")); tok.err_col = atoi(strtok(NULL, " "));
// get the token name char *name = strtok(NULL, " "); tok.tok = get_enum(name);
// if there is extra data, get it char *p = name + strlen(name); if (p != &yytext[len]) { for (++p; isspace(*p); ++p) ; tok.text = strdup(p); } return tok;
}
Tree *make_node(NodeType node_type, Tree *left, Tree *right) {
Tree *t = calloc(sizeof(Tree), 1); t->node_type = node_type; t->left = left; t->right = right; return t;
}
Tree *make_leaf(NodeType node_type, char *value) {
Tree *t = calloc(sizeof(Tree), 1); t->node_type = node_type; t->value = strdup(value); return t;
}
void expect(const char msg[], TokenType s) {
if (tok.tok == s) { tok = gettok(); return; } error(tok.err_ln, tok.err_col, "%s: Expecting '%s', found '%s'\n", msg, atr[s].text, atr[tok.tok].text);
}
Tree *expr(int p) {
Tree *x = NULL, *node; TokenType op;
switch (tok.tok) { case tk_Lparen: x = paren_expr(); break; case tk_Sub: case tk_Add: op = tok.tok; tok = gettok(); node = expr(atr[tk_Negate].precedence); x = (op == tk_Sub) ? make_node(nd_Negate, node, NULL) : node; break; case tk_Not: tok = gettok(); x = make_node(nd_Not, expr(atr[tk_Not].precedence), NULL); break; case tk_Ident: x = make_leaf(nd_Ident, tok.text); tok = gettok(); break; case tk_Integer: x = make_leaf(nd_Integer, tok.text); tok = gettok(); break; default: error(tok.err_ln, tok.err_col, "Expecting a primary, found: %s\n", atr[tok.tok].text); }
while (atr[tok.tok].is_binary && atr[tok.tok].precedence >= p) { TokenType op = tok.tok;
tok = gettok();
int q = atr[op].precedence; if (!atr[op].right_associative) q++;
node = expr(q); x = make_node(atr[op].node_type, x, node); } return x;
}
Tree *paren_expr() {
expect("paren_expr", tk_Lparen); Tree *t = expr(0); expect("paren_expr", tk_Rparen); return t;
}
Tree *stmt() {
Tree *t = NULL, *v, *e, *s, *s2;
switch (tok.tok) { case tk_If: tok = gettok(); e = paren_expr(); s = stmt(); s2 = NULL; if (tok.tok == tk_Else) { tok = gettok(); s2 = stmt(); } t = make_node(nd_If, e, make_node(nd_If, s, s2)); break; case tk_Putc: tok = gettok(); e = paren_expr(); t = make_node(nd_Prtc, e, NULL); expect("Putc", tk_Semi); break; case tk_Print: /* print '(' expr {',' expr} ')' */ tok = gettok(); for (expect("Print", tk_Lparen); ; expect("Print", tk_Comma)) { if (tok.tok == tk_String) { e = make_node(nd_Prts, make_leaf(nd_String, tok.text), NULL); tok = gettok(); } else e = make_node(nd_Prti, expr(0), NULL);
t = make_node(nd_Sequence, t, e);
if (tok.tok != tk_Comma) break; } expect("Print", tk_Rparen); expect("Print", tk_Semi); break; case tk_Semi: tok = gettok(); break; case tk_Ident: v = make_leaf(nd_Ident, tok.text); tok = gettok(); expect("assign", tk_Assign); e = expr(0); t = make_node(nd_Assign, v, e); expect("assign", tk_Semi); break; case tk_While: tok = gettok(); e = paren_expr(); s = stmt(); t = make_node(nd_While, e, s); break; case tk_Lbrace: /* {stmt} */ for (expect("Lbrace", tk_Lbrace); tok.tok != tk_Rbrace && tok.tok != tk_EOI;) t = make_node(nd_Sequence, t, stmt()); expect("Lbrace", tk_Rbrace); break; case tk_EOI: break; default: error(tok.err_ln, tok.err_col, "expecting start of statement, found '%s'\n", atr[tok.tok].text); } return t;
}
Tree *parse() {
Tree *t = NULL;
tok = gettok(); do { t = make_node(nd_Sequence, t, stmt()); } while (t != NULL && tok.tok != tk_EOI); return t;
}
void prt_ast(Tree *t) {
if (t == NULL) printf(";\n"); else { printf("%-14s ", Display_nodes[t->node_type]); if (t->node_type == nd_Ident || t->node_type == nd_Integer || t->node_type == nd_String) { printf("%s\n", t->value); } else { printf("\n"); prt_ast(t->left); prt_ast(t->right); } }
}
void init_io(FILE **fp, FILE *std, const char mode[], const char fn[]) {
if (fn[0] == '\0') *fp = std; else if ((*fp = fopen(fn, mode)) == NULL) error(0, 0, "Can't open %s\n", fn);
}
int main(int argc, char *argv[]) {
init_io(&source_fp, stdin, "r", argc > 1 ? argv[1] : ""); init_io(&dest_fp, stdout, "wb", argc > 2 ? argv[2] : ""); prt_ast(parse());
}</lang>
- Output — prime numbers AST:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Python
Tested with Python 2.7 and 3.x <lang Python>from __future__ import print_function import sys, shlex, operator
tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr, \ tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print, \ tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident, \ tk_Integer, tk_String = range(31)
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, \ nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, \ nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or = range(25)
- must have same order as above
Tokens = [
["EOI" , False, False, False, -1, -1 ], ["*" , False, True, False, 13, nd_Mul ], ["/" , False, True, False, 13, nd_Div ], ["%" , False, True, False, 13, nd_Mod ], ["+" , False, True, False, 12, nd_Add ], ["-" , False, True, False, 12, nd_Sub ], ["-" , False, False, True, 14, nd_Negate ], ["!" , False, False, True, 14, nd_Not ], ["<" , False, True, False, 10, nd_Lss ], ["<=" , False, True, False, 10, nd_Leq ], [">" , False, True, False, 10, nd_Gtr ], [">=" , False, True, False, 10, nd_Geq ], ["==" , False, True, False, 9, nd_Eql ], ["!=" , False, True, False, 9, nd_Neq ], ["=" , False, False, False, -1, nd_Assign ], ["&&" , False, True, False, 5, nd_And ], ["||" , False, True, False, 4, nd_Or ], ["if" , False, False, False, -1, nd_If ], ["else" , False, False, False, -1, -1 ], ["while" , False, False, False, -1, nd_While ], ["print" , False, False, False, -1, -1 ], ["putc" , False, False, False, -1, -1 ], ["(" , False, False, False, -1, -1 ], [")" , False, False, False, -1, -1 ], ["{" , False, False, False, -1, -1 ], ["}" , False, False, False, -1, -1 ], [";" , False, False, False, -1, -1 ], ["," , False, False, False, -1, -1 ], ["Ident" , False, False, False, -1, nd_Ident ], ["Integer literal" , False, False, False, -1, nd_Integer], ["String literal" , False, False, False, -1, nd_String ] ]
all_syms = {"End_of_input" : tk_EOI, "Op_multiply" : tk_Mul,
"Op_divide" : tk_Div, "Op_mod" : tk_Mod, "Op_add" : tk_Add, "Op_subtract" : tk_Sub, "Op_negate" : tk_Negate, "Op_not" : tk_Not, "Op_less" : tk_Lss, "Op_lessequal" : tk_Leq, "Op_greater" : tk_Gtr, "Op_greaterequal": tk_Geq, "Op_equal" : tk_Eql, "Op_notequal" : tk_Neq, "Op_assign" : tk_Assign, "Op_and" : tk_And, "Op_or" : tk_Or, "Keyword_if" : tk_If, "Keyword_else" : tk_Else, "Keyword_while" : tk_While, "Keyword_print" : tk_Print, "Keyword_putc" : tk_Putc, "LeftParen" : tk_Lparen, "RightParen" : tk_Rparen, "LeftBrace" : tk_Lbrace, "RightBrace" : tk_Rbrace, "Semicolon" : tk_Semi, "Comma" : tk_Comma, "Identifier" : tk_Ident, "Integer" : tk_Integer, "String" : tk_String}
Display_nodes = ["Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts",
"Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual", "And", "Or"]
TK_NAME = 0 TK_RIGHT_ASSOC = 1 TK_IS_BINARY = 2 TK_IS_UNARY = 3 TK_PRECEDENCE = 4 TK_NODE = 5
input_file = None err_line = None err_col = None tok = None tok_text = None
- show error and exit
def error(msg):
print("(%d, %d) %s" % (int(err_line), int(err_col), msg)) exit(1)
def gettok():
global err_line, err_col, tok, tok_text, tok_other line = input_file.readline() if len(line) == 0: error("empty line")
line_list = shlex.split(line) # line col Ident var_name # 0 1 2 3
err_line = line_list[0] err_col = line_list[1] tok_text = line_list[2]
tok = all_syms.get(tok_text) if tok == None: error("Unknown token %s" % (tok_text))
tok_other = None if tok in [tk_Integer, tk_Ident, tk_String]: tok_other = line_list[3]
class Node:
def __init__(self, node_type, left = None, right = None, value = None): self.node_type = node_type self.left = left self.right = right self.value = value
def make_node(oper, left, right = None):
return Node(oper, left, right)
def make_leaf(oper, n):
return Node(oper, value = n)
def expect(msg, s):
if tok == s: gettok() return error("%s: Expecting '%s', found '%s'" % (msg, Tokens[s][TK_NAME], Tokens[tok][TK_NAME]))
def expr(p):
x = None
if tok == tk_Lparen: x = paren_expr() elif tok in [tk_Sub, tk_Add]: op = (tk_Negate if tok == tk_Sub else tk_Add) gettok() node = expr(Tokens[tk_Negate][TK_PRECEDENCE]) x = (make_node(nd_Negate, node) if op == tk_Negate else node) elif tok == tk_Not: gettok() x = make_node(nd_Not, expr(Tokens[tk_Not][TK_PRECEDENCE])) elif tok == tk_Ident: x = make_leaf(nd_Ident, tok_other) gettok() elif tok == tk_Integer: x = make_leaf(nd_Integer, tok_other) gettok() else: error("Expecting a primary, found: %s" % (Tokens[tok][TK_NAME]))
while Tokens[tok][TK_IS_BINARY] and Tokens[tok][TK_PRECEDENCE] >= p: op = tok gettok() q = Tokens[op][TK_PRECEDENCE] if not Tokens[op][TK_RIGHT_ASSOC]: q += 1
node = expr(q) x = make_node(Tokens[op][TK_NODE], x, node)
return x
def paren_expr():
expect("paren_expr", tk_Lparen) node = expr(0) expect("paren_expr", tk_Rparen) return node
def stmt():
t = None
if tok == tk_If: gettok() e = paren_expr() s = stmt() s2 = None if tok == tk_Else: gettok() s2 = stmt() t = make_node(nd_If, e, make_node(nd_If, s, s2)) elif tok == tk_Putc: gettok() e = paren_expr() t = make_node(nd_Prtc, e) expect("Putc", tk_Semi) elif tok == tk_Print: gettok() expect("Print", tk_Lparen) while True: if tok == tk_String: e = make_node(nd_Prts, make_leaf(nd_String, tok_other)) gettok() else: e = make_node(nd_Prti, expr(0))
t = make_node(nd_Sequence, t, e) if tok != tk_Comma: break gettok() expect("Print", tk_Rparen) expect("Print", tk_Semi) elif tok == tk_Semi: gettok() elif tok == tk_Ident: v = make_leaf(nd_Ident, tok_other) gettok() expect("assign", tk_Assign) e = expr(0) t = make_node(nd_Assign, v, e) expect("assign", tk_Semi) elif tok == tk_While: gettok() e = paren_expr() s = stmt() t = make_node(nd_While, e, s) elif tok == tk_Lbrace: gettok() while tok != tk_Rbrace and tok != tk_EOI: t = make_node(nd_Sequence, t, stmt()) expect("Lbrace", tk_Rbrace) elif tok == tk_EOI: pass else: error("Expecting start of statement, found: %s" % (Tokens[tok][TK_NAME]))
return t
def parse():
t = None gettok() while True: t = make_node(nd_Sequence, t, stmt()) if tok == tk_EOI or t == None: break return t
def prt_ast(t):
if t == None: print(";") else: print("%-14s" % (Display_nodes[t.node_type]), end=) if t.node_type in [nd_Ident, nd_Integer]: print("%s" % (t.value)) elif t.node_type == nd_String: print("\"%s\"" %(t.value)) else: print("") prt_ast(t.left) prt_ast(t.right)
- main driver
input_file = sys.stdin if len(sys.argv) > 1:
try: input_file = open(sys.argv[1], "r", 4096) except IOError as e: error(0, 0, "Can't open %s" % sys.argv[1])
t = parse() prt_ast(t)</lang>
- Output — prime numbers AST:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;