Compiler/syntax analyzer
Syntax Analyzer
You are encouraged to solve this task according to the task description, using any language you may know.
A Syntax analyzer transforms a token stream (from the Lexical analyzer) into a Syntax tree, based on a grammar.
Take the output from the Lexical analyzer task, and convert it to an Abstract Syntax Tree (AST), based on the grammar below. The output should be in a flattened format.
The program should read input from a file and/or stdin, and write output to a file and/or stdout. If the language being used has a parser module/library/class, it would be great if two versions of the solution are provided: One without the parser module, and one with.
The simple programming language to be analyzed is more or less a (very tiny) subset of C. The formal grammar in Extended Backus-Naur Form (EBNF):
<lang EBNF>
stmt_list = {stmt} ;
stmt = ';' | Identifier '=' expr ';' | 'while' paren_expr stmt | 'if' paren_expr stmt ['else' stmt] | 'print' '(' prt_list ')' ';' | 'putc' paren_expr ';' | '{' stmt_list '}' ;
paren_expr = '(' expr ')' ;
prt_list = (string | expr) {',' (String | expr)} ;
expr = and_expr {'||' and_expr} ; and_expr = equality_expr {'&&' equality_expr} ; equality_expr = relational_expr [('==' | '!=') relational_expr] ; relational_expr = addition_expr [('<' | '<=' | '>' | '>=') addition_expr] ; addition_expr = multiplication_expr {('+' | '-') multiplication_expr} ; multiplication_expr = primary {('*' | '/' | '%') primary } ; primary = Identifier | Integer | '(' expr ')' | ('+' | '-' | '!') primary ;</lang>
The resulting AST should be formulated as a Binary Tree.
- Example - given the simple program (below), stored in a file called while.t, create the list of tokens, using one of the Lexical analyzer solutions
lex < while.t > while.lex
- Run one of the Syntax analyzer solutions
parse < while.lex > while.ast
- The following table shows the input to lex, lex output, and the AST produced by the parser
Input to lex | Output from lex, input to parse | Output from parse |
---|---|---|
<lang c>count = 1; while (count < 10) { print("count is: ", count, "\n"); count = count + 1; }</lang> |
1 1 Identifier count 1 7 Op_assign 1 9 Integer 1 1 10 Semicolon 2 1 Keyword_while 2 7 LeftParen 2 8 Identifier count 2 14 Op_less 2 16 Integer 10 2 18 RightParen 2 20 LeftBrace 3 5 Keyword_print 3 10 LeftParen 3 11 String "count is: " 3 23 Comma 3 25 Identifier count 3 30 Comma 3 32 String "\n" 3 36 RightParen 3 37 Semicolon 4 5 Identifier count 4 11 Op_assign 4 13 Identifier count 4 19 Op_add 4 21 Integer 1 4 22 Semicolon 5 1 RightBrace 6 1 End_of_input |
Sequence Sequence ; Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence ; Sequence Sequence Sequence ; Prts String "count is: " ; Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1 |
- Specifications
- List of node type names
Identifier String Integer Sequence If Prtc Prts Prti While Assign Negate Not Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or
In the text below, Null/Empty nodes are represented by ";".
- Non-terminal (internal) nodes
For Operators, the following nodes should be created:
Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or
For each of the above nodes, the left and right sub-nodes are the operands of the respective operation.
In pseudo S-Expression format:
(Operator expression expression)
Negate, Not
For these node types, the left node is the operand, and the right node is null.
(Operator expression ;)
Sequence - sub-nodes are either statements or Sequences.
If - left node is the expression, the right node is If node, with it's left node being the if-true statement part, and the right node being the if-false (else) statement part.
(If expression (If statement else-statement))
If there is not an else, the tree becomes:
(If expression (If statement ;))
Prtc
(Prtc (expression) ;)
Prts
(Prts (String "the string") ;)
Prti
(Prti (Integer 12345) ;)
While - left node is the expression, the right node is the statement.
(While expression statement)
Assign - left node is the left-hand side of the assignment, the right node is the right-hand side of the assignment.
(Assign Identifier expression)
Terminal (leaf) nodes:
Identifier: (Identifier ident_name) Integer: (Integer 12345) String: (String "Hello World!") ";": Empty node
- Some simple examples
Sequences denote a list node; they are used to represent a list. semicolon's represent a null node, e.g., the end of this path.
This simple program:
a=11;
Produces the following AST, encoded as a binary tree:
Under each non-leaf node are two '|' lines. The first represents the left sub-node, the second represents the right sub-node:
(1) Sequence (2) |-- ; (3) |-- Assign (4) |-- Identifier: a (5) |-- Integer: 11
In flattened form:
(1) Sequence (2) ; (3) Assign (4) Identifier a (5) Integer 11
This program:
a=11; b=22; c=33;
Produces the following AST:
( 1) Sequence ( 2) |-- Sequence ( 3) | |-- Sequence ( 4) | | |-- ; ( 5) | | |-- Assign ( 6) | | |-- Identifier: a ( 7) | | |-- Integer: 11 ( 8) | |-- Assign ( 9) | |-- Identifier: b (10) | |-- Integer: 22 (11) |-- Assign (12) |-- Identifier: c (13) |-- Integer: 33
In flattened form:
( 1) Sequence ( 2) Sequence ( 3) Sequence ( 4) ; ( 5) Assign ( 6) Identifier a ( 7) Integer 11 ( 8) Assign ( 9) Identifier b (10) Integer 22 (11) Assign (12) Identifier c (13) Integer 33
- Pseudo-code for the parser.
Uses Precedence Climbing for expression parsing, and Recursive Descent for statement parsing. The AST is also built:
<lang python>def expr(p)
if tok is "(" x = paren_expr() elif tok in ["-", "+", "!"] gettok() y = expr(precedence of operator) if operator was "+" x = y else x = make_node(operator, y) elif tok is an Identifier x = make_leaf(Identifier, variable name) gettok() elif tok is an Integer constant x = make_leaf(Integer, integer value) gettok() else error()
while tok is a binary operator and precedence of tok >= p save_tok = tok gettok() q = precedence of save_tok if save_tok is not right associative q += 1 x = make_node(Operator save_tok represents, x, expr(q))
return x
def paren_expr()
expect("(") x = expr(0) expect(")") return x
def stmt()
t = NULL if accept("if") e = paren_expr() s = stmt() t = make_node(If, e, make_node(If, s, accept("else") ? stmt() : NULL)) elif accept("putc") t = make_node(Prtc, paren_expr()) expect(";") elif accept("print") expect("(") repeat if tok is a string e = make_node(Prts, make_leaf(String, the string)) gettok() else e = make_node(Prti, expr(0))
t = make_node(Sequence, t, e) until not accept(",") expect(")") expect(";") elif tok is ";" gettok() elif tok is an Identifier v = make_leaf(Identifier, variable name) gettok() expect("=") t = make_node(Assign, v, expr(0)) expect(";") elif accept("while") e = paren_expr() t = make_node(While, e, stmt() elif accept("{") while tok not equal "}" and tok not equal end-of-file t = make_node(Sequence, t, stmt()) expect("}") elif tok is end-of-file pass else error() return t
def parse()
t = NULL gettok() repeat t = make_node(Sequence, t, stmt()) until tok is end-of-file return t</lang>
- Once the AST is built, it should be output in a flattened format. This can be as simple as the following
<lang python>def prt_ast(t)
if t == NULL print(";\n") else print(t.node_type) if t.node_type in [Identifier, Integer, String] # leaf node print the value of the Ident, Integer or String, "\n" else print("\n") prt_ast(t.left) prt_ast(t.right)</lang>
- If the AST is correctly built, loading it into a subsequent program should be as simple as
<lang python>def load_ast()
line = readline() # Each line has at least one token line_list = tokenize the line, respecting double quotes
text = line_list[0] # first token is always the node type
if text == ";" # a terminal node return NULL
node_type = text # could convert to internal form if desired
# A line with two tokens is a leaf node # Leaf nodes are: Identifier, Integer, String # The 2nd token is the value if len(line_list) > 1 return make_leaf(node_type, line_list[1])
left = load_ast() right = load_ast() return make_node(node_type, left, right)</lang>
Finally, the AST can also be tested by running it against one of the AST Interpreter solutions.
- Test program, assuming this is in a file called prime.t
- lex <prime.t | parse
Input to lex | Output from lex, input to parse | Output from parse |
---|---|---|
<lang c>/* Simple prime number generator */ count = 1; n = 1; limit = 100; while (n < limit) { k=3; p=1; n=n+2; while ((k*k<=n) && (p)) { p=n/k*k!=n; k=k+2; } if (p) { print(n, " is prime\n"); count = count + 1; } } print("Total primes found: ", count, "\n");</lang> |
4 1 Identifier count 4 7 Op_assign 4 9 Integer 1 4 10 Semicolon 5 1 Identifier n 5 3 Op_assign 5 5 Integer 1 5 6 Semicolon 6 1 Identifier limit 6 7 Op_assign 6 9 Integer 100 6 12 Semicolon 7 1 Keyword_while 7 7 LeftParen 7 8 Identifier n 7 10 Op_less 7 12 Identifier limit 7 17 RightParen 7 19 LeftBrace 8 5 Identifier k 8 6 Op_assign 8 7 Integer 3 8 8 Semicolon 9 5 Identifier p 9 6 Op_assign 9 7 Integer 1 9 8 Semicolon 10 5 Identifier n 10 6 Op_assign 10 7 Identifier n 10 8 Op_add 10 9 Integer 2 10 10 Semicolon 11 5 Keyword_while 11 11 LeftParen 11 12 LeftParen 11 13 Identifier k 11 14 Op_multiply 11 15 Identifier k 11 16 Op_lessequal 11 18 Identifier n 11 19 RightParen 11 21 Op_and 11 24 LeftParen 11 25 Identifier p 11 26 RightParen 11 27 RightParen 11 29 LeftBrace 12 9 Identifier p 12 10 Op_assign 12 11 Identifier n 12 12 Op_divide 12 13 Identifier k 12 14 Op_multiply 12 15 Identifier k 12 16 Op_notequal 12 18 Identifier n 12 19 Semicolon 13 9 Identifier k 13 10 Op_assign 13 11 Identifier k 13 12 Op_add 13 13 Integer 2 13 14 Semicolon 14 5 RightBrace 15 5 Keyword_if 15 8 LeftParen 15 9 Identifier p 15 10 RightParen 15 12 LeftBrace 16 9 Keyword_print 16 14 LeftParen 16 15 Identifier n 16 16 Comma 16 18 String " is prime\n" 16 31 RightParen 16 32 Semicolon 17 9 Identifier count 17 15 Op_assign 17 17 Identifier count 17 23 Op_add 17 25 Integer 1 17 26 Semicolon 18 5 RightBrace 19 1 RightBrace 20 1 Keyword_print 20 6 LeftParen 20 7 String "Total primes found: " 20 29 Comma 20 31 Identifier count 20 36 Comma 20 38 String "\n" 20 42 RightParen 20 43 Semicolon 21 1 End_of_input |
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ; |
- Additional examples
Your solution should pass all the test cases above and the additional tests found Here.
The C and Python versions can be considered reference implementations.
- Related Tasks
ALGOL W
<lang algolw>begin % syntax analyser %
% parse tree nodes % record node( integer type ; reference(node) left, right ; integer iValue % nString/nIndentifier number or nInteger value % ); integer nIdentifier, nString, nInteger, nSequence, nIf, nPrtc, nPrts , nPrti, nWhile, nAssign, nNegate, nNot, nMultiply , nDivide, nMod, nAdd, nSubtract, nLess, nLessEqual , nGreater, nGreaterEqual, nEqual, nNotEqual, nAnd, nOr ; string(14) array ndName ( 1 :: 25 ); % tokens - names must match those output by the lexical analyser % integer tkType, tkLine, tkColumn, tkLength, tkIntegerValue; integer tOp_multiply , tOp_divide , tOp_mod , tOp_add , tOp_subtract , tOp_negate , tOp_less , tOp_lessequal , tOp_greater , tOp_greaterequal , tOp_equal , tOp_notequal , tOp_not , tOp_assign , tOp_and , tOp_or , tLeftParen , tRightParen , tLeftBrace , tRightBrace , tSemicolon , tComma , tKeyword_if , tKeyword_else , tKeyword_while , tKeyword_print , tKeyword_putc , tIdentifier , tInteger , tString , tEnd_of_input , MAX_TOKEN_TYPE, PRIMARY_PREC ; string(16) array tkName ( 1 :: 31 ); integer array tkPrec, tkNode ( 1 :: 31 ); % string literals and identifiers - uses a linked list - a hash table might be better... % string(1) array text ( 0 :: 4095 ); integer textNext, TEXT_MAX; record textElement ( integer start, length; reference(textElement) next ); reference(textElement) idList, stList;
% returns a new node with left and right branches % reference(node) procedure opNode ( integer value opType; reference(node) value opLeft, opRight ) ; begin node( opType, opLeft, opRight, 0 ) end opNode ;
% returns a new operand node % reference(node) procedure operandNode ( integer value opType, opValue ) ; begin node( opType, null, null, opValue ) end operandNode ;
% reports an error % procedure synError( integer value line, column; string(80) value message ); begin integer errorPos; write( i_w := 1, s_w := 0, "**** Error at(", line, ",", column, "): " ); errorPos := 0; while errorPos < 80 and message( errorPos // 1 ) not = "." do begin writeon( s_w := 0, message( errorPos // 1 ) ); errorPos := errorPos + 1 end while_not_at_end_of_message ; writeon( s_w := 0, "." ) end synError ;
% reports an error and stops % procedure fatalError( integer value line, column; string(80) value message ); begin synError( line, column, message ); assert( false ) end fatalError ;
% prints a node and its sub-nodes % procedure writeNode( reference(node) value n ) ; begin % prints an identifier or string from text % procedure writeOnText( reference(textElement) value txHead; integer value txNumber ) ; begin reference(textElement) txPos; integer count; txPos := txHead; count := 1; while count < txNumber and txPos not = null do begin txPos := next(txPos); count := count + 1 end while_text_element_not_found ; if txPos = null then fatalError( 0, txNumber, "INTERNAL ERROR: text not found." ) else for cPos := 0 until length(txPos) - 1 do writeon( text( start(txPos) + cPos ) ); if text( start(txPos) ) = """" then writeon( """" ); end writeOnText ;
if n = null then write( ";" ) else begin write( ndName( type(n) ) ); if type(n) = nInteger then writeon( iValue(n) ) else if type(n) = nIdentifier then writeOnText( idList, iValue(n) ) else if type(n) = nString then writeOnText( stList, iValue(n) ) else begin writeNode( left(n) ); writeNode( right(n) ) end end end writeNode ;
% reads a token from standard input % procedure readToken ; begin
% parses a string from line and stores it in a string in the text array % % - if it is not already present in the specified textElement list. % % returns the position of the string in the text array % integer procedure readString ( reference(textElement) value result txList; string(1) value terminator ) ; begin string(256) str; integer sLen, sPos, ePos; logical found; reference(textElement) txPos, txLastPos; % get the text of the string % str := " "; sLen := 0; str( sLen // 1 ) := line( lPos // 1 ); sLen := sLen + 1; lPos := lPos + 1; while lPos <= 255 and line( lPos // 1 ) not = terminator do begin str( sLen // 1 ) := line( lPos // 1 ); sLen := sLen + 1; lPos := lPos + 1 end while_more_string ; if lPos > 255 then fatalError( tkLine, tkColumn, "Unterminated String in token file." ); % attempt to find the text in the list of strings/identifiers % txLastPos := txPos := txList; found := false; ePos := 0; while not found and txPos not = null do begin ePos := ePos + 1; found := ( length(txPos) = sLen ); sPos := 0; while found and sPos < sLen do begin found := str( sPos // 1 ) = text( start(txPos) + sPos ); sPos := sPos + 1 end while_not_found ; txLastPos := txPos; if not found then txPos := next(txPos) end while_string_not_found ; if not found then begin % the string/identifier is not in the list - add it % ePos := ePos + 1; if txList = null then txList := textElement( textNext, sLen, null ) else next(txLastPos) := textElement( textNext, sLen, null ); if textNext + sLen > TEXT_MAX then fatalError( tkLine, tkColumn, "Text space exhausted." ) else begin for cPos := 0 until sLen - 1 do begin text( textNext ) := str( cPos // 1 ); textNext := textNext + 1 end for_cPos end end if_not_found ; ePos end readString ;
% gets an integer from the line - no checks for valid digits % integer procedure readInteger ; begin integer n; while line( lPos // 1 ) = " " do lPos := lPos + 1; n := 0; while line( lPos // 1 ) not = " " do begin n := ( n * 10 ) + ( decode( line( lPos // 1 ) ) - decode( "0" ) ); lPos := lPos + 1 end while_not_end_of_integer ; n end readInteger ;
string(256) line; string(16) name; integer lPos, tPos; tPos := lPos := 0; readcard( line ); % get the line and column numbers % tkLine := readInteger; tkColumn := readInteger; % get the token name % while line( lPos // 1 ) = " " do lPos := lPos + 1; name := ""; while lPos < 256 and line( lPos // 1 ) not = " " do begin name( tPos // 1 ) := line( lPos // 1 ); lPos := lPos + 1; tPos := tPos + 1 end while_more_name ; % determine the token type % tkType := 1; tkIntegerValue := 0; while tkType <= MAX_TOKEN_TYPE and name not = tkName( tkType ) do tkType := tkType + 1; if tkType > MAX_TOKEN_TYPE then fatalError( tkLine, tkColumn, "Malformed token" ); % handle the additional parameter for identifier/string/integer % if tkType = tInteger or tkType = tIdentifier or tkType = tString then begin while line( lPos // 1 ) = " " do lPos := lPos + 1; if tkType = tInteger then tkIntegerValue := readInteger else if tkType = tIdentifier then tkIntegerValue := readString( idList, " " ) else % tkType = tString % tkIntegerValue := readString( stList, """" ) end if_token_with_additional_parameter ; end readToken ;
% parses a statement % reference(node) procedure parseStatement ; begin reference(node) stmtNode, stmtExpr;
% skips the current token if it is expectedToken, % % returns true if the token was expectedToken, false otherwise % logical procedure have ( integer value expectedToken ) ; begin logical haveExpectedToken; haveExpectedToken := ( tkType = expectedToken ); if haveExpectedToken and tkType not = tEnd_of_input then readToken; haveExpectedToken end have ;
% issues an error message and skips past the next semi-colon or to end of input % procedure skipStatement ( string(80) value message ) ; begin synError( tkLine, tkColumn, message ); while tkType not = tEnd_of_input and not have( tSemicolon ) do readToken end skipStatement ;
% checks we have a semicolon, issues an error and skips the statement if not % procedure mustBeEndOfStatement ; begin if not have( tSemicolon ) then skipStatement( """;"" expected." ) end mustBeEndOfStatement ;
% skips the current token if it is "(" and issues an error if it isn't % procedure mustBeLeftParen ; begin if not have( tLeftParen ) then synError( tkLine, tkColumn, """("" expected." ) end % mustBeLeftParen % ;
% skips the current token if it is ")" and issues an error if it isn't % procedure mustBeRightParen ; begin if not have( tRightParen ) then synError( tkLine, tkColumn, """)"" expected." ) end % mustBeRightParen % ;
% gets the next token and parses an expression with the specified precedence % reference(node) procedure nextAndparseExpr ( integer value precedence ) ; begin readToken; parseExpr( precedence ) end nextAndParseExpr ;
% parses an expression with the specified precedence % % all operators are assumed to be left-associative % reference(node) procedure parseExpr ( integer value precedence ) ; begin
% handles a single token primary % reference(node) procedure simplePrimary ( integer value primaryNodeType ) ; begin reference(node) primaryNode; primaryNode := operandNode( primaryNodeType, tkIntegerValue ); readToken; primaryNode end simplePrimary ;
reference(node) exprNode;
if precedence < PRIMARY_PREC then begin exprNode := parseExpr( precedence + 1 ); while tkPrec( tkType ) = precedence do begin integer op; op := tkNode( tkType ); exprNode := opNode( op, exprNode, nextAndParseExpr( precedence + 1 ) ) end while_op_at_this_precedence_level end else if tkType = tIdentifier then exprNode := simplePrimary( nIdentifier ) else if tkType = tInteger then exprNode := simplePrimary( nInteger ) else if tkType = nString then begin synError( tkLine, tkColumn, "Unexpected string literal." ); exprNode := simplePrimary( nInteger ) end else if tkType = tLeftParen then exprNode := parseParenExpr else if tkType = tOp_add then exprNode := nextAndParseExpr( precedence ) else if tkType = tOp_subtract then exprNode := opNode( nNegate, nextAndParseExpr( precedence ), null ) else if tkType = tOp_not then exprNode := opNode( nNot, nextAndParseExpr( precedence ), null ) else begin synError( tkLine, tkColumn, "Syntax error in expression." ); exprNode := simplePrimary( nInteger ) end; exprNode end parseExpr ;
% parses a preenthesised expression % reference(node) procedure parseParenExpr ; begin reference(node) exprNode; mustBeLeftParen; exprNode := parseExpr( 0 ); mustBeRightParen; exprNode end parseParenExpr ;
% parse statement depending on it's first token % if tkType = tIdentifier then begin % assignment statement % stmtExpr := operandNode( nIdentifier, tkIntegerValue ); % skip the identifier and check for "=" % readToken; if not have( tOp_Assign ) then synError( tkLine, tkColumn, "Expected ""="" in assignment statement." ); stmtNode := opNode( nAssign, stmtExpr, parseExpr( 0 ) ); mustBeEndOfStatement end else if have( tKeyword_while ) then begin stmtExpr := parseParenExpr; stmtNode := opNode( nWhile, stmtExpr, parseStatement ) end else if have( tkeyword_if ) then begin stmtExpr := parseParenExpr; stmtNode := opNode( nIf, stmtExpr, opNode( nIf, parseStatement, null ) ); if have( tKeyword_else ) then % have an "else" part % right(right(stmtNode)) := parseStatement end else if have( tKeyword_Print ) then begin mustBeLeftParen; stmtNode := null; while begin if tkType = tString then begin stmtNode := opNode( nSequence, stmtNode, opNode( nPrts, operandNode( nString, tkIntegerValue ), null ) ); readToken end else stmtNode := opNode( nSequence, stmtNode, opNode( nPrti, parseExpr( 0 ), null ) ); have( tComma ) end do begin end; mustBeRightparen; mustBeEndOfStatement; end else if have( tKeyword_Putc ) then begin stmtNode := opNode( nPrtc, parseParenExpr, null ); mustBeEndOfStatement end else if have( tLeftBrace ) then begin % block % stmtNode := parseStatementList( tRightBrace ); if not have( tRightBrace ) then synError( tkLine, tkColumn, "Expected ""}""." ); end else if have( tSemicolon ) then stmtNode := null else begin % unrecognised statement % skipStatement( "Unrecognised statement." ); stmtNode := null end if_various_tokens ; stmtNode end parseStatement ;
% parses a statement list ending with the specified terminator % reference(node) procedure parseStatementList ( integer value terminator ) ; begin reference(node) listNode; listNode := null; while tkType not = terminator and tkType not = tEnd_of_input do listNode := opNode( nSequence, listNode, parseStatement ); listNode end parseStatementList ;
nIdentifier := 1; ndName( nIdentifier ) := "Identifier"; nString := 2; ndName( nString ) := "String"; nInteger := 3; ndName( nInteger ) := "Integer"; nSequence := 4; ndName( nSequence ) := "Sequence"; nIf := 5; ndName( nIf ) := "If"; nPrtc := 6; ndName( nPrtc ) := "Prtc"; nPrts := 7; ndName( nPrts ) := "Prts"; nPrti := 8; ndName( nPrti ) := "Prti"; nWhile := 9; ndName( nWhile ) := "While"; nAssign := 10; ndName( nAssign ) := "Assign"; nNegate := 11; ndName( nNegate ) := "Negate"; nNot := 12; ndName( nNot ) := "Not"; nMultiply := 13; ndName( nMultiply ) := "Multiply"; nDivide := 14; ndName( nDivide ) := "Divide"; nMod := 15; ndName( nMod ) := "Mod"; nAdd := 16; ndName( nAdd ) := "Add"; nSubtract := 17; ndName( nSubtract ) := "Subtract"; nLess := 18; ndName( nLess ) := "Less"; nLessEqual := 19; ndName( nLessEqual ) := "LessEqual" ; nGreater := 20; ndName( nGreater ) := "Greater"; nGreaterEqual := 21; ndName( nGreaterEqual ) := "GreaterEqual"; nEqual := 22; ndName( nEqual ) := "Equal"; nNotEqual := 23; ndName( nNotEqual ) := "NotEqual"; nAnd := 24; ndName( nAnd ) := "And"; nOr := 25; ndName( nOr ) := "Or"; tOp_multiply := 1; tkName( tOp_multiply ) := "Op_multiply"; tkPrec( tOp_multiply ) := 5; tOp_divide := 2; tkName( tOp_divide ) := "Op_divide"; tkPrec( tOp_divide ) := 5; tOp_mod := 3; tkName( tOp_mod ) := "Op_mod"; tkPrec( tOp_mod ) := 5; tOp_add := 4; tkName( tOp_add ) := "Op_add"; tkPrec( tOp_add ) := 4; tOp_subtract := 5; tkName( tOp_subtract ) := "Op_subtract"; tkPrec( tOp_subtract ) := 4; tOp_negate := 6; tkName( tOp_negate ) := "Op_negate"; tkPrec( tOp_negate ) := -1; tOp_less := 7; tkName( tOp_less ) := "Op_less"; tkPrec( tOp_less ) := 3; tOp_lessequal := 8; tkName( tOp_lessequal ) := "Op_lessequal"; tkPrec( tOp_lessequal ) := 3; tOp_greater := 9; tkName( tOp_greater ) := "Op_greater"; tkPrec( tOp_greater ) := 3; tOp_greaterequal := 10; tkName( tOp_greaterequal ) := "Op_greaterequal"; tkPrec( tOp_greaterequal ) := 3; tOp_equal := 11; tkName( tOp_equal ) := "Op_equal"; tkPrec( tOp_equal ) := 2; tOp_notequal := 12; tkName( tOp_notequal ) := "Op_notequal"; tkPrec( tOp_notequal ) := 2; tOp_not := 13; tkName( tOp_not ) := "Op_not"; tkPrec( tOp_not ) := -1; tOp_assign := 14; tkName( tOp_assign ) := "Op_assign"; tkPrec( tOp_assign ) := -1; tOp_and := 15; tkName( tOp_and ) := "Op_and"; tkPrec( tOp_and ) := 1; tOp_or := 16; tkName( tOp_or ) := "Op_or"; tkPrec( tOp_or ) := 0; tLeftParen := 17; tkName( tLeftParen ) := "LeftParen"; tkPrec( tLeftParen ) := -1; tRightParen := 18; tkName( tRightParen ) := "RightParen"; tkPrec( tRightParen ) := -1; tLeftBrace := 19; tkName( tLeftBrace ) := "LeftBrace"; tkPrec( tLeftBrace ) := -1; tRightBrace := 20; tkName( tRightBrace ) := "RightBrace"; tkPrec( tRightBrace ) := -1; tSemicolon := 21; tkName( tSemicolon ) := "Semicolon"; tkPrec( tSemicolon ) := -1; tComma := 22; tkName( tComma ) := "Comma"; tkPrec( tComma ) := -1; tKeyword_if := 23; tkName( tKeyword_if ) := "Keyword_if"; tkPrec( tKeyword_if ) := -1; tKeyword_else := 24; tkName( tKeyword_else ) := "Keyword_else"; tkPrec( tKeyword_else ) := -1; tKeyword_while := 25; tkName( tKeyword_while ) := "Keyword_while"; tkPrec( tKeyword_while ) := -1; tKeyword_print := 26; tkName( tKeyword_print ) := "Keyword_print"; tkPrec( tKeyword_print ) := -1; tKeyword_putc := 27; tkName( tKeyword_putc ) := "Keyword_putc"; tkPrec( tKeyword_putc ) := -1; tIdentifier := 28; tkName( tIdentifier ) := "Identifier"; tkPrec( tIdentifier ) := -1; tInteger := 29; tkName( tInteger ) := "Integer"; tkPrec( tInteger ) := -1; tString := 30; tkName( tString ) := "String"; tkPrec( tString ) := -1; tEnd_of_input := 31; tkName( tEnd_of_input ) := "End_of_input"; tkPrec( tEnd_of_input ) := -1; MAX_TOKEN_TYPE := 31; TEXT_MAX := 4095; textNext := 0; PRIMARY_PREC := 6; for tkPos := 1 until MAX_TOKEN_TYPE do tkNode( tkPos ) := - tkPos; tkNode( tOp_multiply ) := nMultiply; tkNode( tOp_divide ) := nDivide; tkNode( tOp_mod ) := nMod; tkNode( tOp_add ) := nAdd; tkNode( tOp_subtract ) := nSubtract; tkNode( tOp_less ) := nLess; tkNode( tOp_lessequal ) := nLessEqual; tkNode( tOp_greater ) := nGreater; tkNode( tOp_greaterequal ) := nGreaterEqual; tkNode( tOp_equal ) := nEqual; tkNode( tOp_notequal ) := nNotEqual; tkNode( tOp_not ) := nNot; tkNode( tOp_and ) := nAnd; tkNode( tOp_or ) := nOr; stList := idList := null;
% parse the output from the lexical analyser and output the linearised parse tree % readToken; writeNode( parseStatementList( tEnd_of_input ) )
end.</lang>
- Output:
Output from parsing the Prime Numbers example program.
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
ATS
<lang ATS>(********************************************************************) (* Usage: parse [INPUTFILE [OUTPUTFILE]]
If INPUTFILE or OUTPUTFILE is "-" or missing, then standard input or standard output is used, respectively. *)
- include "share/atspre_staload.hats"
staload UN = "prelude/SATS/unsafe.sats"
- define NIL list_nil ()
- define :: list_cons
%{^ /* alloca(3) is needed for ATS exceptions. */
- include <alloca.h>
%}
(********************************************************************)
- define NUM_TOKENS 31
- define TOKEN_ELSE 0
- define TOKEN_IF 1
- define TOKEN_PRINT 2
- define TOKEN_PUTC 3
- define TOKEN_WHILE 4
- define TOKEN_MULTIPLY 5
- define TOKEN_DIVIDE 6
- define TOKEN_MOD 7
- define TOKEN_ADD 8
- define TOKEN_SUBTRACT 9
- define TOKEN_NEGATE 10
- define TOKEN_LESS 11
- define TOKEN_LESSEQUAL 12
- define TOKEN_GREATER 13
- define TOKEN_GREATEREQUAL 14
- define TOKEN_EQUAL 15
- define TOKEN_NOTEQUAL 16
- define TOKEN_NOT 17
- define TOKEN_ASSIGN 18
- define TOKEN_AND 19
- define TOKEN_OR 20
- define TOKEN_LEFTPAREN 21
- define TOKEN_RIGHTPAREN 22
- define TOKEN_LEFTBRACE 23
- define TOKEN_RIGHTBRACE 24
- define TOKEN_SEMICOLON 25
- define TOKEN_COMMA 26
- define TOKEN_IDENTIFIER 27
- define TOKEN_INTEGER 28
- define TOKEN_STRING 29
- define TOKEN_END_OF_INPUT 30
typedef token_t =
[i : int | TOKEN_ELSE <= i; i <= TOKEN_END_OF_INPUT] int i
typedef tokentuple_t = (token_t, String, ullint, ullint)
fn token_text (tok : token_t) : String =
case+ tok of | TOKEN_ELSE => "else" | TOKEN_IF => "if" | TOKEN_PRINT => "print" | TOKEN_PUTC => "putc" | TOKEN_WHILE => "while" | TOKEN_MULTIPLY => "*" | TOKEN_DIVIDE => "/" | TOKEN_MOD => "%" | TOKEN_ADD => "+" | TOKEN_SUBTRACT => "-" | TOKEN_NEGATE => "-" | TOKEN_LESS => "<" | TOKEN_LESSEQUAL => "<=" | TOKEN_GREATER => ">" | TOKEN_GREATEREQUAL => ">=" | TOKEN_EQUAL => "==" | TOKEN_NOTEQUAL => "!=" | TOKEN_NOT => "!" | TOKEN_ASSIGN => "=" | TOKEN_AND => "&&" | TOKEN_OR => "||" | TOKEN_LEFTPAREN => "(" | TOKEN_RIGHTPAREN => ")" | TOKEN_LEFTBRACE => "{" | TOKEN_RIGHTBRACE => "}" | TOKEN_SEMICOLON => ";" | TOKEN_COMMA => "," | TOKEN_IDENTIFIER => "Ident" | TOKEN_INTEGER => "Integer literal" | TOKEN_STRING => "String literal" | TOKEN_END_OF_INPUT => "EOI"
(********************************************************************) (* A perfect hash for the lexical token names.
This hash was generated by GNU gperf and then translated to reasonable ATS by hand. Note, though, that one could have embedded the generated C code directly and used it. *)
- define MIN_WORD_LENGTH 5
- define MAX_WORD_LENGTH 15
- define MIN_HASH_VALUE 5
- define MAX_HASH_VALUE 64
- define HASH_TABLE_SIZE 65
local
extern castfn u : {n : nat | n < 256} int n -<> uint8 n
in
vtypedef asso_values_vt = @[[n : nat | n < 256] uint8 n][256]
var asso_values = @[[n : nat | n < 256] uint8 n][256] (u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 10, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 0, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 0, u 65, u 25, u 5, u 5, u 0, u 15, u 65, u 0, u 65, u 65, u 10, u 65, u 30, u 0, u 65, u 5, u 10, u 10, u 0, u 15, u 65, u 65, u 65, u 5, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65)
end
fn get_asso_value {i : nat | i < 256}
(i : uint i) :<> [n : nat | n < 256] uint n = let extern castfn u8ui : {n : nat} uint8 n -<> uint n extern castfn mk_asso_values :<> {p : addr} ptr p -<> (asso_values_vt @ p | ptr p)
val asso_values_tup = mk_asso_values (addr@ asso_values) macdef asso_values = !(asso_values_tup.1) val retval = asso_values[i] val _ = $UN.castvwtp0{void} asso_values_tup in u8ui retval end
fn hash {n : int | MIN_WORD_LENGTH <= n; n <= MAX_WORD_LENGTH}
(str : string n, len : size_t n) :<> [key : nat] uint key = let extern castfn uc2ui : {n : nat} uchar n -<> uint n
val c1 = uc2ui (c2uc str[4]) val c2 = uc2ui (c2uc str[pred len]) in sz2u len + get_asso_value c1 + get_asso_value c2 end
typedef wordlist_vt = @[(String, token_t)][HASH_TABLE_SIZE]
var wordlist =
@[(String, token_t)][HASH_TABLE_SIZE] (("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("Comma", 26), ("Op_not", 17), ("", 0), ("", 0), ("", 0), ("Keyword_if", 1), ("Op_mod", 7), ("End_of_input", 30), ("Keyword_print", 2), ("Op_divide", 6), ("RightBrace", 24), ("Op_add", 8), ("Keyword_else", 0), ("Keyword_while", 4), ("Op_negate", 10), ("Identifier", 27), ("Op_notequal", 16), ("Op_less", 11), ("Op_equal", 15), ("LeftBrace", 23), ("Op_or", 20), ("Op_subtract", 9), ("Op_lessequal", 12), ("", 0), ("", 0), ("Op_greater", 13), ("Op_multiply", 5 ), ("Integer", 28), ("", 0), ("", 0), ("Op_greaterequal", 14), ("", 0), ("Keyword_putc", 3), ("", 0), ("LeftParen", 21), ("RightParen", 22), ("Op_and", 19), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("Op_assign", 18), ("", 0), ("String", 29), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("Semicolon", 25))
fn get_wordlist_entry
{n : nat | n <= MAX_HASH_VALUE} (key : uint n) :<> (String, token_t) = let extern castfn mk_wordlist_tup :<> {p : addr} ptr p -<> (wordlist_vt @ p | ptr p)
val wordlist_tup = mk_wordlist_tup (addr@ wordlist) macdef wordlist = !(wordlist_tup.1) val retval = wordlist[key] val _ = $UN.castvwtp0{void} wordlist_tup in retval end
fn string2token_t_opt
{n : int} (str : string n) :<> Option token_t = let val len = string_length str in if len < i2sz MIN_WORD_LENGTH then None () else if i2sz MAX_WORD_LENGTH < len then None () else let val key = hash (str, len) in if i2u MAX_HASH_VALUE < key then None () else let val (s, tok) = get_wordlist_entry (key) in if str <> s then None () else Some tok end end end
(********************************************************************)
exception bad_lex_integer of (String) exception bad_lex_token_name of (String) exception bad_string_literal of (String)
extern fun {} skip_something$pred : char -<> bool fn {} skip_something {n : nat}
{i : nat | i <= n} (s : string n, n : size_t n, i : size_t i) :<> [j : nat | i <= j; j <= n] size_t j = let fun loop {k : nat | i <= k; k <= n} .<n - k>. (k : size_t k) :<> [j : nat | i <= j; j <= n] size_t j = if k = n then k else if ~(skip_something$pred<> s[k]) then k else loop (succ k) in loop i end
fn skip_space {n : nat}
{i : nat | i <= n} (s : string n, n : size_t n, i : size_t i) :<> [j : nat | i <= j; j <= n] size_t j = let implement skip_something$pred<> (c) = isspace c in skip_something (s, n, i) end
fn skip_nonspace {n : nat}
{i : nat | i <= n} (s : string n, n : size_t n, i : size_t i) :<> [j : nat | i <= j; j <= n] size_t j = let implement skip_something$pred<> (c) = ~isspace c in skip_something (s, n, i) end
fn skip_nonquote {n : nat}
{i : nat | i <= n} (s : string n, n : size_t n, i : size_t i) :<> [j : nat | i <= j; j <= n] size_t j = let implement skip_something$pred<> (c) = c <> '"' in skip_something (s, n, i) end
fn skip_string_literal
{n : nat} {i : nat | i <= n} (s : string n, n : size_t n, i : size_t i) :<> [j : nat | i <= j; j <= n] size_t j = if i = n then i else if s[i] <> '"' then i else let val j = skip_nonquote (s, n, succ i) in if j = n then i else succ j end
fn get_substr {n, i, j : nat | i <= j; j <= n}
(s : string n, i : size_t i, j : size_t j) : [m : int | m == j - i] string m = let val s = string_make_substring (s, i, j - i) in strnptr2string s end
fn string2ullint
{n : nat} (s : string n) : ullint = let val n = string_length s in if n = i2sz 0 then $raise bad_lex_integer ("") else let extern castfn u2ull : uint -<> ullint
fun evaluate {k : nat | k <= n} .<n - k>. (k : size_t k, v : ullint) : ullint = if k = n then v else if ~isdigit s[k] then $raise bad_lex_integer (s) else let val d = char2ui s[k] - char2ui '0' in evaluate (succ k, (10ULL * v) + u2ull d) end in evaluate (i2sz 0, 0ULL) end end
fn string2token {n : int}
(str : string n) : token_t = case+ string2token_t_opt str of | None () => $raise bad_lex_token_name (str) | Some tok => tok
fn read_lex_file (inpf : FILEref) : List0 tokentuple_t =
(* Convert the output of "lex" to a list of tokens. *) (* This routine could stand to do more validation of the input. *) let fun loop (lst : List0 tokentuple_t) : List0 tokentuple_t = if fileref_is_eof inpf then lst else let val s = strptr2string (fileref_get_line_string inpf) val n = string_length s prval _ = lemma_g1uint_param n
val i0_line_no = skip_space (s, n, i2sz 0) in if i0_line_no = n then (* Skip any blank lines, including end of file. *) loop lst else let val i1_line_no = skip_nonspace (s, n, i0_line_no) val s_line_no = get_substr (s, i0_line_no, i1_line_no) val line_no = string2ullint s_line_no
val i0_column_no = skip_space (s, n, i1_line_no) val i1_column_no = skip_nonspace (s, n, i0_column_no) val s_column_no = get_substr (s, i0_column_no, i1_column_no) val column_no = string2ullint s_column_no
val i0_tokname = skip_space (s, n, i1_column_no) val i1_tokname = skip_nonspace (s, n, i0_tokname) val tokname = get_substr (s, i0_tokname, i1_tokname) val tok = string2token tokname in case+ tok of | TOKEN_INTEGER => let val i0 = skip_space (s, n, i1_tokname) val i1 = skip_nonspace (s, n, i0) val arg = get_substr (s, i0, i1) val toktup = (tok, arg, line_no, column_no) in loop (toktup :: lst) end | TOKEN_IDENTIFIER => let val i0 = skip_space (s, n, i1_tokname) val i1 = skip_nonspace (s, n, i0) val arg = get_substr (s, i0, i1) val toktup = (tok, arg, line_no, column_no) in loop (toktup :: lst) end | TOKEN_STRING => let val i0 = skip_space (s, n, i1_tokname) val i1 = skip_string_literal (s, n, i0) val arg = get_substr (s, i0, i1) val toktup = (tok, arg, line_no, column_no) in loop (toktup :: lst) end | _ => let val toktup = (tok, "", line_no, column_no) in loop (toktup :: lst) end end end in list_vt2t (list_reverse (loop NIL)) end
(********************************************************************)
exception truncated_lexical of () exception unexpected_token of (tokentuple_t, token_t) exception unexpected_primary of (tokentuple_t) exception unterminated_statement_block of (ullint, ullint) exception expected_a_statement of (tokentuple_t)
datatype node_t = | node_t_nil of () | node_t_leaf of (String, String) | node_t_cons of (String, node_t, node_t)
fn right_assoc (tok : token_t) : bool =
(* None of the currently supported operators is right associative. *) false
fn binary_op (tok : token_t) : bool =
case+ tok of | TOKEN_ADD => true | TOKEN_SUBTRACT => true | TOKEN_MULTIPLY => true | TOKEN_DIVIDE => true | TOKEN_MOD => true | TOKEN_LESS => true | TOKEN_LESSEQUAL => true | TOKEN_GREATER => true | TOKEN_GREATEREQUAL => true | TOKEN_EQUAL => true | TOKEN_NOTEQUAL => true | TOKEN_AND => true | TOKEN_OR => true | _ => false
fn precedence (tok : token_t) : int =
case+ tok of | TOKEN_MULTIPLY => 13 | TOKEN_DIVIDE => 13 | TOKEN_MOD => 13 | TOKEN_ADD => 12 | TOKEN_SUBTRACT => 12 | TOKEN_NEGATE => 14 | TOKEN_NOT => 14 | TOKEN_LESS => 10 | TOKEN_LESSEQUAL => 10 | TOKEN_GREATER => 10 | TOKEN_GREATEREQUAL => 10 | TOKEN_EQUAL => 9 | TOKEN_NOTEQUAL => 9 | TOKEN_AND => 5 | TOKEN_OR => 4 | _ => ~1
fn opname (tok : token_t) : String =
case- tok of | TOKEN_MULTIPLY => "Multiply" | TOKEN_DIVIDE => "Divide" | TOKEN_MOD => "Mod" | TOKEN_ADD => "Add" | TOKEN_SUBTRACT => "Subtract" | TOKEN_NEGATE => "Negate" | TOKEN_NOT => "Not" | TOKEN_LESS => "Less" | TOKEN_LESSEQUAL => "LessEqual" | TOKEN_GREATER => "Greater" | TOKEN_GREATEREQUAL => "GreaterEqual" | TOKEN_EQUAL => "Equal" | TOKEN_NOTEQUAL => "NotEqual" | TOKEN_AND => "And" | TOKEN_OR => "Or"
fn parse (lex : List0 tokentuple_t) : node_t =
let typedef toktups_t (n : int) = list (tokentuple_t, n) typedef toktups_t = [n : nat] toktups_t n
fn expect (expected : token_t, lex : toktups_t) : toktups_t = case+ lex of | NIL => $raise truncated_lexical () | toktup :: tail => if toktup.0 = expected then tail else $raise unexpected_token (toktup, expected)
fn peek {n : int} (lex : toktups_t n) : [1 <= n] token_t = case+ lex of | NIL => $raise truncated_lexical () | (tok, _, _, _) :: _ => tok
fun stmt (lex : toktups_t) : (node_t, toktups_t) = case+ lex of | NIL => $raise truncated_lexical () | (TOKEN_IF, _, _, _) :: lex => let val (e, lex) = paren_expr lex val (s, lex) = stmt lex in case+ lex of | (TOKEN_ELSE, _, _, _) :: lex => let val (t, lex) = stmt lex in (node_t_cons ("If", e, node_t_cons ("If", s, t)), lex) end | _ => let (* There is no 'else' clause. *) val t = node_t_nil () in (node_t_cons ("If", e, node_t_cons ("If", s, t)), lex) end end | (TOKEN_PUTC, _, _, _) :: lex => let val (subtree, lex) = paren_expr lex val subtree = node_t_cons ("Prtc", subtree, node_t_nil ()) val lex = expect (TOKEN_SEMICOLON, lex) in (subtree, lex) end | (TOKEN_PRINT, _, _, _) :: lex => let val lex = expect (TOKEN_LEFTPAREN, lex) fun loop_over_args (subtree : node_t, lex : toktups_t) : (node_t, toktups_t) = case+ lex of | (TOKEN_STRING, arg, _, _) :: (TOKEN_COMMA, _, _, _) :: lex => let val leaf = node_t_leaf ("String", arg) val e = node_t_cons ("Prts", leaf, node_t_nil ()) in loop_over_args (node_t_cons ("Sequence", subtree, e), lex) end | (TOKEN_STRING, arg, _, _) :: lex => let val lex = expect (TOKEN_RIGHTPAREN, lex) val lex = expect (TOKEN_SEMICOLON, lex) val leaf = node_t_leaf ("String", arg) val e = node_t_cons ("Prts", leaf, node_t_nil ()) in (node_t_cons ("Sequence", subtree, e), lex) end | _ :: _ => let val (x, lex) = expr (0, lex) val e = node_t_cons ("Prti", x, node_t_nil ()) val subtree = node_t_cons ("Sequence", subtree, e) in case+ peek lex of | TOKEN_COMMA => let val lex = expect (TOKEN_COMMA, lex) in loop_over_args (subtree, lex) end | _ => let val lex = expect (TOKEN_RIGHTPAREN, lex) val lex = expect (TOKEN_SEMICOLON, lex) in (subtree, lex) end end | NIL => $raise truncated_lexical () in loop_over_args (node_t_nil (), lex) end | (TOKEN_SEMICOLON, _, _, _) :: lex => (node_t_nil (), lex) | (TOKEN_IDENTIFIER, arg, _, _) :: lex => let val v = node_t_leaf ("Identifier", arg) val lex = expect (TOKEN_ASSIGN, lex) val (subtree, lex) = expr (0, lex) val t = node_t_cons ("Assign", v, subtree) val lex = expect (TOKEN_SEMICOLON, lex) in (t, lex) end | (TOKEN_WHILE, _, _, _) :: lex => let val (e, lex) = paren_expr lex val (t, lex) = stmt lex in (node_t_cons ("While", e, t), lex) end | (TOKEN_LEFTBRACE, _, _, _) :: lex => let fun loop_over_stmts (subtree : node_t, lex : toktups_t) : (node_t, toktups_t) = case+ lex of | (TOKEN_RIGHTBRACE, _, _, _) :: lex => (subtree, lex) | (TOKEN_END_OF_INPUT, _, line_no, column_no) :: _ => $raise unterminated_statement_block (line_no, column_no) | _ => let val (e, lex) = stmt lex in loop_over_stmts (node_t_cons ("Sequence", subtree, e), lex) end in loop_over_stmts (node_t_nil (), lex) end | (TOKEN_END_OF_INPUT, _, _, _) :: lex => (node_t_nil (), lex) | toktup :: _ => $raise expected_a_statement (toktup) and expr (prec : int, lex : toktups_t) : (node_t, toktups_t) = case+ lex of | (TOKEN_LEFTPAREN, _, _, _) :: _ => (* '(' expr ')' *) let val (subtree, lex) = paren_expr lex in prec_climb (prec, subtree, lex) end | (TOKEN_ADD, _, _, _) :: lex => (* '+' expr *) let val (subtree, lex) = expr (precedence TOKEN_ADD, lex) in prec_climb (prec, subtree, lex) end | (TOKEN_SUBTRACT, _, _, _) :: lex => (* '-' expr *) let val (subtree, lex) = expr (precedence TOKEN_NEGATE, lex) val subtree = node_t_cons ("Negate", subtree, node_t_nil ()) in prec_climb (prec, subtree, lex) end | (TOKEN_NOT, _, _, _) :: lex => (* '!' expr *) let val (subtree, lex) = expr (precedence TOKEN_NOT, lex) val subtree = node_t_cons ("Not", subtree, node_t_nil ()) in prec_climb (prec, subtree, lex) end | (TOKEN_IDENTIFIER, arg, _, _) :: lex => let val leaf = node_t_leaf ("Identifier", arg) in prec_climb (prec, leaf, lex) end | (TOKEN_INTEGER, arg, _, _) :: lex => let val leaf = node_t_leaf ("Integer", arg) in prec_climb (prec, leaf, lex) end | toktup :: lex => $raise unexpected_primary (toktup) | NIL => $raise truncated_lexical () and prec_climb (prec : int, subtree : node_t, lex : toktups_t) : (node_t, toktups_t) = case+ peek lex of | tokval => if ~binary_op tokval then (subtree, lex) else if precedence tokval < prec then (subtree, lex) else case+ lex of | toktup :: lex => let val q = if right_assoc (toktup.0) then precedence tokval else succ (precedence tokval)
val (e, lex) = expr (q, lex) val subtree1 = node_t_cons (opname (toktup.0), subtree, e) in prec_climb (prec, subtree1, lex) end and paren_expr (lex : toktups_t) : (node_t, toktups_t) = (* '(' expr ')' *) let val lex = expect (TOKEN_LEFTPAREN, lex) val (subtree, lex) = expr (0, lex) val lex = expect (TOKEN_RIGHTPAREN, lex) in (subtree, lex) end
fun main_loop (subtree : node_t, lex : toktups_t) : node_t = case+ peek lex of | TOKEN_END_OF_INPUT => subtree | _ => let val (x, lex) = stmt lex in main_loop (node_t_cons ("Sequence", subtree, x), lex) end in main_loop (node_t_nil (), lex) end
fn print_ast (outf : FILEref,
ast : node_t) : void = let fun traverse (ast : node_t) : void = case+ ast of | node_t_nil () => fprintln! (outf, ";") | node_t_leaf (str, arg) => fprintln! (outf, str, " ", arg) | node_t_cons (str, left, right) => begin fprintln! (outf, str); traverse left; traverse right end in traverse ast end
(********************************************************************)
fn main_program (inpf : FILEref,
outf : FILEref) : int = let val toklst = read_lex_file inpf val ast = parse toklst val () = print_ast (outf, ast) in 0 end
fn error_start (line_no : ullint,
column_no : ullint) : void = print! ("(", line_no, ", ", column_no, ") error: ")
implement main (argc, argv) =
let val inpfname = if 2 <= argc then $UN.cast{string} argv[1] else "-" val outfname = if 3 <= argc then $UN.cast{string} argv[2] else "-" in try let val inpf = if (inpfname : string) = "-" then stdin_ref else fileref_open_exn (inpfname, file_mode_r)
val outf = if (outfname : string) = "-" then stdout_ref else fileref_open_exn (outfname, file_mode_w) in main_program (inpf, outf) end with | ~ unexpected_primary @(tok, _, line_no, column_no) => begin error_start (line_no, column_no); println! ("Expecting a primary, found: ", token_text tok); 1 end | ~ unexpected_token (@(tok, _, line_no, column_no), expected) => begin error_start (line_no, column_no); println! ("Expecting '", token_text expected, "', found '", token_text tok, "'"); 1 end | ~ expected_a_statement @(tok, _, line_no, column_no) => begin error_start (line_no, column_no); println! ("expecting start of statement, found '", token_text tok, "'"); 1 end | ~ unterminated_statement_block (line_no, column_no) => begin error_start (line_no, column_no); println! ("unterminated statement block"); 1 end | ~ truncated_lexical () => begin println! ("truncated input token stream"); 2 end | ~ bad_lex_integer (s) => begin println! ("bad integer literal in the token stream: '", s, "'"); 2 end | ~ bad_string_literal (s) => begin println! ("bad string literal in the token stream: '", s, "'"); 2 end | ~ bad_lex_token_name (s) => begin println! ("bad token name in the token stream: '", s, "'"); 2 end
end
(********************************************************************)</lang>
- Output:
$ patscc -O2 -DATS_MEMALLOC_GCBDW -o parse parse-in-ATS.dats -latslib -lgc && ./lex compiler-tests/primes.t | ./parse Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
AWK
Tested with gawk 4.1.1 and mawk 1.3.4. <lang AWK> function Token_assign(tk, attr, attr_array, n, i) {
n=split(attr, attr_array) for(i=1; i<=n; i++) Tokens[tk,i-1] = attr_array[i]
}
- show error and exit
function error(msg) {
printf("(%s, %s) %s\n", err_line, err_col, msg) exit(1)
}
function gettok( line, n, i) {
getline line if (line == "") error("empty line") n=split(line, line_list) # line col Ident var_name # 1 2 3 4 err_line = line_list[1] err_col = line_list[2] tok_text = line_list[3] tok = all_syms[tok_text] for (i=5; i<=n; i++) line_list[4] = line_list[4] " " line_list[i] if (tok == "") error("Unknown token " tok_text) tok_other = "" if (tok == "tk_Integer" || tok == "tk_Ident" || tok =="tk_String") tok_other = line_list[4]
}
function make_node(oper, left, right, value) {
node_type [next_free_node_index] = oper node_left [next_free_node_index] = left node_right[next_free_node_index] = right node_value[next_free_node_index] = value return next_free_node_index ++
}
function make_leaf(oper, n) {
return make_node(oper, 0, 0, n)
}
function expect(msg, s) {
if (tok == s) { gettok() return } error(msg ": Expecting '" Tokens[s,TK_NAME] "', found '" Tokens[tok,TK_NAME] "'")
}
function expr(p, x, op, node) {
x = 0 if (tok == "tk_Lparen") { x = paren_expr() } else if (tok == "tk_Sub" || tok == "tk_Add") { if (tok == "tk_Sub") op = "tk_Negate" else op = "tk_Add" gettok() node = expr(Tokens["tk_Negate",TK_PRECEDENCE]+0) if (op == "tk_Negate") x = make_node("nd_Negate", node) else x = node } else if (tok == "tk_Not") { gettok() x = make_node("nd_Not", expr(Tokens["tk_Not",TK_PRECEDENCE]+0)) } else if (tok == "tk_Ident") { x = make_leaf("nd_Ident", tok_other) gettok() } else if (tok == "tk_Integer") { x = make_leaf("nd_Integer", tok_other) gettok() } else { error("Expecting a primary, found: " Tokens[tok,TK_NAME]) } while (((Tokens[tok,TK_IS_BINARY]+0) > 0) && ((Tokens[tok,TK_PRECEDENCE]+0) >= p)) { op = tok gettok() q = Tokens[op,TK_PRECEDENCE]+0 if (! (Tokens[op,TK_RIGHT_ASSOC]+0 > 0)) q += 1 node = expr(q) x = make_node(Tokens[op,TK_NODE], x, node) } return x
}
function paren_expr( node) {
expect("paren_expr", "tk_Lparen") node = expr(0) expect("paren_expr", "tk_Rparen") return node
}
function stmt( t, e, s, s2, v) {
t = 0 if (tok == "tk_If") { gettok() e = paren_expr() s = stmt() s2 = 0 if (tok == "tk_Else") { gettok() s2 = stmt() } t = make_node("nd_If", e, make_node("nd_If", s, s2)) } else if (tok == "tk_Putc") { gettok() e = paren_expr() t = make_node("nd_Prtc", e) expect("Putc", "tk_Semi") } else if (tok == "tk_Print") { gettok() expect("Print", "tk_Lparen") while (1) { if (tok == "tk_String") { e = make_node("nd_Prts", make_leaf("nd_String", tok_other)) gettok() } else { e = make_node("nd_Prti", expr(0)) } t = make_node("nd_Sequence", t, e) if (tok != "tk_Comma") break gettok() } expect("Print", "tk_Rparen") expect("Print", "tk_Semi") } else if (tok == "tk_Semi") { gettok() } else if (tok == "tk_Ident") { v = make_leaf("nd_Ident", tok_other) gettok() expect("assign", "tk_Assign") e = expr(0) t = make_node("nd_Assign", v, e) expect("assign", "tk_Semi") } else if (tok == "tk_While") { gettok() e = paren_expr() s = stmt() t = make_node("nd_While", e, s) } else if (tok == "tk_Lbrace") { gettok() while (tok != "tk_Rbrace" && tok != "tk_EOI") t = make_node("nd_Sequence", t, stmt()) expect("Lbrace", "tk_Rbrace") } else if (tok == "tk_EOI") { } else { error("Expecting start of statement, found: " Tokens[tok,TK_NAME]) } return t
}
function parse( t) {
t = 0 # None gettok() while (1) { t = make_node("nd_Sequence", t, stmt()) if (tok == "tk_EOI" || t == 0) break } return t
}
function prt_ast(t) {
if (t == 0) { print(";") } else { printf("%-14s", Display_nodes[node_type[t]]) if ((node_type[t] == "nd_Ident") || (node_type[t] == "nd_Integer")) printf("%s\n", node_value[t]) else if (node_type[t] == "nd_String") { printf("%s\n", node_value[t]) } else { print("") prt_ast(node_left[t]) prt_ast(node_right[t]) } }
}
BEGIN {
all_syms["End_of_input" ] = "tk_EOI" all_syms["Op_multiply" ] = "tk_Mul" all_syms["Op_divide" ] = "tk_Div" all_syms["Op_mod" ] = "tk_Mod" all_syms["Op_add" ] = "tk_Add" all_syms["Op_subtract" ] = "tk_Sub" all_syms["Op_negate" ] = "tk_Negate" all_syms["Op_not" ] = "tk_Not" all_syms["Op_less" ] = "tk_Lss" all_syms["Op_lessequal" ] = "tk_Leq" all_syms["Op_greater" ] = "tk_Gtr" all_syms["Op_greaterequal" ] = "tk_Geq" all_syms["Op_equal" ] = "tk_Eq" all_syms["Op_notequal" ] = "tk_Neq" all_syms["Op_assign" ] = "tk_Assign" all_syms["Op_and" ] = "tk_And" all_syms["Op_or" ] = "tk_Or" all_syms["Keyword_if" ] = "tk_If" all_syms["Keyword_else" ] = "tk_Else" all_syms["Keyword_while" ] = "tk_While" all_syms["Keyword_print" ] = "tk_Print" all_syms["Keyword_putc" ] = "tk_Putc" all_syms["LeftParen" ] = "tk_Lparen" all_syms["RightParen" ] = "tk_Rparen" all_syms["LeftBrace" ] = "tk_Lbrace" all_syms["RightBrace" ] = "tk_Rbrace" all_syms["Semicolon" ] = "tk_Semi" all_syms["Comma" ] = "tk_Comma" all_syms["Identifier" ] = "tk_Ident" all_syms["Integer" ] = "tk_Integer" all_syms["String" ] = "tk_String"
Display_nodes["nd_Ident" ] = "Identifier" Display_nodes["nd_String" ] = "String" Display_nodes["nd_Integer" ] = "Integer" Display_nodes["nd_Sequence"] = "Sequence" Display_nodes["nd_If" ] = "If" Display_nodes["nd_Prtc" ] = "Prtc" Display_nodes["nd_Prts" ] = "Prts" Display_nodes["nd_Prti" ] = "Prti" Display_nodes["nd_While" ] = "While" Display_nodes["nd_Assign" ] = "Assign" Display_nodes["nd_Negate" ] = "Negate" Display_nodes["nd_Not" ] = "Not" Display_nodes["nd_Mul" ] = "Multiply" Display_nodes["nd_Div" ] = "Divide" Display_nodes["nd_Mod" ] = "Mod" Display_nodes["nd_Add" ] = "Add" Display_nodes["nd_Sub" ] = "Subtract" Display_nodes["nd_Lss" ] = "Less" Display_nodes["nd_Leq" ] = "LessEqual" Display_nodes["nd_Gtr" ] = "Greater" Display_nodes["nd_Geq" ] = "GreaterEqual" Display_nodes["nd_Eql" ] = "Equal" Display_nodes["nd_Neq" ] = "NotEqual" Display_nodes["nd_And" ] = "And" Display_nodes["nd_Or" ] = "Or"
TK_NAME = 0 TK_RIGHT_ASSOC = 1 TK_IS_BINARY = 2 TK_IS_UNARY = 3 TK_PRECEDENCE = 4 TK_NODE = 5 Token_assign("tk_EOI" , "EOI 0 0 0 -1 -1 ") Token_assign("tk_Mul" , "* 0 1 0 13 nd_Mul ") Token_assign("tk_Div" , "/ 0 1 0 13 nd_Div ") Token_assign("tk_Mod" , "% 0 1 0 13 nd_Mod ") Token_assign("tk_Add" , "+ 0 1 0 12 nd_Add ") Token_assign("tk_Sub" , "- 0 1 0 12 nd_Sub ") Token_assign("tk_Negate" , "- 0 0 1 14 nd_Negate ") Token_assign("tk_Not" , "! 0 0 1 14 nd_Not ") Token_assign("tk_Lss" , "< 0 1 0 10 nd_Lss ") Token_assign("tk_Leq" , "<= 0 1 0 10 nd_Leq ") Token_assign("tk_Gtr" , "> 0 1 0 10 nd_Gtr ") Token_assign("tk_Geq" , ">= 0 1 0 10 nd_Geq ") Token_assign("tk_Eql" , "== 0 1 0 9 nd_Eql ") Token_assign("tk_Neq" , "!= 0 1 0 9 nd_Neq ") Token_assign("tk_Assign" , "= 0 0 0 -1 nd_Assign ") Token_assign("tk_And" , "&& 0 1 0 5 nd_And ") Token_assign("tk_Or" , "|| 0 1 0 4 nd_Or ") Token_assign("tk_If" , "if 0 0 0 -1 nd_If ") Token_assign("tk_Else" , "else 0 0 0 -1 -1 ") Token_assign("tk_While" , "while 0 0 0 -1 nd_While ") Token_assign("tk_Print" , "print 0 0 0 -1 -1 ") Token_assign("tk_Putc" , "putc 0 0 0 -1 -1 ") Token_assign("tk_Lparen" , "( 0 0 0 -1 -1 ") Token_assign("tk_Rparen" , ") 0 0 0 -1 -1 ") Token_assign("tk_Lbrace" , "{ 0 0 0 -1 -1 ") Token_assign("tk_Rbrace" , "} 0 0 0 -1 -1 ") Token_assign("tk_Semi" , "; 0 0 0 -1 -1 ") Token_assign("tk_Comma" , ", 0 0 0 -1 -1 ") Token_assign("tk_Ident" , "Ident 0 0 0 -1 nd_Ident ") Token_assign("tk_Integer", "Integer 0 0 0 -1 nd_Integer") Token_assign("tk_String" , "String 0 0 0 -1 nd_String ")
input_file = "-" err_line = 0 err_col = 0 tok = "" tok_text = "" next_free_node_index = 1
if (ARGC > 1) input_file = ARGV[1] t = parse() prt_ast(t)
} </lang>
- Output — count:
Sequence Sequence ; Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence ; Sequence Sequence Sequence ; Prts String "count is: " ; Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1
C
Tested with gcc 4.81 and later, compiles warning free with -Wall -Wextra <lang C>#include <stdio.h>
- include <stdlib.h>
- include <string.h>
- include <stdarg.h>
- include <stdbool.h>
- include <ctype.h>
- define NELEMS(arr) (sizeof(arr) / sizeof(arr[0]))
typedef enum {
tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr, tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print, tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident, tk_Integer, tk_String
} TokenType;
typedef enum {
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or
} NodeType;
typedef struct {
TokenType tok; int err_ln; int err_col; char *text; /* ident or string literal or integer value */
} tok_s;
typedef struct Tree {
NodeType node_type; struct Tree *left; struct Tree *right; char *value;
} Tree;
// dependency: Ordered by tok, must remain in same order as TokenType enum struct {
char *text, *enum_text; TokenType tok; bool right_associative, is_binary, is_unary; int precedence; NodeType node_type;
} atr[] = {
{"EOI", "End_of_input" , tk_EOI, false, false, false, -1, -1 }, {"*", "Op_multiply" , tk_Mul, false, true, false, 13, nd_Mul }, {"/", "Op_divide" , tk_Div, false, true, false, 13, nd_Div }, {"%", "Op_mod" , tk_Mod, false, true, false, 13, nd_Mod }, {"+", "Op_add" , tk_Add, false, true, false, 12, nd_Add }, {"-", "Op_subtract" , tk_Sub, false, true, false, 12, nd_Sub }, {"-", "Op_negate" , tk_Negate, false, false, true, 14, nd_Negate }, {"!", "Op_not" , tk_Not, false, false, true, 14, nd_Not }, {"<", "Op_less" , tk_Lss, false, true, false, 10, nd_Lss }, {"<=", "Op_lessequal" , tk_Leq, false, true, false, 10, nd_Leq }, {">", "Op_greater" , tk_Gtr, false, true, false, 10, nd_Gtr }, {">=", "Op_greaterequal", tk_Geq, false, true, false, 10, nd_Geq }, {"==", "Op_equal" , tk_Eql, false, true, false, 9, nd_Eql }, {"!=", "Op_notequal" , tk_Neq, false, true, false, 9, nd_Neq }, {"=", "Op_assign" , tk_Assign, false, false, false, -1, nd_Assign }, {"&&", "Op_and" , tk_And, false, true, false, 5, nd_And }, {"||", "Op_or" , tk_Or, false, true, false, 4, nd_Or }, {"if", "Keyword_if" , tk_If, false, false, false, -1, nd_If }, {"else", "Keyword_else" , tk_Else, false, false, false, -1, -1 }, {"while", "Keyword_while" , tk_While, false, false, false, -1, nd_While }, {"print", "Keyword_print" , tk_Print, false, false, false, -1, -1 }, {"putc", "Keyword_putc" , tk_Putc, false, false, false, -1, -1 }, {"(", "LeftParen" , tk_Lparen, false, false, false, -1, -1 }, {")", "RightParen" , tk_Rparen, false, false, false, -1, -1 }, {"{", "LeftBrace" , tk_Lbrace, false, false, false, -1, -1 }, {"}", "RightBrace" , tk_Rbrace, false, false, false, -1, -1 }, {";", "Semicolon" , tk_Semi, false, false, false, -1, -1 }, {",", "Comma" , tk_Comma, false, false, false, -1, -1 }, {"Ident", "Identifier" , tk_Ident, false, false, false, -1, nd_Ident }, {"Integer literal", "Integer" , tk_Integer, false, false, false, -1, nd_Integer}, {"String literal", "String" , tk_String, false, false, false, -1, nd_String },
};
char *Display_nodes[] = {"Identifier", "String", "Integer", "Sequence", "If", "Prtc",
"Prts", "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual", "And", "Or"};
static tok_s tok; static FILE *source_fp, *dest_fp;
Tree *paren_expr();
void error(int err_line, int err_col, const char *fmt, ... ) {
va_list ap; char buf[1000];
va_start(ap, fmt); vsprintf(buf, fmt, ap); va_end(ap); printf("(%d, %d) error: %s\n", err_line, err_col, buf); exit(1);
}
char *read_line(int *len) {
static char *text = NULL; static int textmax = 0;
for (*len = 0; ; (*len)++) { int ch = fgetc(source_fp); if (ch == EOF || ch == '\n') { if (*len == 0) return NULL; break; } if (*len + 1 >= textmax) { textmax = (textmax == 0 ? 128 : textmax * 2); text = realloc(text, textmax); } text[*len] = ch; } text[*len] = '\0'; return text;
}
char *rtrim(char *text, int *len) { // remove trailing spaces
for (; *len > 0 && isspace(text[*len - 1]); --(*len)) ;
text[*len] = '\0'; return text;
}
TokenType get_enum(const char *name) { // return internal version of name
for (size_t i = 0; i < NELEMS(atr); i++) { if (strcmp(atr[i].enum_text, name) == 0) return atr[i].tok; } error(0, 0, "Unknown token %s\n", name); return 0;
}
tok_s gettok() {
int len; tok_s tok; char *yytext = read_line(&len); yytext = rtrim(yytext, &len);
// [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional
// get line and column tok.err_ln = atoi(strtok(yytext, " ")); tok.err_col = atoi(strtok(NULL, " "));
// get the token name char *name = strtok(NULL, " "); tok.tok = get_enum(name);
// if there is extra data, get it char *p = name + strlen(name); if (p != &yytext[len]) { for (++p; isspace(*p); ++p) ; tok.text = strdup(p); } return tok;
}
Tree *make_node(NodeType node_type, Tree *left, Tree *right) {
Tree *t = calloc(sizeof(Tree), 1); t->node_type = node_type; t->left = left; t->right = right; return t;
}
Tree *make_leaf(NodeType node_type, char *value) {
Tree *t = calloc(sizeof(Tree), 1); t->node_type = node_type; t->value = strdup(value); return t;
}
void expect(const char msg[], TokenType s) {
if (tok.tok == s) { tok = gettok(); return; } error(tok.err_ln, tok.err_col, "%s: Expecting '%s', found '%s'\n", msg, atr[s].text, atr[tok.tok].text);
}
Tree *expr(int p) {
Tree *x = NULL, *node; TokenType op;
switch (tok.tok) { case tk_Lparen: x = paren_expr(); break; case tk_Sub: case tk_Add: op = tok.tok; tok = gettok(); node = expr(atr[tk_Negate].precedence); x = (op == tk_Sub) ? make_node(nd_Negate, node, NULL) : node; break; case tk_Not: tok = gettok(); x = make_node(nd_Not, expr(atr[tk_Not].precedence), NULL); break; case tk_Ident: x = make_leaf(nd_Ident, tok.text); tok = gettok(); break; case tk_Integer: x = make_leaf(nd_Integer, tok.text); tok = gettok(); break; default: error(tok.err_ln, tok.err_col, "Expecting a primary, found: %s\n", atr[tok.tok].text); }
while (atr[tok.tok].is_binary && atr[tok.tok].precedence >= p) { TokenType op = tok.tok;
tok = gettok();
int q = atr[op].precedence; if (!atr[op].right_associative) q++;
node = expr(q); x = make_node(atr[op].node_type, x, node); } return x;
}
Tree *paren_expr() {
expect("paren_expr", tk_Lparen); Tree *t = expr(0); expect("paren_expr", tk_Rparen); return t;
}
Tree *stmt() {
Tree *t = NULL, *v, *e, *s, *s2;
switch (tok.tok) { case tk_If: tok = gettok(); e = paren_expr(); s = stmt(); s2 = NULL; if (tok.tok == tk_Else) { tok = gettok(); s2 = stmt(); } t = make_node(nd_If, e, make_node(nd_If, s, s2)); break; case tk_Putc: tok = gettok(); e = paren_expr(); t = make_node(nd_Prtc, e, NULL); expect("Putc", tk_Semi); break; case tk_Print: /* print '(' expr {',' expr} ')' */ tok = gettok(); for (expect("Print", tk_Lparen); ; expect("Print", tk_Comma)) { if (tok.tok == tk_String) { e = make_node(nd_Prts, make_leaf(nd_String, tok.text), NULL); tok = gettok(); } else e = make_node(nd_Prti, expr(0), NULL);
t = make_node(nd_Sequence, t, e);
if (tok.tok != tk_Comma) break; } expect("Print", tk_Rparen); expect("Print", tk_Semi); break; case tk_Semi: tok = gettok(); break; case tk_Ident: v = make_leaf(nd_Ident, tok.text); tok = gettok(); expect("assign", tk_Assign); e = expr(0); t = make_node(nd_Assign, v, e); expect("assign", tk_Semi); break; case tk_While: tok = gettok(); e = paren_expr(); s = stmt(); t = make_node(nd_While, e, s); break; case tk_Lbrace: /* {stmt} */ for (expect("Lbrace", tk_Lbrace); tok.tok != tk_Rbrace && tok.tok != tk_EOI;) t = make_node(nd_Sequence, t, stmt()); expect("Lbrace", tk_Rbrace); break; case tk_EOI: break; default: error(tok.err_ln, tok.err_col, "expecting start of statement, found '%s'\n", atr[tok.tok].text); } return t;
}
Tree *parse() {
Tree *t = NULL;
tok = gettok(); do { t = make_node(nd_Sequence, t, stmt()); } while (t != NULL && tok.tok != tk_EOI); return t;
}
void prt_ast(Tree *t) {
if (t == NULL) printf(";\n"); else { printf("%-14s ", Display_nodes[t->node_type]); if (t->node_type == nd_Ident || t->node_type == nd_Integer || t->node_type == nd_String) { printf("%s\n", t->value); } else { printf("\n"); prt_ast(t->left); prt_ast(t->right); } }
}
void init_io(FILE **fp, FILE *std, const char mode[], const char fn[]) {
if (fn[0] == '\0') *fp = std; else if ((*fp = fopen(fn, mode)) == NULL) error(0, 0, "Can't open %s\n", fn);
}
int main(int argc, char *argv[]) {
init_io(&source_fp, stdin, "r", argc > 1 ? argv[1] : ""); init_io(&dest_fp, stdout, "wb", argc > 2 ? argv[2] : ""); prt_ast(parse());
}</lang>
- Output — prime numbers AST:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
COBOL
Code by Steve Williams. Tested with GnuCOBOL 2.2.
<lang cobol> >>SOURCE FORMAT IS FREE identification division.
- > this code is dedicated to the public domain
- > (GnuCOBOL) 2.3-dev.0
- > for extra credit, generate this program directly from the EBNF
program-id. parser. environment division. configuration section. repository. function all intrinsic. input-output section. file-control.
select input-file assign using input-name status is input-status organization is line sequential.
data division. file section. fd input-file global. 01 input-record global.
03 input-line pic zzzz9. 03 input-column pic zzzzzz9. 03 filler pic x(3). 03 input-token pic x(16). 03 input-value pic x(48).
working-storage section. 01 program-name pic x(32) value spaces global. 01 input-name pic x(32) value spaces global. 01 input-status pic xx global.
01 line-no pic 999 value 0. 01 col-no pic 99 value 0.
01 error-record global.
03 error-line-no pic zzzz9. 03 error-col-no pic zzzzzz9. 03 filler pic x value space. 03 error-message pic x(64) value spaces.
01 token global.
03 token-type pic x(16). 03 token-line pic 999. 03 token-column pic 99. 03 token-value pic x(48).
01 parse-stack global.
03 p pic 999 value 0. 03 p-lim pic 999 value 200. 03 p-zero pic 999 value 0. 03 parse-entry occurs 200. 05 parse-name pic x(24). 05 parse-token pic x(16). 05 parse-left pic 999. 05 parse-right pic 999. 05 parse-work pic 999. 05 parse-work1 pic 999.
01 abstract-syntax-tree global.
03 t pic 999 value 0. 03 t1 pic 999. 03 t-lim pic 999 value 998. 03 filler occurs 998. 05 leaf. 07 leaf-type pic x(14). 07 leaf-value pic x(48). 05 node redefines leaf. 07 node-type pic x(14). 07 node-left pic 999. 07 node-right pic 999.
01 indent pic x(200) value all '| ' global.
procedure division chaining program-name. start-parser.
if program-name <> spaces string program-name delimited by space '.lex' into input-name open input input-file if input-status <> '00' string 'in parser ' trim(input-name) ' open status ' input-status into error-message call 'reporterror' end-if end-if call 'gettoken' call 'stmt_list' if input-name <> spaces close input-file end-if
call 'printast' using t
>>d perform dump-ast
stop run .
dump-ast.
display '==========' upon syserr display 'ast:' upon syserr display 't=' t upon syserr perform varying t1 from 1 by 1 until t1 > t if leaf-type(t1) = 'Identifier' or 'Integer' or 'String' display t1 space trim(leaf-type(t1)) space trim(leaf-value(t1)) upon syserr else display t1 space node-left(t1) space node-right(t1) space trim(node-type(t1)) upon syserr end-if end-perform .
identification division. program-id. stmt_list common recursive. data division. procedure division. start-stmt_list.
call 'push' using module-id move p-zero to parse-left(p) perform forever call 'stmt' move return-code to parse-right(p) call 'makenode' using 'Sequence' parse-left(p) parse-right(p) move return-code to parse-left(p) if parse-right(p) = 0 or token-type = 'End_of_input' exit perform end-if end-perform call 'pop' .
end program stmt_list.
identification division. program-id. stmt common recursive. procedure division. start-stmt.
call 'push' using module-id move p-zero to parse-left(p) evaluate token-type when 'Semicolon' call 'gettoken' when 'Identifier' *>Identifier '=' expr ';' call 'makeleaf' using 'Identifier' token-value move return-code to parse-left(p) call 'gettoken' call 'expect' using 'Op_assign' call 'expr' move return-code to parse-right(p) call 'expect' using 'Semicolon' call 'makenode' using 'Assign' parse-left(p) parse-right(p) move return-code to parse-left(p) when 'Keyword_while' *>'while' paren_expr '{' stmt '}' call 'gettoken' call 'paren_expr' move return-code to parse-work(p) call 'stmt' move return-code to parse-right(p) call 'makenode' using 'While' parse-work(p) parse-right(p) move return-code to parse-left(p) when 'Keyword_if' *>'if' paren_expr stmt ['else' stmt] call 'gettoken' call 'paren_expr' move return-code to parse-left(p) call 'stmt' move return-code to parse-work(p) move p-zero to parse-work1(p) if token-type = 'Keyword_else' call 'gettoken' call 'stmt' move return-code to parse-work1(p) end-if call 'makenode' using 'If' parse-work(p) parse-work1(p) move return-code to parse-right(p) call 'makenode' using 'If' parse-left(p) parse-right(p) move return-code to parse-left(p) when 'Keyword_print' *>'print' '(' prt_list ')' ';' call 'gettoken' call 'expect' using 'LeftParen' call 'prt_list' move return-code to parse-left(p) call 'expect' using 'RightParen' call 'expect' using 'Semicolon' when 'Keyword_putc' *>'putc' paren_expr ';' call 'gettoken' call 'paren_expr' move return-code to parse-left(p) call 'makenode' using 'Prtc' parse-left(p) p-zero move return-code to parse-left(p) call 'expect' using 'Semicolon' when 'LeftBrace' *>'{' stmt '}' call 'gettoken' move p-zero to parse-left(p) perform until token-type = 'RightBrace' or 'End_of_input' call 'stmt' move return-code to parse-right(p) call 'makenode' using 'Sequence' parse-left(p) parse-right(p) move return-code to parse-left(p) end-perform if token-type <> 'End_of_input' call 'gettoken' end-if when other move 0 to parse-left(p) end-evaluate move parse-left(p) to return-code call 'pop' .
end program stmt.
identification division. program-id. paren_expr common recursive. procedure division. start-paren_expr.
*>'(' expr ')' ; call 'push' using module-id call 'expect' using 'LeftParen' call 'expr' call 'expect' using 'RightParen' call 'pop' .
end program paren_expr.
identification division. program-id. prt_list common. procedure division. start-prt_list.
*>(string | expr) {',' (String | expr)} ; call 'push' using module-id move p-zero to parse-work(p) perform prt_entry perform until token-type <> 'Comma' call 'gettoken' perform prt_entry end-perform call 'pop' exit program .
prt_entry.
if token-type = 'String' call 'makeleaf' using token-type token-value move return-code to parse-left(p) call 'makenode' using 'Prts' parse-left(p) p-zero call 'gettoken' else call 'expr' move return-code to parse-left(p) call 'makenode' using 'Prti' parse-left(p) p-zero end-if move return-code to parse-right(p) call 'makenode' using 'Sequence' parse-work(p) parse-right(p) move return-code to parse-work(p) .
end program prt_list.
identification division. program-id. expr common recursive. procedure division. start-expr.
*>and_expr {'||' and_expr} ; call 'push' using module-id call 'and_expr' move return-code to parse-left(p) perform forever if token-type <> 'Op_or' exit perform end-if call 'gettoken' call 'and_expr' move return-code to parse-right(p) call 'makenode' using 'Or' parse-left(p) parse-right(p) move return-code to parse-left(p) end-perform move parse-left(p) to return-code call 'pop' .
end program expr.
identification division. program-id. and_expr common recursive. procedure division. start-and_expr.
*>equality_expr {'&&' equality_expr} ; call 'push' using module-id call 'equality_expr' move return-code to parse-left(p) perform forever if token-type <> 'Op_and' exit perform end-if call 'gettoken' call 'equality_expr' move return-code to parse-right(p) call 'makenode' using 'And' parse-left(p) parse-right(p) move return-code to parse-left(p) end-perform call 'pop' .
end program and_expr.
identification division. program-id. equality_expr common recursive. procedure division. start-equality_expr.
*>relational_expr [('==' | '!=') relational_expr] ; call 'push' using module-id call 'relational_expr' move return-code to parse-left(p) evaluate token-type when 'Op_equal' move 'Equal' to parse-token(p) when 'Op_notequal' move 'NotEqual' to parse-token(p) end-evaluate if parse-token(p) <> spaces call 'gettoken' call 'relational_expr' move return-code to parse-right(p) call 'makenode' using parse-token(p) parse-left(p) parse-right(p) move return-code to parse-left(p) end-if call 'pop' .
end program equality_expr.
identification division. program-id. relational_expr common recursive. procedure division. start-relational_expr.
*>addition_expr [('<' | '<=' | '>' | '>=') addition_expr] ; call 'push' using module-id call 'addition_expr' move return-code to parse-left(p) evaluate token-type when 'Op_less' move 'Less' to parse-token(p) when 'Op_lessequal' move 'LessEqual' to parse-token(p) when 'Op_greater' move 'Greater' to parse-token(p) when 'Op_greaterequal' move 'GreaterEqual' to parse-token(p) end-evaluate if parse-token(p) <> spaces call 'gettoken' call 'addition_expr' move return-code to parse-right(p) call 'makenode' using parse-token(p) parse-left(p) parse-right(p) move return-code to parse-left(p) end-if call 'pop' .
end program relational_expr.
identification division. program-id. addition_expr common recursive. procedure division. start-addition_expr.
*>multiplication_expr {('+' | '-') multiplication_expr} ; call 'push' using module-id call 'multiplication_expr' move return-code to parse-left(p) perform forever evaluate token-type when 'Op_add' move 'Add' to parse-token(p) when 'Op_subtract' move 'Subtract' to parse-token(p) when other exit perform end-evaluate call 'gettoken' call 'multiplication_expr' move return-code to parse-right(p) call 'makenode' using parse-token(p) parse-left(p) parse-right(p) move return-code to parse-left(p) end-perform call 'pop' .
end program addition_expr.
identification division. program-id. multiplication_expr common recursive. procedure division. start-multiplication_expr.
*>primary {('*' | '/' | '%') primary } ; call 'push' using module-id call 'primary' move return-code to parse-left(p) perform forever evaluate token-type when 'Op_multiply' move 'Multiply' to parse-token(p) when 'Op_divide' move 'Divide' to parse-token(p) when 'Op_mod' move 'Mod' to parse-token(p) when other exit perform end-evaluate call 'gettoken' call 'primary' move return-code to parse-right(p) call 'makenode' using parse-token(p) parse-left(p) parse-right(p) move return-code to parse-left(p) end-perform call 'pop' .
end program multiplication_expr.
identification division. program-id. primary common recursive. procedure division. start-primary.
*> Identifier *>| Integer *>| 'LeftParen' expr 'RightParen' *>| ('+' | '-' | '!') primary *>; call 'push' using module-id evaluate token-type when 'Identifier' call 'makeleaf' using 'Identifier' token-value call 'gettoken' when 'Integer' call 'makeleaf' using 'Integer' token-value call 'gettoken' when 'LeftParen' call 'gettoken' call 'expr' call 'expect' using 'RightParen' move t to return-code when 'Op_add' call 'gettoken' call 'primary' when 'Op_subtract' call 'gettoken' call 'primary' move return-code to parse-left(p) call 'makenode' using 'Negate' parse-left(p) p-zero when 'Op_not' call 'gettoken' call 'primary' move return-code to parse-left(p) call 'makenode' using 'Not' parse-left(p) p-zero when other move 0 to return-code end-evaluate call 'pop' .
end program primary.
program-id. reporterror common. procedure division. start-reporterror. report-error.
move token-line to error-line-no move token-column to error-col-no display error-record upon syserr stop run with error status -1 .
end program reporterror.
identification division. program-id. gettoken common. procedure division. start-gettoken.
if program-name = spaces move '00' to input-status accept input-record on exception move '10' to input-status end-accept else read input-file end-if
evaluate input-status when '00' move input-token to token-type move input-value to token-value move numval(input-line) to token-line move numval(input-column) to token-column >>d display indent(1:min(4 * p,length(indent))) 'new token: ' token-type upon syserr when '10' string 'in parser ' trim(input-name) ' unexpected end of input' into error-message call 'reporterror' when other string 'in parser ' trim(input-name) ' unexpected input-status ' input-status into error-message call 'reporterror' end-evaluate .
end program gettoken.
identification division. program-id. expect common. data division. linkage section. 01 what any length. procedure division using what. start-expect.
if token-type <> what string 'in parser expected ' what ' found ' token-type into error-message call 'reporterror' end-if >>d display indent(1:min(4 * p,length(indent))) 'match: ' token-type upon syserr call 'gettoken' .
end program expect.
identification division. program-id. push common. data division. linkage section. 01 what any length. procedure division using what. start-push.
>>d display indent(1:min(4 * p,length(indent))) 'push ' what upon syserr if p >= p-lim move 'in parser stack overflow' to error-message call 'reporterror' end-if add 1 to p initialize parse-entry(p) move what to parse-name(p) .
end program push.
identification division. program-id. pop common. procedure division. start-pop.
if p < 1 move 'in parser stack underflow' to error-message call 'reporterror' end-if >>d display indent(1:4 * p - 4) 'pop ' parse-name(p) upon syserr subtract 1 from p .
end program pop.
identification division. program-id. makenode common. data division. linkage section. 01 parm-type any length. 01 parm-left pic 999. 01 parm-right pic 999. procedure division using parm-type parm-left parm-right. start-makenode.
if t >= t-lim string 'in parser makenode tree index t exceeds ' t-lim into error-message call 'reporterror' end-if add 1 to t move parm-type to node-type(t) move parm-left to node-left(t) move parm-right to node-right(t) move t to return-code .
end program makenode.
identification division. program-id. makeleaf common. data division. linkage section. 01 parm-type any length. 01 parm-value pic x(48). procedure division using parm-type parm-value. start-makeleaf.
if t >= t-lim string 'in parser makeleaf tree index t exceeds ' t-lim into error-message call 'reporterror' end-if add 1 to t move parm-type to leaf-type(t) move parm-value to leaf-value(t) move t to return-code .
end program makeleaf.
identification division. program-id. printast recursive. data division. linkage section. 01 n pic 999. procedure division using n. start-printast.
if n = 0 display ';' exit program end-if evaluate leaf-type(n) when 'Identifier' when 'Integer' when 'String' display leaf-type(n) trim(leaf-value(n)) when other display node-type(n) call 'printast' using node-left(n) call 'printast' using node-right(n) end-evaluate .
end program printast. end program parser.</lang>
- Output — Primes:
prompt$ ./lexer <testcases/Primes | ./parser Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Forth
Tested with Gforth 0.7.3. <lang Forth>CREATE BUF 0 , \ single-character look-ahead buffer
- PEEK BUF @ 0= IF KEY BUF ! THEN BUF @ ;
- GETC PEEK 0 BUF ! ;
- SPACE? DUP BL = SWAP 9 14 WITHIN OR ;
- >SPACE BEGIN PEEK SPACE? WHILE GETC DROP REPEAT ;
- DIGIT? 48 58 WITHIN ;
- GETINT >SPACE 0
BEGIN PEEK DIGIT? WHILE GETC [CHAR] 0 - SWAP 10 * + REPEAT ;
- GETNAM >SPACE PAD 1+
BEGIN PEEK SPACE? INVERT WHILE GETC OVER C! CHAR+ REPEAT PAD TUCK - 1- PAD C! ;
- GETSTR >SPACE PAD 1+ GETC DROP \ skip leading "
BEGIN GETC DUP [CHAR] " <> WHILE OVER C! CHAR+ REPEAT DROP PAD TUCK - 1- PAD C! ;
- INTERN HERE SWAP DUP C@ 1+ BOUNDS DO I C@ C, LOOP ALIGN ;
CREATE #TK 0 ,
- TK: CREATE #TK @ , 1 #TK +! DOES> @ ;
TK: End_of_input TK: Keyword_if TK: Keyword_else TK: Keyword_while TK: Keyword_print TK: Keyword_putc TK: String TK: Integer TK: Identifier TK: LeftParen TK: RightParen TK: LeftBrace TK: RightBrace TK: Semicolon TK: Comma TK: Op_assign TK: Op_not
- (BINARY?) [ #TK @ ] literal >= ;
TK: Op_subtract TK: Op_add TK: Op_mod TK: Op_multiply TK: Op_divide TK: Op_equal TK: Op_notequal TK: Op_less TK: Op_lessequal TK: Op_greater TK: Op_greaterequal TK: Op_and TK: Op_or CREATE TOKEN 0 , 0 , 0 , 0 ,
- TOKEN-TYPE TOKEN 2 CELLS + @ ;
- TOKEN-VALUE TOKEN 3 CELLS + @ ;
- GETTOK GETINT GETINT TOKEN 2!
GETNAM FIND DROP EXECUTE
DUP Integer = IF GETINT ELSE DUP String = IF GETSTR INTERN ELSE DUP Identifier = IF GETNAM INTERN ELSE 0 THEN THEN THEN TOKEN 3 CELLS + ! TOKEN 2 CELLS + ! ;
- BINARY? TOKEN-TYPE (BINARY?) ;
CREATE PREC #TK @ CELLS ALLOT PREC #TK @ CELLS -1 FILL
- PREC! CELLS PREC + ! ;
14 Op_not PREC! 13 Op_multiply PREC! 13 Op_divide PREC! 13 Op_mod PREC! 12 Op_add PREC! 12 Op_subtract PREC! 10 Op_less PREC! 10 Op_greater PREC! 10 Op_lessequal PREC! 10 Op_greaterequal PREC!
9 Op_equal PREC! 9 Op_notequal PREC! 5 Op_and PREC! 4 Op_or PREC!
- PREC@ CELLS PREC + @ ;
\ Each AST Node is a sequence of cells in data space consisting \ of the execution token of a printing word, followed by that \ node's data. Each printing word receives the address of the \ node's data, and is responsible for printing that data \ appropriately.
DEFER .NODE
- .NULL DROP ." ;" CR ;
CREATE $NULL ' .NULL ,
- .IDENTIFIER ." Identifier " @ COUNT TYPE CR ;
- $IDENTIFIER ( a-addr --) HERE SWAP ['] .IDENTIFIER , , ;
- .INTEGER ." Integer " @ . CR ;
- $INTEGER ( n --) HERE SWAP ['] .INTEGER , , ;
- "TYPE" [CHAR] " EMIT TYPE [CHAR] " EMIT ;
- .STRING ." String " @ COUNT "TYPE" CR ;
- $STRING ( a-addr --) HERE SWAP ['] .STRING , , ;
- .LEAF DUP @ COUNT TYPE CR CELL+ @ .NODE 0 .NULL ;
- LEAF CREATE HERE CELL+ , BL WORD INTERN .
DOES> HERE >R ['] .LEAF , @ , , R> ;
LEAF $PRTC Prtc LEAF $PRTS Prts LEAF $PRTI Prti LEAF $NOT Not LEAF $NEGATE Negate
- .BINARY DUP @ COUNT TYPE CR
CELL+ DUP @ .NODE CELL+ @ .NODE ;
- BINARY CREATE HERE CELL+ , BL WORD INTERN .
DOES> HERE >R ['] .BINARY , @ , SWAP 2, R> ;
BINARY $SEQUENCE Sequence BINARY $ASSIGN Assign BINARY $WHILE While BINARY $IF If BINARY $SUBTRACT Subtract BINARY $ADD Add BINARY $MOD Mod BINARY $MULTIPLY Multiply BINARY $DIVIDE Divide BINARY $LESS Less BINARY $LESSEQUAL LessEqual BINARY $GREATER Greater BINARY $GREATEREQUAL GreaterEqual BINARY $EQUAL Equal BINARY $NOTEQUAL NotEqual BINARY $AND And BINARY $OR Or
- TOK-CONS ( x* -- node-xt) TOKEN-TYPE CASE
Op_subtract OF ['] $SUBTRACT ENDOF Op_add OF ['] $ADD ENDOF op_mod OF ['] $MOD ENDOF op_multiply OF ['] $MULTIPLY ENDOF Op_divide OF ['] $DIVIDE ENDOF Op_equal OF ['] $EQUAL ENDOF Op_notequal OF ['] $NOTEQUAL ENDOF Op_less OF ['] $LESS ENDOF Op_lessequal OF ['] $LESSEQUAL ENDOF Op_greater OF ['] $GREATER ENDOF Op_greaterequal OF ['] $GREATEREQUAL ENDOF Op_and OF ['] $AND ENDOF Op_or OF ['] $OR ENDOF ENDCASE ;
- (.NODE) DUP CELL+ SWAP @ EXECUTE ;
' (.NODE) IS .NODE
- .- ( n --) 0 <# #S #> TYPE ;
- EXPECT ( tk --) DUP TOKEN-TYPE <>
IF CR ." stdin:" TOKEN 2@ SWAP .- ." :" .- ." : unexpected token, expecting " . CR BYE THEN DROP GETTOK ;
- '(' LeftParen EXPECT ;
- ')' RightParen EXPECT ;
- '}' RightBrace EXPECT ;
- ';' Semicolon EXPECT ;
- ',' Comma EXPECT ;
- '=' Op_assign EXPECT ;
DEFER *EXPR DEFER EXPR DEFER STMT
- PAREN-EXPR '(' EXPR ')' ;
- PRIMARY
TOKEN-TYPE LeftParen = IF PAREN-EXPR EXIT THEN TOKEN-TYPE Op_add = IF GETTOK 12 *EXPR EXIT THEN TOKEN-TYPE Op_subtract = IF GETTOK 14 *EXPR $NEGATE EXIT THEN TOKEN-TYPE Op_not = IF GETTOK 14 *EXPR $NOT EXIT THEN TOKEN-TYPE Identifier = IF TOKEN-VALUE $IDENTIFIER ELSE TOKEN-TYPE Integer = IF TOKEN-VALUE $INTEGER THEN THEN GETTOK ;
- (*EXPR) ( n -- node)
PRIMARY ( n node) BEGIN OVER TOKEN-TYPE PREC@ SWAP OVER <= BINARY? AND WHILE ( n node prec) 1+ TOK-CONS SWAP GETTOK *EXPR SWAP EXECUTE REPEAT ( n node prec) DROP NIP ( node) ;
- (EXPR) 0 *EXPR ;
- -)? TOKEN-TYPE RightParen <> ;
- -}? TOKEN-TYPE RightBrace <> ;
- (STMT)
TOKEN-TYPE Semicolon = IF GETTOK STMT EXIT THEN TOKEN-TYPE Keyword_while = IF GETTOK PAREN-EXPR STMT $WHILE EXIT THEN TOKEN-TYPE Keyword_if = IF GETTOK PAREN-EXPR STMT TOKEN-TYPE Keyword_else = IF GETTOK STMT ELSE $NULL THEN $IF $IF EXIT THEN TOKEN-TYPE Keyword_putc = IF GETTOK PAREN-EXPR ';' $PRTC EXIT THEN TOKEN-TYPE Keyword_print = IF GETTOK '(' $NULL BEGIN TOKEN-TYPE String = IF TOKEN-VALUE $STRING $PRTS GETTOK ELSE EXPR $PRTI THEN $SEQUENCE -)? WHILE ',' REPEAT ')' ';' EXIT THEN TOKEN-TYPE Identifier = IF TOKEN-VALUE $IDENTIFIER GETTOK '=' EXPR ';' $ASSIGN EXIT THEN TOKEN-TYPE LeftBrace = IF $NULL GETTOK BEGIN -}? WHILE STMT $SEQUENCE REPEAT '}' EXIT THEN TOKEN-TYPE End_of_input = IF EXIT THEN EXPR ;
' (*EXPR) IS *EXPR ' (EXPR) IS EXPR ' (STMT) IS STMT
- -EOI? TOKEN-TYPE End_of_input <> ;
- PARSE $NULL GETTOK BEGIN -EOI? WHILE STMT $SEQUENCE REPEAT ;
PARSE .NODE</lang>
- Output — Count AST:
Sequence Sequence ; Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence ; Sequence Sequence Sequence ; Prts String "count is: " ; Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1
Fortran
The following is Fortran 2008/2018 code with C preprocessing directives. If you call the program source ‘parse.F90’, with a capital ‘F’, then gfortran will know to run the C preprocessor. <lang fortran>!!! !!! An implementation of the Rosetta Code parser task: !!! https://rosettacode.org/wiki/Compiler/syntax_analyzer !!! !!! The implementation is based on the published pseudocode. !!!
module compiler_type_kinds
use, intrinsic :: iso_fortran_env, only: int32 use, intrinsic :: iso_fortran_env, only: int64
implicit none private
! Synonyms. integer, parameter, public :: size_kind = int64 integer, parameter, public :: length_kind = size_kind integer, parameter, public :: nk = size_kind
! Synonyms for character capable of storing a Unicode code point. integer, parameter, public :: unicode_char_kind = selected_char_kind ('ISO_10646') integer, parameter, public :: ck = unicode_char_kind
! Synonyms for integers capable of storing a Unicode code point. integer, parameter, public :: unicode_ichar_kind = int32 integer, parameter, public :: ick = unicode_ichar_kind
end module compiler_type_kinds
module string_buffers
use, intrinsic :: iso_fortran_env, only: error_unit use, intrinsic :: iso_fortran_env, only: int64 use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
implicit none private
public :: strbuf_t
type :: strbuf_t integer(kind = nk), private :: len = 0 ! ! ‘chars’ is made public for efficient access to the individual ! characters. ! character(1, kind = ck), allocatable, public :: chars(:) contains procedure, pass, private :: ensure_storage => strbuf_t_ensure_storage procedure, pass :: to_unicode_full_string => strbuf_t_to_unicode_full_string procedure, pass :: to_unicode_substring => strbuf_t_to_unicode_substring procedure, pass :: length => strbuf_t_length procedure, pass :: set => strbuf_t_set procedure, pass :: append => strbuf_t_append generic :: to_unicode => to_unicode_full_string generic :: to_unicode => to_unicode_substring generic :: assignment(=) => set end type strbuf_t
contains
function strbuf_t_to_unicode_full_string (strbuf) result (s) class(strbuf_t), intent(in) :: strbuf character(:, kind = ck), allocatable :: s
! ! This does not actually ensure that the string is valid Unicode; ! any 31-bit ‘character’ is supported. !
integer(kind = nk) :: i
allocate (character(len = strbuf%len, kind = ck) :: s) do i = 1, strbuf%len s(i:i) = strbuf%chars(i) end do end function strbuf_t_to_unicode_full_string
function strbuf_t_to_unicode_substring (strbuf, i, j) result (s) ! ! ‘Extreme’ values of i and j are allowed, as shortcuts for ‘from ! the beginning’, ‘up to the end’, or ‘empty substring’. ! class(strbuf_t), intent(in) :: strbuf integer(kind = nk), intent(in) :: i, j character(:, kind = ck), allocatable :: s
! ! This does not actually ensure that the string is valid Unicode; ! any 31-bit ‘character’ is supported. !
integer(kind = nk) :: i1, j1 integer(kind = nk) :: n integer(kind = nk) :: k
i1 = max (1_nk, i) j1 = min (strbuf%len, j) n = max (0_nk, (j1 - i1) + 1_nk)
allocate (character(n, kind = ck) :: s) do k = 1, n s(k:k) = strbuf%chars(i1 + (k - 1_nk)) end do end function strbuf_t_to_unicode_substring
elemental function strbuf_t_length (strbuf) result (n) class(strbuf_t), intent(in) :: strbuf integer(kind = nk) :: n
n = strbuf%len end function strbuf_t_length
elemental function next_power_of_two (x) result (y) integer(kind = nk), intent(in) :: x integer(kind = nk) :: y
! ! It is assumed that no more than 64 bits are used. ! ! The branch-free algorithm is that of ! https://archive.is/nKxAc#RoundUpPowerOf2 ! ! Fill in bits until one less than the desired power of two is ! reached, and then add one. !
y = x - 1 y = ior (y, ishft (y, -1)) y = ior (y, ishft (y, -2)) y = ior (y, ishft (y, -4)) y = ior (y, ishft (y, -8)) y = ior (y, ishft (y, -16)) y = ior (y, ishft (y, -32)) y = y + 1 end function next_power_of_two
elemental function new_storage_size (length_needed) result (size) integer(kind = nk), intent(in) :: length_needed integer(kind = nk) :: size
! Increase storage by orders of magnitude.
if (2_nk**32 < length_needed) then size = huge (1_nk) else size = next_power_of_two (length_needed) end if end function new_storage_size
subroutine strbuf_t_ensure_storage (strbuf, length_needed) class(strbuf_t), intent(inout) :: strbuf integer(kind = nk), intent(in) :: length_needed
integer(kind = nk) :: new_size type(strbuf_t) :: new_strbuf
if (.not. allocated (strbuf%chars)) then ! Initialize a new strbuf%chars array. new_size = new_storage_size (length_needed) allocate (strbuf%chars(1:new_size)) else if (ubound (strbuf%chars, 1) < length_needed) then ! Allocate a new strbuf%chars array, larger than the current ! one, but containing the same characters. new_size = new_storage_size (length_needed) allocate (new_strbuf%chars(1:new_size)) new_strbuf%chars(1:strbuf%len) = strbuf%chars(1:strbuf%len) call move_alloc (new_strbuf%chars, strbuf%chars) end if end subroutine strbuf_t_ensure_storage
subroutine strbuf_t_set (dst, src) class(strbuf_t), intent(inout) :: dst class(*), intent(in) :: src
integer(kind = nk) :: n integer(kind = nk) :: i
select type (src) type is (character(*, kind = ck)) n = len (src, kind = nk) call dst%ensure_storage(n) do i = 1, n dst%chars(i) = src(i:i) end do dst%len = n type is (character(*)) n = len (src, kind = nk) call dst%ensure_storage(n) do i = 1, n dst%chars(i) = src(i:i) end do dst%len = n class is (strbuf_t) n = src%len call dst%ensure_storage(n) dst%chars(1:n) = src%chars(1:n) dst%len = n class default error stop end select end subroutine strbuf_t_set
subroutine strbuf_t_append (dst, src) class(strbuf_t), intent(inout) :: dst class(*), intent(in) :: src
integer(kind = nk) :: n_dst, n_src, n integer(kind = nk) :: i
select type (src) type is (character(*, kind = ck)) n_dst = dst%len n_src = len (src, kind = nk) n = n_dst + n_src call dst%ensure_storage(n) do i = 1, n_src dst%chars(n_dst + i) = src(i:i) end do dst%len = n type is (character(*)) n_dst = dst%len n_src = len (src, kind = nk) n = n_dst + n_src call dst%ensure_storage(n) do i = 1, n_src dst%chars(n_dst + i) = src(i:i) end do dst%len = n class is (strbuf_t) n_dst = dst%len n_src = src%len n = n_dst + n_src call dst%ensure_storage(n) dst%chars((n_dst + 1):n) = src%chars(1:n_src) dst%len = n class default error stop end select end subroutine strbuf_t_append
end module string_buffers
module reading_one_line_from_a_stream
use, intrinsic :: iso_fortran_env, only: input_unit use, intrinsic :: iso_fortran_env, only: error_unit use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick use, non_intrinsic :: string_buffers
implicit none private
! get_line_from_stream: read an entire input line from a stream into ! a strbuf_t. public :: get_line_from_stream
character(1, kind = ck), parameter :: linefeed_char = char (10, kind = ck)
! The following is correct for Unix and its relatives. character(1, kind = ck), parameter :: newline_char = linefeed_char
contains
subroutine get_line_from_stream (unit_no, eof, no_newline, strbuf) integer, intent(in) :: unit_no logical, intent(out) :: eof ! End of file? logical, intent(out) :: no_newline ! There is a line but it has no ! newline? (Thus eof also must ! be .true.) class(strbuf_t), intent(inout) :: strbuf
character(1, kind = ck) :: ch
strbuf = call get_ch (unit_no, eof, ch) do while (.not. eof .and. ch /= newline_char) call strbuf%append (ch) call get_ch (unit_no, eof, ch) end do no_newline = eof .and. (strbuf%length() /= 0) end subroutine get_line_from_stream
subroutine get_ch (unit_no, eof, ch) ! ! Read a single code point from the stream. ! ! Currently this procedure simply inputs ‘ASCII’ bytes rather than ! Unicode code points. ! integer, intent(in) :: unit_no logical, intent(out) :: eof character(1, kind = ck), intent(out) :: ch
integer :: stat character(1) :: c = '*'
eof = .false.
if (unit_no == input_unit) then call get_input_unit_char (c, stat) else read (unit = unit_no, iostat = stat) c end if
if (stat < 0) then ch = ck_'*' eof = .true. else if (0 < stat) then write (error_unit, '("Input error with status code ", I0)') stat stop 1 else ch = char (ichar (c, kind = ick), kind = ck) end if end subroutine get_ch
!!! !!! If you tell gfortran you want -std=f2008 or -std=f2018, you likely !!! will need to add also -fall-intrinsics or -U__GFORTRAN__ !!! !!! The first way, you get the FGETC intrinsic. The latter way, you !!! get the C interface code that uses getchar(3). !!!
- ifdef __GFORTRAN__
subroutine get_input_unit_char (c, stat) ! ! The following works if you are using gfortran. ! ! (FGETC is considered a feature for backwards compatibility with ! g77. However, I know of no way to reconfigure input_unit as a ! Fortran 2003 stream, for use with ordinary ‘read’.) ! character, intent(inout) :: c integer, intent(out) :: stat
call fgetc (input_unit, c, stat) end subroutine get_input_unit_char
- else
subroutine get_input_unit_char (c, stat) ! ! An alternative implementation of get_input_unit_char. This ! actually reads input from the C standard input, which might not ! be the same as input_unit. ! use, intrinsic :: iso_c_binding, only: c_int character, intent(inout) :: c integer, intent(out) :: stat
interface ! ! Use getchar(3) to read characters from standard input. This ! assumes there is actually such a function available, and that ! getchar(3) does not exist solely as a macro. (One could write ! one’s own getchar() if necessary, of course.) ! function getchar () result (c) bind (c, name = 'getchar') use, intrinsic :: iso_c_binding, only: c_int integer(kind = c_int) :: c end function getchar end interface
integer(kind = c_int) :: i_char
i_char = getchar () ! ! The C standard requires that EOF have a negative value. If the ! value returned by getchar(3) is not EOF, then it will be ! representable as an unsigned char. Therefore, to check for end ! of file, one need only test whether i_char is negative. ! if (i_char < 0) then stat = -1 else stat = 0 c = char (i_char) end if end subroutine get_input_unit_char
- endif
end module reading_one_line_from_a_stream
module lexer_token_facts
implicit none private
integer, parameter, public :: tk_EOI = 0 integer, parameter, public :: tk_Mul = 1 integer, parameter, public :: tk_Div = 2 integer, parameter, public :: tk_Mod = 3 integer, parameter, public :: tk_Add = 4 integer, parameter, public :: tk_Sub = 5 integer, parameter, public :: tk_Negate = 6 integer, parameter, public :: tk_Not = 7 integer, parameter, public :: tk_Lss = 8 integer, parameter, public :: tk_Leq = 9 integer, parameter, public :: tk_Gtr = 10 integer, parameter, public :: tk_Geq = 11 integer, parameter, public :: tk_Eq = 12 integer, parameter, public :: tk_Neq = 13 integer, parameter, public :: tk_Assign = 14 integer, parameter, public :: tk_And = 15 integer, parameter, public :: tk_Or = 16 integer, parameter, public :: tk_If = 17 integer, parameter, public :: tk_Else = 18 integer, parameter, public :: tk_While = 19 integer, parameter, public :: tk_Print = 20 integer, parameter, public :: tk_Putc = 21 integer, parameter, public :: tk_Lparen = 22 integer, parameter, public :: tk_Rparen = 23 integer, parameter, public :: tk_Lbrace = 24 integer, parameter, public :: tk_Rbrace = 25 integer, parameter, public :: tk_Semi = 26 integer, parameter, public :: tk_Comma = 27 integer, parameter, public :: tk_Ident = 28 integer, parameter, public :: tk_Integer = 29 integer, parameter, public :: tk_String = 30 integer, parameter, public :: tk_Positive = 31
character(16), parameter, public :: lexer_token_string(0:31) = & (/ "EOI ", & & "* ", & & "/ ", & & "% ", & & "+ ", & & "- ", & & "- ", & & "! ", & & "< ", & & "<= ", & & "> ", & & ">= ", & & "== ", & & "!= ", & & "= ", & & "&& ", & & "|| ", & & "if ", & & "else ", & & "while ", & & "print ", & & "putc ", & & "( ", & & ") ", & & "{ ", & & "} ", & & "; ", & & ", ", & & "Ident ", & & "Integer literal ", & & "String literal ", & & "+ " /)
integer, parameter, public :: lexer_token_arity(0:31) = & & (/ -1, & ! EOI & 2, 2, 2, 2, 2, & ! * / % + - & 1, 1, & ! negate ! & 2, 2, 2, 2, 2, 2, & ! < <= > >= == != & -1, & ! = & 2, 2, & ! && || & -1, -1, -1, -1, -1, & ! & -1, -1, -1, -1, -1, & ! & -1, -1, -1, -1, & ! & 1 /) ! positive
integer, parameter, public :: lexer_token_precedence(0:31) = & & (/ -1, & ! EOI & 13, 13, 13, & ! * / % & 12, 12, & ! + - & 14, 14, & ! negate ! & 10, 10, 10, 10, & ! < <= > >= & 9, 9, & ! == != & -1, & ! = & 5, & ! && & 4, & ! || & -1, -1, -1, -1, -1, & ! & -1, -1, -1, -1, -1, & ! & -1, -1, -1, -1, & ! & 14 /) ! positive
integer, parameter, public :: left_associative = 0 integer, parameter, public :: right_associative = 1
! All current operators are left associative. (The values in the ! array for things that are not operators are unimportant.) integer, parameter, public :: lexer_token_associativity(0:31) = left_associative
end module lexer_token_facts
module reading_of_lexer_tokens
use, intrinsic :: iso_fortran_env, only: error_unit use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick use, non_intrinsic :: string_buffers use, non_intrinsic :: reading_one_line_from_a_stream use, non_intrinsic :: lexer_token_facts
implicit none private
public :: lexer_token_t public :: get_lexer_token
character(1, kind = ck), parameter :: horizontal_tab_char = char (9, kind = ck) character(1, kind = ck), parameter :: linefeed_char = char (10, kind = ck) character(1, kind = ck), parameter :: vertical_tab_char = char (11, kind = ck) character(1, kind = ck), parameter :: formfeed_char = char (12, kind = ck) character(1, kind = ck), parameter :: carriage_return_char = char (13, kind = ck) character(1, kind = ck), parameter :: space_char = ck_' '
type :: lexer_token_t integer :: token_no = -(huge (1)) character(:, kind = ck), allocatable :: val integer(nk) :: line_no = -(huge (1_nk)) integer(nk) :: column_no = -(huge (1_nk)) end type lexer_token_t
contains
subroutine get_lexer_token (unit_no, lex_line_no, eof, token) ! ! Lines that are empty or contain only whitespace are tolerated. ! ! Also tolerated are comment lines, whose first character is a ! '!'. It is convenient for debugging to be able to comment out ! lines. ! ! A last line be without a newline is *not* tolerated, unless it ! contains only whitespace. ! ! Letting there be some whitespace is partly for the sake of ! reading cut-and-paste from a browser display. ! integer, intent(in) :: unit_no integer(kind = nk), intent(inout) :: lex_line_no logical, intent(out) :: eof type(lexer_token_t), intent(out) :: token
type(strbuf_t) :: strbuf logical :: no_newline logical :: input_found
! Let a negative setting initialize the line number. lex_line_no = max (0_nk, lex_line_no)
strbuf = eof = .false. input_found = .false. do while (.not. eof .and. .not. input_found) call get_line_from_stream (unit_no, eof, no_newline, strbuf) if (eof) then if (no_newline) then lex_line_no = lex_line_no + 1 if (.not. strbuf_is_all_whitespace (strbuf)) then call start_error_message (lex_line_no) write (error_unit, '("lexer line ends without a newline")') stop 1 end if end if else lex_line_no = lex_line_no + 1 input_found = .true. if (strbuf_is_all_whitespace (strbuf)) then ! A blank line. input_found = .false. else if (0 < strbuf%length()) then if (strbuf%chars(1) == ck_'!') then ! A comment line. input_found = .false. end if end if end if end do
token = lexer_token_t () if (.not. eof) then token = strbuf_to_token (lex_line_no, strbuf) end if end subroutine get_lexer_token
function strbuf_to_token (lex_line_no, strbuf) result (token) integer(kind = nk), intent(in) :: lex_line_no class(strbuf_t), intent(in) :: strbuf type(lexer_token_t) :: token
character(:, kind = ck), allocatable :: line_no character(:, kind = ck), allocatable :: column_no character(:, kind = ck), allocatable :: token_name character(:, kind = ck), allocatable :: val_string integer :: stat integer(kind = nk) :: n
call split_line (lex_line_no, strbuf, line_no, column_no, token_name, val_string)
read (line_no, *, iostat = stat) token%line_no if (stat /= 0) then call start_error_message (lex_line_no) write (error_unit, '("line number field is unreadable or too large")') stop 1 end if
read (column_no, *, iostat = stat) token%column_no if (stat /= 0) then call start_error_message (lex_line_no) write (error_unit, '("column number field is unreadable or too large")') stop 1 end if
token%token_no = token_name_to_token_no (lex_line_no, token_name)
select case (token%token_no) case (tk_Ident) ! I do no checking of identifier names. allocate (token%val, source = val_string) case (tk_Integer) call check_is_all_digits (lex_line_no, val_string) allocate (token%val, source = val_string) case (tk_String) n = len (val_string, kind = nk) if (n < 2) then call string_literal_missing_or_no_good else if (val_string(1:1) /= ck_'"' .or. val_string(n:n) /= ck_'"') then call string_literal_missing_or_no_good else allocate (token%val, source = val_string) end if case default if (len (val_string, kind = nk) /= 0) then call start_error_message (lex_line_no) write (error_unit, '("token should not have a value")') stop 1 end if end select
contains
subroutine string_literal_missing_or_no_good call start_error_message (lex_line_no) write (error_unit, '("""String"" token requires a string literal")') stop 1 end subroutine string_literal_missing_or_no_good
end function strbuf_to_token
subroutine split_line (lex_line_no, strbuf, line_no, column_no, token_name, val_string) integer(kind = nk), intent(in) :: lex_line_no class(strbuf_t), intent(in) :: strbuf character(:, kind = ck), allocatable, intent(out) :: line_no character(:, kind = ck), allocatable, intent(out) :: column_no character(:, kind = ck), allocatable, intent(out) :: token_name character(:, kind = ck), allocatable, intent(out) :: val_string
integer(kind = nk) :: i, j
i = skip_whitespace (strbuf, 1_nk) j = skip_non_whitespace (strbuf, i) line_no = strbuf%to_unicode(i, j - 1) call check_is_all_digits (lex_line_no, line_no)
i = skip_whitespace (strbuf, j) j = skip_non_whitespace (strbuf, i) column_no = strbuf%to_unicode(i, j - 1) call check_is_all_digits (lex_line_no, column_no)
i = skip_whitespace (strbuf, j) j = skip_non_whitespace (strbuf, i) token_name = strbuf%to_unicode(i, j - 1)
i = skip_whitespace (strbuf, j) if (strbuf%length() < i) then val_string = ck_ else if (strbuf%chars(i) == ck_'"') then j = skip_whitespace_backwards (strbuf, strbuf%length()) if (strbuf%chars(j) == ck_'"') then val_string = strbuf%to_unicode(i, j) else call start_error_message (lex_line_no) write (error_unit, '("string literal does not end in a double quote")') stop 1 end if else j = skip_non_whitespace (strbuf, i) val_string = strbuf%to_unicode(i, j - 1) i = skip_whitespace (strbuf, j) if (i <= strbuf%length()) then call start_error_message (lex_line_no) write (error_unit, '("token line contains unexpected text")') stop 1 end if end if end subroutine split_line
function token_name_to_token_no (lex_line_no, token_name) result (token_no) integer(kind = nk), intent(in) :: lex_line_no character(*, kind = ck), intent(in) :: token_name integer :: token_no
!! !! This implementation is not optimized in any way, unless the !! Fortran compiler can optimize the SELECT CASE. !!
select case (token_name) case (ck_"End_of_input") token_no = tk_EOI case (ck_"Op_multiply") token_no = tk_Mul case (ck_"Op_divide") token_no = tk_Div case (ck_"Op_mod") token_no = tk_Mod case (ck_"Op_add") token_no = tk_Add case (ck_"Op_subtract") token_no = tk_Sub case (ck_"Op_negate") token_no = tk_Negate case (ck_"Op_not") token_no = tk_Not case (ck_"Op_less") token_no = tk_Lss case (ck_"Op_lessequal ") token_no = tk_Leq case (ck_"Op_greater") token_no = tk_Gtr case (ck_"Op_greaterequal") token_no = tk_Geq case (ck_"Op_equal") token_no = tk_Eq case (ck_"Op_notequal") token_no = tk_Neq case (ck_"Op_assign") token_no = tk_Assign case (ck_"Op_and") token_no = tk_And case (ck_"Op_or") token_no = tk_Or case (ck_"Keyword_if") token_no = tk_If case (ck_"Keyword_else") token_no = tk_Else case (ck_"Keyword_while") token_no = tk_While case (ck_"Keyword_print") token_no = tk_Print case (ck_"Keyword_putc") token_no = tk_Putc case (ck_"LeftParen") token_no = tk_Lparen case (ck_"RightParen") token_no = tk_Rparen case (ck_"LeftBrace") token_no = tk_Lbrace case (ck_"RightBrace") token_no = tk_Rbrace case (ck_"Semicolon") token_no = tk_Semi case (ck_"Comma") token_no = tk_Comma case (ck_"Identifier") token_no = tk_Ident case (ck_"Integer") token_no = tk_Integer case (ck_"String") token_no = tk_String case default call start_error_message (lex_line_no) write (error_unit, '("unrecognized token name: ", A)') token_name stop 1 end select end function token_name_to_token_no
function skip_whitespace (strbuf, i) result (j) class(strbuf_t), intent(in) :: strbuf integer(kind = nk), intent(in) :: i integer(kind = nk) :: j
logical :: done
j = i done = .false. do while (.not. done) if (at_end_of_line (strbuf, j)) then done = .true. else if (.not. isspace (strbuf%chars(j))) then done = .true. else j = j + 1 end if end do end function skip_whitespace
function skip_non_whitespace (strbuf, i) result (j) class(strbuf_t), intent(in) :: strbuf integer(kind = nk), intent(in) :: i integer(kind = nk) :: j
logical :: done
j = i done = .false. do while (.not. done) if (at_end_of_line (strbuf, j)) then done = .true. else if (isspace (strbuf%chars(j))) then done = .true. else j = j + 1 end if end do end function skip_non_whitespace
function skip_whitespace_backwards (strbuf, i) result (j) class(strbuf_t), intent(in) :: strbuf integer(kind = nk), intent(in) :: i integer(kind = nk) :: j
logical :: done
j = i done = .false. do while (.not. done) if (j == -1) then done = .true. else if (.not. isspace (strbuf%chars(j))) then done = .true. else j = j - 1 end if end do end function skip_whitespace_backwards
function at_end_of_line (strbuf, i) result (bool) class(strbuf_t), intent(in) :: strbuf integer(kind = nk), intent(in) :: i logical :: bool
bool = (strbuf%length() < i) end function at_end_of_line
elemental function strbuf_is_all_whitespace (strbuf) result (bool) class(strbuf_t), intent(in) :: strbuf logical :: bool
integer(kind = nk) :: n integer(kind = nk) :: i
n = strbuf%length() if (n == 0) then bool = .true. else i = 1 bool = .true. do while (bool .and. i /= n + 1) bool = isspace (strbuf%chars(i)) i = i + 1 end do end if end function strbuf_is_all_whitespace
elemental function isspace (ch) result (bool) character(1, kind = ck), intent(in) :: ch logical :: bool
bool = (ch == horizontal_tab_char) .or. & & (ch == linefeed_char) .or. & & (ch == vertical_tab_char) .or. & & (ch == formfeed_char) .or. & & (ch == carriage_return_char) .or. & & (ch == space_char) end function isspace
elemental function isdigit (ch) result (bool) character(1, kind = ck), intent(in) :: ch logical :: bool
integer(kind = ick), parameter :: zero = ichar (ck_'0', kind = ick) integer(kind = ick), parameter :: nine = ichar (ck_'9', kind = ick)
integer(kind = ick) :: i_ch
i_ch = ichar (ch, kind = ick) bool = (zero <= i_ch .and. i_ch <= nine) end function isdigit
subroutine check_is_all_digits (lex_line_no, str) integer(kind = nk), intent(in) :: lex_line_no character(*, kind = ck), intent(in) :: str
integer(kind = nk) :: n integer(kind = nk) :: i
n = len (str, kind = nk) if (n == 0_nk) then call start_error_message (lex_line_no) write (error_unit, '("a required field is missing")') stop 1 else do i = 1, n if (.not. isdigit (str(i:i))) then call start_error_message (lex_line_no) write (error_unit, '("a numeric field contains a non-digit")') stop 1 end if end do end if end subroutine check_is_all_digits
subroutine start_error_message (lex_line_no) integer(kind = nk), intent(in) :: lex_line_no
write (error_unit, '("Token stream error at line ", I0, ": ")', advance = 'no') & & lex_line_no end subroutine start_error_message
end module reading_of_lexer_tokens
module syntactic_analysis
use, intrinsic :: iso_fortran_env, only: input_unit use, intrinsic :: iso_fortran_env, only: output_unit use, intrinsic :: iso_fortran_env, only: error_unit use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick use, non_intrinsic :: string_buffers use, non_intrinsic :: lexer_token_facts use, non_intrinsic :: reading_of_lexer_tokens
implicit none private
public :: ast_node_t public :: ast_t public :: parse_token_stream public :: output_ast_flattened
integer, parameter, public :: tk_start_of_statement = -1 integer, parameter, public :: tk_primary = -2
integer, parameter :: node_Identifier = 1 integer, parameter :: node_String = 2 integer, parameter :: node_Integer = 3 integer, parameter :: node_Sequence = 4 integer, parameter :: node_If = 5 integer, parameter :: node_Prtc = 6 integer, parameter :: node_Prts = 7 integer, parameter :: node_Prti = 8 integer, parameter :: node_While = 9 integer, parameter :: node_Assign = 10 integer, parameter :: node_Negate = 11 integer, parameter :: node_Not = 12 integer, parameter :: node_Multiply = 13 integer, parameter :: node_Divide = 14 integer, parameter :: node_Mod = 15 integer, parameter :: node_Add = 16 integer, parameter :: node_Subtract = 17 integer, parameter :: node_Less = 18 integer, parameter :: node_LessEqual = 19 integer, parameter :: node_Greater = 20 integer, parameter :: node_GreaterEqual = 21 integer, parameter :: node_Equal = 22 integer, parameter :: node_NotEqual = 23 integer, parameter :: node_And = 24 integer, parameter :: node_Or = 25
character(16), parameter :: node_variety_string(1:25) = & (/ "Identifier ", & & "String ", & & "Integer ", & & "Sequence ", & & "If ", & & "Prtc ", & & "Prts ", & & "Prti ", & & "While ", & & "Assign ", & & "Negate ", & & "Not ", & & "Multiply ", & & "Divide ", & & "Mod ", & & "Add ", & & "Subtract ", & & "Less ", & & "LessEqual ", & & "Greater ", & & "GreaterEqual ", & & "Equal ", & & "NotEqual ", & & "And ", & & "Or " /)
type :: ast_node_t integer :: node_variety character(:, kind = ck), allocatable :: val type(ast_node_t), pointer :: left => null () type(ast_node_t), pointer :: right => null () contains procedure, pass :: assign => ast_node_t_assign generic :: assignment(=) => assign final :: ast_node_t_finalize end type ast_node_t
! ast_t phases. integer, parameter :: building = 1 integer, parameter :: completed = 2
type :: ast_t ! ! This type is used to build the subtrees, as well as for the ! completed AST. The difference is in the setting of ‘phase’. ! type(ast_node_t), pointer :: node => null () integer, private :: phase = building contains procedure, pass :: assign => ast_t_assign generic :: assignment(=) => assign final :: ast_t_finalize end type ast_t
type(ast_t), parameter :: ast_nil = ast_t (null ())
contains
recursive subroutine ast_node_t_assign (node, other) class(ast_node_t), intent(out) :: node class(*), intent(in) :: other
select type (other) class is (ast_node_t) node%node_variety = other%node_variety if (allocated (other%val)) allocate (node%val, source = other%val) if (associated (other%left)) allocate (node%left, source = other%left) if (associated (other%right)) allocate (node%right, source = other%right) class default ! This branch should never be reached. error stop end select end subroutine ast_node_t_assign
recursive subroutine ast_node_t_finalize (node) type(ast_node_t), intent(inout) :: node
if (associated (node%left)) deallocate (node%left) if (associated (node%right)) deallocate (node%right) end subroutine ast_node_t_finalize
recursive subroutine ast_t_assign (ast, other) class(ast_t), intent(out) :: ast class(*), intent(in) :: other
select type (other) class is (ast_t) if (associated (other%node)) allocate (ast%node, source = other%node) ! ! Whether it is better to set phase to ‘building’ or to set it ! to ‘other%phase’ is unclear to me. Probably ‘building’ is the ! better choice. Which variable controls memory recovery is ! clear and unchanging, in that case: it is the original, ! ‘other’, that does. ! ast%phase = building class default ! This should not happen. error stop end select end subroutine ast_t_assign
subroutine ast_t_finalize (ast) type(ast_t), intent(inout) :: ast
! ! When we are building the tree, the tree’s nodes should not be ! deallocated when the ast_t variable temporarily holding them ! goes out of scope. ! ! However, once the AST is completed, we do want the memory ! recovered when the variable goes out of scope. ! ! (Elsewhere I have written a primitive garbage collector for ! Fortran programs, but in this case it would be a lot of overhead ! for little gain. In fact, we could reasonably just let the ! memory leak, in this program. ! ! Fortran runtimes *are* allowed by the standard to have garbage ! collectors built in. To my knowledge, at the time of this ! writing, only NAG Fortran has a garbage collector option.) !
if (ast%phase == completed) then if (associated (ast%node)) deallocate (ast%node) end if end subroutine ast_t_finalize
function parse_token_stream (unit_no) result (ast) integer, intent(in) :: unit_no type(ast_t) :: ast
integer(kind = nk) :: lex_line_no type(ast_t) :: statement type(lexer_token_t) :: token
lex_line_no = -1_nk call get_token (unit_no, lex_line_no, token) call parse_statement (unit_no, lex_line_no, token, statement) ast = make_internal_node (node_Sequence, ast, statement) do while (token%token_no /= tk_EOI) call parse_statement (unit_no, lex_line_no, token, statement) ast = make_internal_node (node_Sequence, ast, statement) end do ast%phase = completed end function parse_token_stream
recursive subroutine parse_statement (unit_no, lex_line_no, token, ast) integer, intent(in) :: unit_no integer(kind = nk), intent(inout) :: lex_line_no type(lexer_token_t), intent(inout) :: token type(ast_t), intent(out) :: ast
ast = ast_nil
select case (token%token_no) case (tk_If) call parse_ifelse_construct case (tk_Putc) call parse_putc case (tk_Print) call parse_print case (tk_Semi) call get_token (unit_no, lex_line_no, token) case (tk_Ident) call parse_identifier case (tk_While) call parse_while_construct case (tk_Lbrace) call parse_lbrace_construct case (tk_EOI) continue case default call syntax_error_message ("", tk_start_of_statement, token) stop 1 end select
contains
recursive subroutine parse_ifelse_construct type(ast_t) :: predicate type(ast_t) :: statement_for_predicate_true type(ast_t) :: statement_for_predicate_false
call expect_token ("If", tk_If, token) call get_token (unit_no, lex_line_no, token) call parse_parenthesized_expression (unit_no, lex_line_no, token, predicate) call parse_statement (unit_no, lex_line_no, token, statement_for_predicate_true) if (token%token_no == tk_Else) then call get_token (unit_no, lex_line_no, token) call parse_statement (unit_no, lex_line_no, token, statement_for_predicate_false) ast = make_internal_node (node_If, statement_for_predicate_true, & & statement_for_predicate_false) else ast = make_internal_node (node_If, statement_for_predicate_true, ast_nil) end if ast = make_internal_node (node_If, predicate, ast) end subroutine parse_ifelse_construct
recursive subroutine parse_putc type(ast_t) :: arguments
call expect_token ("Putc", tk_Putc, token) call get_token (unit_no, lex_line_no, token) call parse_parenthesized_expression (unit_no, lex_line_no, token, arguments) ast = make_internal_node (node_Prtc, arguments, ast_nil) call expect_token ("Putc", tk_Semi, token) call get_token (unit_no, lex_line_no, token) end subroutine parse_putc
recursive subroutine parse_print logical :: done type(ast_t) :: arg type(ast_t) :: printer
call expect_token ("Print", tk_Print, token) call get_token (unit_no, lex_line_no, token) call expect_token ("Print", tk_Lparen, token) done = .false. do while (.not. done) call get_token (unit_no, lex_line_no, token) select case (token%token_no) case (tk_String) arg = make_leaf_node (node_String, token%val) printer = make_internal_node (node_Prts, arg, ast_nil) call get_token (unit_no, lex_line_no, token) case default call parse_expression (unit_no, 0, lex_line_no, token, arg) printer = make_internal_node (node_Prti, arg, ast_nil) end select ast = make_internal_node (node_Sequence, ast, printer) done = (token%token_no /= tk_Comma) end do call expect_token ("Print", tk_Rparen, token) call get_token (unit_no, lex_line_no, token) call expect_token ("Print", tk_Semi, token) call get_token (unit_no, lex_line_no, token) end subroutine parse_print
recursive subroutine parse_identifier type(ast_t) :: left_side type(ast_t) :: right_side
left_side = make_leaf_node (node_Identifier, token%val) call get_token (unit_no, lex_line_no, token) call expect_token ("assign", tk_Assign, token) call get_token (unit_no, lex_line_no, token) call parse_expression (unit_no, 0, lex_line_no, token, right_side) ast = make_internal_node (node_Assign, left_side, right_side) call expect_token ("assign", tk_Semi, token) call get_token (unit_no, lex_line_no, token) end subroutine parse_identifier
recursive subroutine parse_while_construct type(ast_t) :: predicate type(ast_t) :: statement_to_be_repeated
call expect_token ("While", tk_While, token) call get_token (unit_no, lex_line_no, token) call parse_parenthesized_expression (unit_no, lex_line_no, token, predicate) call parse_statement (unit_no, lex_line_no, token, statement_to_be_repeated) ast = make_internal_node (node_While, predicate, statement_to_be_repeated) end subroutine parse_while_construct
recursive subroutine parse_lbrace_construct type(ast_t) :: statement
call expect_token ("Lbrace", tk_Lbrace, token) call get_token (unit_no, lex_line_no, token) do while (token%token_no /= tk_Rbrace .and. token%token_no /= tk_EOI) call parse_statement (unit_no, lex_line_no, token, statement) ast = make_internal_node (node_Sequence, ast, statement) end do call expect_token ("Lbrace", tk_Rbrace, token) call get_token (unit_no, lex_line_no, token) end subroutine parse_lbrace_construct
end subroutine parse_statement
recursive subroutine parse_expression (unit_no, p, lex_line_no, token, ast) integer, intent(in) :: unit_no integer, intent(in) :: p integer(kind = nk), intent(inout) :: lex_line_no type(lexer_token_t), intent(inout) :: token type(ast_t), intent(out) :: ast
integer :: precedence type(ast_t) :: expression
select case (token%token_no) case (tk_Lparen) call parse_parenthesized_expression (unit_no, lex_line_no, token, ast) case (tk_Sub) token%token_no = tk_Negate precedence = lexer_token_precedence(token%token_no) call get_token (unit_no, lex_line_no, token) call parse_expression (unit_no, precedence, lex_line_no, token, expression) ast = make_internal_node (node_Negate, expression, ast_nil) case (tk_Add) token%token_no = tk_Positive precedence = lexer_token_precedence(token%token_no) call get_token (unit_no, lex_line_no, token) call parse_expression (unit_no, precedence, lex_line_no, token, expression) ast = expression case (tk_Not) precedence = lexer_token_precedence(token%token_no) call get_token (unit_no, lex_line_no, token) call parse_expression (unit_no, precedence, lex_line_no, token, expression) ast = make_internal_node (node_Not, expression, ast_nil) case (tk_Ident) ast = make_leaf_node (node_Identifier, token%val) call get_token (unit_no, lex_line_no, token) case (tk_Integer) ast = make_leaf_node (node_Integer, token%val) call get_token (unit_no, lex_line_no, token) case default call syntax_error_message ("", tk_primary, token) stop 1 end select
do while (lexer_token_arity(token%token_no) == 2 .and. & & p <= lexer_token_precedence(token%token_no)) block type(ast_t) :: right_expression integer :: q integer :: node_variety
if (lexer_token_associativity(token%token_no) == right_associative) then q = lexer_token_precedence(token%token_no) else q = lexer_token_precedence(token%token_no) + 1 end if node_variety = binary_operator_node_variety (token%token_no) call get_token (unit_no, lex_line_no, token) call parse_expression (unit_no, q, lex_line_no, token, right_expression) ast = make_internal_node (node_variety, ast, right_expression) end block end do end subroutine parse_expression
recursive subroutine parse_parenthesized_expression (unit_no, lex_line_no, token, ast) integer, intent(in) :: unit_no integer(kind = nk), intent(inout) :: lex_line_no type(lexer_token_t), intent(inout) :: token type(ast_t), intent(out) :: ast
call expect_token ("paren_expr", tk_Lparen, token) call get_token (unit_no, lex_line_no, token) call parse_expression (unit_no, 0, lex_line_no, token, ast) call expect_token ("paren_expr", tk_Rparen, token) call get_token (unit_no, lex_line_no, token) end subroutine parse_parenthesized_expression
elemental function binary_operator_node_variety (token_no) result (node_variety) integer, intent(in) :: token_no integer :: node_variety
select case (token_no) case (tk_Mul) node_variety = node_Multiply case (tk_Div) node_variety = node_Divide case (tk_Mod) node_variety = node_Mod case (tk_Add) node_variety = node_Add case (tk_Sub) node_variety = node_Subtract case (tk_Lss) node_variety = node_Less case (tk_Leq) node_variety = node_LessEqual case (tk_Gtr) node_variety = node_Greater case (tk_Geq) node_variety = node_GreaterEqual case (tk_Eq) node_variety = node_Equal case (tk_Neq) node_variety = node_NotEqual case (tk_And) node_variety = node_And case (tk_Or) node_variety = node_Or case default ! This branch should never be reached. error stop end select end function binary_operator_node_variety
function make_internal_node (node_variety, left, right) result (ast) integer, intent(in) :: node_variety class(ast_t), intent(in) :: left, right type(ast_t) :: ast
type(ast_node_t), pointer :: node
allocate (node) node%node_variety = node_variety node%left => left%node node%right => right%node ast%node => node end function make_internal_node
function make_leaf_node (node_variety, val) result (ast) integer, intent(in) :: node_variety character(*, kind = ck), intent(in) :: val type(ast_t) :: ast
type(ast_node_t), pointer :: node
allocate (node) node%node_variety = node_variety node%val = val ast%node => node end function make_leaf_node
subroutine get_token (unit_no, lex_line_no, token) integer, intent(in) :: unit_no integer(kind = nk), intent(inout) :: lex_line_no type(lexer_token_t), intent(out) :: token
logical :: eof
call get_lexer_token (unit_no, lex_line_no, eof, token) if (eof) then write (error_unit, '("Parser error: the stream of input tokens is incomplete")') stop 1 end if end subroutine get_token
subroutine expect_token (message, token_no, token) character(*), intent(in) :: message integer, intent (in) :: token_no class(lexer_token_t), intent(in) :: token
if (token%token_no /= token_no) then call syntax_error_message (message, token_no, token) stop 1 end if end subroutine expect_token
subroutine syntax_error_message (message, expected_token_no, token) character(*), intent(in) :: message integer, intent(in) :: expected_token_no class(lexer_token_t), intent(in) :: token
! Write a message to an output unit dedicated to printing ! errors. The message could, of course, be more detailed than what ! we are doing here. write (error_unit, '("Syntax error at ", I0, ".", I0)') & & token%line_no, token%column_no
! ! For the sake of the exercise, also write, to output_unit, a ! message in the style of the C reference program. ! write (output_unit, '("(", I0, ", ", I0, ") error: ")', advance = 'no') & & token%line_no, token%column_no select case (expected_token_no) case (tk_start_of_statement) write (output_unit, '("expecting start of statement, found ", 1A, "")') & & trim (lexer_token_string(token%token_no)) case (tk_primary) write (output_unit, '("Expecting a primary, found ", 1A, "")') & & trim (lexer_token_string(token%token_no)) case default write (output_unit, '(1A, ": Expecting ", 1A, ", found ", 1A, "")') & & trim (message), trim (lexer_token_string(expected_token_no)), & & trim (lexer_token_string(token%token_no)) end select end subroutine syntax_error_message
subroutine output_ast_flattened (unit_no, ast) integer, intent(in) :: unit_no type(ast_t), intent(in) :: ast
call output_ast_node_flattened (unit_no, ast%node) end subroutine output_ast_flattened
recursive subroutine output_ast_node_flattened (unit_no, node) integer, intent(in) :: unit_no type(ast_node_t), pointer, intent(in) :: node
if (.not. associated (node)) then write (unit_no, '(";")') else if (allocated (node%val)) then write (unit_no, '(1A16, 2X, 1A)') & & node_variety_string(node%node_variety), node%val else write (unit_no, '(1A)') & & trim (node_variety_string(node%node_variety)) call output_ast_node_flattened (unit_no, node%left) call output_ast_node_flattened (unit_no, node%right) end if end if end subroutine output_ast_node_flattened
end module syntactic_analysis
program parse
use, intrinsic :: iso_fortran_env, only: input_unit use, intrinsic :: iso_fortran_env, only: output_unit use, intrinsic :: iso_fortran_env, only: error_unit use, non_intrinsic :: syntactic_analysis
implicit none
integer, parameter :: inp_unit_no = 100 integer, parameter :: outp_unit_no = 101
integer :: arg_count character(200) :: arg integer :: inp integer :: outp
arg_count = command_argument_count () if (3 <= arg_count) then call print_usage else if (arg_count == 0) then inp = input_unit outp = output_unit else if (arg_count == 1) then call get_command_argument (1, arg) inp = open_for_input (trim (arg)) outp = output_unit else if (arg_count == 2) then call get_command_argument (1, arg) inp = open_for_input (trim (arg)) call get_command_argument (2, arg) outp = open_for_output (trim (arg)) end if
block type(ast_t) :: ast
ast = parse_token_stream (inp) call output_ast_flattened (outp, ast) end block end if
contains
function open_for_input (filename) result (unit_no) character(*), intent(in) :: filename integer :: unit_no
integer :: stat
open (unit = inp_unit_no, file = filename, status = 'old', & & action = 'read', access = 'stream', form = 'unformatted', & & iostat = stat) if (stat /= 0) then write (error_unit, '("Error: failed to open ", 1A, " for input")') filename stop 1 end if unit_no = inp_unit_no end function open_for_input
function open_for_output (filename) result (unit_no) character(*), intent(in) :: filename integer :: unit_no
integer :: stat
open (unit = outp_unit_no, file = filename, action = 'write', iostat = stat) if (stat /= 0) then write (error_unit, '("Error: failed to open ", 1A, " for output")') filename stop 1 end if unit_no = outp_unit_no end function open_for_output
subroutine print_usage character(200) :: progname
call get_command_argument (0, progname) write (output_unit, '("Usage: ", 1A, " [INPUT_FILE [OUTPUT_FILE]]")') & & trim (progname) end subroutine print_usage
end program parse</lang>
- Output:
Prime numbers example:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Go
<lang go>package main
import (
"bufio" "fmt" "log" "os" "strconv" "strings"
)
type TokenType int
const (
tkEOI TokenType = iota tkMul tkDiv tkMod tkAdd tkSub tkNegate tkNot tkLss tkLeq tkGtr tkGeq tkEql tkNeq tkAssign tkAnd tkOr tkIf tkElse tkWhile tkPrint tkPutc tkLparen tkRparen tkLbrace tkRbrace tkSemi tkComma tkIdent tkInteger tkString
)
type NodeType int
const (
ndIdent NodeType = iota ndString ndInteger ndSequence ndIf ndPrtc ndPrts ndPrti ndWhile ndAssign ndNegate ndNot ndMul ndDiv ndMod ndAdd ndSub ndLss ndLeq ndGtr ndGeq ndEql ndNeq ndAnd ndOr
)
type tokS struct {
tok TokenType errLn int errCol int text string // ident or string literal or integer value
}
type Tree struct {
nodeType NodeType left *Tree right *Tree value string
}
// dependency: Ordered by tok, must remain in same order as TokenType consts type atr struct {
text string enumText string tok TokenType rightAssociative bool isBinary bool isUnary bool precedence int nodeType NodeType
}
var atrs = []atr{
{"EOI", "End_of_input", tkEOI, false, false, false, -1, -1}, {"*", "Op_multiply", tkMul, false, true, false, 13, ndMul}, {"/", "Op_divide", tkDiv, false, true, false, 13, ndDiv}, {"%", "Op_mod", tkMod, false, true, false, 13, ndMod}, {"+", "Op_add", tkAdd, false, true, false, 12, ndAdd}, {"-", "Op_subtract", tkSub, false, true, false, 12, ndSub}, {"-", "Op_negate", tkNegate, false, false, true, 14, ndNegate}, {"!", "Op_not", tkNot, false, false, true, 14, ndNot}, {"<", "Op_less", tkLss, false, true, false, 10, ndLss}, {"<=", "Op_lessequal", tkLeq, false, true, false, 10, ndLeq}, {">", "Op_greater", tkGtr, false, true, false, 10, ndGtr}, {">=", "Op_greaterequal", tkGeq, false, true, false, 10, ndGeq}, {"==", "Op_equal", tkEql, false, true, false, 9, ndEql}, {"!=", "Op_notequal", tkNeq, false, true, false, 9, ndNeq}, {"=", "Op_assign", tkAssign, false, false, false, -1, ndAssign}, {"&&", "Op_and", tkAnd, false, true, false, 5, ndAnd}, {"||", "Op_or", tkOr, false, true, false, 4, ndOr}, {"if", "Keyword_if", tkIf, false, false, false, -1, ndIf}, {"else", "Keyword_else", tkElse, false, false, false, -1, -1}, {"while", "Keyword_while", tkWhile, false, false, false, -1, ndWhile}, {"print", "Keyword_print", tkPrint, false, false, false, -1, -1}, {"putc", "Keyword_putc", tkPutc, false, false, false, -1, -1}, {"(", "LeftParen", tkLparen, false, false, false, -1, -1}, {")", "RightParen", tkRparen, false, false, false, -1, -1}, {"{", "LeftBrace", tkLbrace, false, false, false, -1, -1}, {"}", "RightBrace", tkRbrace, false, false, false, -1, -1}, {";", "Semicolon", tkSemi, false, false, false, -1, -1}, {",", "Comma", tkComma, false, false, false, -1, -1}, {"Ident", "Identifier", tkIdent, false, false, false, -1, ndIdent}, {"Integer literal", "Integer", tkInteger, false, false, false, -1, ndInteger}, {"String literal", "String", tkString, false, false, false, -1, ndString},
}
var displayNodes = []string{
"Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts", "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual", "And", "Or",
}
var (
err error token tokS scanner *bufio.Scanner
)
func reportError(errLine, errCol int, msg string) {
log.Fatalf("(%d, %d) error : %s\n", errLine, errCol, msg)
}
func check(err error) {
if err != nil { log.Fatal(err) }
}
func getEum(name string) TokenType { // return internal version of name#
for _, atr := range atrs { if atr.enumText == name { return atr.tok } } reportError(0, 0, fmt.Sprintf("Unknown token %s\n", name)) return tkEOI
}
func getTok() tokS {
tok := tokS{} if scanner.Scan() { line := strings.TrimRight(scanner.Text(), " \t") fields := strings.Fields(line) // [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional tok.errLn, err = strconv.Atoi(fields[0]) check(err) tok.errCol, err = strconv.Atoi(fields[1]) check(err) tok.tok = getEum(fields[2]) le := len(fields) if le == 4 { tok.text = fields[3] } else if le > 4 { idx := strings.Index(line, `"`) tok.text = line[idx:] } } check(scanner.Err()) return tok
}
func makeNode(nodeType NodeType, left *Tree, right *Tree) *Tree {
return &Tree{nodeType, left, right, ""}
}
func makeLeaf(nodeType NodeType, value string) *Tree {
return &Tree{nodeType, nil, nil, value}
}
func expect(msg string, s TokenType) {
if token.tok == s { token = getTok() return } reportError(token.errLn, token.errCol, fmt.Sprintf("%s: Expecting '%s', found '%s'\n", msg, atrs[s].text, atrs[token.tok].text))
}
func expr(p int) *Tree {
var x, node *Tree switch token.tok { case tkLparen: x = parenExpr() case tkSub, tkAdd: op := token.tok token = getTok() node = expr(atrs[tkNegate].precedence) if op == tkSub { x = makeNode(ndNegate, node, nil) } else { x = node } case tkNot: token = getTok() x = makeNode(ndNot, expr(atrs[tkNot].precedence), nil) case tkIdent: x = makeLeaf(ndIdent, token.text) token = getTok() case tkInteger: x = makeLeaf(ndInteger, token.text) token = getTok() default: reportError(token.errLn, token.errCol, fmt.Sprintf("Expecting a primary, found: %s\n", atrs[token.tok].text)) }
for atrs[token.tok].isBinary && atrs[token.tok].precedence >= p { op := token.tok token = getTok() q := atrs[op].precedence if !atrs[op].rightAssociative { q++ } node = expr(q) x = makeNode(atrs[op].nodeType, x, node) } return x
}
func parenExpr() *Tree {
expect("parenExpr", tkLparen) t := expr(0) expect("parenExpr", tkRparen) return t
}
func stmt() *Tree {
var t, v, e, s, s2 *Tree switch token.tok { case tkIf: token = getTok() e = parenExpr() s = stmt() s2 = nil if token.tok == tkElse { token = getTok() s2 = stmt() } t = makeNode(ndIf, e, makeNode(ndIf, s, s2)) case tkPutc: token = getTok() e = parenExpr() t = makeNode(ndPrtc, e, nil) expect("Putc", tkSemi) case tkPrint: // print '(' expr {',' expr} ')' token = getTok() for expect("Print", tkLparen); ; expect("Print", tkComma) { if token.tok == tkString { e = makeNode(ndPrts, makeLeaf(ndString, token.text), nil) token = getTok() } else { e = makeNode(ndPrti, expr(0), nil) } t = makeNode(ndSequence, t, e) if token.tok != tkComma { break } } expect("Print", tkRparen) expect("Print", tkSemi) case tkSemi: token = getTok() case tkIdent: v = makeLeaf(ndIdent, token.text) token = getTok() expect("assign", tkAssign) e = expr(0) t = makeNode(ndAssign, v, e) expect("assign", tkSemi) case tkWhile: token = getTok() e = parenExpr() s = stmt() t = makeNode(ndWhile, e, s) case tkLbrace: // {stmt} for expect("Lbrace", tkLbrace); token.tok != tkRbrace && token.tok != tkEOI; { t = makeNode(ndSequence, t, stmt()) } expect("Lbrace", tkRbrace) case tkEOI: // do nothing default: reportError(token.errLn, token.errCol, fmt.Sprintf("expecting start of statement, found '%s'\n", atrs[token.tok].text)) } return t
}
func parse() *Tree {
var t *Tree token = getTok() for { t = makeNode(ndSequence, t, stmt()) if t == nil || token.tok == tkEOI { break } } return t
}
func prtAst(t *Tree) {
if t == nil { fmt.Print(";\n") } else { fmt.Printf("%-14s ", displayNodes[t.nodeType]) if t.nodeType == ndIdent || t.nodeType == ndInteger || t.nodeType == ndString { fmt.Printf("%s\n", t.value) } else { fmt.Println() prtAst(t.left) prtAst(t.right) } }
}
func main() {
source, err := os.Open("source.txt") check(err) defer source.Close() scanner = bufio.NewScanner(source) prtAst(parse())
}</lang>
- Output:
Prime Numbers example:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Icon
I use co-expressions in a way that could easily be done differently, but I prefer to use the co-expressions. (These can be sluggish or fast, depending on what sort of Icon you are running. In this case, the speed differences are of little concern.)
<lang Icon>#
- The Rosetta Code Tiny-Language Parser, in Icon.
- This implementation is based closely on the pseudocode and the C
- reference implementation.
- ximage from the IPL is useful for debugging. Use "xdump(x)" to
- pretty-print x.
- link ximage
record token_record (line_no, column_no, tok, tokval) record token_getter (nxt, curr)
procedure main (args)
local inpf_name, outf_name local inpf, outf local nexttok, currtok, current_token, gettok local ast
inpf_name := "-" outf_name := "-" if 1 <= *args then inpf_name := args[1] if 2 <= *args then outf_name := args[2]
inpf := if inpf_name == "-" then &input else (open(inpf_name, "r") | stop("failed to open \"" || inpf_name || "\" for input")) outf := if outf_name == "-" then &output else (open(outf_name, "w") | stop("failed to open \"" || outf_name || "\" for output"))
current_token := [&null] nexttok := create generate_tokens(inpf, current_token) currtok := create get_current_token (current_token) gettok := token_getter(nexttok, currtok) ast := parse(gettok) prt_ast(outf, ast)
close(inpf) close(outf)
end
procedure prt_ast (outf, ast)
if *ast = 0 then { write(outf, ";") } else { writes(outf, ast[1]) if ast[1] == ("Identifier" | "Integer" | "String") then { write(outf, " ", ast[2]) } else { write(outf) prt_ast(outf, ast[2]) prt_ast(outf, ast[3]) } }
end
procedure generate_tokens (inpf, current_token)
local s
while s := read(inpf) do { if trim(s) ~== "" then { current_token[1] := string_to_token_record(s) suspend current_token[1] } }
end
procedure get_current_token (current_token)
repeat (suspend current_token[1])
end
procedure string_to_token_record (s)
local line_no, column_no, tok, tokval
static spaces
initial { spaces := ' \t\f\v\r\n' }
trim(s) ? { tab(many(spaces)) line_no := integer(tab(many(&digits))) tab(many(spaces)) column_no := integer(tab(many(&digits))) tab(many(spaces)) tok := tab(many(&letters ++ '_')) tab(many(spaces)) tokval := tab(0) } return token_record(line_no, column_no, tok, tokval)
end
procedure parse (gettok)
local tok local t
t := [] @gettok.nxt tok := "Not End_of_input" while tok ~== "End_of_input" do { t := ["Sequence", t, stmt(gettok)] tok := (@gettok.curr).tok } return t
end
procedure stmt (gettok)
local e, s, t, v local tok local done
t := [] if accept(gettok, "Keyword_if") then { e := paren_expr(gettok) s := stmt(gettok) t := ["If", e, ["If", s, if accept(gettok, "Keyword_else") then stmt(gettok) else []]] } else if accept(gettok, "Keyword_putc") then { t := ["Prtc", paren_expr(gettok), []] expect(gettok, "Putc", "Semicolon") } else if accept(gettok, "Keyword_print") then { expect(gettok, "Print", "LeftParen") done := 0 while done = 0 do { tok := @gettok.curr if tok.tok == "String" then { e := ["Prts", ["String", tok.tokval], []] @gettok.nxt } else { e := ["Prti", expr(gettok, 0), []] } t := ["Sequence", t, e] accept(gettok, "Comma") | (done := 1) } expect(gettok, "Print", "RightParen") expect(gettok, "Print", "Semicolon") } else if (@gettok.curr).tok == "Semicolon" then { @gettok.nxt } else if (@gettok.curr).tok == "Identifier" then { v := ["Identifier", (@gettok.curr).tokval] @gettok.nxt expect(gettok, "assign", "Op_assign") t := ["Assign", v, expr(gettok, 0)] expect(gettok, "assign", "Semicolon") } else if accept(gettok, "Keyword_while") then { e := paren_expr(gettok) t := ["While", e, stmt(gettok)] } else if accept(gettok, "LeftBrace") then { until (@gettok.curr).tok == ("RightBrace" | "End_of_input") do { t := ["Sequence", t, stmt(gettok)] } expect(gettok, "Lbrace", "RightBrace") } else if (@gettok.curr).tok ~== "End_of_input" then { tok := @gettok.curr error(tok, ("expecting start of statement, found '" || text(tok.tok) || "'")) } return t
end
procedure paren_expr (gettok)
local x
expect(gettok, "paren_expr", "LeftParen"); x := expr(gettok, 0); expect(gettok, "paren_expr", "RightParen"); return x
end
procedure expr (gettok, p)
local tok, save_tok local x, y local q
tok := @gettok.curr case tok.tok of { "LeftParen" : { x := paren_expr(gettok) } "Op_subtract" : { @gettok.nxt y := expr(gettok, precedence("Op_negate")) x := ["Negate", y, []] } "Op_add" : { @gettok.nxt x := expr(gettok, precedence("Op_negate")) } "Op_not" : { @gettok.nxt y := expr(gettok, precedence("Op_not")) x := ["Not", y, []] } "Identifier" : { x := ["Identifier", tok.tokval] @gettok.nxt } "Integer" : { x := ["Integer", tok.tokval] @gettok.nxt } default : { error(tok, "Expecting a primary, found: " || text(tok.tok)) } }
while (tok := @gettok.curr & is_binary(tok.tok) & p <= precedence(tok.tok)) do { save_tok := tok @gettok.nxt q := precedence(save_tok.tok) if not is_right_associative(save_tok.tok) then q +:= 1 x := [operator(save_tok.tok), x, expr(gettok, q)] }
return x
end
procedure accept (gettok, tok)
local nxt
if (@gettok.curr).tok == tok then nxt := @gettok.nxt else fail return nxt
end
procedure expect (gettok, msg, tok)
if (@gettok.curr).tok ~== tok then { error(@gettok.curr, msg || ": Expecting '" || tok || "', found '" || (@gettok.curr).tok || "'") } return @gettok.nxt
end
procedure error (token, msg)
write("(", token.line_no, ", ", token.column_no, ") error: ", msg) exit(1)
end
procedure precedence (tok)
local p
case tok of { "Op_multiply" : p := 13 "Op_divide" : p := 13 "Op_mod" : p := 13 "Op_add" : p := 12 "Op_subtract" : p := 12 "Op_negate" : p := 14 "Op_not" : p := 14 "Op_less" : p := 10 "Op_lessequal" : p := 10 "Op_greater" : p := 10 "Op_greaterequal" : p := 10 "Op_equal" : p := 9 "Op_notequal" : p := 9 "Op_and" : p := 5 "Op_or" : p := 4 default : p := -1 } return p
end
procedure is_binary (tok)
return ("Op_add" | "Op_subtract" | "Op_multiply" | "Op_divide" | "Op_mod" | "Op_less" | "Op_lessequal" | "Op_greater" | "Op_greaterequal" | "Op_equal" | "Op_notequal" | "Op_and" | "Op_or") == tok fail
end
procedure is_right_associative (tok)
# None of the current operators is right associative. fail
end
procedure operator (tok)
local s
case tok of { "Op_multiply" : s := "Multiply" "Op_divide" : s := "Divide" "Op_mod" : s := "Mod" "Op_add" : s := "Add" "Op_subtract" : s := "Subtract" "Op_negate" : s := "Negate" "Op_not" : s := "Not" "Op_less" : s := "Less" "Op_lessequal" : s := "LessEqual" "Op_greater" : s := "Greater" "Op_greaterequal" : s := "GreaterEqual" "Op_equal" : s := "Equal" "Op_notequal" : s := "NotEqual" "Op_and" : s := "And" "Op_or" : s := "Or" } return s
end
procedure text (tok)
local s
case tok of { "Keyword_else" : s := "else" "Keyword_if" : s := "if" "Keyword_print" : s := "print" "Keyword_putc" : s := "putc" "Keyword_while" : s := "while" "Op_multiply" : s := "*" "Op_divide" : s := "/" "Op_mod" : s := "%" "Op_add" : s := "+" "Op_subtract" : s := "-" "Op_negate" : s := "-" "Op_less" : s := "<" "Op_lessequal" : s := "<=" "Op_greater" : s := ">" "Op_greaterequal" : s := ">=" "Op_equal" : s := "==" "Op_notequal" : s := "!=" "Op_not" : s := "!" "Op_assign" : s := "=" "Op_and" : s := "&&" "Op_or" : s := "||" "LeftParen" : s := "(" "RightParen" : s := ")" "LeftBrace" : s := "{" "RightBrace" : s := "}" "Semicolon" : s := ";" "Comma" : s := "," "Identifier" : s := "Ident" "Integer" : s := "Integer literal" "String" : s := "String literal" "End_of_input" : s := "EOI" } return s
end</lang>
- Output:
$ icont -s -u parse.icn && ./parse compiler-tests/primes.lex Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
J
Implementation:
<lang J>require'format/printf'
tkref=: tokenize 'End_of_input*/%+--<<=>>===!=!&&||print=print(if{else}while;,putc)a""0' tkref,. (tknames)=: tknames=:;: {{)n
End_of_input Op_multiply Op_divide Op_mod Op_add Op_subtract Op_negate Op_less Op_lessequal Op_greater Op_greaterequal Op_equal Op_notequal Op_not Op_and Op_or Keyword_print Op_assign Keyword_print LeftParen Keyword_if LeftBrace Keyword_else RightBrace Keyword_while Semicolon Comma Keyword_putc RightParen Identifier String Integer
}}-.LF
tkV=: 2 (tkref i.tokenize '*/%+-<<=>>===!=&&||')} (#tktyp)#0 tkV=: 1 (1 0+tkref i.tokenize '-!')} tkV tkPrec=: 13 13 13 12 12 10 10 10 10 9 9 5 4 (tkref i.tokenize'*/%+-<<=>>==!=&&||')} tkV<._1 tkPrec=: 14 (1 0+tkref i.tokenize'-!')} tkPrec NB. proofread |:(<"1 tkV,.tkPrec),tkref,:tknames
tkref,.(ndDisp)=: ndDisp=:;:{{)n
Sequence Multiply Divide Mod Add Subtract Negate Less LessEqual Greater GreaterEqual Equal NotEqual Not And Or Prts Assign Prti x If x x x While x x Prtc x Identifier String Integer
}}-.LF NB. proofread |:tkref,:ndDisp
gettoken=: {{
'tok_ln tok_col'=: (0;ndx){::x 'tok_name tok_value'=: (1;ndx){::x if. 'Error'-:tok_name do. error 'invalid word ',":tok_value end. ind=. tknames i.<tok_name tok_text=: ind{::tkref tok_valence=: ind{::tkV tok_precedence=: ind{::tkPrec ndx=:ndx+1 node_display=: ind{::ndDisp
}}
parse=: {{
ndx=: tok_ln=: tok_col=: 0 gettok=: y&gettoken gettok t=.a: whilst.-.(a:-:t)+.tok_name-:End_of_input do. t=. Sequence make_node t stmt end.
}}
stmt=:{{)v
t=. a: select.tok_name case.Keyword_if do. s=. stmt e=. paren_expr gettok if.Keyword_else-:tok_name do. S=. stmt gettok else. S=. a: end. t=. If make_node e If make_node s S case.Keyword_putc do. e=. paren_expr gettok t=. Prtc make_node e a: Prtc expect Semicolon case.Keyword_print do.gettok 'Print' expect LeftParen while.do. if.String-:tok_name do. gettok e=. Prts make_node (String make_leaf tok_value) a: else. e=. Prti make_node (expr 0) a: end. t=. Sequence make_node t e if.Comma-:tok_name do.Comma expect Comma else.break.end. end. 'Print' expect RightParen 'Print' expect Semicolon case.Semicolon do.gettok case.Identifier do. gettok v=. Identifier make_leaf tok_value Assign expect Op_assign t=. Assign make_node v e=. expr 0 Assign expect Semicolon case.Keyword_while do. t=. While make_node e s=. stmt e=. paren_expr gettok case.LeftBrace do. 'LeftBrace' expect LeftBrace while.-.(<tok_name) e. RightBrace;End_of_input do. t=. Sequence make_node t stmt end. 'LeftBrace' expect RightBrace case.End_of_input do. case.do. error 'Expecting start of statement, found %s'sprintf<tok_text end. t
}}
paren_expr=: {{
'paren_expr' expect LeftParen t=. expr 0 'paren_expr' expect RightParen t
}}
not_prec=: tkPrec{~tknames i.<Op_not expr=: {{
select.tok_name case.LeftParen do.e=. paren_expr case.Op_add do.gettok e=. expr not_prec case.Op_subtract do.gettok e=. Negate make_node (expr not_prec) a: case.Op_not do.gettok e=. Not make_node (expr not_prec) a: case.Identifier do. gettok e=. Identifier make_leaf tok_value case.Integer do. gettok e=. Integer make_leaf tok_value case.do. error 'Expecting a primary, found %s'sprintf<tok_text end. while.(2=tok_valence)*tok_precedence>:y do. q=. 1+tok_precedence [ op=. node_display NB. no right associative operators gettok node=. expr q e=. op make_node e node end. e
}}
expect=: {{
if.y-:tok_name do. gettok return.end. error '%s: Expecting "%s", found "%s"'sprintf x;(tkref{::~tknames i.<y);tok_text
}}
make_leaf=: Template:X;y
make_node=: {{
m;n;<y
}}
error=: Template:Echo 'Error: line %d, column %d: %s\n'sprintf tok ln;tok col;y throw.
syntax=: Template:;(flatAST parse y),each LF
flatAST=: {{
assert.*L.y select.#y case.1 do.<';' assert.y-:a: case.2 do.<;:inv ":each y case.3 do.({.y),(flatAST 1{::y),flatAST 2{::y case.do.assert.0 end.
}} </lang>
Some quirks worth noting:
(1) '+' appears in the productions for 'primary' and 'addition_expr' but has only one node type (because we do not represent its appearance in 'primary' with a node.
(2) '-' and 'print' do have two node types (which we sort out on the fly).
(3) In this implementation, we require a 1:1 mapping between the data structure representing token types and the data structure representing node types. This means two token entries for both - and print (the second instance of both gets ignored by the lexer).
(4) Because the data structure produced by the lexer is independent of any type system implementation, we can use the same type system for the lexer or a different type system for the lexer and either way works (as long as the implementations are consistent with the spec).
(5) In this context parallel constant arrays represent token and node types.
Task example:
<lang J> primes=: {{)n /*
Simple prime number generator */
count = 1; n = 1; limit = 100; while (n < limit) {
k=3; p=1; n=n+2; while ((k*k<=n) && (p)) { p=n/k*k!=n; k=k+2; } if (p) { print(n, " is prime\n"); count = count + 1; }
} print("Total primes found: ", count, "\n"); }}
syntax lex primes
Sequence Sequence Sequence Sequence Sequence
Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence
Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence
Assign Identifier p NotEqual Divide Identifier n Multiply Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence
Sequence Sequence
Prti Identifier n
Prts String " is prime\n"
Assign Identifier count Add Identifier count Integer 1
Sequence Sequence Sequence
Prts String "Total primes found: "
Prti Identifier count
Prts String "\n"
</lang>
Java
Usage: java Parser infile [>outfile]
<lang java> import java.io.File; import java.io.FileNotFoundException; import java.util.Scanner; import java.util.StringTokenizer; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.util.HashMap;
class Parser { private List<Token> source; private Token token; private int position;
static class Node { public NodeType nt; public Node left, right; public String value;
Node() { this.nt = null; this.left = null; this.right = null; this.value = null; } Node(NodeType node_type, Node left, Node right, String value) { this.nt = node_type; this.left = left; this.right = right; this.value = value; } public static Node make_node(NodeType nodetype, Node left, Node right) { return new Node(nodetype, left, right, ""); } public static Node make_node(NodeType nodetype, Node left) { return new Node(nodetype, left, null, ""); } public static Node make_leaf(NodeType nodetype, String value) { return new Node(nodetype, null, null, value); } }
static class Token { public TokenType tokentype; public String value; public int line; public int pos;
Token(TokenType token, String value, int line, int pos) { this.tokentype = token; this.value = value; this.line = line; this.pos = pos; } @Override public String toString() { return String.format("%5d %5d %-15s %s", this.line, this.pos, this.tokentype, this.value); } }
static enum TokenType { End_of_input(false, false, false, -1, NodeType.nd_None), Op_multiply(false, true, false, 13, NodeType.nd_Mul), Op_divide(false, true, false, 13, NodeType.nd_Div), Op_mod(false, true, false, 13, NodeType.nd_Mod), Op_add(false, true, false, 12, NodeType.nd_Add), Op_subtract(false, true, false, 12, NodeType.nd_Sub), Op_negate(false, false, true, 14, NodeType.nd_Negate), Op_not(false, false, true, 14, NodeType.nd_Not), Op_less(false, true, false, 10, NodeType.nd_Lss), Op_lessequal(false, true, false, 10, NodeType.nd_Leq), Op_greater(false, true, false, 10, NodeType.nd_Gtr), Op_greaterequal(false, true, false, 10, NodeType.nd_Geq), Op_equal(false, true, true, 9, NodeType.nd_Eql), Op_notequal(false, true, false, 9, NodeType.nd_Neq), Op_assign(false, false, false, -1, NodeType.nd_Assign), Op_and(false, true, false, 5, NodeType.nd_And), Op_or(false, true, false, 4, NodeType.nd_Or), Keyword_if(false, false, false, -1, NodeType.nd_If), Keyword_else(false, false, false, -1, NodeType.nd_None), Keyword_while(false, false, false, -1, NodeType.nd_While), Keyword_print(false, false, false, -1, NodeType.nd_None), Keyword_putc(false, false, false, -1, NodeType.nd_None), LeftParen(false, false, false, -1, NodeType.nd_None), RightParen(false, false, false, -1, NodeType.nd_None), LeftBrace(false, false, false, -1, NodeType.nd_None), RightBrace(false, false, false, -1, NodeType.nd_None), Semicolon(false, false, false, -1, NodeType.nd_None), Comma(false, false, false, -1, NodeType.nd_None), Identifier(false, false, false, -1, NodeType.nd_Ident), Integer(false, false, false, -1, NodeType.nd_Integer), String(false, false, false, -1, NodeType.nd_String);
private final int precedence; private final boolean right_assoc; private final boolean is_binary; private final boolean is_unary; private final NodeType node_type;
TokenType(boolean right_assoc, boolean is_binary, boolean is_unary, int precedence, NodeType node) { this.right_assoc = right_assoc; this.is_binary = is_binary; this.is_unary = is_unary; this.precedence = precedence; this.node_type = node; } boolean isRightAssoc() { return this.right_assoc; } boolean isBinary() { return this.is_binary; } boolean isUnary() { return this.is_unary; } int getPrecedence() { return this.precedence; } NodeType getNodeType() { return this.node_type; } } static enum NodeType { nd_None(""), nd_Ident("Identifier"), nd_String("String"), nd_Integer("Integer"), nd_Sequence("Sequence"), nd_If("If"), nd_Prtc("Prtc"), nd_Prts("Prts"), nd_Prti("Prti"), nd_While("While"), nd_Assign("Assign"), nd_Negate("Negate"), nd_Not("Not"), nd_Mul("Multiply"), nd_Div("Divide"), nd_Mod("Mod"), nd_Add("Add"), nd_Sub("Subtract"), nd_Lss("Less"), nd_Leq("LessEqual"), nd_Gtr("Greater"), nd_Geq("GreaterEqual"), nd_Eql("Equal"), nd_Neq("NotEqual"), nd_And("And"), nd_Or("Or");
private final String name;
NodeType(String name) { this.name = name; }
@Override public String toString() { return this.name; } } static void error(int line, int pos, String msg) { if (line > 0 && pos > 0) { System.out.printf("%s in line %d, pos %d\n", msg, line, pos); } else { System.out.println(msg); } System.exit(1); } Parser(List<Token> source) { this.source = source; this.token = null; this.position = 0; } Token getNextToken() { this.token = this.source.get(this.position++); return this.token; } Node expr(int p) { Node result = null, node; TokenType op; int q;
if (this.token.tokentype == TokenType.LeftParen) { result = paren_expr(); } else if (this.token.tokentype == TokenType.Op_add || this.token.tokentype == TokenType.Op_subtract) { op = (this.token.tokentype == TokenType.Op_subtract) ? TokenType.Op_negate : TokenType.Op_add; getNextToken(); node = expr(TokenType.Op_negate.getPrecedence()); result = (op == TokenType.Op_negate) ? Node.make_node(NodeType.nd_Negate, node) : node; } else if (this.token.tokentype == TokenType.Op_not) { getNextToken(); result = Node.make_node(NodeType.nd_Not, expr(TokenType.Op_not.getPrecedence())); } else if (this.token.tokentype == TokenType.Identifier) { result = Node.make_leaf(NodeType.nd_Ident, this.token.value); getNextToken(); } else if (this.token.tokentype == TokenType.Integer) { result = Node.make_leaf(NodeType.nd_Integer, this.token.value); getNextToken(); } else { error(this.token.line, this.token.pos, "Expecting a primary, found: " + this.token.tokentype); }
while (this.token.tokentype.isBinary() && this.token.tokentype.getPrecedence() >= p) { op = this.token.tokentype; getNextToken(); q = op.getPrecedence(); if (!op.isRightAssoc()) { q++; } node = expr(q); result = Node.make_node(op.getNodeType(), result, node); } return result; } Node paren_expr() { expect("paren_expr", TokenType.LeftParen); Node node = expr(0); expect("paren_expr", TokenType.RightParen); return node; } void expect(String msg, TokenType s) { if (this.token.tokentype == s) { getNextToken(); return; } error(this.token.line, this.token.pos, msg + ": Expecting '" + s + "', found: '" + this.token.tokentype + "'"); } Node stmt() { Node s, s2, t = null, e, v; if (this.token.tokentype == TokenType.Keyword_if) { getNextToken(); e = paren_expr(); s = stmt(); s2 = null; if (this.token.tokentype == TokenType.Keyword_else) { getNextToken(); s2 = stmt(); } t = Node.make_node(NodeType.nd_If, e, Node.make_node(NodeType.nd_If, s, s2)); } else if (this.token.tokentype == TokenType.Keyword_putc) { getNextToken(); e = paren_expr(); t = Node.make_node(NodeType.nd_Prtc, e); expect("Putc", TokenType.Semicolon); } else if (this.token.tokentype == TokenType.Keyword_print) { getNextToken(); expect("Print", TokenType.LeftParen); while (true) { if (this.token.tokentype == TokenType.String) { e = Node.make_node(NodeType.nd_Prts, Node.make_leaf(NodeType.nd_String, this.token.value)); getNextToken(); } else { e = Node.make_node(NodeType.nd_Prti, expr(0), null); } t = Node.make_node(NodeType.nd_Sequence, t, e); if (this.token.tokentype != TokenType.Comma) { break; } getNextToken(); } expect("Print", TokenType.RightParen); expect("Print", TokenType.Semicolon); } else if (this.token.tokentype == TokenType.Semicolon) { getNextToken(); } else if (this.token.tokentype == TokenType.Identifier) { v = Node.make_leaf(NodeType.nd_Ident, this.token.value); getNextToken(); expect("assign", TokenType.Op_assign); e = expr(0); t = Node.make_node(NodeType.nd_Assign, v, e); expect("assign", TokenType.Semicolon); } else if (this.token.tokentype == TokenType.Keyword_while) { getNextToken(); e = paren_expr(); s = stmt(); t = Node.make_node(NodeType.nd_While, e, s); } else if (this.token.tokentype == TokenType.LeftBrace) { getNextToken(); while (this.token.tokentype != TokenType.RightBrace && this.token.tokentype != TokenType.End_of_input) { t = Node.make_node(NodeType.nd_Sequence, t, stmt()); } expect("LBrace", TokenType.RightBrace); } else if (this.token.tokentype == TokenType.End_of_input) { } else { error(this.token.line, this.token.pos, "Expecting start of statement, found: " + this.token.tokentype); } return t; } Node parse() { Node t = null; getNextToken(); while (this.token.tokentype != TokenType.End_of_input) { t = Node.make_node(NodeType.nd_Sequence, t, stmt()); } return t; } void printAST(Node t) { int i = 0; if (t == null) { System.out.println(";"); } else { System.out.printf("%-14s", t.nt); if (t.nt == NodeType.nd_Ident || t.nt == NodeType.nd_Integer || t.nt == NodeType.nd_String) { System.out.println(" " + t.value); } else { System.out.println(); printAST(t.left); printAST(t.right); } } } public static void main(String[] args) { if (args.length > 0) { try { String value, token; int line, pos; Token t; boolean found; List<Token> list = new ArrayList<>(); Map<String, TokenType> str_to_tokens = new HashMap<>();
str_to_tokens.put("End_of_input", TokenType.End_of_input); str_to_tokens.put("Op_multiply", TokenType.Op_multiply); str_to_tokens.put("Op_divide", TokenType.Op_divide); str_to_tokens.put("Op_mod", TokenType.Op_mod); str_to_tokens.put("Op_add", TokenType.Op_add); str_to_tokens.put("Op_subtract", TokenType.Op_subtract); str_to_tokens.put("Op_negate", TokenType.Op_negate); str_to_tokens.put("Op_not", TokenType.Op_not); str_to_tokens.put("Op_less", TokenType.Op_less); str_to_tokens.put("Op_lessequal", TokenType.Op_lessequal); str_to_tokens.put("Op_greater", TokenType.Op_greater); str_to_tokens.put("Op_greaterequal", TokenType.Op_greaterequal); str_to_tokens.put("Op_equal", TokenType.Op_equal); str_to_tokens.put("Op_notequal", TokenType.Op_notequal); str_to_tokens.put("Op_assign", TokenType.Op_assign); str_to_tokens.put("Op_and", TokenType.Op_and); str_to_tokens.put("Op_or", TokenType.Op_or); str_to_tokens.put("Keyword_if", TokenType.Keyword_if); str_to_tokens.put("Keyword_else", TokenType.Keyword_else); str_to_tokens.put("Keyword_while", TokenType.Keyword_while); str_to_tokens.put("Keyword_print", TokenType.Keyword_print); str_to_tokens.put("Keyword_putc", TokenType.Keyword_putc); str_to_tokens.put("LeftParen", TokenType.LeftParen); str_to_tokens.put("RightParen", TokenType.RightParen); str_to_tokens.put("LeftBrace", TokenType.LeftBrace); str_to_tokens.put("RightBrace", TokenType.RightBrace); str_to_tokens.put("Semicolon", TokenType.Semicolon); str_to_tokens.put("Comma", TokenType.Comma); str_to_tokens.put("Identifier", TokenType.Identifier); str_to_tokens.put("Integer", TokenType.Integer); str_to_tokens.put("String", TokenType.String);
Scanner s = new Scanner(new File(args[0])); String source = " "; while (s.hasNext()) { String str = s.nextLine(); StringTokenizer st = new StringTokenizer(str); line = Integer.parseInt(st.nextToken()); pos = Integer.parseInt(st.nextToken()); token = st.nextToken(); value = ""; while (st.hasMoreTokens()) { value += st.nextToken() + " "; } found = false; if (str_to_tokens.containsKey(token)) { found = true; list.add(new Token(str_to_tokens.get(token), value, line, pos)); } if (found == false) { throw new Exception("Token not found: '" + token + "'"); } } Parser p = new Parser(list); p.printAST(p.parse()); } catch (FileNotFoundException e) { error(-1, -1, "Exception: " + e.getMessage()); } catch (Exception e) { error(-1, -1, "Exception: " + e.getMessage()); } } else { error(-1, -1, "No args"); } } } </lang>
Julia
Julia tends to discourage large numbers of global variables, so this direct port from the Python reference implementation moves the globals into a function wrapper.
<lang julia>struct ASTnode
nodetype::Int left::Union{Nothing, ASTnode} right::Union{Nothing, ASTnode} value::Union{Nothing, Int, String}
end
function syntaxanalyzer(inputfile)
tkEOI, tkMul, tkDiv, tkMod, tkAdd, tkSub, tkNegate, tkNot, tkLss, tkLeq, tkGtr, tkGeq, tkEql, tkNeq, tkAssign, tkAnd, tkOr, tkIf, tkElse, tkWhile, tkPrint, tkPutc, tkLparen, tkRparen, tkLbrace, tkRbrace, tkSemi, tkComma, tkIdent, tkInteger, tkString = collect(1:31)
ndIdent, ndString, ndInteger, ndSequence, ndIf, ndPrtc, ndPrts, ndPrti, ndWhile, ndAssign, ndNegate, ndNot, ndMul, ndDiv, ndMod, ndAdd, ndSub, ndLss, ndLeq, ndGtr, ndGeq, ndEql, ndNeq, ndAnd, ndOr = collect(1:25)
TK_NAME, TK_RIGHT_ASSOC, TK_IS_BINARY, TK_IS_UNARY, TK_PRECEDENCE, TK_NODE = collect(1:6) # label Token columns Tokens = [ ["EOI" , false, false, false, -1, -1 ], ["*" , false, true, false, 13, ndMul ], ["/" , false, true, false, 13, ndDiv ], ["%" , false, true, false, 13, ndMod ], ["+" , false, true, false, 12, ndAdd ], ["-" , false, true, false, 12, ndSub ], ["-" , false, false, true, 14, ndNegate ], ["!" , false, false, true, 14, ndNot ], ["<" , false, true, false, 10, ndLss ], ["<=" , false, true, false, 10, ndLeq ], [">" , false, true, false, 10, ndGtr ], [">=" , false, true, false, 10, ndGeq ], ["==" , false, true, false, 9, ndEql ], ["!=" , false, true, false, 9, ndNeq ], ["=" , false, false, false, -1, ndAssign ], ["&&" , false, true, false, 5, ndAnd ], ["||" , false, true, false, 4, ndOr ], ["if" , false, false, false, -1, ndIf ], ["else" , false, false, false, -1, -1 ], ["while" , false, false, false, -1, ndWhile ], ["print" , false, false, false, -1, -1 ], ["putc" , false, false, false, -1, -1 ], ["(" , false, false, false, -1, -1 ], [")" , false, false, false, -1, -1 ], ["{" , false, false, false, -1, -1 ], ["}" , false, false, false, -1, -1 ], [";" , false, false, false, -1, -1 ], ["," , false, false, false, -1, -1 ], ["Ident" , false, false, false, -1, ndIdent ], ["Integer literal" , false, false, false, -1, ndInteger], ["String literal" , false, false, false, -1, ndString ]]
allsyms = Dict( "End_of_input" => tkEOI, "Op_multiply" => tkMul, "Op_divide" => tkDiv, "Op_mod" => tkMod, "Op_add" => tkAdd, "Op_subtract" => tkSub, "Op_negate" => tkNegate, "Op_not" => tkNot, "Op_less" => tkLss, "Op_lessequal" => tkLeq, "Op_greater" => tkGtr, "Op_greaterequal" => tkGeq, "Op_equal" => tkEql, "Op_notequal" => tkNeq, "Op_assign" => tkAssign, "Op_and" => tkAnd, "Op_or" => tkOr, "Keyword_if" => tkIf, "Keyword_else" => tkElse, "Keyword_while" => tkWhile, "Keyword_print" => tkPrint, "Keyword_putc" => tkPutc, "LeftParen" => tkLparen, "RightParen" => tkRparen, "LeftBrace" => tkLbrace, "RightBrace" => tkRbrace, "Semicolon" => tkSemi, "Comma" => tkComma, "Identifier" => tkIdent, "Integer" => tkInteger, "String" => tkString)
displaynodes = ["Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts", "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual", "And", "Or"]
errline, errcol, tok, toktext = fill("", 4) error(msg) = throw("Error in syntax: $msg.") nilnode = ASTnode(0, nothing, nothing, nothing) tokother = "" function gettok() s = readline(inputfile) if length(s) == 0 error("empty line") end linelist = split(strip(s), r"\s+", limit = 4) # line col Ident varname # 0 1 2 3 errline, errcol, toktext = linelist[1:3] if !haskey(allsyms, toktext) error("Unknown token $toktext") end tok = allsyms[toktext] tokother = (tok in [tkInteger, tkIdent, tkString]) ? linelist[4] : "" end
makenode(oper, left, right = nilnode) = ASTnode(oper, left, right, nothing) makeleaf(oper, n::Int) = ASTnode(oper, nothing, nothing, n) makeleaf(oper, n) = ASTnode(oper, nothing, nothing, string(n)) expect(msg, s) = if tok != s error("msg: Expecting $(Tokens[s][TK_NAME]), found $(Tokens[tok][TK_NAME])") else gettok() end
function expr(p) x = nilnode if tok == tkLparen x = parenexpr() elseif tok in [tkSub, tkAdd] op = tok == tkSub ? tkNegate : tkAdd gettok() node = expr(Tokens[tkNegate][TK_PRECEDENCE]) x = (op == tkNegate) ? makenode(ndNegate, node) : node elseif tok == tkNot gettok() x = makenode(ndNot, expr(Tokens[tkNot][TK_PRECEDENCE])) elseif tok == tkIdent x = makeleaf(ndIdent, tokother) gettok() elseif tok == tkInteger x = makeleaf(ndInteger, tokother) gettok() else error("Expecting a primary, found: $(Tokens[tok][TK_NAME])") end while Tokens[tok][TK_IS_BINARY] && (Tokens[tok][TK_PRECEDENCE] >= p) op = tok gettok() q = Tokens[op][TK_PRECEDENCE] if !Tokens[op][TK_RIGHT_ASSOC] q += 1 end node = expr(q) x = makenode(Tokens[op][TK_NODE], x, node) end x end
parenexpr() = (expect("paren_expr", tkLparen); node = expr(0); expect("paren_expr", tkRparen); node)
function stmt() t = nilnode if tok == tkIf gettok() e = parenexpr() s = stmt() s2 = nilnode if tok == tkElse gettok() s2 = stmt() end t = makenode(ndIf, e, makenode(ndIf, s, s2)) elseif tok == tkPutc gettok() e = parenexpr() t = makenode(ndPrtc, e) expect("Putc", tkSemi) elseif tok == tkPrint gettok() expect("Print", tkLparen) while true if tok == tkString e = makenode(ndPrts, makeleaf(ndString, tokother)) gettok() else e = makenode(ndPrti, expr(0)) end t = makenode(ndSequence, t, e) if tok != tkComma break end gettok() end expect("Print", tkRparen) expect("Print", tkSemi) elseif tok == tkSemi gettok() elseif tok == tkIdent v = makeleaf(ndIdent, tokother) gettok() expect("assign", tkAssign) e = expr(0) t = makenode(ndAssign, v, e) expect("assign", tkSemi) elseif tok == tkWhile gettok() e = parenexpr() s = stmt() t = makenode(ndWhile, e, s) elseif tok == tkLbrace gettok() while (tok != tkRbrace) && (tok != tkEOI) t = makenode(ndSequence, t, stmt()) end expect("Lbrace", tkRbrace) elseif tok != tkEOI error("Expecting start of statement, found: $(Tokens[tok][TK_NAME])") end return t end
function parse() t = nilnode gettok() while true t = makenode(ndSequence, t, stmt()) if (tok == tkEOI) || (t == nilnode) break end end t end function prtASTnode(t) if t == nothing return elseif t == nilnode println(";") elseif t.nodetype in [ndIdent, ndInteger, ndString] println(rpad(displaynodes[t.nodetype], 14), t.value) else println(rpad(displaynodes[t.nodetype], 14)) end prtASTnode(t.left) prtASTnode(t.right) end
# runs the function prtASTnode(parse())
end
testtxt = """
1 1 Identifier count 1 7 Op_assign 1 9 Integer 1 1 10 Semicolon 2 1 Keyword_while 2 7 LeftParen 2 8 Identifier count 2 14 Op_less 2 16 Integer 10 2 18 RightParen 2 20 LeftBrace 3 5 Keyword_print 3 10 LeftParen 3 11 String \"count is: \" 3 23 Comma 3 25 Identifier count 3 30 Comma 3 32 String \"\\n\" 3 36 RightParen 3 37 Semicolon 4 5 Identifier count 4 11 Op_assign 4 13 Identifier count 4 19 Op_add 4 21 Integer 1 4 22 Semicolon 5 1 RightBrace 6 1 End_of_input """
syntaxanalyzer(IOBuffer(testtxt)) # for isolated testing
- syntaxanalyzer(length(ARGS) > 1 ? ARGS[1] : stdin) # for use as in the Python code
</lang>
M2000 Interpreter
This program written without functions. Subs use the current stack of values (a feature from interpreter) to return arrays. Subs run on same scope as the module or function which called. We use Local to make local variables (erased at return). Sub prt_ast() called first time without passing parameter, because parameter already exist in stack of values. Interpreter when call a module, a function, a subroutine always pass values to stack of values. Functions called in an expression, always have own stack of values. Modules call other modules passing the same stack of values. Threads are parts of modules, with same scope in module where belong, but have own stack and static variables, and they rub in time intervals.
A (1,2,3) is an auto array or tuple. We can assign a tuple in a variable, in a item in another tuple. A tuple is a reference type, but here we don't use a second pointer (we say references variables which references to other variables - reference or value type-, so we say pointer the reference who hold an object alive. We can read 2nd item (expected string) from alfa, a pointer to array, using array$(alfa,1) or alfa#val$(1). The second variation can be used multiple times if a tuple has another tulple so alfa#val(2)#val$(1) return a string from 3rd item, which expect a tuple from 2nd item. The other variation array$(array(alfa,2),1) for the same result.
<lang M2000 Interpreter>
Module syntax_analyzer(b$){
enum tokens {
Op_add, Op_subtract, Op_not=5, Op_multiply=10, Op_divide, Op_mod,
Op_negate, Op_less, Op_lessequal, Op_greater, Op_greaterequal,
Op_equal, Op_notequal, Op_and, Op_or, Op_assign=100, Keyword_if=110,
Keyword_else, Keyword_while, Keyword_print, Keyword_putc, LeftParen, RightParen,
LeftBrace, RightBrace, Semicolon, Comma, Identifier, Integer, String, End_of_input
}
Inventory precedence=Op_multiply:=13, Op_divide:=13, Op_mod:=13, Op_add:=12, Op_subtract:=12 Append precedence, Op_negate:=14, Op_not:=14, Op_less:=10, Op_lessequal:=10, Op_greater:=10 Append precedence, Op_greaterequal:=10, Op_equal:=9, Op_notequal:=9, Op_assign:=-1, Op_and:=5 Append precedence, Op_or:=4
Inventory symbols=Op_multiply:="Multiply", Op_divide:="Divide", Op_mod:="Mod", Op_add:="Add" Append symbols, Op_negate:="Negate", Op_not:="Not", Op_less:="Less", Op_subtract:="Subtract" Append symbols, Op_lessequal:="LessEqual", Op_greater:="Greater", Op_greaterequal:="GreaterEqual" Append symbols, Op_equal:="Equal", Op_notequal:="NotEqual", Op_and:="And", Op_or:="Or"
def lineNo, ColumnNo, m, line$, a, lim, cur=-1 const nl$=chr$(13)+chr$(10), Ansi=3 Dim lex$() lex$()=piece$(b$,chr$(13)+chr$(10)) lim=dimension(lex$(),1)-1 op=End_of_input flush k=0 Try { push (,) ' Null getone(&op) repeat stmt(&op) shift 2 ' swap two top items push ("Sequence", array, array) k++ until op=End_of_Input } er$=error$ if er$<>"" then print er$ : flush: break Print "Ast" Document Output$ prt_ast() clipboard Output$ Save.Doc Output$, "parse.t", Ansi document parse$ Load.Doc parse$,"parse.t", Ansi Report parse$
sub prt_ast(t) if len(t)<1 then Output$=";"+nl$ else.if len(t)=3 then Output$=t#val$(0) +nl$ prt_ast(t#val(1)) : prt_ast(t#val(2)) else Output$=t#val$(0) +nl$ end if end sub sub expr(p) ' only a number local x=(,), prev=op if op>=Identifier then x=(line$,) getone(&op) else.if op=LeftParen then paren_exp() x=array else.if op<10 then getone(&op) expr(precedence(int(Op_negate))) read local y if prev=Op_add then x=y else if prev=Op_subtract then prev=Op_negate x=(symbols(prev), y,(,)) End if else {error "??? "+eval$(op)} end if local prec while exist(precedence, int(op)) prev=op : prec=eval(precedence) if prec<14 and prec>=p else exit getone(&op) expr(prec+1) ' all operators are left associative (use prec for right a.) x=(symbols(int(prev)), x, array) End While Push x end sub sub paren_exp() expected(LeftParen) getone(&op) expr(0) expected(RightParen) getone(&op) end sub sub stmt(&op) local t=(,) if op=Identifier then t=(line$) getone(&op) expected(Op_assign) getone(&op) expr(0) read local rightnode Push ("Assign",t,rightnode) expected(Semicolon) getone(&op) else.if op=Semicolon then getone(&op) Push (";",) else.if op=Keyword_print then getone(&op) expected(LeftParen) repeat getone(&op) if op=String then Push ("Prts",(line$,),(,)) getone(&op) else expr(0) Push ("Prti", array,(,)) end if t=("Sequence", t, array) until op<>Comma expected(RightParen) getone(&op) expected(Semicolon) getone(&op) push t else.if op=Keyword_while then getone(&op) paren_exp() stmt(&op) shift 2 Push ("While",array, array) else.if op=Keyword_if then getone(&op) paren_exp() stmt(&op) local s2=(,) if op=Keyword_else then getone(&op) stmt(&op) read s2 end if shift 2 Push ("If",array ,("If",array,s2)) else.if op=Keyword_putc then getone(&op) paren_exp() Push ("Prtc",array,t) expected(Semicolon) getone(&op) else.if op=LeftBrace then Brace() else error "Unkown Op" end if end sub Sub Brace() getone(&op) while op<>RightBrace and op<>End_of_input stmt(&op) t=("Sequence", t, array) end while expected(RightBrace) getone(&op) push t End Sub Sub expected(what) if not op=what then {Error "Expected "+eval$(what)+str$(LineNo)+","+Str$(ColumnNo)} End Sub sub getone(&op) op=End_of_input while cur<lim cur++ line$=trim$(lex$(cur)) if line$<>"" then exit end while if cur=lim then exit sub LineNo=Val(line$,"int",m) line$=mid$(line$, m) ColumnNo=Val(line$,"int",m) line$=trim$(mid$(line$, m)) Rem : Print LineNo, ColumnNo m=instr(line$," ") if m>0 then op=Eval("."+leftpart$(line$, " ")) else op=Eval("."+line$) end sub }
syntax_analyzer {
1 1 LeftBrace 5 5 Identifier left_edge 5 17 Op_assign 5 19 Op_subtract 5 20 Integer 420 5 23 Semicolon 6 5 Identifier right_edge 6 17 Op_assign 6 20 Integer 300 6 23 Semicolon 7 5 Identifier top_edge 7 17 Op_assign 7 20 Integer 300 7 23 Semicolon 8 5 Identifier bottom_edge 8 17 Op_assign 8 19 Op_subtract 8 20 Integer 300 8 23 Semicolon 9 5 Identifier x_step 9 17 Op_assign 9 22 Integer 7 9 23 Semicolon 10 5 Identifier y_step 10 17 Op_assign 10 21 Integer 15 10 23 Semicolon 12 5 Identifier max_iter 12 17 Op_assign 12 20 Integer 200 12 23 Semicolon 14 5 Identifier y0 14 8 Op_assign 14 10 Identifier top_edge 14 18 Semicolon 15 5 Keyword_while 15 11 LeftParen 15 12 Identifier y0 15 15 Op_greater 15 17 Identifier bottom_edge 15 28 RightParen 15 30 LeftBrace 16 9 Identifier x0 16 12 Op_assign 16 14 Identifier left_edge 16 23 Semicolon 17 9 Keyword_while 17 15 LeftParen 17 16 Identifier x0 17 19 Op_less 17 21 Identifier right_edge 17 31 RightParen 17 33 LeftBrace 18 13 Identifier y 18 15 Op_assign 18 17 Integer 0 18 18 Semicolon 19 13 Identifier x 19 15 Op_assign 19 17 Integer 0 19 18 Semicolon 20 13 Identifier the_char 20 22 Op_assign 20 24 Integer 32 20 27 Semicolon 21 13 Identifier i 21 15 Op_assign 21 17 Integer 0 21 18 Semicolon 22 13 Keyword_while 22 19 LeftParen 22 20 Identifier i 22 22 Op_less 22 24 Identifier max_iter 22 32 RightParen 22 34 LeftBrace 23 17 Identifier x_x 23 21 Op_assign 23 23 LeftParen 23 24 Identifier x 23 26 Op_multiply 23 28 Identifier x 23 29 RightParen 23 31 Op_divide 23 33 Integer 200 23 36 Semicolon 24 17 Identifier y_y 24 21 Op_assign 24 23 LeftParen 24 24 Identifier y 24 26 Op_multiply 24 28 Identifier y 24 29 RightParen 24 31 Op_divide 24 33 Integer 200 24 36 Semicolon 25 17 Keyword_if 25 20 LeftParen 25 21 Identifier x_x 25 25 Op_add 25 27 Identifier y_y 25 31 Op_greater 25 33 Integer 800 25 37 RightParen 25 39 LeftBrace 26 21 Identifier the_char 26 30 Op_assign 26 32 Integer 48 26 36 Op_add 26 38 Identifier i 26 39 Semicolon 27 21 Keyword_if 27 24 LeftParen 27 25 Identifier i 27 27 Op_greater 27 29 Integer 9 27 30 RightParen 27 32 LeftBrace 28 25 Identifier the_char 28 34 Op_assign 28 36 Integer 64 28 39 Semicolon 29 21 RightBrace 30 21 Identifier i 30 23 Op_assign 30 25 Identifier max_iter 30 33 Semicolon 31 17 RightBrace 32 17 Identifier y 32 19 Op_assign 32 21 Identifier x 32 23 Op_multiply 32 25 Identifier y 32 27 Op_divide 32 29 Integer 100 32 33 Op_add 32 35 Identifier y0 32 37 Semicolon 33 17 Identifier x 33 19 Op_assign 33 21 Identifier x_x 33 25 Op_subtract 33 27 Identifier y_y 33 31 Op_add 33 33 Identifier x0 33 35 Semicolon 34 17 Identifier i 34 19 Op_assign 34 21 Identifier i 34 23 Op_add 34 25 Integer 1 34 26 Semicolon 35 13 RightBrace 36 13 Keyword_putc 36 17 LeftParen 36 18 Identifier the_char 36 26 RightParen 36 27 Semicolon 37 13 Identifier x0 37 16 Op_assign 37 18 Identifier x0 37 21 Op_add 37 23 Identifier x_step 37 29 Semicolon 38 9 RightBrace 39 9 Keyword_putc 39 13 LeftParen 39 14 Integer 10 39 18 RightParen 39 19 Semicolon 40 9 Identifier y0 40 12 Op_assign 40 14 Identifier y0 40 17 Op_subtract 40 19 Identifier y_step 40 25 Semicolon 41 5 RightBrace 42 1 RightBrace 43 1 End_of_Input
} </lang>
- Output:
Sequence ; Sequence Sequence Sequence Sequence Sequence Sequence Sequence Sequence Sequence ; Assign Identifier left_edge Negate Integer 420 ; Assign Identifier right_edge Integer 300 Assign Identifier top_edge Integer 300 Assign Identifier bottom_edge Negate Integer 300 ; Assign Identifier x_step Integer 7 Assign Identifier y_step Integer 15 Assign Identifier max_iter Integer 200 Assign Identifier y0 Identifier top_edge While Greater Identifier y0 Identifier bottom_edge Sequence Sequence Sequence Sequence ; Assign Identifier x0 Identifier left_edge While Less Identifier x0 Identifier right_edge Sequence Sequence Sequence Sequence Sequence Sequence Sequence ; Assign Identifier y Integer 0 Assign Identifier x Integer 0 Assign Identifier the_char Integer 32 Assign Identifier i Integer 0 While Less Identifier i Identifier max_iter Sequence Sequence Sequence Sequence Sequence Sequence ; Assign Identifier x_x Divide Multiply Identifier x Identifier x Integer 200 Assign Identifier y_y Divide Multiply Identifier y Identifier y Integer 200 If Greater Add Identifier x_x Identifier y_y Integer 800 if Sequence Sequence Sequence ; Assign Identifier the_char Add Integer 48 Identifier i If Greater Identifier i Integer 9 if Sequence ; Assign Identifier the_char Integer 64 ; Assign Identifier i Identifier max_iter ; Assign Identifier y Add Divide Multiply Identifier x Identifier y Integer 100 Identifier y0 Assign Identifier x Add Subtract Identifier x_x Identifier y_y Identifier x0 Assign Identifier i Add Identifier i Integer 1 Putc Identifier the_char ; Assign Identifier x0 Add Identifier x0 Identifier x_step Putc Integer 10 ; Assign Identifier y0 Subtract Identifier y0 Identifier y_step
Nim
Using the third version of Nim lexer.
<lang Nim>import ast_lexer
type NodeKind* = enum
nIdentifier = "Identifier" nString = "String" nInteger = "Integer" nSequence = "Sequence" nIf = "If" nPrtc = "Prtc" nPrts = "Prts" nPrti = "Prti" nWhile = "While" nAssign = "Assign" nNegate = "Negate" nNot = "Not" nMultiply = "Multiply" nDivide = "Divide" nMod = "Mod" nAdd = "Add" nSubtract = "Subtract" nLess = "Less" nLessEqual = "LessEqual" nGreater = "Greater" nGreaterEqual = "GreaterEqual" nEqual = "Equal" nNotEqual = "NotEqual" nAnd = "And" nOr = "Or"
type Node* = ref object
left*: Node right*: Node case kind*: NodeKind of nString: stringVal*: string of nInteger: intVal*: int of nIdentifier: name*: string else: nil
type Operator = range[tokMult..tokOr]
const
Precedences: array[Operator, int] = [13, # tokMult 13, # tokDiv 13, # tokMod 12, # tokAdd 12, # tokSub 10, # tokLess 10, # tokLessEq 10, # tokGreater 10, # tokGreaterEq 9, # tokEq 9, # tokNeq 14, # tokNot -1, # tokAssign 5, # tokAnd 4] # tokOr UnaryPrecedence = 14 BinaryOperators = {tokMult, tokDiv, tokMod, tokAdd, tokSub, tokLess, tokLessEq, tokGreater, tokGreaterEq, tokEq, tokNotEq, tokAnd, tokOr}
# Mapping of operators from TokenKind to NodeKind. NodeKinds: array[Operator, NodeKind] = [nMultiply, nDivide, nMod, nAdd, nSubtract, nLess, nLessEqual, nGreater, nGreaterEqual, nEqual, nNotEqual, nNot, nAssign, nAnd, nOr]
type SyntaxError* = object of CatchableError
template expect(token: Token; expected: TokenKind; errmsg: string) =
## Check if a token is of the expected kind. ## Raise a SyntaxError if this is not the case. if token.kind != expected: raise newException(SyntaxError, "line " & $token.ln & ": " & errmsg) token = lexer.next()
- ---------------------------------------------------------------------------------------------------
proc newNode*(kind: NodeKind; left: Node; right: Node = nil): Node =
## Create a new node with given left and right children. result = Node(kind: kind, left: left, right: right)
- ---------------------------------------------------------------------------------------------------
- Forward reference.
proc parExpr(lexer: var Lexer; token: var Token): Node
- ---------------------------------------------------------------------------------------------------
proc expr(lexer: var Lexer; token: var Token; p: int): Node =
## Parse an expression.
case token.kind
of tokLPar: result = parExpr(lexer, token)
of tokAdd, tokSub, tokNot: # Unary operators. let savedToken = token token = lexer.next() let e = expr(lexer, token, UnaryPrecedence) if savedToken.kind == tokAdd: result = e else: result = newNode(if savedToken.kind == tokSub: nNegate else: nNot, e)
of tokIdent: result = Node(kind: nIdentifier, name: token.ident) token = lexer.next()
of tokInt: result = Node(kind:nInteger, intVal: token.intVal) token = lexer.next()
of tokChar: result = Node(kind:nInteger, intVal: ord(token.charVal)) token = lexer.next()
else: raise newException(SyntaxError, "Unexpected symbol at line " & $token.ln)
# Process the binary operators in the expression. while token.kind in BinaryOperators and Precedences[token.kind] >= p: let savedToken = token token = lexer.next() let q = Precedences[savedToken.kind] + 1 # No operator is right associative. result = newNode(NodeKinds[savedToken.kind], result, expr(lexer, token, q))
- ---------------------------------------------------------------------------------------------------
proc parExpr(lexer: var Lexer; token: var Token): Node =
## Parse a parenthetized expression. token.expect(tokLPar, "'(' expected") result = expr(lexer, token, 0) token.expect(tokRPar, "')' expected")
- ---------------------------------------------------------------------------------------------------
proc stmt(lexer: var Lexer; token: var Token): Node =
## Parse a statement.
case token.kind:
of tokIf: token = lexer.next() let e = parExpr(lexer, token) let thenNode = stmt(lexer, token) var elseNode: Node = nil if token.kind == tokElse: token = lexer.next() elseNode = stmt(lexer, token) result = newNode(nIf, e, newNode(nIf, thenNode, elseNode))
of tokPutc: token = lexer.next() result = newNode(nPrtc, parExpr(lexer, token)) token.expect(tokSemi, "';' expected")
of tokPrint: token = lexer.next() token.expect(tokLPar, "'(' expected") while true: var e: Node if token.kind == tokString: e = newNode(nPrts, Node(kind: nString, stringVal: token.stringVal)) token = lexer.next() else: e = newNode(nPrti, expr(lexer, token, 0)) result = newNode(nSequence, result, e) if token.kind == tokComma: token = lexer.next() else: break token.expect(tokRPar, "')' expected") token.expect(tokSemi, "';' expected")
of tokSemi: token = lexer.next()
of tokIdent: let v = Node(kind: nIdentifier, name: token.ident) token = lexer.next() token.expect(tokAssign, "'=' expected") result = newNode(nAssign, v, expr(lexer, token, 0)) token.expect(tokSemi, "';' expected")
of tokWhile: token = lexer.next() let e = parExpr(lexer, token) result = newNode(nWhile, e, stmt(lexer, token))
of tokLBrace: token = lexer.next() while token.kind notin {tokRBrace, tokEnd}: result = newNode(nSequence, result, stmt(lexer, token)) token.expect(tokRBrace, "'}' expected")
of tokEnd: discard
else: raise newException(SyntaxError, "Unexpected symbol at line " & $token.ln)
- ---------------------------------------------------------------------------------------------------
proc parse*(code: string): Node =
## Parse the code provided.
var lexer = initLexer(code) var token = lexer.next() while true: result = newNode(nSequence, result, stmt(lexer, token)) if token.kind == tokEnd: break
- ———————————————————————————————————————————————————————————————————————————————————————————————————
when isMainModule:
import os, strformat, strutils
proc printAst(node: Node) = ## Print tha AST in linear form.
if node.isNil: echo ';'
else: stdout.write &"{$node.kind:<14}" case node.kind of nIdentifier: echo node.name of nInteger: echo node.intVal of nString: # Need to escape and to replace hexadecimal \x0A by \n. echo escape(node.stringVal).replace("\\x0A", "\\n") else: echo "" node.left.printAst() node.right.printAst()
let code = if paramCount() < 1: stdin.readAll() else: paramStr(1).readFile() let tree = parse(code) tree.printAst()</lang>
- Output:
Prime number program AST.
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Perl
Tested on perl v5.26.1 <lang Perl>#!/usr/bin/perl
use strict; # parse.pl - inputs lex, outputs flattened ast use warnings; # http://www.rosettacode.org/wiki/Compiler/syntax_analyzer
my $h = qr/\G\s*\d+\s+\d+\s+/; # header of each line
sub error { die "*** Expected @_ at " . (/\G(.*\n)/ ?
$1 =~ s/^\s*(\d+)\s+(\d+)\s+/line $1 character $2 got /r : "EOF\n") }
sub want { /$h \Q$_[1]\E.*\n/gcx ? shift : error "'$_[1]'" }
local $_ = join , <>; print want stmtlist(), 'End_of_input';
sub stmtlist
{ /(?=$h (RightBrace|End_of_input))/gcx and return ";\n"; my ($stmt, $stmtlist) = (stmt(), stmtlist()); $stmtlist eq ";\n" ? $stmt : "Sequence\n$stmt$stmtlist"; }
sub stmt
{ /$h Semicolon\n/gcx ? ";\n" : /$h Identifier \s+ (\w+) \n/gcx ? want("Assign\nIdentifier\t$1\n", 'Op_assign') . want expr(0), 'Semicolon' : /$h Keyword_while \n/gcx ? "While\n" . parenexp() . stmt() : /$h Keyword_if \n/gcx ? "If\n" . parenexp() . "If\n" . stmt() . (/$h Keyword_else \n/gcx ? stmt() : ";\n") : /$h Keyword_print \n/gcx ? want(, 'LeftParen') . want want(printlist(), 'RightParen'), 'Semicolon' : /$h Keyword_putc \n/gcx ? want "Prtc\n" . parenexp() . ";\n", 'Semicolon' : /$h LeftBrace \n/gcx ? want stmtlist(), 'RightBrace' : error 'A STMT'; }
sub parenexp { want(, 'LeftParen') . want expr(0), 'RightParen' } # (expr)
sub printlist
{ my $ast = /$h String \s+ (".*") \n/gcx ? "Prts\nString\t\t$1\n;\n" : "Prti\n" . expr(0) . ";\n"; /$h Comma \n/gcx ? "Sequence\n$ast" . printlist() : $ast; }
sub expr # (sort of EBNF) expr = operand { operator expr }
{ my $ast = # operand /$h Integer \s+ (\d+) \n/gcx ? "Integer\t\t$1\n" : /$h Identifier \s+ (\w+) \n/gcx ? "Identifier\t$1\n" : /$h LeftParen \n/gcx ? want expr(0), 'RightParen' : /$h Op_(negate|subtract) \n/gcx ? "Negate\n" . expr(8) . ";\n" : /$h Op_not \n/gcx ? "Not\n" . expr(8) . ";\n" : /$h Op_add \n/gcx ? expr(8) : error "A PRIMARY"; $ast = # { operator expr } $_[0] <= 7 && /$h Op_multiply \n/gcx ? "Multiply\n$ast" . expr(8) : $_[0] <= 7 && /$h Op_divide \n/gcx ? "Divide\n$ast" . expr(8) : $_[0] <= 7 && /$h Op_mod \n/gcx ? "Mod\n$ast" . expr(8) : $_[0] <= 6 && /$h Op_add \n/gcx ? "Add\n$ast" . expr(7) : $_[0] <= 6 && /$h Op_subtract \n/gcx ? "Subtract\n$ast" . expr(7) : $_[0] == 5 && /(?=$h Op_(less|greater)(equal)? \n)/gcx ? error 'NO ASSOC' : $_[0] <= 5 && /$h Op_lessequal \n/gcx ? "LessEqual\n$ast" . expr(5) : $_[0] <= 5 && /$h Op_less \n/gcx ? "Less\n$ast" . expr(5) : $_[0] <= 5 && /$h Op_greater \n/gcx ? "Greater\n$ast" . expr(5) : $_[0] <= 5 && /$h Op_greaterequal \n/gcx ? "GreaterEqual\n$ast" . expr(5) : $_[0] == 3 && /(?=$h Op_(not)?equal \n)/gcx ? error 'NO ASSOC' : $_[0] <= 3 && /$h Op_equal \n/gcx ? "Equal\n$ast" . expr(3) : $_[0] <= 3 && /$h Op_notequal \n/gcx ? "NotEqual\n$ast" . expr(3) : $_[0] <= 1 && /$h Op_and \n/gcx ? "And\n$ast" . expr(2) : $_[0] <= 0 && /$h Op_or \n/gcx ? "Or\n$ast" . expr(1) : return $ast while 1; }</lang>
- Output — Count AST:
Sequence Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence Prts String "count is: " ; Sequence Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1
Phix
Reusing lex.e (and core.e) from the Lexical Analyzer task, and again written as a reusable module.
-- -- demo\rosetta\Compiler\parse.e -- ============================= -- -- The reusable part of parse.exw -- with javascript_semantics include lex.e sequence tok procedure errd(sequence msg, sequence args={}) {tok_line,tok_col} = tok error(msg,args) end procedure global sequence toks integer next_tok = 1 function get_tok() sequence tok = toks[next_tok] next_tok += 1 return tok end function procedure expect(string msg, integer s) integer tk = tok[3] if tk!=s then errd("%s: Expecting '%s', found '%s'\n", {msg, tkNames[s], tkNames[tk]}) end if tok = get_tok() end procedure function expr(integer p) object x = NULL, node integer op = tok[3] switch op do case tk_LeftParen: tok = get_tok() x = expr(0) expect("expr",tk_RightParen) case tk_sub: case tk_add: tok = get_tok() node = expr(precedences[tk_neg]); x = iff(op==tk_sub?{tk_neg, node, NULL}:node) case tk_not: tok = get_tok(); x = {tk_not, expr(precedences[tk_not]), NULL} case tk_Identifier: x = {tk_Identifier, tok[4]} tok = get_tok(); case tk_Integer: x = {tk_Integer, tok[4]} tok = get_tok(); default: errd("Expecting a primary, found: %s\n", tkNames[op]) end switch op = tok[3] while narys[op]=BINARY and precedences[op]>=p do tok = get_tok() x = {op, x, expr(precedences[op]+1)} op = tok[3] end while return x; end function function paren_expr(string msg) expect(msg, tk_LeftParen); object t = expr(0) expect(msg, tk_RightParen); return t end function function stmt() object t = NULL, e, s switch tok[3] do case tk_if: tok = get_tok(); object condition = paren_expr("If-cond"); object ifblock = stmt(); object elseblock = NULL; if tok[3] == tk_else then tok = get_tok(); elseblock = stmt(); end if t = {tk_if, condition, {tk_if, ifblock, elseblock}} case tk_putc: tok = get_tok(); e = paren_expr("Prtc") t = {tk_putc, e, NULL} expect("Putc", tk_Semicolon); case tk_print: tok = get_tok(); expect("Print",tk_LeftParen) while 1 do if tok[3] == tk_String then e = {tk_Prints, {tk_String, tok[4]}, NULL} tok = get_tok(); else e = {tk_Printi, expr(0), NULL} end if t = {tk_Sequence, t, e} if tok[3]!=tk_Comma then exit end if expect("Print", tk_Comma) end while expect("Print", tk_RightParen); expect("Print", tk_Semicolon); case tk_Semicolon: tok = get_tok(); case tk_Identifier: object v v = {tk_Identifier, tok[4]} tok = get_tok(); expect("assign", tk_assign); e = expr(0); t = {tk_assign, v, e} expect("assign", tk_Semicolon); case tk_while: tok = get_tok(); e = paren_expr("while"); s = stmt(); t = {tk_while, e, s} case tk_LeftBrace: /* {stmt} */ expect("LeftBrace", tk_LeftBrace) while not find(tok[3],{tk_RightBrace,tk_EOI}) do t = {tk_Sequence, t, stmt()} end while expect("LeftBrace", tk_RightBrace); break; case tk_EOI: break; default: errd("expecting start of statement, found '%s'\n", tkNames[tok[3]]); end switch return t end function global function parse() object t = NULL tok = get_tok() while 1 do object s = stmt() if s=NULL then exit end if t = {tk_Sequence, t, s} end while return t end function
And a simple test driver for the specific task:
-- -- demo\rosetta\Compiler\parse.exw -- =============================== -- with javascript_semantics include parse.e procedure print_ast(object t) if t == NULL then printf(output_file,";\n") else integer ttype = t[1] printf(output_file,tkNames[ttype]) if ttype=tk_Identifier then printf(output_file," %s\n",t[2]) elsif ttype=tk_Integer then printf(output_file," %d\n",t[2]) elsif ttype=tk_String then printf(output_file," %s\n",enquote(t[2])) else printf(output_file,"\n") print_ast(t[2]) print_ast(t[3]) end if end if end procedure function ptree(object t) if sequence(t) then integer t1 = t[1] t = deep_copy(t) t[1] = tkNames[t1] if not find(t1,{tk_Identifier,tk_String}) then for i=2 to length(t) do if t1=tk_Sequence and t[i]=NULL then t[i] = "NULL" else t[i] = ptree(t[i]) end if end for end if end if return t end function procedure main(sequence cl) open_files(cl) toks = lex() object t = parse() print_ast(t) pp(ptree(t),{pp_Nest,10,pp_Pause,0,pp_IntCh,false}) close_files() end procedure --main(command_line()) main({0,0,"test3.c"}) -- not parseable! --main({0,0,"primes.c"}) -- as Algol, C, Python (apart from spacing) --main({0,0,"count.c"}) -- as AWK ( "" )
- Output:
Line 5 column 40: Print: Expecting 'LeftParen', found 'Op_subtract'
Python
Tested with Python 2.7 and 3.x <lang Python>from __future__ import print_function import sys, shlex, operator
tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr, \ tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print, \ tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident, \ tk_Integer, tk_String = range(31)
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, \ nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, \ nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or = range(25)
- must have same order as above
Tokens = [
["EOI" , False, False, False, -1, -1 ], ["*" , False, True, False, 13, nd_Mul ], ["/" , False, True, False, 13, nd_Div ], ["%" , False, True, False, 13, nd_Mod ], ["+" , False, True, False, 12, nd_Add ], ["-" , False, True, False, 12, nd_Sub ], ["-" , False, False, True, 14, nd_Negate ], ["!" , False, False, True, 14, nd_Not ], ["<" , False, True, False, 10, nd_Lss ], ["<=" , False, True, False, 10, nd_Leq ], [">" , False, True, False, 10, nd_Gtr ], [">=" , False, True, False, 10, nd_Geq ], ["==" , False, True, False, 9, nd_Eql ], ["!=" , False, True, False, 9, nd_Neq ], ["=" , False, False, False, -1, nd_Assign ], ["&&" , False, True, False, 5, nd_And ], ["||" , False, True, False, 4, nd_Or ], ["if" , False, False, False, -1, nd_If ], ["else" , False, False, False, -1, -1 ], ["while" , False, False, False, -1, nd_While ], ["print" , False, False, False, -1, -1 ], ["putc" , False, False, False, -1, -1 ], ["(" , False, False, False, -1, -1 ], [")" , False, False, False, -1, -1 ], ["{" , False, False, False, -1, -1 ], ["}" , False, False, False, -1, -1 ], [";" , False, False, False, -1, -1 ], ["," , False, False, False, -1, -1 ], ["Ident" , False, False, False, -1, nd_Ident ], ["Integer literal" , False, False, False, -1, nd_Integer], ["String literal" , False, False, False, -1, nd_String ] ]
all_syms = {"End_of_input" : tk_EOI, "Op_multiply" : tk_Mul,
"Op_divide" : tk_Div, "Op_mod" : tk_Mod, "Op_add" : tk_Add, "Op_subtract" : tk_Sub, "Op_negate" : tk_Negate, "Op_not" : tk_Not, "Op_less" : tk_Lss, "Op_lessequal" : tk_Leq, "Op_greater" : tk_Gtr, "Op_greaterequal": tk_Geq, "Op_equal" : tk_Eql, "Op_notequal" : tk_Neq, "Op_assign" : tk_Assign, "Op_and" : tk_And, "Op_or" : tk_Or, "Keyword_if" : tk_If, "Keyword_else" : tk_Else, "Keyword_while" : tk_While, "Keyword_print" : tk_Print, "Keyword_putc" : tk_Putc, "LeftParen" : tk_Lparen, "RightParen" : tk_Rparen, "LeftBrace" : tk_Lbrace, "RightBrace" : tk_Rbrace, "Semicolon" : tk_Semi, "Comma" : tk_Comma, "Identifier" : tk_Ident, "Integer" : tk_Integer, "String" : tk_String}
Display_nodes = ["Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts",
"Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual", "And", "Or"]
TK_NAME = 0 TK_RIGHT_ASSOC = 1 TK_IS_BINARY = 2 TK_IS_UNARY = 3 TK_PRECEDENCE = 4 TK_NODE = 5
input_file = None err_line = None err_col = None tok = None tok_text = None
- show error and exit
def error(msg):
print("(%d, %d) %s" % (int(err_line), int(err_col), msg)) exit(1)
def gettok():
global err_line, err_col, tok, tok_text, tok_other line = input_file.readline() if len(line) == 0: error("empty line")
line_list = shlex.split(line, False, False) # line col Ident var_name # 0 1 2 3
err_line = line_list[0] err_col = line_list[1] tok_text = line_list[2]
tok = all_syms.get(tok_text) if tok == None: error("Unknown token %s" % (tok_text))
tok_other = None if tok in [tk_Integer, tk_Ident, tk_String]: tok_other = line_list[3]
class Node:
def __init__(self, node_type, left = None, right = None, value = None): self.node_type = node_type self.left = left self.right = right self.value = value
def make_node(oper, left, right = None):
return Node(oper, left, right)
def make_leaf(oper, n):
return Node(oper, value = n)
def expect(msg, s):
if tok == s: gettok() return error("%s: Expecting '%s', found '%s'" % (msg, Tokens[s][TK_NAME], Tokens[tok][TK_NAME]))
def expr(p):
x = None
if tok == tk_Lparen: x = paren_expr() elif tok in [tk_Sub, tk_Add]: op = (tk_Negate if tok == tk_Sub else tk_Add) gettok() node = expr(Tokens[tk_Negate][TK_PRECEDENCE]) x = (make_node(nd_Negate, node) if op == tk_Negate else node) elif tok == tk_Not: gettok() x = make_node(nd_Not, expr(Tokens[tk_Not][TK_PRECEDENCE])) elif tok == tk_Ident: x = make_leaf(nd_Ident, tok_other) gettok() elif tok == tk_Integer: x = make_leaf(nd_Integer, tok_other) gettok() else: error("Expecting a primary, found: %s" % (Tokens[tok][TK_NAME]))
while Tokens[tok][TK_IS_BINARY] and Tokens[tok][TK_PRECEDENCE] >= p: op = tok gettok() q = Tokens[op][TK_PRECEDENCE] if not Tokens[op][TK_RIGHT_ASSOC]: q += 1
node = expr(q) x = make_node(Tokens[op][TK_NODE], x, node)
return x
def paren_expr():
expect("paren_expr", tk_Lparen) node = expr(0) expect("paren_expr", tk_Rparen) return node
def stmt():
t = None
if tok == tk_If: gettok() e = paren_expr() s = stmt() s2 = None if tok == tk_Else: gettok() s2 = stmt() t = make_node(nd_If, e, make_node(nd_If, s, s2)) elif tok == tk_Putc: gettok() e = paren_expr() t = make_node(nd_Prtc, e) expect("Putc", tk_Semi) elif tok == tk_Print: gettok() expect("Print", tk_Lparen) while True: if tok == tk_String: e = make_node(nd_Prts, make_leaf(nd_String, tok_other)) gettok() else: e = make_node(nd_Prti, expr(0))
t = make_node(nd_Sequence, t, e) if tok != tk_Comma: break gettok() expect("Print", tk_Rparen) expect("Print", tk_Semi) elif tok == tk_Semi: gettok() elif tok == tk_Ident: v = make_leaf(nd_Ident, tok_other) gettok() expect("assign", tk_Assign) e = expr(0) t = make_node(nd_Assign, v, e) expect("assign", tk_Semi) elif tok == tk_While: gettok() e = paren_expr() s = stmt() t = make_node(nd_While, e, s) elif tok == tk_Lbrace: gettok() while tok != tk_Rbrace and tok != tk_EOI: t = make_node(nd_Sequence, t, stmt()) expect("Lbrace", tk_Rbrace) elif tok == tk_EOI: pass else: error("Expecting start of statement, found: %s" % (Tokens[tok][TK_NAME]))
return t
def parse():
t = None gettok() while True: t = make_node(nd_Sequence, t, stmt()) if tok == tk_EOI or t == None: break return t
def prt_ast(t):
if t == None: print(";") else: print("%-14s" % (Display_nodes[t.node_type]), end=) if t.node_type in [nd_Ident, nd_Integer]: print("%s" % (t.value)) elif t.node_type == nd_String: print("%s" %(t.value)) else: print("") prt_ast(t.left) prt_ast(t.right)
- main driver
input_file = sys.stdin if len(sys.argv) > 1:
try: input_file = open(sys.argv[1], "r", 4096) except IOError as e: error(0, 0, "Can't open %s" % sys.argv[1])
t = parse() prt_ast(t)</lang>
- Output — prime numbers AST:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Scala
The complete implementation for the compiler tasks can be found in a GitHub repository at github.com/edadma/rosettacodeCompiler which includes full unit testing for the samples given in Compiler/Sample programs.
The following code implements a configurable (from a symbol map provided as a parameter) Precedence Climbing parser for the output of the lexer. The recursive descent language parser is closely based on the pseudo code given in the task description.
<lang scala> package xyz.hyperreal.rosettacodeCompiler
import scala.io.Source
object SyntaxAnalyzer {
val symbols = Map[String, (PrefixOperator, InfixOperator)]( "Op_or" -> (null, InfixOperator(10, LeftAssoc, BranchNode("Or", _, _))), "Op_and" -> (null, InfixOperator(20, LeftAssoc, BranchNode("And", _, _))), "Op_equal" -> (null, InfixOperator(30, LeftAssoc, BranchNode("Equal", _, _))), "Op_notequal" -> (null, InfixOperator(30, LeftAssoc, BranchNode("NotEqual", _, _))), "Op_less" -> (null, InfixOperator(40, LeftAssoc, BranchNode("Less", _, _))), "Op_lessequal" -> (null, InfixOperator(40, LeftAssoc, BranchNode("LessEqual", _, _))), "Op_greater" -> (null, InfixOperator(40, LeftAssoc, BranchNode("Greater", _, _))), "Op_greaterequal" -> (null, InfixOperator(40, LeftAssoc, BranchNode("GreaterEqual", _, _))), "Op_add" -> (PrefixOperator(30, identity), InfixOperator(50, LeftAssoc, BranchNode("Add", _, _))), "Op_minus" -> (PrefixOperator(70, BranchNode("Negate", _, TerminalNode)), InfixOperator( 50, LeftAssoc, BranchNode("Subtract", _, _))), "Op_multiply" -> (null, InfixOperator(60, LeftAssoc, BranchNode("Multiply", _, _))), "Op_divide" -> (null, InfixOperator(60, LeftAssoc, BranchNode("Divide", _, _))), "Op_mod" -> (null, InfixOperator(60, RightAssoc, BranchNode("Mod", _, _))), "Op_not" -> (PrefixOperator(70, BranchNode("Not", _)), null), "LeftParen" -> null, "RightParen" -> null )
def apply = new SyntaxAnalyzer(symbols)
abstract class Node case class LeafNode(name: String, value: String) extends Node case class BranchNode(name: String, left: Node, right: Node = TerminalNode) extends Node case object TerminalNode extends Node
abstract class Assoc case object LeftAssoc extends Assoc case object RightAssoc extends Assoc
abstract class Operator case class InfixOperator(prec: Int, assoc: Assoc, compute: (Node, Node) => Node) extends Operator case class PrefixOperator(prec: Int, compute: Node => Node) extends Operator
}
class SyntaxAnalyzer(symbols: Map[String, (SyntaxAnalyzer.PrefixOperator, SyntaxAnalyzer.InfixOperator)]) {
import SyntaxAnalyzer.{BranchNode, InfixOperator, LeafNode, LeftAssoc, Node, PrefixOperator, TerminalNode}
def fromStdin = fromSource(Source.stdin)
def fromString(src: String) = fromSource(Source.fromString(src))
def fromSource(s: Source) = { val tokens = ((s.getLines map (_.trim.split(" +", 4)) map { case Array(line, col, name) => symbols get name match { case None | Some(null) => SimpleToken(line.toInt, col.toInt, name) case Some(operators) => OperatorToken(line.toInt, col.toInt, name, operators) } case Array(line, col, name, value) => ValueToken(line.toInt, col.toInt, name, value) }) toStream)
flatten(parse(tokens)) }
def flatten(n: Node): Unit = n match { case TerminalNode => println(";") case LeafNode(name, value) => println(s"$name $value") case BranchNode(name, left, right) => println(name) flatten(left) flatten(right) }
def parse(toks: Stream[Token]) = { var cur = toks
def next = cur = cur.tail
def token = cur.head
def consume = { val res = token
next res }
def accept(name: String) = if (token.name == name) { next true } else false
def expect(name: String, error: String = null) = if (token.name != name) sys.error(if (error eq null) s"expected $name, found ${token.name}" else s"$error: $token") else next
def expression(minPrec: Int): Node = { def infixOperator = token.asInstanceOf[OperatorToken].operators._2
def isInfix = token.isInstanceOf[OperatorToken] && infixOperator != null
var result = consume match { case SimpleToken(_, _, "LeftParen") => val result = expression(0)
expect("RightParen", "expected closing parenthesis") result case ValueToken(_, _, name, value) => LeafNode(name, value) case OperatorToken(_, _, _, (prefix, _)) if prefix ne null => prefix.compute(expression(prefix.prec)) case OperatorToken(_, _, _, (_, infix)) if infix ne null => sys.error(s"expected a primitive expression, not an infix operator: $token") }
while (isInfix && infixOperator.prec >= minPrec) { val InfixOperator(prec, assoc, compute) = infixOperator val nextMinPrec = if (assoc == LeftAssoc) prec + 1 else prec
next result = compute(result, expression(nextMinPrec)) }
result }
def parenExpression = { expect("LeftParen")
val e = expression(0)
expect("RightParen") e }
def statement: Node = { var stmt: Node = TerminalNode
if (accept("Keyword_if")) stmt = BranchNode("If", parenExpression, BranchNode("If", statement, if (accept("Keyword_else")) statement else TerminalNode)) else if (accept("Keyword_putc")) { stmt = BranchNode("Prtc", parenExpression) expect("Semicolon") } else if (accept("Keyword_print")) { expect("LeftParen")
do { val e = if (token.name == "String") BranchNode("Prts", LeafNode("String", consume.asInstanceOf[ValueToken].value)) else BranchNode("Prti", expression(0))
stmt = BranchNode("Sequence", stmt, e) } while (accept("Comma"))
expect("RightParen") expect("Semicolon") } else if (token.name == "Semicolon") next else if (token.name == "Identifier") { val ident = LeafNode("Identifier", consume.asInstanceOf[ValueToken].value)
expect("Op_assign") stmt = BranchNode("Assign", ident, expression(0)) expect("Semicolon") } else if (accept("Keyword_while")) stmt = BranchNode("While", parenExpression, statement) else if (accept("LeftBrace")) { while (token.name != "RightBrace" && token.name != "End_of_input") { stmt = BranchNode("Sequence", stmt, statement) }
expect("RightBrace") } else sys.error(s"syntax error: $token")
stmt }
var tree: Node = TerminalNode
do { tree = BranchNode("Sequence", tree, statement) } while (token.name != "End_of_input")
expect("End_of_input") tree }
abstract class Token { val line: Int; val col: Int; val name: String }
case class SimpleToken(line: Int, col: Int, name: String) extends Token case class ValueToken(line: Int, col: Int, name: String, value: String) extends Token case class OperatorToken(line: Int, col: Int, name: String, operators: (PrefixOperator, InfixOperator)) extends Token
} </lang>
Scheme
Code implements a recursive descent parser based on the given grammar. Tested against all programs in Compiler/Sample programs.
<lang scheme> (import (scheme base)
(scheme process-context) (scheme write))
(define *names* (list (cons 'Op_add 'Add)
(cons 'Op_subtract 'Subtract) (cons 'Op_multiply 'Multiply) (cons 'Op_divide 'Divide) (cons 'Op_mod 'Mod) (cons 'Op_not 'Not) (cons 'Op_equal 'Equal) (cons 'Op_notequal 'NotEqual) (cons 'Op_or 'Or) (cons 'Op_and 'And) (cons 'Op_less 'Less) (cons 'Op_lessequal 'LessEqual) (cons 'Op_greater 'Greater) (cons 'Op_greaterequal 'GreaterEqual)))
(define (retrieve-name type)
(let ((res (assq type *names*))) (if res (cdr res) (error "Unknown type name"))))
- takes a vector of tokens
(define (parse tokens) ; read statements, until hit end of tokens
(define posn 0) (define (peek-token) (vector-ref tokens posn)) (define (get-token) (set! posn (+ 1 posn)) (vector-ref tokens (- posn 1))) (define (match type) (if (eq? (car (vector-ref tokens posn)) type) (set! posn (+ 1 posn)) (error "Could not match token type" type))) ; make it easier to read token parts (define type car) (define value cadr) ; ;; left associative read of one or more items with given separators (define (read-one-or-more reader separators) (let loop ((lft (reader))) (let ((next (peek-token))) (if (memq (type next) separators) (begin (match (type next)) (loop (list (retrieve-name (type next)) lft (reader)))) lft)))) ; ;; read one or two items with given separator (define (read-one-or-two reader separators) (let* ((lft (reader)) (next (peek-token))) (if (memq (type next) separators) (begin (match (type next)) (list (retrieve-name (type next)) lft (reader))) lft))) ; (define (read-primary) (let ((next (get-token))) (case (type next) ((Identifier Integer) next) ((LeftParen) (let ((v (read-expr))) (match 'RightParen) v)) ((Op_add) ; + sign is ignored (read-primary)) ((Op_not) (list 'Not (read-primary) '())) ((Op_subtract) (list 'Negate (read-primary) '())) (else (error "Unknown primary type"))))) ; (define (read-multiplication-expr) ; * (read-one-or-more read-primary '(Op_multiply Op_divide Op_mod))) ; (define (read-addition-expr) ; * (read-one-or-more read-multiplication-expr '(Op_add Op_subtract))) ; (define (read-relational-expr) ; ? (read-one-or-two read-addition-expr '(Op_less Op_lessequal Op_greater Op_greaterequal))) ; (define (read-equality-expr) ; ? (read-one-or-two read-relational-expr '(Op_equal Op_notequal))) ; (define (read-and-expr) ; * (read-one-or-more read-equality-expr '(Op_and))) ; (define (read-expr) ; * (read-one-or-more read-and-expr '(Op_or))) ; (define (read-prt-list) (define (read-print-part) (if (eq? (type (peek-token)) 'String) (list 'Prts (get-token) '()) (list 'Prti (read-expr) '()))) ; (do ((tok (read-print-part) (read-print-part)) (rec '() (list 'Sequence rec tok))) ((not (eq? (type (peek-token)) 'Comma)) (list 'Sequence rec tok)) (match 'Comma))) ; (define (read-paren-expr) (match 'LeftParen) (let ((v (read-expr))) (match 'RightParen) v)) ; (define (read-stmt) (case (type (peek-token)) ((SemiColon) '()) ((Identifier) (let ((id (get-token))) (match 'Op_assign) (let ((ex (read-expr))) (match 'Semicolon) (list 'Assign id ex)))) ((Keyword_while) (match 'Keyword_while) (let* ((expr (read-paren-expr)) (stmt (read-stmt))) (list 'While expr stmt))) ((Keyword_if) (match 'Keyword_if) (let* ((expr (read-paren-expr)) (then-part (read-stmt)) (else-part (if (eq? (type (peek-token)) 'Keyword_else) (begin (match 'Keyword_else) (read-stmt)) '()))) (list 'If expr (list 'If then-part else-part)))) ((Keyword_print) (match 'Keyword_print) (match 'LeftParen) (let ((v (read-prt-list))) (match 'RightParen) (match 'Semicolon) v)) ((Keyword_putc) (match 'Keyword_putc) (let ((v (read-paren-expr))) (match 'Semicolon) (list 'Putc v '()))) ((LeftBrace) (match 'LeftBrace) (let ((v (read-stmts))) (match 'RightBrace) v)) (else (error "Unknown token type for statement" (type (peek-token)))))) ; (define (read-stmts) (do ((sequence (list 'Sequence '() (read-stmt)) (list 'Sequence sequence (read-stmt)))) ((memq (type (peek-token)) '(End_of_input RightBrace)) sequence))) ; (let ((res (read-stmts))) (match 'End_of_input) res))
- reads tokens from file, parses and returns the AST
(define (parse-file filename)
(define (tokenise line) (let ((port (open-input-string line))) (read port) ; discard line (read port) ; discard col (let* ((type (read port)) ; read type as symbol (val (read port))) ; check for optional value (if (eof-object? val) (list type) (list type val))))) ; (with-input-from-file filename (lambda () (do ((line (read-line) (read-line)) (toks '() (cons (tokenise line) toks))) ((eof-object? line) (parse (list->vector (reverse toks))))))))
- Output the AST in flattened format
(define (display-ast ast)
(cond ((null? ast) (display ";\n")) ((= 2 (length ast)) (display (car ast)) (display #\tab) (write (cadr ast)) ; use write to preserve " " on String (newline)) (else (display (car ast)) (newline) (display-ast (cadr ast)) (display-ast (cadr (cdr ast))))))
- read from filename passed on command line
(if (= 2 (length (command-line)))
(display-ast (parse-file (cadr (command-line)))) (display "Error: provide program filename\n"))
</lang>
Wren
<lang ecmascript>import "/dynamic" for Enum, Struct, Tuple import "/fmt" for Fmt import "/ioutil" for FileUtil
var tokens = [
"EOI", "Mul", "Div", "Mod", "Add", "Sub", "Negate", "Not", "Lss", "Leq", "Gtr", "Geq", "Eql", "Neq", "Assign", "And", "Or", "If", "Else", "While", "Print", "Putc", "Lparen", "Rparen", "Lbrace", "Rbrace", "Semi", "Comma", "Ident", "Integer", "String"
]
var Token = Enum.create("Token", tokens)
var nodes = [
"Ident", "String", "Integer", "Sequence", "If", "Prtc", "Prts", "Prti", "While", "Assign", "Negate", "Not", "Mul", "Div", "Mod", "Add", "Sub", "Lss", "Leq", "Gtr", "Geq", "Eql", "Neq", "And", "Or"
]
var Node = Enum.create("Node", nodes)
// 'text' field represents ident ot string literal or integer value var TokS = Struct.create("TokS", ["tok", "errLn", "errCol", "text"])
var Tree = Struct.create("Tree", ["nodeType", "left", "right", "value"])
// dependency: Ordered by tok, must remain in same order as Token enum constants var Atr = Tuple.create("Atr", ["text", "enumText", "tok", "rightAssociative", "isBinary",
"isUnary", "precedence", "nodeType"])
var atrs = [
Atr.new("EOI", "End_of_input", Token.EOI, false, false, false, -1, -1), Atr.new("*", "Op_multiply", Token.Mul, false, true, false, 13, Node.Mul), Atr.new("/", "Op_divide", Token.Div, false, true, false, 13, Node.Div), Atr.new("\%", "Op_mod", Token.Mod, false, true, false, 13, Node.Mod), Atr.new("+", "Op_add", Token.Add, false, true, false, 12, Node.Add), Atr.new("-", "Op_subtract", Token.Sub, false, true, false, 12, Node.Sub), Atr.new("-", "Op_negate", Token.Negate, false, false, true, 14, Node.Negate), Atr.new("!", "Op_not", Token.Not, false, false, true, 14, Node.Not), Atr.new("<", "Op_less", Token.Lss, false, true, false, 10, Node.Lss), Atr.new("<=", "Op_lessequal", Token.Leq, false, true, false, 10, Node.Leq), Atr.new(">", "Op_greater", Token.Gtr, false, true, false, 10, Node.Gtr), Atr.new(">=", "Op_greaterequal", Token.Geq, false, true, false, 10, Node.Geq), Atr.new("==", "Op_equal", Token.Eql, false, true, false, 9, Node.Eql), Atr.new("!=", "Op_notequal", Token.Neq, false, true, false, 9, Node.Neq), Atr.new("=", "Op_assign", Token.Assign, false, false, false, -1, Node.Assign), Atr.new("&&", "Op_and", Token.And, false, true, false, 5, Node.And), Atr.new("||", "Op_or", Token.Or, false, true, false, 4, Node.Or), Atr.new("if", "Keyword_if", Token.If, false, false, false, -1, Node.If), Atr.new("else", "Keyword_else", Token.Else, false, false, false, -1, -1), Atr.new("while", "Keyword_while", Token.While, false, false, false, -1, Node.While), Atr.new("print", "Keyword_print", Token.Print, false, false, false, -1, -1), Atr.new("putc", "Keyword_putc", Token.Putc, false, false, false, -1, -1), Atr.new("(", "LeftParen", Token.Lparen, false, false, false, -1, -1), Atr.new(")", "RightParen", Token.Rparen, false, false, false, -1, -1), Atr.new("{", "LeftBrace", Token.Lbrace, false, false, false, -1, -1), Atr.new("}", "RightBrace", Token.Rbrace, false, false, false, -1, -1), Atr.new(";", "Semicolon", Token.Semi, false, false, false, -1, -1), Atr.new(",", "Comma", Token.Comma, false, false, false, -1, -1), Atr.new("Ident", "Identifier", Token.Ident, false, false, false, -1, Node.Ident), Atr.new("Integer literal", "Integer", Token.Integer, false, false, false, -1, Node.Integer), Atr.new("String literal", "String", Token.String, false, false, false, -1, Node.String),
]
var displayNodes = [
"Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts", "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal", "NotEqual", "And", "Or"
]
var token = TokS.new(0, 0, 0, "")
var reportError = Fn.new { |eline, ecol, msg| Fiber.abort("(%(eline):%(ecol)) error : %(msg)") }
// return internal version of name var getEnum = Fn.new { |name|
for (atr in atrs) { if (atr.enumText == name) return atr.tok } reportError.call(0, 0, "Unknown token %(name)")
}
var lines = [] var lineCount = 0 var lineNum = 0
var getTok = Fn.new {
var tok = TokS.new(0, 0, 0, "") if (lineNum < lineCount) { var line = lines[lineNum].trimEnd(" \t") lineNum = lineNum + 1 var fields = line.split(" ").where { |s| s != "" }.toList // [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional tok.errLn = Num.fromString(fields[0]) tok.errCol = Num.fromString(fields[1]) tok.tok = getEnum.call(fields[2]) var le = fields.count if (le == 4) { tok.text = fields[3] } else if (le > 4) { var idx = line.indexOf("\"") tok.text = line[idx..-1] } } return tok
}
var makeNode = Fn.new { |nodeType, left, right| Tree.new(nodeType, left, right, "") }
var makeLeaf = Fn.new { |nodeType, value| Tree.new(nodeType, null, null, value) }
var expect = Fn.new { |msg, s|
if (token.tok == s) { token = getTok.call() return } reportError.call(token.errLn, token.errCol, Fmt.swrite("$s: Expecting '$s', found '$s'", msg, atrs[s].text, atrs[token.tok].text))
}
var parenExpr // forward reference
var expr // recursive function expr = Fn.new { |p|
var x var node var t = token.tok if (t == Token.Lparen) { x = parenExpr.call() } else if (t == Token.Sub || t == Token.Add) { var op = t token = getTok.call() node = expr.call(atrs[Token.Negate].precedence) if (op == Token.Sub) { x = makeNode.call(Node.negate, node, null) } else { x = node } } else if (t == Token.Not) { token = getTok.call() x = makeNode.call(Node.Not, expr.call(atrs[Token.Not].precedence), null) } else if (t == Token.Ident) { x = makeLeaf.call(Node.Ident, token.text) token = getTok.call() } else if (t == Token.Integer) { x = makeLeaf.call(Node.Integer, token.text) token = getTok.call() } else { reportError.call(token.errLn, token.errCol, Fmt.swrite("Expecting a primary, found: $s", atrs[token.tok].text)) }
while (atrs[token.tok].isBinary && atrs[token.tok].precedence >= p) { var op = token.tok token = getTok.call() var q = atrs[op].precedence if (!atrs[op].rightAssociative) q = q + 1 node = expr.call(q) x = makeNode.call(atrs[op].nodeType, x, node) } return x
}
parenExpr = Fn.new {
expect.call("parenExpr", Token.Lparen) var t = expr.call(0) expect.call("parenExpr", Token.Rparen) return t
}
var stmt // recursive function stmt = Fn.new {
var t var v var e var s var s2 var tt = token.tok if (tt == Token.If) { token = getTok.call() e = parenExpr.call() s = stmt.call() s2 = null if (token.tok == Token.Else) { token = getTok.call() s2 = stmt.call() } t = makeNode.call(Node.If, e, makeNode.call(Node.If, s, s2)) } else if (tt == Token.Putc) { token = getTok.call() e = parenExpr.call() t = makeNode.call(Node.Prtc, e, null) expect.call("Putc", Token.Semi) } else if (tt == Token.Print) { // print '(' expr {',' expr} ')' token = getTok.call() expect.call("Print", Token.Lparen) while (true) { if (token.tok == Token.String) { e = makeNode.call(Node.Prts, makeLeaf.call(Node.String, token.text), null) token = getTok.call() } else { e = makeNode.call(Node.Prti, expr.call(0), null) } t = makeNode.call(Node.Sequence, t, e) if (token.tok != Token.Comma) break expect.call("Print", Token.Comma) } expect.call("Print", Token.Rparen) expect.call("Print", Token.Semi) } else if (tt == Token.Semi) { token = getTok.call() } else if (tt == Token.Ident) { v = makeLeaf.call(Node.Ident, token.text) token = getTok.call() expect.call("assign", Token.Assign) e = expr.call(0) t = makeNode.call(Node.Assign, v, e) expect.call("assign", Token.Semi) } else if (tt == Token.While) { token = getTok.call() e = parenExpr.call() s = stmt.call() t = makeNode.call(Node.While, e, s) } else if (tt == Token.Lbrace) { // {stmt} expect.call("Lbrace", Token.Lbrace) while (token.tok != Token.Rbrace && token.tok != Token.EOI) { t = makeNode.call(Node.Sequence, t, stmt.call()) } expect.call("Lbrace", Token.Rbrace) } else if (tt == Token.EOI) { // do nothing } else { reportError.call(token.errLn, token.errCol, Fmt.Swrite("expecting start of statement, found '$s'", atrs[token.tok].text)) } return t
}
var parse = Fn.new {
var t token = getTok.call() while (true) { t = makeNode.call(Node.Sequence, t, stmt.call()) if (!t || token.tok == Token.EOI) break } return t
}
var prtAst // recursive function prtAst = Fn.new { |t|
if (!t) { System.print(";") } else { Fmt.write("$-14s ", displayNodes[t.nodeType]) if (t.nodeType == Node.Ident || t.nodeType == Node.Integer || t.nodeType == Node.String) { System.print(t.value) } else { System.print() prtAst.call(t.left) prtAst.call(t.right) } }
}
lines = FileUtil.readLines("source.txt") lineCount = lines.count prtAst.call(parse.call())</lang>
- Output:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Zig
<lang zig> const std = @import("std");
pub const NodeValue = union(enum) {
integer: i32, string: []const u8,
fn fromToken(token: Token) ?NodeValue { if (token.value) |value| { switch (value) { .integer => |int| return NodeValue{ .integer = int }, .string => |str| return NodeValue{ .string = str }, } } else { return null; } }
};
pub const Tree = struct {
left: ?*Tree, right: ?*Tree, typ: NodeType, value: ?NodeValue = null,
};
pub const ParserError = error{
OutOfMemory, ExpectedNotFound,
} || std.fmt.ParseIntError;
pub const Parser = struct {
token_it: LexerOutputTokenizer, curr: Token, allocator: std.mem.Allocator,
const Self = @This();
pub fn init(allocator: std.mem.Allocator, str: []const u8) Self { return Self{ .token_it = LexerOutputTokenizer.init(str), .curr = Token{ .line = 0, .col = 0, .typ = .unknown }, .allocator = allocator, }; }
fn makeNode(self: *Self, typ: NodeType, left: ?*Tree, right: ?*Tree) !*Tree { const result = try self.allocator.create(Tree); result.* = Tree{ .left = left, .right = right, .typ = typ }; return result; }
fn makeLeaf(self: *Self, typ: NodeType, value: ?NodeValue) !*Tree { const result = try self.allocator.create(Tree); result.* = Tree{ .left = null, .right = null, .typ = typ, .value = value }; return result; }
pub fn parse(self: *Self) ParserError!?*Tree { try self.next(); var result: ?*Tree = null; while (true) { const stmt = try self.parseStmt(); result = try self.makeNode(.sequence, result, stmt); if (self.curr.typ == .eof) break; } return result; }
/// Classic "Recursive descent" statement parser. fn parseStmt(self: *Self) ParserError!?*Tree { var result: ?*Tree = null; switch (self.curr.typ) { .kw_print => { try self.next(); try self.expect(.left_paren); // Parse each print's argument as an expression delimited by commas until we reach // a closing parens. while (true) { var expr: ?*Tree = null; if (self.curr.typ == .string) { expr = try self.makeNode( .prts, try self.makeLeaf(.string, NodeValue.fromToken(self.curr)), null, ); try self.next(); } else { expr = try self.makeNode(.prti, try self.parseExpr(0), null); } result = try self.makeNode(.sequence, result, expr); if (self.curr.typ != .comma) break; try self.next(); } try self.expect(.right_paren); try self.expect(.semicolon); }, .kw_putc => { try self.next(); result = try self.makeNode(.prtc, try self.parseParenExpr(), null); try self.expect(.semicolon); }, .kw_while => { try self.next(); const expr = try self.parseParenExpr(); result = try self.makeNode(.kw_while, expr, try self.parseStmt()); }, .kw_if => { try self.next(); const expr = try self.parseParenExpr(); const if_stmt = try self.parseStmt(); const else_stmt = blk: { if (self.curr.typ == .kw_else) { try self.next(); break :blk try self.parseStmt(); } else { break :blk null; } }; const stmt_node = try self.makeNode(.kw_if, if_stmt, else_stmt); // If-statement uses `.kw_if` node for both first node with `expr` on the left // and statements on the right and also `.kw_if` node which goes to the right // and contains both if-branch and else-branch. result = try self.makeNode(.kw_if, expr, stmt_node); }, .left_brace => { try self.next(); while (self.curr.typ != .right_brace and self.curr.typ != .eof) { result = try self.makeNode(.sequence, result, try self.parseStmt()); } try self.expect(.right_brace); }, .identifier => { const identifer = try self.makeLeaf(.identifier, NodeValue.fromToken(self.curr)); try self.next(); try self.expect(.assign); const expr = try self.parseExpr(0); result = try self.makeNode(.assign, identifer, expr); try self.expect(.semicolon); }, .semicolon => try self.next(), else => { std.debug.print("\nSTMT: UNKNOWN {}\n", .{self.curr}); std.os.exit(1); }, } return result; }
/// "Precedence climbing" expression parser. fn parseExpr(self: *Self, precedence: i8) ParserError!?*Tree { var result: ?*Tree = null; switch (self.curr.typ) { .left_paren => { result = try self.parseParenExpr(); }, .subtract => { try self.next(); const metadata = NodeMetadata.find(.negate); const expr = try self.parseExpr(metadata.precedence); result = try self.makeNode(.negate, expr, null); }, .not => { try self.next(); const metadata = NodeMetadata.find(.not); const expr = try self.parseExpr(metadata.precedence); result = try self.makeNode(.not, expr, null); }, .add => { try self.next(); result = try self.parseExpr(precedence); }, .integer, .identifier => { const node_type = NodeMetadata.find(self.curr.typ).node_type; result = try self.makeLeaf(node_type, NodeValue.fromToken(self.curr)); try self.next(); }, else => { std.debug.print("\nEXPR: UNKNOWN {}\n", .{self.curr}); std.os.exit(1); }, }
var curr_metadata = NodeMetadata.find(self.curr.typ); while (curr_metadata.binary and curr_metadata.precedence >= precedence) { const new_precedence = if (curr_metadata.right_associative) curr_metadata.precedence else curr_metadata.precedence + 1; try self.next(); const sub_expr = try self.parseExpr(new_precedence); result = try self.makeNode(curr_metadata.node_type, result, sub_expr); curr_metadata = NodeMetadata.find(self.curr.typ); } return result; }
fn parseParenExpr(self: *Self) ParserError!?*Tree { try self.expect(.left_paren); const result = try self.parseExpr(0); try self.expect(.right_paren); return result; }
fn next(self: *Self) ParserError!void { const token = try self.token_it.next(); if (token) |tok| { self.curr = tok; } else { self.curr = Token{ .line = 0, .col = 0, .typ = .unknown }; } }
fn expect(self: *Self, token_type: TokenType) ParserError!void { if (self.curr.typ != token_type) { const expected_str = NodeMetadata.find(token_type).token_str; const found_str = NodeMetadata.find(self.curr.typ).token_str; std.debug.print( "({d}, {d}) error: Expecting '{s}', found '{s}'\n", .{ self.curr.line, self.curr.col, expected_str, found_str }, ); return ParserError.ExpectedNotFound; } try self.next(); }
};
pub fn parse(allocator: std.mem.Allocator, str: []const u8) !?*Tree {
var parser = Parser.init(allocator, str); return try parser.parse();
}
pub fn main() !void {
var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator); defer arena.deinit(); const allocator = arena.allocator();
var arg_it = std.process.args(); _ = try arg_it.next(allocator) orelse unreachable; // program name const file_name = arg_it.next(allocator); // We accept both files and standard input. var file_handle = blk: { if (file_name) |file_name_delimited| { const fname: []const u8 = try file_name_delimited; break :blk try std.fs.cwd().openFile(fname, .{}); } else { break :blk std.io.getStdIn(); } }; defer file_handle.close(); const input_content = try file_handle.readToEndAlloc(allocator, std.math.maxInt(usize));
const result: ?*Tree = try parse(allocator, input_content); const result_str = try astToFlattenedString(allocator, result); _ = try std.io.getStdOut().write(result_str);
}
const NodeMetadata = struct {
token_type: TokenType, right_associative: bool, binary: bool, unary: bool, precedence: i8, node_type: NodeType, token_str: []const u8,
const self = [_]NodeMetadata{ .{ .token_type = .multiply, .right_associative = false, .binary = true, .unary = false, .precedence = 13, .node_type = .multiply, .token_str = "*" }, .{ .token_type = .divide, .right_associative = false, .binary = true, .unary = false, .precedence = 13, .node_type = .divide, .token_str = "/" }, .{ .token_type = .mod, .right_associative = false, .binary = true, .unary = false, .precedence = 13, .node_type = .mod, .token_str = "%" }, .{ .token_type = .add, .right_associative = false, .binary = true, .unary = false, .precedence = 12, .node_type = .add, .token_str = "+" }, .{ .token_type = .subtract, .right_associative = false, .binary = true, .unary = false, .precedence = 12, .node_type = .subtract, .token_str = "-" }, .{ .token_type = .negate, .right_associative = false, .binary = false, .unary = true, .precedence = 14, .node_type = .negate, .token_str = "-" }, .{ .token_type = .less, .right_associative = false, .binary = true, .unary = false, .precedence = 10, .node_type = .less, .token_str = "<" }, .{ .token_type = .less_equal, .right_associative = false, .binary = true, .unary = false, .precedence = 10, .node_type = .less_equal, .token_str = "<=" }, .{ .token_type = .greater, .right_associative = false, .binary = true, .unary = false, .precedence = 10, .node_type = .greater, .token_str = ">" }, .{ .token_type = .greater_equal, .right_associative = false, .binary = true, .unary = false, .precedence = 10, .node_type = .greater_equal, .token_str = ">=" }, .{ .token_type = .equal, .right_associative = false, .binary = true, .unary = false, .precedence = 9, .node_type = .equal, .token_str = "=" }, .{ .token_type = .not_equal, .right_associative = false, .binary = true, .unary = false, .precedence = 9, .node_type = .not_equal, .token_str = "!=" }, .{ .token_type = .not, .right_associative = false, .binary = false, .unary = true, .precedence = 14, .node_type = .not, .token_str = "!" }, .{ .token_type = .assign, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .assign, .token_str = "=" }, .{ .token_type = .bool_and, .right_associative = false, .binary = true, .unary = false, .precedence = 5, .node_type = .bool_and, .token_str = "&&" }, .{ .token_type = .bool_or, .right_associative = false, .binary = true, .unary = false, .precedence = 4, .node_type = .bool_or, .token_str = "||" }, .{ .token_type = .left_paren, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "(" }, .{ .token_type = .right_paren, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = ")" }, .{ .token_type = .left_brace, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "{" }, .{ .token_type = .right_brace, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "}" }, .{ .token_type = .semicolon, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = ";" }, .{ .token_type = .comma, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "," }, .{ .token_type = .kw_if, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .kw_if, .token_str = "if" }, .{ .token_type = .kw_else, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "else" }, .{ .token_type = .kw_while, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .kw_while, .token_str = "while" }, .{ .token_type = .kw_print, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "print" }, .{ .token_type = .kw_putc, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "putc" }, .{ .token_type = .identifier, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .identifier, .token_str = "Identifier" }, .{ .token_type = .integer, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .integer, .token_str = "Integer literal" }, .{ .token_type = .string, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .string, .token_str = "String literal" }, .{ .token_type = .eof, .right_associative = false, .binary = false, .unary = false, .precedence = -1, .node_type = .unknown, .token_str = "End of line" }, };
pub fn find(token_type: TokenType) NodeMetadata { for (self) |metadata| { if (metadata.token_type == token_type) return metadata; } else { unreachable; } }
};
pub const NodeType = enum {
unknown, identifier, string, integer, sequence, kw_if, prtc, prts, prti, kw_while, assign, negate, not, multiply, divide, mod, add, subtract, less, less_equal, greater, greater_equal, equal, not_equal, bool_and, bool_or,
pub fn toString(self: NodeType) []const u8 { return switch (self) { .unknown => "UNKNOWN", .identifier => "Identifier", .string => "String", .integer => "Integer", .sequence => "Sequence", .kw_if => "If", .prtc => "Prtc", .prts => "Prts", .prti => "Prti", .kw_while => "While", .assign => "Assign", .negate => "Negate", .not => "Not", .multiply => "Multiply", .divide => "Divide", .mod => "Mod", .add => "Add", .subtract => "Subtract", .less => "Less", .less_equal => "LessEqual", .greater => "Greater", .greater_equal => "GreaterEqual", .equal => "Equal", .not_equal => "NotEqual", .bool_and => "And", .bool_or => "Or", }; }
};
fn astToFlattenedString(allocator: std.mem.Allocator, tree: ?*Tree) ![]const u8 {
var result = std.ArrayList(u8).init(allocator); var writer = result.writer(); try treeToString(allocator, writer, tree); return result.items;
}
pub const TokenType = enum {
unknown, multiply, divide, mod, add, subtract, negate, less, less_equal, greater, greater_equal, equal, not_equal, not, assign, bool_and, bool_or, left_paren, right_paren, left_brace, right_brace, semicolon, comma, kw_if, kw_else, kw_while, kw_print, kw_putc, identifier, integer, string, eof,
const from_string_map = std.ComptimeStringMap(TokenType, .{ .{ "Op_multiply", .multiply }, .{ "Op_divide", .divide }, .{ "Op_mod", .mod }, .{ "Op_add", .add }, .{ "Op_subtract", .subtract }, .{ "Op_negate", .negate }, .{ "Op_less", .less }, .{ "Op_lessequal", .less_equal }, .{ "Op_greater", .greater }, .{ "Op_greaterequal", .greater_equal }, .{ "Op_equal", .equal }, .{ "Op_notequal", .not_equal }, .{ "Op_not", .not }, .{ "Op_assign", .assign }, .{ "Op_and", .bool_and }, .{ "Op_or", .bool_or }, .{ "LeftParen", .left_paren }, .{ "RightParen", .right_paren }, .{ "LeftBrace", .left_brace }, .{ "RightBrace", .right_brace }, .{ "Semicolon", .semicolon }, .{ "Comma", .comma }, .{ "Keyword_if", .kw_if }, .{ "Keyword_else", .kw_else }, .{ "Keyword_while", .kw_while }, .{ "Keyword_print", .kw_print }, .{ "Keyword_putc", .kw_putc }, .{ "Identifier", .identifier }, .{ "Integer", .integer }, .{ "String", .string }, .{ "End_of_input", .eof }, });
pub fn fromString(str: []const u8) TokenType { return from_string_map.get(str).?; }
};
pub const TokenValue = union(enum) {
integer: i32, string: []const u8,
};
pub const Token = struct {
line: usize, col: usize, typ: TokenType = .unknown, value: ?TokenValue = null,
};
const TreeToStringError = error{OutOfMemory};
fn treeToString(
allocator: std.mem.Allocator, writer: std.ArrayList(u8).Writer, tree: ?*Tree,
) TreeToStringError!void {
if (tree) |t| { _ = try writer.write(try std.fmt.allocPrint( allocator, "{s}", .{t.typ.toString()}, )); switch (t.typ) { .string, .identifier => _ = try writer.write(try std.fmt.allocPrint( allocator, " {s}\n", .{t.value.?.string}, )), .integer => _ = try writer.write(try std.fmt.allocPrint( allocator, " {d}\n", .{t.value.?.integer}, )), else => { _ = try writer.write(try std.fmt.allocPrint( allocator, "\n", .{}, )); try treeToString(allocator, writer, t.left); try treeToString(allocator, writer, t.right); }, } } else { _ = try writer.write(try std.fmt.allocPrint( allocator, ";\n", .{}, )); }
}
pub const LexerOutputTokenizer = struct {
it: std.mem.SplitIterator(u8),
const Self = @This();
pub fn init(str: []const u8) Self { return Self{ .it = std.mem.split(u8, str, "\n") }; }
pub fn next(self: *Self) std.fmt.ParseIntError!?Token { if (self.it.next()) |line| { if (line.len == 0) return null; var tokens_it = std.mem.tokenize(u8, line, " "); const lineNumber = try std.fmt.parseInt(usize, tokens_it.next().?, 10); const colNumber = try std.fmt.parseInt(usize, tokens_it.next().?, 10); const typ_text = tokens_it.next().?; const typ = TokenType.fromString(typ_text); const pre_value_index = tokens_it.index; const value = tokens_it.next(); var token = Token{ .line = lineNumber, .col = colNumber, .typ = typ }; if (value) |val| { const token_value = blk: { switch (typ) { .string, .identifier => { tokens_it.index = pre_value_index; break :blk TokenValue{ .string = tokens_it.rest() }; }, .integer => break :blk TokenValue{ .integer = try std.fmt.parseInt(i32, val, 10) }, else => unreachable, } }; token.value = token_value; } return token; } else { return null; } }
};
fn stringToTokenList(allocator: std.mem.Allocator, str: []const u8) !std.ArrayList(Token) {
var result = std.ArrayList(Token).init(allocator); var lexer_output_it = LexerOutputTokenizer.init(str); while (try lexer_output_it.next()) |token| { try result.append(token); } return result;
} </lang>