Compiler/syntax analyzer
You are encouraged to solve this task according to the task description, using any language you may know.
A Syntax analyzer transforms a token stream (from the Lexical analyzer) into a Syntax tree, based on a grammar.
Take the output from the Lexical analyzer task, and convert it to an Abstract Syntax Tree (AST), based on the grammar below. The output should be in a flattened format.
The program should read input from a file and/or stdin, and write output to a file and/or stdout. If the language being used has a parser module/library/class, it would be great if two versions of the solution are provided: One without the parser module, and one with.
The simple programming language to be analyzed is more or less a (very tiny) subset of C. The formal grammar in Extended Backus-Naur Form (EBNF):
stmt_list = {stmt} ;
stmt = ';'
| Identifier '=' expr ';'
| 'while' paren_expr stmt
| 'if' paren_expr stmt ['else' stmt]
| 'print' '(' prt_list ')' ';'
| 'putc' paren_expr ';'
| '{' stmt_list '}'
;
paren_expr = '(' expr ')' ;
prt_list = (String | expr) {',' (String | expr)} ;
expr = and_expr {'||' and_expr} ;
and_expr = equality_expr {'&&' equality_expr} ;
equality_expr = relational_expr [('==' | '!=') relational_expr] ;
relational_expr = addition_expr [('<' | '<=' | '>' | '>=') addition_expr] ;
addition_expr = multiplication_expr {('+' | '-') multiplication_expr} ;
multiplication_expr = primary {('*' | '/' | '%') primary } ;
primary = Identifier
| Integer
| '(' expr ')'
| ('+' | '-' | '!') primary
;
The resulting AST should be formulated as a Binary Tree.
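A concrete, purely illustrative way to represent such a binary tree in Python, using the make_node/make_leaf helper names that appear in the pseudo-code further down (any equivalent record or tuple representation works just as well):
# Minimal illustrative AST node; any equivalent record/tuple representation works.
class Node:
    def __init__(self, node_type, left=None, right=None, value=None):
        self.node_type = node_type   # e.g. "Sequence", "Assign", "Integer"
        self.left = left             # left sub-node, or None for the ";" (empty) node
        self.right = right           # right sub-node, or None
        self.value = value           # used only by Identifier/Integer/String leaves

def make_node(node_type, left, right=None):
    return Node(node_type, left, right)

def make_leaf(node_type, value):
    return Node(node_type, value=value)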
- Example - given the simple program (below), stored in a file called while.t, create the list of tokens, using one of the Lexical analyzer solutions
lex < while.t > while.lex
- Run one of the Syntax analyzer solutions
parse < while.lex > while.ast
- The following table shows the input to lex, lex output, and the AST produced by the parser
Input to lex | Output from lex, input to parse | Output from parse |
---|---|---|
count = 1;
while (count < 10) {
print("count is: ", count, "\n");
count = count + 1;
}
|
1 1 Identifier count 1 7 Op_assign 1 9 Integer 1 1 10 Semicolon 2 1 Keyword_while 2 7 LeftParen 2 8 Identifier count 2 14 Op_less 2 16 Integer 10 2 18 RightParen 2 20 LeftBrace 3 5 Keyword_print 3 10 LeftParen 3 11 String "count is: " 3 23 Comma 3 25 Identifier count 3 30 Comma 3 32 String "\n" 3 36 RightParen 3 37 Semicolon 4 5 Identifier count 4 11 Op_assign 4 13 Identifier count 4 19 Op_add 4 21 Integer 1 4 22 Semicolon 5 1 RightBrace 6 1 End_of_input |
Sequence Sequence ; Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence ; Sequence Sequence Sequence ; Prts String "count is: " ; Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1 |
- Specifications
- List of node type names
Identifier String Integer Sequence If Prtc Prts Prti While Assign Negate Not Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or
In the text below, Null/Empty nodes are represented by ";".
- Non-terminal (internal) nodes
For Operators, the following nodes should be created:
Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or
For each of the above nodes, the left and right sub-nodes are the operands of the respective operation.
In pseudo S-Expression format:
(Operator expression expression)
Negate, Not
For these node types, the left node is the operand, and the right node is null.
(Operator expression ;)
Sequence - sub-nodes are either statements or Sequences.
If - left node is the expression, the right node is an If node, with its left node being the if-true statement part, and the right node being the if-false (else) statement part.
(If expression (If statement else-statement))
If there is not an else, the tree becomes:
(If expression (If statement ;))
Prtc
(Prtc (expression) ;)
Prts
(Prts (String "the string") ;)
Prti
(Prti (Integer 12345) ;)
While - left node is the expression, the right node is the statement.
(While expression statement)
Assign - left node is the left-hand side of the assignment, the right node is the right-hand side of the assignment.
(Assign Identifier expression)
Terminal (leaf) nodes:
Identifier: (Identifier ident_name)
Integer: (Integer 12345)
String: (String "Hello World!")
";": Empty node
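As a worked illustration of the notation above (not part of the required output), the statement if (a) b = 1; else b = 2; corresponds to the pseudo S-Expression:
(If (Identifier a) (If (Assign (Identifier b) (Integer 1)) (Assign (Identifier b) (Integer 2))))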
- Some simple examples
Sequence nodes are used to represent a list; a semicolon (";") represents a null node, i.e., the end of that path.
This simple program:
a=11;
Produces the following AST, encoded as a binary tree:
Under each non-leaf node are two '|' lines. The first represents the left sub-node, the second represents the right sub-node:
(1) Sequence
(2) |-- ;
(3) |-- Assign
(4)     |-- Identifier: a
(5)     |-- Integer: 11
In flattened form:
(1) Sequence
(2) ;
(3) Assign
(4) Identifier a
(5) Integer 11
This program:
a=11; b=22; c=33;
Produces the following AST:
( 1) Sequence
( 2) |-- Sequence
( 3) |   |-- Sequence
( 4) |   |   |-- ;
( 5) |   |   |-- Assign
( 6) |   |       |-- Identifier: a
( 7) |   |       |-- Integer: 11
( 8) |   |-- Assign
( 9) |       |-- Identifier: b
(10) |       |-- Integer: 22
(11) |-- Assign
(12)     |-- Identifier: c
(13)     |-- Integer: 33
In flattened form:
( 1) Sequence
( 2) Sequence
( 3) Sequence
( 4) ;
( 5) Assign
( 6) Identifier a
( 7) Integer 11
( 8) Assign
( 9) Identifier b
(10) Integer 22
(11) Assign
(12) Identifier c
(13) Integer 33
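For illustration only, the same tree can be built by hand with the make_node/make_leaf helpers sketched above; each statement is chained onto a left-leaning spine of Sequence nodes, exactly as the parse() pseudo-code below does:
t = None                                             # the empty statement list (";")
for name, val in (("a", "11"), ("b", "22"), ("c", "33")):
    assign = make_node("Assign", make_leaf("Identifier", name), make_leaf("Integer", val))
    t = make_node("Sequence", t, assign)
# A pre-order walk of t (node type, then left, then right) prints the
# 13-line flattened form shown above.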
- Pseudo-code for the parser.
It uses precedence climbing for expression parsing and recursive descent for statement parsing, building the AST as it goes:
def expr(p)
if tok is "("
x = paren_expr()
elif tok in ["-", "+", "!"]
gettok()
y = expr(precedence of operator)
if operator was "+"
x = y
else
x = make_node(operator, y)
elif tok is an Identifier
x = make_leaf(Identifier, variable name)
gettok()
elif tok is an Integer constant
x = make_leaf(Integer, integer value)
gettok()
else
error()
while tok is a binary operator and precedence of tok >= p
save_tok = tok
gettok()
q = precedence of save_tok
if save_tok is not right associative
q += 1
x = make_node(Operator save_tok represents, x, expr(q))
return x
def paren_expr()
expect("(")
x = expr(0)
expect(")")
return x
def stmt()
t = NULL
if accept("if")
e = paren_expr()
s = stmt()
t = make_node(If, e, make_node(If, s, accept("else") ? stmt() : NULL))
elif accept("putc")
t = make_node(Prtc, paren_expr())
expect(";")
elif accept("print")
expect("(")
repeat
if tok is a string
e = make_node(Prts, make_leaf(String, the string))
gettok()
else
e = make_node(Prti, expr(0))
t = make_node(Sequence, t, e)
until not accept(",")
expect(")")
expect(";")
elif tok is ";"
gettok()
elif tok is an Identifier
v = make_leaf(Identifier, variable name)
gettok()
expect("=")
t = make_node(Assign, v, expr(0))
expect(";")
elif accept("while")
e = paren_expr()
t = make_node(While, e, stmt())
elif accept("{")
while tok not equal "}" and tok not equal end-of-file
t = make_node(Sequence, t, stmt())
expect("}")
elif tok is end-of-file
pass
else
error()
return t
def parse()
t = NULL
gettok()
repeat
t = make_node(Sequence, t, stmt())
until tok is end-of-file
return t
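A direct Python transcription of the expression part of the pseudo-code is sketched below, purely for illustration. The binary-operator precedence values are the ones used later by the C and AWK solutions on this page; the sketch assumes a global tok with .kind and .value fields, plus gettok(), paren_expr(), error(), make_node() and make_leaf() behaving as in the pseudo-code above.
# Binary-operator precedences (matching the C and AWK attribute tables below).
PRECEDENCE = {"*": 13, "/": 13, "%": 13, "+": 12, "-": 12,
              "<": 10, "<=": 10, ">": 10, ">=": 10,
              "==": 9, "!=": 9, "&&": 5, "||": 4}
UNARY_PRECEDENCE = 14          # unary '-', '+' and '!'
OPERATOR_NODE = {"*": "Multiply", "/": "Divide", "%": "Mod", "+": "Add",
                 "-": "Subtract", "<": "Less", "<=": "LessEqual",
                 ">": "Greater", ">=": "GreaterEqual", "==": "Equal",
                 "!=": "NotEqual", "&&": "And", "||": "Or"}

def expr(p):
    # primary
    if tok.kind == "(":
        x = paren_expr()
    elif tok.kind in ("-", "+", "!"):
        op = tok.kind
        gettok()
        y = expr(UNARY_PRECEDENCE)
        x = y if op == "+" else make_node("Negate" if op == "-" else "Not", y)
    elif tok.kind == "Identifier":
        x = make_leaf("Identifier", tok.value)
        gettok()
    elif tok.kind == "Integer":
        x = make_leaf("Integer", tok.value)
        gettok()
    else:
        error("Expecting a primary")
    # precedence climbing over the (all left-associative) binary operators
    while tok.kind in PRECEDENCE and PRECEDENCE[tok.kind] >= p:
        op = tok.kind
        gettok()
        x = make_node(OPERATOR_NODE[op], x, expr(PRECEDENCE[op] + 1))
    return x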
- Once the AST is built, it should be output in a flattened format. This can be as simple as the following
def prt_ast(t)
if t == NULL
print(";\n")
else
print(t.node_type)
if t.node_type in [Identifier, Integer, String] # leaf node
print the value of the Ident, Integer or String, "\n"
else
print("\n")
prt_ast(t.left)
prt_ast(t.right)
- If the AST is correctly built, loading it into a subsequent program should be as simple as
def load_ast()
line = readline()
# Each line has at least one token
line_list = tokenize the line, respecting double quotes
text = line_list[0] # first token is always the node type
if text == ";" # a terminal node
return NULL
node_type = text # could convert to internal form if desired
# A line with two tokens is a leaf node
# Leaf nodes are: Identifier, Integer, String
# The 2nd token is the value
if len(line_list) > 1
return make_leaf(node_type, line_list[1])
left = load_ast()
right = load_ast()
return make_node(node_type, left, right)
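The pseudo-code above translates almost line for line into Python. The sketch below is an illustration only (it assumes the make_node/make_leaf helpers sketched earlier); splitting each line at most once on whitespace keeps quoted string values, including any embedded blanks, intact:
import sys

def load_ast(lines=sys.stdin):
    line = next(lines)
    parts = line.split(maxsplit=1)        # node type, then optional value (quotes kept)
    node_type = parts[0]
    if node_type == ";":                  # null / empty node
        return None
    if len(parts) > 1:                    # Identifier, Integer or String leaf
        return make_leaf(node_type, parts[1].rstrip("\n"))
    left = load_ast(lines)                # interior node: recurse for both sub-trees
    right = load_ast(lines)
    return make_node(node_type, left, right)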
Finally, the AST can also be tested by running it against one of the AST Interpreter solutions.
- Test program, assuming this is in a file called prime.t
- lex <prime.t | parse
Input to lex | Output from lex, input to parse | Output from parse |
---|---|---|
/*
Simple prime number generator
*/
count = 1;
n = 1;
limit = 100;
while (n < limit) {
k=3;
p=1;
n=n+2;
while ((k*k<=n) && (p)) {
p=n/k*k!=n;
k=k+2;
}
if (p) {
print(n, " is prime\n");
count = count + 1;
}
}
print("Total primes found: ", count, "\n");
|
4 1 Identifier count 4 7 Op_assign 4 9 Integer 1 4 10 Semicolon 5 1 Identifier n 5 3 Op_assign 5 5 Integer 1 5 6 Semicolon 6 1 Identifier limit 6 7 Op_assign 6 9 Integer 100 6 12 Semicolon 7 1 Keyword_while 7 7 LeftParen 7 8 Identifier n 7 10 Op_less 7 12 Identifier limit 7 17 RightParen 7 19 LeftBrace 8 5 Identifier k 8 6 Op_assign 8 7 Integer 3 8 8 Semicolon 9 5 Identifier p 9 6 Op_assign 9 7 Integer 1 9 8 Semicolon 10 5 Identifier n 10 6 Op_assign 10 7 Identifier n 10 8 Op_add 10 9 Integer 2 10 10 Semicolon 11 5 Keyword_while 11 11 LeftParen 11 12 LeftParen 11 13 Identifier k 11 14 Op_multiply 11 15 Identifier k 11 16 Op_lessequal 11 18 Identifier n 11 19 RightParen 11 21 Op_and 11 24 LeftParen 11 25 Identifier p 11 26 RightParen 11 27 RightParen 11 29 LeftBrace 12 9 Identifier p 12 10 Op_assign 12 11 Identifier n 12 12 Op_divide 12 13 Identifier k 12 14 Op_multiply 12 15 Identifier k 12 16 Op_notequal 12 18 Identifier n 12 19 Semicolon 13 9 Identifier k 13 10 Op_assign 13 11 Identifier k 13 12 Op_add 13 13 Integer 2 13 14 Semicolon 14 5 RightBrace 15 5 Keyword_if 15 8 LeftParen 15 9 Identifier p 15 10 RightParen 15 12 LeftBrace 16 9 Keyword_print 16 14 LeftParen 16 15 Identifier n 16 16 Comma 16 18 String " is prime\n" 16 31 RightParen 16 32 Semicolon 17 9 Identifier count 17 15 Op_assign 17 17 Identifier count 17 23 Op_add 17 25 Integer 1 17 26 Semicolon 18 5 RightBrace 19 1 RightBrace 20 1 Keyword_print 20 6 LeftParen 20 7 String "Total primes found: " 20 29 Comma 20 31 Identifier count 20 36 Comma 20 38 String "\n" 20 42 RightParen 20 43 Semicolon 21 1 End_of_input |
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ; |
- Additional examples
Your solution should pass all the test cases above and the additional tests found here.
The C and Python versions can be considered reference implementations.
- Related Tasks
ALGOL W
begin % syntax analyser %
% parse tree nodes %
record node( integer type
; reference(node) left, right
; integer iValue % nString/nIdentifier number or nInteger value %
);
integer nIdentifier, nString, nInteger, nSequence, nIf, nPrtc, nPrts
, nPrti, nWhile, nAssign, nNegate, nNot, nMultiply
, nDivide, nMod, nAdd, nSubtract, nLess, nLessEqual
, nGreater, nGreaterEqual, nEqual, nNotEqual, nAnd, nOr
;
string(14) array ndName ( 1 :: 25 );
% tokens - names must match those output by the lexical analyser %
integer tkType, tkLine, tkColumn, tkLength, tkIntegerValue;
integer tOp_multiply , tOp_divide , tOp_mod , tOp_add
, tOp_subtract , tOp_negate , tOp_less , tOp_lessequal
, tOp_greater , tOp_greaterequal , tOp_equal , tOp_notequal
, tOp_not , tOp_assign , tOp_and , tOp_or
, tLeftParen , tRightParen , tLeftBrace , tRightBrace
, tSemicolon , tComma , tKeyword_if , tKeyword_else
, tKeyword_while , tKeyword_print , tKeyword_putc , tIdentifier
, tInteger , tString , tEnd_of_input
, MAX_TOKEN_TYPE, PRIMARY_PREC
;
string(16) array tkName ( 1 :: 31 );
integer array tkPrec, tkNode ( 1 :: 31 );
% string literals and identifiers - uses a linked list - a hash table might be better... %
string(1) array text ( 0 :: 4095 );
integer textNext, TEXT_MAX;
record textElement ( integer start, length; reference(textElement) next );
reference(textElement) idList, stList;
% returns a new node with left and right branches %
reference(node) procedure opNode ( integer value opType; reference(node) value opLeft, opRight ) ; begin
node( opType, opLeft, opRight, 0 )
end opNode ;
% returns a new operand node %
reference(node) procedure operandNode ( integer value opType, opValue ) ; begin
node( opType, null, null, opValue )
end operandNode ;
% reports an error %
procedure synError( integer value line, column; string(80) value message ); begin
integer errorPos;
write( i_w := 1, s_w := 0, "**** Error at(", line, ",", column, "): " );
errorPos := 0;
while errorPos < 80 and message( errorPos // 1 ) not = "." do begin
writeon( s_w := 0, message( errorPos // 1 ) );
errorPos := errorPos + 1
end while_not_at_end_of_message ;
writeon( s_w := 0, "." )
end synError ;
% reports an error and stops %
procedure fatalError( integer value line, column; string(80) value message ); begin
synError( line, column, message );
assert( false )
end fatalError ;
% prints a node and its sub-nodes %
procedure writeNode( reference(node) value n ) ; begin
% prints an identifier or string from text %
procedure writeOnText( reference(textElement) value txHead; integer value txNumber ) ;
begin
reference(textElement) txPos;
integer count;
txPos := txHead;
count := 1;
while count < txNumber and txPos not = null do begin
txPos := next(txPos);
count := count + 1
end while_text_element_not_found ;
if txPos = null then fatalError( 0, txNumber, "INTERNAL ERROR: text not found." )
else for cPos := 0 until length(txPos) - 1 do writeon( text( start(txPos) + cPos ) );
if text( start(txPos) ) = """" then writeon( """" );
end writeOnText ;
if n = null then write( ";" )
else begin
write( ndName( type(n) ) );
if type(n) = nInteger then writeon( iValue(n) )
else if type(n) = nIdentifier then writeOnText( idList, iValue(n) )
else if type(n) = nString then writeOnText( stList, iValue(n) )
else begin
writeNode( left(n) );
writeNode( right(n) )
end
end
end writeNode ;
% reads a token from standard input %
procedure readToken ; begin
% parses a string from line and stores it in a string in the text array %
% - if it is not already present in the specified textElement list. %
% returns the position of the string in the text array %
integer procedure readString ( reference(textElement) value result txList; string(1) value terminator ) ; begin
string(256) str;
integer sLen, sPos, ePos;
logical found;
reference(textElement) txPos, txLastPos;
% get the text of the string %
str := " ";
sLen := 0;
str( sLen // 1 ) := line( lPos // 1 );
sLen := sLen + 1;
lPos := lPos + 1;
while lPos <= 255 and line( lPos // 1 ) not = terminator do begin
str( sLen // 1 ) := line( lPos // 1 );
sLen := sLen + 1;
lPos := lPos + 1
end while_more_string ;
if lPos > 255 then fatalError( tkLine, tkColumn, "Unterminated String in token file." );
% attempt to find the text in the list of strings/identifiers %
txLastPos := txPos := txList;
found := false;
ePos := 0;
while not found and txPos not = null do begin
ePos := ePos + 1;
found := ( length(txPos) = sLen );
sPos := 0;
while found and sPos < sLen do begin
found := str( sPos // 1 ) = text( start(txPos) + sPos );
sPos := sPos + 1
end while_not_found ;
txLastPos := txPos;
if not found then txPos := next(txPos)
end while_string_not_found ;
if not found then begin
% the string/identifier is not in the list - add it %
ePos := ePos + 1;
if txList = null then txList := textElement( textNext, sLen, null )
else next(txLastPos) := textElement( textNext, sLen, null );
if textNext + sLen > TEXT_MAX then fatalError( tkLine, tkColumn, "Text space exhausted." )
else begin
for cPos := 0 until sLen - 1 do begin
text( textNext ) := str( cPos // 1 );
textNext := textNext + 1
end for_cPos
end
end if_not_found ;
ePos
end readString ;
% gets an integer from the line - no checks for valid digits %
integer procedure readInteger ; begin
integer n;
while line( lPos // 1 ) = " " do lPos := lPos + 1;
n := 0;
while line( lPos // 1 ) not = " " do begin
n := ( n * 10 ) + ( decode( line( lPos // 1 ) ) - decode( "0" ) );
lPos := lPos + 1
end while_not_end_of_integer ;
n
end readInteger ;
string(256) line;
string(16) name;
integer lPos, tPos;
tPos := lPos := 0;
readcard( line );
% get the line and column numbers %
tkLine := readInteger;
tkColumn := readInteger;
% get the token name %
while line( lPos // 1 ) = " " do lPos := lPos + 1;
name := "";
while lPos < 256 and line( lPos // 1 ) not = " " do begin
name( tPos // 1 ) := line( lPos // 1 );
lPos := lPos + 1;
tPos := tPos + 1
end while_more_name ;
% determine the token type %
tkType := 1;
tkIntegerValue := 0;
while tkType <= MAX_TOKEN_TYPE and name not = tkName( tkType ) do tkType := tkType + 1;
if tkType > MAX_TOKEN_TYPE then fatalError( tkLine, tkColumn, "Malformed token" );
% handle the additional parameter for identifier/string/integer %
if tkType = tInteger or tkType = tIdentifier or tkType = tString then begin
while line( lPos // 1 ) = " " do lPos := lPos + 1;
if tkType = tInteger then tkIntegerValue := readInteger
else if tkType = tIdentifier then tkIntegerValue := readString( idList, " " )
else % tkType = tString % tkIntegerValue := readString( stList, """" )
end if_token_with_additional_parameter ;
end readToken ;
% parses a statement %
reference(node) procedure parseStatement ; begin
reference(node) stmtNode, stmtExpr;
% skips the current token if it is expectedToken, %
% returns true if the token was expectedToken, false otherwise %
logical procedure have ( integer value expectedToken ) ; begin
logical haveExpectedToken;
haveExpectedToken := ( tkType = expectedToken );
if haveExpectedToken and tkType not = tEnd_of_input then readToken;
haveExpectedToken
end have ;
% issues an error message and skips past the next semi-colon or to end of input %
procedure skipStatement ( string(80) value message ) ; begin
synError( tkLine, tkColumn, message );
while tkType not = tEnd_of_input and not have( tSemicolon ) do readToken
end skipStatement ;
% checks we have a semicolon, issues an error and skips the statement if not %
procedure mustBeEndOfStatement ; begin
if not have( tSemicolon ) then skipStatement( """;"" expected." )
end mustBeEndOfStatement ;
% skips the current token if it is "(" and issues an error if it isn't %
procedure mustBeLeftParen ; begin
if not have( tLeftParen ) then synError( tkLine, tkColumn, """("" expected." )
end % mustBeLeftParen % ;
% skips the current token if it is ")" and issues an error if it isn't %
procedure mustBeRightParen ; begin
if not have( tRightParen ) then synError( tkLine, tkColumn, """)"" expected." )
end % mustBeRightParen % ;
% gets the next token and parses an expression with the specified precedence %
reference(node) procedure nextAndparseExpr ( integer value precedence ) ; begin
readToken;
parseExpr( precedence )
end nextAndParseExpr ;
% parses an expression with the specified precedence %
% all operators are assumed to be left-associative %
reference(node) procedure parseExpr ( integer value precedence ) ; begin
% handles a single token primary %
reference(node) procedure simplePrimary ( integer value primaryNodeType ) ; begin
reference(node) primaryNode;
primaryNode := operandNode( primaryNodeType, tkIntegerValue );
readToken;
primaryNode
end simplePrimary ;
reference(node) exprNode;
if precedence < PRIMARY_PREC then begin
exprNode := parseExpr( precedence + 1 );
while tkPrec( tkType ) = precedence do begin
integer op;
op := tkNode( tkType );
exprNode := opNode( op, exprNode, nextAndParseExpr( precedence + 1 ) )
end while_op_at_this_precedence_level
end
else if tkType = tIdentifier then exprNode := simplePrimary( nIdentifier )
else if tkType = tInteger then exprNode := simplePrimary( nInteger )
else if tkType = tString then begin
synError( tkLine, tkColumn, "Unexpected string literal." );
exprNode := simplePrimary( nInteger )
end
else if tkType = tLeftParen then exprNode := parseParenExpr
else if tkType = tOp_add then exprNode := nextAndParseExpr( precedence )
else if tkType = tOp_subtract then exprNode := opNode( nNegate, nextAndParseExpr( precedence ), null )
else if tkType = tOp_not then exprNode := opNode( nNot, nextAndParseExpr( precedence ), null )
else begin
synError( tkLine, tkColumn, "Syntax error in expression." );
exprNode := simplePrimary( nInteger )
end;
exprNode
end parseExpr ;
% parses a parenthesised expression %
reference(node) procedure parseParenExpr ; begin
reference(node) exprNode;
mustBeLeftParen;
exprNode := parseExpr( 0 );
mustBeRightParen;
exprNode
end parseParenExpr ;
% parse statement depending on its first token %
if tkType = tIdentifier then begin % assignment statement %
stmtExpr := operandNode( nIdentifier, tkIntegerValue );
% skip the identifier and check for "=" %
readToken;
if not have( tOp_Assign ) then synError( tkLine, tkColumn, "Expected ""="" in assignment statement." );
stmtNode := opNode( nAssign, stmtExpr, parseExpr( 0 ) );
mustBeEndOfStatement
end
else if have( tKeyword_while ) then begin
stmtExpr := parseParenExpr;
stmtNode := opNode( nWhile, stmtExpr, parseStatement )
end
else if have( tkeyword_if ) then begin
stmtExpr := parseParenExpr;
stmtNode := opNode( nIf, stmtExpr, opNode( nIf, parseStatement, null ) );
if have( tKeyword_else ) then % have an "else" part % right(right(stmtNode)) := parseStatement
end
else if have( tKeyword_Print ) then begin
mustBeLeftParen;
stmtNode := null;
while begin
if tkType = tString then begin
stmtNode := opNode( nSequence
, stmtNode
, opNode( nPrts, operandNode( nString, tkIntegerValue ), null )
);
readToken
end
else stmtNode := opNode( nSequence, stmtNode, opNode( nPrti, parseExpr( 0 ), null ) );
have( tComma )
end do begin end;
mustBeRightparen;
mustBeEndOfStatement;
end
else if have( tKeyword_Putc ) then begin
stmtNode := opNode( nPrtc, parseParenExpr, null );
mustBeEndOfStatement
end
else if have( tLeftBrace ) then begin % block %
stmtNode := parseStatementList( tRightBrace );
if not have( tRightBrace ) then synError( tkLine, tkColumn, "Expected ""}""." );
end
else if have( tSemicolon ) then stmtNode := null
else begin % unrecognised statement %
skipStatement( "Unrecognised statement." );
stmtNode := null
end if_various_tokens ;
stmtNode
end parseStatement ;
% parses a statement list ending with the specified terminator %
reference(node) procedure parseStatementList ( integer value terminator ) ; begin
reference(node) listNode;
listNode := null;
while tkType not = terminator
and tkType not = tEnd_of_input do listNode := opNode( nSequence, listNode, parseStatement );
listNode
end parseStatementList ;
% sets a node code and name %
procedure setNode ( integer result nd; integer value ndCode; string(14) value name ) ;
begin nd := ndCode; ndName( ndCode ) := name end;
setNode( nIdentifier, 1, "Identifier" ); setNode( nString, 2, "String" );
setNode( nInteger, 3, "Integer" ); setNode( nSequence, 4, "Sequence" ); setNode( nIf, 5, "If" );
setNode( nPrtc, 6, "Prtc" ); setNode( nPrts, 7, "Prts" );
setNode( nPrti, 8, "Prti" ); setNode( nWhile, 9, "While" );
setNode( nAssign, 10, "Assign" ); setNode( nNegate, 11, "Negate" ); setNode( nNot, 12, "Not" );
setNode( nMultiply, 13, "Multiply" ); setNode( nDivide, 14, "Divide" ); setNode( nMod, 15, "Mod" );
setNode( nAdd, 16, "Add" ); setNode( nSubtract, 17, "Subtract" );
setNode( nLess, 18, "Less" ); setNode( nLessEqual, 19, "LessEqual" );
setNode( nGreater, 20, "Greater" );
setNode( nGreaterEqual, 21, "GreaterEqual" ); setNode( nEqual, 22, "Equal" );
setNode( nNotEqual, 23, "NotEqual" ); setNode( nAnd, 24, "And" ); setNode( nOr, 25, "Or" );
tOp_multiply := 1; tkName( tOp_multiply ) := "Op_multiply"; tkPrec( tOp_multiply ) := 5;
tOp_divide := 2; tkName( tOp_divide ) := "Op_divide"; tkPrec( tOp_divide ) := 5;
tOp_mod := 3; tkName( tOp_mod ) := "Op_mod"; tkPrec( tOp_mod ) := 5;
tOp_add := 4; tkName( tOp_add ) := "Op_add"; tkPrec( tOp_add ) := 4;
tOp_subtract := 5; tkName( tOp_subtract ) := "Op_subtract"; tkPrec( tOp_subtract ) := 4;
tOp_negate := 6; tkName( tOp_negate ) := "Op_negate"; tkPrec( tOp_negate ) := -1;
tOp_less := 7; tkName( tOp_less ) := "Op_less"; tkPrec( tOp_less ) := 3;
tOp_lessequal := 8; tkName( tOp_lessequal ) := "Op_lessequal"; tkPrec( tOp_lessequal ) := 3;
tOp_greater := 9; tkName( tOp_greater ) := "Op_greater"; tkPrec( tOp_greater ) := 3;
tOp_greaterequal := 10; tkName( tOp_greaterequal ) := "Op_greaterequal"; tkPrec( tOp_greaterequal ) := 3;
tOp_equal := 11; tkName( tOp_equal ) := "Op_equal"; tkPrec( tOp_equal ) := 2;
tOp_notequal := 12; tkName( tOp_notequal ) := "Op_notequal"; tkPrec( tOp_notequal ) := 2;
tOp_not := 13; tkName( tOp_not ) := "Op_not"; tkPrec( tOp_not ) := -1;
tOp_assign := 14; tkName( tOp_assign ) := "Op_assign"; tkPrec( tOp_assign ) := -1;
tOp_and := 15; tkName( tOp_and ) := "Op_and"; tkPrec( tOp_and ) := 1;
tOp_or := 16; tkName( tOp_or ) := "Op_or"; tkPrec( tOp_or ) := 0;
tLeftParen := 17; tkName( tLeftParen ) := "LeftParen"; tkPrec( tLeftParen ) := -1;
tRightParen := 18; tkName( tRightParen ) := "RightParen"; tkPrec( tRightParen ) := -1;
tLeftBrace := 19; tkName( tLeftBrace ) := "LeftBrace"; tkPrec( tLeftBrace ) := -1;
tRightBrace := 20; tkName( tRightBrace ) := "RightBrace"; tkPrec( tRightBrace ) := -1;
tSemicolon := 21; tkName( tSemicolon ) := "Semicolon"; tkPrec( tSemicolon ) := -1;
tComma := 22; tkName( tComma ) := "Comma"; tkPrec( tComma ) := -1;
tKeyword_if := 23; tkName( tKeyword_if ) := "Keyword_if"; tkPrec( tKeyword_if ) := -1;
tKeyword_else := 24; tkName( tKeyword_else ) := "Keyword_else"; tkPrec( tKeyword_else ) := -1;
tKeyword_while := 25; tkName( tKeyword_while ) := "Keyword_while"; tkPrec( tKeyword_while ) := -1;
tKeyword_print := 26; tkName( tKeyword_print ) := "Keyword_print"; tkPrec( tKeyword_print ) := -1;
tKeyword_putc := 27; tkName( tKeyword_putc ) := "Keyword_putc"; tkPrec( tKeyword_putc ) := -1;
tIdentifier := 28; tkName( tIdentifier ) := "Identifier"; tkPrec( tIdentifier ) := -1;
tInteger := 29; tkName( tInteger ) := "Integer"; tkPrec( tInteger ) := -1;
tString := 30; tkName( tString ) := "String"; tkPrec( tString ) := -1;
tEnd_of_input := 31; tkName( tEnd_of_input ) := "End_of_input"; tkPrec( tEnd_of_input ) := -1;
MAX_TOKEN_TYPE := 31; TEXT_MAX := 4095; textNext := 0; PRIMARY_PREC := 6;
for tkPos := 1 until MAX_TOKEN_TYPE do tkNode( tkPos ) := - tkPos;
tkNode( tOp_multiply ) := nMultiply; tkNode( tOp_divide ) := nDivide; tkNode( tOp_mod ) := nMod;
tkNode( tOp_add ) := nAdd; tkNode( tOp_subtract ) := nSubtract; tkNode( tOp_less ) := nLess;
tkNode( tOp_lessequal ) := nLessEqual; tkNode( tOp_greater ) := nGreater;
tkNode( tOp_greaterequal ) := nGreaterEqual;
tkNode( tOp_equal ) := nEqual; tkNode( tOp_notequal ) := nNotEqual; tkNode( tOp_not ) := nNot;
tkNode( tOp_and ) := nAnd; tkNode( tOp_or ) := nOr;
stList := idList := null;
% parse the output from the lexical analyser and output the linearised parse tree %
readToken;
writeNode( parseStatementList( tEnd_of_input ) )
end.
- Output:
Output from parsing the Prime Numbers example program.
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
ATS
(********************************************************************)
(* Usage: parse [INPUTFILE [OUTPUTFILE]]
If INPUTFILE or OUTPUTFILE is "-" or missing, then standard input
or standard output is used, respectively. *)
#include "share/atspre_staload.hats"
staload UN = "prelude/SATS/unsafe.sats"
#define NIL list_nil ()
#define :: list_cons
%{^
/* alloca(3) is needed for ATS exceptions. */
#include <alloca.h>
%}
(********************************************************************)
#define NUM_TOKENS 31
#define TOKEN_ELSE 0
#define TOKEN_IF 1
#define TOKEN_PRINT 2
#define TOKEN_PUTC 3
#define TOKEN_WHILE 4
#define TOKEN_MULTIPLY 5
#define TOKEN_DIVIDE 6
#define TOKEN_MOD 7
#define TOKEN_ADD 8
#define TOKEN_SUBTRACT 9
#define TOKEN_NEGATE 10
#define TOKEN_LESS 11
#define TOKEN_LESSEQUAL 12
#define TOKEN_GREATER 13
#define TOKEN_GREATEREQUAL 14
#define TOKEN_EQUAL 15
#define TOKEN_NOTEQUAL 16
#define TOKEN_NOT 17
#define TOKEN_ASSIGN 18
#define TOKEN_AND 19
#define TOKEN_OR 20
#define TOKEN_LEFTPAREN 21
#define TOKEN_RIGHTPAREN 22
#define TOKEN_LEFTBRACE 23
#define TOKEN_RIGHTBRACE 24
#define TOKEN_SEMICOLON 25
#define TOKEN_COMMA 26
#define TOKEN_IDENTIFIER 27
#define TOKEN_INTEGER 28
#define TOKEN_STRING 29
#define TOKEN_END_OF_INPUT 30
typedef token_t =
[i : int | TOKEN_ELSE <= i; i <= TOKEN_END_OF_INPUT]
int i
typedef tokentuple_t = (token_t, String, ullint, ullint)
fn
token_text (tok : token_t) : String =
case+ tok of
| TOKEN_ELSE => "else"
| TOKEN_IF => "if"
| TOKEN_PRINT => "print"
| TOKEN_PUTC => "putc"
| TOKEN_WHILE => "while"
| TOKEN_MULTIPLY => "*"
| TOKEN_DIVIDE => "/"
| TOKEN_MOD => "%"
| TOKEN_ADD => "+"
| TOKEN_SUBTRACT => "-"
| TOKEN_NEGATE => "-"
| TOKEN_LESS => "<"
| TOKEN_LESSEQUAL => "<="
| TOKEN_GREATER => ">"
| TOKEN_GREATEREQUAL => ">="
| TOKEN_EQUAL => "=="
| TOKEN_NOTEQUAL => "!="
| TOKEN_NOT => "!"
| TOKEN_ASSIGN => "="
| TOKEN_AND => "&&"
| TOKEN_OR => "||"
| TOKEN_LEFTPAREN => "("
| TOKEN_RIGHTPAREN => ")"
| TOKEN_LEFTBRACE => "{"
| TOKEN_RIGHTBRACE => "}"
| TOKEN_SEMICOLON => ";"
| TOKEN_COMMA => ","
| TOKEN_IDENTIFIER => "Ident"
| TOKEN_INTEGER => "Integer literal"
| TOKEN_STRING => "String literal"
| TOKEN_END_OF_INPUT => "EOI"
(********************************************************************)
(* A perfect hash for the lexical token names.
This hash was generated by GNU gperf and then translated to
reasonable ATS by hand. Note, though, that one could have embedded
the generated C code directly and used it. *)
#define MIN_WORD_LENGTH 5
#define MAX_WORD_LENGTH 15
#define MIN_HASH_VALUE 5
#define MAX_HASH_VALUE 64
#define HASH_TABLE_SIZE 65
local
extern castfn u : {n : nat | n < 256} int n -<> uint8 n
in
vtypedef asso_values_vt = @[[n : nat | n < 256] uint8 n][256]
var asso_values =
@[[n : nat | n < 256] uint8 n][256]
(u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 10, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 0, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 0, u 65, u 25,
u 5, u 5, u 0, u 15, u 65, u 0, u 65, u 65, u 10, u 65,
u 30, u 0, u 65, u 5, u 10, u 10, u 0, u 15, u 65, u 65,
u 65, u 5, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65, u 65,
u 65, u 65, u 65, u 65, u 65, u 65)
end
fn
get_asso_value {i : nat | i < 256}
(i : uint i) :<>
[n : nat | n < 256] uint n =
let
extern castfn u8ui : {n : nat} uint8 n -<> uint n
extern castfn mk_asso_values :<>
{p : addr} ptr p -<> (asso_values_vt @ p | ptr p)
val asso_values_tup = mk_asso_values (addr@ asso_values)
macdef asso_values = !(asso_values_tup.1)
val retval = asso_values[i]
val _ = $UN.castvwtp0{void} asso_values_tup
in
u8ui retval
end
fn
hash {n : int | MIN_WORD_LENGTH <= n; n <= MAX_WORD_LENGTH}
(str : string n,
len : size_t n) :<>
[key : nat] uint key =
let
extern castfn uc2ui : {n : nat} uchar n -<> uint n
val c1 = uc2ui (c2uc str[4])
val c2 = uc2ui (c2uc str[pred len])
in
sz2u len + get_asso_value c1 + get_asso_value c2
end
typedef wordlist_vt = @[(String, token_t)][HASH_TABLE_SIZE]
var wordlist =
@[(String, token_t)][HASH_TABLE_SIZE]
(("", 0), ("", 0), ("", 0), ("", 0), ("", 0),
("Comma", 26),
("Op_not", 17),
("", 0), ("", 0), ("", 0),
("Keyword_if", 1),
("Op_mod", 7),
("End_of_input", 30),
("Keyword_print", 2),
("Op_divide", 6),
("RightBrace", 24),
("Op_add", 8),
("Keyword_else", 0),
("Keyword_while", 4),
("Op_negate", 10),
("Identifier", 27),
("Op_notequal", 16),
("Op_less", 11),
("Op_equal", 15),
("LeftBrace", 23),
("Op_or", 20),
("Op_subtract", 9),
("Op_lessequal", 12),
("", 0), ("", 0),
("Op_greater", 13),
("Op_multiply", 5 ),
("Integer", 28),
("", 0), ("", 0),
("Op_greaterequal", 14),
("", 0),
("Keyword_putc", 3),
("", 0),
("LeftParen", 21),
("RightParen", 22),
("Op_and", 19),
("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0),
("Op_assign", 18),
("", 0),
("String", 29),
("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0),
("", 0), ("", 0), ("", 0), ("", 0), ("", 0),
("Semicolon", 25))
fn
get_wordlist_entry
{n : nat | n <= MAX_HASH_VALUE}
(key : uint n) :<> (String, token_t) =
let
extern castfn mk_wordlist_tup :<>
{p : addr} ptr p -<> (wordlist_vt @ p | ptr p)
val wordlist_tup = mk_wordlist_tup (addr@ wordlist)
macdef wordlist = !(wordlist_tup.1)
val retval = wordlist[key]
val _ = $UN.castvwtp0{void} wordlist_tup
in
retval
end
fn
string2token_t_opt
{n : int}
(str : string n) :<>
Option token_t =
let
val len = string_length str
in
if len < i2sz MIN_WORD_LENGTH then
None ()
else if i2sz MAX_WORD_LENGTH < len then
None ()
else
let
val key = hash (str, len)
in
if i2u MAX_HASH_VALUE < key then
None ()
else
let
val (s, tok) = get_wordlist_entry (key)
in
if str <> s then
None ()
else
Some tok
end
end
end
(********************************************************************)
exception bad_lex_integer of (String)
exception bad_lex_token_name of (String)
exception bad_string_literal of (String)
extern fun {}
skip_something$pred : char -<> bool
fn {}
skip_something {n : nat}
{i : nat | i <= n}
(s : string n,
n : size_t n,
i : size_t i) :<>
[j : nat | i <= j; j <= n]
size_t j =
let
fun
loop {k : nat | i <= k; k <= n} .<n - k>.
(k : size_t k) :<>
[j : nat | i <= j; j <= n]
size_t j =
if k = n then
k
else if ~(skip_something$pred<> s[k]) then
k
else
loop (succ k)
in
loop i
end
fn
skip_space {n : nat}
{i : nat | i <= n}
(s : string n,
n : size_t n,
i : size_t i) :<>
[j : nat | i <= j; j <= n]
size_t j =
let
implement skip_something$pred<> (c) = isspace c
in
skip_something (s, n, i)
end
fn
skip_nonspace {n : nat}
{i : nat | i <= n}
(s : string n,
n : size_t n,
i : size_t i) :<>
[j : nat | i <= j; j <= n]
size_t j =
let
implement skip_something$pred<> (c) = ~isspace c
in
skip_something (s, n, i)
end
fn
skip_nonquote {n : nat}
{i : nat | i <= n}
(s : string n,
n : size_t n,
i : size_t i) :<>
[j : nat | i <= j; j <= n]
size_t j =
let
implement skip_something$pred<> (c) = c <> '"'
in
skip_something (s, n, i)
end
fn
skip_string_literal
{n : nat}
{i : nat | i <= n}
(s : string n,
n : size_t n,
i : size_t i) :<>
[j : nat | i <= j; j <= n]
size_t j =
if i = n then
i
else if s[i] <> '"' then
i
else
let
val j = skip_nonquote (s, n, succ i)
in
if j = n then
i
else
succ j
end
fn
get_substr {n, i, j : nat | i <= j; j <= n}
(s : string n,
i : size_t i,
j : size_t j) :
[m : int | m == j - i] string m =
let
val s = string_make_substring (s, i, j - i)
in
strnptr2string s
end
fn
string2ullint
{n : nat}
(s : string n) : ullint =
let
val n = string_length s
in
if n = i2sz 0 then
$raise bad_lex_integer ("")
else
let
extern castfn u2ull : uint -<> ullint
fun
evaluate {k : nat | k <= n} .<n - k>.
(k : size_t k,
v : ullint) : ullint =
if k = n then
v
else if ~isdigit s[k] then
$raise bad_lex_integer (s)
else
let
val d = char2ui s[k] - char2ui '0'
in
evaluate (succ k, (10ULL * v) + u2ull d)
end
in
evaluate (i2sz 0, 0ULL)
end
end
fn
string2token {n : int}
(str : string n) : token_t =
case+ string2token_t_opt str of
| None () => $raise bad_lex_token_name (str)
| Some tok => tok
fn
read_lex_file (inpf : FILEref) : List0 tokentuple_t =
(* Convert the output of "lex" to a list of tokens. *)
(* This routine could stand to do more validation of the input. *)
let
fun
loop (lst : List0 tokentuple_t) : List0 tokentuple_t =
if fileref_is_eof inpf then
lst
else
let
val s = strptr2string (fileref_get_line_string inpf)
val n = string_length s
prval _ = lemma_g1uint_param n
val i0_line_no = skip_space (s, n, i2sz 0)
in
if i0_line_no = n then
(* Skip any blank lines, including end of file. *)
loop lst
else
let
val i1_line_no = skip_nonspace (s, n, i0_line_no)
val s_line_no = get_substr (s, i0_line_no, i1_line_no)
val line_no = string2ullint s_line_no
val i0_column_no = skip_space (s, n, i1_line_no)
val i1_column_no = skip_nonspace (s, n, i0_column_no)
val s_column_no = get_substr (s, i0_column_no,
i1_column_no)
val column_no = string2ullint s_column_no
val i0_tokname = skip_space (s, n, i1_column_no)
val i1_tokname = skip_nonspace (s, n, i0_tokname)
val tokname = get_substr (s, i0_tokname, i1_tokname)
val tok = string2token tokname
in
case+ tok of
| TOKEN_INTEGER =>
let
val i0 = skip_space (s, n, i1_tokname)
val i1 = skip_nonspace (s, n, i0)
val arg = get_substr (s, i0, i1)
val toktup = (tok, arg, line_no, column_no)
in
loop (toktup :: lst)
end
| TOKEN_IDENTIFIER =>
let
val i0 = skip_space (s, n, i1_tokname)
val i1 = skip_nonspace (s, n, i0)
val arg = get_substr (s, i0, i1)
val toktup = (tok, arg, line_no, column_no)
in
loop (toktup :: lst)
end
| TOKEN_STRING =>
let
val i0 = skip_space (s, n, i1_tokname)
val i1 = skip_string_literal (s, n, i0)
val arg = get_substr (s, i0, i1)
val toktup = (tok, arg, line_no, column_no)
in
loop (toktup :: lst)
end
| _ =>
let
val toktup = (tok, "", line_no, column_no)
in
loop (toktup :: lst)
end
end
end
in
list_vt2t (list_reverse (loop NIL))
end
(********************************************************************)
exception truncated_lexical of ()
exception unexpected_token of (tokentuple_t, token_t)
exception unexpected_primary of (tokentuple_t)
exception unterminated_statement_block of (ullint, ullint)
exception expected_a_statement of (tokentuple_t)
datatype node_t =
| node_t_nil of ()
| node_t_leaf of (String, String)
| node_t_cons of (String, node_t, node_t)
fn
right_assoc (tok : token_t) : bool =
(* None of the currently supported operators is right
associative. *)
false
fn
binary_op (tok : token_t) : bool =
case+ tok of
| TOKEN_ADD => true
| TOKEN_SUBTRACT => true
| TOKEN_MULTIPLY => true
| TOKEN_DIVIDE => true
| TOKEN_MOD => true
| TOKEN_LESS => true
| TOKEN_LESSEQUAL => true
| TOKEN_GREATER => true
| TOKEN_GREATEREQUAL => true
| TOKEN_EQUAL => true
| TOKEN_NOTEQUAL => true
| TOKEN_AND => true
| TOKEN_OR => true
| _ => false
fn
precedence (tok : token_t) : int =
case+ tok of
| TOKEN_MULTIPLY => 13
| TOKEN_DIVIDE => 13
| TOKEN_MOD => 13
| TOKEN_ADD => 12
| TOKEN_SUBTRACT => 12
| TOKEN_NEGATE => 14
| TOKEN_NOT => 14
| TOKEN_LESS => 10
| TOKEN_LESSEQUAL => 10
| TOKEN_GREATER => 10
| TOKEN_GREATEREQUAL => 10
| TOKEN_EQUAL => 9
| TOKEN_NOTEQUAL => 9
| TOKEN_AND => 5
| TOKEN_OR => 4
| _ => ~1
fn
opname (tok : token_t) : String =
case- tok of
| TOKEN_MULTIPLY => "Multiply"
| TOKEN_DIVIDE => "Divide"
| TOKEN_MOD => "Mod"
| TOKEN_ADD => "Add"
| TOKEN_SUBTRACT => "Subtract"
| TOKEN_NEGATE => "Negate"
| TOKEN_NOT => "Not"
| TOKEN_LESS => "Less"
| TOKEN_LESSEQUAL => "LessEqual"
| TOKEN_GREATER => "Greater"
| TOKEN_GREATEREQUAL => "GreaterEqual"
| TOKEN_EQUAL => "Equal"
| TOKEN_NOTEQUAL => "NotEqual"
| TOKEN_AND => "And"
| TOKEN_OR => "Or"
fn
parse (lex : List0 tokentuple_t) : node_t =
let
typedef toktups_t (n : int) = list (tokentuple_t, n)
typedef toktups_t = [n : nat] toktups_t n
fn
expect (expected : token_t,
lex : toktups_t) : toktups_t =
case+ lex of
| NIL => $raise truncated_lexical ()
| toktup :: tail =>
if toktup.0 = expected then
tail
else
$raise unexpected_token (toktup, expected)
fn
peek {n : int} (lex : toktups_t n) : [1 <= n] token_t =
case+ lex of
| NIL => $raise truncated_lexical ()
| (tok, _, _, _) :: _ => tok
fun
stmt (lex : toktups_t) : (node_t, toktups_t) =
case+ lex of
| NIL => $raise truncated_lexical ()
| (TOKEN_IF, _, _, _) :: lex =>
let
val (e, lex) = paren_expr lex
val (s, lex) = stmt lex
in
case+ lex of
| (TOKEN_ELSE, _, _, _) :: lex =>
let
val (t, lex) = stmt lex
in
(node_t_cons ("If", e, node_t_cons ("If", s, t)), lex)
end
| _ =>
let
(* There is no 'else' clause. *)
val t = node_t_nil ()
in
(node_t_cons ("If", e, node_t_cons ("If", s, t)), lex)
end
end
| (TOKEN_PUTC, _, _, _) :: lex =>
let
val (subtree, lex) = paren_expr lex
val subtree = node_t_cons ("Prtc", subtree, node_t_nil ())
val lex = expect (TOKEN_SEMICOLON, lex)
in
(subtree, lex)
end
| (TOKEN_PRINT, _, _, _) :: lex =>
let
val lex = expect (TOKEN_LEFTPAREN, lex)
fun
loop_over_args (subtree : node_t,
lex : toktups_t) : (node_t, toktups_t) =
case+ lex of
| (TOKEN_STRING, arg, _, _) ::
(TOKEN_COMMA, _, _, _) :: lex =>
let
val leaf = node_t_leaf ("String", arg)
val e = node_t_cons ("Prts", leaf, node_t_nil ())
in
loop_over_args
(node_t_cons ("Sequence", subtree, e), lex)
end
| (TOKEN_STRING, arg, _, _) :: lex =>
let
val lex = expect (TOKEN_RIGHTPAREN, lex)
val lex = expect (TOKEN_SEMICOLON, lex)
val leaf = node_t_leaf ("String", arg)
val e = node_t_cons ("Prts", leaf, node_t_nil ())
in
(node_t_cons ("Sequence", subtree, e), lex)
end
| _ :: _ =>
let
val (x, lex) = expr (0, lex)
val e = node_t_cons ("Prti", x, node_t_nil ())
val subtree = node_t_cons ("Sequence", subtree, e)
in
case+ peek lex of
| TOKEN_COMMA =>
let
val lex = expect (TOKEN_COMMA, lex)
in
loop_over_args (subtree, lex)
end
| _ =>
let
val lex = expect (TOKEN_RIGHTPAREN, lex)
val lex = expect (TOKEN_SEMICOLON, lex)
in
(subtree, lex)
end
end
| NIL => $raise truncated_lexical ()
in
loop_over_args (node_t_nil (), lex)
end
| (TOKEN_SEMICOLON, _, _, _) :: lex => (node_t_nil (), lex)
| (TOKEN_IDENTIFIER, arg, _, _) :: lex =>
let
val v = node_t_leaf ("Identifier", arg)
val lex = expect (TOKEN_ASSIGN, lex)
val (subtree, lex) = expr (0, lex)
val t = node_t_cons ("Assign", v, subtree)
val lex = expect (TOKEN_SEMICOLON, lex)
in
(t, lex)
end
| (TOKEN_WHILE, _, _, _) :: lex =>
let
val (e, lex) = paren_expr lex
val (t, lex) = stmt lex
in
(node_t_cons ("While", e, t), lex)
end
| (TOKEN_LEFTBRACE, _, _, _) :: lex =>
let
fun
loop_over_stmts (subtree : node_t,
lex : toktups_t) :
(node_t, toktups_t) =
case+ lex of
| (TOKEN_RIGHTBRACE, _, _, _) :: lex => (subtree, lex)
| (TOKEN_END_OF_INPUT, _, line_no, column_no) :: _ =>
$raise unterminated_statement_block (line_no, column_no)
| _ =>
let
val (e, lex) = stmt lex
in
loop_over_stmts
(node_t_cons ("Sequence", subtree, e), lex)
end
in
loop_over_stmts (node_t_nil (), lex)
end
| (TOKEN_END_OF_INPUT, _, _, _) :: lex => (node_t_nil (), lex)
| toktup :: _ => $raise expected_a_statement (toktup)
and
expr (prec : int,
lex : toktups_t) : (node_t, toktups_t) =
case+ lex of
| (TOKEN_LEFTPAREN, _, _, _) :: _ =>
(* '(' expr ')' *)
let
val (subtree, lex) = paren_expr lex
in
prec_climb (prec, subtree, lex)
end
| (TOKEN_ADD, _, _, _) :: lex =>
(* '+' expr *)
let
val (subtree, lex) = expr (precedence TOKEN_ADD, lex)
in
prec_climb (prec, subtree, lex)
end
| (TOKEN_SUBTRACT, _, _, _) :: lex =>
(* '-' expr *)
let
val (subtree, lex) = expr (precedence TOKEN_NEGATE, lex)
val subtree = node_t_cons ("Negate", subtree, node_t_nil ())
in
prec_climb (prec, subtree, lex)
end
| (TOKEN_NOT, _, _, _) :: lex =>
(* '!' expr *)
let
val (subtree, lex) = expr (precedence TOKEN_NOT, lex)
val subtree = node_t_cons ("Not", subtree, node_t_nil ())
in
prec_climb (prec, subtree, lex)
end
| (TOKEN_IDENTIFIER, arg, _, _) :: lex =>
let
val leaf = node_t_leaf ("Identifier", arg)
in
prec_climb (prec, leaf, lex)
end
| (TOKEN_INTEGER, arg, _, _) :: lex =>
let
val leaf = node_t_leaf ("Integer", arg)
in
prec_climb (prec, leaf, lex)
end
| toktup :: lex =>
$raise unexpected_primary (toktup)
| NIL =>
$raise truncated_lexical ()
and
prec_climb (prec : int,
subtree : node_t,
lex : toktups_t) : (node_t, toktups_t) =
case+ peek lex of
| tokval =>
if ~binary_op tokval then
(subtree, lex)
else if precedence tokval < prec then
(subtree, lex)
else
case+ lex of
| toktup :: lex =>
let
val q =
if right_assoc (toktup.0) then
precedence tokval
else
succ (precedence tokval)
val (e, lex) = expr (q, lex)
val subtree1 =
node_t_cons (opname (toktup.0), subtree, e)
in
prec_climb (prec, subtree1, lex)
end
and
paren_expr (lex : toktups_t) : (node_t, toktups_t) =
(* '(' expr ')' *)
let
val lex = expect (TOKEN_LEFTPAREN, lex)
val (subtree, lex) = expr (0, lex)
val lex = expect (TOKEN_RIGHTPAREN, lex)
in
(subtree, lex)
end
fun
main_loop (subtree : node_t,
lex : toktups_t) : node_t =
case+ peek lex of
| TOKEN_END_OF_INPUT => subtree
| _ =>
let
val (x, lex) = stmt lex
in
main_loop (node_t_cons ("Sequence", subtree, x), lex)
end
in
main_loop (node_t_nil (), lex)
end
fn
print_ast (outf : FILEref,
ast : node_t) : void =
let
fun
traverse (ast : node_t) : void =
case+ ast of
| node_t_nil () => fprintln! (outf, ";")
| node_t_leaf (str, arg) => fprintln! (outf, str, " ", arg)
| node_t_cons (str, left, right) =>
begin
fprintln! (outf, str);
traverse left;
traverse right
end
in
traverse ast
end
(********************************************************************)
fn
main_program (inpf : FILEref,
outf : FILEref) : int =
let
val toklst = read_lex_file inpf
val ast = parse toklst
val () = print_ast (outf, ast)
in
0
end
fn
error_start (line_no : ullint,
column_no : ullint) : void =
print! ("(", line_no, ", ", column_no, ") error: ")
implement
main (argc, argv) =
let
val inpfname =
if 2 <= argc then
$UN.cast{string} argv[1]
else
"-"
val outfname =
if 3 <= argc then
$UN.cast{string} argv[2]
else
"-"
in
try
let
val inpf =
if (inpfname : string) = "-" then
stdin_ref
else
fileref_open_exn (inpfname, file_mode_r)
val outf =
if (outfname : string) = "-" then
stdout_ref
else
fileref_open_exn (outfname, file_mode_w)
in
main_program (inpf, outf)
end
with
| ~ unexpected_primary @(tok, _, line_no, column_no) =>
begin
error_start (line_no, column_no);
println! ("Expecting a primary, found: ", token_text tok);
1
end
| ~ unexpected_token (@(tok, _, line_no, column_no), expected) =>
begin
error_start (line_no, column_no);
println! ("Expecting '", token_text expected,
"', found '", token_text tok, "'");
1
end
| ~ expected_a_statement @(tok, _, line_no, column_no) =>
begin
error_start (line_no, column_no);
println! ("expecting start of statement, found '",
token_text tok, "'");
1
end
| ~ unterminated_statement_block (line_no, column_no) =>
begin
error_start (line_no, column_no);
println! ("unterminated statement block");
1
end
| ~ truncated_lexical () =>
begin
println! ("truncated input token stream");
2
end
| ~ bad_lex_integer (s) =>
begin
println! ("bad integer literal in the token stream: '",
s, "'");
2
end
| ~ bad_string_literal (s) =>
begin
println! ("bad string literal in the token stream: '",
s, "'");
2
end
| ~ bad_lex_token_name (s) =>
begin
println! ("bad token name in the token stream: '",
s, "'");
2
end
end
(********************************************************************)
- Output:
$ patscc -O2 -DATS_MEMALLOC_GCBDW -o parse parse-in-ATS.dats -latslib -lgc && ./lex compiler-tests/primes.t | ./parse
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
AWK
Tested with gawk 4.1.1 and mawk 1.3.4.
function Token_assign(tk, attr, attr_array, n, i) {
n=split(attr, attr_array)
for(i=1; i<=n; i++)
Tokens[tk,i-1] = attr_array[i]
}
#*** show error and exit
function error(msg) {
printf("(%s, %s) %s\n", err_line, err_col, msg)
exit(1)
}
function gettok( line, n, i) {
getline line
if (line == "")
error("empty line")
n=split(line, line_list)
# line col Ident var_name
# 1 2 3 4
err_line = line_list[1]
err_col = line_list[2]
tok_text = line_list[3]
tok = all_syms[tok_text]
for (i=5; i<=n; i++)
line_list[4] = line_list[4] " " line_list[i]
if (tok == "")
error("Unknown token " tok_text)
tok_other = ""
if (tok == "tk_Integer" || tok == "tk_Ident" || tok =="tk_String")
tok_other = line_list[4]
}
function make_node(oper, left, right, value) {
node_type [next_free_node_index] = oper
node_left [next_free_node_index] = left
node_right[next_free_node_index] = right
node_value[next_free_node_index] = value
return next_free_node_index ++
}
function make_leaf(oper, n) {
return make_node(oper, 0, 0, n)
}
function expect(msg, s) {
if (tok == s) {
gettok()
return
}
error(msg ": Expecting '" Tokens[s,TK_NAME] "', found '" Tokens[tok,TK_NAME] "'")
}
function expr(p, x, op, node) {
x = 0
if (tok == "tk_Lparen") {
x = paren_expr()
} else if (tok == "tk_Sub" || tok == "tk_Add") {
if (tok == "tk_Sub")
op = "tk_Negate"
else
op = "tk_Add"
gettok()
node = expr(Tokens["tk_Negate",TK_PRECEDENCE]+0)
if (op == "tk_Negate")
x = make_node("nd_Negate", node)
else
x = node
} else if (tok == "tk_Not") {
gettok()
x = make_node("nd_Not", expr(Tokens["tk_Not",TK_PRECEDENCE]+0))
} else if (tok == "tk_Ident") {
x = make_leaf("nd_Ident", tok_other)
gettok()
} else if (tok == "tk_Integer") {
x = make_leaf("nd_Integer", tok_other)
gettok()
} else {
error("Expecting a primary, found: " Tokens[tok,TK_NAME])
}
while (((Tokens[tok,TK_IS_BINARY]+0) > 0) && ((Tokens[tok,TK_PRECEDENCE]+0) >= p)) {
op = tok
gettok()
q = Tokens[op,TK_PRECEDENCE]+0
if (! (Tokens[op,TK_RIGHT_ASSOC]+0 > 0))
q += 1
node = expr(q)
x = make_node(Tokens[op,TK_NODE], x, node)
}
return x
}
function paren_expr( node) {
expect("paren_expr", "tk_Lparen")
node = expr(0)
expect("paren_expr", "tk_Rparen")
return node
}
function stmt( t, e, s, s2, v) {
t = 0
if (tok == "tk_If") {
gettok()
e = paren_expr()
s = stmt()
s2 = 0
if (tok == "tk_Else") {
gettok()
s2 = stmt()
}
t = make_node("nd_If", e, make_node("nd_If", s, s2))
} else if (tok == "tk_Putc") {
gettok()
e = paren_expr()
t = make_node("nd_Prtc", e)
expect("Putc", "tk_Semi")
} else if (tok == "tk_Print") {
gettok()
expect("Print", "tk_Lparen")
while (1) {
if (tok == "tk_String") {
e = make_node("nd_Prts", make_leaf("nd_String", tok_other))
gettok()
} else {
e = make_node("nd_Prti", expr(0))
}
t = make_node("nd_Sequence", t, e)
if (tok != "tk_Comma")
break
gettok()
}
expect("Print", "tk_Rparen")
expect("Print", "tk_Semi")
} else if (tok == "tk_Semi") {
gettok()
} else if (tok == "tk_Ident") {
v = make_leaf("nd_Ident", tok_other)
gettok()
expect("assign", "tk_Assign")
e = expr(0)
t = make_node("nd_Assign", v, e)
expect("assign", "tk_Semi")
} else if (tok == "tk_While") {
gettok()
e = paren_expr()
s = stmt()
t = make_node("nd_While", e, s)
} else if (tok == "tk_Lbrace") {
gettok()
while (tok != "tk_Rbrace" && tok != "tk_EOI")
t = make_node("nd_Sequence", t, stmt())
expect("Lbrace", "tk_Rbrace")
} else if (tok == "tk_EOI") {
} else {
error("Expecting start of statement, found: " Tokens[tok,TK_NAME])
}
return t
}
function parse( t) {
t = 0 # None
gettok()
while (1) {
t = make_node("nd_Sequence", t, stmt())
if (tok == "tk_EOI" || t == 0)
break
}
return t
}
function prt_ast(t) {
if (t == 0) {
print(";")
} else {
printf("%-14s", Display_nodes[node_type[t]])
if ((node_type[t] == "nd_Ident") || (node_type[t] == "nd_Integer"))
printf("%s\n", node_value[t])
else if (node_type[t] == "nd_String") {
printf("%s\n", node_value[t])
} else {
print("")
prt_ast(node_left[t])
prt_ast(node_right[t])
}
}
}
BEGIN {
all_syms["End_of_input" ] = "tk_EOI"
all_syms["Op_multiply" ] = "tk_Mul"
all_syms["Op_divide" ] = "tk_Div"
all_syms["Op_mod" ] = "tk_Mod"
all_syms["Op_add" ] = "tk_Add"
all_syms["Op_subtract" ] = "tk_Sub"
all_syms["Op_negate" ] = "tk_Negate"
all_syms["Op_not" ] = "tk_Not"
all_syms["Op_less" ] = "tk_Lss"
all_syms["Op_lessequal" ] = "tk_Leq"
all_syms["Op_greater" ] = "tk_Gtr"
all_syms["Op_greaterequal" ] = "tk_Geq"
all_syms["Op_equal" ] = "tk_Eq"
all_syms["Op_notequal" ] = "tk_Neq"
all_syms["Op_assign" ] = "tk_Assign"
all_syms["Op_and" ] = "tk_And"
all_syms["Op_or" ] = "tk_Or"
all_syms["Keyword_if" ] = "tk_If"
all_syms["Keyword_else" ] = "tk_Else"
all_syms["Keyword_while" ] = "tk_While"
all_syms["Keyword_print" ] = "tk_Print"
all_syms["Keyword_putc" ] = "tk_Putc"
all_syms["LeftParen" ] = "tk_Lparen"
all_syms["RightParen" ] = "tk_Rparen"
all_syms["LeftBrace" ] = "tk_Lbrace"
all_syms["RightBrace" ] = "tk_Rbrace"
all_syms["Semicolon" ] = "tk_Semi"
all_syms["Comma" ] = "tk_Comma"
all_syms["Identifier" ] = "tk_Ident"
all_syms["Integer" ] = "tk_Integer"
all_syms["String" ] = "tk_String"
Display_nodes["nd_Ident" ] = "Identifier"
Display_nodes["nd_String" ] = "String"
Display_nodes["nd_Integer" ] = "Integer"
Display_nodes["nd_Sequence"] = "Sequence"
Display_nodes["nd_If" ] = "If"
Display_nodes["nd_Prtc" ] = "Prtc"
Display_nodes["nd_Prts" ] = "Prts"
Display_nodes["nd_Prti" ] = "Prti"
Display_nodes["nd_While" ] = "While"
Display_nodes["nd_Assign" ] = "Assign"
Display_nodes["nd_Negate" ] = "Negate"
Display_nodes["nd_Not" ] = "Not"
Display_nodes["nd_Mul" ] = "Multiply"
Display_nodes["nd_Div" ] = "Divide"
Display_nodes["nd_Mod" ] = "Mod"
Display_nodes["nd_Add" ] = "Add"
Display_nodes["nd_Sub" ] = "Subtract"
Display_nodes["nd_Lss" ] = "Less"
Display_nodes["nd_Leq" ] = "LessEqual"
Display_nodes["nd_Gtr" ] = "Greater"
Display_nodes["nd_Geq" ] = "GreaterEqual"
Display_nodes["nd_Eql" ] = "Equal"
Display_nodes["nd_Neq" ] = "NotEqual"
Display_nodes["nd_And" ] = "And"
Display_nodes["nd_Or" ] = "Or"
TK_NAME = 0
TK_RIGHT_ASSOC = 1
TK_IS_BINARY = 2
TK_IS_UNARY = 3
TK_PRECEDENCE = 4
TK_NODE = 5
Token_assign("tk_EOI" , "EOI 0 0 0 -1 -1 ")
Token_assign("tk_Mul" , "* 0 1 0 13 nd_Mul ")
Token_assign("tk_Div" , "/ 0 1 0 13 nd_Div ")
Token_assign("tk_Mod" , "% 0 1 0 13 nd_Mod ")
Token_assign("tk_Add" , "+ 0 1 0 12 nd_Add ")
Token_assign("tk_Sub" , "- 0 1 0 12 nd_Sub ")
Token_assign("tk_Negate" , "- 0 0 1 14 nd_Negate ")
Token_assign("tk_Not" , "! 0 0 1 14 nd_Not ")
Token_assign("tk_Lss" , "< 0 1 0 10 nd_Lss ")
Token_assign("tk_Leq" , "<= 0 1 0 10 nd_Leq ")
Token_assign("tk_Gtr" , "> 0 1 0 10 nd_Gtr ")
Token_assign("tk_Geq" , ">= 0 1 0 10 nd_Geq ")
Token_assign("tk_Eql" , "== 0 1 0 9 nd_Eql ")
Token_assign("tk_Neq" , "!= 0 1 0 9 nd_Neq ")
Token_assign("tk_Assign" , "= 0 0 0 -1 nd_Assign ")
Token_assign("tk_And" , "&& 0 1 0 5 nd_And ")
Token_assign("tk_Or" , "|| 0 1 0 4 nd_Or ")
Token_assign("tk_If" , "if 0 0 0 -1 nd_If ")
Token_assign("tk_Else" , "else 0 0 0 -1 -1 ")
Token_assign("tk_While" , "while 0 0 0 -1 nd_While ")
Token_assign("tk_Print" , "print 0 0 0 -1 -1 ")
Token_assign("tk_Putc" , "putc 0 0 0 -1 -1 ")
Token_assign("tk_Lparen" , "( 0 0 0 -1 -1 ")
Token_assign("tk_Rparen" , ") 0 0 0 -1 -1 ")
Token_assign("tk_Lbrace" , "{ 0 0 0 -1 -1 ")
Token_assign("tk_Rbrace" , "} 0 0 0 -1 -1 ")
Token_assign("tk_Semi" , "; 0 0 0 -1 -1 ")
Token_assign("tk_Comma" , ", 0 0 0 -1 -1 ")
Token_assign("tk_Ident" , "Ident 0 0 0 -1 nd_Ident ")
Token_assign("tk_Integer", "Integer 0 0 0 -1 nd_Integer")
Token_assign("tk_String" , "String 0 0 0 -1 nd_String ")
input_file = "-"
err_line = 0
err_col = 0
tok = ""
tok_text = ""
next_free_node_index = 1
if (ARGC > 1)
input_file = ARGV[1]
t = parse()
prt_ast(t)
}
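A typical invocation, assuming the complete script above is saved as parse.awk and the lexer output is in while.lex (both names are illustrative):
awk -f parse.awk while.lex > while.ast
With no file argument the script expects the token stream on standard input (input_file defaults to "-").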
- Output — count:
Sequence Sequence ; Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence ; Sequence Sequence Sequence ; Prts String "count is: " ; Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1
C
Tested with gcc 4.8.1 and later; compiles warning-free with -Wall -Wextra.
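A possible way to build and run it, with illustrative file names (the program reads the token stream from the file named as its first argument, or from standard input when no argument is given):
gcc -Wall -Wextra -o parse parse.c
./parse < while.lex > while.ast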
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <stdbool.h>
#include <ctype.h>
#define NELEMS(arr) (sizeof(arr) / sizeof(arr[0]))
typedef enum {
tk_EOI, tk_Mul, tk_Div, tk_Mod, tk_Add, tk_Sub, tk_Negate, tk_Not, tk_Lss, tk_Leq, tk_Gtr,
tk_Geq, tk_Eql, tk_Neq, tk_Assign, tk_And, tk_Or, tk_If, tk_Else, tk_While, tk_Print,
tk_Putc, tk_Lparen, tk_Rparen, tk_Lbrace, tk_Rbrace, tk_Semi, tk_Comma, tk_Ident,
tk_Integer, tk_String
} TokenType;
typedef enum {
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While,
nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq,
nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or
} NodeType;
typedef struct {
TokenType tok;
int err_ln;
int err_col;
char *text; /* ident or string literal or integer value */
} tok_s;
typedef struct Tree {
NodeType node_type;
struct Tree *left;
struct Tree *right;
char *value;
} Tree;
// dependency: Ordered by tok, must remain in same order as TokenType enum
struct {
char *text, *enum_text;
TokenType tok;
bool right_associative, is_binary, is_unary;
int precedence;
NodeType node_type;
} atr[] = {
{"EOI", "End_of_input" , tk_EOI, false, false, false, -1, -1 },
{"*", "Op_multiply" , tk_Mul, false, true, false, 13, nd_Mul },
{"/", "Op_divide" , tk_Div, false, true, false, 13, nd_Div },
{"%", "Op_mod" , tk_Mod, false, true, false, 13, nd_Mod },
{"+", "Op_add" , tk_Add, false, true, false, 12, nd_Add },
{"-", "Op_subtract" , tk_Sub, false, true, false, 12, nd_Sub },
{"-", "Op_negate" , tk_Negate, false, false, true, 14, nd_Negate },
{"!", "Op_not" , tk_Not, false, false, true, 14, nd_Not },
{"<", "Op_less" , tk_Lss, false, true, false, 10, nd_Lss },
{"<=", "Op_lessequal" , tk_Leq, false, true, false, 10, nd_Leq },
{">", "Op_greater" , tk_Gtr, false, true, false, 10, nd_Gtr },
{">=", "Op_greaterequal", tk_Geq, false, true, false, 10, nd_Geq },
{"==", "Op_equal" , tk_Eql, false, true, false, 9, nd_Eql },
{"!=", "Op_notequal" , tk_Neq, false, true, false, 9, nd_Neq },
{"=", "Op_assign" , tk_Assign, false, false, false, -1, nd_Assign },
{"&&", "Op_and" , tk_And, false, true, false, 5, nd_And },
{"||", "Op_or" , tk_Or, false, true, false, 4, nd_Or },
{"if", "Keyword_if" , tk_If, false, false, false, -1, nd_If },
{"else", "Keyword_else" , tk_Else, false, false, false, -1, -1 },
{"while", "Keyword_while" , tk_While, false, false, false, -1, nd_While },
{"print", "Keyword_print" , tk_Print, false, false, false, -1, -1 },
{"putc", "Keyword_putc" , tk_Putc, false, false, false, -1, -1 },
{"(", "LeftParen" , tk_Lparen, false, false, false, -1, -1 },
{")", "RightParen" , tk_Rparen, false, false, false, -1, -1 },
{"{", "LeftBrace" , tk_Lbrace, false, false, false, -1, -1 },
{"}", "RightBrace" , tk_Rbrace, false, false, false, -1, -1 },
{";", "Semicolon" , tk_Semi, false, false, false, -1, -1 },
{",", "Comma" , tk_Comma, false, false, false, -1, -1 },
{"Ident", "Identifier" , tk_Ident, false, false, false, -1, nd_Ident },
{"Integer literal", "Integer" , tk_Integer, false, false, false, -1, nd_Integer},
{"String literal", "String" , tk_String, false, false, false, -1, nd_String },
};
char *Display_nodes[] = {"Identifier", "String", "Integer", "Sequence", "If", "Prtc",
"Prts", "Prti", "While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod",
"Add", "Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal",
"NotEqual", "And", "Or"};
static tok_s tok;
static FILE *source_fp, *dest_fp;
Tree *paren_expr();
void error(int err_line, int err_col, const char *fmt, ... ) {
va_list ap;
char buf[1000];
va_start(ap, fmt);
vsprintf(buf, fmt, ap);
va_end(ap);
printf("(%d, %d) error: %s\n", err_line, err_col, buf);
exit(1);
}
char *read_line(int *len) {
static char *text = NULL;
static int textmax = 0;
for (*len = 0; ; (*len)++) {
int ch = fgetc(source_fp);
if (ch == EOF || ch == '\n') {
if (*len == 0)
return NULL;
break;
}
if (*len + 1 >= textmax) {
textmax = (textmax == 0 ? 128 : textmax * 2);
text = realloc(text, textmax);
}
text[*len] = ch;
}
text[*len] = '\0';
return text;
}
char *rtrim(char *text, int *len) { // remove trailing spaces
for (; *len > 0 && isspace(text[*len - 1]); --(*len))
;
text[*len] = '\0';
return text;
}
TokenType get_enum(const char *name) { // return internal version of name
for (size_t i = 0; i < NELEMS(atr); i++) {
if (strcmp(atr[i].enum_text, name) == 0)
return atr[i].tok;
}
error(0, 0, "Unknown token %s\n", name);
return 0;
}
tok_s gettok() {
int len;
tok_s tok;
char *yytext = read_line(&len);
yytext = rtrim(yytext, &len);
// [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional
// get line and column
tok.err_ln = atoi(strtok(yytext, " "));
tok.err_col = atoi(strtok(NULL, " "));
// get the token name
char *name = strtok(NULL, " ");
tok.tok = get_enum(name);
// if there is extra data, get it
char *p = name + strlen(name);
if (p != &yytext[len]) {
for (++p; isspace(*p); ++p)
;
tok.text = strdup(p);
}
return tok;
}
Tree *make_node(NodeType node_type, Tree *left, Tree *right) {
Tree *t = calloc(sizeof(Tree), 1);
t->node_type = node_type;
t->left = left;
t->right = right;
return t;
}
Tree *make_leaf(NodeType node_type, char *value) {
Tree *t = calloc(sizeof(Tree), 1);
t->node_type = node_type;
t->value = strdup(value);
return t;
}
void expect(const char msg[], TokenType s) {
if (tok.tok == s) {
tok = gettok();
return;
}
error(tok.err_ln, tok.err_col, "%s: Expecting '%s', found '%s'\n", msg, atr[s].text, atr[tok.tok].text);
}
Tree *expr(int p) {
Tree *x = NULL, *node;
TokenType op;
switch (tok.tok) {
case tk_Lparen:
x = paren_expr();
break;
case tk_Sub: case tk_Add:
op = tok.tok;
tok = gettok();
node = expr(atr[tk_Negate].precedence);
x = (op == tk_Sub) ? make_node(nd_Negate, node, NULL) : node;
break;
case tk_Not:
tok = gettok();
x = make_node(nd_Not, expr(atr[tk_Not].precedence), NULL);
break;
case tk_Ident:
x = make_leaf(nd_Ident, tok.text);
tok = gettok();
break;
case tk_Integer:
x = make_leaf(nd_Integer, tok.text);
tok = gettok();
break;
default:
error(tok.err_ln, tok.err_col, "Expecting a primary, found: %s\n", atr[tok.tok].text);
}
while (atr[tok.tok].is_binary && atr[tok.tok].precedence >= p) {
TokenType op = tok.tok;
tok = gettok();
int q = atr[op].precedence;
if (!atr[op].right_associative)
q++;
node = expr(q);
x = make_node(atr[op].node_type, x, node);
}
return x;
}
Tree *paren_expr() {
expect("paren_expr", tk_Lparen);
Tree *t = expr(0);
expect("paren_expr", tk_Rparen);
return t;
}
Tree *stmt() {
Tree *t = NULL, *v, *e, *s, *s2;
switch (tok.tok) {
case tk_If:
tok = gettok();
e = paren_expr();
s = stmt();
s2 = NULL;
if (tok.tok == tk_Else) {
tok = gettok();
s2 = stmt();
}
t = make_node(nd_If, e, make_node(nd_If, s, s2));
break;
case tk_Putc:
tok = gettok();
e = paren_expr();
t = make_node(nd_Prtc, e, NULL);
expect("Putc", tk_Semi);
break;
case tk_Print: /* print '(' expr {',' expr} ')' */
tok = gettok();
for (expect("Print", tk_Lparen); ; expect("Print", tk_Comma)) {
if (tok.tok == tk_String) {
e = make_node(nd_Prts, make_leaf(nd_String, tok.text), NULL);
tok = gettok();
} else
e = make_node(nd_Prti, expr(0), NULL);
t = make_node(nd_Sequence, t, e);
if (tok.tok != tk_Comma)
break;
}
expect("Print", tk_Rparen);
expect("Print", tk_Semi);
break;
case tk_Semi:
tok = gettok();
break;
case tk_Ident:
v = make_leaf(nd_Ident, tok.text);
tok = gettok();
expect("assign", tk_Assign);
e = expr(0);
t = make_node(nd_Assign, v, e);
expect("assign", tk_Semi);
break;
case tk_While:
tok = gettok();
e = paren_expr();
s = stmt();
t = make_node(nd_While, e, s);
break;
case tk_Lbrace: /* {stmt} */
for (expect("Lbrace", tk_Lbrace); tok.tok != tk_Rbrace && tok.tok != tk_EOI;)
t = make_node(nd_Sequence, t, stmt());
expect("Lbrace", tk_Rbrace);
break;
case tk_EOI:
break;
default: error(tok.err_ln, tok.err_col, "expecting start of statement, found '%s'\n", atr[tok.tok].text);
}
return t;
}
Tree *parse() {
Tree *t = NULL;
tok = gettok();
do {
t = make_node(nd_Sequence, t, stmt());
} while (t != NULL && tok.tok != tk_EOI);
return t;
}
void prt_ast(Tree *t) {
if (t == NULL)
printf(";\n");
else {
printf("%-14s ", Display_nodes[t->node_type]);
if (t->node_type == nd_Ident || t->node_type == nd_Integer || t->node_type == nd_String) {
printf("%s\n", t->value);
} else {
printf("\n");
prt_ast(t->left);
prt_ast(t->right);
}
}
}
void init_io(FILE **fp, FILE *std, const char mode[], const char fn[]) {
if (fn[0] == '\0')
*fp = std;
else if ((*fp = fopen(fn, mode)) == NULL)
error(0, 0, "Can't open %s\n", fn);
}
int main(int argc, char *argv[]) {
init_io(&source_fp, stdin, "r", argc > 1 ? argv[1] : "");
init_io(&dest_fp, stdout, "wb", argc > 2 ? argv[2] : "");
prt_ast(parse());
}
- Output — prime numbers AST:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
COBOL
Code by Steve Williams. Tested with GnuCOBOL 2.2.
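One possible build and run, with illustrative names (the >>SOURCE FORMAT IS FREE directive lets GnuCOBOL compile the free-format source; a chained program-name argument selects program-name.lex as input, and with no argument the token stream is read from standard input):
cobc -x -o parser parser.cob
./parser primes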
>>SOURCE FORMAT IS FREE
identification division.
*> this code is dedicated to the public domain
*> (GnuCOBOL) 2.3-dev.0
*> for extra credit, generate this program directly from the EBNF
program-id. parser.
environment division.
configuration section.
repository. function all intrinsic.
input-output section.
file-control.
select input-file assign using input-name
status is input-status
organization is line sequential.
data division.
file section.
fd input-file global.
01 input-record global.
03 input-line pic zzzz9.
03 input-column pic zzzzzz9.
03 filler pic x(3).
03 input-token pic x(16).
03 input-value pic x(48).
working-storage section.
01 program-name pic x(32) value spaces global.
01 input-name pic x(32) value spaces global.
01 input-status pic xx global.
01 line-no pic 999 value 0.
01 col-no pic 99 value 0.
01 error-record global.
03 error-line-no pic zzzz9.
03 error-col-no pic zzzzzz9.
03 filler pic x value space.
03 error-message pic x(64) value spaces.
01 token global.
03 token-type pic x(16).
03 token-line pic 999.
03 token-column pic 99.
03 token-value pic x(48).
01 parse-stack global.
03 p pic 999 value 0.
03 p-lim pic 999 value 200.
03 p-zero pic 999 value 0.
03 parse-entry occurs 200.
05 parse-name pic x(24).
05 parse-token pic x(16).
05 parse-left pic 999.
05 parse-right pic 999.
05 parse-work pic 999.
05 parse-work1 pic 999.
01 abstract-syntax-tree global.
03 t pic 999 value 0.
03 t1 pic 999.
03 t-lim pic 999 value 998.
03 filler occurs 998.
05 leaf.
07 leaf-type pic x(14).
07 leaf-value pic x(48).
05 node redefines leaf.
07 node-type pic x(14).
07 node-left pic 999.
07 node-right pic 999.
01 indent pic x(200) value all '| ' global.
procedure division chaining program-name.
start-parser.
if program-name <> spaces
string program-name delimited by space '.lex' into input-name
open input input-file
if input-status <> '00'
string 'in parser ' trim(input-name) ' open status ' input-status
into error-message
call 'reporterror'
end-if
end-if
call 'gettoken'
call 'stmt_list'
if input-name <> spaces
close input-file
end-if
call 'printast' using t
>>d perform dump-ast
stop run
.
dump-ast.
display '==========' upon syserr
display 'ast:' upon syserr
display 't=' t upon syserr
perform varying t1 from 1 by 1 until t1 > t
if leaf-type(t1) = 'Identifier' or 'Integer' or 'String'
display t1 space trim(leaf-type(t1)) space trim(leaf-value(t1)) upon syserr
else
display t1 space node-left(t1) space node-right(t1) space trim(node-type(t1))
upon syserr
end-if
end-perform
.
identification division.
program-id. stmt_list common recursive.
data division.
procedure division.
start-stmt_list.
call 'push' using module-id
move p-zero to parse-left(p)
perform forever
call 'stmt'
move return-code to parse-right(p)
call 'makenode' using 'Sequence' parse-left(p) parse-right(p)
move return-code to parse-left(p)
if parse-right(p) = 0
or token-type = 'End_of_input'
exit perform
end-if
end-perform
call 'pop'
.
end program stmt_list.
identification division.
program-id. stmt common recursive.
procedure division.
start-stmt.
call 'push' using module-id
move p-zero to parse-left(p)
evaluate token-type
when 'Semicolon'
call 'gettoken'
when 'Identifier'
*>Identifier '=' expr ';'
call 'makeleaf' using 'Identifier' token-value
move return-code to parse-left(p)
call 'gettoken'
call 'expect' using 'Op_assign'
call 'expr'
move return-code to parse-right(p)
call 'expect' using 'Semicolon'
call 'makenode' using 'Assign' parse-left(p) parse-right(p)
move return-code to parse-left(p)
when 'Keyword_while'
*>'while' paren_expr '{' stmt '}'
call 'gettoken'
call 'paren_expr'
move return-code to parse-work(p)
call 'stmt'
move return-code to parse-right(p)
call 'makenode' using 'While' parse-work(p) parse-right(p)
move return-code to parse-left(p)
when 'Keyword_if'
*>'if' paren_expr stmt ['else' stmt]
call 'gettoken'
call 'paren_expr'
move return-code to parse-left(p)
call 'stmt'
move return-code to parse-work(p)
move p-zero to parse-work1(p)
if token-type = 'Keyword_else'
call 'gettoken'
call 'stmt'
move return-code to parse-work1(p)
end-if
call 'makenode' using 'If' parse-work(p) parse-work1(p)
move return-code to parse-right(p)
call 'makenode' using 'If' parse-left(p) parse-right(p)
move return-code to parse-left(p)
when 'Keyword_print'
*>'print' '(' prt_list ')' ';'
call 'gettoken'
call 'expect' using 'LeftParen'
call 'prt_list'
move return-code to parse-left(p)
call 'expect' using 'RightParen'
call 'expect' using 'Semicolon'
when 'Keyword_putc'
*>'putc' paren_expr ';'
call 'gettoken'
call 'paren_expr'
move return-code to parse-left(p)
call 'makenode' using 'Prtc' parse-left(p) p-zero
move return-code to parse-left(p)
call 'expect' using 'Semicolon'
when 'LeftBrace'
*>'{' stmt '}'
call 'gettoken'
move p-zero to parse-left(p)
perform until token-type = 'RightBrace' or 'End_of_input'
call 'stmt'
move return-code to parse-right(p)
call 'makenode' using 'Sequence' parse-left(p) parse-right(p)
move return-code to parse-left(p)
end-perform
if token-type <> 'End_of_input'
call 'gettoken'
end-if
when other
move 0 to parse-left(p)
end-evaluate
move parse-left(p) to return-code
call 'pop'
.
end program stmt.
identification division.
program-id. paren_expr common recursive.
procedure division.
start-paren_expr.
*>'(' expr ')' ;
call 'push' using module-id
call 'expect' using 'LeftParen'
call 'expr'
call 'expect' using 'RightParen'
call 'pop'
.
end program paren_expr.
identification division.
program-id. prt_list common.
procedure division.
start-prt_list.
*>(string | expr) {',' (String | expr)} ;
call 'push' using module-id
move p-zero to parse-work(p)
perform prt_entry
perform until token-type <> 'Comma'
call 'gettoken'
perform prt_entry
end-perform
call 'pop'
exit program
.
prt_entry.
if token-type = 'String'
call 'makeleaf' using token-type token-value
move return-code to parse-left(p)
call 'makenode' using 'Prts' parse-left(p) p-zero
call 'gettoken'
else
call 'expr'
move return-code to parse-left(p)
call 'makenode' using 'Prti' parse-left(p) p-zero
end-if
move return-code to parse-right(p)
call 'makenode' using 'Sequence' parse-work(p) parse-right(p)
move return-code to parse-work(p)
.
end program prt_list.
identification division.
program-id. expr common recursive.
procedure division.
start-expr.
*>and_expr {'||' and_expr} ;
call 'push' using module-id
call 'and_expr'
move return-code to parse-left(p)
perform forever
if token-type <> 'Op_or'
exit perform
end-if
call 'gettoken'
call 'and_expr'
move return-code to parse-right(p)
call 'makenode' using 'Or' parse-left(p) parse-right(p)
move return-code to parse-left(p)
end-perform
move parse-left(p) to return-code
call 'pop'
.
end program expr.
identification division.
program-id. and_expr common recursive.
procedure division.
start-and_expr.
*>equality_expr {'&&' equality_expr} ;
call 'push' using module-id
call 'equality_expr'
move return-code to parse-left(p)
perform forever
if token-type <> 'Op_and'
exit perform
end-if
call 'gettoken'
call 'equality_expr'
move return-code to parse-right(p)
call 'makenode' using 'And' parse-left(p) parse-right(p)
move return-code to parse-left(p)
end-perform
call 'pop'
.
end program and_expr.
identification division.
program-id. equality_expr common recursive.
procedure division.
start-equality_expr.
*>relational_expr [('==' | '!=') relational_expr] ;
call 'push' using module-id
call 'relational_expr'
move return-code to parse-left(p)
evaluate token-type
when 'Op_equal'
move 'Equal' to parse-token(p)
when 'Op_notequal'
move 'NotEqual' to parse-token(p)
end-evaluate
if parse-token(p) <> spaces
call 'gettoken'
call 'relational_expr'
move return-code to parse-right(p)
call 'makenode' using parse-token(p) parse-left(p) parse-right(p)
move return-code to parse-left(p)
end-if
call 'pop'
.
end program equality_expr.
identification division.
program-id. relational_expr common recursive.
procedure division.
start-relational_expr.
*>addition_expr [('<' | '<=' | '>' | '>=') addition_expr] ;
call 'push' using module-id
call 'addition_expr'
move return-code to parse-left(p)
evaluate token-type
when 'Op_less'
move 'Less' to parse-token(p)
when 'Op_lessequal'
move 'LessEqual' to parse-token(p)
when 'Op_greater'
move 'Greater' to parse-token(p)
when 'Op_greaterequal'
move 'GreaterEqual' to parse-token(p)
end-evaluate
if parse-token(p) <> spaces
call 'gettoken'
call 'addition_expr'
move return-code to parse-right(p)
call 'makenode' using parse-token(p) parse-left(p) parse-right(p)
move return-code to parse-left(p)
end-if
call 'pop'
.
end program relational_expr.
identification division.
program-id. addition_expr common recursive.
procedure division.
start-addition_expr.
*>multiplication_expr {('+' | '-') multiplication_expr} ;
call 'push' using module-id
call 'multiplication_expr'
move return-code to parse-left(p)
perform forever
evaluate token-type
when 'Op_add'
move 'Add' to parse-token(p)
when 'Op_subtract'
move 'Subtract' to parse-token(p)
when other
exit perform
end-evaluate
call 'gettoken'
call 'multiplication_expr'
move return-code to parse-right(p)
call 'makenode' using parse-token(p) parse-left(p) parse-right(p)
move return-code to parse-left(p)
end-perform
call 'pop'
.
end program addition_expr.
identification division.
program-id. multiplication_expr common recursive.
procedure division.
start-multiplication_expr.
*>primary {('*' | '/' | '%') primary } ;
call 'push' using module-id
call 'primary'
move return-code to parse-left(p)
perform forever
evaluate token-type
when 'Op_multiply'
move 'Multiply' to parse-token(p)
when 'Op_divide'
move 'Divide' to parse-token(p)
when 'Op_mod'
move 'Mod' to parse-token(p)
when other
exit perform
end-evaluate
call 'gettoken'
call 'primary'
move return-code to parse-right(p)
call 'makenode' using parse-token(p) parse-left(p) parse-right(p)
move return-code to parse-left(p)
end-perform
call 'pop'
.
end program multiplication_expr.
identification division.
program-id. primary common recursive.
procedure division.
start-primary.
*> Identifier
*>| Integer
*>| 'LeftParen' expr 'RightParen'
*>| ('+' | '-' | '!') primary
*>;
call 'push' using module-id
evaluate token-type
when 'Identifier'
call 'makeleaf' using 'Identifier' token-value
call 'gettoken'
when 'Integer'
call 'makeleaf' using 'Integer' token-value
call 'gettoken'
when 'LeftParen'
call 'gettoken'
call 'expr'
call 'expect' using 'RightParen'
move t to return-code
when 'Op_add'
call 'gettoken'
call 'primary'
when 'Op_subtract'
call 'gettoken'
call 'primary'
move return-code to parse-left(p)
call 'makenode' using 'Negate' parse-left(p) p-zero
when 'Op_not'
call 'gettoken'
call 'primary'
move return-code to parse-left(p)
call 'makenode' using 'Not' parse-left(p) p-zero
when other
move 0 to return-code
end-evaluate
call 'pop'
.
end program primary.
identification division.
program-id. reporterror common.
procedure division.
start-reporterror.
report-error.
move token-line to error-line-no
move token-column to error-col-no
display error-record upon syserr
stop run with error status -1
.
end program reporterror.
identification division.
program-id. gettoken common.
procedure division.
start-gettoken.
if program-name = spaces
move '00' to input-status
accept input-record on exception move '10' to input-status end-accept
else
read input-file
end-if
evaluate input-status
when '00'
move input-token to token-type
move input-value to token-value
move numval(input-line) to token-line
move numval(input-column) to token-column
>>d display indent(1:min(4 * p,length(indent))) 'new token: ' token-type upon syserr
when '10'
string 'in parser ' trim(input-name) ' unexpected end of input'
into error-message
call 'reporterror'
when other
string 'in parser ' trim(input-name) ' unexpected input-status ' input-status
into error-message
call 'reporterror'
end-evaluate
.
end program gettoken.
identification division.
program-id. expect common.
data division.
linkage section.
01 what any length.
procedure division using what.
start-expect.
if token-type <> what
string 'in parser expected ' what ' found ' token-type into error-message
call 'reporterror'
end-if
>>d display indent(1:min(4 * p,length(indent))) 'match: ' token-type upon syserr
call 'gettoken'
.
end program expect.
identification division.
program-id. push common.
data division.
linkage section.
01 what any length.
procedure division using what.
start-push.
>>d display indent(1:min(4 * p,length(indent))) 'push ' what upon syserr
if p >= p-lim
move 'in parser stack overflow' to error-message
call 'reporterror'
end-if
add 1 to p
initialize parse-entry(p)
move what to parse-name(p)
.
end program push.
identification division.
program-id. pop common.
procedure division.
start-pop.
if p < 1
move 'in parser stack underflow' to error-message
call 'reporterror'
end-if
>>d display indent(1:4 * p - 4) 'pop ' parse-name(p) upon syserr
subtract 1 from p
.
end program pop.
identification division.
program-id. makenode common.
data division.
linkage section.
01 parm-type any length.
01 parm-left pic 999.
01 parm-right pic 999.
procedure division using parm-type parm-left parm-right.
start-makenode.
if t >= t-lim
string 'in parser makenode tree index t exceeds ' t-lim into error-message
call 'reporterror'
end-if
add 1 to t
move parm-type to node-type(t)
move parm-left to node-left(t)
move parm-right to node-right(t)
move t to return-code
.
end program makenode.
identification division.
program-id. makeleaf common.
data division.
linkage section.
01 parm-type any length.
01 parm-value pic x(48).
procedure division using parm-type parm-value.
start-makeleaf.
if t >= t-lim
string 'in parser makeleaf tree index t exceeds ' t-lim into error-message
call 'reporterror'
end-if
add 1 to t
move parm-type to leaf-type(t)
move parm-value to leaf-value(t)
move t to return-code
.
end program makeleaf.
identification division.
program-id. printast recursive.
data division.
linkage section.
01 n pic 999.
procedure division using n.
start-printast.
if n = 0
display ';'
exit program
end-if
evaluate leaf-type(n)
when 'Identifier'
when 'Integer'
when 'String'
display leaf-type(n) trim(leaf-value(n))
when other
display node-type(n)
call 'printast' using node-left(n)
call 'printast' using node-right(n)
end-evaluate
.
end program printast.
end program parser.
- Output — Primes:
prompt$ ./lexer <testcases/Primes | ./parser
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Common Lisp
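The solution below is a Roswell script; it loads the cl-ppcre and trivia libraries, so both must be available (for example through Quicklisp). A possible invocation, once the script is made executable, with illustrative file names:
./parse.ros while.lex while.ast
Either file name may be omitted or given as "-", in which case standard input/output are used.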
#!/bin/sh
#|-*- mode:lisp -*-|#
#|
exec ros -Q -- $0 "$@"
|#
(progn ;;init forms
(ros:ensure-asdf)
#+quicklisp(ql:quickload '() :silent t))
(defpackage :ros.script.parse.3859374047
(:use :cl))
(in-package :ros.script.parse.3859374047)
;;;
;;; The Rosetta Code Tiny-Language Parser, in Common Lisp.
;;;
(require "cl-ppcre")
(require "trivia")
(defstruct tokstruc line-no column-no tok tokval)
(defconstant re-blank-line
(ppcre:create-scanner "^\\s*$"))
(defconstant re-token-1
(ppcre:create-scanner
"^\\s*(\\d+)\\s+(\\d+)\\s+(\\S+)\\s*$"))
(defconstant re-token-2
(ppcre:create-scanner
"^\\s*(\\d+)\\s+(\\d+)\\s+(\\S+)\\s+(\\S(.*\\S)?)\\s*$"))
(defun string-to-tok (s)
(trivia:match s
("Keyword_else" 'TOK-ELSE)
("Keyword_if" 'TOK-IF)
("Keyword_print" 'TOK-PRINT)
("Keyword_putc" 'TOK-PUTC)
("Keyword_while" 'TOK-WHILE)
("Op_multiply" 'TOK-MULTIPLY)
("Op_divide" 'TOK-DIVIDE)
("Op_mod" 'TOK-MOD)
("Op_add" 'TOK-ADD)
("Op_subtract" 'TOK-SUBTRACT)
("Op_negate" 'TOK-NEGATE)
("Op_less" 'TOK-LESS)
("Op_lessequal" 'TOK-LESSEQUAL)
("Op_greater" 'TOK-GREATER)
("Op_greaterequal" 'TOK-GREATEREQUAL)
("Op_equal" 'TOK-EQUAL)
("Op_notequal" 'TOK-NOTEQUAL)
("Op_not" 'TOK-NOT)
("Op_assign" 'TOK-ASSIGN)
("Op_and" 'TOK-AND)
("Op_or" 'TOK-OR)
("LeftParen" 'TOK-LEFTPAREN)
("RightParen" 'TOK-RIGHTPAREN)
("LeftBrace" 'TOK-LEFTBRACE)
("RightBrace" 'TOK-RIGHTBRACE)
("Semicolon" 'TOK-SEMICOLON)
("Comma" 'TOK-COMMA)
("Identifier" 'TOK-IDENTIFIER)
("Integer" 'TOK-INTEGER)
("String" 'TOK-STRING)
("End_of_input" 'TOK-END-OF-INPUT)
(_ (warn "unparseable token line")
(uiop:quit 1))))
(defun precedence (tok)
(case tok
(TOK-MULTIPLY 13)
(TOK-DIVIDE 13)
(TOK-MOD 13)
(TOK-ADD 12)
(TOK-SUBTRACT 12)
(TOK-NEGATE 14)
(TOK-NOT 14)
(TOK-LESS 10)
(TOK-LESSEQUAL 10)
(TOK-GREATER 10)
(TOK-GREATEREQUAL 10)
(TOK-EQUAL 9)
(TOK-NOTEQUAL 9)
(TOK-AND 5)
(TOK-OR 4)
(otherwise -1)))
(defun binary-p (tok)
(case tok
(TOK-ADD t)
(TOK-SUBTRACT t)
(TOK-MULTIPLY t)
(TOK-DIVIDE t)
(TOK-MOD t)
(TOK-LESS t)
(TOK-LESSEQUAL t)
(TOK-GREATER t)
(TOK-GREATEREQUAL t)
(TOK-EQUAL t)
(TOK-NOTEQUAL t)
(TOK-AND t)
(TOK-OR t)
(otherwise nil)))
(defun right-associative-p (tok)
(declare (ignorable tok))
nil) ; None of the current operators is right associative.
(defun tok-text (tok)
(ecase tok
(TOK-ELSE "else")
(TOK-IF "if")
(TOK-PRINT "print")
(TOK-PUTC "putc")
(TOK-WHILE "while")
(TOK-MULTIPLY "*")
(TOK-DIVIDE "/")
(TOK-MOD "%")
(TOK-ADD "+")
(TOK-SUBTRACT "-")
(TOK-NEGATE "-")
(TOK-LESS "<")
(TOK-LESSEQUAL "<=")
(TOK-GREATER ">")
(TOK-GREATEREQUAL ">=")
(TOK-EQUAL "==")
(TOK-NOTEQUAL "!=")
(TOK-NOT "!")
(TOK-ASSIGN "=")
(TOK-AND "&&")
(TOK-OR "||")
(TOK-LEFTPAREN "(")
(TOK-RIGHTPAREN ")")
(TOK-LEFTBRACE "{")
(TOK-RIGHTBRACE "}")
(TOK-SEMICOLON ";")
(TOK-COMMA ",")
(TOK-IDENTIFIER "Ident")
(TOK-INTEGER "Integer literal")
(TOK-STRING "String literal")
(TOK-END-OF-INPUT "EOI")))
(defun operator (tok)
(ecase tok
(TOK-MULTIPLY "Multiply")
(TOK-DIVIDE "Divide")
(TOK-MOD "Mod")
(TOK-ADD "Add")
(TOK-SUBTRACT "Subtract")
(TOK-NEGATE "Negate")
(TOK-NOT "Not")
(TOK-LESS "Less")
(TOK-LESSEQUAL "LessEqual")
(TOK-GREATER "Greater")
(TOK-GREATEREQUAL "GreaterEqual")
(TOK-EQUAL "Equal")
(TOK-NOTEQUAL "NotEqual")
(TOK-AND "And")
(TOK-OR "Or")))
(defun join (&rest args)
(apply #'concatenate 'string args))
(defun nxt (gettok)
(funcall gettok :nxt))
(defun curr (gettok)
(funcall gettok :curr))
(defun err (token msg)
(format t "(~A, ~A) error: ~A~%"
(tokstruc-line-no token)
(tokstruc-column-no token)
msg)
(uiop:quit 1))
(defun prt-ast (outf ast)
;;
;; For fun, let us do prt-ast *non*-recursively, with a stack and a
;; loop.
;;
(let ((stack `(,ast)))
(loop while stack
do (let ((x (car stack)))
(setf stack (cdr stack))
(cond ((not x) (format outf ";~%"))
((or (string= (car x) "Identifier")
(string= (car x) "Integer")
(string= (car x) "String"))
(format outf "~A ~A~%" (car x) (cadr x)))
(t (format outf "~A~%" (car x))
(setf stack (cons (caddr x) stack))
(setf stack (cons (cadr x) stack))))))))
(defun accept (gettok tok)
(if (eq (tokstruc-tok (curr gettok)) tok)
(nxt gettok)
nil))
(defun expect (gettok msg tok)
(let ((curr-tok (tokstruc-tok (curr gettok))))
(if (eq curr-tok tok)
(nxt gettok)
(err (curr gettok)
(join msg ": Expecting '"
(tok-text tok) "', found '"
(tok-text curr-tok) "'")))))
(defun parse (gettok)
(defun paren-expr (gettok)
(expect gettok "paren_expr" 'TOK-LEFTPAREN)
(let ((x (expr gettok 0)))
(expect gettok "paren_expr" 'TOK-RIGHTPAREN)
x))
(defun expr (gettok p)
(let* ((tok (curr gettok))
(x (case (tokstruc-tok tok)
(TOK-LEFTPAREN (paren-expr gettok))
(TOK-SUBTRACT
(nxt gettok)
(let ((y (expr gettok (precedence 'TOK-NEGATE))))
`("Negate" ,y ())))
(TOK-ADD
(nxt gettok)
(expr gettok (precedence 'TOK-NEGATE)))
(TOK-NOT
(nxt gettok)
(let ((y (expr gettok (precedence 'TOK-NOT))))
`("Not" ,y ())))
(TOK-IDENTIFIER
(let ((y `("Identifier" ,(tokstruc-tokval tok))))
(nxt gettok)
y))
(TOK-INTEGER
(let ((y `("Integer" ,(tokstruc-tokval tok))))
(nxt gettok)
y))
(otherwise
(err tok (join "Expecting a primary, found: "
(tok-text (tokstruc-tok tok))))))))
;;
;; Precedence climbing for binary operators.
;;
(loop for tok = (curr gettok)
for toktok = (tokstruc-tok tok)
while (and (binary-p toktok) (<= p (precedence toktok)))
do (progn (nxt gettok)
(let ((q (if (right-associative-p toktok)
(precedence toktok)
(1+ (precedence toktok)))))
(setf x `(,(operator toktok) ,x
,(expr gettok q))))))
x))
(defun stmt (gettok)
(cond ((accept gettok 'TOK-IF)
(let* ((e (paren-expr gettok))
(s (stmt gettok))
(x (if (accept gettok 'TOK-ELSE)
`("If" ,s ,(stmt gettok))
`("If" ,s ()))))
`("If" ,e ,x)))
((accept gettok 'TOK-PUTC)
(let ((x `("Prtc" ,(paren-expr gettok) ())))
(expect gettok "Putc" 'TOK-SEMICOLON)
x))
((accept gettok 'TOK-PRINT)
(expect gettok "Print" 'TOK-LEFTPAREN)
(let ((x '()))
(loop for tok = (curr gettok)
for toktok = (tokstruc-tok tok)
for e = (if (eq toktok 'TOK-STRING)
(let* ((tokval (tokstruc-tokval tok))
(leaf `("String" ,tokval))
(e `("Prts" ,leaf ())))
(nxt gettok)
e)
`("Prti" ,(expr gettok 0) ()))
do (setf x `("Sequence" ,x ,e))
while (accept gettok 'TOK-COMMA))
(expect gettok "Print" 'TOK-RIGHTPAREN)
(expect gettok "Print" 'TOK-SEMICOLON)
x))
((eq (tokstruc-tok (curr gettok)) 'TOK-SEMICOLON)
;; A bare ';' produces an empty (null) statement node.
(nxt gettok)
'())
((eq (tokstruc-tok (curr gettok)) 'TOK-IDENTIFIER)
(let ((v `("Identifier" ,(tokstruc-tokval (curr gettok)))))
(nxt gettok)
(expect gettok "assign" 'TOK-ASSIGN)
(let ((x `("Assign" ,v ,(expr gettok 0))))
(expect gettok "assign" 'TOK-SEMICOLON)
x)))
((accept gettok 'TOK-WHILE)
(let ((e (paren-expr gettok)))
`("While" ,e ,(stmt gettok))))
((accept gettok 'TOK-LEFTBRACE)
(let ((x '()))
(loop for tok = (curr gettok)
for toktok = (tokstruc-tok tok)
until (or (eq toktok 'TOK-RIGHTBRACE)
(eq toktok 'TOK-END-OF-INPUT))
do (setf x `("Sequence" ,x ,(stmt gettok))))
(expect gettok "Lbrace" 'TOK-RIGHTBRACE)
x))
((eq (tokstruc-tok (curr gettok)) 'TOK-END-OF-INPUT)
'())
(t (let* ((tok (curr gettok))
(toktok (tokstruc-tok tok)))
(err tok (join "expecting start of statement, found '"
(tok-text toktok) "'"))))))
;;
;; Parsing of the top-level statement sequence.
;;
(let ((x '()))
(nxt gettok)
(loop do (setf x `("Sequence" ,x ,(stmt gettok)))
until (eq (tokstruc-tok (curr gettok)) 'TOK-END-OF-INPUT))
x))
(defun string-to-tokstruc (s)
(let ((strings
(nth-value 1 (ppcre:scan-to-strings re-token-1 s))))
(if strings
(make-tokstruc :line-no (elt strings 0)
:column-no (elt strings 1)
:tok (string-to-tok (elt strings 2))
:tokval nil)
(let ((strings
(nth-value 1 (ppcre:scan-to-strings re-token-2 s))))
(if strings
(make-tokstruc :line-no (elt strings 0)
:column-no (elt strings 1)
:tok (string-to-tok (elt strings 2))
:tokval (elt strings 3))
(progn
(warn "unparseable token line")
(uiop:quit 1)))))))
(defun read-token-line (inpf)
(loop for line = (read-line inpf nil "End_of_input")
while (ppcre:scan re-blank-line line)
finally (return line)))
(defun open-inpf (inpf-filename)
(if (string= inpf-filename "-")
*standard-input*
(open inpf-filename :direction :input)))
(defun open-outf (outf-filename)
(if (string= outf-filename "-")
*standard-output*
(open outf-filename :direction :output
:if-exists :overwrite
:if-does-not-exist :create)))
(defun usage-error ()
(princ "Usage: parse [INPUTFILE [OUTPUTFILE]]" *standard-output*)
(terpri *standard-output*)
(princ "If either INPUTFILE or OUTPUTFILE is \"-\", the respective"
*standard-output*)
(princ " standard I/O is used." *standard-output*)
(terpri *standard-output*)
(uiop:quit 1))
(defun get-filenames (argv)
(trivia:match argv
((list) '("-" "-"))
((list inpf-filename) `(,inpf-filename "-"))
((list inpf-filename outf-filename) `(,inpf-filename
,outf-filename))
(_ (usage-error))))
(defun main (&rest argv)
(let* ((filenames (get-filenames argv))
(inpf-filename (car filenames))
(inpf (open-inpf inpf-filename))
(outf-filename (cadr filenames))
(outf (open-outf outf-filename)))
(let* ((current-token (list nil))
(gettok-curr (lambda () (elt current-token 0)))
(gettok-nxt (lambda ()
(let* ((s (read-token-line inpf))
(tok (string-to-tokstruc s)))
(setf (elt current-token 0) tok)
tok)))
(gettok (lambda (instruction)
(trivia:match instruction
(:curr (funcall gettok-curr))
(:nxt (funcall gettok-nxt)))))
(ast (parse gettok)))
(prt-ast outf ast))
(unless (string= inpf-filename "-")
(close inpf))
(unless (string= outf-filename "-")
(close outf))
(uiop:quit 0)))
;;; vim: set ft=lisp lisp:
- Output:
$ ./parse.ros compiler-tests/primes.lex
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Forth
Tested with Gforth 0.7.3.
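One way to run it, assuming the source above is saved as parse.fs (an illustrative name), is to feed the lexer output on standard input:
gforth parse.fs < while.lex
The PARSE .NODE line at the end runs the parser as the file is loaded, reading tokens from standard input and printing the AST.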
CREATE BUF 0 , \ single-character look-ahead buffer
: PEEK BUF @ 0= IF KEY BUF ! THEN BUF @ ;
: GETC PEEK 0 BUF ! ;
: SPACE? DUP BL = SWAP 9 14 WITHIN OR ;
: >SPACE BEGIN PEEK SPACE? WHILE GETC DROP REPEAT ;
: DIGIT? 48 58 WITHIN ;
: GETINT >SPACE 0
BEGIN PEEK DIGIT?
WHILE GETC [CHAR] 0 - SWAP 10 * +
REPEAT ;
: GETNAM >SPACE PAD 1+
BEGIN PEEK SPACE? INVERT
WHILE GETC OVER C! CHAR+
REPEAT PAD TUCK - 1- PAD C! ;
: GETSTR >SPACE PAD 1+ GETC DROP \ skip leading "
BEGIN GETC DUP [CHAR] " <>
WHILE OVER C! CHAR+
REPEAT DROP PAD TUCK - 1- PAD C! ;
: INTERN HERE SWAP DUP C@ 1+ BOUNDS DO I C@ C, LOOP ALIGN ;
CREATE #TK 0 ,
: TK: CREATE #TK @ , 1 #TK +! DOES> @ ;
TK: End_of_input TK: Keyword_if TK: Keyword_else
TK: Keyword_while TK: Keyword_print TK: Keyword_putc
TK: String TK: Integer TK: Identifier
TK: LeftParen TK: RightParen
TK: LeftBrace TK: RightBrace
TK: Semicolon TK: Comma
TK: Op_assign TK: Op_not
: (BINARY?) [ #TK @ ] literal >= ;
TK: Op_subtract TK: Op_add
TK: Op_mod TK: Op_multiply TK: Op_divide
TK: Op_equal TK: Op_notequal
TK: Op_less TK: Op_lessequal
TK: Op_greater TK: Op_greaterequal
TK: Op_and TK: Op_or
CREATE TOKEN 0 , 0 , 0 , 0 ,
: TOKEN-TYPE TOKEN 2 CELLS + @ ;
: TOKEN-VALUE TOKEN 3 CELLS + @ ;
: GETTOK GETINT GETINT TOKEN 2!
GETNAM FIND DROP EXECUTE
DUP Integer = IF GETINT ELSE
DUP String = IF GETSTR INTERN ELSE
DUP Identifier = IF GETNAM INTERN ELSE
0 THEN THEN THEN
TOKEN 3 CELLS + ! TOKEN 2 CELLS + ! ;
: BINARY? TOKEN-TYPE (BINARY?) ;
CREATE PREC #TK @ CELLS ALLOT PREC #TK @ CELLS -1 FILL
: PREC! CELLS PREC + ! ;
14 Op_not PREC! 13 Op_multiply PREC!
13 Op_divide PREC! 13 Op_mod PREC!
12 Op_add PREC! 12 Op_subtract PREC!
10 Op_less PREC! 10 Op_greater PREC!
10 Op_lessequal PREC! 10 Op_greaterequal PREC!
9 Op_equal PREC! 9 Op_notequal PREC!
5 Op_and PREC! 4 Op_or PREC!
: PREC@ CELLS PREC + @ ;
\ Each AST Node is a sequence of cells in data space consisting
\ of the execution token of a printing word, followed by that
\ node's data. Each printing word receives the address of the
\ node's data, and is responsible for printing that data
\ appropriately.
DEFER .NODE
: .NULL DROP ." ;" CR ;
CREATE $NULL ' .NULL ,
: .IDENTIFIER ." Identifier " @ COUNT TYPE CR ;
: $IDENTIFIER ( a-addr --) HERE SWAP ['] .IDENTIFIER , , ;
: .INTEGER ." Integer " @ . CR ;
: $INTEGER ( n --) HERE SWAP ['] .INTEGER , , ;
: "TYPE" [CHAR] " EMIT TYPE [CHAR] " EMIT ;
: .STRING ." String " @ COUNT "TYPE" CR ;
: $STRING ( a-addr --) HERE SWAP ['] .STRING , , ;
: .LEAF DUP @ COUNT TYPE CR CELL+ @ .NODE 0 .NULL ;
: LEAF CREATE HERE CELL+ , BL WORD INTERN .
DOES> HERE >R ['] .LEAF , @ , , R> ;
LEAF $PRTC Prtc LEAF $PRTS Prts LEAF $PRTI Prti
LEAF $NOT Not LEAF $NEGATE Negate
: .BINARY DUP @ COUNT TYPE CR
CELL+ DUP @ .NODE CELL+ @ .NODE ;
: BINARY CREATE HERE CELL+ , BL WORD INTERN .
DOES> HERE >R ['] .BINARY , @ , SWAP 2, R> ;
BINARY $SEQUENCE Sequence BINARY $ASSIGN Assign
BINARY $WHILE While BINARY $IF If
BINARY $SUBTRACT Subtract BINARY $ADD Add
BINARY $MOD Mod BINARY $MULTIPLY Multiply
BINARY $DIVIDE Divide
BINARY $LESS Less BINARY $LESSEQUAL LessEqual
BINARY $GREATER Greater BINARY $GREATEREQUAL GreaterEqual
BINARY $EQUAL Equal BINARY $NOTEQUAL NotEqual
BINARY $AND And BINARY $OR Or
: TOK-CONS ( x* -- node-xt) TOKEN-TYPE CASE
Op_subtract OF ['] $SUBTRACT ENDOF
Op_add OF ['] $ADD ENDOF
Op_mod OF ['] $MOD ENDOF
Op_multiply OF ['] $MULTIPLY ENDOF
Op_divide OF ['] $DIVIDE ENDOF
Op_equal OF ['] $EQUAL ENDOF
Op_notequal OF ['] $NOTEQUAL ENDOF
Op_less OF ['] $LESS ENDOF
Op_lessequal OF ['] $LESSEQUAL ENDOF
Op_greater OF ['] $GREATER ENDOF
Op_greaterequal OF ['] $GREATEREQUAL ENDOF
Op_and OF ['] $AND ENDOF
Op_or OF ['] $OR ENDOF
ENDCASE ;
: (.NODE) DUP CELL+ SWAP @ EXECUTE ;
' (.NODE) IS .NODE
: .- ( n --) 0 <# #S #> TYPE ;
: EXPECT ( tk --) DUP TOKEN-TYPE <>
IF CR ." stdin:" TOKEN 2@ SWAP .- ." :" .-
." : unexpected token, expecting " . CR BYE
THEN DROP GETTOK ;
: '(' LeftParen EXPECT ;
: ')' RightParen EXPECT ;
: '}' RightBrace EXPECT ;
: ';' Semicolon EXPECT ;
: ',' Comma EXPECT ;
: '=' Op_assign EXPECT ;
DEFER *EXPR DEFER EXPR DEFER STMT
: PAREN-EXPR '(' EXPR ')' ;
: PRIMARY
TOKEN-TYPE LeftParen = IF PAREN-EXPR EXIT THEN
TOKEN-TYPE Op_add = IF GETTOK 12 *EXPR EXIT THEN
TOKEN-TYPE Op_subtract = IF GETTOK 14 *EXPR $NEGATE EXIT THEN
TOKEN-TYPE Op_not = IF GETTOK 14 *EXPR $NOT EXIT THEN
TOKEN-TYPE Identifier = IF TOKEN-VALUE $IDENTIFIER ELSE
TOKEN-TYPE Integer = IF TOKEN-VALUE $INTEGER THEN THEN
GETTOK ;
: (*EXPR) ( n -- node)
PRIMARY ( n node)
BEGIN OVER TOKEN-TYPE PREC@ SWAP OVER <= BINARY? AND
WHILE ( n node prec) 1+ TOK-CONS SWAP GETTOK *EXPR SWAP EXECUTE
REPEAT ( n node prec) DROP NIP ( node) ;
: (EXPR) 0 *EXPR ;
: -)? TOKEN-TYPE RightParen <> ;
: -}? TOKEN-TYPE RightBrace <> ;
: (STMT)
TOKEN-TYPE Semicolon = IF GETTOK STMT EXIT THEN
TOKEN-TYPE Keyword_while =
IF GETTOK PAREN-EXPR STMT $WHILE EXIT THEN
TOKEN-TYPE Keyword_if =
IF GETTOK PAREN-EXPR STMT
TOKEN-TYPE Keyword_else = IF GETTOK STMT ELSE $NULL THEN
$IF $IF EXIT
THEN
TOKEN-TYPE Keyword_putc =
IF GETTOK PAREN-EXPR ';' $PRTC EXIT THEN
TOKEN-TYPE Keyword_print =
IF GETTOK '(' $NULL
BEGIN TOKEN-TYPE String =
IF TOKEN-VALUE $STRING $PRTS GETTOK
ELSE EXPR $PRTI THEN $SEQUENCE -)?
WHILE ',' REPEAT ')' ';' EXIT THEN
TOKEN-TYPE Identifier =
IF TOKEN-VALUE $IDENTIFIER GETTOK '=' EXPR ';' $ASSIGN
EXIT THEN
TOKEN-TYPE LeftBrace =
IF $NULL GETTOK BEGIN -}? WHILE STMT $SEQUENCE REPEAT
'}' EXIT THEN
TOKEN-TYPE End_of_input = IF EXIT THEN EXPR ;
' (*EXPR) IS *EXPR ' (EXPR) IS EXPR ' (STMT) IS STMT
: -EOI? TOKEN-TYPE End_of_input <> ;
: PARSE $NULL GETTOK BEGIN -EOI? WHILE STMT $SEQUENCE REPEAT ;
PARSE .NODE
- Output — Count AST:
Sequence Sequence ; Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence ; Sequence Sequence Sequence ; Prts String "count is: " ; Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1
Fortran
The following is Fortran 2008/2018 code with C preprocessing directives. If you name the source file ‘parse.F90’, with a capital ‘F’, gfortran will run the C preprocessor automatically.
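A possible compilation, following the notes embedded in the source about gfortran's standards flags (the file name and flags are illustrative):
gfortran -std=f2018 -fall-intrinsics parse.F90 -o parse
A plain gfortran parse.F90 -o parse also works; in that case the __GFORTRAN__ branch that uses FGETC is compiled in.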
!!!
!!! An implementation of the Rosetta Code parser task:
!!! https://rosettacode.org/wiki/Compiler/syntax_analyzer
!!!
!!! The implementation is based on the published pseudocode.
!!!
module compiler_type_kinds
use, intrinsic :: iso_fortran_env, only: int32
use, intrinsic :: iso_fortran_env, only: int64
implicit none
private
! Synonyms.
integer, parameter, public :: size_kind = int64
integer, parameter, public :: length_kind = size_kind
integer, parameter, public :: nk = size_kind
! Synonyms for character capable of storing a Unicode code point.
integer, parameter, public :: unicode_char_kind = selected_char_kind ('ISO_10646')
integer, parameter, public :: ck = unicode_char_kind
! Synonyms for integers capable of storing a Unicode code point.
integer, parameter, public :: unicode_ichar_kind = int32
integer, parameter, public :: ick = unicode_ichar_kind
end module compiler_type_kinds
module string_buffers
use, intrinsic :: iso_fortran_env, only: error_unit
use, intrinsic :: iso_fortran_env, only: int64
use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
implicit none
private
public :: strbuf_t
type :: strbuf_t
integer(kind = nk), private :: len = 0
!
! ‘chars’ is made public for efficient access to the individual
! characters.
!
character(1, kind = ck), allocatable, public :: chars(:)
contains
procedure, pass, private :: ensure_storage => strbuf_t_ensure_storage
procedure, pass :: to_unicode_full_string => strbuf_t_to_unicode_full_string
procedure, pass :: to_unicode_substring => strbuf_t_to_unicode_substring
procedure, pass :: length => strbuf_t_length
procedure, pass :: set => strbuf_t_set
procedure, pass :: append => strbuf_t_append
generic :: to_unicode => to_unicode_full_string
generic :: to_unicode => to_unicode_substring
generic :: assignment(=) => set
end type strbuf_t
contains
function strbuf_t_to_unicode_full_string (strbuf) result (s)
class(strbuf_t), intent(in) :: strbuf
character(:, kind = ck), allocatable :: s
!
! This does not actually ensure that the string is valid Unicode;
! any 31-bit ‘character’ is supported.
!
integer(kind = nk) :: i
allocate (character(len = strbuf%len, kind = ck) :: s)
do i = 1, strbuf%len
s(i:i) = strbuf%chars(i)
end do
end function strbuf_t_to_unicode_full_string
function strbuf_t_to_unicode_substring (strbuf, i, j) result (s)
!
! ‘Extreme’ values of i and j are allowed, as shortcuts for ‘from
! the beginning’, ‘up to the end’, or ‘empty substring’.
!
class(strbuf_t), intent(in) :: strbuf
integer(kind = nk), intent(in) :: i, j
character(:, kind = ck), allocatable :: s
!
! This does not actually ensure that the string is valid Unicode;
! any 31-bit ‘character’ is supported.
!
integer(kind = nk) :: i1, j1
integer(kind = nk) :: n
integer(kind = nk) :: k
i1 = max (1_nk, i)
j1 = min (strbuf%len, j)
n = max (0_nk, (j1 - i1) + 1_nk)
allocate (character(n, kind = ck) :: s)
do k = 1, n
s(k:k) = strbuf%chars(i1 + (k - 1_nk))
end do
end function strbuf_t_to_unicode_substring
elemental function strbuf_t_length (strbuf) result (n)
class(strbuf_t), intent(in) :: strbuf
integer(kind = nk) :: n
n = strbuf%len
end function strbuf_t_length
elemental function next_power_of_two (x) result (y)
integer(kind = nk), intent(in) :: x
integer(kind = nk) :: y
!
! It is assumed that no more than 64 bits are used.
!
! The branch-free algorithm is that of
! https://archive.is/nKxAc#RoundUpPowerOf2
!
! Fill in bits until one less than the desired power of two is
! reached, and then add one.
!
y = x - 1
y = ior (y, ishft (y, -1))
y = ior (y, ishft (y, -2))
y = ior (y, ishft (y, -4))
y = ior (y, ishft (y, -8))
y = ior (y, ishft (y, -16))
y = ior (y, ishft (y, -32))
y = y + 1
end function next_power_of_two
elemental function new_storage_size (length_needed) result (size)
integer(kind = nk), intent(in) :: length_needed
integer(kind = nk) :: size
! Increase storage by orders of magnitude.
if (2_nk**32 < length_needed) then
size = huge (1_nk)
else
size = next_power_of_two (length_needed)
end if
end function new_storage_size
subroutine strbuf_t_ensure_storage (strbuf, length_needed)
class(strbuf_t), intent(inout) :: strbuf
integer(kind = nk), intent(in) :: length_needed
integer(kind = nk) :: new_size
type(strbuf_t) :: new_strbuf
if (.not. allocated (strbuf%chars)) then
! Initialize a new strbuf%chars array.
new_size = new_storage_size (length_needed)
allocate (strbuf%chars(1:new_size))
else if (ubound (strbuf%chars, 1) < length_needed) then
! Allocate a new strbuf%chars array, larger than the current
! one, but containing the same characters.
new_size = new_storage_size (length_needed)
allocate (new_strbuf%chars(1:new_size))
new_strbuf%chars(1:strbuf%len) = strbuf%chars(1:strbuf%len)
call move_alloc (new_strbuf%chars, strbuf%chars)
end if
end subroutine strbuf_t_ensure_storage
subroutine strbuf_t_set (dst, src)
class(strbuf_t), intent(inout) :: dst
class(*), intent(in) :: src
integer(kind = nk) :: n
integer(kind = nk) :: i
select type (src)
type is (character(*, kind = ck))
n = len (src, kind = nk)
call dst%ensure_storage(n)
do i = 1, n
dst%chars(i) = src(i:i)
end do
dst%len = n
type is (character(*))
n = len (src, kind = nk)
call dst%ensure_storage(n)
do i = 1, n
dst%chars(i) = src(i:i)
end do
dst%len = n
class is (strbuf_t)
n = src%len
call dst%ensure_storage(n)
dst%chars(1:n) = src%chars(1:n)
dst%len = n
class default
error stop
end select
end subroutine strbuf_t_set
subroutine strbuf_t_append (dst, src)
class(strbuf_t), intent(inout) :: dst
class(*), intent(in) :: src
integer(kind = nk) :: n_dst, n_src, n
integer(kind = nk) :: i
select type (src)
type is (character(*, kind = ck))
n_dst = dst%len
n_src = len (src, kind = nk)
n = n_dst + n_src
call dst%ensure_storage(n)
do i = 1, n_src
dst%chars(n_dst + i) = src(i:i)
end do
dst%len = n
type is (character(*))
n_dst = dst%len
n_src = len (src, kind = nk)
n = n_dst + n_src
call dst%ensure_storage(n)
do i = 1, n_src
dst%chars(n_dst + i) = src(i:i)
end do
dst%len = n
class is (strbuf_t)
n_dst = dst%len
n_src = src%len
n = n_dst + n_src
call dst%ensure_storage(n)
dst%chars((n_dst + 1):n) = src%chars(1:n_src)
dst%len = n
class default
error stop
end select
end subroutine strbuf_t_append
end module string_buffers
module reading_one_line_from_a_stream
use, intrinsic :: iso_fortran_env, only: input_unit
use, intrinsic :: iso_fortran_env, only: error_unit
use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
use, non_intrinsic :: string_buffers
implicit none
private
! get_line_from_stream: read an entire input line from a stream into
! a strbuf_t.
public :: get_line_from_stream
character(1, kind = ck), parameter :: linefeed_char = char (10, kind = ck)
! The following is correct for Unix and its relatives.
character(1, kind = ck), parameter :: newline_char = linefeed_char
contains
subroutine get_line_from_stream (unit_no, eof, no_newline, strbuf)
integer, intent(in) :: unit_no
logical, intent(out) :: eof ! End of file?
logical, intent(out) :: no_newline ! There is a line but it has no
! newline? (Thus eof also must
! be .true.)
class(strbuf_t), intent(inout) :: strbuf
character(1, kind = ck) :: ch
strbuf = ''
call get_ch (unit_no, eof, ch)
do while (.not. eof .and. ch /= newline_char)
call strbuf%append (ch)
call get_ch (unit_no, eof, ch)
end do
no_newline = eof .and. (strbuf%length() /= 0)
end subroutine get_line_from_stream
subroutine get_ch (unit_no, eof, ch)
!
! Read a single code point from the stream.
!
! Currently this procedure simply inputs ‘ASCII’ bytes rather than
! Unicode code points.
!
integer, intent(in) :: unit_no
logical, intent(out) :: eof
character(1, kind = ck), intent(out) :: ch
integer :: stat
character(1) :: c = '*'
eof = .false.
if (unit_no == input_unit) then
call get_input_unit_char (c, stat)
else
read (unit = unit_no, iostat = stat) c
end if
if (stat < 0) then
ch = ck_'*'
eof = .true.
else if (0 < stat) then
write (error_unit, '("Input error with status code ", I0)') stat
stop 1
else
ch = char (ichar (c, kind = ick), kind = ck)
end if
end subroutine get_ch
!!!
!!! If you tell gfortran you want -std=f2008 or -std=f2018, you likely
!!! will need to add also -fall-intrinsics or -U__GFORTRAN__
!!!
!!! The first way, you get the FGETC intrinsic. The latter way, you
!!! get the C interface code that uses getchar(3).
!!!
#ifdef __GFORTRAN__
subroutine get_input_unit_char (c, stat)
!
! The following works if you are using gfortran.
!
! (FGETC is considered a feature for backwards compatibility with
! g77. However, I know of no way to reconfigure input_unit as a
! Fortran 2003 stream, for use with ordinary ‘read’.)
!
character, intent(inout) :: c
integer, intent(out) :: stat
call fgetc (input_unit, c, stat)
end subroutine get_input_unit_char
#else
subroutine get_input_unit_char (c, stat)
!
! An alternative implementation of get_input_unit_char. This
! actually reads input from the C standard input, which might not
! be the same as input_unit.
!
use, intrinsic :: iso_c_binding, only: c_int
character, intent(inout) :: c
integer, intent(out) :: stat
interface
!
! Use getchar(3) to read characters from standard input. This
! assumes there is actually such a function available, and that
! getchar(3) does not exist solely as a macro. (One could write
! one’s own getchar() if necessary, of course.)
!
function getchar () result (c) bind (c, name = 'getchar')
use, intrinsic :: iso_c_binding, only: c_int
integer(kind = c_int) :: c
end function getchar
end interface
integer(kind = c_int) :: i_char
i_char = getchar ()
!
! The C standard requires that EOF have a negative value. If the
! value returned by getchar(3) is not EOF, then it will be
! representable as an unsigned char. Therefore, to check for end
! of file, one need only test whether i_char is negative.
!
if (i_char < 0) then
stat = -1
else
stat = 0
c = char (i_char)
end if
end subroutine get_input_unit_char
#endif
end module reading_one_line_from_a_stream
module lexer_token_facts
implicit none
private
integer, parameter, public :: tk_EOI = 0
integer, parameter, public :: tk_Mul = 1
integer, parameter, public :: tk_Div = 2
integer, parameter, public :: tk_Mod = 3
integer, parameter, public :: tk_Add = 4
integer, parameter, public :: tk_Sub = 5
integer, parameter, public :: tk_Negate = 6
integer, parameter, public :: tk_Not = 7
integer, parameter, public :: tk_Lss = 8
integer, parameter, public :: tk_Leq = 9
integer, parameter, public :: tk_Gtr = 10
integer, parameter, public :: tk_Geq = 11
integer, parameter, public :: tk_Eq = 12
integer, parameter, public :: tk_Neq = 13
integer, parameter, public :: tk_Assign = 14
integer, parameter, public :: tk_And = 15
integer, parameter, public :: tk_Or = 16
integer, parameter, public :: tk_If = 17
integer, parameter, public :: tk_Else = 18
integer, parameter, public :: tk_While = 19
integer, parameter, public :: tk_Print = 20
integer, parameter, public :: tk_Putc = 21
integer, parameter, public :: tk_Lparen = 22
integer, parameter, public :: tk_Rparen = 23
integer, parameter, public :: tk_Lbrace = 24
integer, parameter, public :: tk_Rbrace = 25
integer, parameter, public :: tk_Semi = 26
integer, parameter, public :: tk_Comma = 27
integer, parameter, public :: tk_Ident = 28
integer, parameter, public :: tk_Integer = 29
integer, parameter, public :: tk_String = 30
integer, parameter, public :: tk_Positive = 31
character(16), parameter, public :: lexer_token_string(0:31) = &
(/ "EOI ", &
& "* ", &
& "/ ", &
& "% ", &
& "+ ", &
& "- ", &
& "- ", &
& "! ", &
& "< ", &
& "<= ", &
& "> ", &
& ">= ", &
& "== ", &
& "!= ", &
& "= ", &
& "&& ", &
& "|| ", &
& "if ", &
& "else ", &
& "while ", &
& "print ", &
& "putc ", &
& "( ", &
& ") ", &
& "{ ", &
& "} ", &
& "; ", &
& ", ", &
& "Ident ", &
& "Integer literal ", &
& "String literal ", &
& "+ " /)
integer, parameter, public :: lexer_token_arity(0:31) = &
& (/ -1, & ! EOI
& 2, 2, 2, 2, 2, & ! * / % + -
& 1, 1, & ! negate !
& 2, 2, 2, 2, 2, 2, & ! < <= > >= == !=
& -1, & ! =
& 2, 2, & ! && ||
& -1, -1, -1, -1, -1, & !
& -1, -1, -1, -1, -1, & !
& -1, -1, -1, -1, & !
& 1 /) ! positive
integer, parameter, public :: lexer_token_precedence(0:31) = &
& (/ -1, & ! EOI
& 13, 13, 13, & ! * / %
& 12, 12, & ! + -
& 14, 14, & ! negate !
& 10, 10, 10, 10, & ! < <= > >=
& 9, 9, & ! == !=
& -1, & ! =
& 5, & ! &&
& 4, & ! ||
& -1, -1, -1, -1, -1, & !
& -1, -1, -1, -1, -1, & !
& -1, -1, -1, -1, & !
& 14 /) ! positive
integer, parameter, public :: left_associative = 0
integer, parameter, public :: right_associative = 1
! All current operators are left associative. (The values in the
! array for things that are not operators are unimportant.)
integer, parameter, public :: lexer_token_associativity(0:31) = left_associative
end module lexer_token_facts
module reading_of_lexer_tokens
use, intrinsic :: iso_fortran_env, only: error_unit
use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
use, non_intrinsic :: string_buffers
use, non_intrinsic :: reading_one_line_from_a_stream
use, non_intrinsic :: lexer_token_facts
implicit none
private
public :: lexer_token_t
public :: get_lexer_token
character(1, kind = ck), parameter :: horizontal_tab_char = char (9, kind = ck)
character(1, kind = ck), parameter :: linefeed_char = char (10, kind = ck)
character(1, kind = ck), parameter :: vertical_tab_char = char (11, kind = ck)
character(1, kind = ck), parameter :: formfeed_char = char (12, kind = ck)
character(1, kind = ck), parameter :: carriage_return_char = char (13, kind = ck)
character(1, kind = ck), parameter :: space_char = ck_' '
type :: lexer_token_t
integer :: token_no = -(huge (1))
character(:, kind = ck), allocatable :: val
integer(nk) :: line_no = -(huge (1_nk))
integer(nk) :: column_no = -(huge (1_nk))
end type lexer_token_t
contains
subroutine get_lexer_token (unit_no, lex_line_no, eof, token)
!
! Lines that are empty or contain only whitespace are tolerated.
!
! Also tolerated are comment lines, whose first character is a
! '!'. It is convenient for debugging to be able to comment out
! lines.
!
! A final line without a newline is *not* tolerated, unless it
! contains only whitespace.
!
! Tolerating whitespace-only lines is partly for the sake of
! reading input cut-and-pasted from a browser display.
!
integer, intent(in) :: unit_no
integer(kind = nk), intent(inout) :: lex_line_no
logical, intent(out) :: eof
type(lexer_token_t), intent(out) :: token
type(strbuf_t) :: strbuf
logical :: no_newline
logical :: input_found
! Let a negative setting initialize the line number.
lex_line_no = max (0_nk, lex_line_no)
strbuf = ''
eof = .false.
input_found = .false.
do while (.not. eof .and. .not. input_found)
call get_line_from_stream (unit_no, eof, no_newline, strbuf)
if (eof) then
if (no_newline) then
lex_line_no = lex_line_no + 1
if (.not. strbuf_is_all_whitespace (strbuf)) then
call start_error_message (lex_line_no)
write (error_unit, '("lexer line ends without a newline")')
stop 1
end if
end if
else
lex_line_no = lex_line_no + 1
input_found = .true.
if (strbuf_is_all_whitespace (strbuf)) then
! A blank line.
input_found = .false.
else if (0 < strbuf%length()) then
if (strbuf%chars(1) == ck_'!') then
! A comment line.
input_found = .false.
end if
end if
end if
end do
token = lexer_token_t ()
if (.not. eof) then
token = strbuf_to_token (lex_line_no, strbuf)
end if
end subroutine get_lexer_token
function strbuf_to_token (lex_line_no, strbuf) result (token)
integer(kind = nk), intent(in) :: lex_line_no
class(strbuf_t), intent(in) :: strbuf
type(lexer_token_t) :: token
character(:, kind = ck), allocatable :: line_no
character(:, kind = ck), allocatable :: column_no
character(:, kind = ck), allocatable :: token_name
character(:, kind = ck), allocatable :: val_string
integer :: stat
integer(kind = nk) :: n
call split_line (lex_line_no, strbuf, line_no, column_no, token_name, val_string)
read (line_no, *, iostat = stat) token%line_no
if (stat /= 0) then
call start_error_message (lex_line_no)
write (error_unit, '("line number field is unreadable or too large")')
stop 1
end if
read (column_no, *, iostat = stat) token%column_no
if (stat /= 0) then
call start_error_message (lex_line_no)
write (error_unit, '("column number field is unreadable or too large")')
stop 1
end if
token%token_no = token_name_to_token_no (lex_line_no, token_name)
select case (token%token_no)
case (tk_Ident)
! I do no checking of identifier names.
allocate (token%val, source = val_string)
case (tk_Integer)
call check_is_all_digits (lex_line_no, val_string)
allocate (token%val, source = val_string)
case (tk_String)
n = len (val_string, kind = nk)
if (n < 2) then
call string_literal_missing_or_no_good
else if (val_string(1:1) /= ck_'"' .or. val_string(n:n) /= ck_'"') then
call string_literal_missing_or_no_good
else
allocate (token%val, source = val_string)
end if
case default
if (len (val_string, kind = nk) /= 0) then
call start_error_message (lex_line_no)
write (error_unit, '("token should not have a value")')
stop 1
end if
end select
contains
subroutine string_literal_missing_or_no_good
call start_error_message (lex_line_no)
write (error_unit, '("""String"" token requires a string literal")')
stop 1
end subroutine string_literal_missing_or_no_good
end function strbuf_to_token
subroutine split_line (lex_line_no, strbuf, line_no, column_no, token_name, val_string)
integer(kind = nk), intent(in) :: lex_line_no
class(strbuf_t), intent(in) :: strbuf
character(:, kind = ck), allocatable, intent(out) :: line_no
character(:, kind = ck), allocatable, intent(out) :: column_no
character(:, kind = ck), allocatable, intent(out) :: token_name
character(:, kind = ck), allocatable, intent(out) :: val_string
integer(kind = nk) :: i, j
i = skip_whitespace (strbuf, 1_nk)
j = skip_non_whitespace (strbuf, i)
line_no = strbuf%to_unicode(i, j - 1)
call check_is_all_digits (lex_line_no, line_no)
i = skip_whitespace (strbuf, j)
j = skip_non_whitespace (strbuf, i)
column_no = strbuf%to_unicode(i, j - 1)
call check_is_all_digits (lex_line_no, column_no)
i = skip_whitespace (strbuf, j)
j = skip_non_whitespace (strbuf, i)
token_name = strbuf%to_unicode(i, j - 1)
i = skip_whitespace (strbuf, j)
if (strbuf%length() < i) then
val_string = ck_''
else if (strbuf%chars(i) == ck_'"') then
j = skip_whitespace_backwards (strbuf, strbuf%length())
if (strbuf%chars(j) == ck_'"') then
val_string = strbuf%to_unicode(i, j)
else
call start_error_message (lex_line_no)
write (error_unit, '("string literal does not end in a double quote")')
stop 1
end if
else
j = skip_non_whitespace (strbuf, i)
val_string = strbuf%to_unicode(i, j - 1)
i = skip_whitespace (strbuf, j)
if (i <= strbuf%length()) then
call start_error_message (lex_line_no)
write (error_unit, '("token line contains unexpected text")')
stop 1
end if
end if
end subroutine split_line
function token_name_to_token_no (lex_line_no, token_name) result (token_no)
integer(kind = nk), intent(in) :: lex_line_no
character(*, kind = ck), intent(in) :: token_name
integer :: token_no
!!
!! This implementation is not optimized in any way, unless the
!! Fortran compiler can optimize the SELECT CASE.
!!
select case (token_name)
case (ck_"End_of_input")
token_no = tk_EOI
case (ck_"Op_multiply")
token_no = tk_Mul
case (ck_"Op_divide")
token_no = tk_Div
case (ck_"Op_mod")
token_no = tk_Mod
case (ck_"Op_add")
token_no = tk_Add
case (ck_"Op_subtract")
token_no = tk_Sub
case (ck_"Op_negate")
token_no = tk_Negate
case (ck_"Op_not")
token_no = tk_Not
case (ck_"Op_less")
token_no = tk_Lss
case (ck_"Op_lessequal ")
token_no = tk_Leq
case (ck_"Op_greater")
token_no = tk_Gtr
case (ck_"Op_greaterequal")
token_no = tk_Geq
case (ck_"Op_equal")
token_no = tk_Eq
case (ck_"Op_notequal")
token_no = tk_Neq
case (ck_"Op_assign")
token_no = tk_Assign
case (ck_"Op_and")
token_no = tk_And
case (ck_"Op_or")
token_no = tk_Or
case (ck_"Keyword_if")
token_no = tk_If
case (ck_"Keyword_else")
token_no = tk_Else
case (ck_"Keyword_while")
token_no = tk_While
case (ck_"Keyword_print")
token_no = tk_Print
case (ck_"Keyword_putc")
token_no = tk_Putc
case (ck_"LeftParen")
token_no = tk_Lparen
case (ck_"RightParen")
token_no = tk_Rparen
case (ck_"LeftBrace")
token_no = tk_Lbrace
case (ck_"RightBrace")
token_no = tk_Rbrace
case (ck_"Semicolon")
token_no = tk_Semi
case (ck_"Comma")
token_no = tk_Comma
case (ck_"Identifier")
token_no = tk_Ident
case (ck_"Integer")
token_no = tk_Integer
case (ck_"String")
token_no = tk_String
case default
call start_error_message (lex_line_no)
write (error_unit, '("unrecognized token name: ", A)') token_name
stop 1
end select
end function token_name_to_token_no
function skip_whitespace (strbuf, i) result (j)
class(strbuf_t), intent(in) :: strbuf
integer(kind = nk), intent(in) :: i
integer(kind = nk) :: j
logical :: done
j = i
done = .false.
do while (.not. done)
if (at_end_of_line (strbuf, j)) then
done = .true.
else if (.not. isspace (strbuf%chars(j))) then
done = .true.
else
j = j + 1
end if
end do
end function skip_whitespace
function skip_non_whitespace (strbuf, i) result (j)
class(strbuf_t), intent(in) :: strbuf
integer(kind = nk), intent(in) :: i
integer(kind = nk) :: j
logical :: done
j = i
done = .false.
do while (.not. done)
if (at_end_of_line (strbuf, j)) then
done = .true.
else if (isspace (strbuf%chars(j))) then
done = .true.
else
j = j + 1
end if
end do
end function skip_non_whitespace
function skip_whitespace_backwards (strbuf, i) result (j)
class(strbuf_t), intent(in) :: strbuf
integer(kind = nk), intent(in) :: i
integer(kind = nk) :: j
logical :: done
j = i
done = .false.
do while (.not. done)
if (j < 1) then
done = .true.
else if (.not. isspace (strbuf%chars(j))) then
done = .true.
else
j = j - 1
end if
end do
end function skip_whitespace_backwards
function at_end_of_line (strbuf, i) result (bool)
class(strbuf_t), intent(in) :: strbuf
integer(kind = nk), intent(in) :: i
logical :: bool
bool = (strbuf%length() < i)
end function at_end_of_line
elemental function strbuf_is_all_whitespace (strbuf) result (bool)
class(strbuf_t), intent(in) :: strbuf
logical :: bool
integer(kind = nk) :: n
integer(kind = nk) :: i
n = strbuf%length()
if (n == 0) then
bool = .true.
else
i = 1
bool = .true.
do while (bool .and. i /= n + 1)
bool = isspace (strbuf%chars(i))
i = i + 1
end do
end if
end function strbuf_is_all_whitespace
elemental function isspace (ch) result (bool)
character(1, kind = ck), intent(in) :: ch
logical :: bool
bool = (ch == horizontal_tab_char) .or. &
& (ch == linefeed_char) .or. &
& (ch == vertical_tab_char) .or. &
& (ch == formfeed_char) .or. &
& (ch == carriage_return_char) .or. &
& (ch == space_char)
end function isspace
elemental function isdigit (ch) result (bool)
character(1, kind = ck), intent(in) :: ch
logical :: bool
integer(kind = ick), parameter :: zero = ichar (ck_'0', kind = ick)
integer(kind = ick), parameter :: nine = ichar (ck_'9', kind = ick)
integer(kind = ick) :: i_ch
i_ch = ichar (ch, kind = ick)
bool = (zero <= i_ch .and. i_ch <= nine)
end function isdigit
subroutine check_is_all_digits (lex_line_no, str)
integer(kind = nk), intent(in) :: lex_line_no
character(*, kind = ck), intent(in) :: str
integer(kind = nk) :: n
integer(kind = nk) :: i
n = len (str, kind = nk)
if (n == 0_nk) then
call start_error_message (lex_line_no)
write (error_unit, '("a required field is missing")')
stop 1
else
do i = 1, n
if (.not. isdigit (str(i:i))) then
call start_error_message (lex_line_no)
write (error_unit, '("a numeric field contains a non-digit")')
stop 1
end if
end do
end if
end subroutine check_is_all_digits
subroutine start_error_message (lex_line_no)
integer(kind = nk), intent(in) :: lex_line_no
write (error_unit, '("Token stream error at line ", I0, ": ")', advance = 'no') &
& lex_line_no
end subroutine start_error_message
end module reading_of_lexer_tokens
module syntactic_analysis
use, intrinsic :: iso_fortran_env, only: input_unit
use, intrinsic :: iso_fortran_env, only: output_unit
use, intrinsic :: iso_fortran_env, only: error_unit
use, non_intrinsic :: compiler_type_kinds, only: nk, ck, ick
use, non_intrinsic :: string_buffers
use, non_intrinsic :: lexer_token_facts
use, non_intrinsic :: reading_of_lexer_tokens
implicit none
private
public :: ast_node_t
public :: ast_t
public :: parse_token_stream
public :: output_ast_flattened
integer, parameter, public :: tk_start_of_statement = -1
integer, parameter, public :: tk_primary = -2
integer, parameter :: node_Identifier = 1
integer, parameter :: node_String = 2
integer, parameter :: node_Integer = 3
integer, parameter :: node_Sequence = 4
integer, parameter :: node_If = 5
integer, parameter :: node_Prtc = 6
integer, parameter :: node_Prts = 7
integer, parameter :: node_Prti = 8
integer, parameter :: node_While = 9
integer, parameter :: node_Assign = 10
integer, parameter :: node_Negate = 11
integer, parameter :: node_Not = 12
integer, parameter :: node_Multiply = 13
integer, parameter :: node_Divide = 14
integer, parameter :: node_Mod = 15
integer, parameter :: node_Add = 16
integer, parameter :: node_Subtract = 17
integer, parameter :: node_Less = 18
integer, parameter :: node_LessEqual = 19
integer, parameter :: node_Greater = 20
integer, parameter :: node_GreaterEqual = 21
integer, parameter :: node_Equal = 22
integer, parameter :: node_NotEqual = 23
integer, parameter :: node_And = 24
integer, parameter :: node_Or = 25
character(16), parameter :: node_variety_string(1:25) = &
(/ "Identifier ", &
& "String ", &
& "Integer ", &
& "Sequence ", &
& "If ", &
& "Prtc ", &
& "Prts ", &
& "Prti ", &
& "While ", &
& "Assign ", &
& "Negate ", &
& "Not ", &
& "Multiply ", &
& "Divide ", &
& "Mod ", &
& "Add ", &
& "Subtract ", &
& "Less ", &
& "LessEqual ", &
& "Greater ", &
& "GreaterEqual ", &
& "Equal ", &
& "NotEqual ", &
& "And ", &
& "Or " /)
type :: ast_node_t
integer :: node_variety
character(:, kind = ck), allocatable :: val
type(ast_node_t), pointer :: left => null ()
type(ast_node_t), pointer :: right => null ()
contains
procedure, pass :: assign => ast_node_t_assign
generic :: assignment(=) => assign
final :: ast_node_t_finalize
end type ast_node_t
! ast_t phases.
integer, parameter :: building = 1
integer, parameter :: completed = 2
type :: ast_t
!
! This type is used to build the subtrees, as well as for the
! completed AST. The difference is in the setting of ‘phase’.
!
type(ast_node_t), pointer :: node => null ()
integer, private :: phase = building
contains
procedure, pass :: assign => ast_t_assign
generic :: assignment(=) => assign
final :: ast_t_finalize
end type ast_t
type(ast_t), parameter :: ast_nil = ast_t (null ())
contains
recursive subroutine ast_node_t_assign (node, other)
class(ast_node_t), intent(out) :: node
class(*), intent(in) :: other
select type (other)
class is (ast_node_t)
node%node_variety = other%node_variety
if (allocated (other%val)) allocate (node%val, source = other%val)
if (associated (other%left)) allocate (node%left, source = other%left)
if (associated (other%right)) allocate (node%right, source = other%right)
class default
! This branch should never be reached.
error stop
end select
end subroutine ast_node_t_assign
recursive subroutine ast_node_t_finalize (node)
type(ast_node_t), intent(inout) :: node
if (associated (node%left)) deallocate (node%left)
if (associated (node%right)) deallocate (node%right)
end subroutine ast_node_t_finalize
recursive subroutine ast_t_assign (ast, other)
class(ast_t), intent(out) :: ast
class(*), intent(in) :: other
select type (other)
class is (ast_t)
if (associated (other%node)) allocate (ast%node, source = other%node)
!
! Whether it is better to set phase to ‘building’ or to set it
! to ‘other%phase’ is unclear to me. Probably ‘building’ is the
! better choice: in that case, which variable controls memory
! recovery is clear and unchanging, namely the original, ‘other’.
!
ast%phase = building
class default
! This should not happen.
error stop
end select
end subroutine ast_t_assign
subroutine ast_t_finalize (ast)
type(ast_t), intent(inout) :: ast
!
! When we are building the tree, the tree’s nodes should not be
! deallocated when the ast_t variable temporarily holding them
! goes out of scope.
!
! However, once the AST is completed, we do want the memory
! recovered when the variable goes out of scope.
!
! (Elsewhere I have written a primitive garbage collector for
! Fortran programs, but in this case it would be a lot of overhead
! for little gain. In fact, in this program we could reasonably
! just let the memory leak.
!
! Fortran runtimes *are* allowed by the standard to have garbage
! collectors built in. To my knowledge, at the time of this
! writing, only NAG Fortran has a garbage collector option.)
!
if (ast%phase == completed) then
if (associated (ast%node)) deallocate (ast%node)
end if
end subroutine ast_t_finalize
function parse_token_stream (unit_no) result (ast)
integer, intent(in) :: unit_no
type(ast_t) :: ast
integer(kind = nk) :: lex_line_no
type(ast_t) :: statement
type(lexer_token_t) :: token
lex_line_no = -1_nk
call get_token (unit_no, lex_line_no, token)
call parse_statement (unit_no, lex_line_no, token, statement)
ast = make_internal_node (node_Sequence, ast, statement)
do while (token%token_no /= tk_EOI)
call parse_statement (unit_no, lex_line_no, token, statement)
ast = make_internal_node (node_Sequence, ast, statement)
end do
ast%phase = completed
end function parse_token_stream
recursive subroutine parse_statement (unit_no, lex_line_no, token, ast)
integer, intent(in) :: unit_no
integer(kind = nk), intent(inout) :: lex_line_no
type(lexer_token_t), intent(inout) :: token
type(ast_t), intent(out) :: ast
ast = ast_nil
select case (token%token_no)
case (tk_If)
call parse_ifelse_construct
case (tk_Putc)
call parse_putc
case (tk_Print)
call parse_print
case (tk_Semi)
call get_token (unit_no, lex_line_no, token)
case (tk_Ident)
call parse_identifier
case (tk_While)
call parse_while_construct
case (tk_Lbrace)
call parse_lbrace_construct
case (tk_EOI)
continue
case default
call syntax_error_message ("", tk_start_of_statement, token)
stop 1
end select
contains
recursive subroutine parse_ifelse_construct
type(ast_t) :: predicate
type(ast_t) :: statement_for_predicate_true
type(ast_t) :: statement_for_predicate_false
call expect_token ("If", tk_If, token)
call get_token (unit_no, lex_line_no, token)
call parse_parenthesized_expression (unit_no, lex_line_no, token, predicate)
call parse_statement (unit_no, lex_line_no, token, statement_for_predicate_true)
if (token%token_no == tk_Else) then
call get_token (unit_no, lex_line_no, token)
call parse_statement (unit_no, lex_line_no, token, statement_for_predicate_false)
ast = make_internal_node (node_If, statement_for_predicate_true, &
& statement_for_predicate_false)
else
ast = make_internal_node (node_If, statement_for_predicate_true, ast_nil)
end if
ast = make_internal_node (node_If, predicate, ast)
end subroutine parse_ifelse_construct
recursive subroutine parse_putc
type(ast_t) :: arguments
call expect_token ("Putc", tk_Putc, token)
call get_token (unit_no, lex_line_no, token)
call parse_parenthesized_expression (unit_no, lex_line_no, token, arguments)
ast = make_internal_node (node_Prtc, arguments, ast_nil)
call expect_token ("Putc", tk_Semi, token)
call get_token (unit_no, lex_line_no, token)
end subroutine parse_putc
recursive subroutine parse_print
logical :: done
type(ast_t) :: arg
type(ast_t) :: printer
call expect_token ("Print", tk_Print, token)
call get_token (unit_no, lex_line_no, token)
call expect_token ("Print", tk_Lparen, token)
done = .false.
do while (.not. done)
call get_token (unit_no, lex_line_no, token)
select case (token%token_no)
case (tk_String)
arg = make_leaf_node (node_String, token%val)
printer = make_internal_node (node_Prts, arg, ast_nil)
call get_token (unit_no, lex_line_no, token)
case default
call parse_expression (unit_no, 0, lex_line_no, token, arg)
printer = make_internal_node (node_Prti, arg, ast_nil)
end select
ast = make_internal_node (node_Sequence, ast, printer)
done = (token%token_no /= tk_Comma)
end do
call expect_token ("Print", tk_Rparen, token)
call get_token (unit_no, lex_line_no, token)
call expect_token ("Print", tk_Semi, token)
call get_token (unit_no, lex_line_no, token)
end subroutine parse_print
recursive subroutine parse_identifier
type(ast_t) :: left_side
type(ast_t) :: right_side
left_side = make_leaf_node (node_Identifier, token%val)
call get_token (unit_no, lex_line_no, token)
call expect_token ("assign", tk_Assign, token)
call get_token (unit_no, lex_line_no, token)
call parse_expression (unit_no, 0, lex_line_no, token, right_side)
ast = make_internal_node (node_Assign, left_side, right_side)
call expect_token ("assign", tk_Semi, token)
call get_token (unit_no, lex_line_no, token)
end subroutine parse_identifier
recursive subroutine parse_while_construct
type(ast_t) :: predicate
type(ast_t) :: statement_to_be_repeated
call expect_token ("While", tk_While, token)
call get_token (unit_no, lex_line_no, token)
call parse_parenthesized_expression (unit_no, lex_line_no, token, predicate)
call parse_statement (unit_no, lex_line_no, token, statement_to_be_repeated)
ast = make_internal_node (node_While, predicate, statement_to_be_repeated)
end subroutine parse_while_construct
recursive subroutine parse_lbrace_construct
type(ast_t) :: statement
call expect_token ("Lbrace", tk_Lbrace, token)
call get_token (unit_no, lex_line_no, token)
do while (token%token_no /= tk_Rbrace .and. token%token_no /= tk_EOI)
call parse_statement (unit_no, lex_line_no, token, statement)
ast = make_internal_node (node_Sequence, ast, statement)
end do
call expect_token ("Lbrace", tk_Rbrace, token)
call get_token (unit_no, lex_line_no, token)
end subroutine parse_lbrace_construct
end subroutine parse_statement
recursive subroutine parse_expression (unit_no, p, lex_line_no, token, ast)
integer, intent(in) :: unit_no
integer, intent(in) :: p
integer(kind = nk), intent(inout) :: lex_line_no
type(lexer_token_t), intent(inout) :: token
type(ast_t), intent(out) :: ast
integer :: precedence
type(ast_t) :: expression
select case (token%token_no)
case (tk_Lparen)
call parse_parenthesized_expression (unit_no, lex_line_no, token, ast)
case (tk_Sub)
token%token_no = tk_Negate
precedence = lexer_token_precedence(token%token_no)
call get_token (unit_no, lex_line_no, token)
call parse_expression (unit_no, precedence, lex_line_no, token, expression)
ast = make_internal_node (node_Negate, expression, ast_nil)
case (tk_Add)
token%token_no = tk_Positive
precedence = lexer_token_precedence(token%token_no)
call get_token (unit_no, lex_line_no, token)
call parse_expression (unit_no, precedence, lex_line_no, token, expression)
ast = expression
case (tk_Not)
precedence = lexer_token_precedence(token%token_no)
call get_token (unit_no, lex_line_no, token)
call parse_expression (unit_no, precedence, lex_line_no, token, expression)
ast = make_internal_node (node_Not, expression, ast_nil)
case (tk_Ident)
ast = make_leaf_node (node_Identifier, token%val)
call get_token (unit_no, lex_line_no, token)
case (tk_Integer)
ast = make_leaf_node (node_Integer, token%val)
call get_token (unit_no, lex_line_no, token)
case default
call syntax_error_message ("", tk_primary, token)
stop 1
end select
do while (lexer_token_arity(token%token_no) == 2 .and. &
& p <= lexer_token_precedence(token%token_no))
block
type(ast_t) :: right_expression
integer :: q
integer :: node_variety
if (lexer_token_associativity(token%token_no) == right_associative) then
q = lexer_token_precedence(token%token_no)
else
q = lexer_token_precedence(token%token_no) + 1
end if
node_variety = binary_operator_node_variety (token%token_no)
call get_token (unit_no, lex_line_no, token)
call parse_expression (unit_no, q, lex_line_no, token, right_expression)
ast = make_internal_node (node_variety, ast, right_expression)
end block
end do
end subroutine parse_expression
recursive subroutine parse_parenthesized_expression (unit_no, lex_line_no, token, ast)
integer, intent(in) :: unit_no
integer(kind = nk), intent(inout) :: lex_line_no
type(lexer_token_t), intent(inout) :: token
type(ast_t), intent(out) :: ast
call expect_token ("paren_expr", tk_Lparen, token)
call get_token (unit_no, lex_line_no, token)
call parse_expression (unit_no, 0, lex_line_no, token, ast)
call expect_token ("paren_expr", tk_Rparen, token)
call get_token (unit_no, lex_line_no, token)
end subroutine parse_parenthesized_expression
elemental function binary_operator_node_variety (token_no) result (node_variety)
integer, intent(in) :: token_no
integer :: node_variety
select case (token_no)
case (tk_Mul)
node_variety = node_Multiply
case (tk_Div)
node_variety = node_Divide
case (tk_Mod)
node_variety = node_Mod
case (tk_Add)
node_variety = node_Add
case (tk_Sub)
node_variety = node_Subtract
case (tk_Lss)
node_variety = node_Less
case (tk_Leq)
node_variety = node_LessEqual
case (tk_Gtr)
node_variety = node_Greater
case (tk_Geq)
node_variety = node_GreaterEqual
case (tk_Eq)
node_variety = node_Equal
case (tk_Neq)
node_variety = node_NotEqual
case (tk_And)
node_variety = node_And
case (tk_Or)
node_variety = node_Or
case default
! This branch should never be reached.
error stop
end select
end function binary_operator_node_variety
function make_internal_node (node_variety, left, right) result (ast)
integer, intent(in) :: node_variety
class(ast_t), intent(in) :: left, right
type(ast_t) :: ast
type(ast_node_t), pointer :: node
allocate (node)
node%node_variety = node_variety
node%left => left%node
node%right => right%node
ast%node => node
end function make_internal_node
function make_leaf_node (node_variety, val) result (ast)
integer, intent(in) :: node_variety
character(*, kind = ck), intent(in) :: val
type(ast_t) :: ast
type(ast_node_t), pointer :: node
allocate (node)
node%node_variety = node_variety
node%val = val
ast%node => node
end function make_leaf_node
subroutine get_token (unit_no, lex_line_no, token)
integer, intent(in) :: unit_no
integer(kind = nk), intent(inout) :: lex_line_no
type(lexer_token_t), intent(out) :: token
logical :: eof
call get_lexer_token (unit_no, lex_line_no, eof, token)
if (eof) then
write (error_unit, '("Parser error: the stream of input tokens is incomplete")')
stop 1
end if
end subroutine get_token
subroutine expect_token (message, token_no, token)
character(*), intent(in) :: message
integer, intent (in) :: token_no
class(lexer_token_t), intent(in) :: token
if (token%token_no /= token_no) then
call syntax_error_message (message, token_no, token)
stop 1
end if
end subroutine expect_token
subroutine syntax_error_message (message, expected_token_no, token)
character(*), intent(in) :: message
integer, intent(in) :: expected_token_no
class(lexer_token_t), intent(in) :: token
! Write a message to an output unit dedicated to printing
! errors. The message could, of course, be more detailed than what
! we are doing here.
write (error_unit, '("Syntax error at ", I0, ".", I0)') &
& token%line_no, token%column_no
!
! For the sake of the exercise, also write, to output_unit, a
! message in the style of the C reference program.
!
write (output_unit, '("(", I0, ", ", I0, ") error: ")', advance = 'no') &
& token%line_no, token%column_no
select case (expected_token_no)
case (tk_start_of_statement)
write (output_unit, '("expecting start of statement, found ''", 1A, "''")') &
& trim (lexer_token_string(token%token_no))
case (tk_primary)
write (output_unit, '("Expecting a primary, found ''", 1A, "''")') &
& trim (lexer_token_string(token%token_no))
case default
write (output_unit, '(1A, ": Expecting ''", 1A, "'', found ''", 1A, "''")') &
& trim (message), trim (lexer_token_string(expected_token_no)), &
& trim (lexer_token_string(token%token_no))
end select
end subroutine syntax_error_message
subroutine output_ast_flattened (unit_no, ast)
integer, intent(in) :: unit_no
type(ast_t), intent(in) :: ast
call output_ast_node_flattened (unit_no, ast%node)
end subroutine output_ast_flattened
recursive subroutine output_ast_node_flattened (unit_no, node)
integer, intent(in) :: unit_no
type(ast_node_t), pointer, intent(in) :: node
if (.not. associated (node)) then
write (unit_no, '(";")')
else
if (allocated (node%val)) then
write (unit_no, '(1A16, 2X, 1A)') &
& node_variety_string(node%node_variety), node%val
else
write (unit_no, '(1A)') &
& trim (node_variety_string(node%node_variety))
call output_ast_node_flattened (unit_no, node%left)
call output_ast_node_flattened (unit_no, node%right)
end if
end if
end subroutine output_ast_node_flattened
end module syntactic_analysis
program parse
use, intrinsic :: iso_fortran_env, only: input_unit
use, intrinsic :: iso_fortran_env, only: output_unit
use, intrinsic :: iso_fortran_env, only: error_unit
use, non_intrinsic :: syntactic_analysis
implicit none
integer, parameter :: inp_unit_no = 100
integer, parameter :: outp_unit_no = 101
integer :: arg_count
character(200) :: arg
integer :: inp
integer :: outp
arg_count = command_argument_count ()
if (3 <= arg_count) then
call print_usage
else
if (arg_count == 0) then
inp = input_unit
outp = output_unit
else if (arg_count == 1) then
call get_command_argument (1, arg)
inp = open_for_input (trim (arg))
outp = output_unit
else if (arg_count == 2) then
call get_command_argument (1, arg)
inp = open_for_input (trim (arg))
call get_command_argument (2, arg)
outp = open_for_output (trim (arg))
end if
block
type(ast_t) :: ast
ast = parse_token_stream (inp)
call output_ast_flattened (outp, ast)
end block
end if
contains
function open_for_input (filename) result (unit_no)
character(*), intent(in) :: filename
integer :: unit_no
integer :: stat
open (unit = inp_unit_no, file = filename, status = 'old', &
& action = 'read', access = 'stream', form = 'unformatted', &
& iostat = stat)
if (stat /= 0) then
write (error_unit, '("Error: failed to open ", 1A, " for input")') filename
stop 1
end if
unit_no = inp_unit_no
end function open_for_input
function open_for_output (filename) result (unit_no)
character(*), intent(in) :: filename
integer :: unit_no
integer :: stat
open (unit = outp_unit_no, file = filename, action = 'write', iostat = stat)
if (stat /= 0) then
write (error_unit, '("Error: failed to open ", 1A, " for output")') filename
stop 1
end if
unit_no = outp_unit_no
end function open_for_output
subroutine print_usage
character(200) :: progname
call get_command_argument (0, progname)
write (output_unit, '("Usage: ", 1A, " [INPUT_FILE [OUTPUT_FILE]]")') &
& trim (progname)
end subroutine print_usage
end program parse
- Output:
Prime numbers example:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Go
package main
import (
"bufio"
"fmt"
"log"
"os"
"strconv"
"strings"
)
type TokenType int
const (
tkEOI TokenType = iota
tkMul
tkDiv
tkMod
tkAdd
tkSub
tkNegate
tkNot
tkLss
tkLeq
tkGtr
tkGeq
tkEql
tkNeq
tkAssign
tkAnd
tkOr
tkIf
tkElse
tkWhile
tkPrint
tkPutc
tkLparen
tkRparen
tkLbrace
tkRbrace
tkSemi
tkComma
tkIdent
tkInteger
tkString
)
type NodeType int
const (
ndIdent NodeType = iota
ndString
ndInteger
ndSequence
ndIf
ndPrtc
ndPrts
ndPrti
ndWhile
ndAssign
ndNegate
ndNot
ndMul
ndDiv
ndMod
ndAdd
ndSub
ndLss
ndLeq
ndGtr
ndGeq
ndEql
ndNeq
ndAnd
ndOr
)
type tokS struct {
tok TokenType
errLn int
errCol int
text string // ident or string literal or integer value
}
type Tree struct {
nodeType NodeType
left *Tree
right *Tree
value string
}
// dependency: Ordered by tok, must remain in same order as TokenType consts
type atr struct {
text string
enumText string
tok TokenType
rightAssociative bool
isBinary bool
isUnary bool
precedence int
nodeType NodeType
}
var atrs = []atr{
{"EOI", "End_of_input", tkEOI, false, false, false, -1, -1},
{"*", "Op_multiply", tkMul, false, true, false, 13, ndMul},
{"/", "Op_divide", tkDiv, false, true, false, 13, ndDiv},
{"%", "Op_mod", tkMod, false, true, false, 13, ndMod},
{"+", "Op_add", tkAdd, false, true, false, 12, ndAdd},
{"-", "Op_subtract", tkSub, false, true, false, 12, ndSub},
{"-", "Op_negate", tkNegate, false, false, true, 14, ndNegate},
{"!", "Op_not", tkNot, false, false, true, 14, ndNot},
{"<", "Op_less", tkLss, false, true, false, 10, ndLss},
{"<=", "Op_lessequal", tkLeq, false, true, false, 10, ndLeq},
{">", "Op_greater", tkGtr, false, true, false, 10, ndGtr},
{">=", "Op_greaterequal", tkGeq, false, true, false, 10, ndGeq},
{"==", "Op_equal", tkEql, false, true, false, 9, ndEql},
{"!=", "Op_notequal", tkNeq, false, true, false, 9, ndNeq},
{"=", "Op_assign", tkAssign, false, false, false, -1, ndAssign},
{"&&", "Op_and", tkAnd, false, true, false, 5, ndAnd},
{"||", "Op_or", tkOr, false, true, false, 4, ndOr},
{"if", "Keyword_if", tkIf, false, false, false, -1, ndIf},
{"else", "Keyword_else", tkElse, false, false, false, -1, -1},
{"while", "Keyword_while", tkWhile, false, false, false, -1, ndWhile},
{"print", "Keyword_print", tkPrint, false, false, false, -1, -1},
{"putc", "Keyword_putc", tkPutc, false, false, false, -1, -1},
{"(", "LeftParen", tkLparen, false, false, false, -1, -1},
{")", "RightParen", tkRparen, false, false, false, -1, -1},
{"{", "LeftBrace", tkLbrace, false, false, false, -1, -1},
{"}", "RightBrace", tkRbrace, false, false, false, -1, -1},
{";", "Semicolon", tkSemi, false, false, false, -1, -1},
{",", "Comma", tkComma, false, false, false, -1, -1},
{"Ident", "Identifier", tkIdent, false, false, false, -1, ndIdent},
{"Integer literal", "Integer", tkInteger, false, false, false, -1, ndInteger},
{"String literal", "String", tkString, false, false, false, -1, ndString},
}
var displayNodes = []string{
"Identifier", "String", "Integer", "Sequence", "If", "Prtc", "Prts", "Prti",
"While", "Assign", "Negate", "Not", "Multiply", "Divide", "Mod", "Add",
"Subtract", "Less", "LessEqual", "Greater", "GreaterEqual", "Equal",
"NotEqual", "And", "Or",
}
var (
err error
token tokS
scanner *bufio.Scanner
)
func reportError(errLine, errCol int, msg string) {
log.Fatalf("(%d, %d) error : %s\n", errLine, errCol, msg)
}
func check(err error) {
if err != nil {
log.Fatal(err)
}
}
func getEnum(name string) TokenType { // return the internal token type for name
for _, atr := range atrs {
if atr.enumText == name {
return atr.tok
}
}
reportError(0, 0, fmt.Sprintf("Unknown token %s\n", name))
return tkEOI
}
func getTok() tokS {
tok := tokS{}
if scanner.Scan() {
line := strings.TrimRight(scanner.Text(), " \t")
fields := strings.Fields(line)
// [ ]*{lineno}[ ]+{colno}[ ]+token[ ]+optional
tok.errLn, err = strconv.Atoi(fields[0])
check(err)
tok.errCol, err = strconv.Atoi(fields[1])
check(err)
tok.tok = getEnum(fields[2])
le := len(fields)
if le == 4 {
tok.text = fields[3]
} else if le > 4 {
idx := strings.Index(line, `"`)
tok.text = line[idx:]
}
}
check(scanner.Err())
return tok
}
func makeNode(nodeType NodeType, left *Tree, right *Tree) *Tree {
return &Tree{nodeType, left, right, ""}
}
func makeLeaf(nodeType NodeType, value string) *Tree {
return &Tree{nodeType, nil, nil, value}
}
func expect(msg string, s TokenType) {
if token.tok == s {
token = getTok()
return
}
reportError(token.errLn, token.errCol,
fmt.Sprintf("%s: Expecting '%s', found '%s'\n", msg, atrs[s].text, atrs[token.tok].text))
}
func expr(p int) *Tree {
var x, node *Tree
switch token.tok {
case tkLparen:
x = parenExpr()
case tkSub, tkAdd:
op := token.tok
token = getTok()
node = expr(atrs[tkNegate].precedence)
if op == tkSub {
x = makeNode(ndNegate, node, nil)
} else {
x = node
}
case tkNot:
token = getTok()
x = makeNode(ndNot, expr(atrs[tkNot].precedence), nil)
case tkIdent:
x = makeLeaf(ndIdent, token.text)
token = getTok()
case tkInteger:
x = makeLeaf(ndInteger, token.text)
token = getTok()
default:
reportError(token.errLn, token.errCol,
fmt.Sprintf("Expecting a primary, found: %s\n", atrs[token.tok].text))
}
for atrs[token.tok].isBinary && atrs[token.tok].precedence >= p {
op := token.tok
token = getTok()
q := atrs[op].precedence
if !atrs[op].rightAssociative {
q++
}
node = expr(q)
x = makeNode(atrs[op].nodeType, x, node)
}
return x
}
func parenExpr() *Tree {
expect("parenExpr", tkLparen)
t := expr(0)
expect("parenExpr", tkRparen)
return t
}
func stmt() *Tree {
var t, v, e, s, s2 *Tree
switch token.tok {
case tkIf:
token = getTok()
e = parenExpr()
s = stmt()
s2 = nil
if token.tok == tkElse {
token = getTok()
s2 = stmt()
}
t = makeNode(ndIf, e, makeNode(ndIf, s, s2))
case tkPutc:
token = getTok()
e = parenExpr()
t = makeNode(ndPrtc, e, nil)
expect("Putc", tkSemi)
case tkPrint: // print '(' expr {',' expr} ')'
token = getTok()
for expect("Print", tkLparen); ; expect("Print", tkComma) {
if token.tok == tkString {
e = makeNode(ndPrts, makeLeaf(ndString, token.text), nil)
token = getTok()
} else {
e = makeNode(ndPrti, expr(0), nil)
}
t = makeNode(ndSequence, t, e)
if token.tok != tkComma {
break
}
}
expect("Print", tkRparen)
expect("Print", tkSemi)
case tkSemi:
token = getTok()
case tkIdent:
v = makeLeaf(ndIdent, token.text)
token = getTok()
expect("assign", tkAssign)
e = expr(0)
t = makeNode(ndAssign, v, e)
expect("assign", tkSemi)
case tkWhile:
token = getTok()
e = parenExpr()
s = stmt()
t = makeNode(ndWhile, e, s)
case tkLbrace: // {stmt}
for expect("Lbrace", tkLbrace); token.tok != tkRbrace && token.tok != tkEOI; {
t = makeNode(ndSequence, t, stmt())
}
expect("Lbrace", tkRbrace)
case tkEOI:
// do nothing
default:
reportError(token.errLn, token.errCol,
fmt.Sprintf("expecting start of statement, found '%s'\n", atrs[token.tok].text))
}
return t
}
func parse() *Tree {
var t *Tree
token = getTok()
for {
t = makeNode(ndSequence, t, stmt())
if t == nil || token.tok == tkEOI {
break
}
}
return t
}
func prtAst(t *Tree) {
if t == nil {
fmt.Print(";\n")
} else {
fmt.Printf("%-14s ", displayNodes[t.nodeType])
if t.nodeType == ndIdent || t.nodeType == ndInteger || t.nodeType == ndString {
fmt.Printf("%s\n", t.value)
} else {
fmt.Println()
prtAst(t.left)
prtAst(t.right)
}
}
}
func main() {
source, err := os.Open("source.txt")
check(err)
defer source.Close()
scanner = bufio.NewScanner(source)
prtAst(parse())
}
- Output:
Prime numbers example:
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
Icon
I use co-expressions in a way that could easily be done without them, but I prefer co-expressions here. (They can be sluggish or fast, depending on which Icon implementation you are running; in this case the speed difference is of little concern.)
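For readers unfamiliar with co-expressions: a rough analogue in other languages is a coroutine or generator that yields the next token on demand. The following is a minimal, hedged sketch of that idea in Go (not part of the Icon solution; the token strings are made up), using a goroutine feeding a channel:
package main

import "fmt"

// tokenSource plays the role of a co-expression used as a token generator:
// a goroutine pushes tokens into a channel, and the consumer pulls the next
// token only when it asks for it.
func tokenSource(tokens []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, t := range tokens {
			out <- t // blocks, like a suspended co-expression, until the consumer resumes it
		}
	}()
	return out
}

func main() {
	next := tokenSource([]string{"Identifier count", "Op_assign", "Integer 1", "Semicolon"})
	for tok := range next {
		fmt.Println(tok)
	}
}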
- Output:
$ icont -s -u parse.icn && ./parse compiler-tests/primes.lex
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;
J
Implementation:
require'format/printf'
tkref=: tokenize 'End_of_input*/%+--<<=>>===!=!&&||print=print(if{else}while;,putc)a""0'
tkref,. (tknames)=: tknames=:;: {{)n
End_of_input Op_multiply Op_divide Op_mod Op_add Op_subtract Op_negate Op_less
Op_lessequal Op_greater Op_greaterequal Op_equal Op_notequal Op_not Op_and
Op_or Keyword_print Op_assign Keyword_print LeftParen Keyword_if LeftBrace Keyword_else RightBrace
Keyword_while Semicolon Comma Keyword_putc RightParen
Identifier String Integer
}}-.LF
tkV=: 2 (tkref i.tokenize '*/%+-<<=>>===!=&&||')} (#tktyp)#0
tkV=: 1 (1 0+tkref i.tokenize '-!')} tkV
tkPrec=: 13 13 13 12 12 10 10 10 10 9 9 5 4 (tkref i.tokenize'*/%+-<<=>>==!=&&||')} tkV<._1
tkPrec=: 14 (1 0+tkref i.tokenize'-!')} tkPrec
NB. proofread |:(<"1 tkV,.tkPrec),tkref,:tknames
tkref,.(ndDisp)=: ndDisp=:;:{{)n
Sequence Multiply Divide Mod Add Subtract Negate Less LessEqual Greater
GreaterEqual Equal NotEqual Not And Or Prts Assign Prti x If x x x While
x x Prtc x Identifier String Integer
}}-.LF
NB. proofread |:tkref,:ndDisp
gettoken=: {{
'tok_ln tok_col'=: (0;ndx){::x
'tok_name tok_value'=: (1;ndx){::x
if. 'Error'-:tok_name do.
error 'invalid word ',":tok_value
end.
ind=. tknames i.<tok_name
tok_text=: ind{::tkref
tok_valence=: ind{::tkV
tok_precedence=: ind{::tkPrec
ndx=:ndx+1
node_display=: ind{::ndDisp
}}
parse=: {{
ndx=: tok_ln=: tok_col=: 0
gettok=: y&gettoken
gettok''
t=.a:
whilst.-.(a:-:t)+.tok_name-:End_of_input do.
t=. Sequence make_node t stmt''
end.
}}
stmt=:{{)v
t=. a:
select.tok_name
case.Keyword_if do.
s=. stmt e=. paren_expr gettok''
if.Keyword_else-:tok_name
do. S=. stmt gettok''
else. S=. a: end.
t=. If make_node e If make_node s S
case.Keyword_putc do.
e=. paren_expr gettok''
t=. Prtc make_node e a:
Prtc expect Semicolon
case.Keyword_print do.gettok''
'Print' expect LeftParen
while.do.
if.String-:tok_name
do. gettok e=. Prts make_node (String make_leaf tok_value) a:
else. e=. Prti make_node (expr 0) a: end.
t=. Sequence make_node t e
if.Comma-:tok_name
do.Comma expect Comma
else.break.end.
end.
'Print' expect RightParen
'Print' expect Semicolon
case.Semicolon do.gettok''
case.Identifier do.
gettok v=. Identifier make_leaf tok_value
Assign expect Op_assign
t=. Assign make_node v e=. expr 0
Assign expect Semicolon
case.Keyword_while do.
t=. While make_node e s=. stmt e=. paren_expr gettok''
case.LeftBrace do.
'LeftBrace' expect LeftBrace
while.-.(<tok_name) e. RightBrace;End_of_input do.
t=. Sequence make_node t stmt''
end.
'LeftBrace' expect RightBrace
case.End_of_input do.
case.do. error 'Expecting start of statement, found %s'sprintf<tok_text
end.
t
}}
paren_expr=: {{
'paren_expr' expect LeftParen
t=. expr 0
'paren_expr' expect RightParen
t
}}
not_prec=: tkPrec{~tknames i.<Op_not
expr=: {{
select.tok_name
case.LeftParen do.e=. paren_expr''
case.Op_add do.gettok''
e=. expr not_prec
case.Op_subtract do.gettok''
e=. Negate make_node (expr not_prec) a:
case.Op_not do.gettok''
e=. Not make_node (expr not_prec) a:
case.Identifier do.
gettok e=. Identifier make_leaf tok_value
case.Integer do.
gettok e=. Integer make_leaf tok_value
case.do. error 'Expecting a primary, found %s'sprintf<tok_text
end.
while.(2=tok_valence)*tok_precedence>:y do.
q=. 1+tok_precedence [ op=. node_display NB. no right associative operators
gettok''
node=. expr q
e=. op make_node e node
end.
e
}}
expect=: {{
if.y-:tok_name do. gettok'' return.end.
error '%s: Expecting "%s", found "%s"'sprintf x;(tkref{::~tknames i.<y);tok_text
}}
make_leaf=: {{
x;y
}}
make_node=: {{
m;n;<y
}}
error=: {{
echo 'Error: line %d, column %d: %s\n'sprintf tok_ln;tok_col;y throw.
}}
syntax=: {{
;(flatAST parse y),each LF
}}
flatAST=: {{
assert.*L.y
select.#y
case.1 do.<';' assert.y-:a:
case.2 do.<;:inv ":each y
case.3 do.({.y),(flatAST 1{::y),flatAST 2{::y
case.do.assert.0
end.
}}
Some quirks worth noting:
(1) '+' appears in the productions for 'primary' and 'addition_expr' but has only one node type (because we do not represent its appearance in 'primary' with a node).
(2) '-' and 'print' do have two node types (which we sort out on the fly).
(3) In this implementation, we require a 1:1 mapping between the data structure representing token types and the data structure representing node types. This means two token entries for both - and print (the second instance of both gets ignored by the lexer).
(4) Because the data structure produced by the lexer is independent of any particular type-system implementation, the parser may use the same type system as the lexer or a different one; either way works, as long as both implementations are consistent with the spec.
(5) In this context, parallel constant arrays represent the token and node types (a minimal sketch of this idea appears below).
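To make quirks (3) and (5) concrete, here is a minimal sketch, written in Go rather than J and with hypothetical names, of parallel constant arrays in which '-' and 'print' each get two token entries mapped onto distinct node types; the lexer only ever emits the first entry, and the parser switches to the second meaning on the fly:
package main

import "fmt"

// Parallel constant arrays: index i in tokNames pairs with index i in nodeNames.
// "-" and "print" appear twice because they correspond to two different node
// types (Subtract vs Negate, Prti vs Prts).
var tokNames = []string{
	"Op_subtract", "Op_negate", "Keyword_print", "Keyword_print", "Identifier",
}
var nodeNames = []string{
	"Subtract", "Negate", "Prti", "Prts", "Identifier",
}

// nodeFor returns the node type paired with the first matching token entry.
func nodeFor(tok string) string {
	for i, name := range tokNames {
		if name == tok {
			return nodeNames[i]
		}
	}
	return ""
}

func main() {
	fmt.Println(nodeFor("Op_subtract"))   // Subtract
	fmt.Println(nodeFor("Keyword_print")) // Prti (the parser substitutes Prts for string arguments)
}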
Task example:
primes=: {{)n
/*
Simple prime number generator
*/
count = 1;
n = 1;
limit = 100;
while (n < limit) {
k=3;
p=1;
n=n+2;
while ((k*k<=n) && (p)) {
p=n/k*k!=n;
k=k+2;
}
if (p) {
print(n, " is prime\n");
count = count + 1;
}
}
print("Total primes found: ", count, "\n");
}}
syntax lex primes
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier count
Integer 1
Assign
Identifier n
Integer 1
Assign
Identifier limit
Integer 100
While
Less
Identifier n
Identifier limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier k
Integer 3
Assign
Identifier p
Integer 1
Assign
Identifier n
Add
Identifier n
Integer 2
While
And
LessEqual
Multiply
Identifier k
Identifier k
Identifier n
Identifier p
Sequence
Sequence
;
Assign
Identifier p
NotEqual
Divide
Identifier n
Multiply
Identifier k
Identifier k
Identifier n
Assign
Identifier k
Add
Identifier k
Integer 2
If
Identifier p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier n
;
Prts
String " is prime\n"
;
Assign
Identifier count
Add
Identifier count
Integer 1
;
Sequence
Sequence
Sequence
;
Prts
String "Total primes found: "
;
Prti
Identifier count
;
Prts
String "\n"
;
Java
Usage: java Parser infile [>outfile]
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.StringTokenizer;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.util.HashMap;
class Parser {
private List<Token> source;
private Token token;
private int position;
static class Node {
public NodeType nt;
public Node left, right;
public String value;
Node() {
this.nt = null;
this.left = null;
this.right = null;
this.value = null;
}
Node(NodeType node_type, Node left, Node right, String value) {
this.nt = node_type;
this.left = left;
this.right = right;
this.value = value;
}
public static Node make_node(NodeType nodetype, Node left, Node right) {
return new Node(nodetype, left, right, "");
}
public static Node make_node(NodeType