Compiler/syntax analyzer: Difference between revisions

Content added Content deleted
(J)
m (J: eliminate a redundancy and throw in a few comments)
Line 3,304: Line 3,304:
Implementation:
Implementation:


<lang J>
<lang J>require'format/printf'
require'format/printf'


tkref=: tokenize 'End_of_input*/%+--<<=>>===!=!&&||print=print(if{else}while;print,putc)a""0'
tkref=: tokenize 'End_of_input*/%+--<<=>>===!=!&&||print=print(if{else}while;,putc)a""0'
tkref,. (tknames)=: tknames=:;: {{)n
tkref,. (tknames)=: tknames=:;: {{)n
End_of_input Op_multiply Op_divide Op_mod Op_add Op_subtract Op_negate Op_less
End_of_input Op_multiply Op_divide Op_mod Op_add Op_subtract Op_negate Op_less
Op_lessequal Op_greater Op_greaterequal Op_equal Op_notequal Op_not Op_and
Op_lessequal Op_greater Op_greaterequal Op_equal Op_notequal Op_not Op_and
Op_or Keyword_print Op_assign Keyword_print LeftParen Keyword_if LeftBrace Keyword_else RightBrace
Op_or Keyword_print Op_assign Keyword_print LeftParen Keyword_if LeftBrace Keyword_else RightBrace
Keyword_while Semicolon Keyword_print Comma Keyword_putc RightParen
Keyword_while Semicolon Comma Keyword_putc RightParen
Identifier String Integer
Identifier String Integer
}}-.LF
}}-.LF
Line 3,319: Line 3,318:
tkEOI tkMul tkDiv tkMod tkAdd tkSub tkNeg tkLss tkLeq tkGtr tkGeq tkEql
tkEOI tkMul tkDiv tkMod tkAdd tkSub tkNeg tkLss tkLeq tkGtr tkGeq tkEql
tkNeq tkNot tkAnd tkOr tkPrint tkAssign tkPrint tkLpar tkIf tkLbra tkElse
tkNeq tkNot tkAnd tkOr tkPrint tkAssign tkPrint tkLpar tkIf tkLbra tkElse
tkRbra tkWhile tkSemi tkPrint tkComma tkPutc tkRpar tkId tkString tkInt
tkRbra tkWhile tkSemi tkComma tkPutc tkRpar tkId tkString tkInt
}}-.LF
}}-.LF


Line 3,332: Line 3,331:
Sequence Multiply Divide Mod Add Subtract Negate Less LessEqual Greater
Sequence Multiply Divide Mod Add Subtract Negate Less LessEqual Greater
GreaterEqual Equal NotEqual Not And Or Prts Assign Prti x If x x x While
GreaterEqual Equal NotEqual Not And Or Prts Assign Prti x If x x x While
x Prtc x Putc x Identifier String Integer
x x Putc x Identifier String Integer
}}-.LF
}}-.LF
NB. proofread |:ndRef,:ndDisp
NB. proofread |:ndRef,:ndDisp
Line 3,385: Line 3,384:
case.tkPutc do.
case.tkPutc do.
e=. paren_expr gettok''
e=. paren_expr gettok''
t=. Prtc make_node e a:
t=. Putc make_node e a:
Putc expect tkSemi
Putc expect tkSemi
case.tkPrint do.gettok''
case.tkPrint do.gettok''
Line 3,467: Line 3,466:
echo 'Error: line %d, column %d: %s\n'sprintf tok_ln;tok_col;y throw.
echo 'Error: line %d, column %d: %s\n'sprintf tok_ln;tok_col;y throw.
}}
}}



syntax=: {{
syntax=: {{
Line 3,481: Line 3,481:
end.
end.
}}
}}

</lang>
</lang>

Some quirks worth noting:

(1) '+' appears in the productions for 'primary' and 'addition_expr' but has only one node type (because we do not represent its appearance in 'primary' with a node.

(2) '-' and 'print' do have two node types (which we sort out on the fly).

(3) In this implementation, we require a 1:1 mapping between the data structure representing token types and the data structure representing node types. This means two token entries for both - and print (the second instance of both gets ignored by the lexer).

(4) Because the data structure produced by the lexer is independent of any type system implementation, we can use the same type system for the lexer or a different type system for the lexer and either way works (as long as the implementations are consistent with the spec).


Task example:
Task example: