Compiler/syntax analyzer: Difference between revisions
Content added Content deleted
(J) |
m (J: eliminate a redundancy and throw in a few comments) |
||
Line 3,304: | Line 3,304: | ||
Implementation: |
Implementation: |
||
<lang J> |
<lang J>require'format/printf' |
||
require'format/printf' |
|||
tkref=: tokenize 'End_of_input*/%+--<<=>>===!=!&&||print=print(if{else}while; |
tkref=: tokenize 'End_of_input*/%+--<<=>>===!=!&&||print=print(if{else}while;,putc)a""0' |
||
tkref,. (tknames)=: tknames=:;: {{)n |
tkref,. (tknames)=: tknames=:;: {{)n |
||
End_of_input Op_multiply Op_divide Op_mod Op_add Op_subtract Op_negate Op_less |
End_of_input Op_multiply Op_divide Op_mod Op_add Op_subtract Op_negate Op_less |
||
Op_lessequal Op_greater Op_greaterequal Op_equal Op_notequal Op_not Op_and |
Op_lessequal Op_greater Op_greaterequal Op_equal Op_notequal Op_not Op_and |
||
Op_or Keyword_print Op_assign Keyword_print LeftParen Keyword_if LeftBrace Keyword_else RightBrace |
Op_or Keyword_print Op_assign Keyword_print LeftParen Keyword_if LeftBrace Keyword_else RightBrace |
||
Keyword_while Semicolon |
Keyword_while Semicolon Comma Keyword_putc RightParen |
||
Identifier String Integer |
Identifier String Integer |
||
}}-.LF |
}}-.LF |
||
Line 3,319: | Line 3,318: | ||
tkEOI tkMul tkDiv tkMod tkAdd tkSub tkNeg tkLss tkLeq tkGtr tkGeq tkEql |
tkEOI tkMul tkDiv tkMod tkAdd tkSub tkNeg tkLss tkLeq tkGtr tkGeq tkEql |
||
tkNeq tkNot tkAnd tkOr tkPrint tkAssign tkPrint tkLpar tkIf tkLbra tkElse |
tkNeq tkNot tkAnd tkOr tkPrint tkAssign tkPrint tkLpar tkIf tkLbra tkElse |
||
tkRbra tkWhile tkSemi |
tkRbra tkWhile tkSemi tkComma tkPutc tkRpar tkId tkString tkInt |
||
}}-.LF |
}}-.LF |
||
Line 3,332: | Line 3,331: | ||
Sequence Multiply Divide Mod Add Subtract Negate Less LessEqual Greater |
Sequence Multiply Divide Mod Add Subtract Negate Less LessEqual Greater |
||
GreaterEqual Equal NotEqual Not And Or Prts Assign Prti x If x x x While |
GreaterEqual Equal NotEqual Not And Or Prts Assign Prti x If x x x While |
||
x |
x x Putc x Identifier String Integer |
||
}}-.LF |
}}-.LF |
||
NB. proofread |:ndRef,:ndDisp |
NB. proofread |:ndRef,:ndDisp |
||
Line 3,385: | Line 3,384: | ||
case.tkPutc do. |
case.tkPutc do. |
||
e=. paren_expr gettok'' |
e=. paren_expr gettok'' |
||
t=. |
t=. Putc make_node e a: |
||
Putc expect tkSemi |
Putc expect tkSemi |
||
case.tkPrint do.gettok'' |
case.tkPrint do.gettok'' |
||
Line 3,467: | Line 3,466: | ||
echo 'Error: line %d, column %d: %s\n'sprintf tok_ln;tok_col;y throw. |
echo 'Error: line %d, column %d: %s\n'sprintf tok_ln;tok_col;y throw. |
||
}} |
}} |
||
syntax=: {{ |
syntax=: {{ |
||
Line 3,481: | Line 3,481: | ||
end. |
end. |
||
}} |
}} |
||
</lang> |
</lang> |
||
Some quirks worth noting: |
|||
(1) '+' appears in the productions for 'primary' and 'addition_expr' but has only one node type (because we do not represent its appearance in 'primary' with a node. |
|||
(2) '-' and 'print' do have two node types (which we sort out on the fly). |
|||
(3) In this implementation, we require a 1:1 mapping between the data structure representing token types and the data structure representing node types. This means two token entries for both - and print (the second instance of both gets ignored by the lexer). |
|||
(4) Because the data structure produced by the lexer is independent of any type system implementation, we can use the same type system for the lexer or a different type system for the lexer and either way works (as long as the implementations are consistent with the spec). |
|||
Task example: |
Task example: |