Compiler/lexical analyzer: Difference between revisions

=={{header|J}}==
Here, we first build a tokenizer state machine sufficient to recognize our mini-language. This tokenizer must not discard any characters, because we will be using cumulative character offsets to identify line numbers and column numbers.
 
Then, we refine this result: we generate those line and column numbers, discard whitespace and comments, and classify tokens based on their structure.
 
(Also, in this version, rather than building out a full state machine to recognize character literals, we treat character literals as a sequence of tokens which we must then refine. It might have been wiser to recognize character literals as single tokens.)
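The refinement step described above turns cumulative character offsets into line and column numbers, which only works because the tokenizer discards no characters. For readers unfamiliar with J, here is a hypothetical Python sketch of that offset-to-position computation (the article's actual implementation is in J; the function name `line_col` is illustrative, not from the article):

```python
def line_col(source, offset):
    """Derive 1-based (line, column) from a cumulative character offset.

    Valid only if every character of the source survives tokenization,
    so offsets into the token stream are offsets into the raw text.
    """
    prefix = source[:offset]
    line = prefix.count('\n') + 1          # newlines seen so far
    col = offset - (prefix.rfind('\n') + 1) + 1  # distance past last newline
    return line, col
```

For example, in the source `"ab\ncd"` the character `c` sits at offset 3, which maps to line 2, column 1.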
 
Implementation:
 
RightBrace Keyword_while Semicolon Keyword_print Comma Keyword_putc
}}-.LF)=: tkref=: tokenize '*/%+-<<=>>===!=!&&||=()if{else}while;print,putc'
NB. the reference tokens here were arranged to avoid whitespace tokens
NB. also, we reserve multiple token instances where a literal string
NB. appears in different syntactic productions. Here, we only use the initial
NB. instances -- the others will be used in the syntax analyzer which
NB. uses the same tkref and tknames,
 
shift=: |.!.0
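Here `|.!.0` is J's rotate verb `|.` modified by the fit conjunction `!.` with a fill of 0, turning rotation into a shift that fills vacated positions with zeros. A hypothetical Python equivalent, for readers unfamiliar with J (the name `shift` mirrors the J definition above; the Python body is an illustration, not part of the article):

```python
def shift(n, xs, fill=0):
    """Emulate J's n |.!.0 xs: shift the items of xs by n positions.

    Positive n shifts items toward the front (dropping the first n),
    negative n shifts toward the back; vacated slots get `fill`.
    """
    if n >= 0:
        return xs[n:] + [fill] * min(n, len(xs))
    return [fill] * min(-n, len(xs)) + xs[:n]
```

So `shift(1, [1, 2, 3, 4, 5])` gives `[2, 3, 4, 5, 0]`, matching J's `1 |.!.0 1 2 3 4 5`.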