Compiler/lexical analyzer: Difference between revisions

=={{header|J}}==
Here, we first build a tokenizer state machine sufficient to recognize our mini-language. This tokenizer must not discard any characters, because we will be using cumulative character offsets to identify line numbers and column numbers.
 
Then, we refine this result: we generate those line and column numbers, discard whitespace and comments, and classify tokens based on their structure.
 
(Also, in this version, rather than building out a full state machine to recognize character literals, we treat character literals as a sequence of tokens which we must then refine. It might have been wiser to recognize character literals as single tokens.)
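The refinement step described above turns cumulative character offsets into line and column numbers, which only works because the tokenizer discards no characters. For readers unfamiliar with J, here is a hypothetical Python sketch of that offset-to-position computation (the article's actual implementation is in J; the function name `line_col` is illustrative, not from the article):

```python
def line_col(source, offset):
    """Derive 1-based (line, column) from a cumulative character offset.

    Valid only if every character of the source survives tokenization,
    so offsets into the token stream are offsets into the raw text.
    """
    prefix = source[:offset]
    line = prefix.count('\n') + 1          # newlines seen so far
    col = offset - (prefix.rfind('\n') + 1) + 1  # distance past last newline
    return line, col
```

For example, in the source `"ab\ncd"` the character `c` sits at offset 3, which maps to line 2, column 1.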
 
Implementation:
 
RightBrace Keyword_while Semicolon Keyword_print Comma Keyword_putc
}}-.LF)=: tkref=: tokenize '*/%+-<<=>>===!=!&&||=()if{else}while;print,putc'
NB. the reference tokens here were arranged to avoid whitespace tokens
NB. also, we reserve multiple token instances where a literal string
NB. appears in different syntactic productions. Here, we only use the initial
NB. instances -- the others will be used in the syntax analyzer which
NB. uses the same tkref and tknames,
 
shift=: |.!.0
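Here `|.!.0` is J's rotate verb `|.` modified by the fit conjunction `!.` with a fill of 0, turning rotation into a shift that fills vacated positions with zeros. A hypothetical Python equivalent, for readers unfamiliar with J (the name `shift` mirrors the J definition above; the Python body is an illustration, not part of the article):

```python
def shift(n, xs, fill=0):
    """Emulate J's n |.!.0 xs: shift the items of xs by n positions.

    Positive n shifts items toward the front (dropping the first n),
    negative n shifts toward the back; vacated slots get `fill`.
    """
    if n >= 0:
        return xs[n:] + [fill] * min(n, len(xs))
    return [fill] * min(-n, len(xs)) + xs[:n]
```

So `shift(1, [1, 2, 3, 4, 5])` gives `[2, 3, 4, 5, 0]`, matching J's `1 |.!.0 1 2 3 4 5`.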