Compiler/lexical analyzer: Difference between revisions
improve formatting and wording of task description; make it a draft task until everything is clarified
Thundergnat (talk | contribs) m (Move to draft status)
Line 1:
{{draft task}}
{{clarify task}}
Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]:
: ''Lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).''
{{introheader|The Task}}
Create a lexical analyzer for the simple programming language specified below. The
Line 17 ⟶ 13:
if two versions of the solution are provided: one without the lexer module, and one with it.
The simple programming language to be analyzed is more or less a subset of [[C]]. It supports the following tokens:
;Operators
Line 41 ⟶ 37:
| > || greater than || Gtr
|-
|
|-
| = || assign || Assign
Line 97 ⟶ 93:
|}
Notes: For char and string literals,
;White space
Line 132 ⟶ 125:
They should produce the same token stream, except for the line and column positions.
;Complete list of token names
<pre>
EOI Print Putc If While Lbrace Rbrace
Lparen Rparen Uminus Mul Div Add Sub
Lss Gtr Leq Neq And Semi Comma
Assign Integer String Ident
</pre>
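As a rough illustration of how a character sequence becomes a sequence of the token names listed above, here is a minimal Python sketch. It handles only identifiers, integers, a few keywords and some one-character symbols. The keyword spellings and most of the symbol-to-name table are assumptions made for illustration (the operator table excerpt confirms only <code>></code> as Gtr and <code>=</code> as Assign), and comments, string literals and character constants are left out.

```python
# Minimal tokenizer sketch. KEYWORDS and SYMBOLS are illustrative assumptions,
# not the task's full specification.
KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}
SYMBOLS = {"(": "Lparen", ")": "Rparen", "{": "Lbrace", "}": "Rbrace",
           ";": "Semi", ",": "Comma", "=": "Assign", ">": "Gtr",
           "+": "Add", "*": "Mul"}

def tokenize(text):
    """Yield (line, col, name, value) tuples; value is None when unused."""
    line, col, i = 1, 1, 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":                      # newline: next line, column 1
            line, col, i = line + 1, 1, i + 1
        elif ch.isspace():                  # other white space is skipped
            col, i = col + 1, i + 1
        elif ch.isdigit():                  # integer literal
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            yield (line, col, "Integer", text[i:j])
            col, i = col + (j - i), j
        elif ch.isalpha() or ch == "_":     # keyword or identifier
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            if word in KEYWORDS:
                yield (line, col, KEYWORDS[word], None)
            else:
                yield (line, col, "Ident", word)
            col, i = col + (j - i), j
        elif ch in SYMBOLS:                 # one-character symbol
            yield (line, col, SYMBOLS[ch], None)
            col, i = col + 1, i + 1
        else:
            raise SyntaxError(f"({line}, {col}) Unrecognized character: {ch!r}")
    yield (line, col, "EOI", None)          # end-of-input marker
```

Calling <code>list(tokenize("count = 10;"))</code> yields an Ident, an Assign, an Integer and a Semi token followed by EOI, each tagged with its line and column.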
{{introheader|Output Format}}
The program output should be a sequence of lines, each consisting of the following whitespace-separated fields:
# <code>line</code>
# <code>col</code>
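One possible way to produce such lines is sketched below in Python. The helper name <code>format_token</code> is hypothetical, and the layout (the literal words <code>line</code> and <code>col</code>, then the token name and an optional value) is inferred from the test case output on this page rather than from the field list, so treat it as an assumption.

```python
def format_token(line, col, name, value=None):
    """Return one output line: 'line <n> col <n> <Name> [<value>]'.

    Fields are joined by single spaces; the task only asks for
    whitespace-separated fields, so no column alignment is attempted.
    """
    fields = ["line", str(line), "col", str(col), name]
    if value is not None:           # only some tokens carry a value
        fields.append(str(value))
    return " ".join(fields)
```

For example, <code>format_token(4, 1, "Ident", "phoenix_number")</code> gives a line with the same fields as the corresponding test case output.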
{{introheader|Diagnostics}}
The following error conditions should be caught:
{| class="wikitable"
|-
! Error
! Example
|-
| Empty character constant.
| <code>''</code>
|-
| Unknown escape sequence.
| <code>\r</code>
|-
| Multi-character constant.
| <code>'xx'</code>
|-
| End-of-file in comment. Closing comment characters not found.
|-
| End-of-file while scanning string literal. Closing string character not found.
|-
| End-of-line while scanning string literal. Closing string character not found before end-of-line.
|-
| Unrecognized character.
| <code>|</code>
|}
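Some of these checks could be sketched in Python as follows. The names <code>LexError</code>, <code>scan_char_constant</code> and <code>scan_string</code> are hypothetical. The sketch assumes that <code>\n</code> and <code>\\</code> are the only recognized escapes (the table confirms only that <code>\r</code> is rejected), that a character constant is represented by its code point, and that string literals have no escape for the closing quote.

```python
class LexError(Exception):
    """Hypothetical exception type for lexer diagnostics."""

# Assumption: only \n and \\ are valid escapes in this sketch.
ESCAPES = {"n": "\n", "\\": "\\"}

def scan_char_constant(text, start):
    """Scan a character constant whose opening quote is at text[start].

    Returns (code_point, index_after_closing_quote).
    """
    i = start + 1
    ch = text[i:i + 1]
    if ch == "'":
        raise LexError("Empty character constant")
    if ch == "":
        # Not listed in the table; added so the sketch fails cleanly.
        raise LexError("End-of-file in character constant")
    if ch == "\\":
        esc = text[i + 1:i + 2]
        if esc not in ESCAPES:
            raise LexError(f"Unknown escape sequence \\{esc}")
        value, i = ord(ESCAPES[esc]), i + 2
    else:
        value, i = ord(ch), i + 1
    if text[i:i + 1] != "'":
        raise LexError("Multi-character constant")
    return value, i + 1

def scan_string(text, start):
    """Scan a string literal whose opening quote is at text[start].

    Returns (contents, index_after_closing_quote).
    """
    i = start + 1
    while True:
        if i >= len(text):
            raise LexError("End-of-file while scanning string literal")
        if text[i] == "\n":
            raise LexError("End-of-line while scanning string literal")
        if text[i] == '"':
            return text[start + 1:i], i + 1
        i += 1
```

Raising an exception at the first problem keeps the sketch short; a fuller lexer would also include the line and column of the offending text in the message.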
{{introheader|Test Cases}}
{| class="wikitable"
|-
! Input
! Output
|-
| style="vertical-align:top" |
<lang c>
/*
Line 159 ⟶ 190:
</lang>
| style="vertical-align:top" |
<b><pre>
line 4 col 1 Print
line 4 col 6 Lparen
Line 169 ⟶ 198:
line 4 col 25 Semi
line 5 col 1 EOI
</pre></b>
|-
| style="vertical-align:top" |
<lang c>
/*
Line 181 ⟶ 210:
</lang>
| style="vertical-align:top" |
<b><pre>
line 4 col 1 Ident phoenix_number
line 4 col 16 Assign
Line 197 ⟶ 224:
line 5 col 28 Semi
line 6 col 1 EOI
</pre></b>
|-
| style="vertical-align:top" |
<lang c>
/*
Line 222 ⟶ 249:
</lang>
| style="vertical-align:top" |
<b><pre>
line 5 col 15 Print
line 5 col 41 Sub
Line 252 ⟶ 277:
line 17 col 26 Integer 10
line 18 col 26 Integer 32
line 19 col 1 EOI
</pre></b>
|}
{{introheader|Reference}}
The Flex, C, Python and Euphoria versions can be considered reference implementations.
<hr>
__TOC__