Compiler/lexical analyzer: Difference between revisions
Content added Content deleted
Thundergnat (talk | contribs) m (Move to draft status) |
(improve formatting and wording of task description; make it a draft task until everything is clarified) |
||
Line 1: | Line 1: | ||
{{draft task}} |
{{draft task}} |
||
{{clarify task}} |
|||
Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]: |
|||
<br> |
|||
: ''Lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).'' |
|||
Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia] |
|||
{{introheader|The Task}} |
|||
Lexical analysis is the process of converting a sequence of characters (such as in a |
|||
computer program or web page) into a sequence of tokens (strings with an identified |
|||
"meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, |
|||
or scanner (though "scanner" is also used to refer to the first stage of a lexer). |
|||
;The Task |
|||
Create a lexical analyzer for the simple programming language specified below. The |
Create a lexical analyzer for the simple programming language specified below. The |
||
Line 17: | Line 13: | ||
if two versions of the solution are provided: One without the lexer module, and one with. |
if two versions of the solution are provided: One without the lexer module, and one with. |
||
{{introheader|Input Specification}} |
|||
The simple programming language to be analyzed is more or less a subset of [[C]]. It supports the following tokens: |
|||
The various token types are denoted below. |
|||
;Operators |
;Operators |
||
Line 41: | Line 37: | ||
| > || greater than || Gtr |
| > || greater than || Gtr |
||
|- |
|- |
||
| |
| != || not equal || Neq |
||
|- |
|- |
||
| = || assign || Assign |
| = || assign || Assign |
||
Line 97: | Line 93: | ||
|} |
|} |
||
Notes: For char literals, |
Notes: For char and string literals, <code>\n</code> is supported as a new line character. To represent a backslash, use <code>\\</code>. No other special sequences are supported. |
||
character. To represent \, use: '\\'. \n may also be used in |
|||
Strings, to print a newline. No other special sequences are |
|||
supported. |
|||
;White space |
;White space |
||
Line 132: | Line 125: | ||
They should produce the same token stream, except for the line and column positions. |
They should produce the same token stream, except for the line and column positions. |
||
Comments enclosed in <code>/* ... */</code> are also treated as whitespace outside of strings. |
|||
;Complete list of token names |
;Complete list of token names |
||
<pre> |
|||
'''EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen, Uminus, Mul, Div, Add, Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma, Assign, Integerk, Stringk, Ident''' |
|||
EOI Print Putc If While Lbrace Rbrace |
|||
Lparen Rparen Uminus Mul Div Add Sub |
|||
Lss Gtr Leq Neq And Semi Comma |
|||
Assign Integer String Ident |
|||
</pre> |
|||
{{introheader|Output Format}} |
|||
;Program output |
|||
The program output should be a sequence of lines, each consisting of the following whitespace-separated fields: |
|||
Output of the program should be: |
|||
# <code>line</code> |
|||
* the word '''line''', followed by: |
|||
# the line number where the token starts |
|||
# <code>col</code> |
|||
* the abbreviation '''col''', followed by: |
|||
# the column number where the token starts |
|||
# the token name |
|||
# the token value, in case of ''Integer'', ''String'', or ''Ident'' |
|||
{{introheader|Diagnostics}} |
|||
;Test Cases |
|||
The following error conditions should be caught: |
|||
'''Test Case 1''' |
|||
{| class="wikitable" |
|||
|- |
|||
! Error |
|||
! Example |
|||
|- |
|||
| Empty character constant |
|||
| <code>''</code> |
|||
|- |
|||
| Unknown escape sequence. |
|||
| <code>\r</code> |
|||
|- |
|||
| Multi-character constant. |
|||
| <code>xx</code> |
|||
|- |
|||
| End-of-file in comment. Closing comment characters not found. |
|||
|- |
|||
| End-of-file while scanning string literal. Closing string character not found. |
|||
|- |
|||
| End-of-line while scanning string literal. Closing string character not found before end-of-line. |
|||
|- |
|||
| Unrecognized character. |
|||
| <code>|</code> |
|||
|} |
|||
{{introheader|Test Cases}} |
|||
{| class="wikitable" |
|||
|- |
|||
! Input |
|||
! Output |
|||
|- |
|||
| style="vertical-align:top" | |
|||
<lang c> |
<lang c> |
||
/* |
/* |
||
Line 159: | Line 190: | ||
</lang> |
</lang> |
||
| style="vertical-align:top" | |
|||
;Output |
|||
<b><pre> |
|||
<b> |
|||
<pre> |
|||
line 4 col 1 Print |
line 4 col 1 Print |
||
line 4 col 6 Lparen |
line 4 col 6 Lparen |
||
Line 169: | Line 198: | ||
line 4 col 25 Semi |
line 4 col 25 Semi |
||
line 5 col 1 EOI |
line 5 col 1 EOI |
||
</pre> |
</pre></b> |
||
</b> |
|||
|- |
|||
'''Test Case 2''' |
|||
| style="vertical-align:top" | |
|||
<lang c> |
<lang c> |
||
/* |
/* |
||
Line 181: | Line 210: | ||
</lang> |
</lang> |
||
| style="vertical-align:top" | |
|||
;Output |
|||
<b><pre> |
|||
<b> |
|||
<pre> |
|||
line 4 col 1 Ident phoenix_number |
line 4 col 1 Ident phoenix_number |
||
line 4 col 16 Assign |
line 4 col 16 Assign |
||
Line 197: | Line 224: | ||
line 5 col 28 Semi |
line 5 col 28 Semi |
||
line 6 col 1 EOI |
line 6 col 1 EOI |
||
</pre> |
</pre></b> |
||
</b> |
|||
|- |
|||
'''Test Case 3''' |
|||
| style="vertical-align:top" | |
|||
<lang c> |
<lang c> |
||
/* |
/* |
||
Line 222: | Line 249: | ||
</lang> |
</lang> |
||
| style="vertical-align:top" | |
|||
;Output |
|||
<b><pre> |
|||
<b> |
|||
<pre> |
|||
line 5 col 15 Print |
line 5 col 15 Print |
||
line 5 col 41 Sub |
line 5 col 41 Sub |
||
Line 252: | Line 277: | ||
line 17 col 26 Integer 10 |
line 17 col 26 Integer 10 |
||
line 18 col 26 Integer 32 |
line 18 col 26 Integer 32 |
||
line 19 col 1 EOI |
line 19 col 1 EOI |
||
</b> |
</pre></b> |
||
|} |
|||
{{introheader|Reference}} |
|||
;Diagnostics |
|||
The following error conditions should be caught: |
|||
* Empty character constant. Example: '' |
|||
* Unknown escape sequence. Example: '\r' |
|||
* Multi-character constant. Example: 'xx' |
|||
* End-of-file in comment. Closing comment characters not found. |
|||
* End-of-file while scanning string literal. Closing string character not found. |
|||
* End-of-line while scanning string literal. Closing string character not found before end-of-line. |
|||
* Unrecognized character. Example: | |
|||
;Reference |
|||
The Flex, C, Python and Euphoria versions can be considered reference implementations. |
The Flex, C, Python and Euphoria versions can be considered reference implementations. |
||
<hr> |
|||
;Implementations |
|||
__TOC__ |
__TOC__ |
||