
Compiler/lexical analyzer: Difference between revisions

improve formatting and wording of task description; make it a draft task until everything is clarified
m (Move to draft status)
(improve formatting and wording of task description; make it a draft task until everything is clarified)
Line 1:
{{draft task}}Lexical Analyzer
{{clarify task}}
 
Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]:
<br>
 
: ''Lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).''
Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]
 
{{introheader|The Task}}
Lexical analysis is the process of converting a sequence of characters (such as in a
computer program or web page) into a sequence of tokens (strings with an identified
"meaning"). A program that performs lexical analysis may be called a lexer, tokenizer,
or scanner (though "scanner" is also used to refer to the first stage of a lexer).
 
;The Task
 
Create a lexical analyzer for the simple programming language specified below. The
Line 17 ⟶ 13:
if two versions of the solution are provided: One without the lexer module, and one with.
 
;{{introheader|Input Specification}}
 
The simple programming language to be analyzed is more or less a subset of [[C]]. It supports the following tokens:
The various token types are denoted below.
 
;Operators
Line 41 ⟶ 37:
| > || greater than || Gtr
|-
| != || not equal || Neq
|-
| = || assign || Assign
Line 97 ⟶ 93:
|}
 
Notes: For char and string literals, <code>\n</code> is supported as a newline character. To represent a backslash, use <code>\\</code>. No other special sequences are supported.
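
The escape rule above can be sketched as a small helper. This is only an illustration: the name <code>decode_escapes</code> and the choice of <code>ValueError</code> are assumptions, not part of the task. It takes the text between the quotes of a literal and returns the decoded characters, rejecting any escape other than the two that are supported.

```python
def decode_escapes(body: str) -> str:
    """Decode the body of a char or string literal (quotes already removed).

    Only \\n (newline) and \\\\ (backslash) are accepted; anything else after
    a backslash is reported as an unknown escape sequence.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)          # ordinary character, copied unchanged
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of literal")
        nxt = body[i + 1]
        if nxt == "n":
            out.append("\n")
        elif nxt == "\\":
            out.append("\\")
        else:
            raise ValueError("unknown escape sequence \\" + nxt)
        i += 2                      # consume the backslash and its partner
    return "".join(out)
```

A real lexer would attach the line and column of the offending literal to the error; that is left out here to keep the sketch short.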
 
;White space
Line 132 ⟶ 125:
They should produce the same token stream, except for the line and column positions.
 
'''Comments''' enclosed in <code>/* ... */</code> (which may span multiple lines) are also treated as whitespace outside of strings.
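
One possible way to treat comments as whitespace while still tracking positions is sketched below. The function name <code>skip_blank</code>, its signature, and the use of <code>SyntaxError</code> are assumptions made for illustration; line and column numbers are assumed to start at 1.

```python
def skip_blank(src, pos=0, line=1, col=1):
    """Skip whitespace and /* ... */ comments starting at src[pos].

    Returns the updated (pos, line, col) of the next significant character.
    String literals are not handled here; a lexer would call this only
    between tokens, so comment markers inside strings are never seen.
    """
    n = len(src)
    while pos < n:
        if src[pos].isspace():
            if src[pos] == "\n":
                line, col = line + 1, 1
            else:
                col += 1
            pos += 1
        elif src.startswith("/*", pos):
            end = src.find("*/", pos + 2)
            if end == -1:
                raise SyntaxError(
                    "line %d col %d: end-of-file in comment" % (line, col))
            # Walk the comment text so embedded newlines update the position.
            for ch in src[pos:end + 2]:
                if ch == "\n":
                    line, col = line + 1, 1
                else:
                    col += 1
            pos = end + 2
        else:
            break
    return pos, line, col
```

For a source that opens with a three-line comment followed by a statement, the returned position is line 4, column 1, which agrees with the first token position shown in the test cases.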
 
;Complete list of token names
 
<pre>
'''EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen, Uminus, Mul, Div, Add, Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma, Assign, Integer, String, Ident'''
EOI Print Putc If While Lbrace Rbrace
Lparen Rparen Uminus Mul Div Add Sub
Lss Gtr Leq Neq And Semi Comma
Assign Integer String Ident
</pre>
 
{{introheader|Output Format}}
;Program output
 
The program output should be a sequence of lines, each consisting of the following whitespace-separated fields:
Output of the program should be:
 
# the word <code>line</code>
# the line number where the token starts
# the abbreviation <code>col</code>
# the column number where the token starts
# the token name
# the token value, in case of ''Integer'', ''String'', or ''Ident''
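
The field list above could be turned into an output line roughly as follows. The helper name <code>format_token</code> is hypothetical, and the fields are joined by a single space; the task only asks for whitespace-separated fields, so aligned columns would be equally valid.

```python
# Token names that carry a value, as listed in the output format.
VALUE_TOKENS = ("Integer", "String", "Ident")

def format_token(line, col, name, value=None):
    """Build one output line: line <n> col <n> <name> [<value>]."""
    fields = ["line", str(line), "col", str(col), name]
    if name in VALUE_TOKENS:
        if value is None:
            raise ValueError(name + " token requires a value")
        fields.append(str(value))
    return " ".join(fields)

print(format_token(4, 1, "Ident", "phoenix_number"))
```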
 
{{introheader|Diagnostics}}
;Test Cases
 
The following error conditions should be caught:
'''Test Case 1'''
 
{| class="wikitable"
|-
! Error
! Example
|-
| Empty character constant
| <code>&apos;&apos;</code>
|-
| Unknown escape sequence.
| <code>&apos;\r&apos;</code>
|-
| Multi-character constant.
| <code>&apos;xx&apos;</code>
|-
| End-of-file in comment. Closing comment characters not found.
|-
| End-of-file while scanning string literal. Closing string character not found.
|-
| End-of-line while scanning string literal. Closing string character not found before end-of-line.
|-
| Unrecognized character.
| <code>&#124;</code>
|}
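
The three character-constant diagnostics in the table can be illustrated with a short check. Everything named here (<code>LexError</code>, <code>char_constant_value</code>) is hypothetical, and returning the character's code point is an assumption; the lines shown here do not state which token a character constant produces.

```python
class LexError(Exception):
    """Hypothetical exception type for lexical errors."""

def char_constant_value(literal):
    """Validate a character constant such as 'a' or '\\n' (quotes included).

    Returns the code point of the character (an assumption of this sketch).
    """
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise LexError("not a character constant")
    body = literal[1:-1]
    if body == "":
        raise LexError("empty character constant")
    if body[0] == "\\":
        if body == "\\n":
            return 10               # newline
        if body == "\\\\":
            return 92               # backslash
        if len(body) <= 2:
            raise LexError("unknown escape sequence " + body)
        raise LexError("multi-character constant")
    if len(body) > 1:
        raise LexError("multi-character constant")
    return ord(body)
```

The remaining diagnostics (end-of-file in a comment or string, end-of-line in a string, unrecognized character) depend on the scanning loop itself and are not covered by this sketch.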
 
{{introheader|Test Cases}}
 
{| class="wikitable"
|-
! Input
! Output
|-
| style="vertical-align:top" |
<lang c>
/*
Line 159 ⟶ 190:
</lang>
 
| style="vertical-align:top" |
;Output
<b><pre>
 
<b>
<pre>
line 4 col 1 Print
line 4 col 6 Lparen
Line 169 ⟶ 198:
line 4 col 25 Semi
line 5 col 1 EOI
</pre></b>
</b>
 
|-
'''Test Case 2'''
| style="vertical-align:top" |
<lang c>
/*
Line 181 ⟶ 210:
</lang>
 
| style="vertical-align:top" |
;Output
<b><pre>
 
<b>
<pre>
line 4 col 1 Ident phoenix_number
line 4 col 16 Assign
Line 197 ⟶ 224:
line 5 col 28 Semi
line 6 col 1 EOI
</pre></b>
</b>
 
|-
'''Test Case 3'''
| style="vertical-align:top" |
<lang c>
/*
Line 222 ⟶ 249:
</lang>
 
| style="vertical-align:top" |
;Output
<b><pre>
 
<b>
<pre>
line 5 col 15 Print
line 5 col 41 Sub
Line 252 ⟶ 277:
line 17 col 26 Integer 10
line 18 col 26 Integer 32
line 19 col 1 EOI</pre>
</pre></b>
|}
 
{{introheader|Reference}}
;Diagnostics
The following error conditions should be caught:
 
* Empty character constant. Example: &apos;&apos;
* Unknown escape sequence. Example: '\r'
* Multi-character constant. Example: 'xx'
* End-of-file in comment. Closing comment characters not found.
* End-of-file while scanning string literal. Closing string character not found.
* End-of-line while scanning string literal. Closing string character not found before end-of-line.
* Unrecognized character. Example: |
 
;Reference
 
The Flex, C, Python and Euphoria versions can be considered reference implementations.
 
<hr>
;Implementations
 
__TOC__
 