Compiler/lexical analyzer: Difference between revisions

Content added Content deleted
(use shorter example in the whitespace section)
(template "introheader" was renamed to "task heading")
Line 5: Line 5:
: ''Lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).''
: ''Lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).''


{{task heading}}
{{introheader|The Task}}


Create a lexical analyzer for the simple programming language specified below. The
Create a lexical analyzer for the simple programming language specified below. The
Line 12: Line 12:
if two versions of the solution are provided: One without the lexer module, and one with.
if two versions of the solution are provided: One without the lexer module, and one with.


{{introheader|Input Specification}}
{{task heading|Input Specification}}


The simple programming language to be analyzed is more or less a subset of [[C]]. It supports the following tokens:
The simple programming language to be analyzed is more or less a subset of [[C]]. It supports the following tokens:
Line 162: Line 162:
</pre>
</pre>


{{introheader|Output Format}}
{{task heading|Output Format}}


The program output should be a sequence of lines, each consisting of the following whitespace-separated fields:
The program output should be a sequence of lines, each consisting of the following whitespace-separated fields:
Line 171: Line 171:
# the token value (only for <tt>IDENTIFIER</tt>, <tt>INTEGER</tt>, and <tt>STRING</tt> tokens)
# the token value (only for <tt>IDENTIFIER</tt>, <tt>INTEGER</tt>, and <tt>STRING</tt> tokens)


{{introheader|Diagnostics}}
{{task heading|Diagnostics}}


The following error conditions should be caught:
The following error conditions should be caught:
Line 199: Line 199:
|}
|}


{{introheader|Test Cases}}
{{task heading|Test Cases}}


{| class="wikitable"
{| class="wikitable"
Line 307: Line 307:
|}
|}


{{introheader|Reference}}
{{task heading|Reference}}


The Flex, C, Python and Euphoria versions can be considered reference implementations.
The Flex, C, Python and Euphoria versions can be considered reference implementations.