Compiler/lexical analyzer: Difference between revisions

@@ Line 1: / Line 1: @@
-{{draft task}}Lexical Analyzer
+{{draft task}}
+{{clarify task}}
+Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]:
-<br>
+: ''Lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).''
-Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]
+{{introheader|The Task}}
-Lexical analysis is the process of converting a sequence of characters (such as in a
-computer program or web page) into a sequence of tokens (strings with an identified
-"meaning"). A program that performs lexical analysis may be called a lexer, tokenizer,
-or scanner (though "scanner" is also used to refer to the first stage of a lexer).
-;The Task
 Create a lexical analyzer for the simple programming language specified below.  The
@@ Line 17: / Line 13: @@
 if two versions of the solution are provided:  One without the lexer module, and one with.
-;Specification
+{{introheader|Input Specification}}
+The language to be analyzed is roughly a subset of [[C]]. It supports the following tokens:
-The various token types are denoted below.
 ;Operators
@@ Line 41: / Line 37: @@
 |  >   || greater than          || Gtr
 |-
-|  !=  || not equal             || Neq
+|  &#33;=  || not equal             || Neq
 |-
 |  =   || assign                || Assign
@@ Line 97: / Line 93: @@
 |}
-Notes: For char literals, '\n' is supported as a new line
+Note: In character and string literals, <code>\n</code> denotes a newline character and <code>\\</code> denotes a backslash. No other escape sequences are supported.
-character. To represent \, use: '\\'.  \n may also be used in
-Strings, to print a newline. No other special sequences are
-supported.
 ;White space
@@ Line 132: / Line 125: @@
 They should produce the same token stream, except for the line and column positions.
-'''Comments'''    /* ... */   (multi-line)
+Comments enclosed in <code>/* ... */</code> are also treated as whitespace outside of strings.
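
Treating comments as whitespace while still reporting correct positions can be sketched as follows. This is a minimal Python illustration, not one of the reference implementations: the function name <code>skip_blank</code>, the use of <code>ValueError</code> for the diagnostic, and counting a tab as one column are all assumptions made for this example.

```python
def skip_blank(text, pos, line, col):
    """Advance past whitespace and /* ... */ comments.

    Returns the updated (pos, line, col). Hypothetical helper:
    the name and signature are not part of the task.
    """
    while pos < len(text):
        if text[pos].isspace():
            if text[pos] == "\n":
                line, col = line + 1, 1
            else:
                col += 1  # assumption: a tab counts as one column
            pos += 1
        elif text.startswith("/*", pos):
            end = text.find("*/", pos + 2)
            if end == -1:
                # Diagnostic required by the task: comment never closed.
                raise ValueError("end-of-file in comment")
            # Walk the comment so newlines inside it update the position.
            for ch in text[pos:end + 2]:
                if ch == "\n":
                    line, col = line + 1, 1
                else:
                    col += 1
            pos = end + 2
        else:
            break
    return pos, line, col
```

Calling <code>skip_blank(source, 0, 1, 1)</code> before reading each token leaves <code>line</code> and <code>col</code> at the start of the next token, which is the position the output should report.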
 ;Complete list of token names
+<pre>
-'''EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen, Uminus, Mul, Div, Add, Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma, Assign, Integerk, Stringk, Ident'''
+EOI     Print    Putc    If     While  Lbrace  Rbrace
+Lparen  Rparen   Uminus  Mul    Div    Add     Sub
+Lss     Gtr      Leq     Neq    And    Semi    Comma
+Assign  Integer  String  Ident
+</pre>
+{{introheader|Output Format}}
-;Program output
+The program output should be a sequence of lines, each consisting of the following whitespace-separated fields:
-Output of the program should be:
+# <code>line</code>
-* the word '''line''', followed by:
-* the line number where the token starts, followed by:
+# the line number where the token starts
+# <code>col</code>
-* the abbreviation '''col''', followed by:
-* the column number where the token starts, followed by:
+# the column number where the token starts
-* the '''token name'''.
+# the token name
-* If the token name is one of Integer, Ident or String, the actual value of the same should follow.
+# the token value, only for ''Integer'', ''String'', and ''Ident'' tokens
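
The field order listed above can be sketched as a small Python formatter. The function name <code>format_token</code> and the column widths are assumptions chosen for this example; since the fields are whitespace-separated, the exact padding should not matter.

```python
# Token names that carry a value, per the output format.
VALUE_TOKENS = {"Integer", "String", "Ident"}

def format_token(line, col, name, value=None):
    """Render one token as a line of whitespace-separated fields.

    Hypothetical helper: widths are illustrative only.
    """
    text = f"line {line:5d}  col {col:5d} {name}"
    if name in VALUE_TOKENS:
        # Pad the name so values line up; the width 8 is an assumption.
        text = f"line {line:5d}  col {col:5d} {name:<8} {value}"
    return text

print(format_token(4, 1, "Print"))
print(format_token(4, 1, "Ident", "phoenix_number"))
```

A consumer of this output can recover the fields with a plain whitespace split, reading the value field only when the token name calls for one.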
+{{introheader|Diagnostics}}
-;Test Cases
+The following error conditions should be caught:
-'''Test Case 1'''
+{| class="wikitable"
+|-
+! Error
+! Example
+|-
+| Empty character constant.
+| <code>&apos;&apos;</code>
+|-
+| Unknown escape sequence.
+| <code>'\r'</code>
+|-
+| Multi-character constant.
+| <code>'xx'</code>
+|-
+| End-of-file in comment.     Closing comment characters not found.
+|-
+| End-of-file while scanning string literal. Closing string character not found.
+|-
+| End-of-line while scanning string literal. Closing string character not found before end-of-line.
+|-
+| Unrecognized character.
+| <code>&#124;</code>
+|}
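
The three character-constant diagnostics in the table above could be checked along these lines. This Python sketch makes several assumptions: the helper name <code>char_literal_value</code> is hypothetical, it receives only the text between the quotes, it returns the character's integer code, and it uses <code>ValueError</code> as a stand-in for whatever error reporting an implementation prefers.

```python
# Only these two escapes are supported by the task.
ESCAPES = {"n": "\n", "\\": "\\"}

def char_literal_value(body):
    """Return the integer code for the text found between two single quotes.

    Hypothetical helper; raises ValueError for each diagnostic case.
    """
    if body == "":
        raise ValueError("empty character constant")
    if body[0] == "\\":
        # A backslash must be followed by a supported escape character.
        if len(body) < 2 or body[1] not in ESCAPES:
            raise ValueError("unknown escape sequence \\" + body[1:2])
        char, rest = ESCAPES[body[1]], body[2:]
    else:
        char, rest = body[0], body[1:]
    if rest:
        raise ValueError("multi-character constant")
    return ord(char)
```

The remaining diagnostics (unterminated comments and strings, unrecognized characters) depend on how the scanner reads its input and are not covered by this sketch.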
+{{introheader|Test Cases}}
+{| class="wikitable"
+|-
+! Input
+! Output
+|-
+| style="vertical-align:top" |
 <lang c>
 /*
@@ Line 159: / Line 190: @@
 </lang>
+| style="vertical-align:top" |
-;Output
+<b><pre>
-<b>
-<pre>
 line     4  col     1 Print
 line     4  col     6 Lparen
@@ Line 169: / Line 198: @@
 line     4  col    25 Semi
 line     5  col     1 EOI
-</pre>
+</pre></b>
-</b>
+|-
-'''Test Case 2'''
+| style="vertical-align:top" |
 <lang c>
 /*
@@ Line 181: / Line 210: @@
 </lang>
+| style="vertical-align:top" |
-;Output
+<b><pre>
-<b>
-<pre>
 line     4  col     1 Ident    phoenix_number
 line     4  col    16 Assign
@@ Line 197: / Line 224: @@
 line     5  col    28 Semi
 line     6  col     1 EOI
-</pre>
+</pre></b>
-</b>
+|-
-'''Test Case 3'''
+| style="vertical-align:top" |
 <lang c>
 /*
@@ Line 222: / Line 249: @@
 </lang>
+| style="vertical-align:top" |
-;Output
+<b><pre>
-<b>
-<pre>
 line     5  col    15 Print
 line     5  col    41 Sub
@@ Line 252: / Line 277: @@
 line    17  col    26 Integer         10
 line    18  col    26 Integer         32
-line    19  col     1 EOI</pre>
+line    19  col     1 EOI
-</b>
+</pre></b>
+|}
+{{introheader|Reference}}
-;Diagnostics
-The following error conditions should be caught:
-* Empty character constant.   Example: &apos;&apos;
-* Unknown escape sequence.    Example: '\r'
-* Multi-character constant.   Example: 'xx'
-* End-of-file in comment.     Closing comment characters not found.
-* End-of-file while scanning string literal. Closing string character not found.
-* End-of-line while scanning string literal. Closing string character not found before end-of-line.
-* Unrecognized character.     Example: |
-;Reference
 The Flex, C, Python and Euphoria versions can be considered reference implementations.
+<hr>
-;Implementations
 __TOC__