Compiler/lexical analyzer: Difference between revisions

Compiler/lexical analyzer (view source)

Revision as of 12:54, 14 August 2016

444 bytes added , 7 years ago

improve formatting and wording of task description; make it a draft task until everything is clarified

Anonymous user

rosettacode>Smls

Revision as of 12:47, 14 August 2016 (view source) Thundergnat (talk \| contribs) m (Move to draft status) ← Older edit		Revision as of 12:54, 14 August 2016 (view source) rosettacode>Smls (improve formatting and wording of task description; make it a draft task until everything is clarified) Newer edit →
Line 1: {{draft task}}~~Lexical Analyzer~~ {{clarify task}} Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]: ~~<br>~~ : ''Lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).'' ~~Definition from [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]~~ {{introheader\|The Task}} ~~Lexical analysis is the process of converting a sequence of characters (such as in a~~ ~~computer program or web page) into a sequence of tokens (strings with an identified~~ ~~"meaning"). A program that performs lexical analysis may be called a lexer, tokenizer,~~ ~~or scanner (though "scanner" is also used to refer to the first stage of a lexer).~~ ~~;The Task~~ Create a lexical analyzer for the simple programming language specified below. The Line 17 ⟶ 13: if two versions of the solution are provided: One without the lexer module, and one with. ;{{introheader\|Input Specification}} The simple programming language to be analyzed is more or less a subset of [[C]]. It supports the following tokens: ~~The various token types are denoted below.~~ ;Operators Line 41 ⟶ 37: \| > \|\| greater than \|\| Gtr \|- \| !!= \|\| not equal \|\| Neq \|- \| = \|\| assign \|\| Assign Line 97 ⟶ 93: \|} Notes: For char and string literals, '<code>\n'</code> is supported as a new line character. To represent a backslash, use <code>\\</code>. No other special sequences are supported. ~~character. To represent \, use: '\\'. \n may also be used in~~ ~~Strings, to print a newline. No other special sequences are~~ ~~supported.~~ ;White space Line 132 ⟶ 125: They should produce the same token stream, except for the line and column positions. ~~'''~~Comments~~'''~~ enclosed in <code>/* ... /</code> are also ~~(multi-line)~~treated as whitespace outside of strings. ;Complete list of token names <pre> ~~'''EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen, Uminus, Mul, Div, Add, Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma, Assign, Integerk, Stringk, Ident'''~~ EOI Print Putc If While Lbrace Rbrace Lparen Rparen Uminus Mul Div Add Sub Lss Gtr Leq Neq And Semi Comma Assign Integer String Ident </pre> {{introheader\|Output Format}} ~~;Program output~~ The program output should be a sequence of lines, each consisting of the following whitespace-separated fields: ~~Output of the program should be:~~ # <code>line</code> the word '''line''', followed by: # the line number where the token starts~~, followed by:~~ # <code>col</code> the abbreviation '''col''', followed by: # the column number where the token starts~~, followed by:~~ # the ~~'''~~token name~~'''.~~ * If# the token ~~name~~value, isin ~~one~~case of ''Integer'', ~~Ident or~~ ''String'', ~~the~~or ~~actual value of the same should follow.~~''Ident'' {{introheader\|Diagnostics}} ~~;Test Cases~~ The following error conditions should be caught: ~~'''Test Case 1'''~~ {\| class="wikitable" \|- ! Error ! Example \|- \| Empty character constant \| <code>''</code> \|- \| Unknown escape sequence. \| <code>\r</code> \|- \| Multi-character constant. \| <code>xx</code> \|- \| End-of-file in comment. Closing comment characters not found. \|- \| End-of-file while scanning string literal. Closing string character not found. \|- \| End-of-line while scanning string literal. Closing string character not found before end-of-line. \|- \| Unrecognized character. \| <code>\|</code> \|} {{introheader\|Test Cases}} {\| class="wikitable" \|- ! Input ! Output \|- \| style="vertical-align:top" \| <lang c> /* Line 159 ⟶ 190: </lang> \| style="vertical-align:top" \| ~~;Output~~ <b><pre> ~~<b>~~ ~~<pre>~~ line 4 col 1 Print line 4 col 6 Lparen Line 169 ⟶ 198: line 4 col 25 Semi line 5 col 1 EOI </pre></b> ~~</b>~~ \|- ~~'''Test Case 2'''~~ \| style="vertical-align:top" \| <lang c> /* Line 181 ⟶ 210: </lang> \| style="vertical-align:top" \| ~~;Output~~ <b><pre> ~~<b>~~ ~~<pre>~~ line 4 col 1 Ident phoenix_number line 4 col 16 Assign Line 197 ⟶ 224: line 5 col 28 Semi line 6 col 1 EOI </pre></b> ~~</b>~~ \|- ~~'''Test Case 3'''~~ \| style="vertical-align:top" \| <lang c> /* Line 222 ⟶ 249: </lang> \| style="vertical-align:top" \| ~~;Output~~ <b><pre> ~~<b>~~ ~~<pre>~~ line 5 col 15 Print line 5 col 41 Sub Line 252 ⟶ 277: line 17 col 26 Integer 10 line 18 col 26 Integer 32 line 19 col 1 EOI~~</pre>~~ </pre></b> \|} {{introheader\|Reference}} ~~;Diagnostics~~ ~~The following error conditions should be caught:~~ * Empty character constant. Example: '' * Unknown escape sequence. Example: '\r' * Multi-character constant. Example: 'xx' * End-of-file in comment. Closing comment characters not found. * End-of-file while scanning string literal. Closing string character not found. * End-of-line while scanning string literal. Closing string character not found before end-of-line. * Unrecognized character. Example: \| ~~;Reference~~ The Flex, C, Python and Euphoria versions can be considered reference implementations. <hr> ~~;Implementations~~ __TOC__