User:Ed Davis: Difference between revisions
{{task}}Description of the task

Lexical Analyzer
----------------

From Wikipedia: (https://en.wikipedia.org/wiki/Lexical_analysis)

Lexical analysis is the process of converting a sequence of characters (such as in a
computer program or web page) into a sequence of tokens (strings with an identified
"meaning"). A program that performs lexical analysis may be called a lexer, tokenizer,
or scanner (though "scanner" is also used to refer to the first stage of a lexer).

;The Task
Create a lexical analyzer for the Tiny programming language.

;Specification
{| class="wikitable"
|-
! Characters !! Regular expression !! Name
|-
| integers || [0-9]+ || Integer
|-
| char literal || 'x' || Integer
|-
| identifiers || [_a-zA-Z][_a-zA-Z0-9]* || Ident
|-
| string literal || ".*" || String
|}

Notes: For char literals, '\n' is supported as the newline character.
To represent a backslash, use '\\'. \n may also be used in
strings to print a newline. No other escape sequences are
supported.
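
The escape rules in the notes can be illustrated with a small sketch. It assumes that a char literal is reported as an Integer token carrying the character's code, which is what the specification table suggests; the function name <code>char_literal_value</code> is hypothetical and not part of the task.

```python
def char_literal_value(text):
    """Return the integer value of a char literal such as 'x', '\\n' or '\\\\'.

    `text` is the literal including its surrounding single quotes.
    Assumption: anything other than a single character, \\n or \\\\ is an error.
    """
    body = text[1:-1]              # strip the surrounding single quotes
    if body == "\\n":              # backslash followed by n: newline
        return 10
    if body == "\\\\":             # two backslashes: one literal backslash
        return 92
    if len(body) == 1 and body != "\\":
        return ord(body)           # an ordinary single character
    raise ValueError(f"invalid char literal: {text}")
```

Under these assumptions, the literal 'x' yields 120 and the two supported escapes yield 10 and 92.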

operators:
{| class="wikitable"
|-
! Characters !! Common name !! Name
|-
| '*' || multiply || Mul
|-
| '/' || divide || Div
|-
| '+' || plus || Add
|-
| '-' || minus and unary minus || Sub and Uminus
|-
| '<' || less than || Lss
|-
| '<=' || less than or equal || Leq
|-
| '>' || greater than || Gtr
|-
| '!=' || not equal || Neq
|-
| '=' || assign || Assign
|-
| '&&' || and || And
|}
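
Because '<' and '<=' share a first character, a scanner has to try the longer spelling first. The following is a minimal sketch of that longest-match rule; the table and function names are hypothetical. It assumes '-' is always reported as Sub, leaving the Sub/Uminus distinction to a later stage, and that the caller has already checked for a comment opener before treating '/' as Div.

```python
# Hypothetical lookup tables built from the operator table.
TWO_CHAR_OPS = {"<=": "Leq", "!=": "Neq", "&&": "And"}
ONE_CHAR_OPS = {"*": "Mul", "/": "Div", "+": "Add", "-": "Sub",
                "<": "Lss", ">": "Gtr", "=": "Assign"}

def match_operator(text, pos):
    """Return (token_name, length) for the operator at text[pos:], or None."""
    pair = text[pos:pos + 2]
    if pair in TWO_CHAR_OPS:           # try the two-character spelling first
        return TWO_CHAR_OPS[pair], 2
    single = text[pos:pos + 1]
    if single in ONE_CHAR_OPS:         # fall back to a single character
        return ONE_CHAR_OPS[single], 1
    return None                        # not an operator at this position
```

Note that a lone '!' or '&' matches nothing here, so the caller would treat it as an unrecognized character.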

symbols:
{| class="wikitable"
|-
! Characters !! Common name !! Name
|-
| '(' || left parenthesis || Lparen
|-
| ')' || right parenthesis || Rparen
|-
| '{' || left brace || Lbrace
|-
| '}' || right brace || Rbrace
|-
| ';' || semicolon || Semi
|-
| ',' || comma || Comma
|}

keywords:
{| class="wikitable"
|-
! Characters !! Name
|-
| "if" || If
|-
| "while" || While
|-
| "print" || Print
|-
| "putc" || Putc
|}
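
Keywords have the same shape as identifiers, so one common approach is to scan a whole word first and then look it up. The sketch below assumes that approach and assumes keywords are case-sensitive; <code>classify_word</code> is a hypothetical helper name.

```python
# Keyword spellings mapped to their token names, taken from the keyword table.
KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}

def classify_word(word):
    """Return (token_name, value) for an already-scanned word.

    Keywords carry no value; any other word is an Ident carrying its own text.
    """
    name = KEYWORDS.get(word)
    if name is not None:
        return name, None
    return "Ident", word
```

Looking up the complete word, rather than matching keyword prefixes, keeps a name such as "printer" from being split into Print followed by an identifier.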

comments: /* ... */ (multi-line)
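
Since comments may span several lines, skipping one has to keep the line and column counters up to date so that later tokens are reported at the right position. This is a minimal sketch, assuming 1-based line and column numbers; the function name and the exact error wording are hypothetical.

```python
def skip_comment(text, pos, line, col):
    """Skip a /* ... */ comment that starts at text[pos].

    Returns (pos, line, col) for the first character after the closing */.
    Raises SyntaxError if the input ends before the comment is closed.
    """
    assert text.startswith("/*", pos)
    start_line, start_col = line, col
    pos += 2                            # step over the opening /*
    col += 2
    while pos < len(text):
        if text.startswith("*/", pos):
            return pos + 2, line, col + 2
        if text[pos] == "\n":           # a newline resets the column
            line += 1
            col = 1
        else:
            col += 1
        pos += 1
    raise SyntaxError(
        f"line {start_line} col {start_col}: end-of-file in comment")
```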

Complete list of token names:

EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen, Uminus, Mul, Div, Add,
Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma, Assign, Integer, String, Ident

The program should output the line and column where each
token starts, followed by the token name. For the Integer,
Ident and String tokens, the integer value, identifier name,
or string literal should follow the token name.
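
One way to produce such a line is sketched below. The single-space layout is an assumption based on how the sample output on this page reads; the original may have used aligned columns, and <code>format_token</code> is a hypothetical name.

```python
def format_token(line, col, name, value=None):
    """Render one output line as 'line L col C Name', plus the value if any."""
    text = f"line {line} col {col} {name}"
    if value is not None:              # only Integer, Ident and String carry a value
        text += f" {value}"
    return text
```

For example, <code>format_token(1, 18, "Integer", 142857)</code> gives "line 1 col 18 Integer 142857".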

Test Cases
----------

/*
Hello world
*/
print("Hello, World!\n");

Output
------

line 4 col 1 Print
line 4 col 6 Lparen
line 4 col 7 String "Hello, World!\n"
line 4 col 24 Rparen
line 4 col 25 Semi
line 5 col 1 EOI

/*
Show Ident and Integers
*/
phoenix_number = 142857;
print(phoenix_number, "\n");

Output
------

line 4 col 1 Ident phoenix_number
line 4 col 16 Assign
line 4 col 18 Integer 142857
line 4 col 24 Semi
line 5 col 1 Print
line 5 col 6 Lparen
line 5 col 7 Ident phoenix_number
line 5 col 21 Comma
line 5 col 23 String "\n"
line 5 col 27 Rparen
line 5 col 28 Semi
line 6 col 1 EOI
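
To show how the pieces fit together, here is a compact regex-based sketch that reproduces the "Hello world" test case. It is not the C or Python reference implementation. It assumes 1-based positions, reports '-' as Sub only, allows single-character identifiers, and collapses every diagnostic into one generic error; all names in it are hypothetical.

```python
import re

KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}
SYMBOLS = {"<=": "Leq", "!=": "Neq", "&&": "And", "*": "Mul", "/": "Div",
           "+": "Add", "-": "Sub", "<": "Lss", ">": "Gtr", "=": "Assign",
           "(": "Lparen", ")": "Rparen", "{": "Lbrace", "}": "Rbrace",
           ";": "Semi", ",": "Comma"}

# One alternative per lexeme class; longer operator spellings come first,
# and '/' is only an operator when it does not open a comment.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>/\*.*?\*/)
  | (?P<Integer>[0-9]+)
  | (?P<word>[_a-zA-Z][_a-zA-Z0-9]*)
  | (?P<String>"[^"\n]*")
  | (?P<char>'(?:\\n|\\\\|[^'\\\n])')
  | (?P<sym><=|!=|&&|/(?!\*)|[-*+<>=(){};,])
""", re.VERBOSE | re.DOTALL)

def tokenize(text):
    """Yield (line, col, name, value) tuples; value is None when absent."""
    pos, line, col = 0, 1, 1
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"line {line} col {col}: cannot scan token")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "word":
            name = KEYWORDS.get(lexeme)
            yield (line, col, name, None) if name else (line, col, "Ident", lexeme)
        elif kind == "sym":
            yield line, col, SYMBOLS[lexeme], None
        elif kind == "Integer":
            yield line, col, "Integer", int(lexeme)
        elif kind == "String":
            yield line, col, "String", lexeme
        elif kind == "char":
            body = lexeme[1:-1]
            yield line, col, "Integer", 10 if body == "\\n" else ord(body[-1])
        # Whitespace and comments produce no token; every lexeme moves the cursor.
        newlines = lexeme.count("\n")
        if newlines:
            line += newlines
            col = len(lexeme) - lexeme.rfind("\n")
        else:
            col += len(lexeme)
        pos = m.end()
    yield line, col, "EOI", None
```

Feeding it the first test case (with a trailing newline after the semicolon) yields Print, Lparen, String, Rparen and Semi on line 4, followed by EOI at line 5 col 1.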

Diagnostics:
------------

The following error conditions should be caught:

Empty character constant. Example: ''

Unknown escape sequence. Example: '\r'

Multi-character constant. Example: 'xx'

End-of-file in comment. Closing comment characters not found.

End-of-file while scanning string literal. Closing string character not found.

End-of-line while scanning string literal. Closing string character not found before end-of-line.

Unrecognized character. Example: |
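
The two string-literal diagnostics differ only in what ends the scan first: a newline or the end of the input. A minimal sketch of that distinction follows; the function name, the message format, and the choice to report the position of the opening quote are assumptions.

```python
def scan_string(text, pos, line, col):
    """Scan a double-quoted string literal that starts at text[pos].

    Returns (literal_including_quotes, next_pos).
    `line` and `col` locate the opening quote and are used only in messages.
    """
    assert text[pos] == '"'
    end = pos + 1
    while end < len(text):
        ch = text[end]
        if ch == '"':                   # closing quote found
            return text[pos:end + 1], end + 1
        if ch == "\n":                  # literal may not span lines
            raise SyntaxError(
                f"line {line} col {col}: end-of-line while scanning string literal")
        end += 1
    raise SyntaxError(
        f"line {line} col {col}: end-of-file while scanning string literal")
```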

For additional questions, refer to the C and Python implementations.

Latest revision as of 03:37, 14 August 2016
Hello, World!