User:Ed Davis
Lexical Analyzer
----------------

From Wikipedia (https://en.wikipedia.org/wiki/Lexical_analysis):

Lexical analysis is the process of converting a sequence of characters (such as in a
computer program or web page) into a sequence of tokens (strings with an identified
"meaning"). A program that performs lexical analysis may be called a lexer, tokenizer,[1]
or scanner (though "scanner" is also used to refer to the first stage of a lexer).

The Task
--------

Create a lexical analyzer for the Tiny programming language.

Specification
-------------

{| class="wikitable"
|-
! Characters !! Regular expression !! Name
|-
| integers || [0-9]+ || Integer
|-
| char literal || 'x' || Integer
|-
| identifiers || [_a-zA-Z][_a-zA-Z0-9]* || Ident
|-
| string literal || "[^"\n]*" || String
|}

Notes: In char literals, '\n' represents a newline character and '\\' represents a
single backslash (\). \n may also be used in strings to print a newline. No other
escape sequences are supported.

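The following minimal Python sketch illustrates the char-literal rules in the notes above. It is not taken from the C or Python implementations. The name char_literal_value is hypothetical. The sketch also assumes that a char literal's Integer token carries the character code (120 for 'x'); the specification implies this but does not state it. Its error messages follow the first three entries of the Diagnostics list.

```python
def char_literal_value(literal: str) -> int:
    """Return the character code for a Tiny char literal (hypothetical helper).

    `literal` is the raw source text including both single quotes,
    e.g. "'x'" or "'\\n'". Assumption: the Integer token's value is the
    character code (ord) of the decoded character.
    """
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a quoted character literal")
    body = literal[1:-1]
    if body == "":
        raise ValueError("empty character constant")
    if body[0] == "\\":
        # Only \n and \\ are supported escape sequences.
        if len(body) < 2 or body[1] not in "n\\":
            raise ValueError("unknown escape sequence: " + body[:2])
        char = "\n" if body[1] == "n" else "\\"
        rest = body[2:]
    else:
        char, rest = body[0], body[1:]
    if rest:
        # Anything left after one character (or one escape) is an error.
        raise ValueError("multi-character constant")
    return ord(char)


print(char_literal_value("'x'"))     # 120
print(char_literal_value("'\\n'"))   # 10
print(char_literal_value("'\\\\'"))  # 92
```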
operators:

{| class="wikitable"
|-
! Characters !! Meaning !! Name
|-
| '*' || multiply || Mul
|-
| '/' || divide || Div
|-
| '+' || plus || Add
|-
| '-' || minus and unary minus || Sub and Uminus
|-
| '<' || less than || Lss
|-
| '<=' || less than or equal || Leq
|-
| '>' || greater than || Gtr
|-
| '!=' || not equal || Neq
|-
| '=' || assign || Assign
|-
| '&&' || and || And
|}

symbols:

{| class="wikitable"
|-
! Characters !! Meaning !! Name
|-
| '(' || left parenthesis || Lparen
|-
| ')' || right parenthesis || Rparen
|-
| '{' || left brace || Lbrace
|-
| '}' || right brace || Rbrace
|-
| ';' || semicolon || Semi
|-
| ',' || comma || Comma
|}

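Some operators share a first character. '<' is a token on its own and also begins '<='. A lone '!' or '&' is not a valid token at all. A scanner therefore has to try the two-character operators before the one-character ones. Below is a minimal Python sketch of that longest-match lookup over the operator and symbol tables; the name match_operator is hypothetical. It assumes the caller has already checked for the '/*' that starts a comment. It always reports '-' as Sub, because the specification does not say how the lexer should choose between Sub and Uminus.

```python
from typing import Optional, Tuple

# Two-character operators must be tried first so that "<=" is not read
# as "<" followed by "=".
TWO_CHAR = {"<=": "Leq", "!=": "Neq", "&&": "And"}
ONE_CHAR = {
    # Assumption: '-' is always reported as Sub; Sub vs Uminus is not decided here.
    "*": "Mul", "/": "Div", "+": "Add", "-": "Sub",
    "<": "Lss", ">": "Gtr", "=": "Assign",
    "(": "Lparen", ")": "Rparen", "{": "Lbrace", "}": "Rbrace",
    ";": "Semi", ",": "Comma",
}


def match_operator(src: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (token_name, length) for the operator or symbol at src[pos].

    Returns None if none starts there (e.g. a lone '!' or '&'), which the
    caller would report as an unrecognized character.
    """
    pair = src[pos:pos + 2]
    if pair in TWO_CHAR:
        return TWO_CHAR[pair], 2
    char = src[pos:pos + 1]
    if char in ONE_CHAR:
        return ONE_CHAR[char], 1
    return None


print(match_operator("a <= b", 2))  # ('Leq', 2)
print(match_operator("a < b", 2))   # ('Lss', 1)
print(match_operator("!x", 0))      # None
```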
keywords:

{| class="wikitable"
|-
! Keyword !! Name
|-
| "if" || If
|-
| "while" || While
|-
| "print" || Print
|-
| "putc" || Putc
|}

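Keywords also match the identifier pattern. One common approach, assumed here rather than taken from the reference implementations, is to scan a whole identifier first and then look its text up in a keyword table. A name such as printer is then reported as a single Ident, not as Print followed by more text. The name scan_word is hypothetical. The identifier pattern assumes that one-character identifiers are allowed.

```python
import re
from typing import Optional, Tuple

# Identifier pattern; '*' (not '+') so that one-letter names like x are allowed.
IDENT = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")
KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}


def scan_word(src: str, pos: int) -> Optional[Tuple[str, str, int]]:
    """Scan an identifier or keyword starting at src[pos] (hypothetical helper).

    Returns (token_name, text, end_index), or None if no identifier
    starts at pos.
    """
    m = IDENT.match(src, pos)
    if m is None:
        return None
    text = m.group()
    # The whole word is matched before the lookup, so "printer" stays an Ident.
    return KEYWORDS.get(text, "Ident"), text, m.end()


print(scan_word("print(x);", 0))    # ('Print', 'print', 5)
print(scan_word("printer = 1", 0))  # ('Ident', 'printer', 7)
print(scan_word("x = 1", 0))        # ('Ident', 'x', 1)
```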
comments: /* ... */ (may span multiple lines)

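Comments produce no tokens, but they can span lines. Skipping one therefore has to keep the line and column counters used in the output correct. The minimal Python sketch below assumes 1-based lines and columns, as in the test cases, and assumes the caller has already seen the opening '/*'. The name skip_comment is hypothetical. Its error corresponds to the end-of-file-in-comment diagnostic.

```python
from typing import Tuple


def skip_comment(src: str, pos: int, line: int, col: int) -> Tuple[int, int, int]:
    """Skip a /* ... */ comment whose '/*' starts at src[pos] (hypothetical helper).

    `line` and `col` are the 1-based position of that '/'. Returns the new
    (pos, line, col) just past the closing '*/'.
    """
    # Search after the opening "/*" so that "/*/" does not count as closed.
    end = src.find("*/", pos + 2)
    if end == -1:
        raise SyntaxError(f"line {line} col {col}: end-of-file in comment")
    end += 2
    text = src[pos:end]
    newlines = text.count("\n")
    if newlines:
        line += newlines
        # Column of the character after "*/", counted from the last newline.
        col = len(text) - text.rindex("\n")
    else:
        col += len(text)
    return end, line, col


src = "/*\nHello world\n*/\nprint"
print(skip_comment(src, 0, 1, 1))  # (17, 3, 3)
```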
Complete list of token types:

EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen, Uminus, Mul, Div, Add,
Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma, Assign, Integer, String, Ident

For each token, the program should output the line and column where the token
starts, followed by the token name. For Integer, Ident and String tokens, the
integer value, identifier name, or string (including its quotes) should follow.

Test Cases
----------

/*
Hello world
*/
print("Hello, World!\n");

Output
------

line 4 col 1 Print
line 4 col 6 Lparen
line 4 col 7 String "Hello, World!\n"
line 4 col 24 Rparen
line 4 col 25 Semi
line 5 col 1 EOI

/*
Show Ident and Integers
*/
phoenix_number = 142857;
print(phoenix_number, "\n");

Output
------

line 4 col 1 Ident phoenix_number
line 4 col 16 Assign
line 4 col 18 Integer 142857
line 4 col 24 Semi
line 5 col 1 Print
line 5 col 6 Lparen
line 5 col 7 Ident phoenix_number
line 5 col 21 Comma
line 5 col 23 String "\n"
line 5 col 27 Rparen
line 5 col 28 Semi
line 6 col 1 EOI

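One compact way to put a complete lexer together is sketched below in Python. It joins all the token patterns into a single regular expression with named groups, scans the input with re.finditer, and tracks the line and column of each match. This is an illustration only, not the structure of the C or Python implementations. The names TOKEN_SPEC and lex are hypothetical. The sketch reports a char literal as an Integer holding its character code, which is an assumption. Its error messages are simplified versions of the Diagnostics below. On the first test case it prints exactly the expected output shown above.

```python
import re
from typing import List

KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}

# (group name, pattern). Python's re tries alternatives left to right, so
# "/*" is tried before "/", "<=" before "<", and each Bad* entry only fires
# when the well-formed pattern just before it failed.
TOKEN_SPEC = [
    ("Comment", r"/\*.*?\*/"),
    ("BadComment", r"/\*"),                   # "/*" with no closing "*/"
    ("Space", r"[ \t\r\n]+"),
    ("Integer", r"[0-9]+"),
    ("Char", r"'(?:[^'\\\n]|\\[n\\])'"),
    ("BadChar", r"'"),                        # '', 'xx', '\r', ...
    ("String", r'"[^"\n]*"'),
    ("BadString", r'"'),                      # no closing quote on this line
    ("Ident", r"[_a-zA-Z][_a-zA-Z0-9]*"),
    ("Leq", r"<="), ("Neq", r"!="), ("And", r"&&"),
    ("Lss", r"<"), ("Gtr", r">"), ("Assign", r"="),
    ("Mul", r"\*"), ("Div", r"/"), ("Add", r"\+"), ("Sub", r"-"),
    ("Lparen", r"\("), ("Rparen", r"\)"), ("Lbrace", r"\{"),
    ("Rbrace", r"\}"), ("Semi", r";"), ("Comma", r","),
    ("Unknown", r"."),                        # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC),
                    re.DOTALL)  # DOTALL lets comments span lines
ERRORS = {
    "BadComment": "end-of-file in comment",
    "BadChar": "invalid character constant",
    "BadString": "unterminated string literal",
    "Unknown": "unrecognized character",
}
ESCAPES = {"\\n": "\n", "\\\\": "\\"}


def lex(src: str) -> List[str]:
    """Return one 'line L col C Name [value]' string per token, ending with EOI."""
    out = []
    line, line_start = 1, 0  # line_start: index of the first char of this line
    for m in MASTER.finditer(src):
        kind, text = m.lastgroup, m.group()
        col = m.start() - line_start + 1
        if kind in ERRORS:
            raise SyntaxError(f"line {line} col {col}: {ERRORS[kind]}: {text!r}")
        if kind in ("Comment", "Space"):
            # No token, but newlines inside must advance the position.
            if "\n" in text:
                line += text.count("\n")
                line_start = m.start() + text.rindex("\n") + 1
            continue
        prefix = f"line {line} col {col}"
        if kind == "Ident":
            name = KEYWORDS.get(text)
            out.append(f"{prefix} {name}" if name else f"{prefix} Ident {text}")
        elif kind == "Char":
            body = text[1:-1]
            # Assumption: a char literal's value is its character code.
            out.append(f"{prefix} Integer {ord(ESCAPES.get(body, body))}")
        elif kind in ("Integer", "String"):
            out.append(f"{prefix} {kind} {text}")  # strings keep their quotes
        else:
            out.append(f"{prefix} {kind}")
    out.append(f"line {line} col {len(src) - line_start + 1} EOI")
    return out


source = '/*\nHello world\n*/\nprint("Hello, World!\\n");\n'
print("\n".join(lex(source)))
```

Because the unknown-character pattern matches any character, finditer never skips input. Every character therefore ends up in a token, in whitespace or a comment, or in an error.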
Diagnostics
-----------

The following error conditions should be caught:

Empty character constant. Example: ''
Unknown escape sequence. Example: '\r'
Multi-character constant. Example: 'xx'
End-of-file in comment. Closing comment characters not found.
End-of-file while scanning string literal. Closing string character not found.
End-of-line while scanning string literal. Closing string character not found before end-of-line.
Unrecognized character. Example: |

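The two string-literal diagnostics differ only in where the scan stops. A newline before the closing quote is an end-of-line error, and running out of input is an end-of-file error. The minimal hand-written Python sketch below uses a hypothetical function name, scan_string. It returns the literal with its quotes, matching the expected output. It does not validate escape sequences, which is a simplification.

```python
from typing import Tuple


def scan_string(src: str, pos: int) -> Tuple[str, int]:
    """Scan a string literal whose opening quote is at src[pos] (hypothetical helper).

    Returns (text, end): the literal including both quotes, and the index
    just past the closing quote. Escape sequences are not checked here.
    """
    i = pos + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            return src[pos:i + 1], i + 1
        if ch == "\n":
            raise SyntaxError("end-of-line while scanning string literal")
        i += 1
    raise SyntaxError("end-of-file while scanning string literal")


print(scan_string('print("Hello, World!\\n");', 6))  # ('"Hello, World!\\n"', 23)
```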
Refer any additional questions to the C and Python implementations.