User:Ed Davis: Difference between revisions
No edit summary
m (Replaced content with "Hello, World!")
(19 intermediate revisions by the same user not shown)
Hello, World!

{{task}}Description of the task

Lexical Analyzer
----------------

From [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]

Lexical analysis is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).

==The Task==

Create a lexical analyzer for the Tiny programming language. The program should read input from a file and/or stdin, and write output to a file and/or stdout.

==Specification==

===Operators===

{| class="wikitable"
|-
! Characters !! Common name !! Name
|-
| '*' || multiply || Mul
|-
| '/' || divide || Div
|-
| '+' || plus || Add
|-
| '-' || minus and unary minus || Sub and Uminus
|-
| '<' || less than || Lss
|-
| '<=' || less than or equal || Leq
|-
| '>' || greater than || Gtr
|-
| '!=' || not equal || Neq
|-
| '=' || assign || Assign
|-
| '&&' || and || And
|}

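Because '<' is a prefix of '<=', and '!' and '&' are only valid as the first character of '!=' and '&&', the lexer has to peek at the next character before it decides which operator it has: the longest match wins. A minimal Python sketch of that lookahead, assuming the source is held in a single string. The names `scan_operator`, `TWO_CHAR_OPS` and `ONE_CHAR_OPS` are hypothetical. This sketch always reports '-' as Sub and assumes the Sub/Uminus decision is made in a later stage (for example, one that looks at the previous token).

```python
# Hypothetical tables built from the operator list above.
TWO_CHAR_OPS = {"<=": "Leq", "!=": "Neq", "&&": "And"}
ONE_CHAR_OPS = {"*": "Mul", "/": "Div", "+": "Add", "-": "Sub",
                "<": "Lss", ">": "Gtr", "=": "Assign"}


def scan_operator(text, pos):
    """Return (token_name, new_pos), or None if no operator starts at pos.

    '-' is always reported as Sub here; telling it apart from a unary
    minus (Uminus) is assumed to happen elsewhere.
    """
    pair = text[pos:pos + 2]
    if pair in TWO_CHAR_OPS:      # try the two-character operators first
        return TWO_CHAR_OPS[pair], pos + 2
    ch = text[pos:pos + 1]
    if ch in ONE_CHAR_OPS:
        return ONE_CHAR_OPS[ch], pos + 1
    return None                   # e.g. a lone '!' or '&' is not an operator
```

In a full lexer, '/*' has to be checked before this function is called, so that the start of a comment is not reported as Div.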
===Symbols===

{| class="wikitable"
|-
! Characters !! Common name !! Name
|-
| '(' || left parenthesis || Lparen
|-
| ')' || right parenthesis || Rparen
|-
| '{' || left brace || Lbrace
|-
| '}' || right brace || Rbrace
|-
| ';' || semicolon || Semi
|-
| ',' || comma || Comma
|}

===Keywords===

{| class="wikitable"
|-
! Characters !! Name
|-
| "if" || If
|-
| "while" || While
|-
| "print" || Print
|-
| "putc" || Putc
|}

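Keywords have the same shape as identifiers, so one common approach (a design choice, not something the spec requires) is to scan a complete identifier first and only then look it up in the keyword table. That way a name such as `iffy` or `while_b` is reported as a single Ident instead of a keyword followed by leftovers. A Python sketch with hypothetical names (`KEYWORDS`, `WORD`, `scan_word`); its pattern also accepts one-letter names such as `x`.

```python
import re

KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}
# '*' after the first character also admits one-letter names such as x.
WORD = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")


def scan_word(text, pos):
    """Scan an identifier-shaped word at pos.

    Returns (token_name, word, new_pos), or None if no word starts there.
    """
    m = WORD.match(text, pos)
    if m is None:
        return None
    word = m.group()
    # Only an exact match in the keyword table makes a keyword.
    return KEYWORDS.get(word, "Ident"), word, m.end()
```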
===Other entities===

{| class="wikitable"
|-
! Entity !! Regular expression !! Name
|-
| integers || [0-9]+ || Integer
|-
| char literal || 'x' || Integer
|-
| identifiers || [_a-zA-Z][_a-zA-Z0-9]* || Ident
|-
| string literal || "[^"\n]*" || String
|}

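The regular expressions above can be applied directly with Python's `re` module, matching at the current position. A sketch, assuming the whole source is held in one string; `INTEGER`, `STRING` and `scan_literal` are hypothetical names. The string pattern stops at the first closing quote and does not cross a newline (a greedy `".*"` would run on to the last quote on the line), which fits the end-of-line diagnostic listed under Diagnostics.

```python
import re

INTEGER = re.compile(r"[0-9]+")
# No quotes or newlines inside, so the match ends at the first closing quote.
STRING = re.compile(r'"[^"\n]*"')


def scan_literal(text, pos):
    """Return (token_name, value, new_pos) for an integer or string at pos."""
    m = INTEGER.match(text, pos)
    if m:
        return "Integer", int(m.group()), m.end()
    m = STRING.match(text, pos)
    if m:
        # Strings are reported as written, quotes and \n escapes included.
        return "String", m.group(), m.end()
    return None
```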
Notes: In char literals, '\n' denotes a newline character and '\\' denotes a backslash. In string literals, \n may also be used to print a newline. No other escape sequences are supported.

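The escape rules for char literals are small enough to handle by hand. A Python sketch that decodes one char literal into the value its Integer token would carry, and raises the three char-constant errors listed under Diagnostics. It assumes the value is the character's code (Python's `ord`), which the spec does not state outright; `LexError`, `ESCAPES` and `scan_char_literal` are hypothetical names.

```python
class LexError(Exception):
    """Hypothetical error type for the lexer's diagnostics."""


# The only escape sequences the spec allows in a char literal.
ESCAPES = {"n": "\n", "\\": "\\"}


def scan_char_literal(text, pos):
    """Decode the char literal whose opening quote is at text[pos].

    Returns (value, new_pos). Assumption: the Integer value of a char
    literal is the character's code, e.g. 'x' -> 120 and '\\n' -> 10.
    """
    i = pos + 1
    first = text[i:i + 1]
    if first in ("'", ""):
        raise LexError("empty character constant")
    if first == "\\":
        esc = text[i + 1:i + 2]
        if esc not in ESCAPES:
            raise LexError("unknown escape sequence")
        ch, i = ESCAPES[esc], i + 2
    else:
        ch, i = first, i + 1
    if text[i:i + 1] != "'":      # anything but a closing quote here is an error
        raise LexError("multi-character constant")
    return ord(ch), i + 1
```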
'''Comments''' /* ... */ (multi-line)

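Because a comment can span several lines, a lexer that skips it still has to count the newlines inside it, or every later token gets the wrong line number. A Python sketch, assuming the lexer tracks a 1-based line and column alongside its position in the source string; `skip_comment` is a hypothetical name, and `ValueError` stands in for however the end-of-file diagnostic is actually reported.

```python
def skip_comment(text, pos, line, col):
    """Skip a /* ... */ comment that starts at text[pos].

    Returns the (pos, line, col) just after the closing '*/'.
    Raises ValueError if the input ends before the comment is closed.
    """
    assert text.startswith("/*", pos)
    pos, col = pos + 2, col + 2
    while pos < len(text):
        if text.startswith("*/", pos):
            return pos + 2, line, col + 2
        if text[pos] == "\n":     # newlines inside the comment still count
            line, col = line + 1, 1
        else:
            col += 1
        pos += 1
    raise ValueError("end-of-file in comment")
```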
====Complete list of token names====

'''EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen, Uminus, Mul, Div, Add, Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma, Assign, Integer, String, Ident'''

==Program output==

For each token, the program should output the line and column where the token starts, followed by the token name. For Integer, Ident and String tokens, the integer value, identifier name, or string literal should follow.

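The output tables below list, for each token, the word "line", the line number, the word "col", the column, the token name and, for Integer, Ident and String, the value. A Python sketch of one way to print that as a single text line; the single-space layout and the name `format_token` are assumptions, since the spec shows the fields only as table columns.

```python
def format_token(line, col, name, value=None):
    """Render one token as 'line L col C Name', plus the value if there is one.

    The single-space layout is an assumption; the spec only shows the
    fields as table columns.
    """
    text = f"line {line} col {col} {name}"
    if value is not None:  # Integer, Ident and String tokens carry a value
        text += f" {value}"
    return text
```

Testing `value is not None` rather than plain truthiness keeps an Integer token whose value is 0 from losing its value.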
===Test Cases===

<lang c>
/*
Hello world
*/
print("Hello, World!\n");
</lang>

===Output===

{| class="wikitable"
|-
| line || 4 || col || 1 || Print ||
|-
| line || 4 || col || 6 || Lparen ||
|-
| line || 4 || col || 7 || String || "Hello, World!\n"
|-
| line || 4 || col || 24 || Rparen ||
|-
| line || 4 || col || 25 || Semi ||
|-
| line || 5 || col || 1 || EOI ||
|}

<lang c>
/*
Show Ident and Integers
*/
phoenix_number = 142857;
print(phoenix_number, "\n");
</lang>

===Output===

{| class="wikitable"
|-
| line || 4 || col || 1 || Ident || phoenix_number
|-
| line || 4 || col || 16 || Assign ||
|-
| line || 4 || col || 18 || Integer || 142857
|-
| line || 4 || col || 24 || Semi ||
|-
| line || 5 || col || 1 || Print ||
|-
| line || 5 || col || 6 || Lparen ||
|-
| line || 5 || col || 7 || Ident || phoenix_number
|-
| line || 5 || col || 21 || Comma ||
|-
| line || 5 || col || 23 || String || "\n"
|-
| line || 5 || col || 27 || Rparen ||
|-
| line || 5 || col || 28 || Semi ||
|-
| line || 6 || col || 1 || EOI ||
|}

==Diagnostics==

The lexer should detect and report the following error conditions:

* Empty character constant. Example: ''
* Unknown escape sequence. Example: '\r'
* Multi-character constant. Example: 'xx'
* End-of-file in comment: the closing comment characters (*/) were not found.
* End-of-file while scanning a string literal: the closing quote was not found.
* End-of-line while scanning a string literal: the closing quote was not found before the end of the line.
* Unrecognized character. Example: |

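Putting the pieces together, here is a compact, self-contained Python sketch of a complete lexer for this spec (Python 3.8 or later). It is an independent illustration, not the C or Python reference implementation. It assumes the whole input fits in memory, that a char literal's value is its character code, that '-' is always reported as Sub (leaving Uminus to a later stage), and that every diagnostic is reported by raising a hypothetical `LexError` that carries the line and column.

```python
import re
import sys
from pathlib import Path


class LexError(Exception):
    """Hypothetical error type for the diagnostics listed above."""


KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}
# Two-character operators come first so that the longest match wins.
OPERATORS = [("<=", "Leq"), ("!=", "Neq"), ("&&", "And"), ("*", "Mul"),
             ("/", "Div"), ("+", "Add"), ("-", "Sub"), ("<", "Lss"),
             (">", "Gtr"), ("=", "Assign"), ("(", "Lparen"), (")", "Rparen"),
             ("{", "Lbrace"), ("}", "Rbrace"), (";", "Semi"), (",", "Comma")]
WORD = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")
INTEGER = re.compile(r"[0-9]+")


def tokenize(text):
    """Yield (line, col, name, value) tuples, ending with an EOI token."""
    pos, line, col = 0, 1, 1
    def advance(n):
        # Consume n characters, keeping the 1-based line and column in step.
        nonlocal pos, line, col
        for c in text[pos:pos + n]:
            line, col = (line + 1, 1) if c == "\n" else (line, col + 1)
        pos += n
    def fail(msg):
        raise LexError(f"line {line} col {col}: {msg}")
    while True:
        # Skip whitespace and /* ... */ comments (which may span lines).
        while pos < len(text):
            if text[pos].isspace():
                advance(1)
            elif text.startswith("/*", pos):
                end = text.find("*/", pos + 2)
                if end < 0:
                    fail("end-of-file in comment")
                advance(end + 2 - pos)
            else:
                break
        start = (line, col)
        if pos >= len(text):
            yield (*start, "EOI", None)
            return
        if m := INTEGER.match(text, pos):
            name, value = "Integer", int(m.group())
            advance(m.end() - pos)
        elif m := WORD.match(text, pos):
            name = KEYWORDS.get(m.group(), "Ident")
            value = m.group() if name == "Ident" else None
            advance(m.end() - pos)
        elif text[pos] == '"':
            end = pos + 1
            while end < len(text) and text[end] not in '"\n':
                end += 1
            if end == len(text):
                fail("end-of-file while scanning string literal")
            if text[end] == "\n":
                fail("end-of-line while scanning string literal")
            name, value = "String", text[pos:end + 1]  # reported as written
            advance(end + 1 - pos)
        elif text[pos] == "'":
            # Assumption: a char literal's value is its character code.
            body = text[pos + 1:pos + 3]
            if body[:1] in ("'", ""):
                fail("empty character constant")
            if body[0] == "\\" and body not in ("\\n", "\\\\"):
                fail("unknown escape sequence")
            length = 4 if body[0] == "\\" else 3  # '\n' and '\\' are 4 chars
            if text[pos + length - 1:pos + length] != "'":
                fail("multi-character constant")
            char = "\n" if body == "\\n" else body[length - 3]
            name, value = "Integer", ord(char)
            advance(length)
        else:
            for op, op_name in OPERATORS:
                if text.startswith(op, pos):
                    name, value = op_name, None
                    advance(len(op))
                    break
            else:
                fail(f"unrecognized character {text[pos]!r}")
        yield (*start, name, value)


def main():
    # Read the file named on the command line, or stdin if none is given.
    source = Path(sys.argv[1]).read_text() if len(sys.argv) > 1 else sys.stdin.read()
    try:
        for line, col, name, value in tokenize(source):
            print(f"line {line} col {col} {name}" + ("" if value is None else f" {value}"))
    except LexError as err:
        sys.exit(f"error: {err}")
```

To run it as a script, add the usual `if __name__ == "__main__": main()` guard; output goes to stdout and can be redirected to a file. On the second test case it reports `phoenix_number` at line 4, column 1, because the three comment lines are counted.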
For anything not covered here, refer to the C and Python implementations.
Latest revision as of 03:37, 14 August 2016