User:Ed Davis: Difference between revisions

@@ Line 1: / Line 1: @@
+Lexical Analyzer
-Hello, World!
+----------------
+From Wikipedia: (https://en.wikipedia.org/wiki/Lexical_analysis)
+Lexical analysis is the process of converting a sequence of characters (such as in a
+computer program or web page) into a sequence of tokens (strings with an identified
+"meaning"). A program that performs lexical analysis may be called a lexer, tokenizer,[1]
+or scanner (though "scanner" is also used to refer to the first stage of a lexer).
+The Task
+--------
+Create a lexical analyzer for the Tiny programming language.
+Specification
+-------------
+{| class="wikitable"
+|-
+! Characters  !! Regular expression !! Name
+|-
+| integers       || [0-9]+                 || Integer
+|-
+| char literal   || 'x'                    || Integer
+|-
+| identifiers    || [_a-zA-Z][_a-zA-Z0-9]* || Ident
+|-
+| string literal || "[^"\n]*"              || String
+|}
+Notes: In char literals, '\n' denotes the newline character and
+'\\' denotes a backslash. \n may also be used in strings to
+print a newline. No other escape sequences are supported.
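The table and notes above can be sketched as Python regular expressions. This is a minimal illustration, not the reference implementation: the names INTEGER, IDENT, STRING, CHAR and char_value are hypothetical. It assumes that one-character identifiers are valid (so the identifier tail uses `*`) and that a string may not contain a double quote or a raw line break (suggested by the end-of-line diagnostic further down).

```python
import re

# Hypothetical pattern names; the expressions follow the table and notes.
INTEGER = re.compile(r"[0-9]+")
IDENT = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")   # assumption: "x" is a valid identifier
STRING = re.compile(r'"[^"\n]*"')               # assumption: no quote or line break inside
CHAR = re.compile(r"'(\\n|\\\\|[^'\\\n])'")     # 'x', '\n' or '\\'


def char_value(literal):
    """Return the integer value of a well-formed char literal."""
    body = literal[1:-1]          # strip the surrounding single quotes
    if body == r"\n":
        return 10                 # newline
    if body == "\\\\":
        return 92                 # backslash
    return ord(body)


print(char_value("'a'"))          # 'a' is reported as an Integer token
```

Char literals share the Integer token name, which is why char_value returns a number rather than a string.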
+operators:
+'*'          multiply                Mul
+'/'          divide                  Div
+'+'          plus                    Add
+'-'          minus and unary minus   Sub and Uminus
+'<'          less than               Lss
+'<='         less than or equal      Leq
+'>'          greater than            Gtr
+'!='         not equal               Neq
+'='          assign                  Assign
+'&&'         and                     And
+symbols:
+'('          left parenthesis        Lparen
+')'          right parenthesis       Rparen
+'{'          left brace              Lbrace
+'}'          right brace             Rbrace
+';'          semicolon               Semi
+','          comma                   Comma
+keywords:
+"if"                                 If
+"while"                              While
+"print"                              Print
+"putc"                               Putc
+comments:    /* ... */   (multi-line)
+Complete list of token types:
+EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen, Uminus, Mul, Div, Add,
+Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma, Assign, Integer, String, Ident
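One possible way to recognize the operators, symbols and keywords listed above is a pair of lookup tables with a longest-match rule, so that '<=' is not split into '<' and '='. The names SYMBOLS, KEYWORDS, match_symbol and classify_word are hypothetical. The sketch always reports '-' as Sub, on the assumption that telling Sub from Uminus needs context a table lookup does not have.

```python
# Hypothetical lookup tables built from the operator, symbol and keyword lists.
SYMBOLS = {
    "*": "Mul", "/": "Div", "+": "Add", "-": "Sub",
    "<": "Lss", "<=": "Leq", ">": "Gtr", "!=": "Neq",
    "=": "Assign", "&&": "And",
    "(": "Lparen", ")": "Rparen", "{": "Lbrace", "}": "Rbrace",
    ";": "Semi", ",": "Comma",
}
KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}


def match_symbol(text, pos):
    """Return (token_name, length) for the symbol at text[pos], or None."""
    for length in (2, 1):                       # try the longer spelling first
        candidate = text[pos:pos + length]
        if len(candidate) == length and candidate in SYMBOLS:
            return SYMBOLS[candidate], length
    return None


def classify_word(word):
    """Map an identifier-shaped word to its keyword token, else Ident."""
    return KEYWORDS.get(word, "Ident")
```

A None result from match_symbol corresponds to the "Unrecognized character" diagnostic, for example for a lone '|', '!' or '&'.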
+For each token, the program should output the line and column
+where the token starts, followed by the token name.  For
+Integer, Ident and String tokens, the integer value, identifier
+name, or string literal should follow.
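The output format described above could be produced by a small helper like the following. The function name format_token is hypothetical, and the field widths are assumptions inferred from the sample output in the test cases, not from a stated rule. In particular, the integer width is a guess based on a single sample line.

```python
def format_token(line, col, name, value=None):
    """Format one token as 'line <n>  col <n> <Name> [value]'."""
    text = "line %5d  col %5d %-8s" % (line, col, name)
    if value is None:
        return text.rstrip()          # no trailing padding for bare tokens
    if name == "Integer":
        return text + "%10d" % value  # assumption: right-aligned, width 10
    return text + " " + value         # Ident name or quoted String text


print(format_token(4, 1, "Print"))
print(format_token(4, 7, "String", '"Hello, World!\\n"'))
```

The String value is passed with its quotes and with the escape sequence unexpanded, matching how the samples display it.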
+Test Cases
+----------
+/*
+  Hello world
+ */
+print("Hello, World!\n");
+Output
+------
+line     4  col     1 Print
+line     4  col     6 Lparen
+line     4  col     7 String   "Hello, World!\n"
+line     4  col    24 Rparen
+line     4  col    25 Semi
+line     5  col     1 EOI
+/*
+  Show Ident and Integers
+ */
+phoenix_number = 142857;
+print(phoenix_number, "\n");
+Output
+------
+line     4  col     1 Ident    phoenix_number
+line     4  col    16 Assign
+line     4  col    18 Integer     142857
+line     4  col    24 Semi
+line     5  col     1 Print
+line     5  col     6 Lparen
+line     5  col     7 Ident    phoenix_number
+line     5  col    21 Comma
+line     5  col    23 String   "\n"
+line     5  col    27 Rparen
+line     5  col    28 Semi
+line     6  col     1 EOI
+Diagnostics
+-----------
+The following error conditions should be caught:
+Empty character constant.   Example: ''
+Unknown escape sequence.    Example: '\r'
+Multi-character constant.   Example: 'xx'
+End-of-file in comment.    Closing comment characters not found.
+End-of-file while scanning string literal. Closing string character not found.
+End-of-line while scanning string literal. Closing string character not found before end-of-line.
+Unrecognized character.     Example: |
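A few of the error conditions above can be sketched as follows. This is a hedged illustration, not the C or Python reference implementation: LexError, scan_char_literal and skip_comment are hypothetical names. Input that ends inside a char literal is not covered by the list, so the message used for that case is an assumption.

```python
class LexError(Exception):
    """Hypothetical exception type carrying one of the diagnostics."""


def scan_char_literal(text, pos):
    """Scan a char literal whose opening quote is at text[pos].

    Returns (integer_value, position_after_literal) or raises LexError.
    """
    start = pos + 1                                   # first char after the quote
    if start >= len(text):
        # Not in the list of diagnostics; message is an assumption.
        raise LexError("End-of-file in character constant")
    if text.startswith("'", start):
        raise LexError("Empty character constant")
    if text.startswith("\\", start):
        escape = text[start + 1:start + 2]
        if escape == "n":
            value = 10                                # newline
        elif escape == "\\":
            value = 92                                # backslash
        else:
            raise LexError("Unknown escape sequence")
        end = start + 2
    else:
        value = ord(text[start])
        end = start + 1
    if not text.startswith("'", end):
        raise LexError("Multi-character constant")
    return value, end + 1


def skip_comment(text, pos):
    """Given text[pos:pos + 2] == '/*', return the index just past '*/'."""
    end = text.find("*/", pos + 2)
    if end < 0:
        raise LexError("End-of-file in comment")
    return end + 2
```

A full lexer would also attach the line and column to each diagnostic; that bookkeeping is omitted here to keep the sketch short.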
+Refer additional questions to the C and Python implementations.