User:Grondilu/Perl6-C-Grammar

From Rosetta Code

<lang perl6>grammar C {

=begin quote

   The grammar has undefined terminal symbols integer-constant, character-constant,
   floating-constant, identifier, string, and enumeration-constant; the typewriter style words and
   symbols are terminals given literally. This grammar can be transformed mechanically into
   input acceptable for an automatic parser-generator. Besides adding whatever syntactic
   marking is used to indicate alternatives in productions, it is necessary to expand the "one of"
   constructions, and (depending on the rules of the parser-generator) to duplicate each
   production with an opt symbol, once with the symbol and once without. With one further
   change, namely deleting the production typedef-name: identifier and making typedef-name a
   terminal symbol, this grammar is acceptable to the YACC parser-generator. It has only one
   conflict, generated by the if-else ambiguity.

=end quote

   # keywords
   token keyword {
      < auto break case char const continue default do double else enum extern float for goto if int struct long switch register typedef return union short unsigned signed void sizeof volatile static while >
   }
   # constants

=begin quote

   An integer constant consisting of a sequence of digits is taken to be octal if it begins with 0
   (digit zero), decimal otherwise. Octal constants do not contain the digits 8 or 9. A sequence of
   digits preceded by 0x or 0X (digit zero) is taken to be a hexadecimal integer. The hexadecimal
   digits include a or A through f or F with values 10 through 15.
   An integer constant may be suffixed by the letter u or U, to specify that it is unsigned. It may
   also be suffixed by the letter l or L to specify that it is long.

=end quote

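   # hexadecimal, octal (including a lone 0), or decimal, then optional u/U and l/L suffixes
   token integer-constant {
      [ 0 <[xX]> <xdigit>+ | 0 <[0..7]>* | <[1..9]> <[0..9]>* ] [ <[uU]> <[lL]>? | <[lL]> <[uU]>? ]?
   }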

=begin quote

   A character constant is a sequence of one or more characters enclosed in single quotes as in
   'x'. The value of a character constant with only one character is the numeric value of the
   character in the machine's character set at execution time. The value of a multi-character
   constant is implementation-defined.
   Character constants do not contain the ' character or newlines; in order to represent them, and
   certain other characters, the following escape sequences may be used:
   newline           NL (LF)   \n      backslash       \     \\
   horizontal tab    HT        \t      question mark   ?     \?
   vertical tab      VT        \v      single quote    '     \'
   backspace         BS        \b      double quote    "     \"
   carriage return   CR        \r      octal number    ooo   \ooo
   formfeed          FF        \f      hex number      hh    \xhh
   audible alert     BEL       \a

=end quote

   # one or more plain characters or escape sequences between single quotes
   token character-constant {
      \' [ <-[ \' \\ \n ]> | \\ <[ n t v b r f a \\ ? \' \" ]> | \\ <[0..7]> ** 1..3 | \\ x <xdigit>+ ]+ \'
   }
   token floating-constant {...}
   token identifier { <ident> }
   token string {...}
   token enumeration-constant {...}

} </lang>
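
The integer-constant description quoted in the grammar can be tried out in isolation. What follows is only a minimal sketch: the grammar name IntegerConstant, its TOP rule and the sample inputs are assumptions made for illustration and are not part of the grammar above. The suffix part also accepts u and l in either order, which goes slightly beyond the quoted text.

<lang perl6># Hypothetical stand-alone grammar, used only to exercise one rule.
grammar IntegerConstant {
    # anchor the match so that the whole input must be one constant
    token TOP { ^ <integer-constant> $ }

    token integer-constant {
        # hexadecimal | octal (including a lone 0) | decimal
        [ 0 <[xX]> <xdigit>+ | 0 <[0..7]>* | <[1..9]> <[0..9]>* ]
        # optional unsigned and long suffixes, in either order
        [ <[uU]> <[lL]>? | <[lL]> <[uU]>? ]?
    }
}

# sample inputs (assumed for illustration)
for '0', '017', '0x1F', '42', '42u', '42UL', '089' -> $src {
    say $src, ' => ', IntegerConstant.parse($src) ?? 'accepted' !! 'rejected';
}</lang>

Under these rules an input such as 089 would be expected to be rejected, since an octal constant cannot contain the digits 8 or 9.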