User:Grondilu/Perl6-C-Grammar
<lang perl6>
grammar C {
=begin quote
The grammar has undefined terminal symbols integer-constant, character-constant, floating-constant, identifier, string, and enumeration-constant; the typewriter style words and symbols are terminals given literally. This grammar can be transformed mechanically into input acceptable for an automatic parser-generator. Besides adding whatever syntactic marking is used to indicate alternatives in productions, it is necessary to expand the ``one of'' constructions, and (depending on the rules of the parser-generator) to duplicate each production with an opt symbol, once with the symbol and once without. With one further change, namely deleting the production typedef-name: identifier and making typedef-name a terminal symbol, this grammar is acceptable to the YACC parser-generator. It has only one conflict, generated by the if-else ambiguity.
=end quote
# keywords
token keyword {
< auto break case char const continue default do double else enum extern float for goto if int struct long switch register typedef return union short unsigned signed void sizeof volatile static while >
}
# constants
=begin quote
An integer constant consisting of a sequence of digits is taken to be octal if it begins with 0 (digit zero), decimal otherwise. Octal constants do not contain the digits 8 or 9. A sequence of digits preceded by 0x or 0X (digit zero) is taken to be a hexadecimal integer. The hexadecimal digits include a or A through f or F with values 10 through 15. An integer constant may be suffixed by the letter u or U, to specify that it is unsigned. It may also be suffixed by the letter l or L to specify that it is long.
=end quote
token integer-constant {
[ 0 <[xX]> <xdigit>+ | 0 <[0..7]>* | <[1..9]> <digit>* ] <[uU]>? <[lL]>?
}
=begin quote
A character constant is a sequence of one or more characters enclosed in single quotes as in 'x'. The value of a character constant with only one character is the numeric value of the character in the machine's character set at execution time. The value of a multi-character constant is implementation-defined. Character constants do not contain the ' character or newlines; in order to represent them, and certain other characters, the following escape sequences may be used:
    newline          NL (LF)  \n      backslash      \     \\
    horizontal tab   HT       \t      question mark  ?     \?
    vertical tab     VT       \v      single quote   '     \'
    backspace        BS       \b      double quote   "     \"
    carriage return  CR       \r      octal number   ooo   \ooo
    formfeed         FF       \f      hex number     hh    \xhh
    audible alert    BEL      \a
=end quote
token character-constant {
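\' [
    | <-[ \' \\ \n ]>                      # any character except quote, backslash, newline
    | \\ <[ n t v b r f a \\ \? \' \" ]>   # simple escape sequence
    | \\ <[0..7]> ** 1..3                  # octal escape \ooo
    | \\ x <xdigit>+                       # hexadecimal escape \xhh
]+ \'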
}
token floating-constant {...}
token identifier { <ident> }
token string {...}
token enumeration-constant {...}
}
</lang>
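
The constant rules quoted above can be tried out in isolation. The following is only a sketch: CConstDemo is a hypothetical stand-alone grammar, not part of grammar C (which still contains stubbed tokens), and its two tokens are one possible reading of the quoted text. The suffix order (u before l) is an assumption, since the quote does not fix it. It is meant to accept inputs such as 0x1F, 017, 42UL, 'a' and '\n', and to reject 089 and the empty character constant ''.

<lang perl6>
# Hypothetical demo grammar; names are illustrative and not taken from grammar C.
grammar CConstDemo {
    token TOP { <integer-constant> | <character-constant> }

    token integer-constant {
        [ 0 <[xX]> <xdigit>+      # hexadecimal: 0x or 0X, then hex digits
        | 0 <[0..7]>*             # octal: leading 0, digits 0-7 only
        | <[1..9]> <digit>*       # decimal: anything not starting with 0
        ]
        <[uU]>? <[lL]>?           # assumed order: unsigned suffix, then long suffix
    }

    token character-constant {
        \' [
            | <-[ \' \\ \n ]>                      # ordinary character
            | \\ <[ n t v b r f a \\ \? \' \" ]>   # simple escape
            | \\ <[0..7]> ** 1..3                  # octal escape
            | \\ x <xdigit>+                       # hexadecimal escape
        ]+ \'
    }
}

# Print whether each sample parses as a whole constant.
for '0x1F', '017', '42UL', '089', q['a'], q['\n'], q['\x41'], q[''] -> $src {
    say $src, ' => ', CConstDemo.parse($src).so;
}
</lang>

Because .parse requires the whole string to match, a trailing digit that the octal branch cannot consume makes the input fail rather than match partially.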