Compiler/Verifying syntax: Difference between revisions

Revision as of 16:18, 20 November 2023

12,390 bytes added, 6 months ago

m

→‎{{header|Wren}}: Minor tidy

PureFox

Revision as of 14:00, 25 May 2021 Petelomax (talk | contribs) m (→{{header|Phix}}: added syntax colouring the hard way)
Latest revision as of 16:18, 20 November 2023 PureFox (talk | contribs) m (→{{header|Wren}}: Minor tidy)
(7 intermediate revisions by 3 users not shown)
Line 47:
=={{header|ALGOL W}}==
Includes the test cases from the Go sample. Note that strings are limited to 256 characters in Algol W.
<~~lang~~syntaxhighlight lang="algolw">begin
    % verify expressions match expected syntax %
    procedure stmt ( string(256) value text ) ; begin
Line 224:
    stmt( "j & k" );
    stmt( "l or _m" )
end.</~~lang~~syntaxhighlight>
{{out}}
<pre>
Line 263:
=={{header|C}}==
<~~lang~~syntaxhighlight ~~C~~lang="c">// cverifyingsyntaxrosetta.c
// http://www.rosettacode.org/wiki/Compiler/_Verifying_Syntax
Line 367:
{
    for( int i = 0; i < sizeof(tests)/sizeof(*tests); i++ ) parse(tests[i]);
}</~~lang~~syntaxhighlight>
{{out}}
<pre>
Line 418:
In particular, after substitutions, "= not", "+ not" etc. would be allowed by the Go parser so we need to exclude them. Curiously, the Go parser allows something like "2 < 3 < 4" even though it doesn't compile. We therefore need to exclude that as well (see Talk page).
<~~lang~~syntaxhighlight lang="go">package main

import (
Line 519:
        fmt.Println()
    }
}</~~lang~~syntaxhighlight>
{{out}}
Line 622:
"false" -> identifier cannot begin with an underscore
</pre>
=={{header|jq}}==
{{works with|jq}}
'''Also works with gojq, the Go implementation of jq'''

This entry uses the PEG (Parsing Expression Grammar) formalism to transform the given grammar into a jq verification program by a simple process that amounts to transcription.

For example, in the rule for `primary`, the alternation

 Identifier | Integer

becomes the jq expression:

 Identifier // Integer

The transcription process is not completely trivial, as jq requires definitions to be ordered and perhaps nested to satisfy a "define-before-use" rule. In the present case, since `primary` and `expr` are defined circularly, we define `primary` as an inner function of `expr`.

This PEG-to-jq transcription process is described in detail at [https://github.com/stedolan/jq/wiki/Parsing-Expression-Grammars].
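The transcription idea can also be sketched outside jq. The following Go fragment is only an illustration under stated assumptions: the names `parser`, `seq`, `alt`, `star`, `re`, `sum` and `matches` are invented for this sketch and are not part of the jq PEG library, and only a fragment of the task grammar is covered (a `primary` limited to identifiers and integers, plus a chain of `+`). Here `alt` plays the role of jq's `//` (ordered choice) and `seq` the role of jq's `|` (sequencing).

```go
package main

import (
	"fmt"
	"regexp"
)

// parser consumes a prefix of the input and returns the remainder;
// ok is false when the rule does not match. (Hypothetical type for this sketch.)
type parser func(in string) (rest string, ok bool)

// seq mirrors jq's `a | b`: every part must match, in order.
func seq(ps ...parser) parser {
	return func(in string) (string, bool) {
		for _, p := range ps {
			var ok bool
			if in, ok = p(in); !ok {
				return "", false
			}
		}
		return in, true
	}
}

// alt mirrors jq's `a // b` (PEG ordered choice): the first match wins.
func alt(ps ...parser) parser {
	return func(in string) (string, bool) {
		for _, p := range ps {
			if rest, ok := p(in); ok {
				return rest, true
			}
		}
		return "", false
	}
}

// star matches p zero or more times and never fails.
func star(p parser) parser {
	return func(in string) (string, bool) {
		for {
			rest, ok := p(in)
			if !ok || rest == in {
				return in, true
			}
			in = rest
		}
	}
}

// re matches a regular expression anchored at the start of the input.
func re(pattern string) parser {
	r := regexp.MustCompile(`^(?:` + pattern + `)`)
	return func(in string) (string, bool) {
		loc := r.FindStringIndex(in)
		if loc == nil {
			return "", false
		}
		return in[loc[1]:], true
	}
}

var (
	ws         = re(`\s*`)
	identifier = re(`[a-zA-Z][a-zA-Z0-9_]*`)
	integer    = re(`[0-9]+`)
	// primary = Identifier | Integer, transcribed as an ordered choice.
	primary = seq(ws, alt(identifier, integer), ws)
	// sum = primary { "+" primary }
	sum = seq(primary, star(seq(re(`\+`), primary)))
)

// matches reports whether rule consumes the whole input.
func matches(rule parser, in string) bool {
	rest, ok := rule(in)
	return ok && rest == ""
}

func main() {
	for _, s := range []string{"a + 1", "42", "a +", "_m"} {
		fmt.Printf("%q -> %t\n", s, matches(sum, s))
	}
}
```

With these definitions, `"a + 1"` and `"42"` are accepted, while `"a +"` and `"_m"` are rejected. Extending the sketch to the full grammar would need a recursive rule for parenthesised expressions, which is the same circularity the jq entry handles by nesting `primary` inside `expr`.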
The following presentation uses jq's support for regular expressions for the sake of simplicity, brevity and efficiency. For example, the grammar rule:

 Digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

becomes the jq program:

 def Digit: parse("[0-9]");

where `parse` is a utility function defined in the library of PEG-oriented functions in the first subsection below.

The jq program presented here works on character strings and textual files, hence the use of `ws` (for whitespace) in the program. Since `ws` is defined here to make whitespace optional, it might be desirable to modify the program to require whitespace (e.g. using the utility function `_`) in certain places instead.

====Generic PEG Library====
The jq module at [[:Category:Jq/peg.jq]] can be included by copying it to a file, and adding an `include` statement to the top of the main program, e.g. as follows:
<syntaxhighlight lang=jq>
include "peg" {search: "."};
</syntaxhighlight>
====The Grammar====
<syntaxhighlight lang=jq>
def expr:
  def Digit      : parse("[0-9]");
  def Letter     : parse("[a-zA-Z]");
  def Identifier : Letter | star(Letter // Digit // literal("_"));
  def Integer    : plus(Digit);
  def primary    : ws
                   | (Identifier // Integer // (literal("(") | expr | literal(")")) // literal("true") // literal("false"))
                   | ws;
  def expr_level_6 : primary | star((literal("*") // literal("/")) | primary) ;
  def expr_level_5 : expr_level_6 | star((literal("+") // literal("-")) | expr_level_6) ;
  def expr_level_4 : ws | optional(literal("not")) | expr_level_5 | optional(parse("[=<]") | expr_level_5) ;
  def expr_level_3 : expr_level_4 | star(literal("and") | expr_level_4) ;
  def expr_level_2 : expr_level_3 | star(literal("or") | expr_level_3) ;

  ws | expr_level_2 | ws;

def stmt:
  {remainder: .} | expr | eos;
</syntaxhighlight>
====The Go examples====
<syntaxhighlight lang=jq>
def go: [
    "$",
    "one",
    "either or both",
    "a + 1",
    "a + b < c",
    "a = b",
    "a or b = c",
    "3 + not 5",
    "3 + (not 5)",
    "(42 + 3",
    "(42 + 3)",
    " not 3 < 4 or (true or 3 / 4 + 8 * 5 - 5 * 2 < 56) and 4 * 3 < 12 or not true",
    " and 3 < 2",
    "not 7 < 2",
    "2 < 3 < 4",
    "2 < (3 < 4)",
    "2 < foobar - 3 < 4",
    "2 < foobar and 3 < 4",
    "4 * (32 - 16) + 9 = 73",
    "235 76 + 1",
    "true or false = not true",
    "true or false = (not true)",
    "not true or false = false",
    "not true = false",
    "a + b = not c and false",
    "a + b = (not c) and false",
    "a + b = (not c and false)",
    "ab_c / bd2 or < e_f7",
    "g not = h",
    "été = false",
    "i++",
    "j & k",
    "l or _m"
];

# For ease of comparison with the Go output, simply emit `true` or `false`
go[] | (stmt | true) // false
</syntaxhighlight>
'''Invocation''': jq -nr -f compiler-verifying-syntax.jq

{{output}}
The same sequence of `true` and `false` values as at [[#Go|Go]].

=={{header|Julia}}==
<~~lang~~syntaxhighlight lang="julia">function substituteinnerparentheses(s, subs)
    ((i = findlast('(', s)) == nothing) && return (s, false)
    ((j = findfirst(')', s[i:end])) == nothing) && return (s, false)
Line 676 ⟶ 804:
    println("The compiler parses the statement { $s } and outputs: ", okparse(s))
end
</~~lang~~syntaxhighlight>{{out}}
<pre>
The compiler parses the statement { not 3 < 4 or (true or 3 / 4 + 8 * 5 - 5 * 2 < 56) and 4 * 3 < 12 or not true } and outputs: true
Line 689 ⟶ 817:
=={{header|Nim}}==
<~~lang~~syntaxhighlight ~~Nim~~lang="nim">import strutils, tables

type
Line 903 ⟶ 1,031:
  let ok = checkStmt(lex)
  echo test, " → ", ok
  if not ok:
    echo "*** Error at position $1. $2 ".format(lex.pos, lex.error)</~~lang~~syntaxhighlight>
{{out}}
Line 962 ⟶ 1,090:
Added 'not' and non-assoc fixes. Cooler output.
<~~lang~~syntaxhighlight lang="perl">#!/usr/bin/perl

use strict; # http://www.rosettacode.org/wiki/Compiler/_Verifying_Syntax
Line 1,019 ⟶ 1,147:
j & k
l or _m
UPPER_cAsE_aNd_letter_and_12345_test</~~lang~~syntaxhighlight>
{{out}}
<pre>
Line 1,059 ⟶ 1,187:
=={{header|Phix}}==
<!--<~~lang~~syntaxhighlight ~~Phix~~lang="phix">(phixonline)-->
 <span style="color: #000080;font-style:italic;">-- demo\rosetta\Compiler\Verify_Syntax.exw</span>
 <span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
Line 1,252 ⟶ 1,380:
 <span style="color: #0000FF;">?</span><span style="color: #008000;">"done"</span>
 <span style="color: #0000FF;">{}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">wait_key</span><span style="color: #0000FF;">()</span>
<!--</~~lang~~syntaxhighlight>-->
{{out}}
Note that "= not c" fails, whereas "= (not c)" passes; see the talk page. (Arguably the task definition should be fixed.)
Line 1,293 ⟶ 1,421:
=={{header|Raku}}==
The task grammar is converted from EBNF to ABNF to suit the Grammar::ABNF module, and the test data is taken from the Perl entry.
<syntaxhighlight lang="raku" ~~perl6~~line># 20200511 Raku programming solution

use Grammar::ABNF;
Line 1,344 ⟶ 1,472:
DATA

say $g.parse($_).Bool, "\t", $_ for DATA.lines</~~lang~~syntaxhighlight>
{{out}}
<pre>False 3 + not 5
Line 1,367 ⟶ 1,495:
False l or _m
True UPPER_cAsE_aNd_letter_and_12345_test
</pre>

=={{header|Wren}}==
{{trans|Nim}}
{{libheader|Wren-dynamic}}
{{libheader|Wren-str}}
<syntaxhighlight lang="wren">import "./dynamic" for Enum
import "./str" for Char

var Token = Enum.create(
    "Token", ["error", "ident", "int", "lpar", "rpar", "False", "True", "lt", "eq",
    "add", "sub", "mul", "div", "or", "and", "not", "eof"]
)

var Token2Text = {
    Token.error: "invalid token",
    Token.ident: "identifier",
    Token.int: "integer",
    Token.lpar: "'('",
    Token.rpar: "')'",
    Token.False: "'false'",
    Token.True: "'true'",
    Token.lt: "'<'",
    Token.eq: "'='",
    Token.add: "'+'",
    Token.sub: "'-'",
    Token.mul: "'*'",
    Token.div: "'/'",
    Token.or: "'or'",
    Token.and: "'and'",
    Token.not: "'not'",
    Token.eof: "EOF"
}

var IdentTokens = {
    "false": Token.False, "true": Token.True, "or": Token.or, "and": Token.and, "not": Token.not
}

var CharTokens = {
    "(": Token.lpar, ")": Token.rpar, "<": Token.lt, "=": Token.eq,
    "+": Token.add, "-": Token.sub, "*": Token.mul, "/": Token.div
}

var IsIdentChar = Fn.new { |c| Char.isAsciiAlphaNum(c) || c == "_" }

class Lexer {
    static init(s) {
        var lex = Lexer.new(s, s.count, 0, "", "")
        lex.nextToken
        return lex
    }

    construct new(str, len, pos, token, error) {
        _str = str      // string to parse
        _len = len      // string length
        _pos = pos      // current lexer position
        _token = token  // current token
        _error = error  // error message
    }

    // property getters required
    pos { _pos }
    error { _error }

    // get the token for an identifier
    getIdToken {
        var s = ""
        while (_pos < _len && IsIdentChar.call(_str[_pos])) {
            s = s + _str[_pos]
            _pos = _pos + 1
        }
        _token = IdentTokens.containsKey(s) ? IdentTokens[s] : Token.ident
    }

    // get an integer token
    getInt {
        while (_pos < _len && Char.isDigit(_str[_pos])) _pos = _pos + 1
        _token = Token.int
    }

    // find the next token
    nextToken {
        // skip spaces
        while (_pos < _len && _str[_pos] == " ") _pos = _pos + 1
        if (_pos == _len) {
            _token = Token.eof
        } else {
            var ch = _str[_pos]
            if (Char.isAsciiLower(ch)) {
                getIdToken
            } else if (Char.isDigit(ch)) {
                getInt
            } else {
                _pos = _pos + 1
                _token = CharTokens.containsKey(ch) ? CharTokens[ch] : Token.error
            }
        }
    }

    // check validity of a primary
    checkPrimary {
        if ([Token.ident, Token.int, Token.False, Token.True].contains(_token)) {
            nextToken
            return true
        } else if (_token == Token.lpar) {
            nextToken
            if (!checkExpr) return false
            if (_token != Token.rpar) {
                _error = "Encountered %(Token2Text[_token]); expected ')'"
                return false
            } else {
                nextToken
                return true
            }
        } else {
            _error = "Encountered %(Token2Text[_token]); expected identifier, literal or '('"
            return false
        }
    }

    // check validity of an expr6
    checkExpr6 {
        if (!checkPrimary) return false
        while ([Token.mul, Token.div].contains(_token)) {
            nextToken
            if (!checkPrimary) return false
        }
        return true
    }

    // check validity of an expr5
    checkExpr5 {
        if (!checkExpr6) return false
        while ([Token.add, Token.sub].contains(_token)) {
            nextToken
            if (!checkExpr6) return false
        }
        return true
    }

    // check validity of an expr4
    checkExpr4 {
        if (_token == Token.not) nextToken
        if (!checkExpr5) return false
        if ([Token.lt, Token.eq].contains(_token)) {
            nextToken
            if (!checkExpr5) return false
        }
        return true
    }

    // check validity of an expr3
    checkExpr3 {
        if (!checkExpr4) return false
        while (_token == Token.and) {
            nextToken
            if (!checkExpr4) return false
        }
        return true
    }

    // check validity of an expr2
    checkExpr2 {
        if (!checkExpr3) return false
        while (_token == Token.or) {
            nextToken
            if (!checkExpr3) return false
        }
        return true
    }

    // check validity of an expr
    checkExpr { checkExpr2 }

    // check validity of a statement
    checkStmt {
        var result = checkExpr
        if (result && _pos < _len) {
            _error = "Extra characters at end of statement."
            result = false
        }
        return result
    }
}

// using test set from Algol68 version
var tests = [
    "wombat",
    "wombat or monotreme",
    "( wombat and not )",
    "wombat or not",
    "a + 1",
    "a + b < c",
    "a + b - c * d / e < f and not ( g = h )",
    "a + b - c * d / e < f and not ( g = h",
    "a = b",
    "a or b = c",
    "$",
    "true or false = not true",
    "not true = false",
    "3 + not 5",
    "3 + (not 5)",
    "(42 + 3",
    " not 3 < 4 or (true or 3 / 4 + 8 * 5 - 5 * 2 < 56) and 4 * 3 < 12 or not true",
    " and 3 < 2",
    "not 7 < 2",
    "2 < 3 < 4",
    "2 < foobar - 3 < 4",
    "2 < foobar and 3 < 4",
    "4 * (32 - 16) + 9 = 73",
    "235 76 + 1",
    "a + b = not c and false",
    "a + b = (not c) and false",
    "a + b = (not c and false)",
    "ab_c / bd2 or < e_f7",
    "g not = h",
    "été = false",
    "i++",
    "j & k",
    "l or _m"
]

for (test in tests) {
    var lex = Lexer.init(test)
    var ok = lex.checkStmt
    System.print("%(test) -> %(ok)")
    if (!ok) {
        System.print("*** Error at position %(lex.pos). %(lex.error)\n")
    }
}</syntaxhighlight>

{{out}}
<pre>
wombat -> true
wombat or monotreme -> true
( wombat and not ) -> false
*** Error at position 18. Encountered ')'; expected identifier, literal or '('

wombat or not -> false
*** Error at position 13. Encountered EOF; expected identifier, literal or '('

a + 1 -> true
a + b < c -> true
a + b - c * d / e < f and not ( g = h ) -> true
a + b - c * d / e < f and not ( g = h -> false
*** Error at position 37. Encountered EOF; expected ')'

a = b -> true
a or b = c -> true
$ -> false
*** Error at position 1. Encountered invalid token; expected identifier, literal or '('

true or false = not true -> false
*** Error at position 19. Encountered 'not'; expected identifier, literal or '('

not true = false -> true
3 + not 5 -> false
*** Error at position 7. Encountered 'not'; expected identifier, literal or '('

3 + (not 5) -> true
(42 + 3 -> false
*** Error at position 7. Encountered EOF; expected ')'

 not 3 < 4 or (true or 3 / 4 + 8 * 5 - 5 * 2 < 56) and 4 * 3 < 12 or not true -> true
 and 3 < 2 -> false
*** Error at position 4. Encountered 'and'; expected identifier, literal or '('

not 7 < 2 -> true
2 < 3 < 4 -> false
*** Error at position 7. Extra characters at end of statement.

2 < foobar - 3 < 4 -> false
*** Error at position 16. Extra characters at end of statement.

2 < foobar and 3 < 4 -> true
4 * (32 - 16) + 9 = 73 -> true
235 76 + 1 -> false
*** Error at position 6. Extra characters at end of statement.

a + b = not c and false -> false
*** Error at position 11. Encountered 'not'; expected identifier, literal or '('

a + b = (not c) and false -> true
a + b = (not c and false) -> true
ab_c / bd2 or < e_f7 -> false
*** Error at position 15. Encountered '<'; expected identifier, literal or '('

g not = h -> false
*** Error at position 5. Extra characters at end of statement.

été = false -> false
*** Error at position 1. Encountered invalid token; expected identifier, literal or '('

i++ -> false
*** Error at position 3. Encountered '+'; expected identifier, literal or '('

j & k -> false
*** Error at position 3. Extra characters at end of statement.

l or _m -> false
*** Error at position 6. Encountered invalid token; expected identifier, literal or '('

</pre>