Compiler/Verifying syntax: Difference between revisions

m (→‎{{header|Phix}}: added syntax colouring the hard way)
m (→‎{{header|Wren}}: Minor tidy)
 
(7 intermediate revisions by 3 users not shown)
Line 47:
=={{header|ALGOL W}}==
Includes the test cases from the Go sample. Note that strings are limited to 256 characters in Algol W.
<syntaxhighlight lang="algolw">begin
% verify expressions match expected syntax %
procedure stmt ( string(256) value text ) ; begin
Line 224:
stmt( "j & k" );
stmt( "l or _m" )
end.</syntaxhighlight>
{{out}}
<pre>
Line 263:
 
=={{header|C}}==
<syntaxhighlight lang="c">// cverifyingsyntaxrosetta.c
// http://www.rosettacode.org/wiki/Compiler/_Verifying_Syntax
 
Line 367:
{
for( int i = 0; i < sizeof(tests)/sizeof(*tests); i++ ) parse(tests[i]);
}</syntaxhighlight>
{{out}}
<pre>
Line 418:
 
In particular, after substitutions, "= not", "+ not" etc. would be allowed by the Go parser, so we need to exclude them. Curiously, the Go parser also allows something like "2 < 3 < 4" even though it doesn't compile, so we need to exclude that as well (see Talk page).
<syntaxhighlight lang="go">package main
 
import (
Line 519:
fmt.Println()
}
}</syntaxhighlight>
 
{{out}}
Line 622:
"false" -> identifier cannot begin with an underscore
</pre>
 
=={{header|jq}}==
{{works with|jq}}
'''Also works with gojq, the Go implementation of jq'''
 
This entry uses the PEG (Parsing Expression Grammar) formalism
to transform the given grammar to a jq verification program
by a simple process that amounts to transcription.
 
For example, in the rule for `primary`, the alternation
 
Identifier | Integer
 
becomes the jq expression:
 
Identifier // Integer
 
The transcription process is not completely trivial as
jq requires definitions be ordered and perhaps nested
to satisfy a "define-before-use" rule. In the present case,
since `primary` and `expr` are defined circularly,
we define `primary` as an inner function of `expr`.
 
This PEG-to-jq transcription process is
described in detail at
[https://github.com/stedolan/jq/wiki/Parsing-Expression-Grammars].
 
The following presentation uses jq's support for regular expressions
for the sake of simplicity, brevity and efficiency.
For example, the grammar rule:
 
Digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
 
becomes the jq program:
 
def Digit: parse("[0-9]");
 
where `parse` is a utility function defined in the library
of PEG-oriented functions in the first subsection below.
 
The jq program presented here works on character strings and text files, hence
the use of `ws` (for whitespace) in the program. Since `ws` is defined here
to make whitespace optional, it might be desirable to modify the program
to require whitespace (e.g. using the utility function `_`) in certain places instead.
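
For readers unfamiliar with jq, the transcription idea described above can be illustrated with a minimal Python sketch. It is not the jq entry and not part of the peg.jq library: the names `regex`, `seq`, `choice`, `star`, `optional` and `accepts` are hypothetical helpers chosen for this illustration, with `seq` standing in for the jq pipe `|` and `choice` for the jq alternative `//`. The sketch assumes a parser is simply a function from the remaining input to the unconsumed rest (or `None` on failure), and it folds `Letter` and `Digit` into regular expressions. Like the jq grammar, it does not reserve keywords, so `not` in operand position is consumed as an identifier and the statement then fails because input is left over.

```python
import re
from typing import Callable, Optional

# A parser maps the remaining input to what is left after a successful
# match, or to None on failure (the analogue of jq producing `empty`).
Parser = Callable[[str], Optional[str]]

def regex(pattern: str) -> Parser:
    """Match `pattern` at the start of the input (the role of `parse` above)."""
    rx = re.compile(pattern)
    def run(s: str) -> Optional[str]:
        m = rx.match(s)
        return s[m.end():] if m else None
    return run

def literal(text: str) -> Parser:
    return lambda s: s[len(text):] if s.startswith(text) else None

def seq(*parsers: Parser) -> Parser:
    """Sequencing: corresponds to the jq pipe `p | q`."""
    def run(s: str) -> Optional[str]:
        rest: Optional[str] = s
        for p in parsers:
            rest = p(rest)
            if rest is None:
                return None
        return rest
    return run

def choice(*parsers: Parser) -> Parser:
    """Ordered choice: corresponds to the jq alternative `p // q`."""
    def run(s: str) -> Optional[str]:
        for p in parsers:
            rest = p(s)
            if rest is not None:
                return rest
        return None
    return run

def star(p: Parser) -> Parser:
    """Zero or more repetitions; never fails."""
    def run(s: str) -> Optional[str]:
        while True:
            rest = p(s)
            if rest is None or rest == s:
                return s
            s = rest
    return run

def optional(p: Parser) -> Parser:
    return choice(p, lambda s: s)

ws = regex(r"\s*")
identifier = regex(r"[a-zA-Z][a-zA-Z0-9_]*")
integer = regex(r"[0-9]+")

def expr(s: str) -> Optional[str]:
    # A plain function, so `primary` can refer to `expr` before
    # expr_level_2 exists; the name is only looked up when called.
    return seq(ws, expr_level_2, ws)(s)

primary = seq(ws, choice(identifier, integer,
                         seq(literal("("), expr, literal(")")),
                         literal("true"), literal("false")), ws)
expr_level_6 = seq(primary, star(seq(regex(r"[*/]"), primary)))
expr_level_5 = seq(expr_level_6, star(seq(regex(r"[+-]"), expr_level_6)))
expr_level_4 = seq(ws, optional(literal("not")), expr_level_5,
                   optional(seq(regex(r"[=<]"), expr_level_5)))
expr_level_3 = seq(expr_level_4, star(seq(literal("and"), expr_level_4)))
expr_level_2 = seq(expr_level_3, star(seq(literal("or"), expr_level_3)))

def accepts(text: str) -> bool:
    """True if the whole text is consumed (the role of `eos` in a statement)."""
    return expr(text) == ""
```

Calling `accepts("a + b < c")` exercises the arithmetic and relational levels, while `accepts("2 < 3 < 4")` shows the non-associative comparison leaving input unconsumed. The circular reference between `primary` and `expr` is handled here by late name lookup, whereas the jq program nests the definitions to satisfy its "define-before-use" rule.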
====Generic PEG Library====
The jq module at [[:Category:Jq/peg.jq]] can be included by copying it to a file,
and adding an `include` statement to the top of the main program, e.g. as follows:
<syntaxhighlight lang=jq>
include "peg" {search: "."};
</syntaxhighlight>
 
====The Grammar====
<syntaxhighlight lang=jq>
def expr:
def Digit : parse("[0-9]");
def Letter : parse("[a-zA-Z]");
def Identifier : Letter | star(Letter // Digit // literal("_"));
def Integer : plus(Digit);
def primary : ws
| (Identifier
// Integer
// (literal("(") | expr | literal(")"))
// literal("true")
// literal("false"))
| ws;
def expr_level_6 : primary | star((literal("*") // literal("/")) | primary) ;
def expr_level_5 : expr_level_6 | star((literal("+") // literal("-")) | expr_level_6) ;
def expr_level_4 : ws | optional(literal("not")) | expr_level_5 | optional(parse("[=<]") | expr_level_5) ;
def expr_level_3 : expr_level_4 | star(literal("and") | expr_level_4) ;
def expr_level_2 : expr_level_3 | star(literal("or") | expr_level_3) ;
 
ws | expr_level_2 | ws;
 
def stmt:
{remainder: .} | expr | eos;
</syntaxhighlight>
====The Go examples====
<syntaxhighlight lang=jq>
def go: [
"$",
"one",
"either or both",
"a + 1",
"a + b < c",
"a = b",
"a or b = c",
"3 + not 5",
"3 + (not 5)",
"(42 + 3",
"(42 + 3)",
" not 3 < 4 or (true or 3 / 4 + 8 * 5 - 5 * 2 < 56) and 4 * 3 < 12 or not true",
" and 3 < 2",
"not 7 < 2",
"2 < 3 < 4",
"2 < (3 < 4)",
"2 < foobar - 3 < 4",
"2 < foobar and 3 < 4",
"4 * (32 - 16) + 9 = 73",
"235 76 + 1",
"true or false = not true",
"true or false = (not true)",
"not true or false = false",
"not true = false",
"a + b = not c and false",
"a + b = (not c) and false",
"a + b = (not c and false)",
"ab_c / bd2 or < e_f7",
"g not = h",
"été = false",
"i++",
"j & k",
"l or _m"
];
 
# For ease of comparison with the Go output, simply emit `true` or `false`
go[]
| (stmt | true) // false
</syntaxhighlight>
 
'''Invocation''': jq -nr -f compiler-verifying-syntax.jq
{{output}}
The same sequence of `true` and `false` values as at [[#Go|Go]].
 
=={{header|Julia}}==
<syntaxhighlight lang="julia">function substituteinnerparentheses(s, subs)
((i = findlast('(', s)) == nothing) && return (s, false)
((j = findfirst(')', s[i:end])) == nothing) && return (s, false)
Line 676 ⟶ 804:
println("The compiler parses the statement { $s } and outputs: ", okparse(s))
end
</syntaxhighlight>{{out}}
<pre>
The compiler parses the statement { not 3 < 4 or (true or 3 / 4 + 8 * 5 - 5 * 2 < 56) and 4 * 3 < 12 or not true } and outputs: true
Line 689 ⟶ 817:
 
=={{header|Nim}}==
<syntaxhighlight lang="nim">import strutils, tables
 
type
Line 903 ⟶ 1,031:
let ok = checkStmt(lex)
echo test, " → ", ok
if not ok: echo "*** Error at position $1. $2 ".format(lex.pos, lex.error)</syntaxhighlight>
 
{{out}}
Line 962 ⟶ 1,090:
Added 'not' and non-assoc fixes.
Cooler output.
<syntaxhighlight lang="perl">#!/usr/bin/perl
 
use strict; # http://www.rosettacode.org/wiki/Compiler/_Verifying_Syntax
Line 1,019 ⟶ 1,147:
j & k
l or _m
UPPER_cAsE_aNd_letter_and_12345_test</syntaxhighlight>
{{out}}
<pre>
Line 1,059 ⟶ 1,187:
 
=={{header|Phix}}==
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">-- demo\rosetta\Compiler\Verify_Syntax.exw</span>
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
Line 1,252 ⟶ 1,380:
<span style="color: #0000FF;">?</span><span style="color: #008000;">"done"</span>
<span style="color: #0000FF;">{}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">wait_key</span><span style="color: #0000FF;">()</span>
<!--</syntaxhighlight>-->
{{out}}
Note that "= not c" fails, whereas "= (not c)" passes, see talk page. (Arguably the task definition should be fixed.)
Line 1,293 ⟶ 1,421:
=={{header|Raku}}==
The task grammar is converted from EBNF to ABNF to suit the Grammar::ABNF module, and the test data is taken from the Perl entry.
<syntaxhighlight lang="raku" line># 20200511 Raku programming solution
 
use Grammar::ABNF;
Line 1,344 ⟶ 1,472:
DATA
 
say $g.parse($_).Bool, "\t", $_ for DATA.lines</syntaxhighlight>
{{out}}
<pre>False 3 + not 5
Line 1,367 ⟶ 1,495:
False l or _m
True UPPER_cAsE_aNd_letter_and_12345_test
</pre>
 
=={{header|Wren}}==
{{trans|Nim}}
{{libheader|Wren-dynamic}}
{{libheader|Wren-str}}
<syntaxhighlight lang="wren">import "./dynamic" for Enum
import "./str" for Char
 
var Token = Enum.create(
"Token", ["error", "ident", "int", "lpar", "rpar", "False", "True",
"lt", "eq", "add", "sub", "mul", "div", "or", "and", "not", "eof"]
)
 
var Token2Text = {
Token.error: "invalid token", Token.ident: "identifier", Token.int: "integer",
Token.lpar: "'('", Token.rpar: "')'", Token.False: "'false'", Token.True: "'true'",
Token.lt: "'<'", Token.eq: "'='", Token.add: "'+'", Token.sub: "'-'",
Token.mul: "'*'", Token.div: "'/'", Token.or: "'or'", Token.and: "'and'",
Token.not: "'not'", Token.eof: "EOF"
}
 
var IdentTokens = {
"false": Token.False, "true": Token.True, "or": Token.or,
"and": Token.and, "not": Token.not
}
 
var CharTokens = {
"(": Token.lpar, ")": Token.rpar, "<": Token.lt, "=": Token.eq,
"+": Token.add, "-": Token.sub, "*": Token.mul, "/": Token.div
}
 
var IsIdentChar = Fn.new { |c| Char.isAsciiAlphaNum(c) || c == "_" }
 
class Lexer {
static init(s) {
var lex = Lexer.new(s, s.count, 0, "", "")
lex.nextToken
return lex
}
 
construct new(str, len, pos, token, error) {
_str = str // string to parse
_len = len // string length
_pos = pos // current lexer position
_token = token // current token
_error = error // error message
}
 
// property getters required
pos { _pos }
error { _error }
 
// get the token for an identifier
getIdToken {
var s = ""
while (_pos < _len && IsIdentChar.call(_str[_pos])) {
s = s + _str[_pos]
_pos = _pos + 1
}
_token = IdentTokens.containsKey(s) ? IdentTokens[s] : Token.ident
}
 
// get an integer token
getInt {
while (_pos < _len && Char.isDigit(_str[_pos])) _pos = _pos + 1
_token = Token.int
}
 
// find the next token
nextToken {
// skip spaces
while (_pos < _len && _str[_pos] == " ") _pos = _pos + 1
if (_pos == _len) {
_token = Token.eof
} else {
var ch = _str[_pos]
if (Char.isAsciiLower(ch)) {
getIdToken
} else if (Char.isDigit(ch)) {
getInt
} else {
_pos = _pos + 1
_token = CharTokens.containsKey(ch) ? CharTokens[ch] : Token.error
}
}
}
 
// check validity of a primary
checkPrimary {
if ([Token.ident, Token.int, Token.False, Token.True].contains(_token)) {
nextToken
return true
} else if (_token == Token.lpar) {
nextToken
if (!checkExpr) return false
if (_token != Token.rpar) {
_error = "Encountered %(Token2Text[_token]); expected ')'"
return false
} else {
nextToken
return true
}
} else {
_error = "Encountered %(Token2Text[_token]); expected identifier, literal or '('"
return false
}
}
 
// check validity of an expr6
checkExpr6 {
if (!checkPrimary) return false
while ([Token.mul, Token.div].contains(_token)) {
nextToken
if (!checkPrimary) return false
}
return true
}
 
// check validity of an expr5
checkExpr5 {
if (!checkExpr6) return false
while ([Token.add, Token.sub].contains(_token)) {
nextToken
if (!checkExpr6) return false
}
return true
}
 
// check validity of an expr4
checkExpr4 {
if (_token == Token.not) nextToken
if (!checkExpr5) return false
if ([Token.lt, Token.eq].contains(_token)) {
nextToken
if (!checkExpr5) return false
}
return true
}
 
// check validity of an expr3
checkExpr3 {
if (!checkExpr4) return false
while (_token == Token.and) {
nextToken
if (!checkExpr4) return false
}
return true
}
 
// check validity of an expr2
checkExpr2 {
if (!checkExpr3) return false
while (_token == Token.or) {
nextToken
if (!checkExpr3) return false
}
return true
}
 
// check validity of an expr
checkExpr { checkExpr2 }
 
// check validity of a statement
checkStmt {
var result = checkExpr
// any token other than EOF left over means trailing input
if (result && _token != Token.eof) {
_error = "Extra characters at end of statement."
result = false
}
return result
}
}
 
// using test set from Algol68 version
 
var tests = [
"wombat",
"wombat or monotreme",
"( wombat and not )",
"wombat or not",
"a + 1",
"a + b < c",
"a + b - c * d / e < f and not ( g = h )",
"a + b - c * d / e < f and not ( g = h",
"a = b",
"a or b = c",
"$",
"true or false = not true",
"not true = false",
"3 + not 5",
"3 + (not 5)",
"(42 + 3",
" not 3 < 4 or (true or 3 / 4 + 8 * 5 - 5 * 2 < 56) and 4 * 3 < 12 or not true",
" and 3 < 2",
"not 7 < 2",
"2 < 3 < 4",
"2 < foobar - 3 < 4",
"2 < foobar and 3 < 4",
"4 * (32 - 16) + 9 = 73",
"235 76 + 1",
"a + b = not c and false",
"a + b = (not c) and false",
"a + b = (not c and false)",
"ab_c / bd2 or < e_f7",
"g not = h",
"été = false",
"i++",
"j & k",
"l or _m"
]
 
for (test in tests) {
var lex = Lexer.init(test)
var ok = lex.checkStmt
System.print("%(test) -> %(ok)")
if (!ok) {
System.print("*** Error at position %(lex.pos). %(lex.error)\n")
}
}</syntaxhighlight>
 
{{out}}
<pre>
wombat -> true
wombat or monotreme -> true
( wombat and not ) -> false
*** Error at position 18. Encountered ')'; expected identifier, literal or '('
 
wombat or not -> false
*** Error at position 13. Encountered EOF; expected identifier, literal or '('
 
a + 1 -> true
a + b < c -> true
a + b - c * d / e < f and not ( g = h ) -> true
a + b - c * d / e < f and not ( g = h -> false
*** Error at position 37. Encountered EOF; expected ')'
 
a = b -> true
a or b = c -> true
$ -> false
*** Error at position 1. Encountered invalid token; expected identifier, literal or '('
 
true or false = not true -> false
*** Error at position 19. Encountered 'not'; expected identifier, literal or '('
 
not true = false -> true
3 + not 5 -> false
*** Error at position 7. Encountered 'not'; expected identifier, literal or '('
 
3 + (not 5) -> true
(42 + 3 -> false
*** Error at position 7. Encountered EOF; expected ')'
 
not 3 < 4 or (true or 3 / 4 + 8 * 5 - 5 * 2 < 56) and 4 * 3 < 12 or not true -> true
and 3 < 2 -> false
*** Error at position 4. Encountered 'and'; expected identifier, literal or '('
 
not 7 < 2 -> true
2 < 3 < 4 -> false
*** Error at position 7. Extra characters at end of statement.
 
2 < foobar - 3 < 4 -> false
*** Error at position 16. Extra characters at end of statement.
 
2 < foobar and 3 < 4 -> true
4 * (32 - 16) + 9 = 73 -> true
235 76 + 1 -> false
*** Error at position 6. Extra characters at end of statement.
 
a + b = not c and false -> false
*** Error at position 11. Encountered 'not'; expected identifier, literal or '('
 
a + b = (not c) and false -> true
a + b = (not c and false) -> true
ab_c / bd2 or < e_f7 -> false
*** Error at position 15. Encountered '<'; expected identifier, literal or '('
 
g not = h -> false
*** Error at position 5. Extra characters at end of statement.
 
été = false -> false
*** Error at position 1. Encountered invalid token; expected identifier, literal or '('
 
i++ -> false
*** Error at position 3. Encountered '+'; expected identifier, literal or '('
 
j & k -> false
*** Error at position 3. Extra characters at end of statement.
 
l or _m -> false
*** Error at position 6. Encountered invalid token; expected identifier, literal or '('
</pre>