Compiler/syntax analyzer: Difference between revisions

Line 21:

[https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form Extended Backus-Naur Form (EBNF)]:

stmt_list = {stmt} ;

Line 47:

| '(' expr ')'

| ('+' | '-' | '!') primary

;</~~lang~~>

;</syntaxhighlight>

The resulting AST should be formulated as a Binary Tree.

Line 68:

|-

| style="vertical-align:top" |

<~~lang~~ c>count = 1;

<syntaxhighlight lang="c">count = 1;

while (count < 10) {

print("count is: ", count, "\n");

count = count + 1;

}</~~lang~~>

}</syntaxhighlight>

| style="vertical-align:top" |

Line 280:

[https://en.wikipedia.org/wiki/Recursive_descent_parser Recursive Descent] for statement parsing. The AST is also built:

<~~lang~~ python>def expr(p)

<syntaxhighlight lang="python">def expr(p)

if tok is "("

x = paren_expr()

Line 364:

t = make_node(Sequence, t, stmt())

until tok is end-of-file

return t</~~lang~~>

return t</syntaxhighlight>

;Once the AST is built, it should be output in a [[Flatten_a_list|flattened format.]] This can be as simple as the following:

<~~lang~~ python>def prt_ast(t)

<syntaxhighlight lang="python">def prt_ast(t)

if t == NULL

print(";\n")

Line 378:

print("\n")

prt_ast(t.left)

prt_ast(t.right)</~~lang~~>

prt_ast(t.right)</syntaxhighlight>

;If the AST is correctly built, loading it into a subsequent program should be as simple as:

<~~lang~~ python>def load_ast()

<syntaxhighlight lang="python">def load_ast()

line = readline()

# Each line has at least one token

Line 402:

left = load_ast()

right = load_ast()

return make_node(node_type, left, right)</~~lang~~>

return make_node(node_type, left, right)</syntaxhighlight>

Finally, the AST can also be tested by running it against one of the AST Interpreter [[Compiler/AST_interpreter|solutions]].

Line 415:

|-

| style="vertical-align:top" |

<~~lang~~ c>/*

<syntaxhighlight lang="c">/*

Simple prime number generator

*/

Line 434:

}

print("Total primes found: ", count, "\n");</~~lang~~>

print("Total primes found: ", count, "\n");</syntaxhighlight>

| style="vertical-align:top" |

Line 654:

=={{header|ALGOL W}}==

<~~lang~~ algolw>begin % syntax analyser %

<syntaxhighlight lang="algolw">begin % syntax analyser %

% parse tree nodes %

record node( integer type

Line 1,041:

readToken;

writeNode( parseStatementList( tEnd_of_input ) )

end.</~~lang~~>

end.</syntaxhighlight>

Output from parsing the Prime Numbers example program.

Line 1,144:

=={{header|ATS}}==

<~~lang~~ ~~ATS~~>(********************************************************************)

<syntaxhighlight lang="ats">(********************************************************************)

(* Usage: parse [INPUTFILE [OUTPUTFILE]]

If INPUTFILE or OUTPUTFILE is "-" or missing, then standard input

Line 2,074:

end

(********************************************************************)</~~lang~~>

(********************************************************************)</syntaxhighlight>

Line 2,177:

=={{header|AWK}}==

Tested with gawk 4.1.1 and mawk 1.3.4.

function Token_assign(tk, attr, attr_array, n, i) {

n=split(attr, attr_array)

Line 2,473:

prt_ast(t)

}

</syntaxhighlight>

</lang>

Line 2,514:

=={{header|C}}==

Tested with gcc 4.81 and later, compiles warning free with -Wall -Wextra

<~~lang~~ C>#include <stdio.h>

<syntaxhighlight lang="c">#include <stdio.h>

#include <stdlib.h>

#include <string.h>

Line 2,854:

init_io(&dest_fp, stdout, "wb", argc > 2 ? argv[2] : "");

prt_ast(parse());

}</~~lang~~>

}</syntaxhighlight>

Line 2,960:

Code by Steve Williams. Tested with GnuCOBOL 2.2.

<~~lang~~ cobol> >>SOURCE FORMAT IS FREE

<syntaxhighlight lang="cobol"> >>SOURCE FORMAT IS FREE

identification division.

*> this code is dedicated to the public domain

Line 3,565:

.

end program printast.

end program parser.</~~lang~~>

end program parser.</syntaxhighlight>

Line 3,673:

<~~lang~~ lisp>#!/bin/sh

<syntaxhighlight lang="lisp">#!/bin/sh

#|-*- mode:lisp -*-|#

#|

Line 4,084:

(uiop:quit 0)))

;;; vim: set ft=lisp lisp:</~~lang~~>

;;; vim: set ft=lisp lisp:</syntaxhighlight>

Line 4,274:

=={{header|Forth}}==

Tested with Gforth 0.7.3.

<~~lang~~ ~~Forth~~>CREATE BUF 0 , \ single-character look-ahead buffer

<syntaxhighlight lang="forth">CREATE BUF 0 , \ single-character look-ahead buffer

: PEEK BUF @ 0= IF KEY BUF ! THEN BUF @ ;

: GETC PEEK 0 BUF ! ;

Line 4,445:

: -EOI? TOKEN-TYPE End_of_input <> ;

: PARSE $NULL GETTOK BEGIN -EOI? WHILE STMT $SEQUENCE REPEAT ;

PARSE .NODE</~~lang~~>

PARSE .NODE</syntaxhighlight>

Line 4,487:

The following is Fortran 2008/2018 code with C preprocessing directives. If you call the program source ‘parse.F90’, with a capital ‘F’, then gfortran will know to run the C preprocessor.

<~~lang~~ fortran>!!!

<syntaxhighlight lang="fortran">!!!

!!! An implementation of the Rosetta Code parser task:

!!! https://rosettacode.org/wiki/Compiler/syntax_analyzer

Line 6,033:

end subroutine print_usage

end program parse</~~lang~~>

end program parse</syntaxhighlight>

Line 6,135:

=={{header|Go}}==

<~~lang~~ go>package main

<syntaxhighlight lang="go">package main

import (

Line 6,488:

scanner = bufio.NewScanner(source)

prtAst(parse())

}</~~lang~~>

}</syntaxhighlight>

Line 6,597:

<syntaxhighlight lang="icon">#

<lang Icon>#

# The Rosetta Code Tiny-Language Parser, in Icon.

#

Line 6,954:

}

return s

end</~~lang~~>

end</syntaxhighlight>

Line 7,057:

Implementation:

<~~lang~~ J>require'format/printf'

<syntaxhighlight lang="j">require'format/printf'

tkref=: tokenize 'End_of_input*/%+--<<=>>===!=!&&||print=print(if{else}while;,putc)a""0'

Line 7,214:

end.

}}

</syntaxhighlight>

</lang>

Some quirks worth noting:

Line 7,230:

Task example:

primes=: {{)n

/*

Line 7,350:

String "\n"

;

</syntaxhighlight>

</lang>

=={{header|Java}}==

Usage: java Parser infile [>outfile]

<~~lang~~ java>

import java.io.File;

import java.io.FileNotFoundException;

Line 7,707:

}

</syntaxhighlight>

</lang>

=={{header|Julia}}==

Julia tends to discourage large numbers of global variables, so this direct port from the Python reference implementation moves the globals into a function wrapper.

<~~lang~~ julia>struct ASTnode

<syntaxhighlight lang="julia">struct ASTnode

nodetype::Int

left::Union{Nothing, ASTnode}

Line 7,964:

# syntaxanalyzer(length(ARGS) > 1 ? ARGS[1] : stdin) # for use as in the Python code

</syntaxhighlight>

</lang>

=={{header|M2000 Interpreter}}==

Line 7,972:

Module syntax_analyzer(b$){

enum tokens {

Line 8,345:

43 1 End_of_Input

}

</syntaxhighlight>

</lang>

Line 8,525:

Using the third version of Nim lexer.

<~~lang~~ ~~Nim~~>import ast_lexer

<syntaxhighlight lang="nim">import ast_lexer

type NodeKind* = enum

Line 8,773:

let code = if paramCount() < 1: stdin.readAll() else: paramStr(1).readFile()

let tree = parse(code)

tree.printAst()</~~lang~~>

tree.printAst()</syntaxhighlight>

Line 8,882:

<~~lang~~ ~~ObjectIcon~~># -*- ObjectIcon -*-

<syntaxhighlight lang="objecticon"># -*- ObjectIcon -*-

#

# The Rosetta Code Tiny-Language Parser, in Object Icon.

Line 9,238:

}

return s

end</~~lang~~>

end</syntaxhighlight>

Line 9,342:

=={{header|Perl}}==

Tested on perl v5.26.1

<~~lang~~ ~~Perl~~>#!/usr/bin/perl

<syntaxhighlight lang="perl">#!/usr/bin/perl

use strict; # parse.pl - inputs lex, outputs flattened ast

Line 9,415:

$_[0] <= 0 && /$h Op_or \n/gcx ? "Or\n$ast" . expr(1) :

return $ast while 1;

}</~~lang~~>

}</syntaxhighlight>

Line 9,449:

=={{header|Phix}}==

Reusing lex.e (and core.e) from the [[Compiler/lexical_analyzer#Phix|Lexical Analyzer task]], and again written as a reusable module.

--

-- demo\rosetta\Compiler\parse.e

Line 9,602:

return t

end function

And a simple test driver for the specific task:

--

-- demo\rosetta\Compiler\parse.exw

Line 9,663:

--main({0,0,"primes.c"}) -- as Algol, C, Python (apart from spacing)

--main({0,0,"count.c"}) -- as AWK ( "" )

<pre>

Line 9,672:

=={{header|Python}}==

Tested with Python 2.7 and 3.x

<~~lang~~ ~~Python~~>from __future__ import print_function

<syntaxhighlight lang="python">from __future__ import print_function

import sys, shlex, operator

Line 9,938:

error(0, 0, "Can't open %s" % sys.argv[1])

t = parse()

prt_ast(t)</~~lang~~>

prt_ast(t)</syntaxhighlight>

Line 10,058:

<~~lang~~ ratfor>######################################################################

<syntaxhighlight lang="ratfor">######################################################################

#

# The Rosetta Code parser in Ratfor 77.

Line 11,608:

end

######################################################################</~~lang~~>

######################################################################</syntaxhighlight>

Line 11,723:

The following code implements a configurable (from a symbol map provided as a parameter) Precedence Climbing parser for the output of the [http://rosettacode.org/wiki/Compiler/lexical_analyzer#Scala lexer]. The recursive descent language parser is closely based on the pseudo code given in the task description.

<~~lang~~ scala>

package xyz.hyperreal.rosettacodeCompiler

Line 11,933:

}

</syntaxhighlight>

</lang>

=={{header|Scheme}}==

Line 11,939:

Code implements a recursive descent parser based on the given grammar. Tested against all programs in [[Compiler/Sample programs]].

<~~lang~~ scheme>

(import (scheme base)

(scheme process-context)

Line 12,146:

(display-ast (parse-file (cadr (command-line))))

(display "Error: provide program filename\n"))

</syntaxhighlight>

</lang>

=={{header|Wren}}==

Line 12,153:

<~~lang~~ ecmascript>import "/dynamic" for Enum, Struct, Tuple

<syntaxhighlight lang="ecmascript">import "/dynamic" for Enum, Struct, Tuple

import "/fmt" for Fmt

import "/ioutil" for FileUtil

Line 12,469:

lines = FileUtil.readLines("source.txt")

lineCount = lines.count

prtAst.call(parse.call())</~~lang~~>

prtAst.call(parse.call())</syntaxhighlight>

Line 12,571:

=={{header|Zig}}==

<~~lang~~ zig>

const std = @import("std");

Line 13,124:

return result;

}

</syntaxhighlight>

</lang>