XML/Input: Difference between revisions

m (→‎PEG-based Parsing: allow </foo >; def q(_):)
Line 1,991:
* (2) https://cs.lmu.edu/~ray/notes/xmlgrammar/
====PEG Infrastructure====
The jq module at [[:Category:Jq/peg.jq]] can be included by copying it to a file
and adding an `include` statement at the top of the main program, e.g. as follows:
<syntaxhighlight lang=jq>
include "peg" {search: "."};
</syntaxhighlight>
<syntaxhighlight lang=jq>
# PEG to jq transcription is based on these equivalences:
# Sequence:       e1 e2     e1 | e2
# Ordered choice: e1 / e2   e1 // e2
# Zero-or-more:   e*        star(E)
# One-or-more:    e+        plus(E)
# Optional:       e?        optional(E)
# And-predicate:  &e        amp(E)      # no input is consumed
# Not-predicate:  !e        neg(E)      # no input is consumed


# The idea is to pass a JSON object {remainder:_, result:_ } through a
# pipeline, consuming the text in .remainder and building up .result.

def star(E): ((E | star(E)) // .) ;
def plus(E): E | (plus(E) // . );
def optional(E): (E // .);
def amp(E): . as $in | E | $in;
def neg(E): select( [E] == [] );

### Helper functions:

# Consume a regular expression rooted at the start of .remainder, or emit empty;
# on success, update .remainder and set .match but do NOT update .result
def consume($re):
  # on failure, match yields empty
  (.remainder | match("^" + $re)) as $match
  | .remainder |= .[$match.length :]
  | .match = $match.string;

def parse($re):
  consume($re)
  | .result = .result + [.match] ;

# consume the literal string $s
def q($s):
  select(.remainder | startswith($s))
  | .remainder |= .[$s | length :] ;

def literal($s):
  q($s)
  | .result += [$s];

# Tagging
def box(E):
  ((.result = null) | E) as $e
  | .remainder = $e.remainder
  | .result += [$e.result]  # the magic sauce
  ;

def box(name; E):
  ((.result = null) | E) as $e
  | .remainder = $e.remainder
  | .result += [{(name): (try ($e.result|join("")) catch $e.result) }]  # the magic sauce
  ;

def objectify(E):
  box(E)
  | .result[-1] |= {(.[0]): .[1:]} ;

def keyvalue(E):
  box(E)
  | .result[-1] |= {(.[0]): .[1]} ;

# optional whitespace
def ws: consume("[ \n\r\t]*");

def string_except($regex):
  box(star(neg( parse($regex) ) | parse("."))) | .result[-1] |= add;

</syntaxhighlight>
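As a rough illustration of how a {remainder, result} object travels through a pipeline of these helpers, the following sketch parses a whitespace-separated list of lowercase words. The grammar and the names <code>Word</code> and <code>Words</code> are hypothetical and are not part of the XML grammar. The helper definitions are repeated here only so that the snippet runs on its own; with the module installed, an `include` statement would take their place.
<syntaxhighlight lang=jq>
# Copies of the helpers needed for this sketch (see the definitions above).
def star(E): ((E | star(E)) // .) ;

def consume($re):
  (.remainder | match("^" + $re)) as $match
  | .remainder |= .[$match.length :]
  | .match = $match.string;

def parse($re):
  consume($re)
  | .result = .result + [.match] ;

def ws: consume("[ \n\r\t]*");

# Hypothetical PEG, for illustration only:
#   Words <- (ws Word)*
#   Word  <- [a-z]+
def Word: parse("[a-z]+");
def Words: star(ws | Word);

# Start with all the text in .remainder and nothing in .result;
# each successful parse() moves text from .remainder into .result.
{remainder: "foo bar  baz", result: null}
| Words
| .result
</syntaxhighlight>
Saved to a file and run with <code>jq -n -f</code>, this should yield the array <code>["foo","bar","baz"]</code>. When <code>ws | Word</code> eventually fails on the exhausted input, <code>star</code> falls back to the last state that parsed successfully, so nothing is lost.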
====XML Grammar====
<syntaxhighlight lang=jq>
<syntaxhighlight lang=jq>