XML/Input: Difference between revisions
m (→PEG-based Parsing: allow </foo >; def q(_):)
Line 1,991:
* (2) https://cs.lmu.edu/~ray/notes/xmlgrammar/
====PEG Infrastructure====
The jq module at [[:Category:Jq/peg.jq]] can be included by copying it to a file
and adding an `include` statement to the top of the main program, e.g. as follows:
<syntaxhighlight lang=jq>
include "peg" {search: "."};

# PEG to jq transcription is based on these equivalences:
#   Sequence:        e1 e2      e1 | e2
#   Ordered choice:  e1 / e2    e1 // e2
#   Zero-or-more:    e*         star(E)
#   One-or-more:     e+         plus(E)
#   Optional:        e?         optional(E)
#   And-predicate:   &e         amp(E)    # no input is consumed
#   Not-predicate:   !e         neg(E)    # no input is consumed

# The idea is to pass a JSON object {remainder:_, result:_ } through a
# pipeline, consuming the text in .remainder and building up .result.

def star(E): ((E | star(E)) // .) ;

def plus(E): E | (plus(E) // . );

def optional(E): (E // .);

def amp(E): . as $in | E | $in;

def neg(E): select( [E] == [] );

### Helper functions:

# Consume a regular expression rooted at the start of .remainder, or emit empty;
# on success, update .remainder and set .match but do NOT update .result
def consume($re):
  # on failure, match yields empty
  (.remainder | match("^" + $re)) as $match
  | .remainder |= .[$match.length :]
  | .match = $match.string;

def parse($re):
  consume($re)
  | .result = .result + [.match] ;

# Consume the literal string $s without updating .result
def q($s):
  select(.remainder | startswith($s))
  | .remainder |= .[$s | length :] ;

# Consume the literal string $s and append it to .result
def literal($s):
  q($s)
  | .result += [$s];

# Tagging
def box(E):
  ((.result = null) | E) as $e
  | .remainder = $e.remainder
  | .result += [$e.result]  # the magic sauce
  ;

def box(name; E):
  ((.result = null) | E) as $e
  | .remainder = $e.remainder
  | .result += [{(name): (try ($e.result|join("")) catch $e.result) }]  # the magic sauce
  ;

def objectify(E):
  box(E)
  | .result[-1] |= {(.[0]): .[1:]} ;

def keyvalue(E):
  box(E)
  | .result[-1] |= {(.[0]): .[1]} ;

# optional whitespace
def ws: consume("[ \n\r\t]*");

def string_except($regex):
  box(star(neg( parse($regex) ) | parse("."))) | .result[-1] |= add;
</syntaxhighlight>
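To illustrate how the {remainder, result} object flows through these combinators, here is a minimal sketch. It copies `star`, `consume`, `parse`, `q` and `box` inline from the listing above so that it runs without a separate peg.jq file. The grammar itself (`Number`, `NumberList`, `Lists`) and the input string are hypothetical and are not part of the XML task.

```jq
# Definitions copied from the PEG infrastructure listing above
def star(E): ((E | star(E)) // .) ;

def consume($re):
  (.remainder | match("^" + $re)) as $match
  | .remainder |= .[$match.length :]
  | .match = $match.string;

def parse($re):
  consume($re)
  | .result = .result + [.match] ;

def q($s):
  select(.remainder | startswith($s))
  | .remainder |= .[$s | length :] ;

def box(E):
  ((.result = null) | E) as $e
  | .remainder = $e.remainder
  | .result += [$e.result] ;

# Hypothetical toy grammar, for illustration only:
#   Lists      <- NumberList (";" NumberList)*
#   NumberList <- Number ("," Number)*
#   Number     <- [0-9]+
def Number: parse("[0-9]+");

# q(",") consumes the separator without adding it to .result
def NumberList: Number | star(q(",") | Number);

# box(...) collects each NumberList into its own sub-array
def Lists: box(NumberList) | star(q(";") | box(NumberList));

{remainder: "1,2;3", result: []}
| Lists
| .result
```

Saved to a file and run with `jq -nc -f`, this sketch should print `[["1","2"],["3"]]`. The separators are consumed by `q` and so do not appear in the result, while each call to `box` starts a fresh `.result` for its sub-parse and appends the collected items to the outer `.result` as a single element.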
====XML Grammar====
<syntaxhighlight lang=jq>