JSON: Difference between revisions

3,651 bytes added ,  12 years ago
(jsonlint also accepts invalid syntax.)
<pre>{"blue": [1, 2], "ocean": "water"}</pre>
Note that this is capable of correctly handling the round-trip of values parsed from the <code>json</code> package described above.
 
=={{header|TXR}}==
 
{{works with|TXR|"git head"}}
 
===Parsing===
 
The following implements the parsing half of the task. The parser is closely based on the JSON grammar [http://www.json.org/fatfree.html].
 
It is implemented with recursive horizontal pattern matching functions, so the definition closely resembles a grammar. Horizontal functions are a new feature in TXR; they allow the language to specify LL grammars with indefinite lookahead, not restricted to regular languages (thanks to TXR's backtracking). The numerous occurrences of @\ in the code are line continuations: horizontal functions must be written on one logical line, and @\ eats the whitespace at the start of the next physical line, which allows indentation.
 
The parser translates the input to a nested list structure in which the types are labeled with the strings "O", "A", "N", "S" and "K" (object, array, number, string, and keyword).
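 
As an illustration of how such a tagged structure can be consumed, here is a minimal Python sketch (not TXR, and not part of the parser). It assumes the tree is represented as nested <code>(tag, payload)</code> tuples with object members as <code>(key, value)</code> pairs, which is one reading of the test output shown further below; the function name <code>to_native</code> is hypothetical.

```python
# Hypothetical helper: convert a tagged tree such as
# ("O", [(("S", "a"), ("N", "3"))]) into native Python values.
# The tuple layout is an assumption modelled on the parser's printed output.

KEYWORDS = {"true": True, "false": False, "null": None}

def to_native(node):
    tag, payload = node
    if tag == "S":                      # string: payload is the decoded text
        return payload
    if tag == "N":                      # number: payload is the matched lexeme
        is_float = any(c in payload for c in ".eE")
        return float(payload) if is_float else int(payload)
    if tag == "K":                      # keyword: true, false or null
        return KEYWORDS[payload]
    if tag == "A":                      # array: payload is a list of nodes
        return [to_native(item) for item in payload]
    if tag == "O":                      # object: payload is a list of (key, value) pairs
        return {to_native(k): to_native(v) for k, v in payload}
    raise ValueError("unknown tag: %r" % (tag,))

tree = ("O", [(("S", "a"),
               ("O", [(("S", "b"), ("N", "3")),
                      (("S", "c"), ("A", [("N", "1"), ("N", "2"), ("N", "3")]))]))])
print(to_native(tree))
```

Keeping numbers as strings in the tree, as the parser does, leaves the choice of numeric type to the consumer; the sketch picks <code>int</code> or <code>float</code> from the lexeme.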
 
The largest grammar rule handles JSON string literals. The strategy is to generate an HTML string and then filter from HTML using the <code>:from_html</code> filter in TXR. For instance, \uABCD is translated to &#xABCD; and the filter then produces the corresponding Unicode character. Similarly, \" is translated to &quot; and \n is translated to &#10;, and so on.
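 
The same two-step idea (rewrite escapes as HTML entities, then decode the entities in one pass) can be sketched in Python. This is only an illustration under stated assumptions: all names are hypothetical, the small <code>from_html</code> function stands in for TXR's <code>:from_html</code> filter and handles only numeric references plus four named entities, backslash and slash escapes are mapped directly to their characters (a choice of this sketch), and UTF-16 surrogate pairs are not combined.

```python
import re

# JSON escape letter -> HTML entity (or literal), following the idea in the text.
ESCAPES = {'"': "&quot;", "\\": "&#92;", "/": "/",
           "b": "&#8;", "f": "&#12;", "n": "&#10;", "r": "&#13;", "t": "&#9;"}

# One token of a string body: \uXXXX, a simple escape, or a run of plain text.
TOKEN = re.compile(r'\\u([0-9A-Fa-f]{4})|\\(["\\/bfnrt])|([^"\\]+)')
ENTITY = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(quot|amp|lt|gt));")
NAMED = {"quot": '"', "amp": "&", "lt": "<", "gt": ">"}

def to_html(text):
    # Protect literal text so that it survives the later entity decoding.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def from_html(text):
    # Stand-in for an HTML entity decoder (assumption: only these forms occur).
    def repl(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        if m.group(2) is not None:
            return chr(int(m.group(2)))
        return NAMED[m.group(3)]
    return ENTITY.sub(repl, text)

def decode_json_string_body(body):
    """Decode the text between the quotes of a JSON string literal."""
    pieces, pos = [], 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            raise ValueError("bad escape or stray quote at offset %d" % pos)
        if m.group(1) is not None:
            pieces.append("&#x%s;" % m.group(1))    # \uXXXX -> &#xXXXX;
        elif m.group(2) is not None:
            pieces.append(ESCAPES[m.group(2)])      # \n -> &#10; and so on
        else:
            pieces.append(to_html(m.group(3)))      # plain run, entity-escaped
        pos = m.end()
    return from_html("".join(pieces))
```

Escaping the plain runs before decoding is what keeps input such as <code>R&D</code>, or text that already looks like an entity, from being altered by the final decoding pass.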
 
A little liberty is taken: the commas that separate array elements and object members, which are not needed to delimit the values, are treated as optional.
 
Superfluous terminating commas (not generated by the JSON grammar but accepted by some other parsers) are not allowed by this parser.
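 
To make the shape of such a parser concrete, the following Python sketch (not TXR) mimics recursive matching functions for a toy subset of the grammar: numbers and arrays only. Each function takes the text and a position and returns either a <code>(node, new_position)</code> pair or <code>None</code> on failure, so a caller can fall back to the next alternative. The function names are hypothetical, the comma handling follows the two paragraphs above as this sketch interprets them, and the regular expressions are copied from the TXR code below.

```python
import re

WS = re.compile(r"[\n\t ]*")
NUM = re.compile(r"-?[0-9]+((\.[0-9]+)?([Ee][+\-]?[0-9]+)?)?")

def skip_ws(text, pos):
    return WS.match(text, pos).end()

def num(text, pos):
    pos = skip_ws(text, pos)
    m = NUM.match(text, pos)
    if m is None:
        return None
    return ("N", m.group()), skip_ws(text, m.end())

def array(text, pos):
    pos = skip_ws(text, pos)
    if not text.startswith("[", pos):
        return None
    pos = skip_ws(text, pos + 1)
    items = []
    while not text.startswith("]", pos):
        result = value(text, pos)
        if result is None:
            return None                 # no element and no closing bracket
        item, pos = result
        items.append(item)
        if text.startswith(",", pos):   # the separating comma is optional
            pos = skip_ws(text, pos + 1)
            if text.startswith("]", pos):
                return None             # a terminating comma is rejected
    return ("A", items), skip_ws(text, pos + 1)

def value(text, pos):
    # Ordered alternatives: the first rule that matches wins.
    for rule in (num, array):
        result = rule(text, pos)
        if result is not None:
            return result
    return None

print(value("[1, 2 3]", 0))
```

Because a failed alternative simply returns <code>None</code> without consuming input, trying the next rule from the same position gives a simple form of backtracking.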
 
<lang txr>@(define value (v))@\
@(cases)@\
@(string v)@(or)@(num v)@(or)@(object v)@(or)@\
@(keyword v)@(or)@(array v)@\
@(end)@\
@(end)
@(define ws)@/[\n\t ]*/@(end)
@(define string (g))@\
@(local s hex)@\
@(ws)@\
"@(coll :gap 0 :vars (s))@\
@(cases)@\
\"@(bind s "&quot;")@(or)@\
\\@(bind s "\\\\")@(or)@\
\/@(bind s "\\/")@(or)@\
\b@(bind s "&#8;")@(or)@\
\f@(bind s "&#12;")@(or)@\
\n@(bind s "&#10;")@(or)@\
\r@(bind s "&#13;")@(or)@\
\t@(bind s "&#9;")@(or)@\
\u@{hex /[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/}@\
@(bind s `&#x@hex;`)@(or)@\
@{s /[^"\\]*/}@(filter :to_html s)@\
@(end)@\
@(until)"@\
@(end)"@\
@(ws)@\
@(cat s "")@\
@(filter :from_html s)@\
@(bind g ("S" s))@\
@(end)
@(define num (v))@\
@(local n)@\
@(ws)@{n /-?[0-9]+((\.[0-9]+)?([Ee][+\-]?[0-9]+)?)?/}@(ws)@\
@(bind v ("N" n))@\
@(end)
@(define keyword (v))@\
@(local k)@\
@(ws)@(some)@{k /true/}@(or)@{k /false/}@(or)@{k /null/}@(end)@(ws)@\
@(bind v ("K" k))@\
@(end)
@(define object (v))@\
@(local p e pair)@\
@(ws){@(ws)@(coll :gap 0 :vars (pair))@\
@(string p):@(value e)@/,?/@\
@(bind pair (p e))@\
@(until)}@\
@(end)}@(ws)@\
@(bind v ("O" pair))@\
@(end)
@(define array (v))@\
@(local e)@\
@(ws)[@(ws)@(coll :gap 0 :var (e))@(value e)@/,?/@(until)]@(end)]@(ws)@\
@(bind v ("A" e))@\
@(end)
@(freeform)
@(maybe)@(value v)@(end)@badsyntax</lang>
 
A few tests follow. Note that the <code>badsyntax</code> variable is bound to any trailing portion of the input that does not match the syntax. The call to the parser, <code>@(value v)</code>, extracts the longest prefix of the input that is consistent with the syntax, leaving the remainder to be matched into <code>badsyntax</code>.
 
<lang bash>$ echo -n '{ "a" : { "b" : 3, "c" : [1,2,3] } }[' | ./txr -l json.txr -
(v "O" ((("S" "a") ("O" ((("S" "b") ("N" "3")) (("S" "c") ("A" (("N" "1") ("N" "2") ("N" "3")))))))))
(badsyntax . "[\n")
 
$ echo -n '"\u1234"' | ./txr -l json.txr -
(v "S" "\11064")
(badsyntax . "")</lang>
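 
For comparison, a similar "parse a prefix, report the rest" behaviour can be sketched in Python with the standard library's <code>json.JSONDecoder.raw_decode</code>, which returns the decoded value together with the index where parsing stopped. This is a rough analogue rather than a reimplementation: the helper name <code>split_valid_prefix</code> is hypothetical, the standard decoder is strict about commas, and it parses a single value rather than searching for a longest match by backtracking.

```python
import json

def split_valid_prefix(text):
    """Return (value, remainder); value is None if no JSON value starts the text."""
    stripped = text.lstrip()            # raw_decode does not skip leading whitespace
    try:
        value, end = json.JSONDecoder().raw_decode(stripped)
    except json.JSONDecodeError:
        return None, text               # nothing parsed: everything is remainder
    return value, stripped[end:]

value, rest = split_valid_prefix('{ "a" : { "b" : 3, "c" : [1,2,3] } }[')
print(value)
print(repr(rest))
```

One caveat of this sketch is that a parsed JSON <code>null</code> and a failed parse both yield <code>None</code> as the value; comparing the remainder with the input distinguishes the two cases.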
 
 
[[Category:Data Structures]]
Anonymous user