S-expressions: Difference between revisions

← Older edit

S-expressions (view source)

Revision as of 21:30, 17 April 2024

57,938 bytes added , 2 months ago

m

→‎JS Functional

Hout

9,659

edits

Revision as of 04:01, 2 February 2021 (view source) rosettacode>Gerard Schildberger m (→‎{{header\|REXX}}: corrected a typo, added/changed whitespace and comments.) ← Older edit		Latest revision as of 21:30, 17 April 2024 (view source) Hout (talk \| contribs) m (→‎JS Functional)
(54 intermediate revisions by 11 users not shown)
Line 21: The reader should be able to read the following input <~~lang~~syntaxhighlight lang="lisp">((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))</~~lang~~syntaxhighlight> and turn it into a native datastructure. (see the [[#Pike\|Pike]], [[#Python\|Python]] and [[#Ruby\|Ruby]] implementations for examples of native data structures.) Line 32: Let the writer produce pretty printed output with indenting and line-breaks. <br><br> =={{header\|11l}}== {{trans\|Nim}} <syntaxhighlight lang="11l">T Token T.enum Kind INT FLOAT STRING IDENT LPAR RPAR END Kind kind String val F (kind, val = ‘’) .kind = kind .val = val F lex(input_str) [Token] result V pos = 0 F current() R I @pos < @input_str.len {@input_str[@pos]} E Char("\0") L pos < input_str.len V ch = input_str[pos] I ch == ‘(’ pos++ result.append(Token(Token.Kind.LPAR)) E I ch == ‘)’ pos++ result.append(Token(Token.Kind.RPAR)) E I ch C ‘0’..‘9’ V num = ‘’ V kind = Token.Kind.INT L current() C ‘0’..‘9’ num ‘’= current() pos++ I current() == ‘.’ num ‘’= current() kind = FLOAT pos++ L current() C ‘0’..‘9’ num ‘’= current() pos++ result.append(Token(kind, num)) E I ch C (‘ ’, "\t", "\n", "\r") pos++ E I ch == ‘"’ V str = ‘’ pos++ L current() != ‘"’ str ‘’= current() pos++ pos++ result.append(Token(Token.Kind.STRING, str)) E V BannedChars = Set([‘ ’, "\t", ‘"’, ‘(’, ‘)’, ‘;’]) V ident = ‘’ L current() !C BannedChars ident ‘’= current() pos++ result.append(Token(Token.Kind.IDENT, ident)) result.append(Token(Token.Kind.END)) R result F indent(s, count) R (count * ‘ ’)‘’s.replace("\n", "\n"(count * ‘ ’)) T SExpr T.enum Kind INT FLOAT STRING IDENT LIST Kind kind String val [SExpr] children F (kind, val = ‘’) .kind = kind .val = val F to_str() I .kind C (SExpr.Kind.INT, SExpr.Kind.FLOAT, SExpr.Kind.IDENT) R .val E I .kind == STRING R ‘"’(.val)‘"’ E I .kind == LIST V result = ‘(’ L(i, ex) enumerate(.children) I ex.kind == LIST & ex.children.len > 1 result ‘’= "\n" result ‘’= indent(ex.to_str(), 2) E I i > 0 result ‘’= ‘ ’ result ‘’= ex.to_str() R result‘)’ assert(0B) V input_str = ‘ ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) ’ V tokens = lex(input_str) V pos = 0 F current() R I :pos < :tokens.len {:tokens[:pos]} E Token(Token.Kind.END) F parse() -> SExpr V token = current() :pos++ I token.kind == INT R SExpr(SExpr.Kind.INT, token.val) E I token.kind == FLOAT R SExpr(SExpr.Kind.FLOAT, token.val) E I token.kind == STRING R SExpr(SExpr.Kind.STRING, token.val) E I token.kind == IDENT R SExpr(SExpr.Kind.IDENT, token.val) E I token.kind == LPAR V result = SExpr(SExpr.Kind.LIST) L current().kind !C (Token.Kind.RPAR, Token.Kind.END) result.children.append(parse()) assert(current().kind != END, ‘Missing right paren ')'’) :pos++ R result assert(0B) print(parse().to_str())</syntaxhighlight> {{out}} <pre> ( (data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) </pre> =={{header\|Ada}}== Line 39 ⟶ 186: Specification of package S_Expr: <~~lang~~syntaxhighlight ~~Ada~~lang="ada">with Ada.Strings.Unbounded; private with Ada.Containers.Indefinite_Vectors; Line 93 ⟶ 240: end record; end S_Expr;</~~lang~~syntaxhighlight> The implementation of S_Expr: <~~lang~~syntaxhighlight ~~Ada~~lang="ada">with Ada.Integer_Text_IO, Ada.Float_Text_IO; package body S_Expr is Line 151 ⟶ 298: end Print; end S_Expr;</~~lang~~syntaxhighlight> Specification and Implementation of S_Expr.Parser (a child package of S_Expr): <~~lang~~syntaxhighlight ~~Ada~~lang="ada">generic -- child of a generic package must be a generic unit package S_Expr.Parser is Line 161 ⟶ 308: -- the result of a parse process is always a list of expressions end S_Expr.Parser;</~~lang~~syntaxhighlight> <~~lang~~syntaxhighlight ~~Ada~~lang="ada">with Ada.Integer_Text_IO, Ada.Float_Text_IO; package body S_Expr.Parser is Line 272 ⟶ 419: end Parse; end S_Expr.Parser;</~~lang~~syntaxhighlight> The main program Test_S_Expr: <~~lang~~syntaxhighlight ~~Ada~~lang="ada">with S_Expr.Parser, Ada.Text_IO; procedure Test_S_Expr is Line 298 ⟶ 445: Expression_List.First.Print(Indention => 0); -- Parse will output a list of S-Expressions. We need the first Expression. end Test_S_Expr;</~~lang~~syntaxhighlight> {{out}} Line 326 ⟶ 473: =={{header\|ALGOL 68}}== <~~lang~~syntaxhighlight lang="algol68"># S-Expressions # CHAR nl = REPR 10; # mode representing an S-expression # Line 452 ⟶ 599: + nl ) )</~~lang~~syntaxhighlight> {{out}} <pre> Line 559 ⟶ 706: <~~lang~~syntaxhighlight ~~APL~~lang="apl">sexp←{ wspace←' ',⎕TC ⍝ whitespace is space, tab, cr, lf Line 622 ⟶ 769: } </syntaxhighlight> ~~</lang>~~ =={{header\|Arturo}}== <syntaxhighlight lang="rebol">code: { ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) } s: first to :block code inspect.muted s print as.code s</syntaxhighlight> {{out}} <pre>[ :inline [ :inline data :word quoted data :string 123 :integer 4.5 :floating ] [ :inline data :word [ :inline ! :symbol @ :symbol # :symbol [ :inline 4.5 :floating ] (more :string data) :string ] ] ] ((data "quoted data" 123 4.5) (data (! @ # (4.5) "(more" "data)")))</pre> =={{header\|AutoHotkey}}== <~~lang~~syntaxhighlight ~~AutoHotkey~~lang="autohotkey">S_Expressions(Str){ Str := RegExReplace(Str, "s)(?<![\\])"".?[^\\]""(SKIP)(F)\|((?<![\\])[)(]\|\s)", "`n$0`n") Str := RegExReplace(Str, "`am)^\s\v+") , Cnt := 0 Line 640 ⟶ 822: Res .= "`t" return Res }</~~lang~~syntaxhighlight> Examples:<~~lang~~syntaxhighlight ~~AutoHotkey~~lang="autohotkey">Str = ( ((data da\(\)ta "quot\\ed data" 123 4.5) ("data" (!@# (4.5) "(mo\"re" "data)"))) ) MsgBox, 262144, , % S_Expressions(Str)</~~lang~~syntaxhighlight> {{out}} <pre>( Line 672 ⟶ 854: =={{header\|C}}== <~~lang~~syntaxhighlight lang="c">#include <stdio.h> #include <stdlib.h> #include <ctype.h> Line 897 ⟶ 1,079: print_expr(x, 0); return 0; }</~~lang~~syntaxhighlight> {{out}}<syntaxhighlight lang="text">input is: ((data da\(\)ta "quot\\ed data" 123 4.5) ("data" (!@# (4.5) "(mo\"re" "data)"))) Line 921 ⟶ 1,103: ) ) )</~~lang~~syntaxhighlight> =={{header\|C sharp\|C#}}== Line 928 ⟶ 1,110: Git repository with code and tests can be found here: https://github.com/ichensky/SExpression/tree/rosettacode <~~lang~~syntaxhighlight lang="csharp"> using System; using System.Collections.Generic; Line 975 ⟶ 1,157: } </syntaxhighlight> ~~</lang>~~ <~~lang~~syntaxhighlight lang="csharp"> using System; using System.Collections.Generic; Line 1,161 ⟶ 1,343: } } </syntaxhighlight> ~~</lang>~~ <~~lang~~syntaxhighlight lang="csharp"> using System; using System.Collections.Generic; Line 1,183 ⟶ 1,365: } } </syntaxhighlight> ~~</lang>~~ {{out}} <pre> Line 1,196 ⟶ 1,378: apart from writing it out, which fulfils this task's requirements. With some more work this code might actually be useful. <~~lang~~syntaxhighlight lang="cpp">#include <cctype> #include <iomanip> #include <iostream> Line 1,466 ⟶ 1,648: } return 0; }</~~lang~~syntaxhighlight> {{out}} Line 1,492 ⟶ 1,674: =={{header\|Ceylon}}== <~~lang~~syntaxhighlight lang="ceylon">class Symbol(symbol) { shared String symbol; string => symbol; Line 1,660 ⟶ 1,842: prettyPrint(tree); } </syntaxhighlight> ~~</lang>~~ {{out}} <pre> Line 1,686 ⟶ 1,868: =={{header\|CoffeeScript}}== {{improve\|CoffeeScript\|This solution does not reproduce unquoted strings as per task description}} <~~lang~~syntaxhighlight lang="coffeescript"> # This code works with Lisp-like s-expressions. # Line 1,817 ⟶ 1,999: console.log "output:\n#{pp output}\n" console.log "round trip:\n#{sexp output}\n" </syntaxhighlight> ~~</lang>~~ {{out}} <syntaxhighlight lang="text"> > coffee sexp.coffee input: Line 1,849 ⟶ 2,031: round trip: (("data" "quoted data with escaped \"" 123 4.5 "14") ("data" ("!@#" (4.5) "(more" "data)"))) </syntaxhighlight> ~~</lang>~~ =={{header\|Common Lisp}}== Line 1,867 ⟶ 2,049: Unfortunately, our pointy-haired boss has asked you to write a parser for an unusual s-expression syntax that uses square brackets instead of parenthesis. In most programming languages, this would necessitate writing an entire parser. Fortunately, the Common Lisp reader can be modified through the use of macro-characters to accomplish this task. When the reader parses a macro-character token, a function associated with the macro-character is called. As evidenced below, modifying the behavior of the Lisp reader by setting macro-character functions to handle additional sytax requires far less work than writing a complete parser from scratch. <~~lang~~syntaxhighlight lang="lisp">(defun lsquare-reader (stream char) (declare (ignore char)) (read-delimited-list #\] stream t)) (set-macro-character #\[ #'lsquare-reader) ;;Call the lsquare-reader function when a '[' token is parsed (set-macro-character #\] (get-macro-character #\) nil)) ;;Do the same thing as ')' when a ']' token is parsed</~~lang~~syntaxhighlight> Unit test code: <~~lang~~syntaxhighlight lang="lisp">;;A list of unit tests. Each test is a cons in which the car (left side) contains the ;;test string and the cdr (right side) the expected result of reading the S-Exp. (setf unit-tests Line 1,888 ⟶ 2,070: (dolist (test unit-tests) (format t "String: ~23s Expected: ~23s Actual: ~s~%" (car test) (cdr test) (read-from-string (car test)))))</~~lang~~syntaxhighlight> {{out\| Unit test output}} <pre>CL-USER> (run-tests) Line 1,922 ⟶ 2,104: ===Writing S-Expressions=== The next step in this task is to write a standard Lisp s-expression in the square bracket notation. <~~lang~~syntaxhighlight lang="lisp">(defun write-sexp (sexp) "Writes a Lisp s-expression in square bracket notation." (labels ((parse (sexp) Line 1,945 ⟶ 2,127: (subseq str 0 last-char) str))))))) (concatenate 'string "[" (fix-spacing (parse sexp)) "]")))</~~lang~~syntaxhighlight> Unit test code: <~~lang~~syntaxhighlight lang="lisp">(setf unit-tests '(((1 2) (3 4)) (1 2 3 4) ("ab(cd" "mn)op") (1 (2 (3 (4)))) ((1) (2) (3)) ())) Line 1,953 ⟶ 2,135: (dolist (test unit-tests) (format t "Before: ~18s After: ~s~%" test (write-sexp test))))</~~lang~~syntaxhighlight> {{out\|Unit test output}} <pre>CL-USER> (run-tests) Line 1,979 ⟶ 2,161: a native floating point type, floating point numbers are not. <~~lang~~syntaxhighlight lang="cowgol">include "cowgol.coh"; include "strings.coh"; include "malloc.coh"; Line 2,244 ⟶ 2,426: print("Parsed:\n"); prettyprint(ParseSExp(str)); print_nl();</~~lang~~syntaxhighlight> {{out}} Line 2,273 ⟶ 2,455: =={{header\|D}}== <~~lang~~syntaxhighlight lang="d">import std.stdio, std.conv, std.algorithm, std.variant, std.uni, std.functional, std.string; Line 2,359 ⟶ 2,541: "Printed: ".write; pTest.writeSexp; }</~~lang~~syntaxhighlight> {{out}} <pre>Parsed: [[data, quoted data, 123, 4.5], [data, [!@#, [4.5], (more, data)]]] Line 2,366 ⟶ 2,548: =={{header\|EchoLisp}}== The '''(read-from-string input-string)''' function parses a string into an s-expression, which is the native representation of program/data in EchoLisp and the majority of Lisps . <~~lang~~syntaxhighlight lang="lisp"> (define input-string #'((data "quoted data" 123 4.5)\n(data (!@# (4.5) "(more" "data)")))'#) Line 2,383 ⟶ 2,565: (first(rest s-expr)) → (data (!@# (4.5) "(more" "data)")) </syntaxhighlight> ~~</lang>~~ =={{header\|F_Sharp\|F#}}== Implementation of S-expression parser in F# 4.7 language. Line 2,396 ⟶ 2,578: The file <code>SExpr.fs</code> containing the implementation: <~~lang~~syntaxhighlight lang="fsharp"> module SExpr (* This module is a very simple port of the OCaml version to F# (F-Sharp) ) Line 2,628 ⟶ 2,810: ( print_endline (string_of_sexpr_indent s) ) printfn "%s" (string_of_sexpr_indent s) </syntaxhighlight> ~~</lang>~~ Line 2,635 ⟶ 2,817: Read the experession from a file of preset it in the code. <~~lang~~syntaxhighlight lang="fsharp"> module Program ( Learn more about F# at https://fsharp.org ) Line 2,673 ⟶ 2,855: ( return an integer exit code ) 0 </syntaxhighlight> ~~</lang>~~ {{out}} Line 2,699 ⟶ 2,881: Factor has a comprehensive prettyprinter which can print any Factor object in a readable way. Not only can we leverage it to easily print our native data structure, but we can also call <code>unparse</code> to convert it to a string. This leaves us with a string reminiscent of the original input, and we are able to take it the rest of the way with two simple regular expressions. <~~lang~~syntaxhighlight lang="factor">USING: formatting kernel math.parser multiline peg peg.ebnf regexp sequences prettyprint words ; IN: rosetta-code.s-expressions Line 2,725 ⟶ 2,907: sexp>seq dup seq>sexp "Native:\n%u\n\nRound trip:\n%s\n" printf ] bi</~~lang~~syntaxhighlight> {{out}} <pre> Line 2,744 ⟶ 2,926: =={{header\|Go}}== <~~lang~~syntaxhighlight lang="go">package main import ( Line 2,917 ⟶ 3,099: fmt.Println(s.i) } }</~~lang~~syntaxhighlight> {{out}} <pre> Line 2,945 ⟶ 3,127: =={{header\|Haskell}}== <~~lang~~syntaxhighlight lang="haskell">import qualified Data.Functor.Identity as F import qualified Text.Parsec.Prim as Prim import Text.Parsec Line 2,984 ⟶ 3,166: "((data \"quoted data\" 123 4.5)\n (data (!@# (4.5) \"(more\" \"data)\")))" putStrLn ("The input:\n" ++ expr ++ "\n\nParsed as:") p expr</~~lang~~syntaxhighlight> {{Out}} <pre>The input: Line 2,992 ⟶ 3,174: Parsed as: List [List [Symbol "data",String "quoted data",Int 123,Float 4.5],List [Symbol "data",List [Symbol "!@#",List [Float 4.5],String "(more",String "data)"]]]</pre> Or, parsing by hand (rather than with a parser combinator library) and printing a parse tree diagram: <syntaxhighlight lang="haskell">{-# LANGUAGE TupleSections #-} import Data.Bifunctor (bimap) import Data.List (mapAccumL) import Data.List.Split (splitOn) import Data.Maybe (catMaybes, fromMaybe, listToMaybe) import Data.Tree (Forest, Tree (..), drawForest) ------------------------ DATA TYPE ----------------------- data Val = Int Integer \| Float Double \| String String \| Symbol String \| List [Val] deriving (Eq, Show, Read) instance Semigroup Val where List a <> List b = List (a <> b) instance Monoid Val where mempty = List [] --------------------------- MAIN ------------------------- main :: IO () main = do let expr = unlines [ "((data \"quoted data\" 123 4.5)", " (data (!@# (4.5) \"(more\" \"data)\")))" ] parse = fst (parseExpr (tokenized expr)) putStrLn $ treeDiagram $ forestFromVal parse putStrLn "Serialized from the parse tree:\n" putStrLn $ litVal parse ------------------- S-EXPRESSION PARSER ------------------ parseExpr :: [String] -> (Val, [String]) parseExpr = until finished parseToken . (mempty,) finished :: (Val, [String]) -> Bool finished (_, []) = True finished (_, token : _) = ")" == token parseToken :: (Val, [String]) -> (Val, [String]) parseToken (v, "(" : rest) = bimap ((v <>) . List . return) tail (parseExpr rest) parseToken (v, ")" : rest) = (v, rest) parseToken (v, t : rest) = (v <> List [atom t], rest) ----------------------- TOKEN PARSER --------------------- atom :: String -> Val atom [] = mempty atom s@('"' : _) = fromMaybe mempty (maybeRead ("String " <> s)) atom s = headDef (Symbol s) $ catMaybes $ maybeRead . (<> (' ' : s)) <$> ["Int", "Float"] maybeRead :: String -> Maybe Val maybeRead = fmap fst . listToMaybe . reads ----------------------- TOKENIZATION --------------------- tokenized :: String -> [String] tokenized s = quoteTokens '"' s >>= go where go [] = [] go token@('"' : _) = [token] go s = words $ spacedBrackets s quoteTokens :: Char -> String -> [String] quoteTokens q s = snd $ mapAccumL go False (splitOn [q] s) where go b s \| b = (False, '"' : s <> "\"") \| otherwise = (True, s) spacedBrackets :: String -> String spacedBrackets [] = [] spacedBrackets (c : cs) \| c `elem` "()" = ' ' : c : " " <> spacedBrackets cs \| otherwise = c : spacedBrackets cs ------------------------- DIAGRAMS ----------------------- treeDiagram :: Forest Val -> String treeDiagram = drawForest . fmap (fmap show) forestFromVal :: Val -> Forest Val forestFromVal (List xs) = treeFromVal <$> xs treeFromVal :: Val -> Tree Val treeFromVal (List xs) = Node (Symbol "List") (treeFromVal <$> xs) treeFromVal v = Node v [] ---------------------- SERIALISATION --------------------- litVal (Symbol x) = x litVal (Int x) = show x litVal (Float x) = show x litVal (String x) = '"' : x <> "\"" litVal (List [List xs]) = litVal (List xs) litVal (List xs) = '(' : (unwords (litVal <$> xs) <> ")") ------------------------- GENERIC ------------------------ headDef :: a -> [a] -> a headDef d [] = d headDef _ (x : _) = x</syntaxhighlight> {{Out}} <pre>Symbol "List" \| +- Symbol "List" \| \| \| +- Symbol "data" \| \| \| +- String "quoted data" \| \| \| +- Int 123 \| \| \| `- Float 4.5 \| `- Symbol "List" \| +- Symbol "data" \| `- Symbol "List" \| +- Symbol "!@#" \| +- Symbol "List" \| \| \| `- Float 4.5 \| +- String "(more" \| `- String "data)" Serialized from the parse tree: ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))</pre> =={{header\|Icon}} and {{header\|Unicon}}== Line 2,999 ⟶ 3,335: The example takes single and double quotes. <br> Single quotes were used instead of doubles in the input. <~~lang~~syntaxhighlight ~~Icon~~lang="icon">link ximage procedure main() Line 3,058 ⟶ 3,394: } return T end</~~lang~~syntaxhighlight> {{libheader\|Icon Programming Library}} Line 3,089 ⟶ 3,425: This implementation does not support escape characters. If escape characters were added, we would need additional support in the tokenizer (an extra character class, and in the state table an extra column and two extra rows, or almost double the number of state transitions: 35 instead of 20), and additional support in the data language (unfmt would need to strip out escape characters and fmt would need to insert escape characters -- so each of these routines would also perhaps double in size.) And that's a lot of bulk for serialize/deserialize mechanism which, by design, cannot represent frequently used data elements (such as matrices and gerunds). <~~lang~~syntaxhighlight lang="j">NB. character classes: 0: paren, 1: quote, 2: whitespace, 3: wordforming (default) chrMap=: '()';'"';' ',LF,TAB,CR Line 3,140 ⟶ 3,476: readSexpr=: fmt L:0 @rdSexpr :.writeSexpr writeSexpr=: wrSexpr @(unfmt L:0) :.readSexpr</~~lang~~syntaxhighlight> Example use: <~~lang~~syntaxhighlight lang="j"> readSexpr '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))' ┌───────────────────────────┬────────────────────────────────┐ │┌─────┬───────────┬───┬───┐│┌─────┬────────────────────────┐│ Line 3,156 ⟶ 3,492: └───────────────────────────┴────────────────────────────────┘ writeSexpr readSexpr '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))' ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))</~~lang~~syntaxhighlight> =={{header\|Java}}== Line 3,166 ⟶ 3,502: ====LispTokenizer.java==== <~~lang~~syntaxhighlight lang="java">package jfkbits; import java.io.BufferedReader; Line 3,276 ⟶ 3,612: { } }</~~lang~~syntaxhighlight> ====Token.java==== <~~lang~~syntaxhighlight lang="java">package jfkbits; import java.io.StreamTokenizer; Line 3,307 ⟶ 3,643: } } }</~~lang~~syntaxhighlight> ====Atom.java==== <~~lang~~syntaxhighlight lang="java">package jfkbits; import jfkbits.LispParser.Expr; Line 3,326 ⟶ 3,662: } }</~~lang~~syntaxhighlight> ====StringAtom.java==== <~~lang~~syntaxhighlight lang="java">package jfkbits; public class StringAtom extends Atom Line 3,349 ⟶ 3,685: } } </syntaxhighlight> ~~</lang>~~ ====ExprList.java==== <~~lang~~syntaxhighlight lang="java">package jfkbits; import java.util.AbstractCollection; Line 3,422 ⟶ 3,758: } }</~~lang~~syntaxhighlight> ====LispParser.java==== <~~lang~~syntaxhighlight lang="java">package jfkbits; Line 3,472 ⟶ 3,808: } </syntaxhighlight> ~~</lang>~~ ====LispParserDemo.java==== <~~lang~~syntaxhighlight lang="java">import jfkbits.ExprList; import jfkbits.LispParser; import jfkbits.LispParser.ParseException; Line 3,500 ⟶ 3,836: } } }</~~lang~~syntaxhighlight> =={{header\|JavaScript}}== (for a '''bug-fix''' concerning \" and \n in strings see the [[Talk:S-expressions#JavaScript_version_bugfix_for_%5C%22_and_%5Cn_in_strings\|Discussion]]) ~~<lang JavaScript>String.prototype.parseSexpr = function() {~~ ===Procedural=== <syntaxhighlight lang="javascript">String.prototype.parseSexpr = function() { var t = this.match(/\s("[^"]"\|\(\|\)\|"\|[^\s()"]+)/g) for (var o, c=0, i=t.length-1; i>=0; i--) { Line 3,540 ⟶ 3,878: document.write('Invalid s-expr!', '<br>') else document.write('s-expr:<br>', sexpr, '<br><br>', sexpr.constructor != Array ? '' : 'pretty print:<br>' + sexpr.toPretty())</~~lang~~syntaxhighlight> {{~~output~~out}} <pre>text: ~~text:~~ ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) Line 3,569 ⟶ 3,906: ) ) )</pre> ) ===Functional=== Showing the parse tree in an indented JSON format, and writing out a reserialization: <syntaxhighlight lang="javascript">(() => { "use strict"; // ------------------ S-EXPRESSIONS ------------------ const main = () => { const expr = [ "((data \"quoted data\" 123 4.5)", " (data (!@# (4.5) \"(more\" \"data)\")))" ] .join("\n"); const [parse, residue] = parseExpr( tokenized(expr) ); return 0 < residue.length ? `Unparsed tokens: ${JSON.stringify(residue)}` : 0 < parse.length ? [ JSON.stringify(parse, null, 2), "Reserialized from parse:", parse.map(serialized).join(" ") ] .join("\n\n") : "Could not be parsed"; }; // ---------------- EXPRESSION PARSER ---------------- // parseExpr [String] -> ([Expr], [String]) const parseExpr = tokens => // A tuple of (parsed trees, residual tokens) // derived from a list of tokens. until(finished)(readToken)([ [], tokens ]); // finished :: ([Expr], [String]) -> Bool const finished = ([, tokens]) => // True if no tokens remain, or the next // closes a sub-expression. 0 === tokens.length \|\| ")" === tokens[0]; // readToken :: ([Expr], [String]) -> ([Expr], [String]) const readToken = ([xs, tokens]) => { // A tuple of enriched expressions and // depleted tokens. const [token, ...ts] = tokens; // An open bracket introduces recursion over // a sub-expression to define a sub-list. return "(" === token ? (() => { const [expr, rest] = parseExpr(ts); return [xs.concat([expr]), rest.slice(1)]; })() : ")" === token ? [xs, token] : [xs.concat(atom(token)), ts]; }; // ------------------- ATOM PARSER ------------------- // atom :: String -> Expr const atom = s => 0 < s.length ? isNaN(s) ? "\"'".includes(s[0]) ? s.slice(1, -1) : {name: s} : parseFloat(s, 10) : ""; // ------------------ TOKENIZATION ------------------- // tokenized :: String -> [String] const tokenized = s => // Brackets and quoted or unquoted atomic strings. quoteTokens("\"")(s).flatMap( segment => "\"" !== segment[0] ? segment.replace(/([()])/gu, " $1 ") .split(/\s+/u) .filter(Boolean) : [segment] ); // quoteTokens :: Char -> String -> [String] const quoteTokens = q => // Alternating unquoted and quoted segments. s => s.split(q).flatMap( (k, i) => even(i) ? 0 < k.length ? [k] : [] : [`${q}${k}${q}`] ); // ------------------ SERIALIZATION ------------------ // serialized :: Expr -> String const serialized = e => { const t = typeof e; return "number" === t ? `${e}` : "string" === t ? `"${e}"` : "object" === t ? Array.isArray(e) ? `(${e.map(serialized).join(" ")})` : e.name : "?"; }; // --------------------- GENERIC --------------------- // even :: Int -> Bool const even = n => // True if 2 is a factor of n. 0 === n % 2; // until :: (a -> Bool) -> (a -> a) -> a -> a const until = p => // The value resulting from repeated applications // of f to the seed value x, terminating when // that result returns true for the predicate p. f => { const go = x => p(x) ? x : go(f(x)); return go; }; return main(); })();</syntaxhighlight> {{Out}} <pre>[ [ [ { "name": "data" }, "quoted data", 123, 4.5 ], [ { "name": "data" }, [ { "name": "!@#" }, [ 4.5 ], "(more", "data)" ] ] ] ] Reserialized from parse: ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))</pre> =={{header\|jq}}== {{works with\|jq}} '''Also works with gojq, the Go implementation of jq''' [[Category:PEG]] This entry is based on a Parsing Expression Grammar (PEG) for S-expressions. The idea is to pass a JSON object `{remainder:_, result:_ }` through a jq pipeline corresponding to a PEG for S-expressions, consuming the text in `.remainder` and building up `.result`. For further details about this approach, see e.g. [https://github.com/stedolan/jq/wiki/Parsing-Expression-Grammars jq as a PEG Engine]. <syntaxhighlight lang=jq> # PEG infrastructure def star(E): ((E \| star(E)) // .) ; ### Helper functions: # Consume a regular expression rooted at the start of .remainder, or emit empty; # on success, update .remainder and set .match but do NOT update .result def consume($re): # on failure, match yields empty (.remainder \| match("^" + $re)) as $match \| .remainder \|= .[$match.length :] \| .match = $match.string; def parse($re): consume($re) \| .result = .result + [.match] ; def parseNumber($re): consume($re) \| .result = .result + [.match\|tonumber] ; def eos: select(.remainder == ""); # whitespace def ws: consume("[ \t\r\n]"); def box(E): ((.result = null) \| E) as $e \| .remainder = $e.remainder \| .result += [$e.result] # the magic sauce ; # S-expressions # Input: a string # Output: an array representation of the input if it is an S-expression def SExpression: def string: consume("\"") \| parse("[^\"]") \| consume("\""); def identifier: parse("[^ \t\n\r()]+"); def decimal: parseNumber("[0-9]+([.][0-9])?"); def hex: parse("0x[0-9A-Fa-f]+") ; def number: hex // decimal; def atom: ws \| (string // number // identifier); def SExpr: ws \| consume("[(]") \| ws \| box(star(atom // SExpr)) \| consume("[)]"); {remainder: .} \| SExpr \| ws \| eos \| .result; SExpression </syntaxhighlight> '''Invocation:''' <pre> cat << EOF \| ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) EOF jq -Rsc -f s-expression.jq </pre> {{output}} <pre> [[["data","\"quoted","data\"",123,4.5],["data",["!@#",[4.5],"\"",["more\"","\"data"],"\""]]]] </pre> =={{header\|Julia}}== <~~lang~~syntaxhighlight lang="julia"> function rewritequotedparen(s) segments = split(s, "\"") Line 3,648 ⟶ 4,237: println("The processed native structure is:\n", nat) println("The reconstructed string is:\n"), printAny(nat) </syntaxhighlight> ~~</lang>~~ {{output}}<pre> The input string is: Line 3,664 ⟶ 4,253: =={{header\|Kotlin}}== {{trans\|JavaScript}} <~~lang~~syntaxhighlight lang="groovy">// version 1.2.31 const val INDENT = 2 Line 3,761 ⟶ 4,350: tokens2.prettyPrint() } }</~~lang~~syntaxhighlight> {{out}} Line 3,819 ⟶ 4,408: Tested with Lua 5.3.5 and LPeg 1.0.2-1. <~~lang~~syntaxhighlight lang="lua">lpeg = require 'lpeg' -- see http://www.inf.puc-rio.br/~roberto/lpeg/ imports = 'P R S C V match' Line 3,845 ⟶ 4,434: 'S'; S = ws lpar * C((atom + V'S')^0) * rpar / tolist }</~~lang~~syntaxhighlight> Now to use the <i>sexpr</i> pattern: <~~lang~~syntaxhighlight lang="lua">eg_input = [[ ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) Line 3,873 ⟶ 4,462: check(eg_produced, eg_expected) print("checks out!") -- won't get here if any <i>check()</i> assertion fails </syntaxhighlight> ~~</lang>~~ And here's the pretty printer, whose output looks like all the others: <~~lang~~syntaxhighlight lang="lua">function pprint(expr, indent) local function prindent(fmt, expr) io.write(indent) -- no line break Line 3,902 ⟶ 4,491: end pprint(eg_expected, '')</~~lang~~syntaxhighlight> =={{header\|Nim}}== <~~lang~~syntaxhighlight lang="nim">import strutils const Input = """ Line 4,050 ⟶ 4,639: else: nil echo parse()</~~lang~~syntaxhighlight> {{out}} Line 4,062 ⟶ 4,651: =={{header\|OCaml}}== You may be interested in this [https://dev.realworldocaml.org/data-serialization.html chapter of the book Real World OCaml]. ~~You may be interested by [https://realworldocaml.org/v1/en/html/data-serialization-with-s-expressions.html this chapter of the book Real World Ocaml]~~ The file <code>SExpr.mli</code> containing the interface: <~~lang~~syntaxhighlight lang="ocaml">(** This module is a very simple parsing library for S-expressions. ) ( Copyright (C) 2009 Florent Monnier, released under MIT license. ) Line 4,095 ⟶ 4,683: val string_of_sexpr_indent : sexpr list -> string (* same than [string_of_sexpr] but with indentation )</~~lang~~syntaxhighlight> The file <code>SExpr.ml</code> containing the implementation: <~~lang~~syntaxhighlight lang="ocaml">(* This module is a very simple parsing library for S-expressions. ) ( Copyright (C) 2009 Florent Monnier, released under MIT license. ) ( modified to match the task description ) Line 4,276 ⟶ 4,864: let print_sexpr_indent s = print_endline (string_of_sexpr_indent s)</~~lang~~syntaxhighlight> Then we compile this small module and test it in the interactive loop: Line 4,314 ⟶ 4,902: =={{header\|Perl}}== <~~lang~~syntaxhighlight lang="perl">#!/usr/bin/perl -w use strict; use warnings; Line 4,368 ⟶ 4,956: ref($_) eq 'ARRAY' ? sexpr2txt($_) : $$_ } @{$_[0]} ]})} }</~~lang~~syntaxhighlight> Check: <~~lang~~syntaxhighlight lang="perl">my $s = sexpr(q{ ((data "quoted data" 123 4.5) Line 4,382 ⟶ 4,970: # Convert back print sexpr2txt($s)."\n";</~~lang~~syntaxhighlight> Output: <pre>$VAR1 = [ Line 4,411 ⟶ 4,999: that may not be clear on the display: 4e-5 and 4-e5 may appear similar but the latter is probably a parse failure. It may be more sensible for get_term() to raise an error if the scanf fails, than assume it is a symbol like it does now. <!--<syntaxhighlight lang="phix">(phixonline)--> ~~<lang Phix>constant s_expr_str = """~~ <span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span> ~~((data "quoted data" 123 4.5)~~ <span style="color: #008080;">constant</span> <span style="color: #000000;">s_expr_str</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">""" ~~(data (!@# (4.5) "(more" "data)")))"""~~ ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))"""</span> ~~function skip_spaces(string s, integer sidx)~~ ~~while sidx<=length(s) and find(s[sidx]," \t\r\n") do sidx += 1 end while~~ <span style="color: #008080;">function</span> <span style="color: #000000;">skip_spaces</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">sidx</span><span style="color: #0000FF;">)</span> ~~return sidx~~ <span style="color: #008080;">while</span> <span style="color: #000000;">sidx</span><span style="color: #0000FF;"><=</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">and</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">],{</span><span style="color: #008000;">' '</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\t'</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\r'</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\n'</span><span style="color: #0000FF;">})</span> <span style="color: #008080;">do</span> <span style="color: #000000;">sidx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> <span style="color: #008080;">end</span> <span style="color: #008080;">while</span> ~~end function~~ <span style="color: #008080;">return</span> <span style="color: #000000;">sidx</span> <span style="color: #008080;">end</span> <span style="color: #008080;">function</span> ~~function get_term(string s, integer sidx)~~ ~~-- get a single quoted string, symbol, or number.~~ <span style="color: #008080;">function</span> <span style="color: #000000;">get_term</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">sidx</span><span style="color: #0000FF;">)</span> ~~integer ch = s[sidx]~~ <span style="color: #000080;font-style:italic;">-- get a single quoted string, symbol, or number.</span> ~~string res = ""~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">ch</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">]</span> ~~if ch='\"' then~~ <span style="color: #004080;">string</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">""</span> ~~res &= ch~~ <span style="color: #008080;">if</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">=</span><span style="color: #008000;">'\"'</span> <span style="color: #008080;">then</span> ~~while 1 do~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">ch</span> ~~sidx += 1~~ <span style="color: #008080;">while</span> <span style="color: #000000;">1</span> <span style="color: #008080;">do</span> ~~ch = s[sidx]~~ <span style="color: #000000;">sidx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> ~~res &= ch~~ <span style="color: #000000;">ch</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">]</span> ~~if ch='\\' then~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">ch</span> ~~sidx += 1~~ <span style="color: #008080;">if</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">=</span><span style="color: #008000;">'\\'</span> <span style="color: #008080;">then</span> ~~ch = s[sidx]~~ <span style="color: #000000;">sidx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> ~~res &= ch~~ <span style="color: #000000;">ch</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">]</span> ~~elsif ch='\"' then~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">ch</span> ~~sidx += 1~~ <span style="color: #008080;">elsif</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">=</span><span style="color: #008000;">'\"'</span> <span style="color: #008080;">then</span> ~~exit~~ <span style="color: #000000;">sidx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> ~~end if~~ <span style="color: #008080;">exit</span> ~~end while~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~else~~ <span style="color: #008080;">end</span> <span style="color: #008080;">while</span> ~~integer asnumber = (ch>='0' and ch<='9')~~ <span style="color: #008080;">else</span> ~~while not find(ch,") \t\r\n") do~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">asnumber</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">>=</span><span style="color: #008000;">'0'</span> <span style="color: #008080;">and</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;"><=</span><span style="color: #008000;">'9'</span><span style="color: #0000FF;">)</span> ~~res &= ch~~ <span style="color: #008080;">while</span> <span style="color: #008080;">not</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">,{</span><span style="color: #008000;">')'</span><span style="color: #0000FF;">,</span><span style="color: #008000;">' '</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\t'</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\r'</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\n'</span><span style="color: #0000FF;">})</span> <span style="color: #008080;">do</span> ~~sidx += 1~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">ch</span> ~~if sidx>length(s) then exit end if~~ <span style="color: #000000;">sidx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> ~~ch = s[sidx]~~ <span style="color: #008080;">if</span> <span style="color: #000000;">sidx</span><span style="color: #0000FF;">></span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span> <span style="color: #008080;">exit</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end while~~ <span style="color: #000000;">ch</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">]</span> ~~if asnumber then~~ <span style="color: #008080;">end</span> <span style="color: #008080;">while</span> ~~sequence scanres = scanf(res,"%f")~~ <span style="color: #008080;">if</span> <span style="color: #000000;">asnumber</span> <span style="color: #008080;">then</span> ~~if length(scanres)=1 then return {scanres[1][1],sidx} end if~~ <span style="color: #004080;">sequence</span> <span style="color: #000000;">scanres</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">scanf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%f"</span><span style="color: #0000FF;">)</span> ~~-- error? (failed to parse number)~~ <span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">scanres</span><span style="color: #0000FF;">)=</span><span style="color: #000000;">1</span> <span style="color: #008080;">then</span> <span style="color: #008080;">return</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">scanres</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">][</span><span style="color: #000000;">1</span><span style="color: #0000FF;">],</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">}</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end if~~ <span style="color: #000080;font-style:italic;">-- error? (failed to parse number)</span> ~~end if~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~return {res,sidx}~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end function~~ <span style="color: #008080;">return</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">}</span> <span style="color: #008080;">end</span> <span style="color: #008080;">function</span> ~~function parse_s_expr(string s, integer sidx)~~ ~~integer ch = s[sidx]~~ <span style="color: #008080;">function</span> <span style="color: #000000;">parse_s_expr</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">sidx</span><span style="color: #0000FF;">)</span> ~~sequence res = {}~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">ch</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">]</span> ~~object element~~ <span style="color: #004080;">sequence</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span> ~~if ch!='(' then ?9/0 end if~~ <span style="color: #004080;">object</span> <span style="color: #000000;">element</span> ~~sidx += 1~~ <span style="color: #008080;">if</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">!=</span><span style="color: #008000;">'('</span> <span style="color: #008080;">then</span> <span style="color: #0000FF;">?</span><span style="color: #000000;">9</span><span style="color: #0000FF;">/</span><span style="color: #000000;">0</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~while 1 do~~ <span style="color: #000000;">sidx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> ~~sidx = skip_spaces(s,sidx)~~ <span style="color: #008080;">while</span> <span style="color: #000000;">1</span> <span style="color: #008080;">do</span> ~~-- error? (if past end of string/missing ')')~~ <span style="color: #000000;">sidx</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">skip_spaces</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">)</span> ~~ch = s[sidx]~~ <span style="color: #000080;font-style:italic;">-- error? (if past end of string/missing ')')</span> ~~if ch=')' then exit end if~~ <span style="color: #000000;">ch</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">]</span> ~~if ch='(' then~~ <span style="color: #008080;">if</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">=</span><span style="color: #008000;">')'</span> <span style="color: #008080;">then</span> <span style="color: #008080;">exit</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~{element,sidx} = parse_s_expr(s,sidx)~~ <span style="color: #008080;">if</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">=</span><span style="color: #008000;">'('</span> <span style="color: #008080;">then</span> ~~else~~ <span style="color: #0000FF;">{</span><span style="color: #000000;">element</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">parse_s_expr</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">)</span> ~~{element,sidx} = get_term(s,sidx)~~ <span style="color: #008080;">else</span> ~~end if~~ <span style="color: #0000FF;">{</span><span style="color: #000000;">element</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">get_term</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">)</span> ~~res = append(res,element)~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end while~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #000000;">element</span><span style="color: #0000FF;">)</span> ~~sidx = skip_spaces(s,sidx+1)~~ <span style="color: #008080;">end</span> <span style="color: #008080;">while</span> ~~return {res,sidx}~~ <span style="color: #000000;">sidx</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">skip_spaces</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">+</span><span style="color: #000000;">1</span><span style="color: #0000FF;">)</span> ~~end function~~ <span style="color: #008080;">return</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">}</span> <span style="color: #008080;">end</span> <span style="color: #008080;">function</span> ~~sequence s_expr~~ ~~integer sidx~~ <span style="color: #004080;">sequence</span> <span style="color: #000000;">s_expr</span> ~~{s_expr,sidx} = parse_s_expr(s_expr_str,1)~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">sidx</span> ~~if sidx<=length(s_expr_str) then~~ <span style="color: #0000FF;">{</span><span style="color: #000000;">s_expr</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">parse_s_expr</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s_expr_str</span><span style="color: #0000FF;">,</span><span style="color: #000000;">1</span><span style="color: #0000FF;">)</span> ~~printf(1,"incomplete parse(\"%s\")\n",{s_expr_str[sidx..$]})~~ <span style="color: #008080;">if</span> <span style="color: #000000;">sidx</span><span style="color: #0000FF;"><=</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s_expr_str</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span> ~~end if~~ <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"incomplete parse(\"%s\")\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">s_expr_str</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sidx</span><span style="color: #0000FF;">..$]})</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~puts(1,"\nThe string:\n")~~ ~~?s_expr_str~~ <span style="color: #7060A8;">puts</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\nThe string:\n"</span><span style="color: #0000FF;">)</span> <span style="color: #0000FF;">?</span><span style="color: #000000;">s_expr_str</span> ~~puts(1,"\nDefault pretty printing:\n")~~ ~~--?s_expr~~ <span style="color: #7060A8;">puts</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\nDefault pretty printing:\n"</span><span style="color: #0000FF;">)</span> ~~pp(s_expr)~~ <span style="color: #000080;font-style:italic;">--?s_expr</span> <span style="color: #7060A8;">pp</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s_expr</span><span style="color: #0000FF;">)</span> ~~puts(1,"\nBespoke pretty printing:\n")~~ ~~--ppEx(s_expr,{pp_Nest,1,pp_StrFmt,-1,pp_IntCh,false,pp_Brkt,"()"})~~ <span style="color: #7060A8;">puts</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\nBespoke pretty printing:\n"</span><span style="color: #0000FF;">)</span> ~~ppEx(s_expr,{pp_Nest,4,pp_StrFmt,-1,pp_IntCh,false,pp_Brkt,"()"})</lang>~~ <span style="color: #000080;font-style:italic;">--ppEx(s_expr,{pp_Nest,1,pp_StrFmt,-1,pp_IntCh,false,pp_Brkt,"()"})</span> <span style="color: #7060A8;">ppEx</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s_expr</span><span style="color: #0000FF;">,{</span><span style="color: #004600;">pp_Nest</span><span style="color: #0000FF;">,</span><span style="color: #000000;">4</span><span style="color: #0000FF;">,</span><span style="color: #004600;">pp_StrFmt</span><span style="color: #0000FF;">,-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #004600;">pp_IntCh</span><span style="color: #0000FF;">,</span><span style="color: #004600;">false</span><span style="color: #0000FF;">,</span><span style="color: #004600;">pp_Brkt</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"()"</span><span style="color: #0000FF;">})</span> <!--</syntaxhighlight>--> {{out}} <pre> Line 4,518 ⟶ 5,109: =={{header\|PicoLisp}}== The '[http://software-lab.de/doc/refA.html#any any]' function parses an s-expression from a string (indentical to the way '[http://software-lab.de/doc/refR.html#read read]' does this from an input stream). <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">: (any "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))") -> ((data "quoted data" 123 5) (data (!@# (5) "(more" "data)"))) Line 4,538 ⟶ 5,129: +-- "(more" \| +-- "data)"</~~lang~~syntaxhighlight> Implementing a subset of 'any' explicitly: <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">(de readSexpr () (case (skip) ("(" (char) (readList)) Line 4,565 ⟶ 5,156: (until (or (sp? (peek)) (member (peek) '("(" ")"))) (link (char)) ) ) (or (format X) (intern (pack X))) ) )</~~lang~~syntaxhighlight> It can be used in a pipe to read from a string: <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">: (pipe (prin "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))") (readSexpr)) -> ((data "quoted data" 123 5) (data (!@# (5) "(more" "data)")))</~~lang~~syntaxhighlight> '[http://software-lab.de/doc/refS.html#sym sym]' does the reverse (i.e. builds a symbol (string) from an expression). <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">: (sym @@) -> "((data \"quoted data\" 123 5) (data (!@# (5) \"(more\" \"data)\")))"</~~lang~~syntaxhighlight> Implementing a subset of the built-in printer: <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">(de printSexpr (Expr Fun) (cond ((pair Expr) Line 4,586 ⟶ 5,177: (mapc Fun (chop Expr)) (Fun "\"") ) (T (mapc Fun (chop Expr))) ) )</~~lang~~syntaxhighlight> This can be used for plain printing <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">: (printSexpr '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) prin ) ((data "quoted data" 123 5) (data (!@# (5) "(more" "data)")))</~~lang~~syntaxhighlight> or to collect the characters into a string: <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">: (pack (make (printSexpr '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) link ) ) ) -> "((data \"quoted data\" 123 5) (data (!@# (5) \"(more\" \"data)\")))"</~~lang~~syntaxhighlight> =={{header\|Pike}}== <~~lang~~syntaxhighlight lang="pike">class Symbol(string name) { string _sprintf(int type) Line 4,706 ⟶ 5,297: string input = "((data \"quoted data\" 123 4.5)\n (data (!@# (4.5) \"(more\" \"data)\")))"; array data = group(tokenizer(input))[0]; string output = sexp(data);</~~lang~~syntaxhighlight> Output: Line 4,718 ⟶ 5,309: =={{header\|Potion}}== How values are stored: Tuples for list, integers for integers, floats for floats, strings for symbols, quoted strings for strings. This implementation is not the most elegant/succinct or practical (it's trusty and has no real error handling). <~~lang~~syntaxhighlight lang="potion">isdigit = (c): 47 < c ord and c ord < 58. iswhitespace = (c): c ord == 10 or c ord == 13 or c == " ". Line 4,789 ⟶ 5,380: parsesexpr("((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))") string print "\n" print</~~lang~~syntaxhighlight> =={{header\|Python}}== ===Procedural=== ~~<lang python>import re~~ <syntaxhighlight lang="python">import re dbg = False Line 4,851 ⟶ 5,443: print("\nParsed to Python:", parsed) print("\nThen back to: '%s'" % print_sexp(parsed))</~~lang~~syntaxhighlight> ;Output: Line 4,862 ⟶ 5,454: ;Simpler parser: Note that in the example above the parser also recognises and changes the type of some tokens as well as generating a nested list. If that functionality is not needed, or better done elsewhere, then the parse function can be achieved more simply by just applying the regexp: <~~lang~~syntaxhighlight lang="python">>>> from pprint import pprint as pp >>> x = [[(t,v) for t,v in termtypes.groupdict().items() if v][0] for termtypes in re.finditer(term_regex, sexp)] >>> pp(x) Line 4,884 ⟶ 5,476: ('brackr', ')'), ('brackr', ')')] >>> </~~lang~~syntaxhighlight> ===Functional=== Composing functionally, and writing out a tree diagram, and a serialization, of the parse. <syntaxhighlight lang="python">'''S-expressions''' from itertools import chain, repeat import re def main(): '''Sample s-expression parsed, diagrammed, and reserialized from the parse tree. ''' expr = "((data \"quoted data\" 123 4.5)\n" + ( " (data (!@# (4.5) \"(more\" \"data)\")))" ) parse = parseExpr(tokenized(expr))[0] print( drawForest([ fmapTree(str)(tree) for tree in forestFromExprs(parse) ]) ) print( f'\nReserialized from parse:\n\n{serialized(parse)}' ) # ----------------- S-EXPRESSION PARSER ------------------ # parseExpr :: [String] -> ([Expr], [String] def parseExpr(tokens): '''A tuple of a nested list with any unparsed tokens that remain. ''' return until(finished)(parseToken)( ([], tokens) ) # finished :: ([Expr], [String]) -> Bool def finished(xr): '''True if no tokens remain, or the next token is a closing bracket. ''' r = xr[1] return (not r) or (r[0] == ")") # parseToken :: ([Expr], [String]) -> ([Expr], [String]) def parseToken(xsr): '''A tuple of an expanded expression list and a reduced token list. ''' xs, r = xsr h, t = r if "(" == h: expr, rest = parseExpr(t) return xs + [expr], rest[1:] else: return (xs, t) if ")" == h else ( xs + [atom(h)], t ) # --------------------- ATOM PARSER ---------------------- # atom :: String -> Expr def atom(s): '''A Symbol, String, Float, or Int derived from s. Symbol is represented as a dict with a 'name' key. ''' def n(k): return float(k) if '.' in k else int(k) return s if '"' == s[0] else ( n(s) if s.replace('.', '', 1).isdigit() else { "name": s } ) # --------------------- TOKENIZATION --------------------- # tokenized :: String -> [String] def tokenized(s): '''A list of the tokens in s. ''' return list(chain.from_iterable(map( lambda token: [token] if '"' == token[0] else ( x for x in re.split( r'\s+', re.sub(r"([()])", r" \1 ", token) ) if x ) if token else [], ( x if (0 == i % 2) else f'"{x}"' for (i, x) in enumerate(s.split('"')) ) ))) # -------------------- SERIALIZATION --------------------- # serialized :: Expr -> String def serialized(e): '''An s-expression written out from the parse tree. ''' k = typename(e) return str(e) if k in ['int', 'float', 'str'] else ( ( f'({" ".join([serialized(x) for x in e])})' if ( (1 < len(e)) or ('list' != typename(e[0])) ) else serialized(e[0]) ) if 'list' == k else ( e.get("name") if 'dict' == k else "?" ) ) # typename :: a -> String def typename(x): '''Name property of the type of a value.''' return type(x).__name__ # ------------------- TREE DIAGRAMMING ------------------- # Node :: a -> [Tree a] -> Tree a def Node(v): '''Constructor for a Tree node which connects a value of some kind to a list of zero or more child trees. ''' return lambda xs: {'type': 'Tree', 'root': v, 'nest': xs} # append :: [a] -> [a] -> [a] def append(a, b): '''Concatenation.''' return a + b # draw :: Tree a -> [String] def draw(node): '''List of the lines of an ASCII diagram of a tree. ''' def shift_(h, other, xs): return list(map( append, chain( [h], ( repeat(other, len(xs) - 1) ) ), xs )) def drawSubTrees(xs): return ( ( ['\|'] + shift_( '├─ ', '│ ', draw(xs[0]) ) + drawSubTrees(xs[1:]) ) if 1 < len(xs) else ['\|'] + shift_( '└─ ', ' ', draw(xs[0]) ) ) if xs else [] return (root(node)).splitlines() + ( drawSubTrees(nest(node)) ) # drawForest :: [Tree String] -> String def drawForest(trees): '''A simple unicode character representation of a list of trees. ''' return '\n'.join(map(drawTree, trees)) # drawTree :: Tree a -> String def drawTree(tree): '''ASCII diagram of a tree.''' return '\n'.join(draw(tree)) # fmapTree :: (a -> b) -> Tree a -> Tree b def fmapTree(f): '''A new tree holding the results of an application of f to each root in the existing tree. ''' def go(x): return Node( f(root(x)) )([go(v) for v in nest(x)]) return go # forestFromExprs :: [Expr] -> [Tree Expr] def forestFromExprs(es): '''A list of expressions rewritten as a forest. ''' return [treeFromExpr(x) for x in es] # nest :: Tree a -> [Tree a] def nest(t): '''Accessor function for children of tree node.''' return t.get('nest') # root :: Tree a -> a def root(t): '''Accessor function for data of tree node.''' return t.get('root') # treeFromExprs :: Expr -> Tree Expr def treeFromExpr(e): '''An expression rewritten as a tree. ''' return ( Node({"name": "List"})(forestFromExprs(e)) ) if type(e) is list else ( Node(e)([]) ) # ----------------------- GENERIC ------------------------ # until :: (a -> Bool) -> (a -> a) -> a -> a def until(p): '''The result of repeatedly applying f until p holds. The initial seed value is x. ''' def go(f): def loop(x): v = x while not p(v): v = f(v) return v return loop return go # MAIN --- if __name__ == '__main__': main()</syntaxhighlight> {{Out}} <pre>{'name': 'List'} \| ├─ {'name': 'List'} │ \| │ ├─ {'name': 'data'} │ \| │ ├─ "quoted data" │ \| │ ├─ 123 │ \| │ └─ 4.5 \| └─ {'name': 'List'} \| ├─ {'name': 'data'} \| └─ {'name': 'List'} \| ├─ {'name': '!@#'} \| ├─ {'name': 'List'} │ \| │ └─ 4.5 \| ├─ "(more" \| └─ "data)" Reserialized from parse: ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))</pre> =={{header\|Racket}}== Racket has builtin support for S-expressions in the form of the read function. <~~lang~~syntaxhighlight lang="racket"> #lang racket (define input Line 4,899 ⟶ 5,774: (read (open-input-string input)) </syntaxhighlight> ~~</lang>~~ Output: <pre> Line 4,911 ⟶ 5,786: This parses the task, but it isn't really a good lisp parser, because it always wants whitespace between lists, so <code>(()())</code> will fail ( <code>(() ())</code> wont) <syntaxhighlight lang="raku" ~~perl6~~line>grammar S-Exp { rule TOP {^ <s-list> $}; Line 4,953 ⟶ 5,828: say "the expression:\n$s-exp\n"; say "the Raku expression:\n{$raku_array.raku}\n"; say "and back:\n{s-exp_writer($raku_array)}";</~~lang~~syntaxhighlight> {{out}} Line 4,973 ⟶ 5,848: It would normally be considered improper, but the literal string delimiters were left intact; making it much easier to understand what is/was being parsed. <~~lang~~syntaxhighlight lang="rexx">/REXX program parses an S-expression and displays the results to the terminal. / input= '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))' say center('input', length(input), "═") /display the header title to terminal./ Line 5,028 ⟶ 5,903: exit 0 /stick a fork in it, we're all done. / /──────────────────────────────────────────────────────────────────────────────────────/ add!: if !='' then return; #=#+1; @.#=left("", max(0, tabs(level-1)))!; !=; return</~~lang~~syntaxhighlight> {{out\|output\|text=  when using the default input:}} <pre> Line 5,057 ⟶ 5,932: =={{header\|Ruby}}== {{works with\|Ruby\|1.9}} <~~lang~~syntaxhighlight lang="ruby">class SExpr def initialize(str) @original = str Line 5,178 ⟶ 6,053: puts "original sexpr:\n#{sexpr.original}" puts "\nruby data structure:\n#{sexpr.data}" puts "\nand back to S-Expr:\n#{sexpr.to_sexpr}"</~~lang~~syntaxhighlight> {{out}} Line 5,192 ⟶ 6,067: ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) </pre> =={{header\|Rust}}== lib.rs: <syntaxhighlight lang="rust"> //! This implementation isn't based on anything in particular, although it's probably informed by a //! lot of Rust's JSON encoding code. It should be very fast (both encoding and decoding the toy //! example here takes under a microsecond on my machine) and tries to avoid unnecessary allocation. //! //! In a real implementation, most of this would be private, with only a few visible functions, and //! there would be somewhat nicer signatures (in particular, the fact that `ParseContext` has to be //! mutable would get annoying in real code pretty quickly, so it would probably be split out). //! //! It supports the ability to read individual atoms, not just lists, although whether this is //! useful is questionable. //! //! Caveats: Does not support symbols vs. non-symbols (it wouldn't be hard, but it would greatly //! complicate setting up our test structure since we'd have to force it to go through functions //! that checked to make sure `Symbol`s couldn't have spaces, or slow down our parser by checking //! for this information each time, which is obnoxious). Does not support string escaping, because //! the decoding technique doesn't allocate extra space for strings. Does support numbers, but //! only float types (supporting more types is possible but would complicate the code //! significantly). extern crate typed_arena; use typed_arena::Arena; use self::Error::; use self::SExp::; use self::Token::; use std::io; use std::num::FpCategory; use std::str::FromStr; /// The actual `SExp` structure. Supports `f64`s, lists, and string literals. Note that it takes /// everything by reference, rather than owning it--this is mostly done just so we can allocate /// `SExp`s statically (since we don't have to call `Vec`). It does complicate the code a bit, /// requiring us to have a `ParseContext` that holds an arena where lists are actually allocated. #[derive(PartialEq, Debug)] pub enum SExp<'a> { /// Float literal: 0.5 F64(f64), /// List of SExps: ( a b c) List(&'a [SExp<'a>]), /// Plain old string literal: "abc" Str(&'a str), } /// Errors that can be thrown by the parser. #[derive(PartialEq, Debug)] pub enum Error { /// If the float is `NaN`, `Infinity`, etc. NoReprForFloat, /// Missing an end double quote during string parsing UnterminatedStringLiteral, /// Some other kind of I/O error Io, /// ) appeared where it shouldn't (usually as the first token) IncorrectCloseDelimiter, /// Usually means a missing ), but could also mean there were no tokens at all. UnexpectedEOF, /// More tokens after the list is finished, or after a literal if there is no list. ExpectedEOF, } impl From<io::Error> for Error { fn from(_err: io::Error) -> Error { Error::Io } } /// Tokens returned from the token stream. #[derive(PartialEq, Debug)] enum Token<'a> { /// Left parenthesis ListStart, /// Right parenthesis ListEnd, /// String or float literal, quotes removed. Literal(SExp<'a>), /// Stream is out of tokens. Eof, } /// An iterator over a string that yields a stream of Tokens. /// /// Implementation note: it probably seems weird to store first, rest, AND string, since they should /// all be derivable from string. But see below. #[derive(Copy, Clone, Debug)] struct Tokens<'a> { /// The part of the string that still needs to be parsed string: &'a str, /// The first character to parse first: Option<char>, /// The rest of the string after the first character rest: &'a str, } impl<'a> Tokens<'a> { /// Initialize a token stream for a given string. fn new(string: &str) -> Tokens { let mut chars = string.chars(); match chars.next() { Some(ch) => Tokens { string, first: Some(ch), rest: chars.as_str(), }, None => Tokens { string, first: None, rest: string, }, } } /// Utility function to update information in the iterator. It might not be performant to keep /// rest cached, but there are times where we don't know exactly what string is (at least, not /// in a way that we can safely reconstruct it without allocating), so we keep both here. /// With some unsafe code we could probably get rid of one of them (and maybe first, too). fn update(&mut self, string: &'a str) { self.string = string; let mut chars = self.string.chars(); if let Some(ch) = chars.next() { self.first = Some(ch); self.rest = chars.as_str(); } else { self.first = None; }; } /// This is where the lexing happens. Note that it does not handle string escaping. fn next_token(&mut self) -> Result<Token<'a>, Error> { loop { match self.first { // List start Some('(') => { self.update(self.rest); return Ok(ListStart); } // List end Some(')') => { self.update(self.rest); return Ok(ListEnd); } // Quoted literal start Some('"') => { // Split the string at most once. This lets us get a // reference to the next piece of the string without having // to loop through the string again. let mut iter = self.rest.splitn(2, '"'); // The first time splitn is run it will never return None, so this is safe. let str = iter.next().unwrap(); match iter.next() { // Extract the interior of the string without allocating. If we want to // handle string escaping, we would have to allocate at some point though. Some(s) => { self.update(s); return Ok(Literal(Str(str))); } None => return Err(UnterminatedStringLiteral), } } // Plain old literal start Some(c) => { // Skip whitespace. This could probably be made more efficient. if c.is_whitespace() { self.update(self.rest); continue; } // Since we've exhausted all other possibilities, this must be a real literal. // Unlike the quoted case, it's not an error to encounter EOF before whitespace. let mut end_ch = None; let str = { let mut iter = self.string.splitn(2, \|ch: char\| { let term = ch == ')' \|\| ch == '('; if term { end_ch = Some(ch) } term \|\| ch.is_whitespace() }); // The first time splitn is run it will never return None, so this is safe. let str = iter.next().unwrap(); self.rest = iter.next().unwrap_or(""); str }; match end_ch { // self.string will be incorrect in the Some(_) case. The only reason it's // okay is because the next time next() is called in this case, we know it // will be '(' or ')', so it will never reach any code that actually looks // at self.string. In a real implementation this would be enforced by // visibility rules. Some(_) => self.first = end_ch, None => self.update(self.rest), } return Ok(Literal(parse_literal(str))); } None => return Ok(Eof), } } } } /// This is not the most efficient way to do this, because we end up going over numeric literals /// twice, but it avoids having to write our own number parsing logic. fn parse_literal(literal: &str) -> SExp { match literal.bytes().next() { Some(b'0'..=b'9') \| Some(b'-') => match f64::from_str(literal) { Ok(f) => F64(f), Err(_) => Str(literal), }, _ => Str(literal), } } /// Parse context, holds information required by the parser (and owns any allocations it makes) pub struct ParseContext<'a> { /// The string being parsed. Not required, but convenient. string: &'a str, /// Arena holding any allocations made by the parser. arena: Option<Arena<Vec<SExp<'a>>>>, /// Stored in the parse context so it can be reused once allocated. stack: Vec<Vec<SExp<'a>>>, } impl<'a> ParseContext<'a> { /// Create a new parse context from a given string pub fn new(string: &'a str) -> ParseContext<'a> { ParseContext { string, arena: None, stack: Vec::new(), } } } impl<'a> SExp<'a> { /// Serialize a SExp. fn encode<T: io::Write>(&self, writer: &mut T) -> Result<(), Error> { match self { F64(f) => { match f.classify() { // We don't want to identify NaN, Infinity, etc. as floats. FpCategory::Normal \| FpCategory::Zero => { write!(writer, "{}", f)?; Ok(()) } _ => Err(Error::NoReprForFloat), } } List(l) => { // Writing a list is very straightforward--write a left parenthesis, then // recursively call encode on each member, and then write a right parenthesis. The // only reason the logic is as long as it is is to make sure we don't write // unnecessary spaces between parentheses in the zero or one element cases. write!(writer, "(")?; let mut iter = l.iter(); if let Some(sexp) = iter.next() { sexp.encode(writer)?; for sexp in iter { write!(writer, " ")?; sexp.encode(writer)?; } } write!(writer, ")")?; Ok(()) } Str(s) => { write!(writer, "\"{}\"", s)?; Ok(()) } } } /// Deserialize a SExp. pub fn parse(ctx: &'a mut ParseContext<'a>) -> Result<SExp<'a>, Error> { ctx.arena = Some(Arena::new()); // Hopefully this unreachable! gets optimized out, because it should literally be // unreachable. let arena = match ctx.arena { Some(ref mut arena) => arena, None => unreachable!(), }; let ParseContext { string, ref mut stack, .. } = ctx; // Make sure the stack is cleared--we keep it in the context to avoid unnecessary // reallocation between parses (if you need to remember old parse information for a new // list, you can pass in a new context). stack.clear(); let mut tokens = Tokens::new(string); // First, we check the very first token to see if we're parsing a full list. It // simplifies parsing a lot in the subsequent code if we can assume that. let next = tokens.next_token(); let mut list = match next? { ListStart => Vec::new(), Literal(s) => { return if tokens.next_token()? == Eof { Ok(s) } else { Err(ExpectedEOF) }; } ListEnd => return Err(IncorrectCloseDelimiter), Eof => return Err(UnexpectedEOF), }; // We know we're in a list if we got this far. loop { let tok = tokens.next_token(); match tok? { ListStart => { // We push the previous context onto our stack when we start reading a new list. stack.push(list); list = Vec::new() } Literal(s) => list.push(s), // Plain old literal, push it onto the current list ListEnd => { match stack.pop() { // Pop the old context off the stack on list end. Some(mut l) => { // We allocate a slot for the current list in our parse context (needed // for safety) before pushing it onto its parent list. l.push(List(&arena.alloc(list))); // Now reset the current list to the parent list list = l; } // There was nothing on the stack, so we're at the end of the topmost list. // The check to make sure there are no more tokens is required for // correctness. None => { return match tokens.next_token()? { Eof => Ok(List(&arena.alloc(list))), _ => Err(ExpectedEOF), }; } } } // We encountered an EOF before the list ended--that's an error. Eof => return Err(UnexpectedEOF), } } } /// Convenience method for the common case where you just want to encode a SExp as a String. pub fn buffer_encode(&self) -> Result<String, Error> { let mut m = Vec::new(); self.encode(&mut m)?; // Because encode() only ever writes valid UTF-8, we can safely skip the secondary check we // normally have to do when converting from Vec<u8> to String. If we didn't know that the // buffer was already UTF-8, we'd want to call container_as_str() here. unsafe { Ok(String::from_utf8_unchecked(m)) } } } pub const SEXP_STRUCT: SExp<'static> = List(&[ List(&[Str("data"), Str("quoted data"), F64(123.), F64(4.5)]), List(&[ Str("data"), List(&[Str("!@#"), List(&[F64(4.5)]), Str("(more"), Str("data)")]), ]), ]); pub const SEXP_STRING_IN: &str = r#"((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))"#; and main.rs: use s_expressions::{ParseContext, SExp, SEXP_STRING_IN, SEXP_STRUCT}; fn main() { println!("{:?}", SEXP_STRUCT.buffer_encode()); let ctx = &mut ParseContext::new(SEXP_STRING_IN); println!("{:?}", SExp::parse(ctx)); } </syntaxhighlight>{{out}} <pre> Ok("((\"data\" \"quoted data\" 123 4.5) (\"data\" (\"!@#\" (4.5) \"(more\" \"data)\")))") Ok(List([List([Str("data"), Str("quoted data"), F64(123.0), F64(4.5)]), List([Str("data"), List([Str("!@#"), List([F64(4.5)]), Str("(more"), Str("data)")])])])) </pre> =={{header\|Scheme}}== Line 5,203 ⟶ 6,478: Using guile scheme 2.0.11 <~~lang~~syntaxhighlight lang="scheme">(define (sexpr-read port) (define (help port) (let ((char (read-char port))) Line 5,246 ⟶ 6,521: (format-sexpr (sexpr-read (open-input-string "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))")))</~~lang~~syntaxhighlight> Output: Line 5,271 ⟶ 6,546: =={{header\|Sidef}}== {{trans\|Perl}} <~~lang~~syntaxhighlight lang="ruby">func sexpr(txt) { txt.trim! Line 5,320 ⟶ 6,595: say s # dump structure say sexpr2txt(s) # convert back</~~lang~~syntaxhighlight> {{out}} <pre> Line 5,329 ⟶ 6,604: =={{header\|Tcl}}== Note that because Tcl doesn't expose a type system (well, not in a conventional sense) the parts of the parsed out data structure are tagged lists; the first element is one of “<tt>string</tt>”, “<tt>int</tt>”, “<tt>real</tt>” and “<tt>atom</tt>” to indicate a leaf token, or “<tt>list</tt>” to indicate a sublist. A “native” data structure could also be generated, but then that would turn things into lists that are not in the original. <~~lang~~syntaxhighlight lang="tcl">package require Tcl 8.5 proc fromSexp {str} { Line 5,377 ⟶ 6,652: return [lindex $content 0] } }</~~lang~~syntaxhighlight> Demonstrating with the sample data: <~~lang~~syntaxhighlight lang="tcl">set sample {((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))} set parsed [fromSexp $sample] puts "sample: $sample" puts "parsed: $parsed" puts "regen: [toSexp $parsed]"</~~lang~~syntaxhighlight> Output: <pre> Line 5,430 ⟶ 6,705: Code: <~~lang~~syntaxhighlight lang="txr">@(define float (f))@\ @(local (tok))@\ @(cases)@\ Line 5,473 ⟶ 6,748: expr: @(format nil "~s" e) junk: @junk @(end)</~~lang~~syntaxhighlight> Run: Line 5,508 ⟶ 6,783: Explanation of most confusing line: <~~lang~~syntaxhighlight lang="txr"> @/\s\(\s/@(coll :vars (e))@(expr e)@/\s/@(last))@(end)</~~lang~~syntaxhighlight> First, we match an open parenthesis that can be embedded in whitespace. Then we have a <code>@(coll)</code> construct which terminates with <code>@(end)</code>. This is a repetition construct for collecting zero or more items. The <code>:vars (e)</code> argument makes the collect strict: each repetition must bind the variable <code>e</code>. More importantly, in this case, if nothing is collected, then <code>e</code> gets bound to <code>nil</code> (the empty list). The collect construct does not look at context beyond itself. To terminate the collect at the closing parenthesis we use <code>@(last))</code>. The second closing parenthesis here is literal text to be matched, not TXR syntax. This special clause establishes the terminating context without which the collect will munge all input. When the last clause matches, whatever it matches is consumed and the collect ends. (There is a related <code>@(until)</code> clause which terminates the collect, but leaves its own match unconsumed.) =={{header\|Wren}}== {{trans\|Kotlin}} {{libheader\|Wren-pattern}} {{libheader\|Wren-fmt}} <syntaxhighlight lang="wren">import "./pattern" for Pattern import "./fmt" for Fmt var INDENT = 2 var parseSExpr = Fn.new { \|str\| var ipat = " \t\n\f\v\r()\"" var p = Pattern.new("""+0/s["+0^""\|(\|)\|"\|+1/I]""", Pattern.within, ipat) var t = p.findAll(str).map { \|m\| m.text }.toList if (t.count == 0) return null var o = false var c = 0 for (i in t.count-1..0) { var ti = t[i].trim() var nd = Num.fromString(ti) if (ti == "\"") return null if (ti == "(") { t[i] = "[" c = c + 1 } else if (ti == ")") { t[i] = "]" c = c - 1 } else if (nd) { var ni = Num.fromString(ti) t[i] = ni ? ni.toString : nd.toString } else if (ti.startsWith("\"")) { // escape embedded double quotes var temp = ti[1...-1] t[i] = "\"" + temp.replace("\"", "\\\"") + "\"" } if (i > 0 && t[i] != "]" && t[i - 1].trim() != "(") t.insert(i, ", ") if (c == 0) { if (!o) o = true else return null } } return (c != 0) ? null : t } var toSExpr = Fn.new { \|tokens\| for (i in 0...tokens.count) { if (tokens[i] == "[") { tokens[i] = "(" } else if (tokens[i] == "]") { tokens[i] = ")" } else if (tokens[i] == ", ") { tokens[i] = " " } else if (tokens[i].startsWith("\"")) { // unescape embedded quotes var temp = tokens[i][1...-1] tokens[i] = "\"" + temp.replace("\\\"", "\"") + "\"" } } return tokens.join() } var prettyPrint = Fn.new { \|tokens\| var level = 0 for (t in tokens) { var n if (t == ", " \|\| t == " ") { continue } else if (t == "[" \|\| t == "(") { n = level INDENT + 1 level = level + 1 } else if (t == "]" \|\| t == ")") { level = level - 1 n = level * INDENT + 1 } else { n = level * INDENT + t.count } Fmt.print("$*s", n, t) } } var str = """((data "quoted data" 123 4.5)""" + "\n" + """ (data (!@# (4.5) "(more" "data)")))""" var tokens = parseSExpr.call(str) if (!tokens) { System.print("Invalid s-expr!") } else { System.print("Native data structure:") System.print(tokens.join()) System.print("\nNative data structure (pretty print):") prettyPrint.call(tokens) System.print("\nRecovered S-Expression:") System.print(toSExpr.call(tokens)) System.print("\nRecovered S-Expression (pretty print):") prettyPrint.call(tokens) }</syntaxhighlight> {{out}} <pre> Native data structure: [[data, "quoted data", 123, 4.5], [data, [!@#, [4.5], "(more", "data)"]]] Native data structure (pretty print): [ [ data "quoted data" 123 4.5 ] [ data [ !@# [ 4.5 ] "(more" "data)" ] ] ] Recovered S-Expression: ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)"))) Recovered S-Expression (pretty print): ( ( data "quoted data" 123 4.5 ) ( data ( !@# ( 4.5 ) "(more" "data)" ) ) ) </pre> {{omit from\|Brlcad}}