S-expressions

S-Expressions are one convenient way to parse and store data.

Task

Write a simple reader and writer for S-Expressions that handles quoted and unquoted strings, integers and floats.

The reader should read a single but nested S-Expression from a string and store it in a suitable datastructure (list, array, etc).

Newlines and other whitespace may be ignored unless contained within a quoted string.

“()” inside quoted strings are not interpreted, but treated as part of the string.

Handling escaped quotes inside a string is optional; thus “(foo"bar)” maybe treated as a string “foo"bar”, or as an error.

For this, the reader need not recognize “\” for escaping, but should, in addition, recognize numbers if the language has appropriate datatypes.

Languages that support it may treat unquoted strings as symbols.

Note that with the exception of “()"” (“\” if escaping is supported) and whitespace there are no special characters. Anything else is allowed without quotes.

The reader should be able to read the following input

((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

and turn it into a native datastructure. (see the Pike, Python and Ruby implementations for examples of native data structures.)

The writer should be able to take the produced list and turn it into a new S-Expression. Strings that don't contain whitespace or parentheses () don't need to be quoted in the resulting S-Expression, but as a simplification, any string may be quoted.

Extra Credit

Let the writer produce pretty printed output with indenting and line-breaks.

11l

Translation of: Nim

T Token
   T.enum Kind
      INT
      FLOAT
      STRING
      IDENT
      LPAR
      RPAR
      END

   Kind kind
   String val

   F (kind, val = ‘’)
      .kind = kind
      .val = val

F lex(input_str)
   [Token] result
   V pos = 0

   F current()
      R I @pos < @input_str.len {@input_str[@pos]} E Char("\0")

   L pos < input_str.len
      V ch = input_str[pos]
      I ch == ‘(’
         pos++
         result.append(Token(Token.Kind.LPAR))
      E I ch == ‘)’
         pos++
         result.append(Token(Token.Kind.RPAR))
      E I ch C ‘0’..‘9’
         V num = ‘’
         V kind = Token.Kind.INT
         L current() C ‘0’..‘9’
            num ‘’= current()
            pos++
         I current() == ‘.’
            num ‘’= current()
            kind = FLOAT
            pos++
            L current() C ‘0’..‘9’
               num ‘’= current()
               pos++
         result.append(Token(kind, num))
      E I ch C (‘ ’, "\t", "\n", "\r")
         pos++
      E I ch == ‘"’
         V str = ‘’
         pos++
         L current() != ‘"’
            str ‘’= current()
            pos++
         pos++
         result.append(Token(Token.Kind.STRING, str))
      E
         V BannedChars = Set([‘ ’, "\t", ‘"’, ‘(’, ‘)’, ‘;’])
         V ident = ‘’
         L current() !C BannedChars
            ident ‘’= current()
            pos++
         result.append(Token(Token.Kind.IDENT, ident))

   result.append(Token(Token.Kind.END))
   R result

F indent(s, count)
   R (count * ‘ ’)‘’s.replace("\n", "\n"(count * ‘ ’))

T SExpr
   T.enum Kind
      INT
      FLOAT
      STRING
      IDENT
      LIST

   Kind kind
   String val
   [SExpr] children

   F (kind, val = ‘’)
      .kind = kind
      .val = val

   F to_str()
      I .kind C (SExpr.Kind.INT, SExpr.Kind.FLOAT, SExpr.Kind.IDENT)
         R .val
      E I .kind == STRING
         R ‘"’(.val)‘"’
      E I .kind == LIST
         V result = ‘(’
         L(i, ex) enumerate(.children)
            I ex.kind == LIST & ex.children.len > 1
               result ‘’= "\n"
               result ‘’= indent(ex.to_str(), 2)
            E
               I i > 0
                  result ‘’= ‘ ’
               result ‘’= ex.to_str()
         R result‘)’
      assert(0B)

V input_str = ‘
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
’
V tokens = lex(input_str)
V pos = 0

F current()
   R I :pos < :tokens.len {:tokens[:pos]} E Token(Token.Kind.END)

F parse() -> SExpr
   V token = current()
   :pos++
   I token.kind == INT
      R SExpr(SExpr.Kind.INT, token.val)
   E I token.kind == FLOAT
      R SExpr(SExpr.Kind.FLOAT, token.val)
   E I token.kind == STRING
      R SExpr(SExpr.Kind.STRING, token.val)
   E I token.kind == IDENT
      R SExpr(SExpr.Kind.IDENT, token.val)
   E I token.kind == LPAR
      V result = SExpr(SExpr.Kind.LIST)
      L current().kind !C (Token.Kind.RPAR, Token.Kind.END)
         result.children.append(parse())
      assert(current().kind != END, ‘Missing right paren ')'’)
      :pos++
      R result
   assert(0B)

print(parse().to_str())

Output:

(
  (data "quoted data" 123 4.5)
  (data
    (!@# (4.5) "(more" "data)")))

Ada

Uses Ada 2005 (Ada.Containers).

Specification of package S_Expr:

with Ada.Strings.Unbounded;
private with Ada.Containers.Indefinite_Vectors;

generic
   with procedure Print_Line(Indention: Natural; Line: String);
package S_Expr is

   function "-"(S: String) return Ada.Strings.Unbounded.Unbounded_String
     renames Ada.Strings.Unbounded.To_Unbounded_String;

   function "+"(U: Ada.Strings.Unbounded.Unbounded_String) return String
     renames Ada.Strings.Unbounded.To_String;

   type Empty_Data is tagged null record;
   subtype Data is Empty_Data'Class;
   procedure Print(This: Empty_Data; Indention: Natural);
   -- any object form class Data knows how to print itself
   -- objects of class data are either List of Data or Atomic
   -- atomic objects hold either an integer or a float or a string

   type List_Of_Data is new Empty_Data with private;
   overriding procedure Print(This: List_Of_Data; Indention: Natural);
   function First(This: List_Of_Data) return Data;
   function Rest(This: List_Of_Data) return List_Of_Data;
   function Empty(This: List_Of_Data) return Boolean;

   type Atomic is new Empty_Data with null record;

   type Str_Data is new Atomic with record
      Value: Ada.Strings.Unbounded.Unbounded_String;
      Quoted: Boolean := False;
   end record;
   overriding procedure Print(This: Str_Data; Indention: Natural);

   type Int_Data is new Atomic with record
      Value: Integer;
   end record;
   overriding procedure Print(This: Int_Data; Indention: Natural);

   type Flt_Data is new Atomic with record
      Value: Float;
   end record;
   overriding procedure Print(This: Flt_Data; Indention: Natural);

private

   package Vectors is new Ada.Containers.Indefinite_Vectors
     (Index_Type   => Positive,
      Element_Type => Data);

   type List_Of_Data is new Empty_Data with record
      Values: Vectors.Vector;
   end record;

end S_Expr;

The implementation of S_Expr:

with Ada.Integer_Text_IO, Ada.Float_Text_IO;

package body S_Expr is

   function First(This: List_Of_Data) return Data is
   begin
      return This.Values.First_Element;
   end First;

   function Rest(This: List_Of_Data) return List_Of_Data is
      List: List_Of_Data := This;
   begin
      List.Values.Delete_First;
      return List;
   end Rest;

   function Empty(This: List_Of_Data) return Boolean is
   begin
      return This.Values.Is_Empty;
   end Empty;

   procedure Print(This: Empty_Data; Indention: Natural) is
   begin
      Print_Line(Indention, "");
   end Print;

   procedure Print(This: Int_Data; Indention: Natural) is
   begin
      Print_Line(Indention, Integer'Image(This.Value));
   end Print;

   procedure Print(This: Flt_Data; Indention: Natural) is
   begin
      Print_Line(Indention, Float'Image(This.Value));
   end Print;

   procedure Print(This: Str_Data; Indention: Natural) is
   begin
      if This.Quoted then
         Print_Line(Indention, """" & (+This.Value) & """");
      else
         Print_Line(Indention, +This.Value);
      end if;
   end Print;

   procedure Print(This: List_Of_Data; Indention: Natural) is
   begin
      Print_Line(Indention, " ( ");
      for I in This.Values.First_Index .. This.Values.Last_Index loop
         This.Values.Element(I).Print(Indention + 1);
      end loop;
      Print_Line(Indention, " ) ");
   end Print;

end S_Expr;

Specification and Implementation of S_Expr.Parser (a child package of S_Expr):

generic -- child of a generic package must be a generic unit
package S_Expr.Parser is

   function Parse(Input: String) return List_Of_Data;
   -- the result of a parse process is always a list of expressions

end S_Expr.Parser;

with Ada.Integer_Text_IO, Ada.Float_Text_IO;

package body S_Expr.Parser is

   function Parse(Input: String) return List_Of_Data is

      procedure First_Token(S: String;
                            Start_Of_Token, End_Of_Token: out Positive) is
      begin
         Start_Of_Token := S'First;
         while Start_Of_Token <= S'Last and then S(Start_Of_Token) = ' ' loop
            Start_Of_Token := Start_Of_Token + 1; -- skip spaces
         end loop;
         if Start_Of_Token > S'Last then
            End_Of_Token := Start_Of_Token - 1;
            -- S(Start_Of_Token .. End_Of_Token) is the empty string
         elsif (S(Start_Of_Token) = '(') or (S(Start_Of_Token) = ')') then
            End_OF_Token := Start_Of_Token; -- the bracket is the token
         elsif S(Start_Of_Token) = '"' then -- " -- begin quoted string
            End_Of_Token := Start_Of_Token + 1;
            while S(End_Of_Token) /= '"' loop -- " -- search for closing bracket
               End_Of_Token := End_Of_Token + 1;
            end loop; -- raises Constraint_Error if closing bracket not found
         else -- Token is some kind of string
            End_Of_Token := Start_Of_Token;
            while End_Of_Token < S'Last and then
           ((S(End_Of_Token+1) /= ' ') and (S(End_Of_Token+1) /= '(') and
              (S(End_Of_Token+1) /= ')') and (S(End_Of_Token+1) /= '"')) loop  -- "
               End_Of_Token := End_Of_Token + 1;
            end loop;
         end if;
      end First_Token;

      procedure To_Int(Token: String; I: out Integer; Found: out Boolean) is
         Last: Positive;
      begin
         Ada.Integer_Text_IO.Get(Token, I, Last);
         Found := Last = Token'Last;
      exception
         when others => Found := False;
      end To_Int;

      procedure To_Flt(Token: String; F: out Float; Found: out Boolean) is
         Last: Positive;
      begin
         Ada.Float_Text_IO.Get(Token, F, Last);
         Found := Last = Token'Last;
      exception
         when others => Found := False;
      end To_Flt;

      function Quoted_String(Token: String) return Boolean is
      begin
         return
           Token'Length >= 2 and then Token(Token'First)='"' -- "
                             and then Token(Token'Last) ='"'; -- "
      end Quoted_String;

      Start, Stop: Positive;

      procedure Recursive_Parse(This: in out List_Of_Data) is

      Found: Boolean;

      Flt: Flt_Data;
      Int: Int_Data;
      Str: Str_Data;
      Lst: List_Of_Data;

      begin
         while Input(Start .. Stop) /= "" loop
            if Input(Start .. Stop) = ")" then
               return;
            elsif Input(Start .. Stop) = "(" then
               First_Token(Input(Stop+1 .. Input'Last), Start, Stop);
               Recursive_Parse(Lst);
               This.Values.Append(Lst);
            else
               To_Int(Input(Start .. Stop), Int.Value, Found);
               if Found then
                  This.Values.Append(Int);
               else
                  To_Flt(Input(Start .. Stop), Flt.Value, Found);
                  if Found then
                     This.Values.Append(Flt);
                  else
                     if Quoted_String(Input(Start .. Stop)) then
                        Str.Value  := -Input(Start+1 .. Stop-1);
                        Str.Quoted := True;
                     else
                        Str.Value  := -Input(Start .. Stop);
                        Str.Quoted := False;
                     end if;
                     This.Values.Append(Str);
                  end if;
               end if;
            end if;
            First_Token(Input(Stop+1 .. Input'Last), Start, Stop);
         end loop;
      end Recursive_Parse;

      L: List_Of_Data;

   begin
      First_Token(Input, Start, Stop);
      Recursive_Parse(L);
      return L;
   end Parse;

end S_Expr.Parser;

The main program Test_S_Expr:

with S_Expr.Parser, Ada.Text_IO;

procedure Test_S_Expr is

   procedure Put_Line(Indention: Natural; Line: String) is
   begin
      for I in 1 .. 3*Indention loop
         Ada.Text_IO.Put(" ");
      end loop;
      Ada.Text_IO.Put_Line(Line);
   end Put_Line;

   package S_Exp is new S_Expr(Put_Line);
   package S_Par is new S_Exp.Parser;

   Input: String := "((data ""quoted data"" 123 4.5)" &
                    "(data (!@# (4.5) ""(more"" ""data)"")))";
   Expression_List: S_Exp.List_Of_Data := S_Par.Parse(Input);

begin
   Expression_List.First.Print(Indention => 0);
   -- Parse will output a list of S-Expressions. We need the first Expression.
end Test_S_Expr;

Output:

 ( 
    ( 
      data
      "quoted data"
       123
       4.50000E+00
    ) 
    ( 
      data
      "quoted data"
       123
       4.50000E+00
      data
       ( 
         !@#
          ( 
             4.50000E+00
          ) 
         "(more"
         "data)"
       ) 
    ) 
 )

ALGOL 68

# S-Expressions #
CHAR nl = REPR 10;
# mode representing an S-expression #
MODE SEXPR = STRUCT( UNION( VOID, STRING, REF SEXPR ) element, REF SEXPR next );
# creates an initialises an SEXPR #
PROC new s expr = REF SEXPR: HEAP SEXPR := ( EMPTY, NIL );
# reports an error #
PROC error = ( STRING msg )VOID: print( ( "**** ", msg, newline ) );
# S-expression reader - reads and returns an S-expression from the string s #
PROC s reader = ( STRING s )REF SEXPR:
     BEGIN
        PROC at end      = BOOL: s pos > UPB s;
        PROC curr        = CHAR: IF at end THEN REPR 0 ELSE s[ s pos ] FI;
        PROC skip spaces = VOID: WHILE NOT at end AND ( curr = " " OR curr = nl ) DO s pos +:= 1 OD;
        PROC end of list = BOOL: at end OR curr = ")";
        INT s pos := LWB s;
        INT t pos;
        [ ( UPB s - LWB s ) + 1 ]CHAR token; # token text - large enough to hold the whole string if necessary #
        # adds the current character to the token #
        PROC add curr    = VOID: token[ t pos +:= 1 ] := curr;
        # get an s expression element from s #
        PROC get element = REF SEXPR:
             BEGIN
                REF SEXPR result = new s expr;
                skip spaces;
                # get token text #
                IF   at end THEN
                    # no element #
                    element OF result := EMPTY
                ELIF curr = "("  THEN
                    s pos +:= 1;
                    skip spaces;
                    IF NOT end of list
                    THEN
                        REF SEXPR nested expression = get element;
                        REF SEXPR element pos      := nested expression;
                        element OF result          := nested expression;
                        skip spaces;
                        WHILE NOT end of list
                        DO
                            element pos := next OF element pos := get element;
                            skip spaces
                        OD
                    FI;
                    IF curr = ")" THEN
                        s pos +:= 1
                    ELSE
                        error( "Missing "")""" )
                    FI
                ELIF curr = ")" THEN
                    s pos +:= 1;
                    error( "Unexpected "")""" );
                    element OF result := EMPTY
                ELSE
                    # quoted or unquoted string #
                    t pos := LWB token - 1;
                    IF curr /= """" THEN
                        # unquoted string #
                        WHILE add curr;
                              s pos +:= 1;
                              NOT at end AND curr /= " " AND curr /= "("
                                         AND curr /= ")" AND curr /= """"
                                         AND curr /= nl
                        DO SKIP OD
                    ELSE
                        # quoted string #
                        WHILE add curr;
                              s pos +:= 1;
                              NOT at end AND curr /= """"
                        DO SKIP OD;
                        IF curr /= """" THEN
                            # missing string quote #
                            error( "Unterminated string: <<" + token[ : t pos ] + ">>" )
                        ELSE
                            # have the closing quote #
                            add curr;
                            s pos +:= 1
                        FI
                    FI;
                    element OF result := token[ : t pos ]
                FI;
                result
             END # get element # ;

        REF SEXPR s expr = get element;
        skip spaces;
        IF NOT at end THEN
            # extraneuos text after the expression #
            error( "Unexpected text at end of expression: " + s[ s pos : ] )
        FI;

        s expr
     END # s reader # ;
# prints an S expression #
PROC s writer = ( REF SEXPR s expr )VOID:
     BEGIN
        # prints an S expression with a suitable indent #
        PROC print indented s expression = ( REF SEXPR s expr, INT indent )VOID:
             BEGIN
                REF SEXPR s pos := s expr;
                WHILE REF SEXPR( s pos ) ISNT REF SEXPR( NIL ) DO
                    FOR i TO indent DO print( ( " " ) ) OD;
                    CASE element OF s pos
                    IN  (VOID       ): print( ( "()", newline ) )
                     ,  (STRING    s): print( ( s,    newline ) )
                     ,  (REF SEXPR e): BEGIN
                                          print( ( "(", newline ) );
                                          print indented s expression( e, indent + 4 );
                                          FOR i TO indent DO print( ( " " ) ) OD;
                                          print( ( ")", newline ) )
                                      END
                    OUT
                        error( "Unexpected S expression element" )
                    ESAC;
                    s pos := next OF s pos
                OD
             END # print indented s expression # ;

        print indented s expression( s expr, 0 )
     END # s writer # ;
# test the eader and writer with the example from the task #
s writer( s reader( "((data ""quoted data"" 123 4.5)"
                  + nl
                  + " (data (!@# (4.5) ""(more"" ""data)"")))"
                  + nl
                  )
        )

Output:

(
    (
        data
        "quoted data"
        123
        4.5
    )
    (
        data
        (
            !@#
            (
                4.5
            )
            "(more"
            "data)"
        )
    )
)

APL

Works with: Dyalog APL

These are two functions, sexp parses an S-expression, and pretty prettyprints a parsed expression.

The S-expression is represented as a nested APL vector, where every item is a tuple consisting of a number representing the type, and the value of the item.

The simple types are string (1), number (2), and atom (3); in these cases the value is a character vector (for string and atom) or a number. Type 0 is a list, and the value is a vector of items.

As an example, this is how a list may be defined in APL itself:

      pretty⊂(0((3 'Hi')(3 'Bye')(1 'A string')(0((3 'Depth')(2 42)))))
(                                           
 Hi                                         
 Bye                                        
 "A string"                                 
 (                                          
  Depth                                     
  42                                        
 )                                          
)

The following is the result of parsing and then prettyprinting the given input:

      ⍴r
65
      r
((data "quoted data" 123 4.5)                                    
(data (!@# (4.5) "(more" "data)")))
      sexp r
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│┌─┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────┐│
││0│┌──────────────────────────────────────────────┬──────────────────────────────────────────────────────────────┐││
││ ││┌─┬──────────────────────────────────────────┐│┌─┬──────────────────────────────────────────────────────────┐│││
││ │││0│┌────────┬───────────────┬───────┬───────┐│││0│┌────────┬───────────────────────────────────────────────┐││││
││ │││ ││┌─┬────┐│┌─┬───────────┐│┌─┬───┐│┌─┬───┐││││ ││┌─┬────┐│┌─┬───────────────────────────────────────────┐│││││
││ │││ │││3│data│││1│quoted data│││2│123│││2│4.5│││││ │││3│data│││0│┌───────┬─────────────┬─────────┬─────────┐││││││
││ │││ ││└─┴────┘│└─┴───────────┘│└─┴───┘│└─┴───┘││││ ││└─┴────┘││ ││┌─┬───┐│┌─┬─────────┐│┌─┬─────┐│┌─┬─────┐│││││││
││ │││ │└────────┴───────────────┴───────┴───────┘│││ ││        ││ │││3│!@#│││0│┌───────┐│││1│(more│││1│data)││││││││
││ ││└─┴──────────────────────────────────────────┘││ ││        ││ ││└─┴───┘││ ││┌─┬───┐│││└─┴─────┘│└─┴─────┘│││││││
││ ││                                              ││ ││        ││ ││       ││ │││2│4.5││││         │         │││││││
││ ││                                              ││ ││        ││ ││       ││ ││└─┴───┘│││         │         │││││││
││ ││                                              ││ ││        ││ ││       ││ │└───────┘││         │         │││││││
││ ││                                              ││ ││        ││ ││       │└─┴─────────┘│         │         │││││││
││ ││                                              ││ ││        ││ │└───────┴─────────────┴─────────┴─────────┘││││││
││ ││                                              ││ ││        │└─┴───────────────────────────────────────────┘│││││
││ ││                                              ││ │└────────┴───────────────────────────────────────────────┘││││
││ ││                                              │└─┴──────────────────────────────────────────────────────────┘│││
││ │└──────────────────────────────────────────────┴──────────────────────────────────────────────────────────────┘││
│└─┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────┘│
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘                              
      pretty sexp r
(                                                                                                                
 (                                                                                                               
  data                                                                                                           
  "quoted data"                                                                                                  
  123                                                                                                            
  4.5                                                                                                            
 )                                                                                                               
 (                                                                                                               
  data                                                                                                           
  (                                                                                                              
   !@#                                                                                                           
   (                                                                                                             
    4.5                                                                                                          
   )                                                                                                             
   "(more"                                                                                                       
   "data)"                                                                                                       
  )                                                                                                              
 )                                                                                                               
)

sexp←{
    wspace←' ',⎕TC                  ⍝ whitespace is space, tab, cr, lf

    ⍝ turn string into number if possible
    num←{
        0=≢⍵:⍬                         ⍝ empty = nope
        (⊃⍵)∊'-¯':-∇1↓⍵                ⍝ negative?
        (1≥⍵+.='.')∧⍵∧.∊⊂⎕D,'.':⍎⍵     ⍝ number: all digits and 0 or 1 points
        ⍬                              ⍝ otherwise, nope.
    }

    ⍝ tokenize (0=brackets, 1=strings, 2=atoms)
    tok←{
        d←(~∧\⍵∊wspace)/⍵              ⍝ ignore leading whitespace
        d≡'':d                         ⍝ empty input = empty output
        s←1↑d ⋄ r←1↓d                  ⍝ start and rest
        s∊'()':(⊂0,⊂s),∇r              ⍝ brackets: just the bracket
        sb←∧\~('"'=r)∧'\'≠¯1⌽r         ⍝ strings: up to first " not preceded by \
        sd←(1⌽sd≠'"')/sd←sb/r          ⍝ without escape characters
        s='"':(⊂1,⊂sd),∇1↓(~sb)/r
        atm←∧\~d∊wspace,'()"'          ⍝ atom: up to next whitespace, () or "
        (⊂2,⊂atm/d),∇(~atm)/d
    }

    ⍝ build structure from tokens
    build←{
        ⍺←⍬
        0=≢⍵:⍺ ⍬                       ⍝ empty input = done
        typ tok←⊃⍵                     ⍝ current token and type
        rst←1↓⍵                        ⍝ rest of tokens
        tok≡,'(':(⍺,⊂0 l)∇r⊣l r←∇rst   ⍝ open bracket: go down a level
        tok≡,')':⍺ rst                 ⍝ close bracket: go up a level
        typ=1:(⍺,⊂1 tok)∇rst           ⍝ string: type 1
        0≠≢n←num tok:(⍺,⊂2(,n))∇rst    ⍝ number: type 2
        (⍺,⊂3 tok)∇rst                 ⍝ symbol: type 3
    }

    ⍝ check that a string was passed in
    (''≢0↑⍵)∨1≠⍴⍴⍵:⎕SIGNAL⊂('EN'11)('Message' 'Input must be a char vector')

    ⍝ check that all strings are closed
    quot←('"'=⍵)∧'\'≠¯1⌽⍵
    0≠2|+/quot:⎕SIGNAL⊂('EN'11)('Message' 'Open string')

    ⍝ check that all brackets match (except those in strings)
    nest←+\+⌿1 ¯1×[1]'()'∘.=(~2|+\quot)/⍵
    (0≠¯1↑nest)∨0<.∨nest:⎕SIGNAL⊂('EN'11)('Message' 'Mismatched parentheses')

    ⊃build tok ⍵
}

pretty←{
    ⍝ Prettyprinter for parsed S-expressions
    NL←⎕tc[2]
    ∊∇{
        typ itm←⍵
        typ=3:itm,NL                          ⍝ Atom
        typ=2:(⍕itm),NL                       ⍝ Number
        typ=1:('"',('"'⎕R'\\"'⊢itm),'"'),NL   ⍝ String
        typ=0:'(',NL,('^'⎕R' '⊢⍺⍺ itm),')',NL ⍝ List
    }¨⍵
}

Arturo

code: {
    ((data "quoted data" 123 4.5)
     (data (!@# (4.5) "(more" "data)")))
}

s: first to :block code
inspect.muted s
print as.code s

Output:

[ :inline
	[ :inline
		data :word
		quoted data :string
		123 :integer
		4.5 :floating
	]
	[ :inline
		data :word
		[ :inline
			! :symbol
			@ :symbol
			# :symbol
			[ :inline
				4.5 :floating
			]
			(more :string
			data) :string
		]
	]
]
((data "quoted data" 123 4.5) (data (! @ # (4.5) "(more" "data)")))

AutoHotkey

S_Expressions(Str){
	Str := RegExReplace(Str, "s)(?<![\\])"".*?[^\\]""(*SKIP)(*F)|((?<![\\])[)(]|\s)", "`n$0`n")
	Str := RegExReplace(Str, "`am)^\s*\v+")	,	Cnt := 0
	loop, parse, Str, `n, `r
	{
		Cnt := A_LoopField=")" ? Cnt-1 : Cnt
		Res .= tabs(Cnt) A_LoopField "`r`n"
		Cnt := A_LoopField="(" ? Cnt+1 : Cnt
	}
	return Res
}
tabs(n){
	loop, % n
		Res .= "`t"
	return Res
}

Examples:

Str =
(
((data da\(\)ta "quot\\ed data" 123 4.5)
 ("data" (!@# (4.5) "(mo\"re" "data)")))
)
MsgBox, 262144, , % S_Expressions(Str)

Output:

(
	(
		data
		da\(\)ta
		"quot\\ed data"
		123
		4.5
	)
	(
		"data"
		(
			!@#
			(
				4.5
			)
			"(mo\"re"
			"data)"
		)
	)
)

C

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

enum { S_NONE, S_LIST, S_STRING, S_SYMBOL };

typedef struct {
	int type;
	size_t len;
	void *buf;
} s_expr, *expr;

void whine(const char *s)
{
	fprintf(stderr, "parse error before ==>%.10s\n", s);
}

expr parse_string(const char *s, char **e)
{
	expr ex = calloc(sizeof(s_expr), 1);
	char buf[256] = {0};
	int i = 0;

	while (*s) {
		if (i >= 256) {
			fprintf(stderr, "string too long:\n");
			whine(s);
			goto fail;
		}
		switch (*s) {
		case '\\':
			switch (*++s) {
			case '\\':
			case '"':	buf[i++] = *s++;
					continue;

			default:	whine(s);
					goto fail;
			}
		case '"':	goto success;
		default:	buf[i++] = *s++;
		}
	}
fail:
	free(ex);
	return 0;

success:
	*(const char **)e = s + 1;
	ex->type = S_STRING;
	ex->buf = strdup(buf);
	ex->len = strlen(buf);
	return ex;
}

expr parse_symbol(const char *s, char **e)
{
	expr ex = calloc(sizeof(s_expr), 1);
	char buf[256] = {0};
	int i = 0;

	while (*s) {
		if (i >= 256) {
			fprintf(stderr, "symbol too long:\n");
			whine(s);
			goto fail;
		}
		if (isspace(*s)) goto success;
		if (*s == ')' || *s == '(') {
			s--;
			goto success;
		}

		switch (*s) {
		case '\\':
			switch (*++s) {
			case '\\': case '"': case '(': case ')':
					buf[i++] = *s++;
					continue;
			default:	whine(s);
					goto fail;
			}
		case '"':	whine(s);
				goto success;
		default:	buf[i++] = *s++;
		}
	}
fail:
	free(ex);
	return 0;

success:
	*(const char **)e = s + 1;
	ex->type = S_SYMBOL;
	ex->buf = strdup(buf);
	ex->len = strlen(buf);
	return ex;
}

void append(expr list, expr ele)
{
	list->buf = realloc(list->buf, sizeof(expr) * ++list->len);
	((expr*)(list->buf))[list->len - 1] = ele;
}

expr parse_list(const char *s, char **e)
{
	expr ex = calloc(sizeof(s_expr), 1), chld;
	char *next;

	ex->len = 0;

	while (*s) {
		if (isspace(*s)) {
			s++;
			continue;
		}

		switch (*s) {
		case '"':
			chld = parse_string(s+1, &next);
			if (!chld) goto fail;
			append(ex, chld);
			s = next;
			continue;
		case '(':
			chld = parse_list(s+1, &next);
			if (!chld) goto fail;
			append(ex, chld);
			s = next;
			continue;
		case ')':
			goto success;

		default:
			chld = parse_symbol(s, &next);
			if (!chld) goto fail;
			append(ex, chld);
			s = next;
			continue;
		}
	}

fail:
	whine(s);
	free(ex);
	return 0;

success:
	*(const char **)e = s+1;
	ex->type = S_LIST;
	return ex;
}

expr parse_term(const char *s, char **e)
{
	while (*s) {
		if (isspace(*s)) {
			s++;
			continue;
		}
		switch(*s) {
		case '(':
			return parse_list(s+1, e);
		case '"':
			return parse_string(s+1, e);
		default:
			return parse_symbol(s+1, e);
		}
	}
	return 0;
}

void print_expr(expr e, int depth)
{
#define sep() for(i = 0; i < depth; i++) printf("    ")
	int i;
	if (!e) return;


	switch(e->type) {
	case S_LIST:
		sep();
		puts("(");
		for (i = 0; i < e->len; i++)
			print_expr(((expr*)e->buf)[i], depth + 1);
		sep();
		puts(")");
		return;
	case S_SYMBOL:
	case S_STRING:
		sep();
		if (e->type == S_STRING) putchar('"');
		for (i = 0; i < e->len; i++) {
			switch(((char*)e->buf)[i]) {
			case '"':
			case '\\':
				putchar('\\');
				break;
			case ')': case '(':
				if (e->type == S_SYMBOL)
					putchar('\\');
			}

			putchar(((char*)e->buf)[i]);
		}
		if (e->type == S_STRING) putchar('"');
		putchar('\n');
		return;
	}
}

int main()
{
	char *next;
	const char *in = "((data da\\(\\)ta \"quot\\\\ed data\" 123 4.5)\n"
			" (\"data\" (!@# (4.5) \"(mo\\\"re\" \"data)\")))";

	expr x = parse_term(in, &next);

	printf("input is:\n%s\n", in);
	printf("parsed as:\n");
	print_expr(x, 0);
	return 0;
}

Output:

input is:
((data da\(\)ta "quot\\ed data" 123 4.5)
 ("data" (!@# (4.5) "(mo\"re" "data)")))
parsed as:
(
    (
        data
        da\(\)ta
        "quot\\ed data"
        123
        4.5
    )
    (
        "data"
        (
            !@#
            (
                4.5
            )
            "(mo\"re"
            "data)"
        )
    )
)

C#

Implementation of S-expression serializer & deserializer in C# 6.0 language.

Git repository with code and tests can be found here: https://github.com/ichensky/SExpression/tree/rosettacode

using System;
using System.Collections.Generic;
using System.Text;

  public class SNode
    {
        private List<SNode> _items;
        public string Name { get; set; }
        public IReadOnlyCollection<SNode> Items { get { return _items.AsReadOnly(); } }
        public SNode()
        {
            this._items = new List<SNode>();
        }
        public SNode(string name):this()
        {
            this.Name=name;
        }
        public void AddNode(SNode node)
        {
            this._items.Add(node);
        }      
    }

    public class SNodeFull : SNode
    {
        private bool _isLeaf;
        public bool IsLeaf { get => _isLeaf; }
        public SNodeFull(bool isLeaf) : base()
        {
            this._isLeaf = isLeaf;
        }

        public SNodeFull(string name, bool isLeaf) : base(name)
        {
            this._isLeaf = isLeaf;
        }

        public SNodeFull RootNode { get; set; }

        public void AddNode(SNodeFull node)
        {
            base.AddNode(node);
            node.RootNode = this;
        }
    }

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace SExpression
{
    public partial class SExpression
    {
        public const string ErrorStrNotValidFormat = "Not valid format.";
    }
    public partial class SExpression : ISExpression
    {
        public String Serialize(SNode root)
        {
            if (root == null)
            {
                throw new ArgumentNullException();
            }
            var sb = new StringBuilder();
            Serialize(root, sb);
            return sb.ToString();
        }
        private void Serialize(SNode node, StringBuilder sb)
        {
            sb.Append('(');

            if (node.Items.Count > 0)
            {
                int x = 0;
                foreach (var item in node.Items)
                {
                    if (x>0)
                    {
                    sb.Append(' ');
                    }
                    else
                    {
                        x++;
                    }
                    if (item.Items.Count > 0)
                    {
                        Serialize(item, sb);
                    }
                    else
                    {
                        SerializeItem(item, sb);
                    }
                }
            }

            sb.Append(')');
        }
        private void SerializeItem(SNode node, StringBuilder sb)
        {
            if (node.Name == null)
            {
                sb.Append("()");
                return;
            }
            node.Name = node.Name.Replace("\"", "\\\"");
            if (node.Name.IndexOfAny(new char[] { ' ', '"', '(', ')' }) != -1 || node.Name == string.Empty)
            {
                sb.Append('"').Append(node.Name).Append('"');
                return;
            }
            sb.Append(node.Name);
        }
    }
    public partial class SExpression
    {
        public SNode Deserialize(string st)
        {
            if (st==null)
            {
                return null;
            }
            st = st.Trim();
            if (string.IsNullOrEmpty(st))
            {
                return null;
            }

            var begin = st.IndexOf('(');
            if (begin != 0)
            {
                throw new Exception();
            }
            var end = st.LastIndexOf(')');
            if (end != st.Length - 1)
            {
                throw new Exception(ErrorStrNotValidFormat);
            }
            st = st.Remove(st.Length-1).Remove(0, 1).ToString();
            var node = new SNodeFull(false);
            Deserialize(ref st, node);
            return node;
        }

        private void Deserialize(ref string st, SNodeFull root)
        {
            st = st.Trim();
            if (string.IsNullOrEmpty(st))
            {
                return;
            }

            SNodeFull node = null;
            SNodeFull r = root;
            do
            {
                while (st[0] == ')')
                {
                    st = st.Remove(0, 1).Trim();
                    if (st.Length==0)
                    {
                        return;
                    }
                    r = root.RootNode;
                    if (r==null)
                    {
                        throw new Exception(ErrorStrNotValidFormat);
                    }
                }
                node = DeserializeItem(ref st);
                st = st.Trim();

                r.AddNode(node);
               
                if (!node.IsLeaf)
                {
                    Deserialize(ref st,node);
                }
            }
            while (st.Length > 0);
            
        }

        private SNodeFull DeserializeItem(ref string st)
        {
            if (st[0] == '(')
            {
                st = st.Remove(0, 1);
                return new SNodeFull(false);
            }

            var x = 0;
            var esc = 0;
            for (int i = 0; i < st.Length; i++)
            {
                if (st[i] == '"')
                {
                    if (esc == 0)
                    {
                        esc = 1;
                    }
                    else if(esc == 1 && (i> 0 && st[i - 1] == '\\'))
                    {
                        throw new Exception(ErrorStrNotValidFormat);
                    }
                    else
                    {
                        esc = 2;
                        break;
                    }
                }
                else if (esc==0 && " ()".Contains(st[i]))
                {
                    break;
                }

                x++;
            }
            if (esc == 1)
            {
                throw new Exception(ErrorStrNotValidFormat);
            }

            var head = esc==0? st.Substring(0, x): st.Substring(1,x-1);
            st = st.Remove(0, esc ==0 ? x: x + 2);
            return new SNodeFull(head, true);
        }
    }
}

using System;
using System.Collections.Generic;

namespace SExpression.Test
{
    class Program
    {
        static void Main(string[] args)
        {
            var str = 
@"((data ""quoted data"" 123 4.5)
(data(!@# (4.5) ""(more"" ""data)"")))";

            var se = new SExpression();
            var node = se.Deserialize(str);
            var result = se.Serialize(node);
            Console.WriteLine(result);
        }
    }
}

Output:

((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

C++

The function s_expr::parse parses an object from an input stream. An object may be a quoted string, symbol (i.e. unquoted string), a number (integer or float) or a list of objects. The "\" character is used for escaping special characters. C++ 17 is required. As it stands, you can't really do anything with the parsed object apart from writing it out, which fulfils this task's requirements. With some more work this code might actually be useful.

#include <cctype>
#include <iomanip>
#include <iostream>
#include <list>
#include <memory>
#include <sstream>
#include <string>
#include <variant>

namespace s_expr {

enum class token_type { none, left_paren, right_paren, symbol, string, number };
enum class char_type { left_paren, right_paren, quote, escape, space, other };
enum class parse_state { init, quote, symbol };

struct token {
    token_type type = token_type::none;
    std::variant<std::string, double> data;
};

char_type get_char_type(char ch) {
    switch (ch) {
    case '(':
        return char_type::left_paren;
    case ')':
        return char_type::right_paren;
    case '"':
        return char_type::quote;
    case '\\':
        return char_type::escape;
    }
    if (isspace(static_cast<unsigned char>(ch)))
        return char_type::space;
    return char_type::other;
}

bool parse_number(const std::string& str, token& tok) {
    try {
        size_t pos = 0;
        double num = std::stod(str, &pos);
        if (pos == str.size()) {
            tok.type = token_type::number;
            tok.data = num;
            return true;
        }
    } catch (const std::exception&) {
    }
    return false;
}

bool get_token(std::istream& in, token& tok) {
    char ch;
    parse_state state = parse_state::init;
    bool escape = false;
    std::string str;
    token_type type = token_type::none;
    while (in.get(ch)) {
        char_type ctype = get_char_type(ch);
        if (escape) {
            ctype = char_type::other;
            escape = false;
        } else if (ctype == char_type::escape) {
            escape = true;
            continue;
        }
        if (state == parse_state::quote) {
            if (ctype == char_type::quote) {
                type = token_type::string;
                break;
            }
            else
                str += ch;
        } else if (state == parse_state::symbol) {
            if (ctype == char_type::space)
                break;
            if (ctype != char_type::other) {
                in.putback(ch);
                break;
            }
            str += ch;
        } else if (ctype == char_type::quote) {
            state = parse_state::quote;
        } else if (ctype == char_type::other) {
            state = parse_state::symbol;
            type = token_type::symbol;
            str = ch;
        } else if (ctype == char_type::left_paren) {
            type = token_type::left_paren;
            break;
        } else if (ctype == char_type::right_paren) {
            type = token_type::right_paren;
            break;
        }
    }
    if (type == token_type::none) {
        if (state == parse_state::quote)
            throw std::runtime_error("syntax error: missing quote");
        return false;
    }
    tok.type = type;
    if (type == token_type::string)
        tok.data = str;
    else if (type == token_type::symbol) {
        if (!parse_number(str, tok))
            tok.data = str;
    }
    return true;
}

void indent(std::ostream& out, int level) {
    for (int i = 0; i < level; ++i)
        out << "   ";
}

class object {
public:
    virtual ~object() {}
    virtual void write(std::ostream&) const = 0;
    virtual void write_indented(std::ostream& out, int level) const {
        indent(out, level);
        write(out);
    }
};

class string : public object {
public:
    explicit string(const std::string& str) : string_(str) {}
    void write(std::ostream& out) const { out << std::quoted(string_); }
private:
    std::string string_;
};

class symbol : public object {
public:
    explicit symbol(const std::string& str) : string_(str) {}
    void write(std::ostream& out) const {
        for (char ch : string_) {
            if (get_char_type(ch) != char_type::other)
                out << '\\';
            out << ch;
        }
    }
private:
    std::string string_;
};

class number : public object {
public:
    explicit number(double num) : number_(num) {}
    void write(std::ostream& out) const { out << number_; }
private:
    double number_;
};

class list : public object {
public:
    void write(std::ostream& out) const;
    void write_indented(std::ostream&, int) const;
    void append(const std::shared_ptr<object>& ptr) {
        list_.push_back(ptr);
    }
private:
    std::list<std::shared_ptr<object>> list_;
};

void list::write(std::ostream& out) const {
    out << "(";
    if (!list_.empty()) {
        auto i = list_.begin();
        (*i)->write(out);
        while (++i != list_.end()) {
            out << ' ';
            (*i)->write(out);
        }
    }
    out << ")";
}

void list::write_indented(std::ostream& out, int level) const {
    indent(out, level);
    out << "(\n";
    if (!list_.empty()) {
        for (auto i = list_.begin(); i != list_.end(); ++i) {
            (*i)->write_indented(out, level + 1);
            out << '\n';
        }
    }
    indent(out, level);
    out << ")";
}

class tokenizer {
public:
    tokenizer(std::istream& in) : in_(in) {}
    bool next() {
        if (putback_) {
            putback_ = false;
            return true;
        }
        return get_token(in_, current_);
    }
    const token& current() const {
        return current_;
    }
    void putback() {
        putback_ = true;
    }
private:
    std::istream& in_;
    bool putback_ = false;
    token current_;
};

std::shared_ptr<object> parse(tokenizer&);

std::shared_ptr<list> parse_list(tokenizer& tok) {
    std::shared_ptr<list> lst = std::make_shared<list>();
    while (tok.next()) {
        if (tok.current().type == token_type::right_paren)
            return lst;
        else
            tok.putback();
        lst->append(parse(tok));
    }
    throw std::runtime_error("syntax error: unclosed list");
}

std::shared_ptr<object> parse(tokenizer& tokenizer) {
    if (!tokenizer.next())
        return nullptr;
    const token& tok = tokenizer.current();
    switch (tok.type) {
    case token_type::string:
        return std::make_shared<string>(std::get<std::string>(tok.data));
    case token_type::symbol:
        return std::make_shared<symbol>(std::get<std::string>(tok.data));
    case token_type::number:
        return std::make_shared<number>(std::get<double>(tok.data));
    case token_type::left_paren:
        return parse_list(tokenizer);
    default:
        break;
    }
    throw std::runtime_error("syntax error: unexpected token");
}

} // namespace s_expr

void parse_string(const std::string& str) {
    std::istringstream in(str);
    s_expr::tokenizer tokenizer(in);
    auto exp = s_expr::parse(tokenizer);
    if (exp != nullptr) {
        exp->write_indented(std::cout, 0);
        std::cout << '\n';
    }
}

int main(int argc, char** argv) {
    std::string test_string =
        "((data \"quoted data\" 123 4.5)\n"
        " (data (!@# (4.5) \"(more\" \"data)\")))";
    if (argc == 2)
        test_string = argv[1];
    try {
        parse_string(test_string);
    } catch (const std::exception& ex) {
        std::cerr << ex.what() << '\n';
    }
    return 0;
}

Output:

(
   (
      data
      "quoted data"
      123
      4.5
   )
   (
      data
      (
         !@#
         (
            4.5
         )
         "(more"
         "data)"
      )
   )
)

Ceylon

class Symbol(symbol) {
    shared String symbol;
    string => symbol;
}

abstract class Token()
        of  DataToken | leftParen | rightParen {}

abstract class DataToken(data)
        of StringToken | IntegerToken | FloatToken | SymbolToken
        extends Token() {

    shared String|Integer|Float|Symbol data;
    string => data.string;
}

class StringToken(String data) extends DataToken(data) {}
class IntegerToken(Integer data) extends DataToken(data) {}
class FloatToken(Float data) extends DataToken(data) {}
class SymbolToken(Symbol data) extends DataToken(data) {}

object leftParen extends Token() {
    string => "(";
}
object rightParen extends Token() {
    string => ")";
}

class Tokens(String input) satisfies {Token*} {
    shared actual Iterator<Token> iterator() => object satisfies Iterator<Token> {

        variable value index = 0;

        shared actual Token|Finished next() {

            while(exists nextChar = input[index], nextChar.whitespace) {
                index++;
            }

            if(index >= input.size) {
                return finished;
            }

            assert(exists char = input[index]);

            if(char == '(') {
                index++;
                return leftParen;
            }
            if(char == ')') {
                index++;
                return rightParen;
            }

            if(char == '"') {
                value builder = StringBuilder();
                while(exists nextChar = input[++index]) {
                    if(nextChar == '"') {
                        index++;
                        break;
                    }
                    if(nextChar == '\\') {
                        if(exists nextNextChar = input[++index]) {
                            switch(nextNextChar)
                            case('\\') {
                                builder.append("\\");
                            }
                            case('t') {
                                builder.append("\t");
                            }
                            case('n') {
                                builder.append("\n");
                            }
                            case('"') {
                                builder.append("\"");
                            }
                            else {
                                throw Exception("unknown escaped character");
                            }
                        } else {
                            throw Exception("unclosed string");
                        }
                    } else {
                        builder.appendCharacter(nextChar);
                    }
                }
                return StringToken(builder.string);
            }

            value builder = StringBuilder();
            while(exists nextChar = input[index], !nextChar.whitespace && nextChar != '(' && nextChar != ')') {
                builder.appendCharacter(nextChar);
                index++;
            }
            value string = builder.string;
            if(is Integer int = Integer.parse(string)) {
                return IntegerToken(int);
            } else if(is Float float = Float.parse(string)) {
                return FloatToken(float);
            } else {
                return SymbolToken(Symbol(string));
            }
        }
    };
}

abstract class Node() of Atom | Group {}

class Atom(data) extends Node() {
    shared String|Integer|Float|Symbol data;
    string => data.string;
}
class Group() extends Node() satisfies {Node*} {
    shared variable Node[] nodes = [];
    string => nodes.string;
    shared actual Iterator<Node> iterator() => nodes.iterator();

}

Node buildTree(Tokens tokens) {

    [Group, Integer] recursivelyBuild(Token[] tokens, variable Integer index = 0) {
        value result = Group();
        while(exists token = tokens[index]) {
            switch (token)
            case (leftParen) {
                value [newNode, newIndex] = recursivelyBuild(tokens, index + 1);
                index = newIndex;
                result.nodes = result.nodes.append([newNode]);
            }
            case (rightParen) {
                return [result, index];
            }
            else {
                result.nodes = result.nodes.append([Atom(token.data)]);
            }
            index++;
        }
        return [result, index];
    }

    value root = recursivelyBuild(tokens.sequence())[0]; 
    return root.first else Group();
}

void prettyPrint(Node node, Integer indentation = 0) {

    void paddedPrint(String s) => print(" ".repeat(indentation) + s);

    if(is Atom node) {
        paddedPrint(node.string);
    } else {
        paddedPrint("(");
        for(n in node.nodes) {
            prettyPrint(n, indentation + 2);
        }
        paddedPrint(")");
    }
}

shared void run() {
    value tokens = Tokens("""((data "quoted data" 123 4.5)
                               (data (!@# (4.5) "(more" "data)")))""");
    print(tokens);

    value tree = buildTree(tokens);
    prettyPrint(tree);
}

Output:

(
  (
    data
    quoted data
    123
    4.5
  )
  (
    data
    (
      !@#
      (
        4.5
      )
      (more
      data)
    )
  )
)

CoffeeScript

This example is in need of improvement:

This solution does not reproduce unquoted strings as per task description

# This code works with Lisp-like s-expressions.
#
# We lex tokens then do recursive descent on the tokens
# to build our data structures.

sexp = (data) ->
  # Convert a JS data structure to a string s-expression.  A sexier version
  # would remove quotes around strings that don't need them.
  s = ''
  if Array.isArray data
    children = (sexp elem for elem in data).join ' '
    '(' + children + ')'
  else
    return JSON.stringify data

parse_sexp = (sexp) ->
  tokens = lex_sexp sexp
  i = 0

  _parse_list = ->
    i += 1
    arr = []
    while i < tokens.length and tokens[i].type != ')'
      arr.push _parse()
    if i < tokens.length
      i += 1
    else
     throw Error "missing end paren"
    arr
    
  _guess_type = (word) ->
    # This is crude, doesn't handle all forms of floats.
    if word.match /^\d+\.\d+$/
      parseFloat(word)
    else if word.match /^\d+/
      parseInt(word)
    else
      word    

  _parse_word = ->
    token = tokens[i]
    i += 1
    if token.type == 'string'
      token.word
    else
      _guess_type token.word
      
  _parse = ->
    return undefined unless i < tokens.length
    token = tokens[i]
    if token.type == '('
      _parse_list()
    else
      _parse_word()

  exp = _parse()
  throw Error "premature termination" if i < tokens.length
  exp
    
lex_sexp = (sexp) ->
  is_whitespace = (c) -> c in [' ', '\t', '\n']
  i = 0
  tokens = []
  
  test = (f) ->
    return false unless i < sexp.length
    f(sexp[i])
  
  eat_char = (c) ->
    tokens.push
      type: c
    i += 1

  eat_whitespace = ->
    i += 1
    while test is_whitespace
      i += 1

  eat_word = ->
    token = c
    i += 1
    word_char = (c) ->
      c != ')' and !is_whitespace c
    while test word_char
      token += sexp[i]
      i += 1
    tokens.push
      type: "word"
      word: token
  
  eat_quoted_word = ->
    start = i
    i += 1
    token = ''
    while test ((c) -> c != '"')
      if sexp[i] == '\\'
        i += 1
        throw Error("escaping error") unless i < sexp.length
      token += sexp[i]
      i += 1
    if test ((c) -> c == '"')
      tokens.push
        type: "string"
        word: token
      i += 1
    else
      throw Error("end quote missing #{sexp.substring(start, i)}")
  
  while i < sexp.length
    c = sexp[i]
    if c == '(' or c == ')'
      eat_char c
    else if is_whitespace c
      eat_whitespace()
    else if c == '"'
      eat_quoted_word()
    else
      eat_word()
  tokens

do ->
  input = """
    ((data "quoted data with escaped \\"" 123 4.5 "14")
     (data (!@# (4.5) "(more" "data)")))
  """
  console.log "input:\n#{input}\n"
  output = parse_sexp(input)
  pp = (data) -> JSON.stringify data, null, '  '
  console.log "output:\n#{pp output}\n"
  console.log "round trip:\n#{sexp output}\n"

Output:

> coffee sexp.coffee 
input:
((data "quoted data with escaped \"" 123 4.5 "14")
 (data (!@# (4.5) "(more" "data)")))

output:
[
  [
    "data",
    "quoted data with escaped \"",
    123,
    4.5,
    "14"
  ],
  [
    "data",
    [
      "!@#",
      [
        4.5
      ],
      "(more",
      "data)"
    ]
  ]
]

round trip:
(("data" "quoted data with escaped \"" 123 4.5 "14") ("data" ("!@#" (4.5) "(more" "data)")))

Common Lisp

Parsing S-Expressions

Like most Lisp dialects, Common Lisp uses s-expressions to represent both source code and data. Therefore, a properly formatted s-expression is easily "read" then "evaluated" within the top-level REPL (read-eval-print-loop).

CL-USER> (read-from-string "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))")
((DATA "quoted data" 123 4.5) (DATA (!@# (4.5) "[more" "data]")))
65
CL-USER> (caar *)  ;;The first element from the first sublist is DATA
DATA
CL-USER> (eval (read-from-string "(concatenate 'string \"foo\" \"bar\")"))  ;;An example with evaluation
"foobar"
CL-USER>

Unfortunately, our pointy-haired boss has asked you to write a parser for an unusual s-expression syntax that uses square brackets instead of parenthesis. In most programming languages, this would necessitate writing an entire parser. Fortunately, the Common Lisp reader can be modified through the use of macro-characters to accomplish this task. When the reader parses a macro-character token, a function associated with the macro-character is called. As evidenced below, modifying the behavior of the Lisp reader by setting macro-character functions to handle additional sytax requires far less work than writing a complete parser from scratch.

(defun lsquare-reader (stream char)
  (declare (ignore char))
  (read-delimited-list #\] stream t))
(set-macro-character #\[ #'lsquare-reader)               ;;Call the lsquare-reader function when a '[' token is parsed
(set-macro-character #\] (get-macro-character #\) nil))  ;;Do the same thing as ')' when a ']' token is parsed

Unit test code:

;;A list of unit tests.  Each test is a cons in which the car (left side) contains the 
;;test string and the cdr (right side) the expected result of reading the S-Exp.
(setf unit-tests 
      (list 
       (cons "[]" NIL)
       (cons "[a b c]" '(a b c))
       (cons "[\"abc\" \"def\"]" '("abc" "def"))
       (cons "[1 2 [3 4 [5]]]" '(1 2 (3 4 (5))))
       (cons "[\"(\" 1 2 \")\"]" '("(" 1 2 ")"))
       (cons "[4/8 3/6 2/4]" '(1/2 1/2 1/2))
       (cons "[reduce #'+ '[1 2 3]]" '(reduce #'+ '(1 2 3)))))

(defun run-tests () 
  (dolist (test unit-tests)
    (format t "String: ~23s  Expected: ~23s  Actual: ~s~%"
	    (car test) (cdr test) (read-from-string (car test)))))

Unit test output:

CL-USER> (run-tests)
String: "[]"                     Expected: NIL                      Actual: NIL
String: "[a b c]"                Expected: (A B C)                  Actual: (A B C)
String: "[\"abc\" \"def\"]"      Expected: ("abc" "def")            Actual: ("abc" "def")
String: "[1 2 [3 4 [5]]]"        Expected: (1 2 (3 4 (5)))          Actual: (1 2 (3 4 (5)))
String: "[\"(\" 1 2 \")\"]"      Expected: ("(" 1 2 ")")            Actual: ("(" 1 2 ")")
String: "[4/8 3/6 2/4]"          Expected: (1/2 1/2 1/2)            Actual: (1/2 1/2 1/2)
String: "[reduce #'+ '[1 2 3]]"  Expected: (REDUCE #'+ '(1 2 3))    Actual: (REDUCE #'+ '(1 2 3))

Error testing:

(read-from-string "[\"ab\"\"]")  ;;Error:  Object ends with a string.
(read-from-string "[)")          ;;Error:  An object cannot start with ')'
(read-from-string "(]")          ;;Error:  An object cannot start with ']'

Task output:

CL-USER>  (setf task "[[data \"quoted data\" 123 4.5] [data [!@# [4.5] \"[more\" \"data]\"]]]")
"[[data \"quoted data\" 123 4.5] [data [!@# [4.5] \"[more\" \"data]\"]]]"
CL-USER> (read-from-string task)
((DATA "quoted data" 123 4.5) (DATA (!@# (4.5) "[more" "data]")))
65
CL-USER> (setf testing *)  ;; * is the previous result in SLIME
((DATA "quoted data" 123 4.5) (DATA (!@# (4.5) "[more" "data]")))
CL-USER> (car testing)
(DATA "quoted data" 123 4.5)
CL-USER> (caar testing)
DATA
CL-USER> (cdar testing)
("quoted data" 123 4.5)
CL-USER> (cadar testing)
"quoted data"
CL-USER>

Writing S-Expressions

The next step in this task is to write a standard Lisp s-expression in the square bracket notation.

(defun write-sexp (sexp)
  "Writes a Lisp s-expression in square bracket notation."
  (labels ((parse (sexp)
	     (cond ((null sexp) "")
		   ((atom sexp) (format nil "~s " sexp))
		   ((listp sexp) 
		    (concatenate 
		     'string 
		     (if (listp (car sexp))
			 (concatenate 'string "[" 
				      (fix-spacing (parse (car sexp))) 
				      "] ")
			 (parse (car sexp)))
		     (parse (cdr sexp))))))
	   (fix-spacing (str)
	     (let ((empty-string ""))
	       (unless (null str)
		 (if (equal str empty-string)
		     empty-string
		     (let ((last-char (1- (length str))))
		       (if (eq #\Space (char str last-char))
			   (subseq str 0 last-char)
			   str)))))))
    (concatenate 'string "[" (fix-spacing (parse sexp)) "]")))

Unit test code:

(setf unit-tests '(((1 2) (3 4)) (1 2 3 4) ("ab(cd" "mn)op")
		   (1 (2 (3 (4)))) ((1) (2) (3)) ()))

(defun run-tests () 
  (dolist (test unit-tests)
    (format t "Before: ~18s  After: ~s~%"
	    test (write-sexp test))))

Unit test output:

CL-USER> (run-tests)
Before: ((1 2) (3 4))       After: "[[1 2] [3 4]]"
Before: (1 2 3 4)           After: "[1 2 3 4]"
Before: ("ab(cd" "mn)op")   After: "[\"ab(cd\" \"mn)op\"]"
Before: (1 (2 (3 (4))))     After: "[1 [2 [3 [4]]]]"
Before: ((1) (2) (3))       After: "[[1] [2] [3]]"
Before: NIL                 After: "[]"

Finally, round trip output for the original task example:

CL-USER> task
"[[data \"quoted data\" 123 4.5] [data [!@# [4.5] \"[more\" \"data]\"]]]"
CL-USER> (read-from-string task)
((DATA "quoted data" 123 4.5) (DATA (!@# (4.5) "[more" "data]")))
65
CL-USER> (write-sexp *)  ; * is the previous return value
"[[DATA \"quoted data\" 123 4.5] [DATA [!@# [4.5] \"[more\" \"data]\"]]]"
CL-USER>

Cowgol

This parser allows both the use of quotes in symbols (e.g. abc"def), as well as escaped quotes in quoted strings (e.g. "Hello \" world"). Integer numbers are recognized and stored as such, but since Cowgol does not have a native floating point type, floating point numbers are not.

include "cowgol.coh";
include "strings.coh";
include "malloc.coh";

const MAXDEPTH := 256; # Maximum depth (used for stack sizes)
const MAXSTR := 256;   # Maximum string length

# Type markers
const T_ATOM := 1;
const T_STRING := 2;
const T_NUMBER := 3;
const T_LIST := 4;

# Value union
record SVal is
    number @at(0): int32;
    string @at(0): [uint8]; # also used for atoms
    list @at(0): [SExp];
end record;

# Holds a linked list of items
record SExp is  
    type: uint8;
    next: [SExp];
    val: SVal;
end record;

# Free an S-Expression
sub FreeSExp(exp: [SExp]) is
    var stack: [SExp][MAXDEPTH];
    stack[0] := exp;
    var sp: @indexof stack := 1;
    
    while sp > 0 loop
        sp := sp - 1;
        exp := stack[sp];
        while exp != 0 as [SExp] loop
            var next := exp.next;
            case exp.type is
                when T_ATOM: 
                    Free(exp.val.string);
                when T_STRING: 
                    Free(exp.val.string);
                when T_LIST:
                    stack[sp] := exp.val.list;
                    sp := sp + 1;
            end case;
            Free(exp as [uint8]);
            exp := next;
        end loop;
    end loop;
end sub;

# Build an S-Expression
sub ParseSExp(in: [uint8]): (out: [SExp]) is
    out := 0 as [SExp];
    
    sub SkipSpace() is
        while ([in] != 0) and ([in] <= 32) loop
            in := @next in;
        end loop;
    end sub;
    
    sub AtomEnd(): (space: [uint8]) is
        space := in;
        while ([space] > 32) 
        and   ([space] != '(')
        and   ([space] != ')') loop
            space := @next space;
        end loop;
    end sub;
    
    record Stk is
        start: [SExp];
        cur: [SExp];
    end record;

    var strbuf: uint8[MAXSTR];
    var stridx: @indexof strbuf := 0;
    var item: [SExp];
    var stack: Stk[MAXDEPTH];
    var sp: @indexof stack := 0;
    stack[0].start := 0 as [SExp];
    stack[0].cur := 0 as [SExp];
    
    sub Store(item: [SExp]) is
        if stack[sp].start == 0 as [SExp] then
            stack[sp].start := item;
        end if;
        if stack[sp].cur != 0 as [SExp] then
            stack[sp].cur.next := item;
        end if;
        stack[sp].cur := item;
    end sub;
    
    # called on error to clean up memory
    sub FreeAll() is
        loop    
            FreeSExp(stack[sp].start);
            stack[sp].start := 0;
            if sp == 0 then break; end if;
            sp := sp - 1;
        end loop;
    end sub;
    
    loop
        SkipSpace();
        case [in] is
            when 0: break;
            when '"': 
                var escape: uint8 := 0;   
                stridx := 0;
                loop
                    in := @next in;
                    if [in] == 0 then break;
                    elseif escape == 1 then
                        strbuf[stridx] := [in];
                        stridx := stridx + 1;
                        escape := 0;
                    elseif [in] == '\\' then escape := 1;
                    elseif [in] == '"' then break;
                    else
                        strbuf[stridx] := [in];
                        stridx := stridx + 1;
                    end if;
                end loop;
                
                if [in] == 0 then
                    # missing _"_
                    FreeAll();
                    return;
                end if;
                in := @next in;
                strbuf[stridx] := 0;
                stridx := stridx + 1;
                
                item := Alloc(@bytesof SExp) as [SExp];
                item.type := T_STRING;
                item.val.string := Alloc(stridx as intptr);
                CopyString(&strbuf[0], item.val.string);
                Store(item);
            when '(':
                in := @next in;
                sp := sp + 1;
                stack[sp].start := 0 as [SExp];
                stack[sp].cur := 0 as [SExp];
            when ')':
                in := @next in;
                if sp == 0 then
                    # stack underflow, error
                    FreeAll();
                    return;
                else
                    item := Alloc(@bytesof SExp) as [SExp];
                    item.type := T_LIST;
                    item.val.list := stack[sp].start;
                    sp := sp - 1;
                    Store(item);
                end if;
            when else:
                var aend := AtomEnd();
                item := Alloc(@bytesof SExp) as [SExp];
                
                # if this is a valid integer number then store as number
                var ptr: [uint8];
                (item.val.number, ptr) := AToI(in);
                if ptr == aend then
                    # a number was parsed and the whole atom consumed
                    item.type := T_NUMBER;
                else
                    # not a valid integer number, store as atom
                    item.type := T_ATOM;
                    var length := aend - in;
                    item.val.string := Alloc(length + 1);
                    MemCopy(in, length, item.val.string);
                    [item.val.string + length] := 0;
                end if;
                in := aend;
                Store(item);
        end case;
    end loop;
    
    if sp != 0 then
        # unterminated list!
        FreeAll();
        return;
    else
        # return start of list
        out := stack[0].start;
    end if;
end sub;

# Prettyprint an S-Expression with types 
sub prettyprint(sexp: [SExp]) is
    sub PrintNum(n: int32) is
        var buf: uint8[16];
        [IToA(n, 10, &buf[0])] := 0;
        print(&buf[0]);
    end sub;
    
    sub PrintQuoteStr(s: [uint8]) is
        print_char('"');
        while [s] != 0 loop
            if [s] == '"' or [s] == '\\' then 
                print_char('\\');
            end if;
            print_char([s]);
            s := @next s;
        end loop;
        print_char('"');
    end sub;
    
    var stack: [SExp][MAXDEPTH];
    var sp: @indexof stack := 1;
    stack[0] := sexp;
    
    sub Indent(n: @indexof stack) is
        while n > 0 loop
            print("    ");
            n := n - 1;
        end loop;
    end sub;  
  
    loop
        sp := sp - 1;
        while stack[sp] != 0 as [SExp] loop
            Indent(sp);
            case stack[sp].type is
                when T_ATOM: 
                    print(stack[sp].val.string);
                    print(" :: Atom");
                    stack[sp] := stack[sp].next;
                when T_STRING: 
                    PrintQuoteStr(stack[sp].val.string);
                    print(" :: String");
                    stack[sp] := stack[sp].next;
                when T_NUMBER: 
                    PrintNum(stack[sp].val.number);
                    print(" :: Number");
                    stack[sp] := stack[sp].next;
                when T_LIST:
                    print_char('(');
                    sp := sp + 1;
                    stack[sp] := stack[sp-1].val.list;
                    stack[sp-1] := stack[sp-1].next;
            end case;
            print_nl();
        end loop;
        if sp == 0 then
            break;
        end if;
        Indent(sp-1);
        print_char(')');
        print_nl();
    end loop;
end sub;

var str := "((data \"quoted data\" 123 4.5)\n (data (!@# (4.5) \"(more\" \"data)\")))"; 

print("Input:\n");
print(str);
print_nl();

print("Parsed:\n");
prettyprint(ParseSExp(str));
print_nl();

Output:

Input:
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
Parsed:
(
    (
        data :: Atom
        "quoted data" :: String
        123 :: Number
        4.5 :: Atom
    )
    (
        data :: Atom
        (
            !@# :: Atom
            (
                4.5 :: Atom
            )
            "(more" :: String
            "data)" :: String
        )
    )
)

D

import std.stdio, std.conv, std.algorithm, std.variant, std.uni,
       std.functional, std.string;

alias Sexp = Variant;

struct Symbol {
    private string name;
    string toString() @safe const pure nothrow { return name; }
}

Sexp parseSexp(string txt) @safe pure /*nothrow*/ {
    static bool isIdentChar(in char c) @safe pure nothrow {
        return c.isAlpha || "0123456789!@#-".representation.canFind(c);
    }

    size_t pos = 0;

    Sexp _parse() /*nothrow*/ {
        auto i = pos + 1;
        scope (exit)
            pos = i;
        if (txt[pos] == '"') {
            while (txt[i] != '"' && i < txt.length)
                i++;
            i++;
            return Sexp(txt[pos + 1 .. i - 1]);
        } else if (txt[pos].isNumber) {
            while (txt[i].isNumber && i < txt.length)
                i++;
            if (txt[i] == '.') {
                i++;
                while (txt[i].isNumber && i < txt.length)
                    i++;
                auto aux = txt[pos .. i]; //
                return aux.parse!double.Sexp;
            }
            auto aux = txt[pos .. i]; //
            return aux.parse!ulong.Sexp;
        } else if (isIdentChar(txt[pos])) {
            while (isIdentChar(txt[i]) && i < txt.length)
                i++;
            return Sexp(Symbol(txt[pos .. i]));
        } else if (txt[pos] == '(') {
            Sexp[] lst;
            while (txt[i] != ')') {
                while (txt[i].isWhite)
                    i++;
                pos = i;
                lst ~= _parse;
                i = pos;
                while (txt[i].isWhite)
                    i++;
            }
            i = pos + 1;
            return Sexp(lst);
        }
        return Sexp(null);
    }

    txt = txt.find!(not!isWhite);
    return _parse;
}

void writeSexp(Sexp expr) {
    if (expr.type == typeid(string)) {
        write('"', expr, '"');
    } else if (expr.type == typeid(Sexp[])) {
        '('.write;
        auto arr = expr.get!(Sexp[]);
        foreach (immutable i, e; arr) {
            e.writeSexp;
            if (i + 1 < arr.length)
                ' '.write;
        }
        ')'.write;
    } else {
        expr.write;
    }
}

void main() {
    auto pTest = `((data "quoted data" 123 4.5)
                   (data (!@# (4.5) "(more" "data)")))`.parseSexp;
    writeln("Parsed: ", pTest);
    "Printed: ".write;
    pTest.writeSexp;
}

Output:

Parsed: [[data, quoted data, 123, 4.5], [data, [!@#, [4.5], (more, data)]]]
Printed: ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

EchoLisp

The (read-from-string input-string) function parses a string into an s-expression, which is the native representation of program/data in EchoLisp and the majority of Lisps .

(define input-string #'((data "quoted data" 123 4.5)\n(data (!@# (4.5) "(more" "data)")))'#)

input-string
    → "((data "quoted data" 123 4.5)
    (data (!@# (4.5) "(more" "data)")))"

(define s-expr (read-from-string input-string))
s-expr
    → ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

(first s-expr)
    → (data "quoted data" 123 4.5)
(first(first s-expr))
    → data
(first(rest s-expr))
    → (data (!@# (4.5) "(more" "data)"))

F#

Implementation of S-expression parser in F# 4.7 language.

Visual Studio COmmunity 2019 Edition - Version 16.4.5.

Learn more about F# at https://fsharp.org

The file SExpr.fs containing the implementation:

module SExpr
(* This module is a very simple port of the OCaml version to F# (F-Sharp) *)
(* The original OCaml setatment is comment out and the F# statement(s) follow *)
(* Port performed by Bob Elward 23 Feb 2020 *)

(* The .Net standard would use "+" and not "^" for string concatenation *)
(* I kept the "^" to be compatable with the "ml" standard *)
(* The line below eliminates the warning/suggestion to use "+" *)
#nowarn "62"

(** This module is a very simple parsing library for S-expressions. *)
(* Copyright (C) 2009  Florent Monnier, released under MIT license. *)
(* modified to match the task description *)

(* Code obtained from: https://rosettacode.org/wiki/S-Expressions#OCaml *)
 
(* Note: The type below defines the grammar for this S-Expression (S-Expr). 
         An S-Expr is either an Atom or an S-Expr 
*)

open System.Text
open System.IO
open System

type sexpr = Atom of string | Expr of sexpr list

type state =
 | Parse_root of sexpr list
 | Parse_content of sexpr list
 | Parse_word of StringBuilder * sexpr list
 | Parse_string of bool * StringBuilder * sexpr list

let parse pop_char =
 let rec aux st =
   match pop_char() with
   | None ->
       begin match st with
       | Parse_root sl -> (List.rev sl)
       | Parse_content _
       | Parse_word _
       | Parse_string _ ->
           failwith "Parsing error: content not closed by parenthesis"
       end
   | Some c ->
       match c with
       | '(' ->
           begin match st with
           | Parse_root sl ->
               let this = aux(Parse_content []) in
               aux(Parse_root((Expr this)::sl))
           | Parse_content sl ->
               let this = aux(Parse_content []) in
               aux(Parse_content((Expr this)::sl))
           | Parse_word(w, sl) ->
               let this = aux(Parse_content []) in
               (* aux(Parse_content((Expr this)::Atom(Buffer.contents w)::sl)) *)
               aux(Parse_content((Expr this)::Atom(w.ToString())::sl))
           | Parse_string(_, s, sl) ->
               //Buffer.add_char s c;
               s.Append(c) |> ignore;
               aux(Parse_string(false, s, sl))
           end
       | ')' ->
           begin match st with
           | Parse_root sl ->
               failwith "Parsing error: closing parenthesis without openning"
           | Parse_content sl -> (List.rev sl)
           (* | Parse_word(w, sl) -> List.rev(Atom(Buffer.contents w)::sl) *)
           | Parse_word(w, sl) -> List.rev(Atom(w.ToString())::sl)
           | Parse_string(_, s, sl) ->
               (* Buffer.add_char s c; *)
               s.Append(c) |> ignore;
               aux(Parse_string(false, s, sl))
           end
       | ' ' | '\n' | '\r' | '\t' ->
           begin match st with
           | Parse_root sl -> aux(Parse_root sl)
           | Parse_content sl -> aux(Parse_content sl)
           (* | Parse_word(w, sl) -> aux(Parse_content(Atom(Buffer.contents w)::sl)) *)
           | Parse_word(w, sl) -> aux(Parse_content(Atom(w.ToString())::sl))
           | Parse_string(_, s, sl) ->
               //Buffer.add_char s c;
               s.Append(c) |> ignore;
               aux(Parse_string(false, s, sl))
           end
       | '"' ->
           (* '"' *)
           begin match st with
           | Parse_root _ -> failwith "Parse error: double quote at root level"
           | Parse_content sl ->
               (* let s = Buffer.create 74 in *)
               let s = StringBuilder(74) in
               aux(Parse_string(false, s, sl))
           | Parse_word(w, sl) ->
               (* let s = Buffer.create 74 in *)
               let s = StringBuilder(74) in
               (* aux(Parse_string(false, s, Atom(Buffer.contents w)::sl)) *)
               aux(Parse_string(false, s, Atom(w.ToString())::sl))
           | Parse_string(true, s, sl) ->
               (* Buffer.add_char s c; *)
               s.Append(c) |> ignore;
               aux(Parse_string(false, s, sl))
           | Parse_string(false, s, sl) ->
               (* aux(Parse_content(Atom(Buffer.contents s)::sl)) *)
               aux(Parse_content(Atom(s.ToString())::sl))
           end
       | '\\' ->
           begin match st with
           | Parse_string(true, s, sl) ->
               (* Buffer.add_char s c; *)
               s.Append(c) |> ignore;
               aux(Parse_string(false, s, sl))
           | Parse_string(false, s, sl) ->
               aux(Parse_string(true, s, sl))
           | _ ->
               failwith "Parsing error: escape character in wrong place"
           end
       | _ ->
           begin match st with
           | Parse_root _ ->
               failwith(Printf.sprintf "Parsing error: char '%c' at root level" c)
           | Parse_content sl ->
               (* let w = Buffer.create 16 in *)
               let w = StringBuilder(16) in 
               (* Buffer.add_char w c; *)
               w.Append(c) |> ignore;
               aux(Parse_word(w, sl))
           | Parse_word(w, sl) ->
               (* Buffer.add_char w c; *)
               w.Append(c) |> ignore;
               aux(Parse_word(w, sl))
           | Parse_string(_, s, sl) ->
               (* Buffer.add_char s c; *)
               s.Append(c) |> ignore;
               aux(Parse_string(false, s, sl))
           end
 in
 aux (Parse_root [])


let string_pop_char str =
 let len = String.length str in
 let i = ref(-1) in
 (function () -> incr i; if !i >= len then None else Some(str.[!i]))


let parse_string str =
 parse (string_pop_char str)

(*
let ic_pop_char ic =
 (function () ->
    try Some(input_char ic)
    with End_of_file -> (None))
*)

let ic_pop_char (ic:TextReader) =
  (function () -> try Some(Convert.ToChar(ic.Read()))
                  with _End_of_file -> (None)
  )


let parse_ic ic =
 parse (ic_pop_char ic)


let parse_file filename =
 (* let ic = open_in filename in *)
 let ic = File.OpenText filename in
 let res = parse_ic ic in
 (* close_in ic; *)
 ic.Close();
 (res)


let quote s =
 "\"" ^ s ^ "\""

let needs_quote s =
 (* List.exists (String.contains s) [' '; '\n'; '\r'; '\t'; '('; ')'] *)
 List.exists (fun elem -> (String.exists (fun c -> c = elem) s)) [' '; '\n'; '\r'; '\t'; '('; ')']

let protect s =
 (* There is no need to "escape" .Net strings the framework takes care of this *)
 (* let s = String.escaped s in *)
 if needs_quote s then quote s else s


let string_of_sexpr s =
 let rec aux acc = function
 | (Atom tag)::tl -> aux ((protect tag)::acc) tl
 | (Expr e)::tl ->
     let s =
       "(" ^
       (String.concat " " (aux [] e))
       ^ ")"
     in
     aux (s::acc) tl
 | [] -> (List.rev acc)
 in
 String.concat " " (aux [] s)


let print_sexpr s =
 (* print_endline (string_of_sexpr s) *)
 printfn "%s" (string_of_sexpr s)


let string_of_sexpr_indent s =
 let rec aux i acc = function
 | (Atom tag)::tl -> aux i ((protect tag)::acc) tl
 | (Expr e)::tl ->
     let s =
       (*
       "\n" ^ (String.make i ' ') ^ "(" ^
       (String.concat " " (aux (succ i) [] e))
       ^ ")"
       *)
       "\n" ^ (String.replicate i " ") ^ "(" ^
       (String.concat " " (aux (i + 1) [] e))
       ^ ")"
     in
     aux i (s::acc) tl
 | [] -> (List.rev acc)
 in
 String.concat "\n" (aux 0 [] s)


let print_sexpr_indent s =
 (* print_endline (string_of_sexpr_indent s) *)
 printfn "%s" (string_of_sexpr_indent s)

The file Program.fs containing the main module to call the S-Expression parser and printer. Two options are shown: Read the experession from a file of preset it in the code.

module Program
(* Learn more about F# at https://fsharp.org *)

open System

[<EntryPoint>]
let main argv =
    let sexpr =
    
        (* Data from file supplied at runtime or a preset string? *)
        if argv.Length > 0 then
            (* Data from file supplied at runtime *)
            begin
                (* Get the file to parse *)
                let name = argv.[0] in
    
                (* parse the program file *)
                SExpr.parse_file name
            end
        else
            (* Data from a preset string *)
            begin
                (* preset the string *)
                let data= "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))" in

                (* parse the program file *)
                SExpr.parse_string data
            end
    
    (* Print the parsed program token list *)
    (printf "\nSExpr: \n");
    SExpr.print_sexpr sexpr;
    (printf "\nSExpr - Indented: \n");
    SExpr.print_sexpr_indent sexpr;
    
    (* return an integer exit code *)
    0

Output:


SExpr:
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

SExpr - Indented:

(
 (data "quoted data" 123 4.5)
 (data
  (!@#
   (4.5) "(more" "data)")))

Factor

Factor's nested sequences are close enough to s-expressions that in most cases we can simply eval s-expression strings after some minor whitespace/bracket/parenthesis transformations. However, if we wish to support symbols, this approach becomes complicated because symbols need to be declared before use. This means we need to go into the string and identify them, so we may as well parse the s-expression properly while we're there.

We have a nice tool at our disposal for doing this. In its standard library, Factor contains a domain-specific language for defining extended Backus-Naur form (EBNF) grammars. EBNF is a convenient, declarative way to describe different parts of a grammar where later rules build on earlier ones until the final rule defines the entire grammar. Upon calling the word defined by EBNF:, an input string will be tokenized according to the declared rules and stored in an abstract syntax tree.

To get an idea of how this works, look at the final rule. It declares that an s-expression is any number of objects (comprised of numbers, floats, strings, and symbols) and s-expressions (the rule is recursive, allowing for nested s-expressions) surrounded by parenthesis which are in turn surrounded by any amount of whitespace. This implementation of EBNF allows us to define actions: the quotation after the => is called on the rule token just before being added to the abstract syntax tree. This is convenient for our use case where we need to parse different types of objects into our sequence structure.

Factor has a comprehensive prettyprinter which can print any Factor object in a readable way. Not only can we leverage it to easily print our native data structure, but we can also call unparse to convert it to a string. This leaves us with a string reminiscent of the original input, and we are able to take it the rest of the way with two simple regular expressions.

USING: formatting kernel math.parser multiline peg peg.ebnf
regexp sequences prettyprint words ;
IN: rosetta-code.s-expressions

STRING: input
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
;

EBNF: sexp>seq [=[
  ws     = [\n\t\r ]* => [[ drop ignore ]]
  digits = [0-9]+
  number = digits => [[ string>number ]]
  float  = digits:a "." digits:b => [[ a b "." glue string>number ]]
  string = '"'~ [^"]* '"'~ => [[ "" like ]]
  symbol = [!#-'*-~]+ => [[ "" like <uninterned-word> ]]
  object = ws ( float | number | string | symbol ) ws
  sexp   = ws "("~ ( object | sexp )* ")"~ ws => [[ { } like ]]
]=]

: seq>sexp ( seq -- str )
    unparse R/ {\s+/ "(" R/ \s+}/ ")" [ re-replace ] 2bi@ ;
    
input [ "Input:\n%s\n\n" printf ] [
    sexp>seq dup seq>sexp
    "Native:\n%u\n\nRound trip:\n%s\n" printf
] bi

Output:

Input:
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

Native:
{
    { data "quoted data" 123 4.5 }
    { data { !@# { 4.5 } "(more" "data)" } }
}

Round trip:
((data "quoted data" 123 4.5)
    (data (!@# (4.5) "(more" "data)")))

Go

package main

import (
    "errors"
    "fmt"
    "reflect"
    "strconv"
    "strings"
    "unicode"
)

var input = `((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))`

func main() {
    fmt.Println("input:")
    fmt.Println(input)

    s, err := parseSexp(input)
    if err != nil {
        fmt.Println("error:", err)
        return
    }

    fmt.Println("\nparsed:")
    fmt.Println(s)

    fmt.Println("\nrepresentation:")
    s.dump(0)
}

// dynamic types for i are string, qString, int, float64, list, and error.
type sexp struct {
    i interface{}
}
type qString string
type list []sexp

func (s sexp) String() string {
    return fmt.Sprintf("%v", s.i)
}

func (q qString) String() string {
    return strconv.Quote(string(q))
}

func (l list) String() string {
    if len(l) == 0 {
        return "()"
    }
    b := fmt.Sprintf("(%v", l[0])
    for _, s := range l[1:] {
        b = fmt.Sprintf("%s %v", b, s)
    }
    return b + ")"
}

// parseSexp parses a string into a Go representation of an s-expression.
//
// Quoted strings go from one " to the next.  There is no escape character,
// all characters except " are valid.
//
// Otherwise atoms are any string of characters between any of '(', ')',
// '"', or white space characters.  If the atom parses as a Go int type
// using strconv.Atoi, it is taken as int; if it parses as a Go float64
// type using strconv.ParseFloat, it is taken as float64; otherwise it is
// taken as an unquoted string.
//
// Unmatched (, ), or " are errors.
// An empty or all whitespace input string is an error.
// Left over text after the sexp is an error.
//
// An empty list is a valid sexp, but there is no nil, no cons, no dot.
func parseSexp(s string) (sexp, error) {
    s1, rem := ps2(s, -1)
    if err, isErr := s1.i.(error); isErr {
        return sexp{}, err
    }
    if rem > "" {
        return s1, errors.New("Left over text: " + rem)
    }
    return s1, nil
}

// recursive.  n = -1 means not parsing a list.  n >= 0 means the number
// of list elements parsed so far.  string result is unparsed remainder
// of the input string s0.
func ps2(s0 string, n int) (x sexp, rem string) {
    tok, s1 := gettok(s0)
    switch t := tok.(type) {
    case error:
        return sexp{tok}, s1
    case nil: // this is also an error
        if n < 0 {
            return sexp{errors.New("blank input string")}, s0
        } else {
            return sexp{errors.New("unmatched (")}, ""
        }
    case byte:
        switch {
        case t == '(':
            x, s1 = ps2(s1, 0) // x is a list
            if _, isErr := x.i.(error); isErr {
                return x, s0
            }
        case n < 0:
            return sexp{errors.New("unmatched )")}, ""
        default:
            // found end of list.  allocate space for it.
            return sexp{make(list, n)}, s1
        }
    default:
        x = sexp{tok} // x is an atom
    }
    if n < 0 {
        // not in a list, just return the s-expression x
        return x, s1
    }
    // in a list.  hold on to x while we parse the rest of the list.
    l, s1 := ps2(s1, n+1)
    // result l is either an error or the allocated list, not completely
    // filled in yet.
    if _, isErr := l.i.(error); !isErr {
        // as long as no errors, drop x into its place in the list
        l.i.(list)[n] = x
    }
    return l, s1
}
    
// gettok gets one token from string s.
// return values are the token and the remainder of the string.
// dynamic type of tok indicates result:
// nil:  no token.  string was empty or all white space.
// byte:  one of '(' or ')'
// otherwise string, qString, int, float64, or error.
func gettok(s string) (tok interface{}, rem string) {
    s = strings.TrimSpace(s)
    if s == "" {
        return nil, ""
    }
    switch s[0] {
    case '(', ')':
        return s[0], s[1:]
    case '"': 
        if i := strings.Index(s[1:], `"`); i >= 0 {
            return qString(s[1 : i+1]), s[i+2:]
        } 
        return errors.New(`unmatched "`), s
    }
    i := 1
    for i < len(s) && s[i] != '(' && s[i] != ')' && s[i] != '"' &&
        !unicode.IsSpace(rune(s[i])) {
        i++
    }
    if j, err := strconv.Atoi(s[:i]); err == nil {
        return j, s[i:]
    }
    if f, err := strconv.ParseFloat(s[:i], 64); err == nil {
        return f, s[i:]
    }
    return s[:i], s[i:]
}

func (s sexp) dump(i int) {
    fmt.Printf("%*s%v: ", i*3, "", reflect.TypeOf(s.i))
    if l, isList := s.i.(list); isList {
        fmt.Println(len(l), "elements")
        for _, e := range l {
            e.dump(i + 1)
        }
    } else {
        fmt.Println(s.i)
    }
}

Output:

input:
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

parsed:
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

representation:
main.list: 2 elements
   main.list: 4 elements
      string: data
      main.qString: "quoted data"
      int: 123
      float64: 4.5
   main.list: 2 elements
      string: data
      main.list: 4 elements
         string: !@#
         main.list: 1 elements
            float64: 4.5
         main.qString: "(more"
         main.qString: "data)"

Haskell

import qualified Data.Functor.Identity as F
import qualified Text.Parsec.Prim as Prim
import Text.Parsec
       ((<|>), (<?>), many, many1, char, try, parse, sepBy, choice,
        between)
import Text.Parsec.Token
       (integer, float, whiteSpace, stringLiteral, makeTokenParser)
import Text.Parsec.Char (noneOf)
import Text.Parsec.Language (haskell)

data Val
  = Int Integer
  | Float Double
  | String String
  | Symbol String
  | List [Val]
  deriving (Eq, Show)

tProg :: Prim.ParsecT String a F.Identity [Val]
tProg = many tExpr <?> "program"
  where
    tExpr = between ws ws (tList <|> tAtom) <?> "expression"
    ws = whiteSpace haskell
    tAtom =
      (try (Float <$> float haskell) <?> "floating point number") <|>
      (try (Int <$> integer haskell) <?> "integer") <|>
      (String <$> stringLiteral haskell <?> "string") <|>
      (Symbol <$> many1 (noneOf "()\"\t\n\r ") <?> "symbol") <?>
      "atomic expression"
    tList = List <$> between (char '(') (char ')') (many tExpr) <?> "list"

p :: String -> IO ()
p = either print (putStrLn . unwords . map show) . parse tProg ""

main :: IO ()
main = do
  let expr =
        "((data \"quoted data\" 123 4.5)\n  (data (!@# (4.5) \"(more\" \"data)\")))"
  putStrLn ("The input:\n" ++ expr ++ "\n\nParsed as:")
  p expr

Output:

The input:
((data "quoted data" 123 4.5)
  (data (!@# (4.5) "(more" "data)")))

Parsed as:
List [List [Symbol "data",String "quoted data",Int 123,Float 4.5],List [Symbol "data",List [Symbol "!@#",List [Float 4.5],String "(more",String "data)"]]]

Or, parsing by hand (rather than with a parser combinator library) and printing a parse tree diagram:

{-# LANGUAGE TupleSections #-}
 
import Data.Bifunctor (bimap)
import Data.List (mapAccumL)
import Data.List.Split (splitOn)
import Data.Maybe (catMaybes, fromMaybe, listToMaybe)
import Data.Tree (Forest, Tree (..), drawForest)
 
------------------------ DATA TYPE -----------------------
data Val
  = Int Integer
  | Float Double
  | String String
  | Symbol String
  | List [Val]
  deriving (Eq, Show, Read)
 
instance Semigroup Val where
  List a <> List b = List (a <> b)
 
instance Monoid Val where
  mempty = List []
 
--------------------------- MAIN -------------------------
main :: IO ()
main = do
  let expr =
        unlines
          [ "((data \"quoted data\" 123 4.5)",
            "  (data (!@# (4.5) \"(more\" \"data)\")))"
          ]
      parse = fst (parseExpr (tokenized expr))
 
  putStrLn $ treeDiagram $ forestFromVal parse
  putStrLn "Serialized from the parse tree:\n"
  putStrLn $ litVal parse
 
------------------- S-EXPRESSION PARSER ------------------
 
parseExpr :: [String] -> (Val, [String])
parseExpr = until finished parseToken . (mempty,)
 
finished :: (Val, [String]) -> Bool
finished (_, []) = True
finished (_, token : _) = ")" == token
 
parseToken :: (Val, [String]) -> (Val, [String])
parseToken (v, "(" : rest) =
  bimap
    ((v <>) . List . return)
    tail
    (parseExpr rest)
parseToken (v, ")" : rest) = (v, rest)
parseToken (v, t : rest) = (v <> List [atom t], rest)
 
----------------------- TOKEN PARSER ---------------------
 
atom :: String -> Val
atom [] = mempty
atom s@('"' : _) =
  fromMaybe mempty (maybeRead ("String " <> s))
atom s =
  headDef (Symbol s) $
    catMaybes $
      maybeRead . (<> (' ' : s)) <$> ["Int", "Float"]
 
maybeRead :: String -> Maybe Val
maybeRead = fmap fst . listToMaybe . reads
 
----------------------- TOKENIZATION ---------------------
 
tokenized :: String -> [String]
tokenized s = quoteTokens '"' s >>= go
  where
    go [] = []
    go token@('"' : _) = [token]
    go s = words $ spacedBrackets s
 
quoteTokens :: Char -> String -> [String]
quoteTokens q s = snd $ mapAccumL go False (splitOn [q] s)
  where
    go b s
      | b = (False, '"' : s <> "\"")
      | otherwise = (True, s)
 
spacedBrackets :: String -> String
spacedBrackets [] = []
spacedBrackets (c : cs)
  | c `elem` "()" = ' ' : c : " " <> spacedBrackets cs
  | otherwise = c : spacedBrackets cs
 
------------------------- DIAGRAMS -----------------------
 
treeDiagram :: Forest Val -> String
treeDiagram = drawForest . fmap (fmap show)
 
forestFromVal :: Val -> Forest Val
forestFromVal (List xs) = treeFromVal <$> xs
 
treeFromVal :: Val -> Tree Val
treeFromVal (List xs) =
  Node (Symbol "List") (treeFromVal <$> xs)
treeFromVal v = Node v []
 
---------------------- SERIALISATION ---------------------
 
litVal (Symbol x) = x
litVal (Int x) = show x
litVal (Float x) = show x
litVal (String x) = '"' : x <> "\""
litVal (List [List xs]) = litVal (List xs)
litVal (List xs) = '(' : (unwords (litVal <$> xs) <> ")")
 
------------------------- GENERIC ------------------------
 
headDef :: a -> [a] -> a
headDef d [] = d
headDef _ (x : _) = x

Output:

Symbol "List"
|
+- Symbol "List"
|  |
|  +- Symbol "data"
|  |
|  +- String "quoted data"
|  |
|  +- Int 123
|  |
|  `- Float 4.5
|
`- Symbol "List"
   |
   +- Symbol "data"
   |
   `- Symbol "List"
      |
      +- Symbol "!@#"
      |
      +- Symbol "List"
      |  |
      |  `- Float 4.5
      |
      +- String "(more"
      |
      `- String "data)"


Serialized from the parse tree:

((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Icon and Unicon

The following should suffice as a demonstration.
String escaping and quoting could be handled more robustly.

The example takes single and double quotes.
Single quotes were used instead of doubles in the input.

link ximage

procedure main()
  in := "((data 'quoted data' 123 4.5) (data (!@# (4.5) '(more' 'data)')))"
  # in := map(in,"'","\"") # uncomment to put back double quotes if desired
  write("Input:  ",image(in))
  write("Structure: \n",ximage(S := string2sexp(in)))
  write("Output:  ",image(sexp2string(S)))
end

procedure sexp2string(S)   #: return a string representing the s-expr
   s := ""
   every t := !S do {
      if type(t) == "list" then 
         s ||:= "(" || trim(sexp2string(t)) || ")"
      else 
         if upto('() \t\r\n',t) then 
            s ||:= "'" || t || "'" 
         else
            s ||:= t 
      s ||:= " "
      }
   return trim(s)
end

procedure string2sexp(s)   #: return a s-expression nested list
   if s ? ( sexptokenize(T := []), pos(0) ) then 
      return sexpnest(T)
   else
      write("Malformed: ",s)
end

procedure sexpnest(T,L)   #: transform s-expr token list to nested list
   /L := []
   while t := get(T) do           
      case t of {
         "("      :  {
                     put(L,[])
                     sexpnest(T,L[*L])
                     }
         ")"      :  return L
         default  :  put(L, numeric(t) | t)
      }
      return L
end

procedure sexptokenize(T) #: return list of tokens parsed from an s-expr string
static sym
initial sym := &letters++&digits++'~`!@#$%^&*_-+|;:.,<>[]{}'
   until pos(0) do 
      case &subject[&pos] of {
         " "   :  tab(many(' \t\r\n'))                     # consume whitespace
         "'"|"\""  : 
            (q := move(1)) & put(T,tab(find(q))) & move(1) # quotes
         "("   :  put(T,move(1)) & sexptokenize(T)         # open      
         ")"   :  put(T,move(1)) &return T                 # close
         default  : put(T, tab(many(sym)))                 # other symbols
         } 
   return T
end

Library: Icon Programming Library

ximage.icn formats arbitrary structures into printable strings

Output:

Input:  "((data 'quoted data' 123 4.5) (data (!@# (4.5) '(more' 'data)')))"
Structure:
L2 := list(1)
   L2[1] := L3 := list(2)
      L3[1] := L4 := list(4)
         L4[1] := "data"
         L4[2] := "quoted data"
         L4[3] := 123
         L4[4] := 4.5
      L3[2] := L5 := list(2)
         L5[1] := "data"
         L5[2] := L6 := list(4)
            L6[1] := "!@#"
            L6[2] := L7 := list(1)
               L7[1] := 4.5
            L6[3] := "(more"
            L6[4] := "data)"
Output:  "((data 'quoted data' 123 4.5) (data (!@# (4.5) '(more' 'data)')))"

J

Since J already has a way of expressing nested lists, this implementation is for illustration purposes only. No attempt is made to handle arrays which are not representable using sexpr syntax.

This implementation does not support escape characters. If escape characters were added, we would need additional support in the tokenizer (an extra character class, and in the state table an extra column and two extra rows, or almost double the number of state transitions: 35 instead of 20), and additional support in the data language (unfmt would need to strip out escape characters and fmt would need to insert escape characters -- so each of these routines would also perhaps double in size.) And that's a lot of bulk for serialize/deserialize mechanism which, by design, cannot represent frequently used data elements (such as matrices and gerunds).

NB. character classes: 0: paren, 1: quote, 2: whitespace, 3: wordforming (default)
chrMap=: '()';'"';' ',LF,TAB,CR

NB. state columns correspond to the above character classes
NB. first digit chooses next state.
NB. second digit is action 0: do nothing, 1: start token, 2: end token
states=: 10 10#: ".;._2]0 :0
  11  21  00  31  NB. state 0: initial state
  12  22  02  32  NB. state 1: after () or after closing "
  40  10  40  40  NB. state 2: after opening "
  12  22  02  30  NB. state 3: after word forming character
  40  10  40  40  NB. state 4: between opening " and closing "
)

tokenize=: (0;states;<chrMap)&;:

rdSexpr=:3 :0 :.wrSexpr
  s=. r=. '' [ 'L R'=. ;:'()'
  for_token. tokenize y do.
    select. token
      case. L do. r=. ''  [ s=. s,<r
      case. R do. s=. }:s [ r=. (_1{::s),<r
      case.   do. r=. r,token
    end.
  end.
  >{.r
)

wrSexpr=: ('(' , ;:^:_1 , ')'"_)^:L.L:1^:L. :.rdSexpr

fmt=: 3 :0 :.unfmt
  if. '"' e. {.y     do. }.,}: y  NB. quoted string
  elseif. 0=#$n=.".y do. n        NB. number or character
  elseif.            do. s:<y     NB. symbol
  end.
)

unfmt=: 3 :0 :.fmt
  select. 3!:0 y
    case. 1;4;8;16;128 do. ":!.20 y
    case. 2;131072     do.
      select. #$y
        case. 0 do. '''',y,''''
        case. 1 do. '"',y,'"'
      end.
    case. 64           do. (":y),'x'
    case. 65536        do. >s:inv y
  end.
)

readSexpr=: fmt L:0 @rdSexpr :.writeSexpr
writeSexpr=: wrSexpr @(unfmt L:0) :.readSexpr

Example use:

   readSexpr '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))'
┌───────────────────────────┬────────────────────────────────┐
│┌─────┬───────────┬───┬───┐│┌─────┬────────────────────────┐│
││`data│quoted data│123│4.5│││`data│┌────┬─────┬─────┬─────┐││
│└─────┴───────────┴───┴───┘││     ││`!@#│┌───┐│(more│data)│││
│                           ││     ││    ││4.5││     │     │││
│                           ││     ││    │└───┘│     │     │││
│                           ││     │└────┴─────┴─────┴─────┘││
│                           │└─────┴────────────────────────┘│
└───────────────────────────┴────────────────────────────────┘
   writeSexpr readSexpr '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))'
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Java

This code is based on [1] and [2] It is available under the GPL. The author is Joel F. Klein. He has graciously given permission to share the code under the FDL for the purpose of publishing it on RosettaCode.

The code as presented here is taken unmodified from an application being developed right now where it is used.

LispTokenizer.java

package jfkbits;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Iterator;

public class LispTokenizer implements Iterator<Token>
{
    // Instance variables have default access to allow unit tests access.
    StreamTokenizer m_tokenizer;
    IOException m_ioexn;

    /** Constructs a tokenizer that scans input from the given string.
     * @param src A string containing S-expressions.
     */
    public LispTokenizer(String src)
    {
        this(new StringReader(src));
    }

    /** Constructs a tokenizer that scans input from the given Reader.
     * @param r Reader for the character input source
     */
    public LispTokenizer(Reader r)
    {
        if(r == null)
            r = new StringReader("");
        BufferedReader buffrdr = new BufferedReader(r);
        m_tokenizer = new StreamTokenizer(buffrdr);
        m_tokenizer.resetSyntax(); // We don't like the default settings

        m_tokenizer.whitespaceChars(0, ' ');
        m_tokenizer.wordChars(' '+1,255);
        m_tokenizer.ordinaryChar('(');
        m_tokenizer.ordinaryChar(')');
        m_tokenizer.ordinaryChar('\'');
        m_tokenizer.commentChar(';');
        m_tokenizer.quoteChar('"');
    }

    public Token peekToken()
    {	
        if(m_ioexn != null)
            return null;
        try
        {
            m_tokenizer.nextToken();
        }
        catch(IOException e)
        {
            m_ioexn = e;
            return null;
        }
        if(m_tokenizer.ttype == StreamTokenizer.TT_EOF)
            return null;
        Token token = new Token(m_tokenizer);
        m_tokenizer.pushBack();
        return token;
    }

    public boolean hasNext()
    {
        if(m_ioexn != null)
            return false;
        try
        {
            m_tokenizer.nextToken();
        }
        catch(IOException e)
        {
            m_ioexn = e;
            return false;
        }
        if(m_tokenizer.ttype == StreamTokenizer.TT_EOF)
            return false;
        m_tokenizer.pushBack();
        return true;
    }

    /** Return the most recently caught IOException, if any,
     * 
     * @return
     */
    public IOException getIOException()
    {
        return m_ioexn;
    }

    public Token next()
    {
        try
        {
            m_tokenizer.nextToken();
        }
        catch(IOException e)
        {
            m_ioexn = e;
            return null;
        }

        Token token = new Token(m_tokenizer);
        return token;
    }

    public void remove()
    {
    }
}

Token.java

package jfkbits;
import java.io.StreamTokenizer;

public class Token
{
    public static final int SYMBOL = StreamTokenizer.TT_WORD;
    public int type;
    public String text;
    public int line;

    public Token(StreamTokenizer tzr)
    {
        this.type = tzr.ttype;
        this.text = tzr.sval;
        this.line = tzr.lineno();
    }

    public String toString()
    {
        switch(this.type)
        {
            case SYMBOL:
            case '"':
                return this.text;
            default:
                return String.valueOf((char)this.type);
        }
    }
}

Atom.java

package jfkbits;

import jfkbits.LispParser.Expr;

public class Atom implements Expr
{
    String name;
    public String toString()
    {
        return name;
    }
    public Atom(String text)
    {
        name = text;
    }

}

StringAtom.java

package jfkbits;

public class StringAtom extends Atom
{
    public String toString()
    {
        // StreamTokenizer hardcodes escaping with \, and doesn't allow \n inside words
        String escaped = name.replace("\\", "\\\\").replace("\n", "\\n").replace("\r", "\\r").replace("\"", "\\\"");
        return "\""+escaped+"\"";
    }

    public StringAtom(String text)
    {
        super(text);
    }
    public String getValue()
    {
        return name;
    }
}

ExprList.java

package jfkbits;

import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;
import java.util.ArrayList;

import jfkbits.LispParser.Expr;

public class ExprList extends ArrayList<Expr> implements Expr
{
    ExprList parent = null;
    int indent =1;

    public int getIndent()
    {
        if (parent != null)
        {
            return parent.getIndent()+indent;
        }
        else return 0;
    }

    public void setIndent(int indent)
    {
        this.indent = indent;
    }



    public void setParent(ExprList parent)
    {
        this.parent = parent;
    }

    public String toString()
    {
        String indent = "";
        if (parent != null && parent.get(0) != this)
        {
            indent = "\n";
            char[] chars = new char[getIndent()];
            Arrays.fill(chars, ' ');
            indent += new String(chars);		
        }

        String output = indent+"(";
        for(Iterator<Expr> it=this.iterator(); it.hasNext(); ) 
        {
            Expr expr = it.next();
            output += expr.toString();
            if (it.hasNext())
                output += " ";
        }
        output += ")";
        return output;
    }

    @Override
    public synchronized boolean add(Expr e)
    {
        if (e instanceof ExprList)
        {
            ((ExprList) e).setParent(this);
            if (size() != 0 && get(0) instanceof Atom)
                ((ExprList) e).setIndent(2);
        }
        return super.add(e);
    }

}

LispParser.java

package jfkbits;


public class LispParser
{
    LispTokenizer tokenizer;

    public LispParser(LispTokenizer input)
    {
        tokenizer=input;
    }

    public class ParseException extends Exception
    {

    }

    public interface Expr
    {
        // abstract parent for Atom and ExprList
    }

    public Expr parseExpr() throws ParseException
    {
        Token token = tokenizer.next();
        switch(token.type)
        {
            case '(': return parseExprList(token);
            case '"': return new StringAtom(token.text);
            default: return new Atom(token.text);
        }
    }


    protected ExprList parseExprList(Token openParen) throws ParseException
    {
        ExprList acc = new ExprList();
        while(tokenizer.peekToken().type != ')')
        {
            Expr element = parseExpr();
            acc.add(element);
        }
        Token closeParen = tokenizer.next();
        return acc;
    }

}

LispParserDemo.java

import jfkbits.ExprList;
import jfkbits.LispParser;
import jfkbits.LispParser.ParseException;
import jfkbits.LispTokenizer;

public class LispParserDemo
{
    public static void main(String args[])
    {

        LispTokenizer tzr = new LispTokenizer(
            "((data \"quoted data\" 123 4.5)\n (data (!@# (4.5) \"(more\" \"data)\")))");
        LispParser parser = new LispParser(tzr);

        try
        {
            Expr result = parser.parseExpr();
            System.out.println(result);
        }
        catch (ParseException e1)
        {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
    }	
}

JavaScript

(for a bug-fix concerning \" and \n in strings see the Discussion)

Procedural

String.prototype.parseSexpr = function() {
	var t = this.match(/\s*("[^"]*"|\(|\)|"|[^\s()"]+)/g)
	for (var o, c=0, i=t.length-1; i>=0; i--) {
		var n, ti = t[i].trim()
		if (ti == '"') return
		else if (ti == '(') t[i]='[', c+=1
		else if (ti == ')') t[i]=']', c-=1
		else if ((n=+ti) == ti) t[i]=n
		else t[i] = '\'' + ti.replace('\'', '\\\'') + '\''
		if (i>0 && ti!=']' && t[i-1].trim()!='(' ) t.splice(i,0, ',')
		if (!c) if (!o) o=true; else return
	}
	return c ? undefined : eval(t.join(''))
}

Array.prototype.toString = function() {
	var s=''; for (var i=0, e=this.length; i<e; i++) s+=(s?' ':'')+this[i]
	return '('+s+')'
}

Array.prototype.toPretty = function(s) {
	if (!s) s = ''
	var r = s + '(<br>'
	var s2 = s + Array(6).join('&nbsp;')
	for (var i=0, e=this.length; i<e; i+=1) { 
		var ai = this[i]
		r += ai.constructor != Array ? s2+ai+'<br>' : ai.toPretty(s2)
	}
	return r + s + ')<br>'
}

var str = '((data "quoted data" 123 4.5)\n (data (!@# (4.5) "(more" "data)")))'
document.write('text:<br>', str.replace(/\n/g,'<br>').replace(/ /g,'&nbsp;'), '<br><br>')
var sexpr = str.parseSexpr()
if (sexpr === undefined) 
	document.write('Invalid s-expr!', '<br>')
else 
	document.write('s-expr:<br>', sexpr, '<br><br>', sexpr.constructor != Array ? '' : 'pretty print:<br>' + sexpr.toPretty())

Output:

text:
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

s-expr:
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

pretty print:
(
      (
            data
            "quoted data"
            123
            4.5
      )
      (
            data
            (
                  !@#
                  (
                        4.5
                  )
                  "(more"
                  "data)"
            )
      )
)

Functional

Showing the parse tree in an indented JSON format, and writing out a reserialization:

(() => {
    "use strict";

    // ------------------ S-EXPRESSIONS ------------------
    const main = () => {
        const expr = [
            "((data \"quoted data\" 123 4.5)",
            "  (data (!@# (4.5) \"(more\" \"data)\")))"
        ]
        .join("\n");

        const [parse, residue] = parseExpr(
            tokenized(expr)
        );

        return 0 < residue.length
            ? `Unparsed tokens: ${JSON.stringify(residue)}`
            : 0 < parse.length
                ? [
                    JSON.stringify(parse, null, 2),
                    "Reserialized from parse:",
                    parse.map(serialized).join(" ")
                ]
                .join("\n\n")
                : "Could not be parsed";
    };

    // ---------------- EXPRESSION PARSER ----------------

    // parseExpr [String] -> ([Expr], [String])
    const parseExpr = tokens =>
        // A tuple of (parsed trees, residual tokens)
        // derived from a list of tokens.
        until(finished)(readToken)([
            [], tokens
        ]);


    // finished :: ([Expr], [String]) -> Bool
    const finished = ([, tokens]) =>
        // True if no tokens remain, or the next
        // closes a sub-expression.
        0 === tokens.length || ")" === tokens[0];


    // readToken :: ([Expr], [String]) -> ([Expr], [String])
    const readToken = ([xs, tokens]) => {
        // A tuple of enriched expressions and
        // depleted tokens.
        const [token, ...ts] = tokens;

        // An open bracket introduces recursion over
        // a sub-expression to define a sub-list.
        return "(" === token
            ? (() => {
                const [expr, rest] = parseExpr(ts);

                return [xs.concat([expr]), rest.slice(1)];
            })()
            : ")" === token
                ? [xs, token]
                : [xs.concat(atom(token)), ts];
    };

    // ------------------- ATOM PARSER -------------------

    // atom :: String -> Expr
    const atom = s =>
        0 < s.length
            ? isNaN(s)
                ? "\"'".includes(s[0])
                    ? s.slice(1, -1)
                    : {name: s}
                : parseFloat(s, 10)
            : "";


    // ------------------ TOKENIZATION -------------------

    // tokenized :: String -> [String]
    const tokenized = s =>
        // Brackets and quoted or unquoted atomic strings.
        quoteTokens("\"")(s).flatMap(
            segment => "\"" !== segment[0]
                ? segment.replace(/([()])/gu, " $1 ")
                .split(/\s+/u)
                .filter(Boolean)
                : [segment]
        );


    // quoteTokens :: Char -> String -> [String]
    const quoteTokens = q =>
        // Alternating unquoted and quoted segments.
        s => s.split(q).flatMap(
            (k, i) => even(i)
                ? 0 < k.length
                    ? [k]
                    : []
                : [`${q}${k}${q}`]
        );

    // ------------------ SERIALIZATION ------------------

    // serialized :: Expr -> String
    const serialized = e => {
        const t = typeof e;

        return "number" === t
            ? `${e}`
            : "string" === t
                ? `"${e}"`
                : "object" === t
                    ? Array.isArray(e)
                        ? `(${e.map(serialized).join(" ")})`
                        : e.name
                    : "?";
    };


    // --------------------- GENERIC ---------------------

    // even :: Int -> Bool
    const even = n =>
        // True if 2 is a factor of n.
        0 === n % 2;


    // until :: (a -> Bool) -> (a -> a) -> a -> a
    const until = p =>
        // The value resulting from repeated applications
        // of f to the seed value x, terminating when
        // that result returns true for the predicate p.
        f => {
            const go = x =>
                p(x)
                    ? x
                    : go(f(x));

            return go;
        };

    return main();
})();

Output:

[
  [
    [
      {
        "name": "data"
      },
      "quoted data",
      123,
      4.5
    ],
    [
      {
        "name": "data"
      },
      [
        {
          "name": "!@#"
        },
        [
          4.5
        ],
        "(more",
        "data)"
      ]
    ]
  ]
]

Reserialized from parse:

((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

jq

Works with: jq

Also works with gojq, the Go implementation of jq This entry is based on a Parsing Expression Grammar (PEG) for S-expressions. The idea is to pass a JSON object `{remainder:_, result:_ }` through a jq pipeline corresponding to a PEG for S-expressions, consuming the text in `.remainder` and building up `.result`. For further details about this approach, see e.g. jq as a PEG Engine.

# PEG infrastructure
def star(E): ((E | star(E)) // .) ;

### Helper functions:
# Consume a regular expression rooted at the start of .remainder, or emit empty;
# on success, update .remainder and set .match but do NOT update .result
def consume($re):
  # on failure, match yields empty
  (.remainder | match("^" + $re)) as $match
  | .remainder |= .[$match.length :]
  | .match = $match.string;

def parse($re):
  consume($re)
  | .result = .result + [.match] ;

def parseNumber($re):
  consume($re)
  | .result = .result + [.match|tonumber] ;

def eos: select(.remainder == "");

# whitespace
def ws: consume("[ \t\r\n]*");

def box(E):
  ((.result = null) | E) as $e
  | .remainder = $e.remainder
  | .result += [$e.result]  # the magic sauce
  ;

# S-expressions

# Input: a string
# Output: an array representation of the input if it is an S-expression
def SExpression:
  def string:     consume("\"") | parse("[^\"]") | consume("\"");
  def identifier: parse("[^ \t\n\r()]+");
  def decimal:    parseNumber("[0-9]+([.][0-9]*)?");
  def hex:        parse("0x[0-9A-Fa-f]+") ;
  def number:     hex // decimal;
  def atom:       ws | (string // number // identifier);

  def SExpr:      ws | consume("[(]") | ws | box(star(atom // SExpr)) | consume("[)]");

  {remainder: .} | SExpr | ws | eos | .result;

SExpression

Invocation:

cat << EOF |
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
EOF
jq -Rsc -f s-expression.jq

Output:

[[["data","\"quoted","data\"",123,4.5],["data",["!@#",[4.5],"\"",["more\"","\"data"],"\""]]]]

Julia

function rewritequotedparen(s)
    segments = split(s, "\"")
    for i in 1:length(segments)
        if i & 1 == 0  # even i
            ret = replace(segments[i], r"\(", s"_O_PAREN")
            segments[i] = replace(ret, r"\)", s"_C_PAREN")
        end
    end
    join(segments, "\"")
end

function reconsdata(n, s)
    if n > 1
        print(" ")
    end
    if s isa String && ismatch(r"[\$\%\!\$\#]", s) == false
        print("\"$s\"")
    else
        print(s)
    end
end

function printAny(anyarr)
    print("(")
    for (i, el) in enumerate(anyarr)
        if el isa Array
            print("(")
            for (j, el2) in enumerate(el)
                if el2 isa Array
                    print("(")
                    for(k, el3) in enumerate(el2)
                        if el3 isa Array
                            print(" (")
                            for(n, el4) in enumerate(el3)
                                reconsdata(n, el4)
                            end
                            print(")")
                        else
                            reconsdata(k, el3)
                        end
                    end
                    print(")")             
                else
                    reconsdata(j, el2)
                end
            end
            if i == 1
                print(")\n ")
            else
                print(")")
            end
        end
    end
    println(")")
end

removewhitespace(s) = replace(replace(s, r"\n", " "), r"^\s*(\S.*\S)\s*$", s"\1")
quote3op(s) = replace(s, r"([\$\!\@\#\%]{3})", s"\"\1\"")
paren2bracket(s) = replace(replace(s, r"\(", s"["), r"\)", s"]")
data2symbol(s) = replace(s, "[data", "[:data")
unrewriteparens(s) = replace(replace(s, "_C_PAREN", ")"), "_O_PAREN", "(")
addcommas(s) = replace(replace(s, r"\]\s*\[", "],["), r" (?![a-z])", ",")

inputstring = """
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
 """

println("The input string is:\n", inputstring)
processed = (inputstring |> removewhitespace |> rewritequotedparen |> quote3op
                        |> paren2bracket |> data2symbol |> unrewriteparens |> addcommas)
nat = eval(parse("""$processed"""))
println("The processed native structure is:\n", nat)
println("The reconstructed string is:\n"), printAny(nat)

Output:


The input string is:

   ((data "quoted data" 123 4.5)
    (data (!@# (4.5) "(more" "data)")))

The processed native structure is:
Array{Any,1}[Any[:data, "quoted data", 123, 4.5], Any[:data, Any["!@#", [4.5], "(more", "data)"]]]
The reconstructed string is:

   ((data "quoted data" 123 4.5)
    (data(!@# (4.5) "(more" "data)")))

Kotlin

Translation of: JavaScript

// version 1.2.31

const val INDENT = 2

fun String.parseSExpr(): List<String>? {
    val r = Regex("""\s*("[^"]*"|\(|\)|"|[^\s()"]+)""")
    val t = r.findAll(this).map { it.value }.toMutableList()
    if (t.size == 0) return null
    var o = false
    var c = 0
    for (i in t.size - 1 downTo 0) {
        val ti = t[i].trim()
        val nd = ti.toDoubleOrNull()
        if (ti == "\"") return null
        if (ti == "(") {
            t[i] = "["
            c++
        }
        else if (ti == ")") {
            t[i] = "]"
            c--
        }
        else if (nd != null) {
             val ni = ti.toIntOrNull()
             if (ni != null) t[i] = ni.toString()
             else t[i] = nd.toString()
        }
        else if (ti.startsWith("\"")) { // escape embedded double quotes
             var temp = ti.drop(1).dropLast(1)
             t[i] = "\"" + temp.replace("\"", "\\\"") + "\""
        }
        if (i > 0 && t[i] != "]" && t[i - 1].trim() != "(") t.add(i, ", ")
        if (c == 0) {
            if (!o) o = true else return null
        }
    }
    return if (c != 0) null else t
}

fun MutableList<String>.toSExpr(): String {
    for (i in 0 until this.size) {
        this[i] = when (this[i]) {
            "["  -> "("
            "]"  -> ")"
            ", " -> " "
            else ->  {
                if (this[i].startsWith("\"")) { // unescape embedded quotes
                    var temp = this[i].drop(1).dropLast(1)
                    "\"" + temp.replace("\\\"", "\"") + "\""
                }
                else this[i]
            }
        }
    }
    return this.joinToString("")
}

fun List<String>.prettyPrint() {
    var level = 0
    loop@for (t in this) {
        var n: Int
        when(t) {
            ", ", " " -> continue@loop
            "[", "(" -> {
                n = level * INDENT + 1
                level++
             }
             "]", ")" -> {
                level--
                n = level * INDENT + 1
             }
             else -> {
                n = level * INDENT + t.length
             }
        }
        println("%${n}s".format(t))
    }
}

fun main(args: Array<String>) {
    val str = """((data "quoted data" 123 4.5)""" + "\n" +
              """ (data (!@# (4.5) "(more" "data)")))"""  
    val tokens = str.parseSExpr()
    if (tokens == null)
        println("Invalid s-expr!")
    else {
        println("Native data structure:")
        println(tokens.joinToString(""))
        println("\nNative data structure (pretty print):")    
        tokens.prettyPrint()

        val tokens2 = tokens.toMutableList()
        println("\nRecovered S-Expression:")
        println(tokens2.toSExpr())
        println("\nRecovered S-Expression (pretty print):")
        tokens2.prettyPrint()
    }
}

Output:

Native data structure:
[[data, "quoted data", 123, 4.5], [data, [!@#, [4.5], "(more", "data)"]]]

Native data structure (pretty print):
[
  [
    data
    "quoted data"
    123
    4.5
  ]
  [
    data
    [
      !@#
      [
        4.5
      ]
      "(more"
      "data)"
    ]
  ]
]

Recovered S-Expression:
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Recovered S-Expression (pretty print):
(
  (
    data
    "quoted data"
    123
    4.5
  )
  (
    data
    (
      !@#
      (
        4.5
      )
      "(more"
      "data)"
    )
  )
)

Lua

This uses LPeg, a parsing expression grammar library written by one the authors of Lua. Tested with Lua 5.3.5 and LPeg 1.0.2-1.

lpeg = require 'lpeg' -- see http://www.inf.puc-rio.br/~roberto/lpeg/

imports = 'P R S C V match'
for w in imports:gmatch('%a+') do _G[w] = lpeg[w] end -- make e.g. 'lpeg.P' function available as 'P'

function tosymbol(s) return s end 
function tolist(x, ...) return {...} end -- ignore the first capture, the whole sexpr
    
ws = S' \t\n'^0                 -- whitespace, 0 or more

digits = R'09'^1                -- digits, 1 or more
Tnumber = C(digits * (P'.' * digits)^-1) * ws / tonumber -- ^-1 => at most 1

Tstring = C(P'"' * (P(1) - P'"')^0 * P'"') * ws

sep = S'()" \t\n'
symstart = (P(1) - (R'09' + sep))
symchar = (P(1) - sep)
Tsymbol = C(symstart * symchar^0) * ws / tosymbol

atom = Tnumber + Tstring + Tsymbol
lpar = P'(' * ws
rpar = P')' * ws
sexpr = P{ -- defines a recursive pattern
    'S';
    S = ws * lpar * C((atom + V'S')^0) * rpar / tolist
}

Now to use the sexpr pattern:

eg_input = [[
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
]]

eg_produced = match(sexpr, eg_input)

eg_expected = { -- expected Lua data structure of the reader (lpeg.match)
    {'data', '"quoted data"', 123, 4.5},
    {'data', {'!@#', {4.5}, '"(more"', '"data)"'}}
}

function check(produced, expected)
    assert(type(produced) == type(expected))
    if type(expected) == 'table' then -- i.e. a list
        assert(#produced == #expected)
        for i = 1, #expected do check(produced[i], expected[i]) end
    else
        assert(produced == expected)
    end
end

check(eg_produced, eg_expected)
print("checks out!") -- won't get here if any <i>check()</i> assertion fails

And here's the pretty printer, whose output looks like all the others:

function pprint(expr, indent)
    local function prindent(fmt, expr)
        io.write(indent) -- no line break
        print(string.format(fmt, expr))
    end
    if type(expr) == 'table' then
        if #expr == 0 then
            prindent('()')
        else
            prindent('(')
            local indentmore = '  ' .. indent
            for i= 1,#expr do pprint(expr[i], indentmore) end
            prindent(')')
        end
    elseif type(expr) == 'string' then
        if expr:sub(1,1) == '"' then
            prindent("%q", expr:sub(2,-2)) -- print as a Lua string
        else
            prindent("%s", expr) -- print as a symbol
        end
    else
        prindent("%s", expr)
    end
end

pprint(eg_expected, '')

Nim

import strutils

const Input = """
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
"""

type
  TokenKind = enum
    tokInt, tokFloat, tokString, tokIdent
    tokLPar, tokRPar
    tokEnd
  Token = object
    case kind: TokenKind
    of tokString: stringVal: string
    of tokInt: intVal: int
    of tokFloat: floatVal: float
    of tokIdent: ident: string
    else: discard

proc lex(input: string): seq[Token] =
  var pos = 0

  template current: char =
    if pos < input.len: input[pos]
    else: '\x00'

  while pos < input.len:
    case current
    of ';':
      inc(pos)
      while current notin {'\r', '\n'}:
        inc(pos)
      if current == '\r': inc(pos)
      if current == '\n': inc(pos)
    of '(': inc(pos); result.add(Token(kind: tokLPar))
    of ')': inc(pos); result.add(Token(kind: tokRPar))
    of '0'..'9':
      var
        num = ""
        isFloat = false
      while current in Digits:
        num.add(current)
        inc(pos)
      if current == '.':
        num.add(current)
        isFloat = true
        inc(pos)
        while current in Digits:
          num.add(current)
          inc(pos)
      result.add(if isFloat: Token(kind: tokFloat, floatVal: parseFloat(num))
                 else: Token(kind: tokInt, intVal: parseInt(num)))
    of ' ', '\t', '\n', '\r': inc(pos)
    of '"':
      var str = ""
      inc(pos)
      while current != '"':
        str.add(current)
        inc(pos)
      inc(pos)
      result.add(Token(kind: tokString, stringVal: str))
    else:
      const BannedChars = {' ', '\t', '"', '(', ')', ';'}
      var ident = ""
      while current notin BannedChars:
        ident.add(current)
        inc(pos)
      result.add(Token(kind: tokIdent, ident: ident))
  result.add(Token(kind: tokEnd))

type
  SExprKind = enum
    seInt, seFloat, seString, seIdent, seList
  SExpr = ref object
    case kind: SExprKind
    of seInt: intVal: int
    of seFloat: floatVal: float
    of seString: stringVal: string
    of seIdent: ident: string
    of seList: children: seq[SExpr]
  ParseError = object of CatchableError

proc `$`*(se: SExpr): string =
  case se.kind
  of seInt: result = $se.intVal
  of seFloat: result = $se.floatVal
  of seString: result = '"' & se.stringVal & '"'
  of seIdent: result = se.ident
  of seList:
    result = "("
    for i, ex in se.children:
      if ex.kind == seList and ex.children.len > 1:
        result.add("\n")
        result.add(indent($ex, 2))
      else:
        if i > 0:
          result.add(" ")
        result.add($ex)
    result.add(")")

var
  tokens = lex(Input)
  pos = 0

template current: Token =
  if pos < tokens.len: tokens[pos]
  else: Token(kind: tokEnd)

proc parseInt(token: Token): SExpr =
  result = SExpr(kind: seInt, intVal: token.intVal)

proc parseFloat(token: Token): SExpr =
  result = SExpr(kind: seFloat, floatVal: token.floatVal)

proc parseString(token: Token): SExpr =
  result = SExpr(kind: seString, stringVal: token.stringVal)

proc parseIdent(token: Token): SExpr =
  result = SExpr(kind: seIdent, ident: token.ident)

proc parse(): SExpr

proc parseList(): SExpr =
  result = SExpr(kind: seList)
  while current.kind notin {tokRPar, tokEnd}:
    result.children.add(parse())
  if current.kind == tokEnd:
    raise newException(ParseError, "Missing right paren ')'")
  else:
    inc(pos)

proc parse(): SExpr =
  var token = current
  inc(pos)
  result =
    case token.kind
    of tokInt: parseInt(token)
    of tokFloat: parseFloat(token)
    of tokString: parseString(token)
    of tokIdent: parseIdent(token)
    of tokLPar: parseList()
    else: nil

echo parse()

Output:

(
  (data "quoted data" 123 4.5)
  (data
    (!@# (4.5) "(more" "data)")))

OCaml

You may be interested in this chapter of the book Real World OCaml.

The file SExpr.mli containing the interface:

(** This module is a very simple parsing library for S-expressions. *)
(* Copyright (C) 2009  Florent Monnier, released under MIT license. *)

type sexpr = Atom of string | Expr of sexpr list
(** the type of S-expressions *)

val parse_string : string -> sexpr list
(** parse from a string *)

val parse_ic : in_channel -> sexpr list
(** parse from an input channel *)

val parse_file : string -> sexpr list
(** parse from a file *)

val parse : (unit -> char option) -> sexpr list
(** parse from a custom function, [None] indicates the end of the flux *)

val print_sexpr : sexpr list -> unit
(** a dump function for the type [sexpr] *)

val print_sexpr_indent : sexpr list -> unit
(** same than [print_sexpr] but with indentation *)

val string_of_sexpr : sexpr list -> string
(** convert an expression of type [sexpr] into a string *)

val string_of_sexpr_indent : sexpr list -> string
(** same than [string_of_sexpr] but with indentation *)

The file SExpr.ml containing the implementation:

(** This module is a very simple parsing library for S-expressions. *)
(* Copyright (C) 2009  Florent Monnier, released under MIT license. *)
(* modified to match the task description *)

type sexpr = Atom of string | Expr of sexpr list

type state =
  | Parse_root of sexpr list
  | Parse_content of sexpr list
  | Parse_word of Buffer.t * sexpr list
  | Parse_string of bool * Buffer.t * sexpr list

let parse pop_char =
  let rec aux st =
    match pop_char() with
    | None ->
        begin match st with
        | Parse_root sl -> (List.rev sl)
        | Parse_content _
        | Parse_word _
        | Parse_string _ ->
            failwith "Parsing error: content not closed by parenthesis"
        end
    | Some c ->
        match c with
        | '(' ->
            begin match st with
            | Parse_root sl ->
                let this = aux(Parse_content []) in
                aux(Parse_root((Expr this)::sl))
            | Parse_content sl ->
                let this = aux(Parse_content []) in
                aux(Parse_content((Expr this)::sl))
            | Parse_word(w, sl) ->
                let this = aux(Parse_content []) in
                aux(Parse_content((Expr this)::Atom(Buffer.contents w)::sl))
            | Parse_string(_, s, sl) ->
                Buffer.add_char s c;
                aux(Parse_string(false, s, sl))
            end
        | ')' ->
            begin match st with
            | Parse_root sl ->
                failwith "Parsing error: closing parenthesis without openning"
            | Parse_content sl -> (List.rev sl)
            | Parse_word(w, sl) -> List.rev(Atom(Buffer.contents w)::sl)
            | Parse_string(_, s, sl) ->
                Buffer.add_char s c;
                aux(Parse_string(false, s, sl))
            end
        | ' ' | '\n' | '\r' | '\t' ->
            begin match st with
            | Parse_root sl -> aux(Parse_root sl)
            | Parse_content sl -> aux(Parse_content sl)
            | Parse_word(w, sl) -> aux(Parse_content(Atom(Buffer.contents w)::sl))
            | Parse_string(_, s, sl) ->
                Buffer.add_char s c;
                aux(Parse_string(false, s, sl))
            end
        | '"' ->
            (* '"' *)
            begin match st with
            | Parse_root _ -> failwith "Parse error: double quote at root level"
            | Parse_content sl ->
                let s = Buffer.create 74 in
                aux(Parse_string(false, s, sl))
            | Parse_word(w, sl) ->
                let s = Buffer.create 74 in
                aux(Parse_string(false, s, Atom(Buffer.contents w)::sl))
            | Parse_string(true, s, sl) ->
                Buffer.add_char s c;
                aux(Parse_string(false, s, sl))
            | Parse_string(false, s, sl) ->
                aux(Parse_content(Atom(Buffer.contents s)::sl))
            end
        | '\\' ->
            begin match st with
            | Parse_string(true, s, sl) ->
                Buffer.add_char s c;
                aux(Parse_string(false, s, sl))
            | Parse_string(false, s, sl) ->
                aux(Parse_string(true, s, sl))
            | _ ->
                failwith "Parsing error: escape character in wrong place"
            end
        | _ ->
            begin match st with
            | Parse_root _ ->
                failwith(Printf.sprintf "Parsing error: char '%c' at root level" c)
            | Parse_content sl ->
                let w = Buffer.create 16 in
                Buffer.add_char w c;
                aux(Parse_word(w, sl))
            | Parse_word(w, sl) ->
                Buffer.add_char w c;
                aux(Parse_word(w, sl))
            | Parse_string(_, s, sl) ->
                Buffer.add_char s c;
                aux(Parse_string(false, s, sl))
            end
  in
  aux (Parse_root [])


let string_pop_char str =
  let len = String.length str in
  let i = ref(-1) in
  (function () -> incr i; if !i >= len then None else Some(str.[!i]))


let parse_string str =
  parse (string_pop_char str)


let ic_pop_char ic =
  (function () ->
     try Some(input_char ic)
     with End_of_file -> (None))


let parse_ic ic =
  parse (ic_pop_char ic)


let parse_file filename =
  let ic = open_in filename in
  let res = parse_ic ic in
  close_in ic;
  (res)


let quote s =
  "\"" ^ s ^ "\""

let needs_quote s =
  List.exists (String.contains s) [' '; '\n'; '\r'; '\t'; '('; ')']

let protect s =
  let s = String.escaped s in
  if needs_quote s then quote s else s


let string_of_sexpr s =
  let rec aux acc = function
  | (Atom tag)::tl -> aux ((protect tag)::acc) tl
  | (Expr e)::tl ->
      let s =
        "(" ^
        (String.concat " " (aux [] e))
        ^ ")"
      in
      aux (s::acc) tl
  | [] -> (List.rev acc)
  in
  String.concat " " (aux [] s)


let print_sexpr s =
  print_endline (string_of_sexpr s)


let string_of_sexpr_indent s =
  let rec aux i acc = function
  | (Atom tag)::tl -> aux i ((protect tag)::acc) tl
  | (Expr e)::tl ->
      let s =
        "\n" ^ (String.make i ' ') ^ "(" ^
        (String.concat " " (aux (succ i) [] e))
        ^ ")"
      in
      aux i (s::acc) tl
  | [] -> (List.rev acc)
  in
  String.concat "\n" (aux 0 [] s)


let print_sexpr_indent s =
  print_endline (string_of_sexpr_indent s)

Then we compile this small module and test it in the interactive loop:

$ ocamlc -c SExpr.mli
$ ocamlc -c SExpr.ml
$ ocaml SExpr.cmo 
        Objective Caml version 3.11.2

# open SExpr ;;

# let s = read_line () ;;
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))
val s : string =
  "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))"

# let se = SExpr.parse_string s ;;
val se : SExpr.sexpr list =
  [Expr
    [Expr [Atom "data"; Atom "quoted data"; Atom "123"; Atom "4.5"];
     Expr
      [Atom "data";
       Expr [Atom "!@#"; Expr [Atom "4.5"]; Atom "(more"; Atom "data)"]]]]

# SExpr.print_sexpr se ;;
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))
- : unit = ()

# SExpr.print_sexpr_indent se ;;

(
 (data "quoted data" 123 4.5) 
 (data 
  (!@# 
   (4.5) "(more" "data)")))
- : unit = ()

Perl

#!/usr/bin/perl -w
use strict;
use warnings;

sub sexpr
{
	my @stack = ([]);
	local $_ = $_[0];

	while (m{
		\G    # start match right at the end of the previous one
		\s*+  # skip whitespaces
		# now try to match any of possible tokens in THIS order:
		(?<lparen>\() |
		(?<rparen>\)) |
		(?<FLOAT>[0-9]*+\.[0-9]*+) |
		(?<INT>[0-9]++) |
		(?:"(?<STRING>([^\"\\]|\\.)*+)") |
		(?<IDENTIFIER>[^\s()]++)
		# Flags:
		#  g = match the same string repeatedly
		#  m = ^ and $ match at \n
		#  s = dot and \s matches \n
		#  x = allow comments within regex
		}gmsx)
	{
		die "match error" if 0+(keys %+) != 1;

		my $token = (keys %+)[0];
		my $val = $+{$token};

		if ($token eq 'lparen') {
			my $a = [];
			push @{$stack[$#stack]}, $a;
			push @stack, $a;
		} elsif ($token eq 'rparen') {
			pop @stack;
		} else {
			push @{$stack[$#stack]}, bless \$val, $token;
		}
	}
	return $stack[0]->[0];
}

sub quote
{ (local $_ = $_[0]) =~ /[\s\"\(\)]/s ? do{s/\"/\\\"/gs; qq{"$_"}} : $_; }
 
sub sexpr2txt
{
	qq{(@{[ map {
		ref($_) eq '' ? quote($_) :
		ref($_) eq 'STRING' ? quote($$_) :
		ref($_) eq 'ARRAY' ? sexpr2txt($_) : $$_
	} @{$_[0]} ]})}
}

Check:

my $s = sexpr(q{

((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

});

# Dump structure
use Data::Dumper;
print Dumper $s;

# Convert back
print sexpr2txt($s)."\n";

Output:

$VAR1 = [
          [
            bless( do{\(my $o = 'data')}, 'IDENTIFIER' ),
            bless( do{\(my $o = 'quoted data')}, 'STRING' ),
            bless( do{\(my $o = '123')}, 'INT' ),
            bless( do{\(my $o = '4.5')}, 'FLOAT' )
          ],
          [
            bless( do{\(my $o = 'data')}, 'IDENTIFIER' ),
            [
              bless( do{\(my $o = '!@#')}, 'IDENTIFIER' ),
              [
                bless( do{\(my $o = '4.5')}, 'FLOAT' )
              ],
              bless( do{\(my $o = '(more')}, 'STRING' ),
              bless( do{\(my $o = 'data)')}, 'STRING' )
            ]
          ]
        ];
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Phix

The distinction between a symbol data and a quoted string "data" is simple: both are represented as strings, with the symbol being held as "data" and the quoted string being held as "\"data\"", and you can test for the latter by seeing if the first character is a double quote. Internally, it is easy to differentiate between a symbol (held as a string) and a number, but that may not be clear on the display: 4e-5 and 4-e5 may appear similar but the latter is probably a parse failure. It may be more sensible for get_term() to raise an error if the scanf fails, than assume it is a symbol like it does now.

with javascript_semantics
constant s_expr_str = """
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))"""
 
function skip_spaces(string s, integer sidx)
    while sidx<=length(s) and find(s[sidx],{' ','\t','\r','\n'}) do sidx += 1 end while
    return sidx
end function
 
function get_term(string s, integer sidx)
-- get a single quoted string, symbol, or number.
    integer ch = s[sidx]
    string res = ""
    if ch='\"' then
        res &= ch
        while 1 do
            sidx += 1
            ch = s[sidx]
            res &= ch
            if ch='\\' then
                sidx += 1
                ch = s[sidx]
                res &= ch
            elsif ch='\"' then
                sidx += 1
                exit
            end if
        end while
    else
        integer asnumber = (ch>='0' and ch<='9')
        while not find(ch,{')',' ','\t','\r','\n'}) do
            res &= ch
            sidx += 1
            if sidx>length(s) then exit end if
            ch = s[sidx]
        end while
        if asnumber then
            sequence scanres = scanf(res,"%f")
            if length(scanres)=1 then return {scanres[1][1],sidx} end if
            -- error? (failed to parse number)
        end if
    end if
    return {res,sidx}
end function
 
function parse_s_expr(string s, integer sidx)
    integer ch = s[sidx]
    sequence res = {}
    object element
    if ch!='(' then ?9/0 end if
    sidx += 1
    while 1 do
        sidx = skip_spaces(s,sidx)
        -- error? (if past end of string/missing ')')
        ch = s[sidx]
        if ch=')' then exit end if
        if ch='(' then
            {element,sidx} = parse_s_expr(s,sidx)
        else
            {element,sidx} = get_term(s,sidx)
        end if
        res = append(res,element)
    end while
    sidx = skip_spaces(s,sidx+1)
    return {res,sidx}
end function
 
sequence s_expr
integer sidx
{s_expr,sidx} = parse_s_expr(s_expr_str,1)
if sidx<=length(s_expr_str) then
    printf(1,"incomplete parse(\"%s\")\n",{s_expr_str[sidx..$]})
end if
 
puts(1,"\nThe string:\n")
?s_expr_str
 
puts(1,"\nDefault pretty printing:\n")
--?s_expr
pp(s_expr)
 
puts(1,"\nBespoke pretty printing:\n")
--ppEx(s_expr,{pp_Nest,1,pp_StrFmt,-1,pp_IntCh,false,pp_Brkt,"()"})
ppEx(s_expr,{pp_Nest,4,pp_StrFmt,-1,pp_IntCh,false,pp_Brkt,"()"})

Output:

The string:
"((data \"quoted data\" 123 4.5)\n (data (!@# (4.5) \"(more\" \"data)\")))"

Default pretty printing:
{{"data", "\"quoted data\"", 123'{',4.5},
 {"data", {"!@#", {4.5}, "\"(more\"", "\"data)\""}}}

Bespoke pretty printing:
((data,
  "quoted data",
  123,
  4.5),
 (data,
  (!@#,
   (4.5),
   "(more",
   "data)")))

PicoLisp

The 'any' function parses an s-expression from a string (indentical to the way 'read' does this from an input stream).

: (any "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))")
-> ((data "quoted data" 123 5) (data (!@# (5) "(more" "data)")))

: (view @)
+---+-- data
|   |
|   +-- "quoted data"
|   |
|   +-- 123
|   |
|   +-- 5
|
+---+-- data
    |
    +---+-- !@#
        |
        +---+-- 5
        |
        +-- "(more"
        |
        +-- "data)"

Implementing a subset of 'any' explicitly:

(de readSexpr ()
   (case (skip)
      ("(" (char) (readList))
      ("\"" (char) (readString))
      (T (readAtom)) ) ) )

(de readList ()
   (make
      (loop
         (NIL (skip))
         (T (= @ ")") (char))
         (link (readSexpr)) ) ) )

(de readString ()
   (pack
      (make
         (until (= "\"" (or (peek) (quit "Unterminated string")))
            (link (char)) )
         (char) ) ) )

(de readAtom ()
   (let X
      (make
         (until (or (sp? (peek)) (member (peek) '("(" ")")))
            (link (char)) ) )
      (or (format X) (intern (pack X))) ) )

It can be used in a pipe to read from a string:

: (pipe (prin "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))") (readSexpr))
-> ((data "quoted data" 123 5) (data (!@# (5) "(more" "data)")))

'sym' does the reverse (i.e. builds a symbol (string) from an expression).

: (sym @@)
-> "((data \"quoted data\" 123 5) (data (!@# (5) \"(more\" \"data)\")))"

Implementing a subset of the built-in printer:

(de printSexpr (Expr Fun)
   (cond
      ((pair Expr)
         (Fun "(")
         (printSexpr (car Expr) Fun)
         (for X (cdr Expr)
            (Fun " ")
            (printSexpr X Fun) )
         (Fun ")") )
      ((str? Expr)
         (Fun "\"")
         (mapc Fun (chop Expr))
         (Fun "\"") )
      (T (mapc Fun (chop Expr))) ) )

This can be used for plain printing

: (printSexpr
   '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))
   prin )
((data "quoted data" 123 5) (data (!@# (5) "(more" "data)")))

or to collect the characters into a string:

: (pack
   (make
      (printSexpr
         '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))
         link ) ) )
-> "((data \"quoted data\" 123 5) (data (!@# (5) \"(more\" \"data)\")))"

Pike

class Symbol(string name)
{ 
    string _sprintf(int type)
    { 
        switch(type)
        { 
            case 's': return name; 
            case 'O': return sprintf("(Symbol: %s)", name||"");
            case 'q': return name; 
            case 't': return "Symbol";
            default:  return sprintf("%"+int2char(type), name);
        } 
    }

    mixed cast(string type)
    { 
        switch(type)
        { 
            case "string": return name;
            default: throw(sprintf("can not cast 'Symbol' to '%s'", type)); 
        }  
    }
}

mixed value(string token)
{
    if ((string)(int)token==token)
        return (int)token;
    array result = array_sscanf(token, "%f%s");
    if (sizeof(result) && floatp(result[0]) && ! sizeof(result[1]))
        return result[0];
    else
        return Symbol(token);
}

array tokenizer(string input)
{
    array output = ({}); 
    for(int i=0; i<sizeof(input); i++)
    { 
        switch(input[i])
        { 
            case '(': output+= ({"("}); break; 
            case ')': output += ({")"}); break; 
            case '"': //"
                      output+=array_sscanf(input[++i..], "%s\"%[ \t\n]")[0..0]; 
                      i+=sizeof(output[-1]); 
                      break; 
            case ' ': 
            case '\t': 
            case '\n': break; 
            default: string token = array_sscanf(input[i..], "%s%[) \t\n]")[0]; 
                     output+=({ value(token) }); 
                     i+=sizeof(token)-1; 
                     break; 
        }
    }
    return output;
}

// this function is based on the logic in Parser.C.group() in the pike library;
array group(array tokens)
{
    ADT.Stack stack=ADT.Stack();
    array ret =({});

    foreach(tokens;; string token)
    {
        switch(token)
        {
            case "(": stack->push(ret); ret=({}); break;
            case ")":
                    if (!sizeof(ret) || !stack->ptr) 
                    {
                      // Mismatch
                        werror ("unmatched close parenthesis\n");
                        return ret;
                    }
                    ret=stack->pop()+({ ret }); 
                    break;
            default: ret+=({token}); break;
        }
    }
    return ret;
}

string sexp(array input)
{
    array output = ({});
    foreach(input;; mixed item)
    {
        if (arrayp(item))
            output += ({ sexp(item) });
        else if (intp(item))
            output += ({ sprintf("%d", item) });
        else if (floatp(item))
            output += ({ sprintf("%f", item) });
        else
            output += ({ sprintf("%q", item) });
    }
    return "("+output*" "+")";
}

string input = "((data \"quoted data\" 123 4.5)\n (data (!@# (4.5) \"(more\" \"data)\")))";
array data = group(tokenizer(input))[0];
string output = sexp(data);

Output:

((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

({ ({ (Symbol: data), "quoted data", 123, 4.5 }), ({ (Symbol: data), ({ (Symbol: !@#), ({ 4.5 }), "(more", "data)" }) }) })

((data "quoted data" 123 4.5) (data (!@# (45) "(more" "data)")))

Potion

How values are stored: Tuples for list, integers for integers, floats for floats, strings for symbols, quoted strings for strings. This implementation is not the most elegant/succinct or practical (it's trusty and has no real error handling).

isdigit = (c): 47 < c ord and c ord < 58.
iswhitespace = (c): c ord == 10 or c ord == 13 or c == " ".

# str: a string of the form "...<nondigit>[{<symb>}]..."
# i: index to start at (must be the index of <nondigit>)
# => returns (<the symbol as a string>, <index after the last char>)
parsesymbol = (str, i) :
   datum = ()
   while (str(i) != "(" and str(i) != ")" and not iswhitespace(str(i)) and str(i) != "\"") :
      datum append(str(i++))
   .
   (datum join, i)
.

# str: a string of the form "...[<minus>]{<digit>}[<dot>{<digit>}]..."
# i: index to start at (must be the index of the first token)
# => returns (<float or int>, <index after the last digit>)
parsenumber = (str, i) :
   datum = ()
   dot = false
   while (str(i) != "(" and str(i) != ")" and not iswhitespace(str(i)) and str(i) != "\"") :
      if (str(i) == "."): dot = true.
      datum append(str(i++))
   .
   if (dot): (datum join number, i).
   else: (datum join number integer, i).
.

# str: a string of the form "...\"....\"..."
# i: index to start at (must be the index of the first quote)
# => returns (<the string>, <index after the last quote>)
parsestring = (str, i) :
   datum = ("\"")
   while (str(++i) != "\"") :
      datum append(str(i))
   .
   datum append("\"")
   (datum join, ++i)
.

# str: a string of the form "...(...)..."
# i: index to start at
# => returns (<tuple/list>, <index after the last paren>)
parselist = (str, i) :
   lst = ()
   data = ()
   while (str(i) != "("): i++.
   i++
   while (str(i) != ")") :
      if (not iswhitespace(str(i))) :
         if (isdigit(str(i)) or (str(i) == "-" and isdigit(str(i + 1)))): data = parsenumber(str, i).
         elsif (str(i) == "\""): data = parsestring(str, i).
         elsif (str(i) == "("): data = parselist(str, i).
         else: data = parsesymbol(str, i).
         lst append(data(0))
         i = data(1)
      . else :
         ++i
      .
   .
   (lst, ++i)
.

parsesexpr = (str) :
   parselist(str, 0)(0)
.

parsesexpr("(define (factorial x) \"compute factorial\" (version 2.0) (apply * (range 1 x)))") string print
"\n" print
parsesexpr("((data \"quoted data\" 123 4.5)
 (data (!@# (4.5) \"(more\" \"data)\")))") string print
"\n" print

Python

Procedural

import re

dbg = False

term_regex = r'''(?mx)
    \s*(?:
        (?P<brackl>\()|
        (?P<brackr>\))|
        (?P<num>\-?\d+\.\d+|\-?\d+)|
        (?P<sq>"[^"]*")|
        (?P<s>[^(^)\s]+)
       )'''

def parse_sexp(sexp):
    stack = []
    out = []
    if dbg: print("%-6s %-14s %-44s %-s" % tuple("term value out stack".split()))
    for termtypes in re.finditer(term_regex, sexp):
        term, value = [(t,v) for t,v in termtypes.groupdict().items() if v][0]
        if dbg: print("%-7s %-14s %-44r %-r" % (term, value, out, stack))
        if   term == 'brackl':
            stack.append(out)
            out = []
        elif term == 'brackr':
            assert stack, "Trouble with nesting of brackets"
            tmpout, out = out, stack.pop(-1)
            out.append(tmpout)
        elif term == 'num':
            v = float(value)
            if v.is_integer(): v = int(v)
            out.append(v)
        elif term == 'sq':
            out.append(value[1:-1])
        elif term == 's':
            out.append(value)
        else:
            raise NotImplementedError("Error: %r" % (term, value))
    assert not stack, "Trouble with nesting of brackets"
    return out[0]

def print_sexp(exp):
    out = ''
    if type(exp) == type([]):
        out += '(' + ' '.join(print_sexp(x) for x in exp) + ')'
    elif type(exp) == type('') and re.search(r'[\s()]', exp):
        out += '"%s"' % repr(exp)[1:-1].replace('"', '\"')
    else:
        out += '%s' % exp
    return out
        
    
if __name__ == '__main__':
    sexp = ''' ( ( data "quoted data" 123 4.5)
         (data (123 (4.5) "(more" "data)")))'''

    print('Input S-expression: %r' % (sexp, ))
    parsed = parse_sexp(sexp)
    print("\nParsed to Python:", parsed)

    print("\nThen back to: '%s'" % print_sexp(parsed))

Output

Input S-expression: '((data "quoted data" 123 4.5)\n         (data (123 (4.5) "(more" "data)")))'

Parsed to Python: [['data', 'quoted data', 123, 4.5], ['data', [123, [4.5], '(more', 'data)']]]

Then back to: '((data "quoted data" 123 4.5) (data (123 (4.5) "(more" "data)")))'

Simpler parser

Note that in the example above the parser also recognises and changes the type of some tokens as well as generating a nested list. If that functionality is not needed, or better done elsewhere, then the parse function can be achieved more simply by just applying the regexp:

>>> from pprint import pprint as pp
>>> x = [[(t,v) for t,v in  termtypes.groupdict().items() if v][0] for termtypes in re.finditer(term_regex, sexp)]
>>> pp(x)
[('brackl', '('),
 ('brackl', '('),
 ('s', 'data'),
 ('sq', '"quoted data"'),
 ('num', '123'),
 ('num', '4.5'),
 ('brackr', ')'),
 ('brackl', '('),
 ('s', 'data'),
 ('brackl', '('),
 ('num', '123'),
 ('brackl', '('),
 ('num', '4.5'),
 ('brackr', ')'),
 ('sq', '"(more"'),
 ('sq', '"data)"'),
 ('brackr', ')'),
 ('brackr', ')'),
 ('brackr', ')')]
>>>

Functional

Composing functionally, and writing out a tree diagram, and a serialization, of the parse.

'''S-expressions'''

from itertools import chain, repeat
import re


def main():
    '''Sample s-expression parsed, diagrammed,
       and reserialized from the parse tree.
    '''
    expr = "((data \"quoted data\" 123 4.5)\n" + (
        "  (data (!@# (4.5) \"(more\" \"data)\")))"
    )
    parse = parseExpr(tokenized(expr))[0]
    print(
        drawForest([
            fmapTree(str)(tree) for tree
            in forestFromExprs(parse)
        ])
    )
    print(
        f'\nReserialized from parse:\n\n{serialized(parse)}'
    )


# ----------------- S-EXPRESSION PARSER ------------------

# parseExpr :: [String] -> ([Expr], [String]
def parseExpr(tokens):
    '''A tuple of a nested list with any
       unparsed tokens that remain.
    '''
    return until(finished)(parseToken)(
        ([], tokens)
    )


# finished :: ([Expr], [String]) -> Bool
def finished(xr):
    '''True if no tokens remain,
       or the next token is a closing bracket.
    '''
    r = xr[1]
    return (not r) or (r[0] == ")")


# parseToken :: ([Expr], [String]) -> ([Expr], [String])
def parseToken(xsr):
    '''A tuple of an expanded expression list
       and a reduced token list.
    '''
    xs, r = xsr
    h, *t = r
    if "(" == h:
        expr, rest = parseExpr(t)
        return xs + [expr], rest[1:]
    else:
        return (xs, t) if ")" == h else (
            xs + [atom(h)], t
        )

# --------------------- ATOM PARSER ----------------------

# atom :: String -> Expr
def atom(s):
    '''A Symbol, String, Float, or Int derived from s.
       Symbol is represented as a dict with a 'name' key.
    '''
    def n(k):
        return float(k) if '.' in k else int(k)

    return s if '"' == s[0] else (
        n(s) if s.replace('.', '', 1).isdigit() else {
            "name": s
        }
    )


# --------------------- TOKENIZATION ---------------------

# tokenized :: String -> [String]
def tokenized(s):
    '''A list of the tokens in s.
    '''
    return list(chain.from_iterable(map(
        lambda token: [token] if '"' == token[0] else (
            x for x in re.split(
                r'\s+',
                re.sub(r"([()])", r" \1 ", token)
            ) if x
        ) if token else [], (
            x if (0 == i % 2) else f'"{x}"'
            for (i, x) in enumerate(s.split('"'))
        )
    )))


# -------------------- SERIALIZATION ---------------------

# serialized :: Expr -> String
def serialized(e):
    '''An s-expression written out from the parse tree.
    '''
    k = typename(e)

    return str(e) if k in ['int', 'float', 'str'] else (
        (
            f'({" ".join([serialized(x) for x in e])})' if (
                (1 < len(e)) or ('list' != typename(e[0]))
            ) else serialized(e[0])
        ) if 'list' == k else (
            e.get("name") if 'dict' == k else "?"
        )
    )


# typename :: a -> String
def typename(x):
    '''Name property of the type of a value.'''
    return type(x).__name__


# ------------------- TREE DIAGRAMMING -------------------

# Node :: a -> [Tree a] -> Tree a
def Node(v):
    '''Constructor for a Tree node which connects a
       value of some kind to a list of zero or
       more child trees.
    '''
    return lambda xs: {'type': 'Tree', 'root': v, 'nest': xs}


# append :: [a] -> [a] -> [a]
def append(a, b):
    '''Concatenation.'''
    return a + b


# draw :: Tree a -> [String]
def draw(node):
    '''List of the lines of an ASCII
       diagram of a tree.
    '''
    def shift_(h, other, xs):
        return list(map(
            append,
            chain(
                [h], (
                    repeat(other, len(xs) - 1)
                )
            ),
            xs
        ))

    def drawSubTrees(xs):
        return (
            (
                ['|'] + shift_(
                    '├─ ', '│  ', draw(xs[0])
                ) + drawSubTrees(xs[1:])
            ) if 1 < len(xs) else ['|'] + shift_(
                '└─ ', '   ', draw(xs[0])
            )
        ) if xs else []

    return (root(node)).splitlines() + (
        drawSubTrees(nest(node))
    )


# drawForest :: [Tree String] -> String
def drawForest(trees):
    '''A simple unicode character representation of
       a list of trees.
    '''
    return '\n'.join(map(drawTree, trees))


# drawTree :: Tree a -> String
def drawTree(tree):
    '''ASCII diagram of a tree.'''
    return '\n'.join(draw(tree))


# fmapTree :: (a -> b) -> Tree a -> Tree b
def fmapTree(f):
    '''A new tree holding the results of
       an application of f to each root in
       the existing tree.
    '''
    def go(x):
        return Node(
            f(root(x))
        )([go(v) for v in nest(x)])
    return go


# forestFromExprs :: [Expr] -> [Tree Expr]
def forestFromExprs(es):
    '''A list of expressions rewritten as a forest.
    '''
    return [treeFromExpr(x) for x in es]


# nest :: Tree a -> [Tree a]
def nest(t):
    '''Accessor function for children of tree node.'''
    return t.get('nest')


# root :: Tree a -> a
def root(t):
    '''Accessor function for data of tree node.'''
    return t.get('root')


# treeFromExprs :: Expr -> Tree Expr
def treeFromExpr(e):
    '''An expression rewritten as a tree.
    '''
    return (
        Node({"name": "List"})(forestFromExprs(e))
    ) if type(e) is list else (
        Node(e)([])
    )


# ----------------------- GENERIC ------------------------

# until :: (a -> Bool) -> (a -> a) -> a -> a
def until(p):
    '''The result of repeatedly applying f until p holds.
       The initial seed value is x.
    '''
    def go(f):
        def loop(x):
            v = x
            while not p(v):
                v = f(v)
            return v
        return loop
    return go


# MAIN ---
if __name__ == '__main__':
    main()

Output:

{'name': 'List'}
|
├─ {'name': 'List'}
│  |
│  ├─ {'name': 'data'}
│  |
│  ├─ "quoted data"
│  |
│  ├─ 123
│  |
│  └─ 4.5
|
└─ {'name': 'List'}
   |
   ├─ {'name': 'data'}
   |
   └─ {'name': 'List'}
      |
      ├─ {'name': '!@#'}
      |
      ├─ {'name': 'List'}
      │  |
      │  └─ 4.5
      |
      ├─ "(more"
      |
      └─ "data)"

Reserialized from parse:

((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Racket

Racket has builtin support for S-expressions in the form of the read function.

#lang racket
(define input
#<<---
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
---
  )

(read (open-input-string input))

Output:

'((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Raku

(formerly Perl 6)

Works with: Rakudo version 2020.02

This parses the task, but it isn't really a good lisp parser, because it always wants whitespace between lists, so (()()) will fail ( (() ()) wont)

grammar S-Exp {
  rule TOP    {^ <s-list> $};

  token s-list { '(' ~ ')' [ <in_list>+ % [\s+] | '' ] }
  token in_list { <s-token> | <s-list> } 
 
  proto token s-token {*}
  token s-token:sym<Num>    {\d*\.?\d+}
  token s-token:sym<String> {'"' ['\"' |<-[\\"]>]*? '"'} #'
  token s-token:sym<Atom>   {<-[()\s]>+}
 
}
 
# The Actions class, for each syntactic rule there is a method
# that stores some data in the abstract syntax tree with make
class S-Exp::ACTIONS {
  method TOP ($/) {make $<s-list>.ast}
  method s-list ($/) {make [$<in_list>».ast]}
  method in_list ($/) {make $/.values[0].ast}
 
  method s-token:sym<Num> ($/){make +$/}
  method s-token:sym<String> ($/){make ~$/.substr(1,*-1)}
  method s-token:sym<Atom> ($/){make ~$/}
}
 
multi s-exp_writer (Positional $ary) {'(' ~ $ary.map(&s-exp_writer).join(' ') ~ ')'}
multi s-exp_writer (Numeric    $num) {~$num}
multi s-exp_writer (Str        $str) {
  return $str unless $str ~~ /<[(")]>|\s/;
  return '()' if $str eq '()';
  '"' ~ $str.subst('"', '\"' ) ~ '"';
}
 
 
my $s-exp = '((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))';
 
my $actions = S-Exp::ACTIONS.new();
my $raku_array = (S-Exp.parse($s-exp, :$actions)).ast;
 
say "the expression:\n$s-exp\n";
say "the Raku expression:\n{$raku_array.raku}\n";
say "and back:\n{s-exp_writer($raku_array)}";

Output:

the expression:
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

the Raku expression:
[["data", "quoted data", "123", 9/2], ["data", ["!\@#", [9/2], "(more", "data)"]]]

and back:
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

REXX

The checking of errors has been minimized (issuing of error message is very rudimentary, as is the error recovery).
More grouping symbols have been added (brackets [ ], braces { }, and guillemets « »), as well as another types of literals.
Also added were two more separators (a comma and semicolon).
Separators that could be added are more whitespace characters (vertical/horizontal tabs, line feed, form feed, tab char, etc).

It would normally be considered improper, but the literal string delimiters were left intact; making it much easier to understand what is/was being parsed.

/*REXX program  parses  an   S-expression   and  displays the results to the terminal.  */
input= '((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))'
say center('input', length(input), "═")          /*display the header title to terminal.*/
say         input                                /*   "     "  input data    "    "     */
say copies('═',     length(input) )              /*   "     "  header sep    "    "     */
grpO.=;      grpO.1 = '{'   ;    grpC.1 = "}"    /*pair of grouping symbol: braces      */
             grpO.2 = '['   ;    grpC.2 = "]"    /*  "   "    "       "     brackets    */
             grpO.3 = '('   ;    grpC.3 = ")"    /*  "   "    "       "     parentheses */
             grpO.4 = '«'   ;    grpC.4 = "»"    /*  "   "    "       "     guillemets  */
q.=;            q.1 = "'"   ;       q.2 = '"'    /*1st and 2nd literal string delimiter.*/
#        = 0                                     /*the number of tokens found (so far). */
tabs     = 10                                    /*used for the indenting of the levels.*/
seps     = ',;'                                  /*characters used for separation.      */
atoms    = ' 'seps                               /*     "       "  to  separate atoms.  */
level    = 0                                     /*the current level being processed.   */
quoted   = 0                                     /*quotation level  (for nested quotes).*/
grpU     =                                       /*used to go   up  an expression level.*/
grpD     =                                       /*  "   "  "  down  "     "       "    */
@.=;        do n=1  while grpO.n\==''
            atoms = atoms || grpO.n || grpC.n    /*add Open and Closed groups to  ATOMS.*/
            grpU  = grpU  || grpO.n              /*add Open            groups to  GRPU, */
            grpD  = grpD  || grpC.n              /*add          Closed groups to  GRPD, */
            end   /*n*/                          /* [↑]  handle a bunch of grouping syms*/
literals=
            do k=1  while q.k\=='';  literals= literals || q.k  /*add literal delimiters*/
            end   /*k*/
!=;                                      literalStart=
      do j=1  to length(input);          $= substr(input, j, 1)                              /* ◄■■■■■text parsing*/
                                                                                             /* ◄■■■■■text parsing*/
      if quoted                then do;  !=! || $;    if $==literalStart  then quoted= 0     /* ◄■■■■■text parsing*/
                                         iterate                                             /* ◄■■■■■text parsing*/
                                    end          /* [↑]  handle running  quoted string. */   /* ◄■■■■■text parsing*/
                                                                                             /* ◄■■■■■text parsing*/
      if pos($, literals)\==0  then do;  literalStart= $;      != ! || $;        quoted= 1   /* ◄■■■■■text parsing*/
                                         iterate                                             /* ◄■■■■■text parsing*/
                                    end          /* [↑]  handle start of quoted strring.*/   /* ◄■■■■■text parsing*/
                                                                                             /* ◄■■■■■text parsing*/
      if pos($, atoms)==0      then do;  != ! || $;   iterate;   end    /*is    an atom?*/   /* ◄■■■■■text parsing*/
                               else do;  call add!;   != $;      end    /*isn't  "   " ?*/   /* ◄■■■■■text parsing*/
                                                                                             /* ◄■■■■■text parsing*/
      if pos($, literals)==0   then do;  if pos($, grpU)\==0  then level= level + 1          /* ◄■■■■■text parsing*/
                                         call add!                                           /* ◄■■■■■text parsing*/
                                         if pos($, grpD)\==0  then level= level - 1          /* ◄■■■■■text parsing*/
                                         if level<0  then say  'error, mismatched'   $       /* ◄■■■■■text parsing*/
                                    end                                                      /* ◄■■■■■text parsing*/
      end   /*j*/                                                                            /* ◄■■■■■text parsing*/
                                                                                             /* ◄■■■■■text parsing*/
call add!                                        /*process any residual tokens.         */   /* ◄■■■■■text parsing*/
if level\==0  then say  'error, mismatched grouping symbol'                                  /* ◄■■■■■text parsing*/
if quoted     then say  'error, no end of quoted literal'      literalStart                  /* ◄■■■■■text parsing*/

      do m=1  for #;   say @.m                   /*display the tokens  ───►  terminal.  */
      end   /*m*/
exit 0                                           /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
add!: if !=''  then return;   #=#+1;  @.#=left("", max(0, tabs*(level-1)))!;  !=;   return

output when using the default input:

══════════════════════════════input══════════════════════════════
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))
═════════════════════════════════════════════════════════════════
(
          (
          data
           "quoted data"
           123
           4.5
          )
          (
          data
                    (
                    !@#
                              (
                              4.5
                              )
                     "(more"
                     "data)"
                    )
          )
)

Ruby

Works with: Ruby version 1.9

class SExpr
  def initialize(str)
    @original = str
    @data = parse_sexpr(str)
  end
  attr_reader :data, :original
  
  def to_sexpr
    @data.to_sexpr
  end
  
  private
  
  def parse_sexpr(str)
    state = :token_start
    tokens = []
    word = ""
    str.each_char do |char|
      case state
        
      when :token_start
        case char
        when "(" 
          tokens << :lbr
        when ")" 
          tokens << :rbr
        when /\s/
          # do nothing, just consume the whitespace
        when  '"'
          state = :read_quoted_string
          word = ""
        else
          state = :read_string_or_number
          word = char
        end
        
      when :read_quoted_string
        case char
        when '"'
          tokens << word
          state = :token_start
        else
          word << char
        end
        
      when :read_string_or_number
        case char
        when /\s/
          tokens << symbol_or_number(word)
          state = :token_start
        when ')'
          tokens << symbol_or_number(word)
          tokens << :rbr
          state = :token_start
        else
          word << char
        end
      end
    end
    
    sexpr_tokens_to_array(tokens)
  end
  
  def symbol_or_number(word)
    Integer(word)
  rescue ArgumentError
    begin 
      Float(word)
    rescue ArgumentError
      word.to_sym
    end
  end
  
  def sexpr_tokens_to_array(tokens, idx = 0)
    result = []
    while idx < tokens.length
      case tokens[idx]
      when :lbr
        tmp, idx = sexpr_tokens_to_array(tokens, idx + 1)
        result << tmp
      when :rbr
        return [result, idx]
      else 
        result << tokens[idx]
      end
      idx += 1
    end
    result[0]
  end
end

class Object
  def to_sexpr
    self
  end
end

class String
  def to_sexpr
    self.match(/[\s()]/) ? self.inspect : self
  end
end

class Symbol
  alias :to_sexpr :to_s
end

class Array
  def to_sexpr
    "(%s)" % inject([]) {|a, elem| a << elem.to_sexpr}.join(" ")
  end
end


sexpr = SExpr.new <<END
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
END
 
puts "original sexpr:\n#{sexpr.original}"
puts "\nruby data structure:\n#{sexpr.data}"
puts "\nand back to S-Expr:\n#{sexpr.to_sexpr}"

Output:

original sexpr:
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

ruby data structure:
[[:data, "quoted data", 123, 4.5], [:data, [:"!@#", [4.5], "(more", "data)"]]]

and back to S-Expr:
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Rust

lib.rs:

//! This implementation isn't based on anything in particular, although it's probably informed by a
//! lot of Rust's JSON encoding code.  It should be very fast (both encoding and decoding the toy
//! example here takes under a microsecond on my machine) and tries to avoid unnecessary allocation.
//!
//! In a real implementation, most of this would be private, with only a few visible functions, and
//! there would be somewhat nicer signatures (in particular, the fact that `ParseContext` has to be
//! mutable would get annoying in real code pretty quickly, so it would probably be split out).
//!
//! It supports the ability to read individual atoms, not just lists, although whether this is
//! useful is questionable.
//!
//! Caveats: Does not support symbols vs. non-symbols (it wouldn't be hard, but it would greatly
//! complicate setting up our test structure since we'd have to force it to go through functions
//! that checked to make sure `Symbol`s couldn't have spaces, or slow down our parser by checking
//! for this information each time, which is obnoxious).  Does not support string escaping, because
//! the decoding technique doesn't allocate extra space for strings.  Does support numbers, but
//! only float types (supporting more types is possible but would complicate the code
//! significantly).

extern crate typed_arena;

use typed_arena::Arena;

use self::Error::*;
use self::SExp::*;
use self::Token::*;
use std::io;
use std::num::FpCategory;
use std::str::FromStr;

/// The actual `SExp` structure.  Supports `f64`s, lists, and string literals.  Note that it takes
/// everything by reference, rather than owning it--this is mostly done just so we can allocate
/// `SExp`s statically (since we don't have to call `Vec`).  It does complicate the code a bit,
/// requiring us to have a `ParseContext` that holds an arena where lists are actually allocated.
#[derive(PartialEq, Debug)]
pub enum SExp<'a> {
    /// Float literal: 0.5
    F64(f64),

    /// List of SExps: ( a b c)
    List(&'a [SExp<'a>]),

    /// Plain old string literal: "abc"
    Str(&'a str),
}

/// Errors that can be thrown by the parser.
#[derive(PartialEq, Debug)]
pub enum Error {
    /// If the float is `NaN`, `Infinity`, etc.
    NoReprForFloat,

    /// Missing an end double quote during string parsing
    UnterminatedStringLiteral,

    /// Some other kind of I/O error
    Io,

    /// ) appeared where it shouldn't (usually as the first token)
    IncorrectCloseDelimiter,

    /// Usually means a missing ), but could also mean there were no tokens at all.
    UnexpectedEOF,

    /// More tokens after the list is finished, or after a literal if there is no list.
    ExpectedEOF,
}

impl From<io::Error> for Error {
    fn from(_err: io::Error) -> Error {
        Error::Io
    }
}

/// Tokens returned from the token stream.
#[derive(PartialEq, Debug)]
enum Token<'a> {
    /// Left parenthesis
    ListStart,

    /// Right parenthesis
    ListEnd,

    /// String or float literal, quotes removed.
    Literal(SExp<'a>),

    /// Stream is out of tokens.
    Eof,
}

/// An iterator over a string that yields a stream of Tokens.
///
/// Implementation note: it probably seems weird to store first, rest, AND string, since they should
/// all be derivable from string.  But see below.
#[derive(Copy, Clone, Debug)]
struct Tokens<'a> {
    /// The part of the string that still needs to be parsed
    string: &'a str,

    /// The first character to parse
    first: Option<char>,

    /// The rest of the string after the first character
    rest: &'a str,
}

impl<'a> Tokens<'a> {
    /// Initialize a token stream for a given string.
    fn new(string: &str) -> Tokens {
        let mut chars = string.chars();

        match chars.next() {
            Some(ch) => Tokens {
                string,
                first: Some(ch),
                rest: chars.as_str(),
            },
            None => Tokens {
                string,
                first: None,
                rest: string,
            },
        }
    }

    /// Utility function to update information in the iterator.  It might not be performant to keep
    /// rest cached, but there are times where we don't know exactly what string is (at least, not
    /// in a way that we can *safely* reconstruct it without allocating), so we keep both here.
    /// With some unsafe code we could probably get rid of one of them (and maybe first, too).
    fn update(&mut self, string: &'a str) {
        self.string = string;
        let mut chars = self.string.chars();

        if let Some(ch) = chars.next() {
            self.first = Some(ch);
            self.rest = chars.as_str();
        } else {
            self.first = None;
        };
    }

    /// This is where the lexing happens.  Note that it does not handle string escaping.
    fn next_token(&mut self) -> Result<Token<'a>, Error> {
        loop {
            match self.first {
                // List start
                Some('(') => {
                    self.update(self.rest);
                    return Ok(ListStart);
                }
                // List end
                Some(')') => {
                    self.update(self.rest);
                    return Ok(ListEnd);
                }
                // Quoted literal start
                Some('"') => {
                    // Split the string at most once.  This lets us get a
                    // reference to the next piece of the string without having
                    // to loop through the string again.
                    let mut iter = self.rest.splitn(2, '"');
                    // The first time splitn is run it will never return None, so this is safe.
                    let str = iter.next().unwrap();
                    match iter.next() {
                        // Extract the interior of the string without allocating.  If we want to
                        // handle string escaping, we would have to allocate at some point though.
                        Some(s) => {
                            self.update(s);
                            return Ok(Literal(Str(str)));
                        }
                        None => return Err(UnterminatedStringLiteral),
                    }
                }
                // Plain old literal start
                Some(c) => {
                    // Skip whitespace.  This could probably be made more efficient.
                    if c.is_whitespace() {
                        self.update(self.rest);
                        continue;
                    }
                    // Since we've exhausted all other possibilities, this must be a real literal.
                    // Unlike the quoted case, it's not an error to encounter EOF before whitespace.
                    let mut end_ch = None;
                    let str = {
                        let mut iter = self.string.splitn(2, |ch: char| {
                            let term = ch == ')' || ch == '(';
                            if term {
                                end_ch = Some(ch)
                            }
                            term || ch.is_whitespace()
                        });
                        // The first time splitn is run it will never return None, so this is safe.
                        let str = iter.next().unwrap();
                        self.rest = iter.next().unwrap_or("");
                        str
                    };
                    match end_ch {
                        // self.string will be incorrect in the Some(_) case.  The only reason it's
                        // okay is because the next time next() is called in this case, we know it
                        // will be '(' or ')', so it will never reach any code that actually looks
                        // at self.string.  In a real implementation this would be enforced by
                        // visibility rules.
                        Some(_) => self.first = end_ch,
                        None => self.update(self.rest),
                    }
                    return Ok(Literal(parse_literal(str)));
                }
                None => return Ok(Eof),
            }
        }
    }
}

/// This is not the most efficient way to do this, because we end up going over numeric literals
/// twice, but it avoids having to write our own number parsing logic.
fn parse_literal(literal: &str) -> SExp {
    match literal.bytes().next() {
        Some(b'0'..=b'9') | Some(b'-') => match f64::from_str(literal) {
            Ok(f) => F64(f),
            Err(_) => Str(literal),
        },
        _ => Str(literal),
    }
}

/// Parse context, holds information required by the parser (and owns any allocations it makes)
pub struct ParseContext<'a> {
    /// The string being parsed.  Not required, but convenient.
    string: &'a str,

    /// Arena holding any allocations made by the parser.
    arena: Option<Arena<Vec<SExp<'a>>>>,

    /// Stored in the parse context so it can be reused once allocated.
    stack: Vec<Vec<SExp<'a>>>,
}

impl<'a> ParseContext<'a> {
    /// Create a new parse context from a given string
    pub fn new(string: &'a str) -> ParseContext<'a> {
        ParseContext {
            string,
            arena: None,
            stack: Vec::new(),
        }
    }
}

impl<'a> SExp<'a> {
    /// Serialize a SExp.
    fn encode<T: io::Write>(&self, writer: &mut T) -> Result<(), Error> {
        match *self {
            F64(f) => {
                match f.classify() {
                    // We don't want to identify NaN, Infinity, etc. as floats.
                    FpCategory::Normal | FpCategory::Zero => {
                        write!(writer, "{}", f)?;
                        Ok(())
                    }
                    _ => Err(Error::NoReprForFloat),
                }
            }
            List(l) => {
                // Writing a list is very straightforward--write a left parenthesis, then
                // recursively call encode on each member, and then write a right parenthesis.  The
                // only reason the logic is as long as it is is to make sure we don't write
                // unnecessary spaces between parentheses in the zero or one element cases.
                write!(writer, "(")?;
                let mut iter = l.iter();
                if let Some(sexp) = iter.next() {
                    sexp.encode(writer)?;
                    for sexp in iter {
                        write!(writer, " ")?;
                        sexp.encode(writer)?;
                    }
                }
                write!(writer, ")")?;
                Ok(())
            }
            Str(s) => {
                write!(writer, "\"{}\"", s)?;
                Ok(())
            }
        }
    }

    /// Deserialize a SExp.
    pub fn parse(ctx: &'a mut ParseContext<'a>) -> Result<SExp<'a>, Error> {
        ctx.arena = Some(Arena::new());
        // Hopefully this unreachable! gets optimized out, because it should literally be
        // unreachable.
        let arena = match ctx.arena {
            Some(ref mut arena) => arena,
            None => unreachable!(),
        };
        let ParseContext {
            string,
            ref mut stack,
            ..
        } = *ctx;
        // Make sure the stack is cleared--we keep it in the context to avoid unnecessary
        // reallocation between parses (if you need to remember old parse information for a new
        // list, you can pass in a new context).
        stack.clear();
        let mut tokens = Tokens::new(string);
        // First, we check the very first token to see if we're parsing a full list.  It
        // simplifies parsing a lot in the subsequent code if we can assume that.
        let next = tokens.next_token();
        let mut list = match next? {
            ListStart => Vec::new(),
            Literal(s) => {
                return if tokens.next_token()? == Eof {
                    Ok(s)
                } else {
                    Err(ExpectedEOF)
                };
            }
            ListEnd => return Err(IncorrectCloseDelimiter),
            Eof => return Err(UnexpectedEOF),
        };

        // We know we're in a list if we got this far.
        loop {
            let tok = tokens.next_token();
            match tok? {
                ListStart => {
                    // We push the previous context onto our stack when we start reading a new list.
                    stack.push(list);
                    list = Vec::new()
                }
                Literal(s) => list.push(s), // Plain old literal, push it onto the current list
                ListEnd => {
                    match stack.pop() {
                        // Pop the old context off the stack on list end.
                        Some(mut l) => {
                            // We allocate a slot for the current list in our parse context (needed
                            // for safety) before pushing it onto its parent list.
                            l.push(List(&*arena.alloc(list)));
                            // Now reset the current list to the parent list
                            list = l;
                        }
                        // There was nothing on the stack, so we're at the end of the topmost list.
                        // The check to make sure there are no more tokens is required for
                        // correctness.
                        None => {
                            return match tokens.next_token()? {
                                Eof => Ok(List(&*arena.alloc(list))),
                                _ => Err(ExpectedEOF),
                            };
                        }
                    }
                }
                // We encountered an EOF before the list ended--that's an error.
                Eof => return Err(UnexpectedEOF),
            }
        }
    }

    /// Convenience method for the common case where you just want to encode a SExp as a String.
    pub fn buffer_encode(&self) -> Result<String, Error> {
        let mut m = Vec::new();
        self.encode(&mut m)?;
        // Because encode() only ever writes valid UTF-8, we can safely skip the secondary check we
        // normally have to do when converting from Vec<u8> to String.  If we didn't know that the
        // buffer was already UTF-8, we'd want to call container_as_str() here.
        unsafe { Ok(String::from_utf8_unchecked(m)) }
    }
}

pub const SEXP_STRUCT: SExp<'static> = List(&[
    List(&[Str("data"), Str("quoted data"), F64(123.), F64(4.5)]),
    List(&[
        Str("data"),
        List(&[Str("!@#"), List(&[F64(4.5)]), Str("(more"), Str("data)")]),
    ]),
]);

pub const SEXP_STRING_IN: &str = r#"((data "quoted data" 123 4.5)
(data (!@# (4.5) "(more" "data)")))"#;


and main.rs:

use s_expressions::{ParseContext, SExp, SEXP_STRING_IN, SEXP_STRUCT};

fn main() {
    println!("{:?}", SEXP_STRUCT.buffer_encode());
    let ctx = &mut ParseContext::new(SEXP_STRING_IN);
    println!("{:?}", SExp::parse(ctx));
}

Output:

Ok("((\"data\" \"quoted data\" 123 4.5) (\"data\" (\"!@#\" (4.5) \"(more\" \"data)\")))")
Ok(List([List([Str("data"), Str("quoted data"), F64(123.0), F64(4.5)]), List([Str("data"), List([Str("!@#"), List([F64(4.5)]), Str("(more"), Str("data)")])])]))

Scheme

Because Scheme, like all serious lisp implementations, has a native function called read for parsing s-expressions, this code will never be used. It serves more as an example of how to write simple parsers in Scheme. It also forgos turning things into their native scheme representation and uses strings for all atoms of data.

Note that this example includes erroneous closing quotes when checking for #\" because syntax highlighting sucks and no one should have to wade through blocks of red.

Using guile scheme 2.0.11

(define (sexpr-read port)
  (define (help port)
    (let ((char (read-char port)))
      (cond
       ((or (eof-object? char) (eq? char #\) )) '())
       ((eq? char #\( ) (cons (help port) (help port)))
       ((char-whitespace? char) (help port))
       ((eq? char #\"") (cons (quote-read port) (help port)))
       (#t (unread-char char port) (cons (string-read port) (help port))))))
  ; This is needed because the function conses all parsed sexprs onto something,
  ; so the top expression is one level too deep.
  (car (help port)))

(define (quote-read port)
  (define (help port)
    (let ((char (read-char port)))
      (if
       (or (eof-object? char) (eq? char #\""))
       '()
       (cons char (help port)))))
  (list->string (help port)))

(define (string-read port)
  (define (help port)
    (let ((char (read-char port)))
      (cond
       ((or (eof-object? char) (char-whitespace? char)) '())
       ((eq? char #\) ) (unread-char char port) '())
       (#t (cons char (help port))))))
  (list->string (help port)))

(define (format-sexpr expr)
  (define (help expr pad)
    (if
     (list? expr)
     (begin
      (format #t "~a(~%" (make-string pad #\tab))
      (for-each (lambda (x) (help x (1+ pad))) expr)
      (format #t "~a)~%" (make-string pad #\tab)))
     (format #t "~a~a~%" (make-string pad #\tab) expr)))
  (help expr 0))

(format-sexpr (sexpr-read
 (open-input-string "((data \"quoted data\" 123 4.5) (data (!@# (4.5) \"(more\" \"data)\")))")))

Output:

(
	(
		data
		quoted data
		123
		4.5
	)
	(
		data
		(
			!@#
			(
				4.5
			)
			(more
			data)
		)
	)
)

Sidef

Translation of: Perl

func sexpr(txt) {
    txt.trim!

    if (txt.match(/^\((.*)\)$/s)) {|m|
        txt = m[0]
    }
    else {
        die "Invalid: <<#{txt}>>"
    }

    var w
    var ret = []

    while (!txt.is_empty) {
        given (txt.first) {
            when('(') {
                (w, txt) = txt.extract_bracketed('()');
                w = sexpr(w)
            }
            when ('"') {
                (w, txt) = txt.extract_delimited('"')
                w.sub!(/^"(.*)"/, {|s1| s1 })
            }
            else {
                txt.sub!(/^(\S+)/, {|s1| w = s1; '' })
            }
        }
        ret << w
        txt.trim_beg!
    }
    return ret
}

func sexpr2txt(String e) {
    e ~~ /[\s"\(\)]/ ? do { e.gsub!('"', '\\"'); %Q("#{e}") } : e
}

func sexpr2txt(expr) {
    '(' + expr.map {|e| sexpr2txt(e) }.join(' ') + ')'
}

var s = sexpr(%q{

((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))

})

say s               # dump structure
say sexpr2txt(s)    # convert back

Output:

[["data", "quoted data", "123", "4.5"], ["data", ["!\@#", ["4.5"], "(more", "data)"]]]
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Tcl

Note that because Tcl doesn't expose a type system (well, not in a conventional sense) the parts of the parsed out data structure are tagged lists; the first element is one of “string”, “int”, “real” and “atom” to indicate a leaf token, or “list” to indicate a sublist. A “native” data structure could also be generated, but then that would turn things into lists that are not in the original.

package require Tcl 8.5

proc fromSexp {str} {
    set tokenizer {[()]|"(?:[^\\""]|\\.)*"|(?:[^()""\s\\]|\\.)+|[""]}
    set stack {}
    set accum {}
    foreach token [regexp -inline -all $tokenizer $str] {
	if {$token eq "("} {
	    lappend stack $accum
	    set accum {}
	} elseif {$token eq ")"} {
	    if {![llength $stack]} {error "unbalanced"}
	    set accum [list {*}[lindex $stack end] [list list {*}$accum]]
	    set stack [lrange $stack 0 end-1]
	} elseif {$token eq "\""} {
	    error "bad quote"
	} elseif {[string match {"*"} $token]} {
	    set token [string range $token 1 end-1]
	    lappend accum [list string [regsub -all {\\(.)} $token {\1}]]
	} else {
	    if {[string is integer -strict $token]} {
		set type int
	    } elseif {[string is double -strict $token]} {
		set type real
	    } else {
		set type atom
	    }
	    lappend accum [list $type [regsub -all {\\(.)} $token {\1}]]
	}
    }
    if {[llength $stack]} {error "unbalanced"}
    return [lindex $accum 0]
}
proc toSexp {tokList} {
    set content [lassign $tokList type]
    if {$type eq "list"} {
	set s "("
	set sep ""
	foreach bit $content {
	    append s $sep [toSexp $bit]
	    set sep " "
	}
	return [append s ")"]
    } elseif {$type eq "string"} {
	return "\"[regsub -all {[\\""]} [lindex $content 0] {\\\0}]\""
    } else {
	return [lindex $content 0]
    }
}

Demonstrating with the sample data:

set sample {((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))}
set parsed [fromSexp $sample]
puts "sample: $sample"
puts "parsed: $parsed"
puts "regen: [toSexp $parsed]"

Output:

sample: ((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
parsed: list {list {atom data} {string {quoted data}} {int 123} {real 4.5}} {list {atom data} {list {atom !@#} {list {real 4.5}} {string (more} {string data)}}}
regen: ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

As you can see, whitespace is not preserved in non-terminal locations.

TXR

TXR is in the Lisp family, and uses S-Expressions. So right from the system prompt we can do:

$ txr -p '(read)'
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
[Ctrl-D][Enter]
((data "quoted data" 123 4.5) (data (! (sys:var #(4.5)) "(more" "data)")))

However, note that the @ character has a special meaning: @obj turns into (sys:var obj). The purpose of this notation is to support Lisp code that requires meta-variables and meta-expressions. This can be used, for instance, in pattern matching to distinguish binding variables and matching operations from literal syntax.

The following solution avoids "cheating" in this way with the built-in parser; it implements a from-scratch S-exp parser which treats !@# as just a symbol.

The grammar is roughly as follows:

expr := ws? atom
     |  ws? ( ws? expr* ws? )

atom := float | int | sym | str

float := sign? digit+ . digit* exponent?
      |  sign? digit* . digit+ exponent?
      |  sign? digit+ exponent

int := sign? digit+

str := " (\" | anychar )* "

sym := sym-char +

sym-char := /* non-whitespace, but not ( and not ) */

Code:

@(define float (f))@\
  @(local (tok))@\
  @(cases)@\
    @{tok /[+\-]?\d+\.\d*([Ee][+\-]?\d+)?/}@\
  @(or)@\
    @{tok /[+\-]?\d*\.\d+([Ee][+\-]?\d+)?/}@\
  @(or)@\
    @{tok /[+\-]?\d+[Ee][+\-]?\d+/}@\
  @(end)@\
  @(bind f @(flo-str tok))@\
@(end)
@(define int (i))@\
  @(local (tok))@\
  @{tok /[+\-]?\d+/}@\
  @(bind i @(int-str tok))@\
@(end)
@(define sym (s))@\
  @(local (tok))@\
  @{tok /[^\s()]+/}@\
  @(bind s @(intern tok))@\
@(end)
@(define str (s))@\
  @(local (tok))@\
  @{tok /"(\\"|[^"])*"/}@\
  @(bind s @[tok 1..-1])@\
@(end)
@(define atom (a))@\
  @(cases)@\
    @(float a)@(or)@(int a)@(or)@(str a)@(or)@(sym a)@\
  @(end)@\
@(end)
@(define expr (e))@\
  @(cases)@\
    @/\s*/@(atom e)@\
  @(or)@\
    @/\s*\(\s*/@(coll :vars (e))@(expr e)@/\s*/@(last))@(end)@\
  @(end)@\
@(end)
@(freeform)
@(expr e)@junk
@(output)
expr: @(format nil "~s" e)
junk: @junk
@(end)

Run:

$ txr s-expressions.txr -
() 
expr: nil
junk: 
$ txr s-expressions.txr -
3e3
expr: 3000.0
junk: 
$ txr s-expressions.txr -
+3
expr: 3
junk: 
$ txr s-expressions.txr -
abc*
expr: abc*
junk: 
$ txr s-expressions.txr -
abc*)
expr: abc*
junk: )
$ txr s-expressions.txr -
((data "quoted data" 123 4.5)
 (data (!@# (4.5) "(more" "data)")))
expr: ((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))
junk:

TODO: Note that the recognizer for string literals does not actually process the interior escape sequences \"; these remain as part of the string data. The only processing is the stripping of the outer quotes from the lexeme.

Explanation of most confusing line:

    @/\s*\(\s*/@(coll :vars (e))@(expr e)@/\s*/@(last))@(end)

First, we match an open parenthesis that can be embedded in whitespace. Then we have a @(coll) construct which terminates with @(end). This is a repetition construct for collecting zero or more items. The :vars (e) argument makes the collect strict: each repetition must bind the variable e. More importantly, in this case, if nothing is collected, then e gets bound to nil (the empty list). The collect construct does not look at context beyond itself. To terminate the collect at the closing parenthesis we use @(last)). The second closing parenthesis here is literal text to be matched, not TXR syntax. This special clause establishes the terminating context without which the collect will munge all input. When the last clause matches, whatever it matches is consumed and the collect ends. (There is a related @(until) clause which terminates the collect, but leaves its own match unconsumed.)

Wren

Translation of: Kotlin

Library: Wren-pattern

Library: Wren-fmt

import "./pattern" for Pattern
import "./fmt" for Fmt

var INDENT = 2

var parseSExpr = Fn.new { |str|
    var ipat = " \t\n\f\v\r()\""
    var p = Pattern.new("""+0/s["+0^""|(|)|"|+1/I]""", Pattern.within, ipat)
    var t = p.findAll(str).map { |m| m.text }.toList
    if (t.count == 0) return null
    var o = false
    var c = 0
    for (i in t.count-1..0) {
        var ti = t[i].trim()
        var nd = Num.fromString(ti)
        if (ti == "\"") return null
        if (ti == "(") {
            t[i] = "["
            c = c + 1
        } else if (ti == ")") {
            t[i] = "]"
            c = c - 1
        } else if (nd) {
            var ni = Num.fromString(ti)
            t[i] = ni ? ni.toString : nd.toString
        } else if (ti.startsWith("\"")) { // escape embedded double quotes
            var temp = ti[1...-1]
            t[i] = "\"" + temp.replace("\"", "\\\"") + "\""
        }
        if (i > 0 && t[i] != "]" && t[i - 1].trim() != "(") t.insert(i, ", ")
        if (c == 0) {
            if (!o) o = true else return null
        }
    }
    return (c != 0) ? null : t
}

var toSExpr = Fn.new { |tokens|
    for (i in 0...tokens.count) {
        if (tokens[i] == "[") {
            tokens[i] = "("
        } else if (tokens[i] == "]") {
            tokens[i] = ")"
        } else if (tokens[i] == ", ") {
            tokens[i] = " "
        } else if (tokens[i].startsWith("\"")) { // unescape embedded quotes
            var temp = tokens[i][1...-1]
            tokens[i] = "\"" + temp.replace("\\\"", "\"") + "\""
        }
    }
    return tokens.join()
}

var prettyPrint = Fn.new { |tokens|
    var level = 0
    for (t in tokens) {
        var n
        if (t == ", " || t == " ") {
            continue
        } else if (t == "[" || t == "(") {   
            n = level * INDENT + 1
            level = level + 1
        } else if (t == "]" || t == ")") {
            level = level - 1
            n = level * INDENT + 1
        } else {
            n = level * INDENT + t.count
        }
        Fmt.print("$*s", n, t)
    }
}

var str = """((data "quoted data" 123 4.5)""" + "\n" +
          """ (data (!@# (4.5) "(more" "data)")))""" 
var tokens = parseSExpr.call(str)
if (!tokens) {
    System.print("Invalid s-expr!")
} else {
    System.print("Native data structure:")
    System.print(tokens.join())
    System.print("\nNative data structure (pretty print):")   
    prettyPrint.call(tokens)

    System.print("\nRecovered S-Expression:")
    System.print(toSExpr.call(tokens))
    System.print("\nRecovered S-Expression (pretty print):")
    prettyPrint.call(tokens)
}

Output:

Native data structure:
[[data, "quoted data", 123, 4.5], [data, [!@#, [4.5], "(more", "data)"]]]

Native data structure (pretty print):
[
  [
    data
    "quoted data"
    123
    4.5
  ]
  [
    data
    [
      !@#
      [
        4.5
      ]
      "(more"
      "data)"
    ]
  ]
]

Recovered S-Expression:
((data "quoted data" 123 4.5) (data (!@# (4.5) "(more" "data)")))

Recovered S-Expression (pretty print):
(
  (
    data
    "quoted data"
    123
    4.5
  )
  (
    data
    (
      !@#
      (
        4.5
      )
      "(more"
      "data)"
    )
  )
)