Talk:S-expressions: Difference between revisions

(→‎practical solutions preferred: Clarifying where another user's broken-up comment ends and begins-again, with replicated time and signature.)
 
(45 intermediate revisions by 12 users not shown)
Line 8:
the OCaml solution is for example on the border. it doesn't handle numbers and it does not even encode the difference between quoted and unquoted strings.--[[User:EMBee|eMBee]] 01:59, 18 October 2011 (UTC)
 
== Existing Standards ==
=== practical solutions preferred ===
 
People looking for "s-expression" are not finding a specification because this is not a language, but a name for a category of printed syntax from the Lisp culture. This is like looking for a standard which formalizes "reverse polish notation" or "infix". Well, C has infix, Fortran has infix, Java has infix, ...
 
These standard languages have read syntax which is based on S-expressions:
 
Common Lisp: ANSI Standard available in the form of the [[http://www.lispworks.com/documentation/HyperSpec/Front/index.htm Common Lisp HyperSpec]]. The reader syntax is described in [[http://www.lispworks.com/documentation/HyperSpec/Body/02_.htm Syntax]].
:this page describing the [http://www.lispworks.com/documentation/HyperSpec/Body/02_ad.htm Character Syntax Types] seems most interesting.--[[User:EMBee|eMBee]] 02:07, 20 October 2011 (UTC)
::Yes. In Common Lisp, the reading is very simple. Characters are assigned to categories in a read-table (in a context-independent way). The reader takes a character and based on what category it is in, it assembles it into a token, or dispatches a function associated with that character. Donald Knuth's TeX has a similar approach: it also has character categories (which TeX code can manipulate).[[Special:Contributions/24.85.131.247|24.85.131.247]] 02:22, 20 October 2011 (UTC)
:::indeed, i like the approach, it also demonstrates the simplicity of s-expressions (and lisp) syntax. you could not do this as easely with xml or json.--[[User:EMBee|eMBee]] 05:22, 20 October 2011 (UTC)
::Note that there is a category of "non-terminating" macro characters. This means that if they are part of a token, they behave like token constituents. The # macro dispatch character is this way, so for instance abc#def is actually a symbol with the name "ABC#DEF": the special meaning of # is defeated. If it were a terminating character, then its presence would signal the end of the token ABC. The parentheses are terminating, needless to say: (abc) is not a single token, but the same as ( abc ). Lisp tries to allow the programmer to have a lot of freedom in what constitutes a symbol. Just because some symbol has a special meaning doesn't mean we have to inconvenience ourselves by making it difficult to have that character in the middle of a symbol. This is quite important if you're doing domain-specific things where the symbol syntax comes from some outside domain. Another example is the writing of compilers where you have to deal with symbols from another programming language, as well as low-level symbols used by assemblers and linkers. It's not hard to map these to "native" symbols.[[Special:Contributions/24.85.131.247|24.85.131.247]] 02:22, 20 October 2011 (UTC)
:::this is what i thought of doing with the " character. sure, in lisp it is a terminal, but for this simple task it need not be.--[[User:EMBee|eMBee]] 05:22, 20 October 2011 (UTC)
ISO Standard Lisp: Also known as [[http://islisp.info/ ISLisp]].
 
Scheme: Revised [6] Report on the Algorithmic Language Scheme, [[http://www.r6rs.org/ R6RS]].
:syntax is in chapter [http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2 4.2]
 
There are many common elements, and there are differences.
 
In addition to some prominent languages like these, there have been countless dialects. Emacs Lisp has its own particular S-expression syntax, and so does every Lisp dialect ever hacked up.
 
== syntax for S-Expressions ==
it seems there is no formal definition of S-Expressions, and much of the arguments in the discussion on this task seems to revolve around what is required for S-Expressions and what is not.
this [http://people.csail.mit.edu/rivest/Sexp.txt draft RFC] makes an attempt at a definition, but it was not accepted for reasons that are unknown.
: the [http://www-formal.stanford.edu/jmc/recursive/node3.html original definition of S-Expressions] by John McCarthy can be found on his homepage.--[[User:EMBee|eMBee]] 05:28, 4 November 2011 (UTC)
 
for a definition i'd like to separate two aspects:
* the syntactical representation of elements
* the interpretation of these elements
 
=== syntax ===
if we use the task as a starting point we get:
* special characters: ( ) " \
* whitespace: space, tab, newline (carriage return)
* symbol: a sequence of characters without special chars (and without surrounding quotes).
* string: a sequence of characters in quotes. strings may contain special characters and whitespace. only " and \ need to be escaped.
* number: a sequence of characters capable of representing a number. (optional?, do we need to define all possible number representations? (different standards for floating point, hex, octal, etc))
* atom: a symbol, string or number
* S-Expression: a sequence of atoms or S-Expressions separated by whitespace and surrounded by parentheses ()
 
 
do we need others? the rfc above introduces special syntax for hex, and base64 encodings, as well as length-prefix encodings, all of which i consider not needed. there are other standards for hex, and there are other encodings, so i don't see why they need to be treated special.
 
=== interpretation ===
* symbols are different from strings and should be distinguished in the implementation of the parser.
beyond that, the implementation of symbols is not defined. symbols may be implemented in any way suitable for the language. [[wp:String Interning|interning]] is optional. (i'd like to point out that in pike strings are interned. so if i use strings to represent symbols, does that satisfy a requirement for interning symbols?)
* numbers are possibly an interpretation issue and need not be defined in the syntax.
 
very likely this definition is incomplete, possibly even has errors. please add and criticize.--[[User:EMBee|eMBee]] 06:02, 19 October 2011 (UTC)
 
== practical solutions preferred ==
this is not supposed to be an academic exercise in writing a perfect s-expression parser, but it should provide a practical demonstration that serves as a starting point for anyone who really do needs to use an s-expression parser in their project.
 
Line 22 ⟶ 72:
:::::::they are bothering, but i can for example work around unbalanced parenthesis by adding a few extra at the beginning or end, to make the parser happy, and then later analyze where the real error is. try that with xml.
::::::::Decent editors have a Lisp mode for automatically indenting S-expressions, matching parentheses, or selecting subexpressions and moving them around, etc.[[Special:Contributions/192.139.122.42|192.139.122.42]] 22:03, 19 October 2011 (UTC)
:::::::::And there's nXML mode for emacs, for example, which automates indentation and tag closing and so on when editing xml files. --[[User:Rdm|Rdm]]
:::::::of course this has nothing to do with the task or the definition of s-expressions, it just shows that s-expressions are easier to manipulate programmatically which probably is one of the reasons why lisp is as powerful as it is.--[[User:EMBee|eMBee]] 17:09, 19 October 2011 (UTC)
::''[... CONTINUATION] so this mechanism only makes sense for interoperability. And interoperability only makes sense when it's defined in such a way that both my implementation and that of another language are doing the same thing. And that's just not happening here, because that is not how the task is written. --[[User:Rdm|Rdm]] 10:39, 19 October 2011 (UTC)''
::::That said, javascript does actually have some primitive support for json (eval can be used to parse json from a trusted source). --[[User:Rdm|Rdm]] 23:20, 19 October 2011 (UTC)
:::well the task is not written that way because there is no standard. however in this discussion we may be able to flesh out some kind of standard that we can agree on.--[[User:EMBee|eMBee]] 12:04, 19 October 2011 (UTC)
: Will I use what's in this task in a real project? Probably not. Firstly, the task doesn't appear to want symbols (the wording rather actively goes against keeping symbols as symbols).
Line 44 ⟶ 96:
::i have been searching for a definition for s-expressions, and i could not find anything. this one raises some interesting points and has things i disagree with.--[[User:EMBee|eMBee]] 06:02, 19 October 2011 (UTC)
:::Common Lisp (ANSI standard), ISLisp (ISO) and Scheme are programing languages which are based on S-expressions. There are differences between them but they have a lot in common. Rivest's RFC is awful.[[Special:Contributions/192.139.122.42|192.139.122.42]] 22:07, 19 October 2011 (UTC)
 
== Existing Standards ==
 
Claims by people that they are unable to find any specification of S-expressions are exaggerated.
 
Common Lisp: ANSI Standard available in the form of the [[http://www.lispworks.com/documentation/HyperSpec/Front/index.htm Common Lisp HyperSpec]]. The reader syntax is described in [[http://www.lispworks.com/documentation/HyperSpec/Body/02_.htm Syntax]].
 
ISO Standard Lisp: Also known as [[http://islisp.info/ ISLisp]].
 
Scheme: Revised [6] Report on the Algorithmic Language Scheme, [[http://www.r6rs.org/ R6RS]].
 
== syntax for S-Expressions ==
it seems there is no formal definition of S-Expressions, and much of the arguments in the discussion on this task seems to revolve around what is required for S-Expressions and what is not.
this [http://people.csail.mit.edu/rivest/Sexp.txt draft RFC] makes an attempt at a definition, but it was not accepted for reasons that are unknown.
 
for a definition i'd like to separate two aspects:
* the syntactical representation of elements
* the interpretation of these elements
 
=== syntax ===
if we use the task as a starting point we get:
* special characters: ( ) " \
* whitespace: space, tab, newline (carriage return)
* symbol: a sequence of characters without special chars (and without surrounding quotes).
* string: a sequence of characters in quotes. strings may contain special characters and whitespace. only " and \ need to be escaped.
* number: a sequence of characters capable of representing a number. (optional?, do we need to define all possible number representations? (different standards for floating point, hex, octal, etc))
* atom: a symbol, string or number
* S-Expression: a sequence of atoms or S-Expressions separated by whitespace and surrounded by parentheses ()
 
do we need others? the rfc above introduces special syntax for hex, and base64 encodings, as well as length-prefix encodings, all of which i consider not needed. there are other standards for hex, and there are other encodings, so i don't see why they need to be treated special.
 
=== interpretation ===
* symbols are different from strings and should be distinguished in the implementation of the parser.
beyond that, the implementation of symbols is not defined. symbols may be implemented in any way suitable for the language. [[wp:String Interning|interning]] is optional. (i'd like to point out that in pike strings are interned. so if i use strings to represent symbols, does that satisfy a requirement for interning symbols?)
* numbers are possibly an interpretation issue and need not be defined in the syntax.
 
very likely this definition is incomplete, possibly even has errors. please add and criticize.--[[User:EMBee|eMBee]] 06:02, 19 October 2011 (UTC)
 
== Symbols and strings ==
Line 138 ⟶ 153:
:can you elaborate the difference? the solution should easy enough to make it useful practically. this should not be a demo on how complex it is to implement an s-expression parser that follows everyone's wishes, but it should provide a practical starting point for those who really do need an s-expression parser for their project. if the data handling mechanism goes beyond what you would normally do in J, then maybe it isn't the right approach. unfortunately i find J very hard to read, so i can't really understand the code.--[[User:EMBee|eMBee]] 03:52, 19 October 2011 (UTC)
::The difference between what? Between the parser that handles the list syntax and the parser that handles the atoms? If that is what you are asking about: The parser that handles the list syntax returns a list which itself contains things which are either contained parsed lists or strings. The strings are the sequence of characters which appeared in the input -- things like data (four characters), "quoted data" (13 characters) or 123 (three characters). The parser that handles the atoms is responsible for converting "quoted data" to an eleven character string and for converting 123 to an integer. The difference is that the list parser does not concern itself with what the strings represent and the data parser does not concern itself with the list structure. --[[User:Rdm|Rdm]] 22:44, 19 October 2011 (UTC)
:::i meant the difference between "this data handling mechanism" and "the list representation". so what you are saying is that you'd like the task description to explicitly explain that parsing the structure and parsing the data into numbers, strings, symbols etc, should be separate steps?
:::i don't know if it is necessary to make them completely independent steps, but i think it is a good idea to keep them separate enough to that for example the data handling is done by a function call and someone using the code can easily swap out the function with another one. i think the pike solution exemplifies this.
:::how can we write the task description to make this more clear?--[[User:EMBee|eMBee]] 01:51, 20 October 2011 (UTC)
 
==How about concentrating on the example?==
How about ''not'' worrying too much about the fact that S-expressions are mentioned and concentrating on doing enough for a Rosetta Code task.
 
Remember - this shouldn't necessarily be code to be used, as-is, in some other project. There is an example given and, apart from the examples output being couched in terms of Python, there should be enough there:
* Here's an example S-expression,
* Here's what your intermediate format should look like,
* And the final full-circle output should look like the input (apart from whitespace maybe).
 
This would address all the talk here on formats, Lisp, escape characters, and what types of atoms are needed. You would need strings, integers, floats and a nestable list structure to put them in.
 
The task description might need some work on maybe taking the specific reference to Python out and instead explaining that it is a nestable native list structure, etc, etc, .... It could also do with stressing the importance of implementing the task example over any wider considerations. --[[User:Paddy3118|Paddy3118]] 03:52, 20 October 2011 (UTC)
:well, i find the discussion quite enlightening so i don't mind the arguments. i am interested in solutions that are at least usable as a starting point. so it would be nice if they are written in a way that it is possible to swap out parts and extend where needed.
:the python reference is already gone.
:how would you stress the importance of implementing the task?--[[User:EMBee|eMBee]] 04:53, 20 October 2011 (UTC)
 
::There has been too much of the Python taken out I think. The intermediate language specific list structure should have been left in but instead of saying it was Python, just say enough so that implementers knew to use their languages native ints, floats, strings and nestable lists. (They might have to create a [[Linked list|linked list]] data-structure if their language or a library does not produce one).
::Also removed is the notion of taking the S-expression through to the internal representation, then back to an S-expression of the original which I thought was good too. --[[User:Paddy3118|Paddy3118]] 07:08, 20 October 2011 (UTC)
:::huh? the only thing ever removed was: "must, in addition, recognise integers and floating point literals (that are distinguished as digits followed by a decimal point, followed by digits)." replacing it with "should, in addition, recognize numbers if the language has appropriate datatypes." i have also moved that paragraph in the process. this was the last edit on the description [http://rosettacode.org/mw/index.php?title=S-Expressions&diff=next&oldid=123268 (see diff)]
:::i have checked the complete history to make sure i didn't delete something by accident.--[[User:EMBee|eMBee]] 09:06, 20 October 2011 (UTC)
 
::::Hi eMBee, This got lost:
:::::<lang python>[["data", "quoted data", 123, 4.5]
["data", ["!@#", [4.5], "(more", "data)"]]]</lang>
::::--[[User:Paddy3118|Paddy3118]] 18:42, 20 October 2011 (UTC)
:::::oh, now i understand what you mean: keep the list but describe it as a generic nested list, not a python list.
:::::i removed the list intentionally because i think there are now enough examples of how a native structure might look like. at least the lisp solutions, pike, python and ruby produce directly native results. (for the other languages i can't tell how native lists look like.)--[[User:EMBee|eMBee]] 00:22, 21 October 2011 (UTC)
 
== Lisp Solutions ==
 
The lisp solutions demonstrate the steps needed to actually read an s-expression from a string, but for those interested in learning how to write a simple parser in lisp it would also be nice if we could also have solutions that re-implement a parser with basic lisp tools. assume for example the s-expressions are written with [] instead of ().
 
how would a parser look like that turns
<lang lisp>(setq input "[[data \"quoted data\" 123 4.5]
[data [!@# [4.5] \"[more\" \"data]\"]]]")</lang>
into
<lang lisp>((data "quoted data" 123 4.5)
(data (|!@#| (4.5) "[more" "data]")))</lang>
and back?--[[User:EMBee|eMBee]] 09:49, 20 October 2011 (UTC)
:Unfortunately the task isn't to show a parser but to get the S-expression into an internal form that just happens to be native Lisp.[[User:Paddy3118|Paddy3118]] 08:51, 23 June 2012 (UTC)
::the intention of the task was to create a collection of s-expression parsers as well as demonstrate how to write a parser. if the lisp solutions could show how to implement an actual parser similar to what needs to be done in most other languages, i would consider this a nice addition to the current solution. so, yes, if someone wants to implement a full parser go ahead.--[[User:EMBee|eMBee]] 15:18, 24 June 2012 (UTC)
:You should show how easy it is to do in Lisp. S-expressions have been used as a data format where much of the processing is done in languages other than Lisp - for example in the Electronics industry. --[[User:Paddy3118|Paddy3118]] 08:51, 23 June 2012 (UTC)
::Using S-Expressions as a data format in a language other than Lisp?! Those godless heathens!!! =) --[[User:Lhignight|Larry Hignight]] 11:38, 23 June 2012 (UTC)
 
::: I've just looked up the formats I was thinking of. They are [[wp:EDIF|EDIF]] and [[wp:Standard Delay Format|Standard Delay Format]] which are heavily S-expression inspired but not the same as Lisp. --[[User:Paddy3118|Paddy3118]] 21:37, 23 June 2012 (UTC)
 
::Hi Donald, I think the Common Lisp section is solid now. Check it out and let me know what you think. --[[User:Lhignight|Larry Hignight]] 11:38, 23 June 2012 (UTC)
 
:::Hi Larry, I think the Lisp does the task and a bit more which is fine as the extra is explained and close enough to the topic to belong. --[[User:Paddy3118|Paddy3118]] 21:28, 23 June 2012 (UTC)
 
== Writer in separate subtask? ==
Aside from the Python and Ruby implementations, I haven't seen any other languages writing native code back to S-Exps. Perhaps the writer should be spunoff as a separate sub-task? Also, many of the implementations don't show any kind of output. --[[User:Lhignight|Larry Hignight]] 22:28, 22 June 2012 (UTC)
: (moved this out of the lisp section as it seems rather unrelated to it.
: one goal of this task was to provide a collection of code to handle s-expressions. for anyone writing a library both sides should be covered and seperating them may not be worth it. consider the java solution for example. that is real code in production use. splitting out the writer from it would just needlessly make it harder. also in most cases i believe the writer is not very hard so there wouldn't be much gain from separating it out.--[[User:EMBee|eMBee]] 15:18, 24 June 2012 (UTC)
: if any solution is missing a writer then it is incomplete. showing the output would of course also be nice.--[[User:EMBee|eMBee]] 15:23, 24 June 2012 (UTC)
 
== TCL native types ==
 
what would a native structure look like? i don't quite understand the comment: ''A “native” data structure could also be generated, but then that would turn things into lists that are not in the original.''
also, since tcl doesn't have a type system i think the tags are not really needed, except for the quotes.
 
since tcl doesn't need quotes for strings can't the structure be like:
{{data "quoted data" 123 4.5} {data {!@# {4.5} "(more" "data)"}}}
using " as part of the string value?--[[User:EMBee|eMBee]] 10:53, 3 November 2011 (UTC)
: In TCL, "everything is a string" (or list, depending on how you look at it). <code>{a b c}</code> is the same as <code>"a b c"</code>, which will lead to some unfortunate ambiguities. --[[User:Ledrug|Ledrug]] 16:25, 3 November 2011 (UTC)
:: ah, yes, that's a problem. actually, i found that the real problem is that a single element list is indistinguishable from the element itself:
<blockquote><pre>% puts [list [list [list a]]]
a</pre></blockquote>
:: therefore tagging is already needed just to distinguish lists from atoms.--[[User:EMBee|eMBee]] 17:37, 3 November 2011 (UTC)
::: There are dirty ways to distinguish (which I use in the task on JSON) but generally speaking Tcl programmers write their code to not need to ask whether a particular word is a list or terminal; it's not a distinction that they draw. (It's a different attitude.) The "native" representation of the above would be:
::: <pre>{{data {quoted data} 123 4.5} {data {!@# 4.5 (more data)}}}</pre>
:::But as you can see there's a fair loss of information because lists ''are'' a subtype of strings and singleton lists can be rendered like their first element under (common) circumstances. The alternative to putting the tagging inline would have been to have returned a separate type descriptor. (Generating everything from the fully parsed representation is left as an exercise.) –[[User:Dkf|Donal Fellows]] 10:47, 4 November 2011 (UTC)
 
==A bit late to add the extra credit?==
:''"'''Extra Credit:''' Let the writer produce pretty printed output with indenting and line-breaks."''
There are already a large number of solutions to the task. I was thinking of arguing for the removal of the extra credit that has just been added for that reason, but would others give their opinions. Thanks. --[[User:Paddy3118|Paddy3118]] 06:41, 26 October 2012 (UTC)
:: if it wasn't so late i would have added it to the task directly. being late is the reason for it being extra credit. this way the exiting tasks are not invalidated.
:: one could argue to make this a separate task though, but it's debatable which is better. --[[User:EMBee|eMBee]] 07:18, 26 October 2012 (UTC)
 
== extend task for Associative Arrays, Mappings, Dictionarys? ==
 
I'd like to extend this task to support Associative Arrays using the Lisp method of Cons Pairs:
((key1 . value1) (key2 . value2)) for { key1:value1, key2:value2 }
 
i think it is better to add it as Extra Credit to this task instead of creating a new task for it because a new task would require all of the code that this task needs, and then we'll get people wondering if the tasks should be merged.
 
i am aware that (a b) is equivalent to (a . (b)) in Lisp and therefore any list could be treated as an associative array, however for the purpose of this task (and for the usability as an interchangeable data format, we could just require that a pair with a list as value be written as (a . (b))
 
for the same reason a solution in lisp better ought to use a hash type when reading the data.
--[[User:EMBee|eMBee]] 11:07, 26 October 2012 (UTC)
 
== Haskell version failing with GHC 7.10.3 and current Stack ==
Compilation failing with two errors at the moment:
# 'between' is not in scope. (Not included in the imports)
# "Precedence parsing error cannot mix ‘<?>’ [infix 0] and ‘<?>’ [infix 0] in the same infix expression"-- [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 22:12, 27 April 2017 (UTC)
 
== JavaScript version bugfix for \" and \n in strings ==
I found that the present solution doesn't treat escaped double quotes and newlines in strings right (used in KiCad)
 
(property "ki_description" "Power symbol creates a global label with name \"GND\" ,\nexample new line")
 
See below for my "hack" for the procedural version. The functional version is too difficult to read for me - in any case, I hope that the original author comes back to fix it correctly, right now it just works for my purpose
String.prototype.parseSexpr = function () {
//schweick: modified first option in regexp to catch string containing \"
//I just trusted the expression from https://stackoverflow.com/a/30737232 (some others would not work):
var t = this.match(/\s*("(?:[^"\\]*(?:\\.)?)*"|\(|\)|"|[^\s()"]+)/g)
for (var o, c = 0, i = t.length - 1; i >= 0; i--) {
var n, ti = t[i].trim()
if (ti == '"') return
else if (ti == '(') t[i] = '[', c += 1
else if (ti == ')') t[i] = ']', c -= 1
else if ((n = +ti) == ti) t[i] = n
// schweick: inserted two replacements to double escape \", \n in strings
// which otherwise would loose their backslash in toPretty():
else t[i] = '\'' + ti.replace('\'', '\\\'').replace(/\\"/g, '\\\\"').replace(/\\n/g, '\\\\n') + '\''
if (i > 0 && ti != ']' && t[i - 1].trim() != '(') t.splice(i, 0, ',')
if (!c) if (!o) o = true; else return
}
return c ? undefined : eval(t.join(''))
}
Please delete my comment above when taken care of on the main page
[[User:Schweick|Schweick]] ([[User talk:Schweick|talk]]) 09:17, 25 January 2023 (UTC)
3

edits