Talk:Brace expansion: Difference between revisions

From Rosetta Code
Content added Content deleted
Line 66: Line 66:
::::: What's scary about it? The rules for ''identifying'' (parsing) the layers, and the rules for how to ''expand'' the string based on this information, are conceptually two different things...
::::: What's scary about it? The rules for ''identifying'' (parsing) the layers, and the rules for how to ''expand'' the string based on this information, are conceptually two different things...
::::: '''Parsing''' happens by finding valid balanced braces <small>(using the given information that closing braces match the *nearest* opening brace to their left, and that groups without at least 2 alternatives are not valid)</small>.
::::: '''Parsing''' happens by finding valid balanced braces <small>(using the given information that closing braces match the *nearest* opening brace to their left, and that groups without at least 2 alternatives are not valid)</small>.
::::: '''Expansion''' happens recursively (<small>i.e. from the inside out</small>), and cumulatively (<small>in case of multiple groups on the same layer</small>).
::::: '''Expansion''' happens recursively (<small>in case of nested groups</small>), as well as cumulatively (<small>in case of multiple groups on the same layer</small>).
::::: This is all explained in the spec, and demonstrated by the test cases.
::::: This is all explained in the spec, and demonstrated by the test cases.
::::: Regarding "''first identify the nesting, [...] and postpone expansion until all brace expansions have been identified''", you're free to do that if you feel that's the best way to solve the task in your language. In fact the Perl 6 solution does it that way: It uses a grammar to find where all the nested brace group parts are, and then passes the resulting AST tree to a recursive function which does the expansion. It's not a "requirement" though: The Perl and Python solutions both do identification and expansion in one go ''(albeit in rather different ways)''. The only requirement is that the function does the right thing; how it does it is up to you (based on what you think works best in your language).
::::: Regarding "''first identify the nesting, [...] and postpone expansion until all brace expansions have been identified''", you're free to do that if you feel that's the best way to solve the task in your language. In fact the Perl 6 solution does it that way: It uses a grammar to find where all the nested brace group parts are, and then passes the resulting AST tree to a recursive function which does the expansion. It's not a "requirement" though: The Perl and Python solutions both do identification and expansion in one go ''(albeit in rather different ways)''. The only requirement is that the function does the right thing; how it does it is up to you (based on what you think works best in your language).

Revision as of 22:17, 31 January 2014

Task is ready for accepting solutions!

As far as I'm concerned, the task description is stable. Feel free to add solutions for other languages, and if you have any problems with the task then drop a line here! --Smls (talk) 17:49, 25 January 2014 (UTC)

There appears to be a bug in the task. Your third test case should have backslashes on the comma by the "always leave in the backslash" rule. See the Perl 6 output. --TimToady (talk) 02:02, 26 January 2014 (UTC)
I edited the task to add the backslashes, after verifying that the Perl code actually does include the backslashes before comma. --TimToady (talk) 02:53, 26 January 2014 (UTC)
Oops, sorry about that, I verified the Perl function using a script that automatically tested it against a local copy of the test cases (and a bunch more), where I did add the backslashes. Thanks for fixing the test case here. --Smls (talk) 10:16, 26 January 2014 (UTC)
Also, with the toy output, it's not at all clear that the current Perl or Python implementations are doing this according to spec. The Perl code claims to follow the spec, but the spec is missing backslashes on the commas. The Python code talks about barfing on code that the specification says it should accept. Maybe we should require a bit more rigor in proving solutions actually pass the spec for backslashed and not-really-meta characters. --TimToady (talk) 02:22, 26 January 2014 (UTC)
Regarding the python code: the commas not in bracers can be parsed as literal chars, but the unmatched bracers as specified by task is not workable. How do you parse "{a,{b,c}" ? As "{a", "b", "c", or as "a", "{b", "c"? Same for closing bracers. This is not just a problem with descending parsers, it's simply ambiguous and counterintuitive, so best treated as a syntax error IMO. The part of spec about "{a}" parsed literally is also not done in python, which can take some workaround but is again not intuitive. --Ledrug (talk) 02:37, 26 January 2014 (UTC)
I understand that Python culture tends more toward throwing an exception if in doubt, but I think the intent of the task was to emulate the shell's notion of how to do this, hence the failsoft aspects, which are pretty easy to do with a decent backtracking engine. It's certainly straighforward in the Perl 6 solution, once I figured out the top level parses with slightly different rules than sublevels do. And it was certainly specced how it was supposed to treat weird characters. --TimToady (talk) 02:49, 26 January 2014 (UTC)
Well, I'm not really with the python culture, but that's not important. I can see why the spec is setup this way, but the fact remains that the rule about unmatched bracers is underdefined. The spec should say which bracers would be considered extra for consistency across implementations. I'll withdraw the python code for now since it does fail to follow other rules, while the task hopefully can get some clarification. --Ledrug (talk) 03:19, 26 January 2014 (UTC)
Another thing, what's supposed to happen when the string ends with an extra "\"? --Ledrug (talk) 03:29, 26 January 2014 (UTC)
I've attempted to clarify what to do in the case of ambiguity; in general, the closest brace that makes sense in context is used. I suppose a backslash at the end of the string should just be taken as a literal. --TimToady (talk) 04:00, 26 January 2014 (UTC)
Yeah, I thought the "closest brace" disambiguation was implied by "} closes the last opened alternation group " and all the talk about "balanced" brace pairs, but it's a good thing you added a more explicit clarification. As for it being "counterintuitive", well, it's how my shell (zsh) handles unbalanced braces, so at least to me it's intuitive... :) --Smls (talk) 10:16, 26 January 2014 (UTC)
Regarding backslash at the end of the string, I semi-purposefully left that case unspecced, because such inputs may well be impossible in real-life scenarios, and I didn't want to force implementors to complicate their solutions because of it. But I did implement it as "interpret literally and pass along to output" in the Perl solution, just in case. If you think it makes sense as a hard requirement, I'm OK with having it in the spec. Although I would also be fine with something like "You may assume that input strings will not end with an unescaped backslash" in the spec. --Smls (talk) 10:16, 26 January 2014 (UTC)
TimToady is right, I did intend for fault-tolerant parsing (as opposed to throwing an error on malformed input) to be a core aspect of this task (see the "Motivation" section). Of course if it's an unreasonable requirement (e.g. making the task too big/difficult) it could be dropped, but I think that would make the task much less interesting (and there are already many other Rosetta Code tasks that deal with strict parsing). Note that I managed to implement all the requirements in the Perl solution, with plain while loops and a non-backtracking regex (which could even be replaced by low-level 'substr' operations without much difficulty), so it should be possible in Python as well. --Smls (talk) 10:17, 26 January 2014 (UTC)
Hm yeah, I guess requiring solutions to demonstrate the four test cases (with a listing of the actual output), instead of some "toy output", might make sense after all (even if it will make the page very long once there are many implementations). Btw, that Perl 6 solution looks pretty sweet... :) --Smls (talk) 10:16, 26 January 2014 (UTC)
I updated both the task description and the Perl solution accordingly now. --Smls (talk) 10:57, 26 January 2014 (UTC)

Duplicate supression

The task currently has an ambiguity - specifically an explicit reference to an implementation (perl 6) which implements a requirement which is not explicitly stated in the task description.

More specifically, the perl 6 implementation suppresses duplicate results despite no description of this mechanism in the current task description. Consider {a,a,b} as an example which would generate duplicates. This issue shows up in the test case '{,{,gotta have{ ,\, again\, }}more }cowbell!' where nested empty prefixes appear. This case should have eight expansions. You can easily see this by placing a letter to the left of each non-escaped comma.

This means either the task description is buggy or the reference implementation is buggy. --Rdm (talk) 18:31, 31 January 2014 (UTC)

Well, I believe the task actually refers to the Perl (5) implementation, not the Perl 6 one. But leaving that aside, there's no duplicate suppression taking place here. Note what happens when I run these (through the Perl 6 version):
{X,{Y,gotta have{ ,\, again\, }}more }cowbell!
    Xcowbell!
    Ymore cowbell!
    gotta have more cowbell!
    gotta have\, again\, more cowbell!

{a,a,b}
    a
    a
    b
We still get four entries for the first one, and the dupes aren't removed from the second one. I don't know what you're doing different, offhand, perhaps not throwing away stuff when you backtrack? --TimToady (talk) 18:54, 31 January 2014 (UTC)
Actually, there does seem to be a bug in the latest revision of the Perl 6 solution (which wasn't present in the original revision). Either that, or the Rakudo Star version I'm using to test it is too old or something. Because without the inserted X / Y, it only prints 2 expansions for that test case. --Smls (talk) 19:07, 31 January 2014 (UTC)
You don't have latest version, which I fixed in the last hour. --TimToady (talk) 19:13, 31 January 2014 (UTC)
You're right, the latest revision does in fact work correctly again. My mistake. --Smls (talk) 19:22, 31 January 2014 (UTC)
The reference solution (Perl, not Perl 6) does agree with the spec: There should be no suppression of duplicates. As for the test case you mention, it has four expansions and not eight, because nested brace groups only expand the "branch" they belong to. For example, the pattern aa{,{,11}cc}22 has three expansions (aa22  aacc22  aa11cc22), not four. --Smls (talk) 18:58, 31 January 2014 (UTC)
That's a good point, that brace expansion should be only relevant in the surrounding context where it appears. So that's my bug and hopefully I can find it now. Thank you. --~~
Actually... after thinking about this further... I still think it's a problem.
Let's take aa{,{,11}cc}22 - we expand the inner braces and get aa{,cc}22 and aa{,11cc}22. This should then expand into aa22 aacc22 aa22 and aa11cc22. It seems to me that doing otherwise violates the specification that the inner brace be expanded first. Thoughts? Thanks. --Rdm (talk) 20:39, 31 January 2014 (UTC)
The spec doesn't state that. In fact, when you've only expanded an inner brace group, you cannot give an intermediate result for the whole string yet, only a result for the "parent alternative" that the inner group belongs to. Another way to look at this, is that you (conceptually) expand the outer layer first, then separately expand each resulting alternative, and so on:
aa{,{,11}cc}22  ----->  aa22
  ^^^^^^^^^^      \
                   `->  aa{,11}cc22  ----->  aacc22
                          ^^^^^        \
                                        `->  aa11cc22
The spec comes at it from a slightly different angle, but it evaluates to the same thing: "The rules for parsing/expanding the whole string also apply recursively to each alternative of an alternation group" - which, together with the given examples, should be unambiguous.
--Smls (talk) 21:28, 31 January 2014 (UTC)
This line of thinking scares me. It seems to me that you cannot reliably identify the outer layer, in the general case, until after you have identified the inner layers.
That said, perhaps you are suggesting a system where we first identify the nesting, from the inside out, and postpone expansion until all brace expansions have been identified. That would be relatively straightforward to implement and would explain the stack data structures I've noticed in other implementations. Still, if that is the requirement I would like it to be stated explicitly rather than implicitly or through a reference implementation. --Rdm (talk) 21:34, 31 January 2014 (UTC)
What's scary about it? The rules for identifying (parsing) the layers, and the rules for how to expand the string based on this information, are conceptually two different things...
Parsing happens by finding valid balanced braces (using the given information that closing braces match the *nearest* opening brace to their left, and that groups without at least 2 alternatives are not valid).
Expansion happens recursively (in case of nested groups), as well as cumulatively (in case of multiple groups on the same layer).
This is all explained in the spec, and demonstrated by the test cases.
Regarding "first identify the nesting, [...] and postpone expansion until all brace expansions have been identified", you're free to do that if you feel that's the best way to solve the task in your language. In fact the Perl 6 solution does it that way: It uses a grammar to find where all the nested brace group parts are, and then passes the resulting AST tree to a recursive function which does the expansion. It's not a "requirement" though: The Perl and Python solutions both do identification and expansion in one go (albeit in rather different ways). The only requirement is that the function does the right thing; how it does it is up to you (based on what you think works best in your language).
--Smls (talk) 22:01, 31 January 2014 (UTC)