Talk:Strip block comments: Difference between revisions

From Rosetta Code

Revision as of 12:29, 11 November 2010

State Machine

Note that this problem definition strongly favors a sequential state machine implementation. In particular, the example treatment for /*/ stuff */ adds complexity to a parallel implementation. --Rdm 18:25, 2 November 2010 (UTC)

We could bump it back to draft. Do you have a better definition? --Michael Mol 18:40, 2 November 2010 (UTC)
I believe the intent here is to emulate the treatment of comments in existing languages. Note also that no example treatment is given for cases like /*/*/*/*/*/ though presumably that sequence is equivalent to a single asterisk?
In any event, I think the specification should reflect how rigid the desired result is. I would eliminate the requirement that the comment delimiters be passed as parameters (or expand the definition of those parameters to include the aspects that have been implicitly specified, though I cannot see a good way to do that). I would also include explicit examples of character sequences that look like delimiters but are not delimiters because of their position.
If my interpretation is acceptable, I could tackle re-writing the task specification. --Rdm 18:47, 2 November 2010 (UTC)
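For illustration only, here is a minimal Python sketch of the sequential reading discussed above. It assumes C-style, non-nesting comments and assumes that an unterminated comment runs to the end of the input. The name strip_block_comments and its parameters are hypothetical, not taken from the task page.

```python
def strip_block_comments(text, opener="/*", closer="*/"):
    """Remove non-nesting block comments in one left-to-right pass.

    Hypothetical helper for illustration; not the task's reference solution.
    """
    out = []
    i = 0
    while i < len(text):
        start = text.find(opener, i)
        if start == -1:
            out.append(text[i:])      # no further comments: keep the rest
            break
        out.append(text[i:start])     # keep the text before the comment
        # Look for the closer only after the complete opener, so that the
        # "*" of "/*" cannot also serve as the "*" of "*/" (the "/*/" case).
        end = text.find(closer, start + len(opener))
        if end == -1:
            break                     # assumption: unterminated comment eats the rest
        i = end + len(closer)
    return "".join(out)


print(repr(strip_block_comments("a /*/ stuff */ b")))  # 'a  b'
print(repr(strip_block_comments("/*/*/*/*/*/")))       # '*'
```

Under these assumptions the sequence /*/*/*/*/*/ does reduce to a single asterisk: the first comment closes at the second "*/", one "*" is left over, and the remaining "/*/*/" is a second complete comment.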
I've rewritten the task definition so that it is much more explicit about what is to be stripped, and I've made user-supplied delimiters into an extra-credit thing. (Some types of solution can do it easily and others cannot, but both are valid solutions to the core of the task IMO.) –Donal Fellows 09:51, 11 November 2010 (UTC)
Note that the issue with parameterized delimiters is that they have syntax associated with them, and syntax does not parameterize well. (For example D and Cobra allow comments to be nested. And passing delimiters as strings would not work well for Lua's block comments.) --Rdm 11:52, 11 November 2010 (UTC)
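To make the nesting point concrete, here is a rough Python sketch in which the same pair of delimiter strings gives different results depending on whether comments nest. The function name, the nesting flag and the "/+" and "+/" delimiters are chosen for illustration only and are assumptions, not part of the task.

```python
def strip_comments(text, opener, closer, nesting=False):
    """Strip block comments; `nesting` selects the syntax rule.

    Hypothetical helper: the delimiter strings alone do not say which
    rule applies, so the rule has to be passed separately.
    """
    out = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith(opener, i) and (nesting or depth == 0):
            depth += 1                # entering a (possibly nested) comment
            i += len(opener)
        elif depth > 0 and text.startswith(closer, i):
            depth -= 1                # leaving one level of comment
            i += len(closer)
        else:
            if depth == 0:
                out.append(text[i])   # only text outside comments is kept
            i += 1
    return "".join(out)


sample = "a /+ b /+ c +/ d +/ e"
print(repr(strip_comments(sample, "/+", "+/", nesting=True)))   # 'a  e'
print(repr(strip_comments(sample, "/+", "+/", nesting=False)))  # 'a  d +/ e'
```

The two calls receive identical delimiters, yet only the extra flag decides where the comment ends, which is the sense in which the syntax is not captured by the delimiter strings.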

Definition of delimiters?

Do we have a (constant) definition of the delimiters to use or are they parameters to the stripping function? This is important because it leads to quite different solutions… –Donal Fellows 09:56, 3 November 2010 (UTC)

J draft

I have not seen any advancement on this task, so I threw together a quick example where the comment delimiters are hardcoded (which, in my opinion, is a good design decision for this task).

That said, the code uses a state machine, so it probably deserves a bit of commentary.

First, here is the version of the code I am commenting on. (The main page might easily be updated with a different version.)

<lang j>str=:#~1 0 _1*./@:(|."0 1)2>4{"1(5;(0,"0~".;._2]0 :0);'/*'i.a.)&;:
 1 0 0
 0 2 0
 2 3 2
 0 2 2
)</lang>

The core of this code is a state machine processing a sequence of characters. It first classifies characters into three classes: '/', '*' and everything else. These character classes correspond to the three columns of numbers you see there. ('/' corresponds to the left column and '*' corresponds to the middle column.)

The rows of numbers correspond to states. State 0 corresponds to the top row, state 1 corresponds to the next row, ..., state 3 corresponds to the bottom row.

The previous character's state (which is initially 0) and current character class are used to determine the current character's state. (These "new state numbers" are the numeric values you see arranged in that four row table.)

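We then find characters in the original text which have a state less than 2 and whose neighbors on both sides also have a state less than 2. (The neighbor check is what removes the delimiters themselves: the '/' that opens a comment has state 1 and the '/' that closes one has state 0, but each sits next to a character whose state is 2 or 3.)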

And we throw out everything else.
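For readers who do not read J, the steps above can be sketched in Python roughly as follows. This is my own transliteration of the description, not the code on the main page: the table is copied from the J draft as shown, the function names are hypothetical, and the cyclic treatment of neighbors at the ends of the text is an assumption based on my reading of the rotations in the J phrase.

```python
# Rows are states 0..3; columns are character classes: '/', '*', anything else.
TABLE = [
    (1, 0, 0),
    (0, 2, 0),
    (2, 3, 2),
    (0, 2, 2),
]


def char_class(ch):
    """Classify a character: 0 for '/', 1 for '*', 2 for everything else."""
    idx = "/*".find(ch)
    return idx if idx >= 0 else 2


def states(text):
    """Return the state assigned to each character, starting from state 0."""
    state = 0
    result = []
    for ch in text:
        state = TABLE[state][char_class(ch)]
        result.append(state)
    return result


def strip_with_table(text):
    """Keep characters whose own state and both neighbors' states are below 2."""
    ok = [s < 2 for s in states(text)]
    n = len(text)
    # Assumption: neighbors wrap around at the ends of the text.
    return "".join(
        ch for i, ch in enumerate(text)
        if ok[i - 1] and ok[i] and ok[(i + 1) % n]
    )


print(states("a /* b */ c"))                  # [0, 0, 1, 2, 2, 2, 2, 3, 0, 0, 0]
print(repr(strip_with_table("a /* b */ c")))  # 'a  c'
```

Printing the state list next to the text makes it easy to see which characters are considered inside a comment and why the two '/' delimiter characters need the neighbor check to be dropped.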