Talk:Strip comments from a string: Difference between revisions

==General==
Two thoughts. First, wouldn't a comment notation supporting ranges (i.e. /* ... */) as well as truncation tokens (#, ;, // ...) be more interesting? Otherwise, I'd suggest renaming this task to [[Truncate a String]]. Second (and this is more an idle idea than anything else), a task in which each language strips comments per its own rules (// and # for PHP, // and /* */ for C++, etc.) would be ''very'' interesting, as it combines string processing with a demonstration of the language's own comment syntax. --[[User:Short Circuit|Michael Mol]] 13:53, 30 October 2010 (UTC)
:At the very least, the task should define what a comment is. (And, with a nod towards the "comment character inside string" issue, below, what a comment is not...) --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 05:13, 27 July 2022 (UTC)
 
=== Check your data ===
I have done this sort of thing in the past and the problem statement works if the format of what is being parsed does ''not'' allow the comment indicating character to be part of valid data.
 
As soon as you start to get, for example, arbitrary character strings then you need a more sophisticated parser that allows a comment marker character to appear in a string without the parser treating it as the start of a comment. --[[User:Paddy3118|Paddy3118]] 04:40, 12 December 2010 (UTC)
 
: ... Except that some languages treat nested comments (comments within comments) as legal. PL/I doesn't, REXX does, for instance.   It would be interesting to note which languages support nested comments. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 21:34, 26 April 2013 (UTC)
 
: Having nested comments allows a programmer to ''comment-out'' large sections of code, which, of course, most assuredly contain comments. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 21:34, 26 April 2013 (UTC)
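:: To illustrate the difference between nested and non-nested range comments, here is a minimal Python sketch. The function name <code>strip_block_comments</code>, the <code>/* ... */</code> tokens, and the <code>nested</code> flag are assumptions made for the example; it does not reproduce any particular language's full rules (for instance, it ignores string literals).

```python
def strip_block_comments(text, open_tok="/*", close_tok="*/", nested=True):
    """Remove range comments from text.

    nested=True  : an opener inside a comment starts an inner comment
                   (the REXX-like behaviour described above).
    nested=False : the first closer ends the comment
                   (the PL/I-like behaviour described above).
    """
    out = []
    depth = 0          # how many comments are currently open
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i) and (nested or depth == 0):
            depth += 1
            i += len(open_tok)
        elif depth > 0 and text.startswith(close_tok, i):
            depth -= 1
            i += len(close_tok)
        else:
            if depth == 0:
                out.append(text[i])   # keep only text outside comments
            i += 1
    return "".join(out)


sample = "a /* x /* y */ z */ b"
print(repr(strip_block_comments(sample, nested=True)))
print(repr(strip_block_comments(sample, nested=False)))
```

:: With nesting enabled the whole commented-out region disappears; without it, the first closer ends the comment and the remainder leaks back into the output, which is why commenting out code that already contains comments only works in the nested case.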
 
==Wayward space==
:So far, it seems like comments may contain comment characters, but that the non-comment text cannot quote or escape comment characters. This does not seem very useful. On the other hand, normally comment stripping happens inside of a parser which has mechanics to ignore comment characters when they appear in the wrong context. So the interesting task here is probably [[Parse_EBNF]]. --[[User:Rdm|Rdm]] 17:37, 27 July 2011 (UTC)
::Except the [[Parse_EBNF]] task is a little ''too'' elaborate. Here we want a parser based on some BNF, while that task requires ''creating'' a parser based on some BNF, a program writing a program sort of thing. --[[User:Ledrug|Ledrug]] 18:53, 27 July 2011 (UTC)
:::Jumping over strings is easy. Get the start position of each delimiter and take the lowest one. Test whether the count of string delimiters on its left side is odd; if so, the string is open, so search for the closing delimiter. After that point, start searching again.--[[User:MichaelWodrich|MichaelWodrich]] ([[User talk:MichaelWodrich|talk]]) 00:01, 29 October 2020 (UTC)
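::::A rough Python sketch of skipping over quoted strings, written as a single left-to-right scan rather than repeated searches. Toggling <code>in_string</code> at each quote is the same parity test as counting the string delimiters to the left of a candidate marker. The function name <code>strip_comment</code>, the <code>"</code> string delimiter, and the <code>#</code> / <code>;</code> markers are assumptions for the example, and escaped quotes are deliberately not handled.

```python
def strip_comment(line, markers=("#", ";"), quote='"'):
    """Cut line at the first comment marker that is outside a quoted string.

    Assumes a single string delimiter and no escape sequences.
    Trailing whitespace left before the marker is trimmed.
    """
    markers = tuple(markers)       # str.startswith needs a str or a tuple
    in_string = False              # True while an odd number of quotes seen
    for i, ch in enumerate(line):
        if ch == quote:
            in_string = not in_string
        elif not in_string and line.startswith(markers, i):
            return line[:i].rstrip()
    return line.rstrip()


print(strip_comment('say "a # b" # real comment'))
print(strip_comment("apples, pears ; and bananas"))
```

::::A format that allows escaped or doubled quotes inside strings would need an extra branch in the loop; that is left out here to keep the idea visible.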
 
== White space ==
=== 29 of 36 languages were incorrect? ===
At [http://rosettacode.org/mw/index.php?title=Strip_comments_from_a_string&oldid=119409 2 September 2011], I verified that 8 languages (C, C++, Java, Perl, Python, Ruby, sed, UNIX Shell) were incorrect, all for not trimming whitespace by 29 March 2011 rules. I suspect that 21 more languages (Ada, ALGOL 68, AutoHotKey, Clojure, D, Delphi, F#, Fantom, Fortran, Go, Haskell, Icon and Unicon, Inform 7, Lua, OCaml, PicoLisp, PL/I, PureBasic, REXX, Scheme, TUSCRIPT) might be incorrect for the same reason.
 
:: You may suspect incorrectness all you want, but the REXX example does indeed trim whitespace.   It's better to ask before assuming errors.   -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 22:27, 15 July 2015 (UTC)
 
Several contributors solved the task before 29 March 2011, when the current rules appeared. I believe that most of the existing solutions became incorrect at 29 March 2011.
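:To make the disagreement concrete, here is a small Python sketch of both behaviours: cutting at the first marker only, and cutting and then trimming. The function name <code>strip_comments</code>, the <code>trim</code> flag, and the <code>#</code> / <code>;</code> markers are assumptions for the example; it is not taken from any of the solutions listed above.

```python
def strip_comments(line, markers="#;", trim=True):
    """Remove everything from the earliest comment marker onward.

    trim=True  also removes leading and trailing whitespace.
    trim=False leaves the remaining text exactly as it was.
    """
    cut = len(line)
    for m in markers:
        pos = line.find(m)
        if pos != -1:
            cut = min(cut, pos)    # keep the earliest marker position
    result = line[:cut]
    return result.strip() if trim else result


line = "apples, pears # and bananas"
print(repr(strip_comments(line, trim=False)))
print(repr(strip_comments(line, trim=True)))
```

:The only difference between the two results is the blank left in front of the marker, which is exactly what a solution that does not trim would leave behind.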
<br>Sometimes, white space includes such things as:
 
::* &nbsp; blank(s)
::* &nbsp; sp &nbsp; &nbsp; (space)
::* &nbsp; ht &nbsp; &nbsp; (horizontal tab)
::* &nbsp; tab &nbsp; &nbsp; (usually, the same as HT)
::* &nbsp; vt &nbsp; &nbsp; (vertical tab)
::* &nbsp; cr &nbsp; &nbsp; (carriage return)
::* &nbsp; ff &nbsp; &nbsp; (form feed)
::* &nbsp; np &nbsp; &nbsp; (new page)
::* &nbsp; lf &nbsp; &nbsp; (line feed)
::* &nbsp; nl &nbsp; &nbsp; (new line)
::* &nbsp; nul &nbsp; &nbsp; (null character)
::* &nbsp; esc &nbsp; &nbsp; (escape)
::* &nbsp; eof &nbsp; &nbsp; (end-of-file)
::* &nbsp; can &nbsp; &nbsp; (cancel)
::* &nbsp; bel &nbsp; &nbsp; (bell)
::* &nbsp; bs &nbsp; &nbsp; (backspace)
<br>
::* &nbsp; soh &nbsp; &nbsp; (start of heading, console interrupt)
::* &nbsp; eot &nbsp; &nbsp; (end of transmission)
::* &nbsp; etx &nbsp; &nbsp; (end of text)
::* &nbsp; enq &nbsp; &nbsp; (enquiry)
::* &nbsp; ack &nbsp; &nbsp; (acknowledge)
::* &nbsp; si &nbsp; &nbsp; (shift in)
::* &nbsp; so &nbsp; &nbsp; (shift out)
::* &nbsp; etb &nbsp; &nbsp; (end of transmission block)
::* &nbsp; syn &nbsp; &nbsp; (synchronous idle)
::* &nbsp; dle &nbsp; &nbsp; (data link escape)
::* &nbsp; dc1 &nbsp; &nbsp; (device control 1)
::* &nbsp; dc2 &nbsp; &nbsp; (device control 2)
::* &nbsp; dc3 &nbsp; &nbsp; (device control 3)
::* &nbsp; dc4 &nbsp; &nbsp; (device control 4)
::* &nbsp; em &nbsp; &nbsp; (end of medium)
::* &nbsp; fs &nbsp; &nbsp; (file separator)
::* &nbsp; gs &nbsp; &nbsp; (group separator)
::* &nbsp; rs &nbsp; &nbsp; (record separator)
::* &nbsp; us &nbsp; &nbsp; (unit separator)
::* &nbsp; del &nbsp; &nbsp; (delete)
 
<br>
Of the above, the first sixteen or so are commonly known and used. Essentially, anything below a '''blank''' in ASCII or EBCDIC &nbsp; ''may'' &nbsp; be considered a control character, and in addition, ASCII also has '7f'x (DEL). Note also that some control codes have more than one mnemonic just to keep things interesting.
<br> I think whitespace (in the task's description) should be defined, or the word '''BLANKS''' should be used instead.
<br>It appears that most languages trim blanks, not white space, anyway. -- [[User:Gerard Schildberger|Gerard Schildberger]] 19:45, 3 September 2012 (UTC)
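:As an illustration of how much the definition matters, here is a Python sketch with three trimming variants: blanks only, Python's own notion of whitespace, and everything at or below a blank plus DEL. The helper names are made up for the example, and the third set is just one reading of "control characters" in ASCII, not a definition taken from the task.

```python
# Every ASCII code from NUL up to and including blank (0x20), plus DEL (0x7f).
CONTROLS_AND_BLANK = "".join(chr(c) for c in range(0x21)) + "\x7f"


def trim_blanks(s):
    """Trim only the blank (space) character."""
    return s.strip(" ")


def trim_whitespace(s):
    """Trim whatever Python's str.strip() treats as whitespace."""
    return s.strip()


def trim_controls(s):
    """Trim blanks, all ASCII control characters, and DEL."""
    return s.strip(CONTROLS_AND_BLANK)


s = "\t apples, pears \r\n"
print(repr(trim_blanks(s)))       # tab and CR/LF survive
print(repr(trim_whitespace(s)))   # tab, blanks, CR and LF removed

t = "\x00\x1b apples \x07\x7f"
print(repr(trim_whitespace(t)))   # NUL, ESC, BEL and DEL are not whitespace
print(repr(trim_controls(t)))
```

:The same input gives three different answers, so a solution can be "correct" or "incorrect" depending purely on which of these the task description means.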