Talk:Strip comments from a string

==General==
Two thoughts. First, wouldn't a comment notation supporting ranges (i.e. /* ... */) as well as truncate tokens ( #, ;, // ...) be more interesting? Otherwise, I'd suggest renaming this task to [[Truncate a String]]. Second (and this is just an idle idea more than anything else), a task for which a language stripped comments (per its own language's rules; // and # for PHP, // and /* */ for C++, etc) would be ''very'' interesting, as it combines demonstrating string processing as well as the language's own comment syntax. --[[User:Short Circuit|Michael Mol]] 13:53, 30 October 2010 (UTC)
:At the very least, the task should define what a comment is. (And, with a nod towards the "comment character inside string" issue, below, what a comment is not...) --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 05:13, 27 July 2022 (UTC)
 
=== Check your data ===
I have done this sort of thing in the past and the problem statement works if the format of what is being parsed does ''not'' allow the comment indicating character to be part of valid data.
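A minimal sketch of that simple case, in Python, assuming the comment markers are <code>#</code> and <code>;</code> and that they never occur in valid data. The name <code>strip_comment</code> and its <code>markers</code> parameter are made up for illustration and are not taken from the task page.

```python
def strip_comment(line, markers="#;"):
    """Cut the line at the first comment marker, then trim trailing whitespace.

    Assumption: a marker character never appears inside valid data.
    """
    cut = len(line)
    for marker in markers:
        pos = line.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    return line[:cut].rstrip()


print(strip_comment("apples, pears # and bananas"))
print(strip_comment("apples, pears ; and bananas"))
# A marker inside quoted data breaks the assumption and truncates too early:
print(strip_comment('say "#1" # note'))
```

The last call shows why the assumption matters: the line is cut at the <code>#</code> inside the quotes, not at the one that starts the comment.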
 
:So far, it seems like comments may contain comment characters, but that the non-comment text cannot quote or escape comment characters. This does not seem very useful. On the other hand, normally comment stripping happens inside of a parser which has mechanics to ignore comment characters when they appear in the wrong context. So the interesting task here is probably [[Parse_EBNF]]. --[[User:Rdm|Rdm]] 17:37, 27 July 2011 (UTC)
::Except [[Parse_EBNF]] task is a little ''too'' elaborate. Here we want a parser based on some BNF, while that task requires ''creating'' a parser based on some BNF, a program writing a program sort of thing. --[[User:Ledrug|Ledrug]] 18:53, 27 July 2011 (UTC)
:::Jumping over strings is easy. Get the start position of each delimiter and take the lowest one. Test whether the count of string delimiters to its left is odd; if it is, the string is open, so search for the closing delimiter. After that point, start searching again.--[[User:MichaelWodrich|MichaelWodrich]] ([[User talk:MichaelWodrich|talk]]) 00:01, 29 October 2020 (UTC)
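The approach described in the last reply might be sketched in Python roughly as follows. This is one reading of it, not a definitive implementation: the function name, the choice of <code>#</code> and <code>;</code> as comment markers, and the double quote as the only string delimiter are all assumptions, and escaped quotes inside strings are not handled.

```python
def strip_comment_quoted(line, markers="#;", quote='"'):
    """Strip a trailing comment, ignoring markers inside quoted strings.

    Assumptions: a single quote character delimits strings and there is
    no escape syntax for a quote inside a string.
    """
    pos = 0
    while True:
        # Find the earliest comment marker at or after pos.
        hits = [i for i in (line.find(m, pos) for m in markers) if i != -1]
        if not hits:
            return line.rstrip()
        first = min(hits)
        # An odd number of quotes to the left means a string is still open.
        if line.count(quote, 0, first) % 2 == 1:
            close = line.find(quote, first)
            if close == -1:
                # Unterminated string: treat the rest of the line as data.
                return line.rstrip()
            pos = close + 1  # resume the search after the closing quote
        else:
            return line[:first].rstrip()


print(strip_comment_quoted('say "#1" # note'))
```

Here the first <code>#</code> has one quote to its left, so it is skipped, and the line is cut at the second <code>#</code>, which has two.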
 
== White space ==
<br>Sometimes, white space includes such things as:
 
::* &nbsp; blank(s)
::* &nbsp; sp &nbsp; &nbsp; (space)
::* &nbsp; ht &nbsp; &nbsp; (horizontal tab)
::* &nbsp; tab &nbsp; &nbsp; (usually, the same as HT)
::* &nbsp; vt &nbsp; &nbsp; (vertical tab)
::* &nbsp; cr &nbsp; &nbsp; (carriage return)
::* &nbsp; ff &nbsp; &nbsp; (form feed)
::* &nbsp; np &nbsp; &nbsp; (new page)
::* &nbsp; lf &nbsp; &nbsp; (line feed)
::* &nbsp; nl &nbsp; &nbsp; (new line)
::* &nbsp; nul &nbsp; &nbsp; (null character)
::* &nbsp; esc &nbsp; &nbsp; (escape)
::* &nbsp; eof &nbsp; &nbsp; (end-of-file)
::* &nbsp; can &nbsp; &nbsp; (cancel)
::* &nbsp; bel &nbsp; &nbsp; (bell)
::* &nbsp; bs &nbsp; &nbsp; (backspace)
<br>
::* &nbsp; soh &nbsp; &nbsp; (start of heading, console interrupt)
::* &nbsp; eot &nbsp; &nbsp; (end of transmission)
::* &nbsp; etx &nbsp; &nbsp; (end of text)
::* &nbsp; enq &nbsp; &nbsp; (enquiry)
::* &nbsp; ack &nbsp; &nbsp; (acknowledge)
::* &nbsp; si &nbsp; &nbsp; (shift in)
::* &nbsp; so &nbsp; &nbsp; (shift out)
::* &nbsp; etb &nbsp; &nbsp; (end of transmission block)
::* &nbsp; syn &nbsp; &nbsp; (synchronous idle)
::* &nbsp; dle &nbsp; &nbsp; (data link escape)
::* &nbsp; dc1 &nbsp; &nbsp; (device control 1)
::* &nbsp; dc2 &nbsp; &nbsp; (device control 2)
::* &nbsp; dc3 &nbsp; &nbsp; (device control 3)
::* &nbsp; dc4 &nbsp; &nbsp; (device control 4)
::* &nbsp; em &nbsp; &nbsp; (end of medium)
::* &nbsp; fs &nbsp; &nbsp; (file separator)
::* &nbsp; gs &nbsp; &nbsp; (group separator)
::* &nbsp; rs &nbsp; &nbsp; (record separator)
::* &nbsp; us &nbsp; &nbsp; (unit separator)
::* &nbsp; del &nbsp; &nbsp; (delete)
 
<br>
Of the above, the first sixteen or so are commonly known and used. Essentially, anything below a '''blank''' in ASCII or EBCDIC &nbsp; ''may'' &nbsp; be considered a control character, and in addition, ASCII also has '7f'x (DEL). Note also that some control codes have more than one mnemonic just to keep things interesting.
<br> I think whitespace (in the task's description) should be defined, or the word '''BLANKS''' should be used instead.
<br>It appears that most languages trim blanks, not white space, anyway. -- [[User:Gerard Schildberger|Gerard Schildberger]] 19:45, 3 September 2012 (UTC)
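To make the distinction concrete, here is a small Python sketch that contrasts trimming blanks only with trimming a defined whitespace set. The helper names are made up for illustration, and the "whitespace" set used is Python's <code>string.whitespace</code> (space, tab, line feed, carriage return, vertical tab, form feed), which is only one possible definition.

```python
import string


def trim_blanks(s):
    """Remove only leading and trailing space characters (blanks)."""
    return s.strip(" ")


def trim_whitespace(s):
    """Remove leading and trailing characters found in string.whitespace."""
    return s.strip(string.whitespace)


text = "  apples, pears \t\n "
print(repr(trim_blanks(text)))
print(repr(trim_whitespace(text)))
```

With this input, trimming blanks leaves the tab and line feed at the end of the string, while trimming the wider set removes them as well.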