Strip comments from a string: Difference between revisions


Revision as of 19:05, 20 November 2010

The task is to remove the text that follows any of a set of comment markers (in these examples, either a hash or a semicolon) from a string or input line.

The following examples will be truncated to just "apples, pears":

apples, pears # and bananas
apples, pears ; and bananas
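For reference, here is a minimal Python sketch of the task itself. It assumes that a comment starts at the earliest occurrence of any marker character. Because the expected result is "apples, pears" with no trailing space, it also assumes that whitespace left in front of the marker should be removed. The name `strip_comments` and its `markers` parameter are illustrative only.

<lang python>def strip_comments(line: str, markers: str = "#;") -> str:
    """Return `line` truncated at the first occurrence of any marker character.

    Assumption: whitespace just before the marker is also removed, so the
    result matches the expected "apples, pears".
    """
    # Positions of every marker that actually occurs in the line.
    positions = [i for i in (line.find(m) for m in markers) if i != -1]
    if not positions:
        return line  # no comment marker: leave the line unchanged
    # The comment begins at the earliest marker, whichever character it is.
    return line[:min(positions)].rstrip()


for text in ("apples, pears # and bananas", "apples, pears ; and bananas"):
    print(repr(strip_comments(text)))</lang>

The sketch takes the earliest marker rather than checking the markers in a fixed order. That choice is what makes a line such as `a ; b # c` come out as `a`.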

D

<lang d>import std.stdio, std.regexp ;

string remove1LineComment(string s, string pat = ";#") {
    return sub(s, `([^`~pat~`]*)([`~pat~`])[^\n\r]*([\n\r]|$)`, `$1$3`, "gm") ;
}

void main() {
    string s = "apples, pears # and bananas
apples, pears ; and bananas " ;
    writefln("%s\n====>\n%s", s, remove1LineComment(s)) ;
}</lang>

output:

apples, pears # and bananas
apples, pears ; and bananas
====>
apples, pears
apples, pears

Fortran

<lang fortran>!****************************************************
module string_routines
!****************************************************

    implicit none
    private
    public :: strip_comments

contains
!****************************************************

    function strip_comments(str,c) result(str2)
    implicit none
    character(len=*),intent(in) :: str
    character(len=1),intent(in) :: c   !comment character
    character(len=len(str)) :: str2

    integer :: i

    i = index(str,c)
    if (i>0) then
        str2 = str(1:i-1)
    else
        str2 = str
    end if

    end function strip_comments
!****************************************************

end module string_routines
!****************************************************

program main
!****************************************************
! Example use of strip_comments function
!****************************************************

    use string_routines, only: strip_comments
    implicit none

    write(*,*) strip_comments('apples, pears # and bananas', '#')
    write(*,*) strip_comments('apples, pears ; and bananas', ';')

!****************************************************
end program main
!****************************************************</lang>

output:

apples, pears
apples, pears

Inform 7

<lang inform7>Home is a room.

When play begins:
	strip comments from "apples, pears # and bananas";
	strip comments from "apples, pears ; and bananas";
	end the story.

To strip comments from (T - indexed text):
	say "[T] -> ";
	replace the regular expression "<#;>.*$" in T with "";
	say "[T][line break]".</lang>

Since square brackets have a special meaning in Inform strings (they mark text substitutions such as "[T]"), Inform's regular expression syntax uses angle brackets for character classes.
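The D, Inform 7 and Tcl entries share one regular-expression idea: a character class holding the markers, followed by `.*` up to the end of the line. A rough Python equivalent is sketched here with the standard `re` module, where the class uses ordinary square brackets. The names `COMMENT` and `text` are illustrative, and the two-line input mirrors the D and Tcl demonstrations.

<lang python>import re

# Character class of comment markers, then the rest of the line.
# With re.MULTILINE, `$` also matches just before each newline, so every line
# of a multi-line string is handled separately; `.` never crosses a newline.
COMMENT = re.compile(r"[#;].*$", re.MULTILINE)

text = "apples, pears # and bananas\napples, pears ; and bananas"
print(COMMENT.sub("", text))</lang>

Without `re.MULTILINE`, `$` would only match at the very end of the string, so only the comment on the last line would be removed. As in the Tcl output, the space before each marker is kept.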

Lua

<lang lua>comment_symbols = ";#"

s1 = "apples, pears # and bananas"
s2 = "apples, pears ; and bananas"

-- Anchor at the start so a line that begins with a comment yields "" instead of the comment text
print ( string.match( s1, "^[^"..comment_symbols.."]*" ) )
print ( string.match( s2, "^[^"..comment_symbols.."]*" ) )</lang>

PureBasic

<lang PureBasic>Procedure.s Strip_comments(Str$)
  Protected result$=Str$, l, l1, l2
  l1 =FindString(Str$,"#",1)
  l2 =FindString(Str$,";",1)
  ;
  ; Cut at whichever comment sign comes first (FindString returns 0 if not found)
  If l1>0 And l2>0
    If l1<l2
      l=l1
    Else
      l=l2
    EndIf
  ElseIf l1
    l=l1
  ElseIf l2
    l=l2
  EndIf
  If l
    result$=Left(Str$,l-1)
  EndIf
  ProcedureReturn result$
EndProcedure</lang>

Implementation

<lang PureBasic>#instring1 ="apples, pears # and bananas"
#instring2 ="apples, pears ; and bananas"

OpenConsole()   ; PrintN needs an open console
PrintN(Strip_comments(#instring1))
PrintN(Strip_comments(#instring2))</lang>

Output:

apples, pears
apples, pears

Python

<lang python>>>> marker, line = '#', 'apples, pears # and bananas'
>>> line[:line.index(marker)]
'apples, pears '
>>>
>>> marker, line = ';', 'apples, pears ; and bananas'
>>> line[:line.index(marker)]
'apples, pears '</lang>
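The session above assumes the marker is present, because `str.index` raises `ValueError` when it is not. One possible variation, still for a single known marker, uses `str.partition`. That method returns the whole string as its first element when the separator is missing. The helper name `strip_one` is hypothetical.

<lang python>def strip_one(line: str, marker: str) -> str:
    # partition splits at the first occurrence of `marker`; if the marker is
    # absent, the first element is the entire line.
    before, _sep, _comment = line.partition(marker)
    return before


print(repr(strip_one("apples, pears # and bananas", "#")))  # 'apples, pears '
print(repr(strip_one("apples, pears", "#")))                # 'apples, pears'</lang>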

REXX

The first version takes advantage of the fact that there are only two delimiters: # (hash or pound sign) and ; (semicolon).

<lang rexx>/*REXX program to strip a string, delimited by a hash or semicolon.  */

old1='apples, pears # and bananas'
say 'old='old1
new1=stripper1(old1)
say 'new='new1
say

old2='apples, pears ; and bananas'
say 'old='old2
new2=stripper1(old2)
say 'new='new2
say
exit

stripper1: procedure; parse arg x     /*get the argument (string).   */
x=translate(x,'#',";")                /*translate semicolons to hash */
parse var x x '#'                     /*parse string, ending in hash */
return x                              /*return the shortened string. */</lang>

Output:

old=apples, pears # and bananas
new=apples, pears

old=apples, pears ; and bananas
new=apples, pears

The second version uses a list of single-character delimiters; the list may be of any length.

<lang rexx>/*REXX program to strip a string, delimited by hash, semicolon, ...  */

old1='apples, pears # and bananas'
say 'old='old1
new1=stripper2(old1)
say 'new='new1
say

old2='apples, pears ; and bananas'
say 'old='old2
new2=stripper2(old2)
say 'new='new2
say
exit

stripper2: procedure; parse arg x     /*get the argument (string).   */
delims='#;'                           /*a list of delimiters to use. */
delim1=left(delims,1)                 /*get the 1st delimiter char.  */
delim =copies(delim1,length(delims))  /*make enough of 'em for trans */
x=translate(x,delim,delims)           /*trans delims-->1st delimiter */
parse var x x (delim1)                /*parse string, ending in hash */
return x                              /*return the shortened string. */</lang>

Output:

old=apples, pears # and bananas
new=apples, pears

old=apples, pears ; and bananas
new=apples, pears
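Both REXX versions first translate every delimiter into a single one and then cut at that character. The same two-step idea can be sketched in Python with `str.maketrans`/`str.translate` followed by `str.partition`. The function name `stripper` and the use of the first delimiter as the common one mirror `stripper2` but are otherwise illustrative, and the sketch assumes a non-empty delimiter list.

<lang python>def stripper(x: str, delims: str = "#;") -> str:
    """Python analogue of the REXX stripper2 routine (illustrative name)."""
    first = delims[0]
    # Translate every delimiter into the first one, like REXX translate().
    table = str.maketrans(delims, first * len(delims))
    unified = x.translate(table)
    # Keep the text before the first delimiter, like `parse var x x (delim1)`.
    # That text contained no delimiters, so translate() left it unchanged.
    return unified.partition(first)[0]


print("new=" + stripper("apples, pears # and bananas"))
print("new=" + stripper("apples, pears ; and bananas"))</lang>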

Tcl

<lang tcl>proc stripLineComments {inputString {commentChars ";#"}} {
    # Switch the RE engine into line-respecting mode instead of the default whole-string mode
    regsub -all -line "\[$commentChars\].*$" $inputString ""
}</lang>

Demonstration:

<lang tcl># Multi-line string constant
set input "apples, pears # and bananas
apples, pears ; and bananas"
# Do the stripping
puts [stripLineComments $input]</lang>

Output:

apples, pears 
apples, pears

The above code has one issue, though: its notion of a set of characters is very much that of the RE engine, so characters such as "^", "-" or "\" in the comment set can take on their RE meaning instead of being matched literally. That's possibly desirable, but treating an arbitrary sequence of characters as a set of separators requires a bit more cleverness.

<lang tcl>proc stripLineComments {inputString {commentChars ";#"}} {
    # Convert the character set into a transformation
    foreach c [split $commentChars ""] {lappend map $c "\uFFFF"}; # *very* rare character!
    # Apply transformation and then use a simpler constant RE to strip
    regsub -all -line {\uFFFF.*$} [string map $map $inputString] ""
}</lang>

Output in the example is the same as above.
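The concern raised for the first Tcl version applies to any regex-based solution. In Python, one way around it is to escape each marker with `re.escape` and join the results as alternatives, so no marker is ever read as RE syntax. This is a sketch, not a translation of the Tcl code, and `strip_line_comments` is a hypothetical name.

<lang python>import re


def strip_line_comments(text: str, comment_chars: str = ";#") -> str:
    """Remove each line's comment; any character in comment_chars starts one."""
    if not comment_chars:
        return text  # no markers: nothing to strip
    # re.escape makes every marker match literally, even "]", "^", "-" or "\".
    alternatives = "|".join(re.escape(c) for c in comment_chars)
    # As with Tcl's -line switch, MULTILINE lets `$` match at each line end.
    pattern = re.compile(f"(?:{alternatives}).*$", re.MULTILINE)
    return pattern.sub("", text)


print(strip_line_comments("apples, pears # and bananas\napples, pears ; and bananas"))
print(strip_line_comments("x = 1 ] note", "]^"))</lang>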

UNIX Shell

Works with: Bourne Shell

Traditional Unix shell implementations do not directly support string manipulation, so the cut utility is used to remove the comments. However, cut does not support multiple delimiters, so it has to be invoked twice to strip both the semicolon- and hash-prefixed comments:

<lang sh># Stripping comments from a string
foostring=`echo "$foostring" | cut -f 1 -d '#' | cut -f 1 -d ';'`

# Stripping comments from an input file
cut -f 1 -d '#' foobar.txt | cut -f 1 -d ';'</lang>
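The pipeline above runs `cut` once per delimiter. A single pass over the lines of a file can be sketched in Python by splitting each line at the first marker with a compiled pattern's `split` and `maxsplit=1`. The generator name `strip_file` is hypothetical. The file name `foobar.txt` comes from the shell example, and an in-memory `io.StringIO` stands in for it here so the sketch runs on its own.

<lang python>import io
import re

MARKERS = re.compile(r"[#;]")


def strip_file(lines):
    # For each line, drop the newline and keep the text before the first marker.
    # Like cut without -s, a line with no marker is passed through whole.
    for line in lines:
        yield MARKERS.split(line.rstrip("\n"), maxsplit=1)[0]


# Stand-in for open("foobar.txt"); a real file object works the same way.
sample = io.StringIO("apples, pears # and bananas\napples, pears ; and bananas\n")
for stripped in strip_file(sample):
    print(stripped)</lang>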
