String matching: Difference between revisions

← Older edit Newer edit →

Content deleted Content added

Inline

Revision as of 02:49, 8 November 2010

Given two strings, demonstrate the following 3 types of matchings:

Determining if the first string starts with second string
Determining if the first string contains the second string at any location
Determining if the first string ends with the second string

Optional requirements:

Print the location of the match for part 2
Handle multiple occurrences of a string for part 2.

Ada

<lang Ada> with Ada.Strings.Fixed; use Ada.Strings.Fixed; with Ada.Text_IO; use Ada.Text_IO;

procedure Match_Strings is

  S1 : constant String := "abcd";
  S2 : constant String := "abab";
  S3 : constant String := "ab";

begin

  if S1'Length >= S3'Length and then S1 (S1'First..S1'First + S3'Length - 1) = S3 then
     Put_Line ( & S1 & "' starts with '" & S3 & );
  end if;
  if S2'Length >= S3'Length and then S2 (S2'Last - S3'Length + 1..S2'Last) = S3 then
     Put_Line ( & S2 & "' ends with '" & S3 & );
  end if;
  Put_Line ( & S3 & "' first appears in '" & S1 & "' at" & Integer'Image (Index (S1, S3)));
  Put_Line
  (   & S3 & "' appears in '" & S2 &  &
     Integer'Image (Ada.Strings.Fixed.Count (S2, S3)) & " times"
  );

end Match_Strings; </lang> Sample output:

'abcd' starts with 'ab'
'abab' ends with 'ab'
'ab' first appears in 'abcd' at 1
'ab' appears in 'abab' 2 times

ALGOL 68

Translation of: python

Works with: ALGOL 68 version Revision 1 - no extensions to language used

Works with: ALGOL 68G version Any - tested with release 1.18.0-9h.tiny

<lang algol68># define some appropriate OPerators # PRIO STARTSWITH = 5, ENDSWITH = 5; OP STARTSWITH = (STRING str, prefix)BOOL: # assuming LWB = 1 #

 IF UPB str < UPB prefix THEN FALSE ELSE str[:UPB prefix]=prefix FI;

OP ENDSWITH = (STRING str, suffix)BOOL: # assuming LWB = 1 #

 IF UPB str < UPB suffix THEN FALSE ELSE str[UPB str-UPB suffix+1:]=suffix FI;

INT loc, loc2;

print((

 "abcd" STARTSWITH "ab", # returns TRUE #
 "abcd" ENDSWITH "zn", # returns FALSE #
 string in string("bb",loc,"abab"), # returns FALSE #
 string in string("ab",loc,"abab"), # returns TRUE #
 (string in string("bb",loc,"abab")|loc|-1), # returns -1 #
 (string in string("ab",loc,"abab")|loc|-1), # returns +1 #
 (string in string("ab",loc2,"abab"[loc+1:])|loc+loc2|-1) # returns +3 #

))</lang> Output:

TFFT         -1         +1         +3

<lang csharp> class Program { public static void Main (string[] args) { var value = "abcd".StartsWith("ab"); value = "abcd".EndsWith("zn"); //returns false value = "abab".Contains("bb"); //returns false value = "abab".Contains("ab"); //returns true int loc = "abab".IndexOf("bb"); //returns -1 loc = "abab".IndexOf("ab"); //returns 0 loc = "abab".IndexOf("ab",loc+1); //returns 2 } } </lang>

C++

<lang cpp>#include <string> using namespace std;

string s1="abcd"; string s2="abab"; string s3="ab"; //Beginning s1.compare(0,s3.size(),s3)==0; //End s1.compare(s1.size()-s3.size(),s3.size(),s3)==0; //Anywhere s1.find(s2)//returns string::npos int loc=s2.find(s3)//returns 0 loc=s2.find(s3,loc+1)//returns 2</lang>

Forth

<lang forth>: starts-with ( a l a2 l2 -- ? )

 tuck 2>r min 2r> compare 0= ;

ends-with ( a l a2 l2 -- ? )

 tuck 2>r negate over + 0 max /string 2r> compare 0= ;

\ use SEARCH ( a l a2 l2 -- a3 l3 ? ) for contains</lang>

J

<lang j>startswith=: ] -: ({.~ #) contains=: +./@:E.~ endswith=: ] -: ({.~ -@#)</lang>

Example use:

<lang j> 'abcd' startswith 'ab' 1

  'abcd' startswith 'cd'

0

  'abcd' endswith 'ab'

0

  'abcd' endswith 'cd'

1

  'abcd' contains 'bb'

0

  'abcd' contains 'ab'

1

  'abcd' contains 'bc'

1

  'abab' contains 'ab'

1

  'abab' I.@E.~ 'ab'       NB. find starting indicies

0 2</lang>

Note that these verbs also apply to arrays of type other than character so: <lang j> 0 1 2 3 startswith 0 1 NB. integer 1

  4.2 5.1 1.3 9 3 contains 1.3 4.2     NB. floating point

0

  4.2 5.1 1.3 4.2 9 3 contains 1.3 4.2

1</lang>

Java

<lang java>"abcd".startsWith("ab") //returns true "abcd".endsWith("zn") //returns false "abab".contains("bb") //returns false "abab".contains("ab") //returns true int loc = "abab".indexOf("bb") //returns -1 loc = "abab".indexOf("ab") //returns 0 loc = "abab".indexOf("ab",loc+1) //returns 2</lang>

Logo

<lang logo>to starts.with? :sub :thing

 if empty? :sub [output "true]
 if empty? :thing [output "false]
 if not equal? first :sub first :thing [output "false]
 output starts.with? butfirst :sub butfirst :thing

end

to ends.with? :sub :thing

 if empty? :sub [output "true]
 if empty? :thing [output "false]
 if not equal? last :sub last :thing [output "false]
 output ends.with? butlast :sub butlast :thing

end

show starts.with? "dog "doghouse ; true show ends.with? "house "doghouse ; true show substring? "gho "doghouse ; true (built-in)</lang>

Lua

<lang lua>s1 = "string" s2 = "str" s3 = "ing" s4 = "xyz"

print( "s1 starts with s2: ", string.find( s1, s2 ) == 1 ) print( "s1 starts with s3: ", string.find( s1, s3 ) == 1, "\n" )

print( "s1 contains s3: ", string.find( s1, s3 ) ~= nil ) print( "s1 contains s3: ", string.find( s1, s4 ) ~= nil, "\n" )

print( "s1 ends with s2: ", select( 2, string.find( s1, s2 ) ) == string.len( s1 ) ) print( "s1 ends with s3: ", select( 2, string.find( s1, s3 ) ) == string.len( s1 ) )</lang>

Objective-C

<lang objc>[@"abcd" hasPrefix:@"ab"] //returns true [@"abcd" hasSuffix:@"zn"] //returns false int loc = [@"abab" rangeOfString:@"bb"].location //returns -1 loc = [@"abab" rangeOfString:@"ab"].location //returns 0 loc = [@"abab" rangeOfString:@"ab" options:0 range:NSMakeRange(loc+1, [@"abab" length]-(loc+1))].location //returns 2</lang>

OCaml

<lang ocaml>let match1 s1 s2 =

 let len1 = String.length s1
 and len2 = String.length s2 in
 if len1 < len2 then false else
   let sub = String.sub s1 0 len2 in
   (sub = s2)</lang>

testing in the top-level:

# match1 "Hello" "Hello World!" ;;
- : bool = false
# match1 "Hello World!" "Hello" ;;
- : bool = true

<lang ocaml>let match2 s1 s2 =

 let len1 = String.length s1
 and len2 = String.length s2 in
 if len1 < len2 then false else
   let rec aux i =
     if i < 0 then false else
       let sub = String.sub s1 i len2 in
       if (sub = s2) then true else aux (pred i)
   in
   aux (len1 - len2)</lang>

# match2 "It's raining, Hello World!" "umbrella" ;;
- : bool = false
# match2 "It's raining, Hello World!" "Hello" ;;
- : bool = true

<lang ocaml>let match3 s1 s2 =

 let len1 = String.length s1
 and len2 = String.length s2 in
 if len1 < len2 then false else
   let sub = String.sub s1 (len1 - len2) len2 in
   (sub = s2)</lang>

# match3 "Hello World" "Hello" ;;
- : bool = false
# match3 "Hello World" "World" ;;
- : bool = true

<lang ocaml>let match2_loc s1 s2 =

 let len1 = String.length s1
 and len2 = String.length s2 in
 if len1 < len2 then (false, -1) else
   let rec aux i =
     if i < 0 then (false, -1) else
       let sub = String.sub s1 i len2 in
       if (sub = s2) then (true, i) else aux (pred i)
   in
   aux (len1 - len2)</lang>

# match2_loc "The sun's shining, Hello World!" "raining" ;;
- : bool * int = (false, -1)
# match2_loc "The sun's shining, Hello World!" "shining" ;;
- : bool * int = (true, 10)

<lang ocaml>let match2_num s1 s2 =

 let len1 = String.length s1
 and len2 = String.length s2 in
 if len1 < len2 then (false, 0) else
   let rec aux i n =
     if i < 0 then (n <> 0, n) else
       let sub = String.sub s1 i len2 in
       if (sub = s2)
       then aux (pred i) (succ n)
       else aux (pred i) (n)
   in
   aux (len1 - len2) 0</lang>

# match2_num "This cloud looks like a camel, \
    that other cloud looks like a llama" "stone" ;;
- : bool * int = (false, 0)
# match2_num "This cloud looks like a camel, \
    that other cloud looks like a llama" "cloud" ;;
- : bool * int = (true, 2)

Perl

<lang perl># the first four examples use regular expressions, so make sure to escape any special regex characters in the substring "abcd" =~ /^ab/ #returns true "abcd" =~ /zn$/ #returns false "abab" =~ /bb/ #returns false "abab" =~ /ab/ #returns true my $loc = index("abab", "bb") #returns -1 $loc = index("abab", "ab") #returns 0 $loc = index("abab", "ab", $loc+1) #returns 2</lang>

PicoLisp

<lang PicoLisp>: (pre? "ab" "abcd") -> "abcd"

(pre? "xy" "abcd")

-> NIL

(sub? "bc" "abcd")

-> "abcd"

(sub? "xy" "abcd")

-> NIL

(tail (chop "cd") (chop "abcd"))

-> ("c" "d")

(tail (chop "xy") (chop "abcd"))

-> NIL

(de positions (Pat Str)

  (setq Pat (chop Pat))
  (make
     (for ((I . L) (chop Str) L (cdr L))
        (and (head Pat L) (link I)) ) ) )

(positions "bc" "abcdabcd")

-> (2 6)</lang>

PureBasic

<lang PureBasic>Procedure StartsWith(String1$, String2$)

 Protected Result
 If FindString(String1$, String2$, 1) =1 ; E.g Found in possition 1
   Result =CountString(String1$, String2$)
 EndIf
 ProcedureReturn Result

EndProcedure

Procedure EndsWith(String1$, String2$)

 Protected Result, dl=Len(String1$)-Len(String2$)
 If dl>=0 And Right(String1$, Len(String2$))=String2$
   Result =CountString(String1$, String2$)
 EndIf
 ProcedureReturn Result

EndProcedure</lang> And a verification <lang PureBasic>Debug StartsWith("Rosettacode", "Rosetta") ; = 1 Debug StartsWith("Rosettacode", "code") ; = 0 Debug StartsWith("eleutherodactylus cruralis", "e") ; = 3 Debug EndsWith ("Rosettacode", "Rosetta") ; = 0 Debug EndsWith ("Rosettacode", "code") ; = 1 Debug EndsWith ("Rosettacode", "e") ; = 2</lang>

Python

<lang python>"abcd".startswith("ab") #returns true "abcd".endswith("zn") #returns false "bb" in "abab" #returns false "ab" in "abab" #returns true loc = "abab".find("bb") #returns -1 loc = "abab".find("ab") #returns 0 loc = "abab".find("ab",loc+1) #returns 2</lang>

REXX

Extra coding was added to take of using plurals in the messages. <lang rexx> /*REXX program to show some basic character string testing. */

parse arg a b

say 'string A=' a say 'string B=' b say

if left(A,length(b))==b then say 'string A starts with string B'

                       else say "string A doesn't start with string B"

say

p=pos(b,a) if p\==0 then say 'string A contains string B (starting in position' p")"

        else say "string A doesn't contains string B"

say

if right(A,length(b))==b then say 'string A ends with string B'

                        else say "string A doesn't end with string B"

say

Ps= p=0

 do forever until p==0
 p=pos(b,a,p+1)
 if p\==0 then Ps=Ps',' p
 end

Ps=space(strip(Ps,'L',",")) times=words(Ps) if times\==0 then say 'string A contains string B',

                     times 'time'left('s',times>1),
                     "(at position"left('s',times>1) Ps')'
            else say "string A doesn't contains string B"

</lang> Output when the following is specified (five Marx brothers):

Chico_Harpo_Groucho_Zeppo_Gummo p

string A= Chico_Harpo_Groucho_Zeppo_Gummo
string B= p

string A doesn't start with string B

string A contains string B (starting in position 10)

string A doesn't end with string B

string A contains string B 3 times (at positions 10, 23, 24)

Output when the following is specified:

Chico_Harpo_Groucho_Zeppo_Gummo Z

string A= Chico_Harpo_Groucho_Zeppo_Gummo
string B= Z

string A doesn't start with string B

string A contains string B (starting in position 21)

string A doesn't end with string B

string A contains string B 1 time (at position 21)

Output when the following is specified:

Chico_Harpo_Groucho_Zeppo_Gummo Chi

string A= Chico_Harpo_Groucho_Zeppo_Gummo
string B= Chi

string A starts with string B

string A contains string B (starting in position 1)

string A doesn't end with string B

string A contains string B 1 time (at position 1)

Output when the following is specified:

Chico_Harpo_Groucho_Zeppo_Gummo mmo

string A= Chico_Harpo_Groucho_Zeppo_Gummo
string B= mmo

string A doesn't start with string B

string A contains string B (starting in position 29)

string A ends with string B

string A contains string B 1 time (at position 29)

Ruby

Works with: Ruby version 1.8.7

<lang ruby>'abcd'.start_with?('ab') #returns true 'abcd'.end_with?('zn') #returns false 'abab'.include?('bb') #returns false 'abab'.include?('ab') #returns true 'abab'.index('bb') #returns -1 'abab'.index('ab') #returns 0 'abab'.index('ab', 1) #returns 2</lang>

Retro

<lang Retro>with strings'

startsWith? ( $1 $2 - f )

 dup getLength [ swap ] dip 0 swap getSubset compare ;

"abcdefghijkl" "abcde" startsWith? "abcdefghijkl" "bcd" startsWith?

"abcdefghijkl" "bcd" search "abcdefghijkl" "zmq" search

endsWith? ( $1 $2 - f )

 swap dup getLength + over getLength - compare ;

"abcdefghijkl" "ijkl" endsWith? "abcdefghijkl" "abc" endsWith?</lang>

Scala

<lang scala>"abcd".startsWith("ab") //returns true "abcd".endsWith("zn") //returns false "abab".contains("bb") //returns false "abab".contains("ab") //returns true

var loc="abab".indexOf("bb") //returns -1 loc = "abab".indexOf("ab") //returns 0 loc = "abab".indexOf("ab", loc+1) //returns 2</lang>

Tcl

In this code, we are looking in various ways for the string in the variable needle in the string in the variable haystack. <lang tcl>set isPrefix [string equal -length [string length $needle] $haystack $needle] set isContained [expr {[string first $needle $haystack] >= 0}] set isSuffix [string equal $needle [string range $haystack end-[expr {[string length $needle]-1}] end]]</lang>

Of course, in the cases where the needle is a glob-safe string (i.e., doesn't have any of the characters “*?[\” in), this can be written far more conveniently: <lang tcl>set isPrefix [string match $needle* $haystack] set isContained [string match *$needle* $haystack] set isSuffix [string match *$needle $haystack]</lang>

Another powerful technique is to use the regular expression engine in literal string mode: <lang tcl>set isContained [regexp ***=$needle $haystack]</lang> This can be extended by getting the regexp to return the locations of the matches, enabling the other forms of match to be done: <lang tcl>set matchLocations [regexp -indices -all -inline ***=$needle $haystack]

Each match location is a pair, being the index into the string where the needle started
to match and the index where the needle finished matching

set isContained [expr {[llength $matchLocations] > 0}] set isPrefix [expr {[lindex $matchLocations 0 0] == 0}] set isSuffix [expr {[lindex $matchLocations end 1] == [string length $haystack]-1}] set firstMatchStart [lindex $matchLocations 0 0] puts "Found \"$needle\" in \"$haystack\" at $firstMatchStart" foreach location $matchLocations {

   puts "needle matched at index [lindex $location 0]"

}</lang>