String matching
Given two strings, demonstrate the following 3 types of matchings:
- Determining if the first string starts with second string
- Determining if the first string contains the second string at any location
- Determining if the first string ends with the second string
You are encouraged to solve this task according to the task description, using any language you may know.
Basic Data Operation
This is a basic data operation. It represents a fundamental action on a basic data type.
You may see other such operations in the Basic Data Operations category, or:
Integer Operations
Arithmetic |
Comparison
Boolean Operations
Bitwise |
Logical
String Operations
Concatenation |
Interpolation |
Comparison |
Matching
Memory Operations
Pointers & references |
Addresses
Optional requirements:
- Print the location of the match for part 2
- Handle multiple occurrences of a string for part 2.
Ada
<lang Ada> with Ada.Strings.Fixed; use Ada.Strings.Fixed; with Ada.Text_IO; use Ada.Text_IO;
procedure Match_Strings is
S1 : constant String := "abcd"; S2 : constant String := "abab"; S3 : constant String := "ab";
begin
if S1'Length >= S3'Length and then S1 (S1'First..S1'First + S3'Length - 1) = S3 then Put_Line ( & S1 & "' starts with '" & S3 & ); end if; if S2'Length >= S3'Length and then S2 (S2'Last - S3'Length + 1..S2'Last) = S3 then Put_Line ( & S2 & "' ends with '" & S3 & ); end if; Put_Line ( & S3 & "' first appears in '" & S1 & "' at" & Integer'Image (Index (S1, S3))); Put_Line ( & S3 & "' appears in '" & S2 & & Integer'Image (Ada.Strings.Fixed.Count (S2, S3)) & " times" );
end Match_Strings; </lang> Sample output:
'abcd' starts with 'ab' 'abab' ends with 'ab' 'ab' first appears in 'abcd' at 1 'ab' appears in 'abab' 2 times
ALGOL 68
<lang algol68># define some appropriate OPerators # PRIO STARTSWITH = 5, ENDSWITH = 5; OP STARTSWITH = (STRING str, prefix)BOOL: # assuming LWB = 1 #
IF UPB str < UPB prefix THEN FALSE ELSE str[:UPB prefix]=prefix FI;
OP ENDSWITH = (STRING str, suffix)BOOL: # assuming LWB = 1 #
IF UPB str < UPB suffix THEN FALSE ELSE str[UPB str-UPB suffix+1:]=suffix FI;
INT loc, loc2;
print((
"abcd" STARTSWITH "ab", # returns TRUE # "abcd" ENDSWITH "zn", # returns FALSE # string in string("bb",loc,"abab"), # returns FALSE # string in string("ab",loc,"abab"), # returns TRUE # (string in string("bb",loc,"abab")|loc|-1), # returns -1 # (string in string("ab",loc,"abab")|loc|-1), # returns +1 # (string in string("ab",loc2,"abab"[loc+1:])|loc+loc2|-1) # returns +3 #
))</lang> Output:
TFFT -1 +1 +3
AutoHotkey
<lang AutoHotkey> String1 = abcd String2 = abab
If (SubStr(String1,1,StrLen(String2)) = String2)
MsgBox, "%String1%" starts with "%String2%".
IfInString, String1, %String2% {
Position := InStr(String1,String2) StringReplace, String1, String1, %String2%, %String2%, UseErrorLevel MsgBox, "%String1%" contains "%String2%" at position %Position%`, and appears %ErrorLevel% times.
} StringRight, TempVar, String1, StrLen(String2) If TempVar = %String2%
MsgBox, "%String1%" ends with "%String2%".
</lang>
C
Case sensitive matching: <lang C>
- include<string.h>
- include<stdio.h>
- define true 1
- define false 0
int startsWith(char* container,char* target) { int i,targetLength = strlen(target),sourceLength = strlen(container);
if(targetLength<=sourceLength) { for(i=0;i<targetLength;i++) { if((target[i]!=container[i])) return false; } return true; }
return false; }
int endsWith(char* container,char* target) { int i,targetLength = strlen(target),sourceLength = strlen(container);
if(targetLength<=sourceLength) { for(i=sourceLength-targetLength;i<sourceLength;i++) { if(container[i]!=target[i-(sourceLength-targetLength)]) return false; } return true; }
return false; }
int doesContain(char* container,char* target) { int i,j,targetLength = strlen(target),sourceLength = strlen(container);
if(targetLength<=sourceLength) { for(i=0;i<sourceLength;i++) { if((container[i]==target[0])&&((i+targetLength)<=sourceLength)) { for(j=0;j<targetLength;j++) { if(container[i+j]!=target[j]) break; }
if(j==targetLength) return true; } } }
return false; }
int main() { printf("Starts with Test ( Hello,Hell ) : %d",startsWith("Hello","Hell")); printf("\nEnds with Test ( Code,ode ) : %d",endsWith("Code","ode")); printf("\nContains Test ( Google,msn ) : %d",doesContain("Google","msn"));
return 0; } </lang> Output : <lang> Starts with Test ( Hello,Hell ) : 1 Ends with Test ( Code,ode ) : 1 Contains Test ( Google,msn ) : 0 </lang>
C#
<lang csharp> class Program { public static void Main (string[] args) { var value = "abcd".StartsWith("ab"); value = "abcd".EndsWith("zn"); //returns false value = "abab".Contains("bb"); //returns false value = "abab".Contains("ab"); //returns true int loc = "abab".IndexOf("bb"); //returns -1 loc = "abab".IndexOf("ab"); //returns 0 loc = "abab".IndexOf("ab",loc+1); //returns 2 } } </lang>
C++
<lang cpp>#include <string> using namespace std;
string s1="abcd"; string s2="abab"; string s3="ab"; //Beginning s1.compare(0,s3.size(),s3)==0; //End s1.compare(s1.size()-s3.size(),s3.size(),s3)==0; //Anywhere s1.find(s2)//returns string::npos int loc=s2.find(s3)//returns 0 loc=s2.find(s3,loc+1)//returns 2</lang>
Clojure
<lang clojure>(def evals '((. "abcd" startsWith "ab") (. "abcd" endsWith "zn") (. "abab" contains "bb") (. "abab" contains "ab") (. "abab" indexOf "bb") (let [loc (. "abab" indexOf "ab")] (. "abab" indexOf "ab" (dec loc)))))
user> (for [i evals] [i (eval i)]) ([(. "abcd" startsWith "ab") true] [(. "abcd" endsWith "zn") false] [(. "abab" contains "bb") false] [(. "abab" contains "ab") true] [(. "abab" indexOf "bb") -1] [(let [loc (. "abab" indexOf "ab")] (. "abab" indexOf "ab" (dec loc))) 0])</lang>
Forth
<lang forth>: starts-with ( a l a2 l2 -- ? )
tuck 2>r min 2r> compare 0= ;
- ends-with ( a l a2 l2 -- ? )
tuck 2>r negate over + 0 max /string 2r> compare 0= ;
\ use SEARCH ( a l a2 l2 -- a3 l3 ? ) for contains</lang>
Haskell
<lang haskell>> import Data.List > "abc" `isPrefixOf` "abcdefg" True > "efg" `isSuffixOf` "abcdefg" True > "bcd" `isInfixOf` "abcdefg" True > "abc" `isInfixOf` "abcdefg" -- Prefixes and suffixes are also infixes True > let infixes a b = map fst $ filter (isPrefixOf a . snd) $ zip [0..] $ tails b > infixes "ab" "abcdefabqqab" [0,6,10]</lang>
J
<lang j>startswith=: ] -: ({.~ #) contains=: +./@:E.~ endswith=: ] -: ({.~ -@#)</lang>
Example use:
<lang j> 'abcd' startswith 'ab' 1
'abcd' startswith 'cd'
0
'abcd' endswith 'ab'
0
'abcd' endswith 'cd'
1
'abcd' contains 'bb'
0
'abcd' contains 'ab'
1
'abcd' contains 'bc'
1
'abab' contains 'ab'
1
'abab' I.@E.~ 'ab' NB. find starting indicies
0 2</lang>
Note that these verbs also apply to arrays of type other than character so: <lang j> 0 1 2 3 startswith 0 1 NB. integer 1
4.2 5.1 1.3 9 3 contains 1.3 4.2 NB. floating point
0
4.2 5.1 1.3 4.2 9 3 contains 1.3 4.2
1</lang>
Java
<lang java>"abcd".startsWith("ab") //returns true "abcd".endsWith("zn") //returns false "abab".contains("bb") //returns false "abab".contains("ab") //returns true int loc = "abab".indexOf("bb") //returns -1 loc = "abab".indexOf("ab") //returns 0 loc = "abab".indexOf("ab",loc+1) //returns 2</lang>
Logo
<lang logo>to starts.with? :sub :thing
if empty? :sub [output "true] if empty? :thing [output "false] if not equal? first :sub first :thing [output "false] output starts.with? butfirst :sub butfirst :thing
end
to ends.with? :sub :thing
if empty? :sub [output "true] if empty? :thing [output "false] if not equal? last :sub last :thing [output "false] output ends.with? butlast :sub butlast :thing
end
show starts.with? "dog "doghouse ; true show ends.with? "house "doghouse ; true show substring? "gho "doghouse ; true (built-in)</lang>
Lua
<lang lua>s1 = "string" s2 = "str" s3 = "ing" s4 = "xyz"
print( "s1 starts with s2: ", string.find( s1, s2 ) == 1 ) print( "s1 starts with s3: ", string.find( s1, s3 ) == 1, "\n" )
print( "s1 contains s3: ", string.find( s1, s3 ) ~= nil ) print( "s1 contains s3: ", string.find( s1, s4 ) ~= nil, "\n" )
print( "s1 ends with s2: ", select( 2, string.find( s1, s2 ) ) == string.len( s1 ) ) print( "s1 ends with s3: ", select( 2, string.find( s1, s3 ) ) == string.len( s1 ) )</lang>
Objective-C
<lang objc>[@"abcd" hasPrefix:@"ab"] //returns true [@"abcd" hasSuffix:@"zn"] //returns false int loc = [@"abab" rangeOfString:@"bb"].location //returns -1 loc = [@"abab" rangeOfString:@"ab"].location //returns 0 loc = [@"abab" rangeOfString:@"ab" options:0 range:NSMakeRange(loc+1, [@"abab" length]-(loc+1))].location //returns 2</lang>
OCaml
<lang ocaml>let match1 s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then false else let sub = String.sub s1 0 len2 in (sub = s2)</lang>
testing in the top-level:
# match1 "Hello" "Hello World!" ;; - : bool = false # match1 "Hello World!" "Hello" ;; - : bool = true
<lang ocaml>let match2 s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then false else let rec aux i = if i < 0 then false else let sub = String.sub s1 i len2 in if (sub = s2) then true else aux (pred i) in aux (len1 - len2)</lang>
# match2 "It's raining, Hello World!" "umbrella" ;; - : bool = false # match2 "It's raining, Hello World!" "Hello" ;; - : bool = true
<lang ocaml>let match3 s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then false else let sub = String.sub s1 (len1 - len2) len2 in (sub = s2)</lang>
# match3 "Hello World" "Hello" ;; - : bool = false # match3 "Hello World" "World" ;; - : bool = true
<lang ocaml>let match2_loc s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then (false, -1) else let rec aux i = if i < 0 then (false, -1) else let sub = String.sub s1 i len2 in if (sub = s2) then (true, i) else aux (pred i) in aux (len1 - len2)</lang>
# match2_loc "The sun's shining, Hello World!" "raining" ;; - : bool * int = (false, -1) # match2_loc "The sun's shining, Hello World!" "shining" ;; - : bool * int = (true, 10)
<lang ocaml>let match2_num s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then (false, 0) else let rec aux i n = if i < 0 then (n <> 0, n) else let sub = String.sub s1 i len2 in if (sub = s2) then aux (pred i) (succ n) else aux (pred i) (n) in aux (len1 - len2) 0</lang>
# match2_num "This cloud looks like a camel, \ that other cloud looks like a llama" "stone" ;; - : bool * int = (false, 0) # match2_num "This cloud looks like a camel, \ that other cloud looks like a llama" "cloud" ;; - : bool * int = (true, 2)
Perl
<lang perl># the first four examples use regular expressions, so make sure to escape any special regex characters in the substring "abcd" =~ /^ab/ #returns true "abcd" =~ /zn$/ #returns false "abab" =~ /bb/ #returns false "abab" =~ /ab/ #returns true my $loc = index("abab", "bb") #returns -1 $loc = index("abab", "ab") #returns 0 $loc = index("abab", "ab", $loc+1) #returns 2</lang>
PicoLisp
<lang PicoLisp>: (pre? "ab" "abcd") -> "abcd"
- (pre? "xy" "abcd")
-> NIL
- (sub? "bc" "abcd")
-> "abcd"
- (sub? "xy" "abcd")
-> NIL
- (tail (chop "cd") (chop "abcd"))
-> ("c" "d")
- (tail (chop "xy") (chop "abcd"))
-> NIL
(de positions (Pat Str)
(setq Pat (chop Pat)) (make (for ((I . L) (chop Str) L (cdr L)) (and (head Pat L) (link I)) ) ) )
- (positions "bc" "abcdabcd")
-> (2 6)</lang>
PL/I
<lang PL/I> /* Let s be one string, t be the other that might exist within s. */
/* 8-1-2011 */
k = index(s, t); if k = 0 then put skip edit (t, ' is nowhere in sight') (a); else if k = 1 then
put skip edit (t, ' starts at the beginning of ', s) (a);
else if k+length(t)-1 = length(s) then
put skip edit (t, ' is at the end of ', s) (a);
else put skip edit (t, ' is within ', s) (a);
if k > 0 then put skip edit (t, ' starts at position ', k) (a); </lang>
Optional extra:
<lang PL/I> /* Handle multiple occurrences. */
n = 1; do forever; k = index(s, t, n); if k = 0 then do; if n = 1 then put skip list (t, ' is nowhere in sight'); stop; end; else if k = 1 then put skip edit ('<', t, '> starts at the beginning of ', s) (a); else if k+length(t)-1 = length(s) then put skip edit ('<', t, '> is at the end of ', s) (a); else put skip edit ('<', t, '> is within ', s) (a); n = k + length(t);
if k > 0 then put skip edit ('<', t, '> starts at position ', trim(k)) (a); else stop; end;
</lang>
PureBasic
<lang PureBasic>Procedure StartsWith(String1$, String2$)
Protected Result If FindString(String1$, String2$, 1) =1 ; E.g Found in possition 1 Result =CountString(String1$, String2$) EndIf ProcedureReturn Result
EndProcedure
Procedure EndsWith(String1$, String2$)
Protected Result, dl=Len(String1$)-Len(String2$) If dl>=0 And Right(String1$, Len(String2$))=String2$ Result =CountString(String1$, String2$) EndIf ProcedureReturn Result
EndProcedure</lang> And a verification <lang PureBasic>Debug StartsWith("Rosettacode", "Rosetta") ; = 1 Debug StartsWith("Rosettacode", "code") ; = 0 Debug StartsWith("eleutherodactylus cruralis", "e") ; = 3 Debug EndsWith ("Rosettacode", "Rosetta") ; = 0 Debug EndsWith ("Rosettacode", "code") ; = 1 Debug EndsWith ("Rosettacode", "e") ; = 2</lang>
Python
<lang python>"abcd".startswith("ab") #returns true "abcd".endswith("zn") #returns false "bb" in "abab" #returns false "ab" in "abab" #returns true loc = "abab".find("bb") #returns -1 loc = "abab".find("ab") #returns 0 loc = "abab".find("ab",loc+1) #returns 2</lang>
REXX
Extra coding was added to take of using plurals in the messages. <lang rexx> /*REXX program to show some basic character string testing. */
parse arg a b
say 'string A=' a say 'string B=' b say
if left(A,length(b))==b then say 'string A starts with string B'
else say "string A doesn't start with string B"
say
p=pos(b,a) if p\==0 then say 'string A contains string B (starting in position' p")"
else say "string A doesn't contains string B"
say
if right(A,length(b))==b then say 'string A ends with string B'
else say "string A doesn't end with string B"
say
Ps= p=0
do forever until p==0 p=pos(b,a,p+1) if p\==0 then Ps=Ps',' p end
Ps=space(strip(Ps,'L',",")) times=words(Ps) if times\==0 then say 'string A contains string B',
times 'time'left('s',times>1), "(at position"left('s',times>1) Ps')' else say "string A doesn't contains string B"
</lang>
Output when the following is specified (five Marx brothers):
Chico_Harpo_Groucho_Zeppo_Gummo p
string A= Chico_Harpo_Groucho_Zeppo_Gummo string B= p string A doesn't start with string B string A contains string B (starting in position 10) string A doesn't end with string B string A contains string B 3 times (at positions 10, 23, 24)
Output when the following is specified:
Chico_Harpo_Groucho_Zeppo_Gummo Z
string A= Chico_Harpo_Groucho_Zeppo_Gummo string B= Z string A doesn't start with string B string A contains string B (starting in position 21) string A doesn't end with string B string A contains string B 1 time (at position 21)
Output when the following is specified:
Chico_Harpo_Groucho_Zeppo_Gummo Chi
string A= Chico_Harpo_Groucho_Zeppo_Gummo string B= Chi string A starts with string B string A contains string B (starting in position 1) string A doesn't end with string B string A contains string B 1 time (at position 1)
Output when the following is specified:
Chico_Harpo_Groucho_Zeppo_Gummo mmo
string A= Chico_Harpo_Groucho_Zeppo_Gummo string B= mmo string A doesn't start with string B string A contains string B (starting in position 29) string A ends with string B string A contains string B 1 time (at position 29)
Ruby
<lang ruby>'abcd'.start_with?('ab') #returns true 'abcd'.end_with?('zn') #returns false 'abab'.include?('bb') #returns false 'abab'.include?('ab') #returns true 'abab'.index('bb') #returns -1 'abab'.index('ab') #returns 0 'abab'.index('ab', 1) #returns 2</lang>
Retro
<lang Retro>with strings'
- startsWith? ( $1 $2 - f )
dup getLength [ swap ] dip 0 swap getSubset compare ;
"abcdefghijkl" "abcde" startsWith? "abcdefghijkl" "bcd" startsWith?
"abcdefghijkl" "bcd" search "abcdefghijkl" "zmq" search
- endsWith? ( $1 $2 - f )
swap dup getLength + over getLength - compare ;
"abcdefghijkl" "ijkl" endsWith? "abcdefghijkl" "abc" endsWith?</lang>
Scala
<lang scala>"abcd".startsWith("ab") //returns true "abcd".endsWith("zn") //returns false "abab".contains("bb") //returns false "abab".contains("ab") //returns true
var loc="abab".indexOf("bb") //returns -1 loc = "abab".indexOf("ab") //returns 0 loc = "abab".indexOf("ab", loc+1) //returns 2</lang>
Tcl
In this code, we are looking in various ways for the string in the variable needle in the string in the variable haystack. <lang tcl>set isPrefix [string equal -length [string length $needle] $haystack $needle] set isContained [expr {[string first $needle $haystack] >= 0}] set isSuffix [string equal $needle [string range $haystack end-[expr {[string length $needle]-1}] end]]</lang>
Of course, in the cases where the needle is a glob-safe string (i.e., doesn't have any of the characters “*?[\” in), this can be written far more conveniently: <lang tcl>set isPrefix [string match $needle* $haystack] set isContained [string match *$needle* $haystack] set isSuffix [string match *$needle $haystack]</lang>
Another powerful technique is to use the regular expression engine in literal string mode:
<lang tcl>set isContained [regexp ***=$needle $haystack]</lang>
This can be extended by getting the regexp
to return the locations of the matches, enabling the other forms of match to be done:
<lang tcl>set matchLocations [regexp -indices -all -inline ***=$needle $haystack]
- Each match location is a pair, being the index into the string where the needle started
- to match and the index where the needle finished matching
set isContained [expr {[llength $matchLocations] > 0}] set isPrefix [expr {[lindex $matchLocations 0 0] == 0}] set isSuffix [expr {[lindex $matchLocations end 1] == [string length $haystack]-1}] set firstMatchStart [lindex $matchLocations 0 0] puts "Found \"$needle\" in \"$haystack\" at $firstMatchStart" foreach location $matchLocations {
puts "needle matched at index [lindex $location 0]"
}</lang>
TUSCRIPT
<lang tuscript> $$ MODE TUSCRIPT ASK "string1", string1="" ASK "string2", string2=""
IF (string1.sw.string2) THEN PRINT string1," starts with ",string2 ELSE PRINT string1," not starts with ",string2 ENDIF SET beg=STRING (string1,string2,0,0,0,end) IF (beg!=0) THEN PRINT string1," contains ",string2 PRINT " starting in position ",beg PRINT " ending in position ",end ELSE PRINT string1," not contains ",string2 ENDIF
IF (string1.ew.string2) THEN PRINT string1," ends with ",string2 ELSE PRINT string1," not ends with ",string2 ENDIF </lang> Output:
string1 >Rosetta Code string2 >Code Rosetta Code not starts with Code Rosetta Code contains Code starting in position 9 ending in position 13 Rosetta Code ends with Code