String matching: Difference between revisions
Line 384: | Line 384: | ||
Output when the following is specified (five Marx brothers): |
Output when the following is specified (five Marx brothers): |
||
<br><br> |
<br><br> |
||
Chico_Harpo_Groucho_Zeppo_Gummo p |
|||
++++++++ |
|||
<pre style="height:30ex;overflow:scroll"> |
<pre style="height:30ex;overflow:scroll"> |
||
string A= Chico_Harpo_Groucho_Zeppo_Gummo |
|||
string B= p |
|||
string A doesn't start with string B |
|||
@@@@@@@@@@@@@@@@@@ |
|||
string A contains string B (starting in position 10) |
|||
string A doesn't end with string B |
|||
string A contains string B 3 times (at positions 10, 23, 24) |
|||
</pre> |
</pre> |
||
Output when the following is specified: |
Output when the following is specified: |
||
<br><br> |
<br><br> |
||
Chico_Harpo_Groucho_Zeppo_Gummo Z |
|||
++++++++ |
|||
<pre style="height:30ex;overflow:scroll"> |
<pre style="height:30ex;overflow:scroll"> |
||
string A= Chico_Harpo_Groucho_Zeppo_Gummo |
|||
string B= Z |
|||
string A doesn't start with string B |
|||
@@@@@@@@@@@@@@@@@@ |
|||
string A contains string B (starting in position 21) |
|||
string A doesn't end with string B |
|||
string A contains string B 1 time (at position 21) |
|||
</pre> |
</pre> |
||
Output when the following is specified: |
Output when the following is specified: |
||
<br><br> |
<br><br> |
||
Chico_Harpo_Groucho_Zeppo_Gummo Chi |
|||
++++++++ |
|||
<pre style="height:30ex;overflow:scroll"> |
<pre style="height:30ex;overflow:scroll"> |
||
string A= Chico_Harpo_Groucho_Zeppo_Gummo |
|||
string B= Chi |
|||
string A starts with string B |
|||
@@@@@@@@@@@@@@@@@@ |
|||
string A contains string B (starting in position 1) |
|||
string A doesn't end with string B |
|||
string A contains string B 1 time (at position 1) |
|||
</pre> |
</pre> |
||
Output when the following is specified: |
Output when the following is specified: |
||
<br><br> |
<br><br> |
||
Chico_Harpo_Groucho_Zeppo_Gummo mmo |
|||
++++++++ |
|||
<pre style="height:30ex;overflow:scroll"> |
<pre style="height:30ex;overflow:scroll"> |
||
string A= Chico_Harpo_Groucho_Zeppo_Gummo |
|||
string B= mmo |
|||
string A doesn't start with string B |
|||
string A contains string B (starting in position 29) |
|||
string A ends with string B |
|||
@@@@@@@@@@@@@@@@@@ |
|||
string A contains string B 1 time (at position 29) |
|||
</pre> |
</pre> |
||
Revision as of 02:49, 8 November 2010
You are encouraged to solve this task according to the task description, using any language you may know.
Given two strings, demonstrate the following 3 types of matchings:
- Determining if the first string starts with second string
- Determining if the first string contains the second string at any location
- Determining if the first string ends with the second string
Optional requirements:
- Print the location of the match for part 2
- Handle multiple occurrences of a string for part 2.
Ada
<lang Ada> with Ada.Strings.Fixed; use Ada.Strings.Fixed; with Ada.Text_IO; use Ada.Text_IO;
procedure Match_Strings is
S1 : constant String := "abcd"; S2 : constant String := "abab"; S3 : constant String := "ab";
begin
if S1'Length >= S3'Length and then S1 (S1'First..S1'First + S3'Length - 1) = S3 then Put_Line ( & S1 & "' starts with '" & S3 & ); end if; if S2'Length >= S3'Length and then S2 (S2'Last - S3'Length + 1..S2'Last) = S3 then Put_Line ( & S2 & "' ends with '" & S3 & ); end if; Put_Line ( & S3 & "' first appears in '" & S1 & "' at" & Integer'Image (Index (S1, S3))); Put_Line ( & S3 & "' appears in '" & S2 & & Integer'Image (Ada.Strings.Fixed.Count (S2, S3)) & " times" );
end Match_Strings; </lang> Sample output:
'abcd' starts with 'ab' 'abab' ends with 'ab' 'ab' first appears in 'abcd' at 1 'ab' appears in 'abab' 2 times
ALGOL 68
<lang algol68># define some appropriate OPerators # PRIO STARTSWITH = 5, ENDSWITH = 5; OP STARTSWITH = (STRING str, prefix)BOOL: # assuming LWB = 1 #
IF UPB str < UPB prefix THEN FALSE ELSE str[:UPB prefix]=prefix FI;
OP ENDSWITH = (STRING str, suffix)BOOL: # assuming LWB = 1 #
IF UPB str < UPB suffix THEN FALSE ELSE str[UPB str-UPB suffix+1:]=suffix FI;
INT loc, loc2;
print((
"abcd" STARTSWITH "ab", # returns TRUE # "abcd" ENDSWITH "zn", # returns FALSE # string in string("bb",loc,"abab"), # returns FALSE # string in string("ab",loc,"abab"), # returns TRUE # (string in string("bb",loc,"abab")|loc|-1), # returns -1 # (string in string("ab",loc,"abab")|loc|-1), # returns +1 # (string in string("ab",loc2,"abab"[loc+1:])|loc+loc2|-1) # returns +3 #
))</lang> Output:
TFFT -1 +1 +3
C#
<lang csharp> class Program { public static void Main (string[] args) { var value = "abcd".StartsWith("ab"); value = "abcd".EndsWith("zn"); //returns false value = "abab".Contains("bb"); //returns false value = "abab".Contains("ab"); //returns true int loc = "abab".IndexOf("bb"); //returns -1 loc = "abab".IndexOf("ab"); //returns 0 loc = "abab".IndexOf("ab",loc+1); //returns 2 } } </lang>
C++
<lang cpp>#include <string> using namespace std;
string s1="abcd"; string s2="abab"; string s3="ab"; //Beginning s1.compare(0,s3.size(),s3)==0; //End s1.compare(s1.size()-s3.size(),s3.size(),s3)==0; //Anywhere s1.find(s2)//returns string::npos int loc=s2.find(s3)//returns 0 loc=s2.find(s3,loc+1)//returns 2</lang>
Forth
<lang forth>: starts-with ( a l a2 l2 -- ? )
tuck 2>r min 2r> compare 0= ;
- ends-with ( a l a2 l2 -- ? )
tuck 2>r negate over + 0 max /string 2r> compare 0= ;
\ use SEARCH ( a l a2 l2 -- a3 l3 ? ) for contains</lang>
J
<lang j>startswith=: ] -: ({.~ #) contains=: +./@:E.~ endswith=: ] -: ({.~ -@#)</lang>
Example use:
<lang j> 'abcd' startswith 'ab' 1
'abcd' startswith 'cd'
0
'abcd' endswith 'ab'
0
'abcd' endswith 'cd'
1
'abcd' contains 'bb'
0
'abcd' contains 'ab'
1
'abcd' contains 'bc'
1
'abab' contains 'ab'
1
'abab' I.@E.~ 'ab' NB. find starting indicies
0 2</lang>
Note that these verbs also apply to arrays of type other than character so: <lang j> 0 1 2 3 startswith 0 1 NB. integer 1
4.2 5.1 1.3 9 3 contains 1.3 4.2 NB. floating point
0
4.2 5.1 1.3 4.2 9 3 contains 1.3 4.2
1</lang>
Java
<lang java>"abcd".startsWith("ab") //returns true "abcd".endsWith("zn") //returns false "abab".contains("bb") //returns false "abab".contains("ab") //returns true int loc = "abab".indexOf("bb") //returns -1 loc = "abab".indexOf("ab") //returns 0 loc = "abab".indexOf("ab",loc+1) //returns 2</lang>
Logo
<lang logo>to starts.with? :sub :thing
if empty? :sub [output "true] if empty? :thing [output "false] if not equal? first :sub first :thing [output "false] output starts.with? butfirst :sub butfirst :thing
end
to ends.with? :sub :thing
if empty? :sub [output "true] if empty? :thing [output "false] if not equal? last :sub last :thing [output "false] output ends.with? butlast :sub butlast :thing
end
show starts.with? "dog "doghouse ; true show ends.with? "house "doghouse ; true show substring? "gho "doghouse ; true (built-in)</lang>
Lua
<lang lua>s1 = "string" s2 = "str" s3 = "ing" s4 = "xyz"
print( "s1 starts with s2: ", string.find( s1, s2 ) == 1 ) print( "s1 starts with s3: ", string.find( s1, s3 ) == 1, "\n" )
print( "s1 contains s3: ", string.find( s1, s3 ) ~= nil ) print( "s1 contains s3: ", string.find( s1, s4 ) ~= nil, "\n" )
print( "s1 ends with s2: ", select( 2, string.find( s1, s2 ) ) == string.len( s1 ) ) print( "s1 ends with s3: ", select( 2, string.find( s1, s3 ) ) == string.len( s1 ) )</lang>
Objective-C
<lang objc>[@"abcd" hasPrefix:@"ab"] //returns true [@"abcd" hasSuffix:@"zn"] //returns false int loc = [@"abab" rangeOfString:@"bb"].location //returns -1 loc = [@"abab" rangeOfString:@"ab"].location //returns 0 loc = [@"abab" rangeOfString:@"ab" options:0 range:NSMakeRange(loc+1, [@"abab" length]-(loc+1))].location //returns 2</lang>
OCaml
<lang ocaml>let match1 s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then false else let sub = String.sub s1 0 len2 in (sub = s2)</lang>
testing in the top-level:
# match1 "Hello" "Hello World!" ;; - : bool = false # match1 "Hello World!" "Hello" ;; - : bool = true
<lang ocaml>let match2 s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then false else let rec aux i = if i < 0 then false else let sub = String.sub s1 i len2 in if (sub = s2) then true else aux (pred i) in aux (len1 - len2)</lang>
# match2 "It's raining, Hello World!" "umbrella" ;; - : bool = false # match2 "It's raining, Hello World!" "Hello" ;; - : bool = true
<lang ocaml>let match3 s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then false else let sub = String.sub s1 (len1 - len2) len2 in (sub = s2)</lang>
# match3 "Hello World" "Hello" ;; - : bool = false # match3 "Hello World" "World" ;; - : bool = true
<lang ocaml>let match2_loc s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then (false, -1) else let rec aux i = if i < 0 then (false, -1) else let sub = String.sub s1 i len2 in if (sub = s2) then (true, i) else aux (pred i) in aux (len1 - len2)</lang>
# match2_loc "The sun's shining, Hello World!" "raining" ;; - : bool * int = (false, -1) # match2_loc "The sun's shining, Hello World!" "shining" ;; - : bool * int = (true, 10)
<lang ocaml>let match2_num s1 s2 =
let len1 = String.length s1 and len2 = String.length s2 in if len1 < len2 then (false, 0) else let rec aux i n = if i < 0 then (n <> 0, n) else let sub = String.sub s1 i len2 in if (sub = s2) then aux (pred i) (succ n) else aux (pred i) (n) in aux (len1 - len2) 0</lang>
# match2_num "This cloud looks like a camel, \ that other cloud looks like a llama" "stone" ;; - : bool * int = (false, 0) # match2_num "This cloud looks like a camel, \ that other cloud looks like a llama" "cloud" ;; - : bool * int = (true, 2)
Perl
<lang perl># the first four examples use regular expressions, so make sure to escape any special regex characters in the substring "abcd" =~ /^ab/ #returns true "abcd" =~ /zn$/ #returns false "abab" =~ /bb/ #returns false "abab" =~ /ab/ #returns true my $loc = index("abab", "bb") #returns -1 $loc = index("abab", "ab") #returns 0 $loc = index("abab", "ab", $loc+1) #returns 2</lang>
PicoLisp
<lang PicoLisp>: (pre? "ab" "abcd") -> "abcd"
- (pre? "xy" "abcd")
-> NIL
- (sub? "bc" "abcd")
-> "abcd"
- (sub? "xy" "abcd")
-> NIL
- (tail (chop "cd") (chop "abcd"))
-> ("c" "d")
- (tail (chop "xy") (chop "abcd"))
-> NIL
(de positions (Pat Str)
(setq Pat (chop Pat)) (make (for ((I . L) (chop Str) L (cdr L)) (and (head Pat L) (link I)) ) ) )
- (positions "bc" "abcdabcd")
-> (2 6)</lang>
PureBasic
<lang PureBasic>Procedure StartsWith(String1$, String2$)
Protected Result If FindString(String1$, String2$, 1) =1 ; E.g Found in possition 1 Result =CountString(String1$, String2$) EndIf ProcedureReturn Result
EndProcedure
Procedure EndsWith(String1$, String2$)
Protected Result, dl=Len(String1$)-Len(String2$) If dl>=0 And Right(String1$, Len(String2$))=String2$ Result =CountString(String1$, String2$) EndIf ProcedureReturn Result
EndProcedure</lang> And a verification <lang PureBasic>Debug StartsWith("Rosettacode", "Rosetta") ; = 1 Debug StartsWith("Rosettacode", "code") ; = 0 Debug StartsWith("eleutherodactylus cruralis", "e") ; = 3 Debug EndsWith ("Rosettacode", "Rosetta") ; = 0 Debug EndsWith ("Rosettacode", "code") ; = 1 Debug EndsWith ("Rosettacode", "e") ; = 2</lang>
Python
<lang python>"abcd".startswith("ab") #returns true "abcd".endswith("zn") #returns false "bb" in "abab" #returns false "ab" in "abab" #returns true loc = "abab".find("bb") #returns -1 loc = "abab".find("ab") #returns 0 loc = "abab".find("ab",loc+1) #returns 2</lang>
REXX
Extra coding was added to take of using plurals in the messages. <lang rexx> /*REXX program to show some basic character string testing. */
parse arg a b
say 'string A=' a say 'string B=' b say
if left(A,length(b))==b then say 'string A starts with string B'
else say "string A doesn't start with string B"
say
p=pos(b,a) if p\==0 then say 'string A contains string B (starting in position' p")"
else say "string A doesn't contains string B"
say
if right(A,length(b))==b then say 'string A ends with string B'
else say "string A doesn't end with string B"
say
Ps= p=0
do forever until p==0 p=pos(b,a,p+1) if p\==0 then Ps=Ps',' p end
Ps=space(strip(Ps,'L',",")) times=words(Ps) if times\==0 then say 'string A contains string B',
times 'time'left('s',times>1), "(at position"left('s',times>1) Ps')' else say "string A doesn't contains string B"
</lang>
Output when the following is specified (five Marx brothers):
Chico_Harpo_Groucho_Zeppo_Gummo p
string A= Chico_Harpo_Groucho_Zeppo_Gummo string B= p string A doesn't start with string B string A contains string B (starting in position 10) string A doesn't end with string B string A contains string B 3 times (at positions 10, 23, 24)
Output when the following is specified:
Chico_Harpo_Groucho_Zeppo_Gummo Z
string A= Chico_Harpo_Groucho_Zeppo_Gummo string B= Z string A doesn't start with string B string A contains string B (starting in position 21) string A doesn't end with string B string A contains string B 1 time (at position 21)
Output when the following is specified:
Chico_Harpo_Groucho_Zeppo_Gummo Chi
string A= Chico_Harpo_Groucho_Zeppo_Gummo string B= Chi string A starts with string B string A contains string B (starting in position 1) string A doesn't end with string B string A contains string B 1 time (at position 1)
Output when the following is specified:
Chico_Harpo_Groucho_Zeppo_Gummo mmo
string A= Chico_Harpo_Groucho_Zeppo_Gummo string B= mmo string A doesn't start with string B string A contains string B (starting in position 29) string A ends with string B string A contains string B 1 time (at position 29)
Ruby
<lang ruby>'abcd'.start_with?('ab') #returns true 'abcd'.end_with?('zn') #returns false 'abab'.include?('bb') #returns false 'abab'.include?('ab') #returns true 'abab'.index('bb') #returns -1 'abab'.index('ab') #returns 0 'abab'.index('ab', 1) #returns 2</lang>
Retro
<lang Retro>with strings'
- startsWith? ( $1 $2 - f )
dup getLength [ swap ] dip 0 swap getSubset compare ;
"abcdefghijkl" "abcde" startsWith? "abcdefghijkl" "bcd" startsWith?
"abcdefghijkl" "bcd" search "abcdefghijkl" "zmq" search
- endsWith? ( $1 $2 - f )
swap dup getLength + over getLength - compare ;
"abcdefghijkl" "ijkl" endsWith? "abcdefghijkl" "abc" endsWith?</lang>
Scala
<lang scala>"abcd".startsWith("ab") //returns true "abcd".endsWith("zn") //returns false "abab".contains("bb") //returns false "abab".contains("ab") //returns true
var loc="abab".indexOf("bb") //returns -1 loc = "abab".indexOf("ab") //returns 0 loc = "abab".indexOf("ab", loc+1) //returns 2</lang>
Tcl
In this code, we are looking in various ways for the string in the variable needle in the string in the variable haystack. <lang tcl>set isPrefix [string equal -length [string length $needle] $haystack $needle] set isContained [expr {[string first $needle $haystack] >= 0}] set isSuffix [string equal $needle [string range $haystack end-[expr {[string length $needle]-1}] end]]</lang>
Of course, in the cases where the needle is a glob-safe string (i.e., doesn't have any of the characters “*?[\” in), this can be written far more conveniently: <lang tcl>set isPrefix [string match $needle* $haystack] set isContained [string match *$needle* $haystack] set isSuffix [string match *$needle $haystack]</lang>
Another powerful technique is to use the regular expression engine in literal string mode:
<lang tcl>set isContained [regexp ***=$needle $haystack]</lang>
This can be extended by getting the regexp
to return the locations of the matches, enabling the other forms of match to be done:
<lang tcl>set matchLocations [regexp -indices -all -inline ***=$needle $haystack]
- Each match location is a pair, being the index into the string where the needle started
- to match and the index where the needle finished matching
set isContained [expr {[llength $matchLocations] > 0}] set isPrefix [expr {[lindex $matchLocations 0 0] == 0}] set isSuffix [expr {[lindex $matchLocations end 1] == [string length $haystack]-1}] set firstMatchStart [lindex $matchLocations 0 0] puts "Found \"$needle\" in \"$haystack\" at $firstMatchStart" foreach location $matchLocations {
puts "needle matched at index [lindex $location 0]"
}</lang>