Substring: Difference between revisions


Revision as of 17:19, 28 August 2009

In this task display a substring:

starting from n characters in and of m length;
starting from n characters in, up to the end of the string;
whole string minus last character;
starting from a known character within the string and of m length;
starting from a known substring within the string and of m length.
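
As a rough illustration of the five cases, the Python sketch below wraps them in small functions. The function names are hypothetical and not part of the task, and n is treated as a zero-based offset, which is an assumption because the task does not fix an indexing convention.

```python
def from_offset(s, n, m):
    """Substring starting n characters in and of length m."""
    return s[n:n + m]


def from_offset_to_end(s, n):
    """Substring starting n characters in, up to the end of the string."""
    return s[n:]


def without_last(s):
    """Whole string minus the last character."""
    return s[:-1]


def from_known(s, needle, m):
    """Substring of length m starting at the first occurrence of needle.

    needle may be a single character or a longer substring;
    str.index raises ValueError when it does not occur in s.
    """
    start = s.index(needle)
    return s[start:start + m]


if __name__ == "__main__":
    text = "abcdefgh"
    print(from_offset(text, 2, 3))
    print(from_offset_to_end(text, 2))
    print(without_last(text))
    print(from_known(text, "d", 3))
    print(from_known(text, "de", 3))
```

In this sketch a single function covers both "known character" and "known substring", since Python has no separate character type.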

Ada

A String in Ada is an array of Character elements indexed by Positive: <lang Ada> type String is array (Positive range <>) of Character; </lang> A substring is a first-class object in Ada, an anonymous subtype of String. The language uses the term slice for it. Slices can be retrieved, assigned, and passed as parameters to subprograms in mutable or immutable mode. A slice is specified as: <lang Ada> A (<first-index>..<last-index>) </lang>

A string array in Ada can start at any positive index. This is why the implementation below uses Str'First in all slices. In this concrete case Str'First is 1, but it is intentionally left in the code because the task refers to N as an offset from the beginning of the string rather than an index into the string. In Ada it is unusual to deal with slices in this way; one normally uses plain string indices instead. <lang Ada> with Ada.Text_IO; use Ada.Text_IO; with Ada.Strings.Fixed; use Ada.Strings.Fixed;

procedure Test_Slices is

  Str : constant String := "abcdefgh";
  N : constant := 2;
  M : constant := 3;

begin

  Put_Line (Str (Str'First + N - 1..Str'First + N + M - 2));
  Put_Line (Str (Str'First + N - 1..Str'Last));
  Put_Line (Str (Str'First..Str'Last - 1));
  Put_Line (Head (Tail (Str, Str'Last - Index (Str, "d", 1)), M));
  Put_Line (Head (Tail (Str, Str'Last - Index (Str, "de", 1) - 1), M));

end Test_Slices;</lang> Sample output:

bcd
bcdefgh
abcdefg
efg
fgh
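
The index arithmetic used in the first Ada slice can be checked with a small Python sketch. The helper names are hypothetical; the sketch only models an array whose first index is `first` on top of a zero-based Python string, and it assumes, as the Ada code does, that N counts characters starting from one.

```python
def slice_bounds(first, n, m):
    """Inclusive (low, high) bounds of the slice that starts at the
    n-th character (counting from 1) and is m characters long, for an
    array whose first index is `first`."""
    low = first + n - 1
    high = first + n + m - 2
    return low, high


def based_slice(s, first, low, high):
    """Emulate a slice low..high (inclusive) on a string whose first
    index is `first`, using a zero-based Python string."""
    return s[low - first:high - first + 1]


if __name__ == "__main__":
    for first in (1, 5):
        low, high = slice_bounds(first, 2, 3)
        print(first, low, high, based_slice("abcdefgh", first, low, high))
```

Whatever the first index is, the same three characters are selected; only the bounds shift, which is the reason the Ada code keeps Str'First in every slice.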

C

<lang c>#include <stdio.h>

#include <stdlib.h>
#include <string.h>

char *substring(const char *s, int n, int m) {

 char *result;

 /* n < 0 or m < 0 is invalid */
 if (n < 0 || m < 0)
   return NULL;

 /* make sure string does not end before n
  * and advance the "s" pointer to beginning of substring */
 for ( ; n > 0; s++, n--)
   if (*s == '\0')
     /* string ends before n: invalid */
     return NULL;

 result = malloc(m+1);
 if (result == NULL)
   /* allocation failed */
   return NULL;
 result[0] = '\0';
 strncat(result, s, m); /* strncat() will automatically add null terminator
                         * if string ends early or after reading m characters */
 return result;

}

char *str_wholeless1(const char *s) {

 int slen = strlen(s);

 return substring(s, 0, slen-1);

}

char *str_fromch(const char *s, int ch, int m) {

 const char *p = strchr(s, ch);

 /* character not found: invalid */
 return p ? substring(s, p - s, m) : NULL;

}

char *str_fromstr(const char *s, char *in, int m) {

 const char *p = strstr(s, in);

 /* substring not found: invalid */
 return p ? substring(s, p - s, m) : NULL;

}</lang>

<lang c>#define TEST(A) do { \

   char *r = (A);                      /* non-const so it can be freed */ \
   printf("%s\n", r ? r : "(null)");   /* substring() may return NULL */  \
   free(r);                            \
 } while(0)

int main() {

 const char *s = "hello world shortest program";

 TEST( substring(s, 12, 5) );      // get "short"
 TEST( substring(s, 6, strlen(s)) ); // get "world shortest program" (m is only an upper bound; a negative m is rejected)
 TEST( str_wholeless1(s) );        // "... progra"
 TEST( str_fromch(s, 'w', 5) );    // "world"
 TEST( str_fromstr(s, "ro", 3) ); // "rog"

 return 0;

}</lang>

Common Lisp

<lang lisp>(let ((string "0123456789")

     (n 2)
     (m 3)
     (start #\5)
     (substring "34"))
 (list (subseq string n (+ n m))
       (subseq string n)
       (subseq string 0 (1- (length string)))
       (let ((pos (position start string)))
         (subseq string pos (+ pos m)))
       (let ((pos (search substring string)))
         (subseq string pos (+ pos m)))))</lang>

D

<lang d> import std.stdio; import std.string; void main() {

       char[]ostr = "The quick brown fox jumps over the lazy dog.";
       int n = 5,m = 3,o;
       writefln("%s",ostr[n..n+m]);
       writefln("%s",ostr[n..$]);
       writefln("%s",ostr[0..$-1]);
       o = ostr.find("q");
       writefln("%s",ostr[o..o+m]);
       o = ostr.find("qu");
       writefln("%s",ostr[o..o+m]);

} </lang>

E

<lang e>def string := "aardvarks"
def n := 4
def m := 4
println(string(n, n + m))
println(string(n))
println(string(0, string.size() - 1))
println({string(def i := string.indexOf1('d'), i + m)})
println({string(def i := string.startOf("ard"), i + m)})</lang>
Output:

vark
varks
aardvark
dvar
ardv

Forth

<lang forth>2 constant Pos
3 constant Len

: substrings
  s" abcdefgh"  ( addr len )
  over Pos + Len   cr type       \ cde
  2dup Pos /string cr type       \ cdefgh
  2dup 1-          cr type       \ abcdefg
  2dup 'd scan     Len min cr type       \ def
  s" de" search if Len min cr type then  \ def
;
</lang>

J

<lang J>   5{.3}.'Marshmallow'
shmal

  3}.'Marshmallow'

shmallow

  }:'Marshmallow'

Marshmallo

  ({.~ i.&'m')'Marshmallow'

Marsh

  5{.(}.~ I.@E.~&'sh')'Marshmallow'

shmal</lang>

Note that there are other, sometimes better, ways of accomplishing this task.

<lang J>   'Marshmallow'{~(+i.)/3 5
shmal</lang>

Note also that these operations work the same way on lists of numbers that they do on this example list of characters.
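
The same observation holds for slicing in some other languages. As a loose analogy in Python (not a translation of the J code), one hypothetical helper can drop and take items from a string or from a list of numbers:

```python
def drop_then_take(seq, drop, take):
    """Skip the first `drop` items of seq, then keep the next `take` items.

    Works for any sliceable sequence, e.g. str or list.
    """
    return seq[drop:drop + take]


if __name__ == "__main__":
    print(drop_then_take("Marshmallow", 3, 5))
    print(drop_then_take([10, 20, 30, 40, 50, 60, 70, 80, 90], 3, 5))
```

The result has the same type as the input: a string yields a string and a list yields a list.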

Java

Strings in Java are 0-indexed. <lang java>String x = "testing123";
int n = 2, m = 3; // example values for the offset and the length
System.out.println(x.substring(n, n + m));
System.out.println(x.substring(n));
System.out.println(x.substring(0, x.length() - 1));
int index1 = x.indexOf('i');
System.out.println(x.substring(index1, index1 + m));
int index2 = x.indexOf("ing");
System.out.println(x.substring(index2, index2 + m));
//indexOf methods also have an optional "from index" argument which will
//make indexOf ignore characters before that index</lang>
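
The "from index" idea mentioned in the comment can be illustrated with Python's `str.find`, which also accepts an optional start position. This is an analogous sketch in Python rather than Java code, and the variable names are only illustrative.

```python
s = "testing123"
m = 3

# First occurrence of "t", searching from the beginning of the string.
first = s.find("t")

# Second occurrence: start searching just past the first match, so
# characters before that position are ignored.
second = s.find("t", first + 1)

# Substring of length m starting at the second occurrence.
piece = s[second:second + m]

print(first, second, piece)
```

Note that `str.find` returns -1 when nothing is found, so real code would check the result before slicing.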

Logo

Works with: UCB Logo

The following are defined to behave similarly to the built-in index operator ITEM. As with most Logo list operators, these are designed to work for both words (strings) and lists. <lang logo> to items :n :thing

 if :n >= count :thing [output :thing]
 output items :n butlast :thing

end

to butitems :n :thing

 if or :n <= 0 empty? :thing [output :thing]
 output butitems :n-1 butfirst :thing

end

to middle :n :m :thing

 output items :m-(:n-1) butitems :n-1 :thing

end

to lastitems :n :thing

 if :n >= count :thing [output :thing]
output lastitems :n butfirst :thing

end

to starts.with :sub :thing

 if empty? :sub [output "true]
 if empty? :thing [output "false]
 if not equal? first :sub first :thing [output "false]
 output starts.with butfirst :sub butfirst :thing

end

to members :sub :thing

 output cascade [starts.with :sub ?] [bf ?] :thing

end

; note: Logo indices start at one

make "s "abcdefgh
print items 3 butitems 2 :s   ; cde
print middle 3 5 :s           ; cde
print butitems 2 :s           ; cdefgh
print butlast :s              ; abcdefg
print items 3 member "d :s    ; def
print items 3 members "de :s  ; def
</lang>

OCaml

<lang ocaml># let s = "ABCDEFGH" ;;
val s : string = "ABCDEFGH"

# let n, m = 2, 3 ;;
val n : int = 2
val m : int = 3

# String.sub s n m ;;
- : string = "CDE"

# String.sub s n (String.length s - n) ;;
- : string = "CDEFGH"

# String.sub s 0 (String.length s - 1) ;;
- : string = "ABCDEFG"

# String.sub s (String.index s 'D') m ;;
- : string = "DEF"

# #load "str.cma";;
# let n = Str.search_forward (Str.regexp_string "DE") s 0 in
    String.sub s n m ;;
- : string = "DEF"</lang>

Perl

<lang perl>my $str = 'abcdefgh';
my $n = 2;
my $m = 3;
print substr($str, $n, $m), "\n";
print substr($str, $n), "\n";
print substr($str, 0, -1), "\n";
print substr($str, index($str, 'd'), $m), "\n";
print substr($str, index($str, 'de'), $m), "\n";</lang>

PHP

Python

Python uses zero-based indexing, so the n'th character is at index n-1.

<lang python>>>> s = 'abcdefgh'
>>> n, m, char, chars = 2, 3, 'd', 'cd'
>>> # starting from n=2 characters in and m=3 in length;
>>> s[n-1:n+m-1]
'bcd'
>>> # starting from n characters in, up to the end of the string;
>>> s[n-1:]
'bcdefgh'
>>> # whole string minus last character;
>>> s[:-1]
'abcdefg'
>>> # starting from a known character char="d" within the string and of m length;
>>> indx = s.index(char)
>>> s[indx:indx+m]
'def'
>>> # starting from a known substring chars="cd" within the string and of m length.
>>> indx = s.index(chars)
>>> s[indx:indx+m]
'cde'
>>> </lang>

R

Ruby

<lang ruby>str = 'abcdefgh'
n = 2
m = 3
puts str[n, m]
puts str[n..-1]
puts str[0..-2]
puts str[str.index('d'), m]
puts str[str.index('de'), m]</lang>

Smalltalk

The distinction between searching for a single character and searching for a string within another string is rather blurred here. In the following code, instead of 'w' (a string) we could use $w (a character), and it would make no difference.

<lang smalltalk>|s| s := 'hello world shortest program'.

(s copyFrom: 13 to: (13+4)) displayNl. "4 is the length (5) - 1, since we need the index of the

last char we want, which is included"

(s copyFrom: 7) displayNl. (s allButLast) displayNl.

(s copyFrom: ((s indexOfRegex: 'w') first)

  to: ( ((s indexOfRegex: 'w') first) + 4) ) displayNl.

(s copyFrom: ((s indexOfRegex: 'ro') first)

  to: ( ((s indexOfRegex: 'ro') first) + 2) ) displayNl.</lang>

These last two examples in particular seem rather complex, so we can extend the string class.

Works with: GNU Smalltalk

<lang smalltalk>String extend [

 copyFrom: index length: nChar [
   ^ self copyFrom: index to: ( index + nChar - 1 )
 ]
 copyFromRegex: regEx length: nChar [
   |i|
   i := self indexOfRegex: regEx.
   ^ self copyFrom: (i first) length: nChar
 ]

].

"and show it simpler..."

(s copyFrom: 13 length: 5) displayNl. (s copyFromRegex: 'w' length: 5) displayNl. (s copyFromRegex: 'ro' length: 3) displayNl.</lang>

Tcl

<lang tcl>set str "abcdefgh"
set n 2
set m 3

puts [string range $str $n [expr {$n+$m-1}]]
puts [string range $str $n end]
puts [string range $str 0 end-1]

# Because Tcl does substrings with a pair of indices, it is easier to express
# the last two parts of the task as a chained pair of [string range] operations.
puts [string range [string range $str [string first "d" $str] end] 0 [expr {$m-1}]]
puts [string range [string range $str [string first "de" $str] end] 0 [expr {$m-1}]]</lang>
Of course, if you were doing 'position-plus-length' a lot, it would be easier to add another subcommand to string, like this:

Works with: Tcl version 8.5

<lang tcl># Define the substring operation
proc ::substring {string start length} {
    string range [string range $string $start end] 0 $length-1
}

# Plumb it into the language
set ops [namespace ensemble configure string -map]
dict set ops substr ::substring
namespace ensemble configure string -map $ops

# Now show off by repeating the challenge!
set str "abcdefgh"
set n 2
set m 3

puts [string substr $str $n $m]
puts [string range $str $n end]
puts [string range $str 0 end-1]
puts [string substr $str [string first "d" $str] $m]
puts [string substr $str [string first "de" $str] $m]</lang>