Substring: Difference between revisions
Revision as of 04:25, 5 June 2011
You are encouraged to solve this task according to the task description, using any language you may know.
Basic Data Operation
This is a basic data operation. It represents a fundamental action on a basic data type.
In this task display a substring:
- starting from n characters in and of m length;
- starting from n characters in, up to the end of the string;
- whole string minus last character;
- starting from a known character within the string and of m length;
- starting from a known substring within the string and of m length.
Ada
String in Ada is an array of Character elements indexed by Positive: <lang Ada>type String is array (Positive range <>) of Character;</lang> Substring is a first-class object in Ada, an anonymous subtype of String. The language uses the term slice for it. Slices can be retrieved, assigned and passed as a parameter to subprograms in mutable or immutable mode. A slice is specified as: <lang Ada>A (<first-index>..<last-index>)</lang>
A string array in Ada can start with any positive index. This is why the implementation below uses Str'First in all slices, which in this concrete case is 1, but intentionally left in the code because the task refers to N as an offset to the string beginning rather than an index in the string. In Ada it is unusual to deal with slices in such way. One uses plain string index instead. <lang Ada>with Ada.Text_IO; use Ada.Text_IO; with Ada.Strings.Fixed; use Ada.Strings.Fixed;
procedure Test_Slices is
Str : constant String := "abcdefgh"; N : constant := 2; M : constant := 3;
begin
   Put_Line (Str (Str'First + N - 1..Str'First + N + M - 2));
   Put_Line (Str (Str'First + N - 1..Str'Last));
   Put_Line (Str (Str'First..Str'Last - 1));
   Put_Line (Head (Tail (Str, Str'Last - Index (Str, "d", 1) + 1), M));
   Put_Line (Head (Tail (Str, Str'Last - Index (Str, "de", 1) + 1), M));
end Test_Slices;</lang> Sample output:
bcd bcdefgh abcdefg def def
Aikido
Aikido uses square brackets for slices. The syntax is [start:end]. If you want to use a length, you have to add it to the start. Shifting a string left or right removes characters from its ends.
<lang aikido>const str = "abcdefg"
var n = 2
var m = 3

println (str[n:n+m-1])    // pos 2 length 3
println (str[n:])         // pos 2 to end
println (str >> 1)        // remove last character
var p = find (str, 'c')
println (str[p:p+m-1])    // from pos of p length 3

var s = find (str, "bc")
println (str[s:s+m-1])    // pos of bc length 3
</lang>
ALGOL 68
<lang Algol68>main: (
STRING s = "abcdefgh"; INT n = 2, m = 3; CHAR char = "d"; STRING chars = "cd";
printf(($gl$, s[n:n+m-1])); printf(($gl$, s[n:])); printf(($gl$, s[:UPB s-1])); INT pos; char in string("d", pos, s); printf(($gl$, s[pos:pos+m-1])); string in string("de", pos, s); printf(($gl$, s[pos:pos+m-1]))
)</lang>Output:
bcd bcdefgh abcdefg def def
AutoHotkey
The code contains some alternatives. <lang autohotkey>String := "abcdefghijklmnopqrstuvwxyz"
; also
; String = abcdefghijklmnopqrstuvwxyz
n := 12
m := 5

; starting from n characters in and of m length;
subString := SubStr(String, n, m)
; alternative
; StringMid, subString, String, n, m
MsgBox % subString

; starting from n characters in, up to the end of the string;
subString := SubStr(String, n)
; alternative
; StringMid, subString, String, n
MsgBox % subString

; whole string minus last character;
StringTrimRight, subString, String, 1
; alternatives
; subString := SubStr(String, 1, StrLen(String) - 1)
; StringMid, subString, String, 1, StrLen(String) - 1
MsgBox % subString

; starting from a known character within the string and of m length;
findChar := "q"
subString := SubStr(String, InStr(String, findChar), m)
; alternatives
; RegExMatch(String, findChar . ".{" . m - 1 . "}", subString)
; StringMid, subString, String, InStr(String, findChar), m
MsgBox % subString

; starting from a known substring within the string and of m length;
findString := "pq"
subString := SubStr(String, InStr(String, findString), m)
; alternatives
; RegExMatch(String, findString . ".{" . m - StrLen(findString) . "}", subString)
; StringMid, subString, String, InStr(String, findString), m
MsgBox % subString
</lang>
Output:
lmnop lmnopqrstuvwxyz abcdefghijklmnopqrstuvwxy qrstu pqrst
AWK
<lang awk>BEGIN {
    str = "abcdefghijklmnopqrstuvwxyz"
    n = 12
    m = 5

    print substr(str, n, m)
    print substr(str, n)
    print substr(str, 1, length(str) - 1)
    print substr(str, index(str, "q"), m)
    print substr(str, index(str, "pq"), m)
}</lang>
Output:
$ awk -f substring.awk lmnop lmnopqrstuvwxyz abcdefghijklmnopqrstuvwxy qrstu pqrst
BASIC
<lang qbasic>DIM baseString AS STRING, subString AS STRING, findString AS STRING
DIM m AS INTEGER, n AS INTEGER

baseString = "abcdefghijklmnopqrstuvwxyz"
n = 12
m = 5

' starting from n characters in and of m length;
subString = MID$(baseString, n, m)
PRINT subString

' starting from n characters in, up to the end of the string;
subString = MID$(baseString, n)
PRINT subString

' whole string minus last character;
subString = LEFT$(baseString, LEN(baseString) - 1)
PRINT subString

' starting from a known character within the string and of m length;
' starting from a known substring within the string and of m length.
findString = "pq"
subString = MID$(baseString, INSTR(baseString, findString), m)
PRINT subString
</lang>
Output:
lmnop lmnopqrstuvwxyz abcdefghijklmnopqrstuvwxy pqrst
ZX Spectrum Basic
ZX Spectrum Basic unfortunately has no direct way to find a substring within a string; however, a similar effect can be achieved by searching with a FOR loop: <lang basic>10 LET A$="abcdefghijklmnopqrstuvwxyz"
15 LET n=10: LET m=7
20 PRINT A$(n TO n+m-1)
30 PRINT A$(n TO )
40 PRINT A$( TO LEN (A$)-1)
50 FOR i=1 TO LEN (A$)
60 IF A$(i)="g" THEN PRINT A$(i TO i+m-1): GO TO 80
70 NEXT i
80 LET B$="ijk"
90 FOR i=1 TO LEN (A$)-LEN (B$)+1
100 IF A$(i TO i+LEN (B$)-1)=B$ THEN PRINT A$(i TO i+m-1): GO TO 120
110 NEXT i
120 STOP
</lang> Output:
jklmnop jklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxy ghijklm ijklmno
BBC BASIC
<lang bbcbasic>      basestring$ = "The five boxing wizards jump quickly"
      n% = 10
      m% = 5

      REM starting from n characters in and of m length:
      substring$ = MID$(basestring$, n%, m%)
      PRINT substring$

      REM starting from n characters in, up to the end of the string:
      substring$ = MID$(basestring$, n%)
      PRINT substring$

      REM whole string minus last character:
      substring$ = LEFT$(basestring$)
      PRINT substring$

      REM starting from a known character within the string and of m length:
      char$ = "w"
      substring$ = MID$(basestring$, INSTR(basestring$, char$), m%)
      PRINT substring$

      REM starting from a known substring within the string and of m length:
      find$ = "iz"
      substring$ = MID$(basestring$, INSTR(basestring$, find$), m%)
      PRINT substring$</lang>
Output:
boxin boxing wizards jump quickly The five boxing wizards jump quickl wizar izard
C
<lang c>#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *substring(const char *s, size_t n, ptrdiff_t m)
{
  char *result;
  /* check for null s */
  if (NULL == s)
    return NULL;
  /* negative m to mean 'up to the mth char from right' */
  if (m < 0)
    m = strlen(s) + m - n + 1;

  /* n is unsigned and cannot be negative; a still-negative m is invalid */
  if (m < 0)
    return NULL;

  /* make sure string does not end before n
   * and advance the "s" pointer to beginning of substring */
  for ( ; n > 0; s++, n--)
    if (*s == '\0')
      /* string ends before n: invalid */
      return NULL;

  result = malloc(m+1);
  if (NULL == result)
    /* memory allocation failed */
    return NULL;
  result[0]=0;
  strncat(result, s, m); /* strncat() will automatically add null terminator
                          * if string ends early or after reading m characters */
  return result;
}

char *str_wholeless1(const char *s)
{
  return substring(s, 0, strlen(s) - 1);
}

char *str_fromch(const char *s, int ch, ptrdiff_t m)
{
  return substring(s, strchr(s, ch) - s, m);
}

char *str_fromstr(const char *s, char *in, ptrdiff_t m)
{
  return substring(s, strstr(s, in) - s , m);
}

#define TEST(A) do {            \
    char *r = (A);              \
    if (NULL == r)              \
      puts("--error--");        \
    else {                      \
      puts(r);                  \
      free(r);                  \
    }                           \
  } while(0)

int main()
{
  const char *s = "hello world shortest program";

  TEST( substring(s, 12, 5) );      // get "short"
  TEST( substring(s, 6, -1) );      // get "world shortest program"
  TEST( str_wholeless1(s) );        // "... progra"
  TEST( str_fromch(s, 'w', 5) );    // "world"
  TEST( str_fromstr(s, "ro", 3) );  // "rog"

  return 0;
}</lang>
C++
<lang cpp>#include <iostream>
#include <string>

int main()
{
  std::string s = "0123456789";

  int const n = 3;
  int const m = 4;
  char const c = '2';
  std::string const sub = "456";

  std::cout << s.substr(n, m)<< "\n";
  std::cout << s.substr(n) << "\n";
  std::cout << s.substr(0, s.size()-1) << "\n";
  std::cout << s.substr(s.find(c), m) << "\n";
  std::cout << s.substr(s.find(sub), m) << "\n";
}</lang>
C#
<lang csharp>using System; namespace SubString {
class Program { static void Main(string[] args) { string s = "0123456789"; const int n = 3; const int m = 2; const char c = '3'; const string z = "345";
Console.WriteLine(s.Substring(n, m)); Console.WriteLine(s.Substring(n, s.Length - n)); Console.WriteLine(s.Substring(0, s.Length - 1)); Console.WriteLine(s.Substring(s.IndexOf(c,0,s.Length), m)); Console.WriteLine(s.Substring(s.IndexOf(z, 0, s.Length), m)); } }
} </lang>
Clojure
<lang lisp>
(def string "alphabet")
(def n 2)
(def m 4)
(def len (count string))

;starting from n characters in and of m length;
(println
 (subs string n (+ n m)))              ;phab
;starting from n characters in, up to the end of the string;
(println
 (subs string n))                      ;phabet
;whole string minus last character;
(println
 (subs string 0 (dec len)))            ;alphabe
;starting from a known character within the string and of m length;
(let [pos (.indexOf string (int \l))]
  (println
   (subs string pos (+ pos m))))       ;lpha
;starting from a known substring within the string and of m length.
(let [pos (.indexOf string "ph")]
  (println
   (subs string pos (+ pos m))))       ;phab
</lang>
Common Lisp
<lang lisp>(let ((string "0123456789")
(n 2) (m 3) (start #\5) (substring "34")) (list (subseq string n (+ n m)) (subseq string n) (subseq string 0 (1- (length string))) (let ((pos (position start string))) (subseq string pos (+ pos m))) (let ((pos (search substring string))) (subseq string pos (+ pos m)))))</lang>
D
<lang d>import std.stdio, std.string; void main() {
string str = "the quick brown fox jumps over the lazy dog"; int n = 5, m = 3, i;
writefln("%s", str[n..n+m]);
writefln("%s", str[n..$]);
writefln("%s", str[0..$-1]);
i = str.indexOf("q"); writefln("%s", str[i..i+m]);
i = str.indexOf("qu"); writefln("%s", str[i..i+m]);
}</lang> Output:
uic
uick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy do
qui
qui
Delphi
<lang Delphi>program ShowSubstring;
{$APPTYPE CONSOLE}
uses SysUtils;
const
s = '0123456789'; n = 3; m = 4; c = '2'; sub = '456';
begin
  Writeln(Copy(s, n, m));             // starting from n characters in and of m length;
  Writeln(Copy(s, n, Length(s)));     // starting from n characters in, up to the end of the string;
  Writeln(Copy(s, 1, Length(s) - 1)); // whole string minus last character;
  Writeln(Copy(s, Pos(c, s), m));     // starting from a known character within the string and of m length;
  Writeln(Copy(s, Pos(sub, s), m));   // starting from a known substring within the string and of m length.
end.</lang>
Output:
2345 23456789 012345678 2345 4567
E
<lang e>def string := "aardvarks"
def n := 4
def m := 4
println(string(n, n + m))
println(string(n))
println(string(0, string.size() - 1))
println({string(def i := string.indexOf1('d'), i + m)})
println({string(def i := string.startOf("ard"), i + m)})</lang> Output:
vark varks aardvark dvar ardv
Euphoria
<lang Euphoria>sequence baseString, subString, findString
integer findChar
integer m, n

baseString = "abcdefghijklmnopqrstuvwxyz"

-- starting from n characters in and of m length;
n = 12
m = 5
subString = baseString[n..n+m-1]
puts(1, subString )
puts(1,'\n')

-- starting from n characters in, up to the end of the string;
n = 12
subString = baseString[n..$]
puts(1, subString )
puts(1,'\n')

-- whole string minus last character;
subString = baseString[1..$-1]
puts(1, subString )
puts(1,'\n')

-- starting from a known character within the string and of m length;
findChar = 'o'
m = 5
n = find(findChar,baseString)
subString = baseString[n..n+m-1]
puts(1, subString )
puts(1,'\n')

-- starting from a known substring within the string and of m length.
findString = "pq"
m = 5
n = match(findString,baseString)
subString = baseString[n..n+m-1]
puts(1, subString )
puts(1,'\n')</lang>
Output:
lmnop lmnopqrstuvwxyz abcdefghijklmnopqrstuvwxy opqrs pqrst
Factor
<lang factor>USING: math sequences kernel ;

! starting from n characters in and of m length
: subseq* ( from length seq -- newseq ) [ over + ] dip subseq ;

! starting from n characters in, up to the end of the string
: dummy ( seq n -- tailseq ) tail ;

! whole string minus last character
: dummy1 ( seq -- headseq ) but-last ;

USING: fry sequences kernel ;
! helper word
: subseq-from-* ( subseq len seq quot -- seq ) [ nip ] prepose 2keep subseq* ; inline

! starting from a known character within the string and of m length;
: subseq-from-char ( char len seq -- seq ) [ index ] subseq-from-* ;

! starting from a known substring within the string and of m length.
: subseq-from-seq ( subseq len seq -- seq ) [ start ] subseq-from-* ;</lang>
Forth
<lang forth>2 constant Pos
3 constant Len

: substrings
  s" abcdefgh" ( addr len )
  over Pos + Len   cr type       \ cde
  2dup Pos /string cr type       \ cdefgh
  2dup 1-          cr type       \ abcdefg
  2dup 'd scan     Len min cr type       \ def
  s" de" search if Len min cr type then  \ def
;</lang>
Fortran
<lang fortran>program test_substring
  character (*), parameter :: string = 'The quick brown fox jumps over the lazy dog.'
  character (*), parameter :: substring = 'brown'
  character    , parameter :: c = 'q'
  integer      , parameter :: n = 5
  integer      , parameter :: m = 15
  integer                  :: i

! Display the substring starting from n characters in and of length m.
  write (*, '(a)') string (n : n + m - 1)
! Display the substring starting from n characters in, up to the end of the string.
  write (*, '(a)') string (n :)
! Display the whole string minus the last character.
  i = len (string) - 1
  write (*, '(a)') string (: i)
! Display the substring starting from a known character and of length m.
  i = index (string, c)
  write (*, '(a)') string (i : i + m - 1)
! Display the substring starting from a known substring and of length m.
  i = index (string, substring)
  write (*, '(a)') string (i : i + m - 1)
end program test_substring</lang> Output:
quick brown fox quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog quick brown fox brown fox jumps
Note that in Fortran positions inside character strings are one-based, i. e. the first character is in position one.
Go
<lang go>package main

import "fmt"
import "strings"

func main() {
    s := "ABCDEFGH"
    n, m := 2, 3

    fmt.Println(s[n:n+m])   // "CDE"
    fmt.Println(s[n:])      // "CDEFGH"
    fmt.Println(s[0:len(s)-1]) // "ABCDEFG"
    fmt.Println(s[strings.Index(s, "D"):strings.Index(s, "D")+m]) // "DEF"
    fmt.Println(s[strings.Index(s, "DE"):strings.Index(s, "DE")+m]) // "DEF"
}</lang>
Groovy
Strings in Groovy are 0-indexed. <lang groovy>def str = 'abcdefgh'
def n = 2
def m = 3
println str[n..n+m-1]
println str[n..-1]
println str[0..-2]
def index1 = str.indexOf('d')
println str[index1..index1+m-1]
def index2 = str.indexOf('de')
println str[index2..index2+m-1]</lang>
Haskell
A string in Haskell is a list of chars: [Char]
- The first three tasks are simply:
*Main> take 3 $ drop 2 "1234567890" "345" *Main> drop 2 "1234567890" "34567890" *Main> init "1234567890" "123456789"
- The last two can be formulated with the following function:
<lang Haskell>import Data.List (isPrefixOf, tails)

t45 n c s | null sub = []
          | otherwise = take n . head $ sub
  where sub = filter (isPrefixOf c) $ tails s</lang>
*Main> t45 3 "4" "1234567890" "456" *Main> t45 3 "45" "1234567890" "456" *Main> t45 3 "31" "1234567890" ""
HicEst
<lang hicest>CHARACTER :: string = 'ABCDEFGHIJK', known = 'B', substring = 'CDE'
REAL, PARAMETER :: n = 5, m = 8

WRITE(Messagebox) string(n : n + m - 1), "| substring starting from n, length m"
WRITE(Messagebox) string(n :), "| substring starting from n, to end of string"
WRITE(Messagebox) string(1: LEN(string)-1), "| whole string minus last character"

pos_known = INDEX(string, known)
WRITE(Messagebox) string(pos_known : pos_known+m-1), "| substring starting from pos_known, length m"

pos_substring = INDEX(string, substring)
WRITE(Messagebox) string(pos_substring : pos_substring+m-1), "| substring starting from pos_substring, length m"</lang>
Icon and Unicon
<lang Icon>procedure main(arglist)
write("Usage: substring <string> <first position> <second position> <single character> <substring>")
s := \arglist[1] | "aardvarks"
n := \arglist[2] | 5
m := \arglist[3] | 4
c := \arglist[4] | "d"
ss := \arglist[5] | "ard"

write( s[n+:m] )
write( s[n:0] )
write( s[1:-1] )
write( s[find(c,s)+:m] )
write( s[find(ss,s)+:m] )
end</lang>
J
<lang J> 5{.3}.'Marshmallow' shmal
3}.'Marshmallow'
shmallow
}.'Marshmallow'
arshmallow
}:'Marshmallow'
Marshmallo
5{.(}.~ i.&'m')'Marshmallow'
mallo
5{.(}.~ I.@E.~&'sh')'Marshmallow'
shmal</lang>
Note that there are other, sometimes better, ways of accomplishing this task.
<lang J> 'Marshmallow'{~(+i.)/3 5 shmal</lang>
The taketo / takeafter and dropto / dropafter utilities from the strings script further simplify these types of tasks.
<lang J> require 'strings'
'sh' dropto 'Marshmallow'
shmallow
5{. 'sh' dropto 'Marshmallow'
shmal
'sh' takeafter 'Marshmallow'
mallow</lang>
Note also that these operations work the same way on lists of numbers that they do on this example list of characters.
<lang J> 3}. 2 3 5 7 11 13 17 19 7 11 13 17 19
7 11 dropafter 2 3 5 7 11 13 17 19
2 3 5 7 11</lang>
Java
Strings in Java are 0-indexed. <lang java>String x = "testing123";
int n = 2, m = 3; // example values; the original snippet left n and m undeclared
System.out.println(x.substring(n, n + m));
System.out.println(x.substring(n));
System.out.println(x.substring(0, x.length() - 1));
int index1 = x.indexOf('i');
System.out.println(x.substring(index1, index1 + m));
int index2 = x.indexOf("ing");
System.out.println(x.substring(index2, index2 + m));
//indexOf methods also have an optional "from index" argument which will
//make indexOf ignore characters before that index</lang>
JavaScript
The String object has two similar methods: substr and substring.
- substr(start, [len]) returns a substring beginning at a specified location and having a specified length.
- substring(start, [end]) returns a string containing the substring from start up to, but not including, end.
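For comparison, the two conventions described above (start plus length versus start plus exclusive end) can be mimicked with Python slices. This is only a sketch: the helper names substr and substring are hypothetical Python functions written for this illustration, and they cover non-negative arguments only, so they do not reproduce JavaScript's handling of negative or swapped arguments.

```python
def substr(s, start, length=None):
    """(start, length) convention, for non-negative arguments only."""
    if length is None:
        return s[start:]
    return s[start:start + length]


def substring(s, start, end=None):
    """(start, end) convention with an exclusive end, non-negative arguments only."""
    if end is None:
        return s[start:]
    return s[start:end]


s = "abcdefgh"
print(substr(s, 2, 3))     # cde  (three characters from offset 2)
print(substring(s, 2, 3))  # c    (offsets 2 up to, but not including, 3)
```

With the same pair of numbers the two calls return different results, which is the usual source of confusion between the two methods.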
<lang javascript>var str = "abcdefgh";

var n = 2;
var m = 3;

//  *  starting from n characters in and of m length;
str.substr(n, m);  // => "cde"

//  * starting from n characters in, up to the end of the string;
str.substr(n);  // => "cdefgh"
str.substring(n);  // => "cdefgh"

//  * whole string minus last character;
str.substring(0, str.length - 1);  // => "abcdefg"

//  * starting from a known character within the string and of m length;
str.substr(str.indexOf('b'), m);  // => "bcd"

//  * starting from a known substring within the string and of m length.
str.substr(str.indexOf('bc'), m);  // => "bcd"</lang>
Liberty BASIC
<lang lb>'These tasks can be completed with various combinations of Liberty Basic's
'built in Mid$()/ Instr()/ Left$()/ Right$()/ and Len() functions, but these
'examples only use the Mid$()/ Instr()/ and Len() functions.

baseString$ = "Thequickbrownfoxjumpsoverthelazydog."
n = 12
m = 5

'starting from n characters in and of m length
Print Mid$(baseString$, n, m)

'starting from n characters in, up to the end of the string
Print Mid$(baseString$, n)

'whole string minus last character
Print Mid$(baseString$, 1, (Len(baseString$) - 1))

'starting from a known character within the string and of m length
Print Mid$(baseString$, Instr(baseString$, "f", 1), m)

'starting from a known substring within the string and of m length
Print Mid$(baseString$, Instr(baseString$, "jump", 1), m)</lang>
Logo
The following are defined to behave similarly to the built-in index operator ITEM. As with most Logo list operators, these are designed to work for both words (strings) and lists. <lang logo>to items :n :thing
if :n >= count :thing [output :thing] output items :n butlast :thing
end
to butitems :n :thing
if or :n <= 0 empty? :thing [output :thing] output butitems :n-1 butfirst :thing
end
to middle :n :m :thing
output items :m-(:n-1) butitems :n-1 :thing
end
to lastitems :n :thing
if :n >= count :thing [output :thing] output lastitems :n butfirst :thing
end
to starts.with :sub :thing
if empty? :sub [output "true] if empty? :thing [output "false] if not equal? first :sub first :thing [output "false] output starts.with butfirst :sub butfirst :thing
end
to members :sub :thing
output cascade [starts.with :sub ?] [bf ?] :thing
end
; note: Logo indices start at one
make "s "abcdefgh
print items 3 butitems 2 :s ; cde
print middle 3 5 :s         ; cde
print butitems 2 :s         ; cdefgh
print butlast :s            ; abcdefg
print items 3 member "d :s  ; def
print items 3 members "de :s ; def</lang>
Lua
<lang lua>str = "abcdefghijklmnopqrstuvwxyz"
n, m = 5, 15

-- string.sub takes start and end positions (both inclusive), not a length
print( string.sub( str, n, n+m-1 ) )  -- efghijklmnopqrs
print( string.sub( str, n, -1 ) )     -- efghijklmnopqrstuvwxyz
print( string.sub( str, 1, -2 ) )     -- abcdefghijklmnopqrstuvwxy

pos = string.find( str, "i" )
if pos ~= nil then print( string.sub( str, pos, pos+m-1 ) ) end  -- ijklmnopqrstuvw

pos = string.find( str, "ijk" )
if pos ~= nil then print( string.sub( str, pos, pos+m-1 ) ) end  -- ijklmnopqrstuvw
</lang>
Mathematica
The StringTake and StringDrop functions are relevant for this exercise.
<lang Mathematica>n = 2
m = 3
(* starting from n characters in and of m length *)
StringTake["Mathematica", {n+1, n+m}]
(* starting from n characters in, up to the end of the string *)
StringDrop["Mathematica", n]
(* whole string minus last character *)
StringDrop["Mathematica", -1]
(* StringPosition returns a list of starting and ending character positions for a substring *)
pos = StringPosition["Mathematica", "e"][[1, 1]]
StringTake["Mathematica", {pos, pos+m-1}]
(* Similar to above *)
pos = StringPosition["Mathematica", "the"][[1, 1]]
StringTake["Mathematica", {pos, pos+m-1}]
</lang>
MUMPS
MUMPS has the first position in a string numbered as 1. <lang MUMPS>SUBSTR(S,N,M,C,K)
 ;show substring operations
 ;S is the string
 ;N is a position within the string (that is, n<length(string))
 ;M is an integer of positions to show
 ;C is a character within the string S
 ;K is a substring within the string S
 ;$Find returns the position after the substring
 NEW X
 WRITE !,"The base string is:",!,?5,"'",S,"'"
 WRITE !,"From position ",N," for ",M," characters:"
 WRITE !,?5,$EXTRACT(S,N,N+M-1)
 WRITE !,"From position ",N," to the end of the string:"
 WRITE !,?5,$EXTRACT(S,N,$LENGTH(S))
 WRITE !,"Whole string minus last character:"
 WRITE !,?5,$EXTRACT(S,1,$LENGTH(S)-1)
 WRITE !,"Starting from character '",C,"' for ",M," characters:"
 SET X=$FIND(S,C)-$LENGTH(C)
 WRITE !,?5,$EXTRACT(S,X,X+M-1)
 WRITE !,"Starting from string '",K,"' for ",M," characters:"
 SET X=$FIND(S,K)-$LENGTH(K)
 W !,?5,$EXTRACT(S,X,X+M-1)
 QUIT
</lang> Usage:
USER>D SUBSTR^ROSETTA("ABCD1234efgh",3,4,"D","23")

The base string is:
     'ABCD1234efgh'
From position 3 for 4 characters:
     CD12
From position 3 to the end of the string:
     CD1234efgh
Whole string minus last character:
     ABCD1234efg
Starting from character 'D' for 4 characters:
     D123
Starting from string '23' for 4 characters:
     234e
newLISP
<lang newLISP>> (set 'str "alphabet" 'n 2 'm 4)
4
> ; starting from n characters in and of m length
> (slice str n m)
"phab"
> ; starting from n characters in, up to the end of the string
> (slice str n)
"phabet"
> ; whole string minus last character
> (chop str)
"alphabe"
> ; starting from a known character within the string and of m length
> (slice str (find "l" str) m)
"lpha"
> ; starting from a known substring within the string and of m length
> (slice str (find "ph" str) m)
"phab"
</lang>
Niue
<lang Niue>( based on the JavaScript code ) 'abcdefgh 's ; s str-len 'len ; 2 'n ; 3 'm ;
( starting from n characters in and of m length ) s n n m + substring . ( => cde ) newline
( starting from n characters in, up to the end of the string ) s n len substring . ( => cdefgh ) newline
( whole string minus last character ) s 0 len 1 - substring . ( => abcdefg ) newline
( starting from a known character within the string and of m length ) s s 'b str-find dup m + substring . ( => bcd ) newline
( starting from a known substring within the string and of m length ) s s 'bc str-find dup m + substring . ( => bcd ) newline </lang>
Objeck
<lang objeck> bundle Default {
class SubString { function : Main(args : String[]) ~ Nil { s := "0123456789";
n := 3; m := 4; c := '2'; sub := "456";
      s->SubString(n, m)->PrintLine();
      s->SubString(n)->PrintLine();
      s->SubString(0, s->Size() - 1)->PrintLine();
      s->SubString(s->Find(c), m)->PrintLine();
      s->SubString(s->Find(sub), m)->PrintLine();
    }
  }
} </lang>
OCaml
<lang ocaml># let s = "ABCDEFGH" ;;
val s : string = "ABCDEFGH"

# let n, m = 2, 3 ;;
val n : int = 2
val m : int = 3

# String.sub s n m ;;
- : string = "CDE"

# String.sub s n (String.length s - n) ;;
- : string = "CDEFGH"

# String.sub s 0 (String.length s - 1) ;;
- : string = "ABCDEFG"

# String.sub s (String.index s 'D') m ;;
- : string = "DEF"

# #load "str.cma";;
# let n = Str.search_forward (Str.regexp_string "DE") s 0 in
  String.sub s n m ;;
- : string = "DEF"</lang>
Oz
<lang oz>declare
fun {DropUntil Xs Prefix} case Xs of nil then nil [] _|Xr then if {List.isPrefix Prefix Xs} then Xs else {DropUntil Xr Prefix} end end end
Digits = "1234567890"
in
{ForAll [{List.take {List.drop Digits 2} 3} = "345" {List.drop Digits 2} = "34567890" {List.take Digits {Length Digits}-1} = "123456789" {List.take {DropUntil Digits "4"} 3} = "456" {List.take {DropUntil Digits "56"} 3} = "567" {List.take {DropUntil Digits "31"} 3} = "" ] System.showInfo}</lang>
Perl
<lang perl>my $str = 'abcdefgh'; my $n = 2; my $m = 3; print substr($str, $n, $m), "\n"; print substr($str, $n), "\n"; print substr($str, 0, -1), "\n"; print substr($str, index($str, 'd'), $m), "\n"; print substr($str, index($str, 'de'), $m), "\n";</lang>
Perl 6
<lang perl6>my $str = 'abcdefgh'; my $n = 2; my $m = 3; say $str.substr($n, $m); say $str.substr($n); say $str.substr(0, *-1); say $str.substr($str.index('d'), $m); say $str.substr($str.index('de'), $m);</lang>
PHP
<lang php><?php $str = 'abcdefgh'; $n = 2; $m = 3; echo substr($str, $n, $m), "\n"; echo substr($str, $n), "\n"; echo substr($str, 0, -1), "\n"; echo substr($str, strpos($str, 'd'), $m), "\n"; echo substr($str, strpos($str, 'de'), $m), "\n"; ?></lang>
PicoLisp
<lang PicoLisp>(let Str (chop "This is a string")
   (prinl (head 4 (nth Str 6)))        # From 6 of 4 length
   (prinl (nth Str 6))                 # From 6 up to the end
   (prinl (head -1 Str))               # Minus last character
   (prinl (head 8 (member "s" Str)))   # From character "s" of length 8
   (prinl                              # From "is a" of length 8
      (head 8
         (seek '((S) (pre? "is a" S)) Str) ) ) )</lang>
Output:
is a is a string This is a strin s is a s is a str
PL/I
<lang PL/I>
s='abcdefghijk';
n=4; m=3;
u=substr(s,n,m);
u=substr(s,n);
u=substr(s,1,length(s)-1);
u=substr(s,index(s,'def'),m);
u=substr(s,index(s,'g'),m);
</lang>
PowerShell
Since .NET and PowerShell use zero-based indexing, all character indexes have to be reduced by one. <lang powershell># test string
$s = "abcdefgh"
# test parameters
$n, $m, $c, $s2 = 2, 3, [char]'d', 'cd'

# starting from n characters in and of m length
# n = 2, m = 3
$s.Substring($n-1, $m)              # returns 'bcd'

# starting from n characters in, up to the end of the string
# n = 2
$s.Substring($n-1)                  # returns 'bcdefgh'

# whole string minus last character
$s.Substring(0, $s.Length - 1)      # returns 'abcdefg'

# starting from a known character within the string and of m length
# c = 'd', m = 3
$s.Substring($s.IndexOf($c), $m)    # returns 'def'

# starting from a known substring within the string and of m length
# s2 = 'cd', m = 3
$s.Substring($s.IndexOf($s2), $m)   # returns 'cde'</lang>
PureBasic
<lang PureBasic>If OpenConsole()
  Define baseString.s, m, n

  baseString = "Thequickbrownfoxjumpsoverthelazydog."
  n = 12
  m = 5

  ;Display the substring starting from n characters in and of m length.
  PrintN(Mid(baseString, n, m))

  ;Display the substring starting from n characters in, up to the end of the string.
  PrintN(Mid(baseString, n)) ;or PrintN(Right(baseString, Len(baseString) - n + 1))

  ;Display the substring whole string minus last character
  PrintN(Left(baseString, Len(baseString) - 1))

  ;Display the substring starting from a known character within the string and of m length.
  PrintN(Mid(baseString, FindString(baseString, "b", 1), m))

  ;Display the substring starting from a known substring within the string and of m length.
  PrintN(Mid(baseString, FindString(baseString, "ju", 1), m))

  Print(#CRLF$ + #CRLF$ + "Press ENTER to exit")
  Input()
  CloseConsole()
EndIf</lang> Sample output:
wnfox wnfoxjumpsoverthelazydog. Thequickbrownfoxjumpsoverthelazydog brown jumps
Python
Python uses zero-based indexing, so the n'th character is at index n-1.
<lang python>>>> s = 'abcdefgh'
>>> n, m, char, chars = 2, 3, 'd', 'cd'
>>> # starting from n=2 characters in and m=3 in length;
>>> s[n-1:n+m-1]
'bcd'
>>> # starting from n characters in, up to the end of the string;
>>> s[n-1:]
'bcdefgh'
>>> # whole string minus last character;
>>> s[:-1]
'abcdefg'
>>> # starting from a known character char="d" within the string and of m length;
>>> indx = s.index(char)
>>> s[indx:indx+m]
'def'
>>> # starting from a known substring chars="cd" within the string and of m length.
>>> indx = s.index(chars)
>>> s[indx:indx+m]
'cde'
>>></lang>
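The session above relies on s.index, which raises ValueError when the character or substring is absent. As a hedged sketch of one alternative, the last two cases can be wrapped in a single helper built on str.find, which returns -1 instead of raising. The name substring_from and the choice to return None on a miss are assumptions made for this illustration, not part of the task.

```python
def substring_from(s, needle, m):
    """Return m characters of s starting at the first occurrence of needle.

    needle may be a single character or a longer substring.
    Returns None when needle does not occur (a choice made for this sketch).
    """
    start = s.find(needle)  # str.find returns -1 when needle is not found
    if start == -1:
        return None
    return s[start:start + m]


print(substring_from('abcdefgh', 'd', 3))   # def
print(substring_from('abcdefgh', 'cd', 3))  # cde
print(substring_from('abcdefgh', 'x', 3))   # None
```

Because slicing clamps at the end of the string, a match close to the end simply yields fewer than m characters rather than an error.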
R
<lang R>s <- "abcdefgh"
n <- 2; m <- 2; char <- 'd'; chars <- 'cd'
# substring() takes first and last positions (both inclusive), so a length of m ends at start + m - 1
substring(s, n, n + m - 1)
substring(s, n)
substring(s, 1, nchar(s)-1)
indx <- which(strsplit(s, '')[[1]] %in% strsplit(char, '')[[1]])
substring(s, indx, indx + m - 1)
indx <- which(strsplit(s, '')[[1]] %in% strsplit(chars, '')[[1]])[1]
substring(s, indx, indx + m - 1)</lang>
REBOL
<lang REBOL>REBOL [
	Title: "Retrieve Substring"
	Author: oofoe
	Date: 2009-12-06
	URL: http://rosettacode.org/wiki/Retrieve_a_substring
]

s: "abcdefgh"  n: 2  m: 3  char: #"d"  chars: "cd"

; Note that REBOL uses base-1 indexing. Strings are series values,
; just like blocks or lists so I can use the same words to manipulate
; them. All these examples use the 'copy' function against the 's'
; string with a particular offset as needed.

; For the fragment "copy/part  skip s n - 1  m", read from right to
; left.  First you have 'm', which we ignore for now. Then evaluate
; 'n - 1' (makes 1), to adjust the offset. Then 'skip' jumps from the
; start of the string by that offset. 'copy' starts copying from the
; new start position and the '/part' refinement limits the copy by 'm'
; characters.

print ["Starting from n, length m:"  copy/part  skip s n - 1  m]

; It may be helpful to see the expression with optional parenthesis

print ["Starting from n, length m (parens):" (copy/part (skip s (n - 1)) m)]

; This example is much simpler, so hopefully it's easier to see how
; the string start is positioned for the copy

print ["Starting from n to end of string:"  copy skip s n - 1]

print ["Whole string minus last character:"  copy/part s (length? s) - 1]

print ["Starting from known character, length m:"  copy/part find s char  m]

print ["Starting from substring, length m:"  copy/part find s chars  m]</lang>
Output:
Script: "Retrieve Substring" (6-Dec-2009)
Starting from n, length m: bcd
Starting from n, length m (parens): bcd
Starting from n to end of string: bcdefgh
Whole string minus last character: abcdefg
Starting from known character, length m: def
Starting from substring, length m: cde
REXX
<lang rexx>s='abcdefghijk'
n=4; m=3
u=substr(s,n,m)
u=substr(s,n)
u=substr(s,1,length(s)-1)
u=substr(s,pos('def',s),m)
u=substr(s,pos('g',s),m) </lang>
Ruby
<lang ruby>str = 'abcdefgh'
n = 2
m = 3
puts str[n, m]
puts str[n..-1]
puts str[0..-2]
puts str[str.index('d'), m]
puts str[str.index('de'), m]
puts str[/a.*d/]</lang>
Sather
<lang sather>class MAIN is
  main is
    s ::= "hello world shortest program";
    #OUT + s.substring(12, 5) + "\n";
    #OUT + s.substring(6) + "\n";
    #OUT + s.head( s.size - 1) + "\n";
    #OUT + s.substring(s.search('w'), 5) + "\n";
    #OUT + s.substring(s.search("ro"), 3) + "\n";
  end;
end;</lang>
Scala
<lang scala>val str = "The good life is one inspired by love and guided by knowledge."
val n = 21
val m = 16

println(str.slice(n, n+m))
println(str.slice(n, str.length))
println(str.slice(0, str.length-1))
println(str.slice(str.indexOf('l'), str.indexOf('l')+m))
println(str.slice(str.indexOf("good"), str.indexOf("good")+m))</lang>
Scheme
<lang scheme>(define s "Hello, world!")
(define n 5)
(define m (+ n 6))
(display (substring s n m)) (newline)
(display (substring s n)) (newline)
(display (substring s 0 (- (string-length s) 1))) (newline)
(display (substring s (string-index s #\o) m)) (newline)
(display (substring s (string-contains s "lo") m)) (newline)</lang>
Seed7
<lang seed7>$ include "seed7_05.s7i";
const proc: main is func
  local
    const string: stri is "abcdefgh";
    const integer: N is 2;
    const integer: M is 3;
  begin
    writeln(stri[N len M]);
    writeln(stri[N ..]);
    writeln(stri[.. pred(length(stri))]);
    writeln(stri[pos(stri, 'c') len M]);
    writeln(stri[pos(stri, "de") len M]);
  end func;</lang>
Sample output:
bcd
bcdefgh
abcdefg
cde
def
Slate
<lang slate>
- s := 'hello world shortest program'.
- n := 13.
- m := 4.
inform: (s copyFrom: n to: n + m).
inform: (s copyFrom: n).
inform: s allButLast.
inform: (s copyFrom: (s indexOf: $w) to: (s indexOf: $w) + m).
inform: (s copyFrom: (s indexOfSubSeq: 'ro') to: (s indexOfSubSeq: 'ro') + m).
</lang>
Smalltalk
The distinction between searching for a single character and searching for a string within another string is rather blurred. In the following code, instead of using 'w' (a string) we could use $w (a character), but it makes no difference.
<lang smalltalk>|s|
s := 'hello world shortest program'.

(s copyFrom: 13 to: (13+4)) displayNl.
"4 is the length (5) - 1, since we need the index of the
 last char we want, which is included"

(s copyFrom: 7) displayNl.
(s allButLast) displayNl.

(s copyFrom: ((s indexOfRegex: 'w') first)
   to: ( ((s indexOfRegex: 'w') first) + 4) ) displayNl.
(s copyFrom: ((s indexOfRegex: 'ro') first)
   to: ( ((s indexOfRegex: 'ro') first) + 2) ) displayNl.</lang>
These last two examples in particular seem rather complex, so we can extend the string class.
<lang smalltalk>String extend [
  copyFrom: index length: nChar [
    ^ self copyFrom: index to: ( index + nChar - 1 )
  ]
  copyFromRegex: regEx length: nChar [
    |i|
    i := self indexOfRegex: regEx.
    ^ self copyFrom: (i first) length: nChar
  ]
].

"and show it simpler..."

(s copyFrom: 13 length: 5) displayNl.
(s copyFromRegex: 'w' length: 5) displayNl.
(s copyFromRegex: 'ro' length: 3) displayNl.</lang>
SNOBOL4
<lang snobol>	string = "abcdefghijklmnopqrstuvwxyz"
	n = 12
	m = 5
	known_char = "q"
	known_str = "pq"
* starting from n characters in and of m length;
	string len(n - 1) len(m) . output
* starting from n characters in, up to the end of the string;
	string len(n - 1) rem . output
* whole string minus last character;
	string rtab(1) . output
* starting from a known character within the string and of m length;
	string break(known_char) len(m) . output
* starting from a known substring <= m within the string and of m length.
	string (known_str len(m - size(known_str))) . output
end</lang>
Output:
lmnop
lmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxy
qrstu
pqrst
Tcl
<lang tcl>set str "abcdefgh"
set n 2
set m 3

puts [string range $str $n [expr {$n+$m-1}]]
puts [string range $str $n end]
puts [string range $str 0 end-1]

# Because Tcl does substrings with a pair of indices, it is easier to express
# the last two parts of the task as a chained pair of [string range] operations.
# A maximally efficient solution would calculate the indices in full first.
# [string range] takes both a first and a last index, so the outer call starts at 0.
puts [string range [string range $str [string first "d" $str] end] 0 [expr {$m-1}]]
puts [string range [string range $str [string first "de" $str] end] 0 [expr {$m-1}]]

# From Tcl 8.5 onwards, these can be contracted somewhat.
puts [string range [string range $str [string first "d" $str] end] 0 $m-1]
puts [string range [string range $str [string first "de" $str] end] 0 $m-1]</lang>
Of course, if you were doing 'position-plus-length' a lot, it would be easier to add another subcommand to string, like this:
<lang tcl># Define the substring operation, efficiently
proc ::substring {string start length} {
    string range $string $start [expr {$start + $length - 1}]
}
# Plumb it into the language
set ops [namespace ensemble configure string -map]
dict set ops substr ::substring
namespace ensemble configure string -map $ops

# Now show off by repeating the challenge!
set str "abcdefgh"
set n 2
set m 3

puts [string substr $str $n $m]
puts [string range $str $n end]
puts [string range $str 0 end-1]
puts [string substr $str [string first "d" $str] $m]
puts [string substr $str [string first "de" $str] $m]</lang>
TUSCRIPT
<lang tuscript>
$$ MODE TUSCRIPT
string="abcdefgh", n=4,m=n+2
substring=EXTRACT (string,#n,#m)
PRINT substring
substring=Extract (string,#n,0)
PRINT substring
substring=EXTRACT (string,0,-1)
PRINT substring
n=SEARCH (string,":d:"),m=n+2
substring=EXTRACT (string,#n,#m)
PRINT substring
substring=EXTRACT (string,":{substring}:"|,0)
PRINT substring
</lang>
Output:
de
defgh
abcdefg
de
fgh
UNIX Shell
This example shows how to cut(1) a substring from a string.
<lang bash>#!/bin/sh
str=abcdefghijklmnopqrstuvwxyz
n=12
m=5

printf %s "$str" | cut -c $n-`expr $n + $m - 1`
printf %s "$str" | cut -c $n-
printf '%s\n' "${str%?}"
printf q%s "${str#*q}" | cut -c 1-$m
printf pq%s "${str#*pq}" | cut -c 1-$m</lang>
Output:
$ sh substring.sh
lmnop
lmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxy
qrstu
pqrst
- cut -c counts characters from 1.
- cut(1) runs on each line of standard input, therefore the string must not contain a newline.
- One can use the old style `expr $n + $m - 1` or the new style $((n + m - 1)) to calculate the index.
- cut(1) prints the substring to standard output. To put the substring in a variable, use one of
- var=`printf %s "$str" | cut -c $n-\`expr $n + $m - 1\``
- var=$( printf %s "$str" | cut -c $n-$((n + m - 1)) )
Yorick
<lang yorick>str = "abcdefgh";
n = 2;
m = 3;

// starting from n character in and of m length
write, strpart(str, n:n+m-1);
// starting from n character in, up to the end of the string
write, strpart(str, n:);
// whole string minus last character
write, strpart(str, :-1);
// starting from a known character within the string and of m length
match = strfind("d", str);
write, strpart(str, [match(1), match(1)+m]);
// starting from a known substring within the string and of m length
match = strfind("cd", str);
write, strpart(str, [match(1), match(1)+m]);</lang>