Binary strings

Revision as of 05:36, 16 November 2013

Many languages have powerful and useful (binary-safe) string manipulation functions, while others don't, making it harder for those languages to accomplish some tasks. This task is about creating functions to handle binary strings (strings made of arbitrary bytes, i.e. byte strings according to Wikipedia) for those languages that don't have built-in support for them. If your language of choice does have this built-in support, show a possible alternative implementation for the functions or abilities already provided by the language. In particular, the functions you need to create are:

String creation and destruction (when needed and if there's no garbage collection or similar mechanism)
String assignment
String comparison
String cloning and copying
Check if a string is empty
Append a byte to a string
Extract a substring from a string
Replace every occurrence of a byte (or a string) in a string with another string
Join strings

Possible contexts of use: compression algorithms (like LZW compression), L-systems (manipulation of symbols), and many more.
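The task asks languages that already have byte strings to show "a possible alternative implementation". As an illustration, here is a minimal Go sketch that spells out several of the listed operations by hand on `[]byte` instead of relying on the `bytes` package. The helper names (`equalBytes`, `cloneBytes`, `isEmpty`, `appendByte`, `substr`, `joinBytes`) are hypothetical and chosen only for this sketch. Creation and destruction need no helpers because Go is garbage collected, and replacement is left out to keep the sketch short.

```go
package main

import "fmt"

// equalBytes reports whether a and b hold the same bytes (string comparison).
func equalBytes(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// cloneBytes returns an independent copy of s (cloning and copying).
func cloneBytes(s []byte) []byte {
	c := make([]byte, len(s))
	copy(c, s)
	return c
}

// isEmpty reports whether s holds no bytes; nil and zero-length both count.
func isEmpty(s []byte) bool { return len(s) == 0 }

// appendByte returns s with b added at the end; it may reuse s's storage.
func appendByte(s []byte, b byte) []byte { return append(s, b) }

// substr returns a copy of the half-open range s[from:to].
func substr(s []byte, from, to int) []byte { return cloneBytes(s[from:to]) }

// joinBytes concatenates all parts into one freshly allocated byte string.
func joinBytes(parts ...[]byte) []byte {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	out := make([]byte, 0, n)
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

func main() {
	s := []byte{'a', 0, 'b'} // a byte string containing a NUL byte
	t := appendByte(cloneBytes(s), 0xFF)
	fmt.Println(equalBytes(s, t), isEmpty(nil), substr(t, 1, 3), joinBytes(s, t))
}
```

`substr` copies its result, so later writes to the substring cannot change the original. Returning `s[from:to]` directly would be cheaper, but the result would share storage with `s`.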

Ada

Ada has native support for one-dimensional arrays, which provide all the specified operations; String is just one kind of array. An array of bytes is predefined in Ada as Storage_Array in the package System.Storage_Elements (LRM 13.7.1); its element type Storage_Element stands in for a byte.

<lang Ada>declare

  Data : Storage_Array (1..20); -- Data created

begin

  Data := (others => 0); -- Assign all zeros
  if Data = (1..10 => 0) then -- Compare with 10 zeros
     declare
        Copy : Storage_Array := Data; -- Copy Data
     begin
        if Data'Length = 0 then -- If empty
           ...
        end if;
     end;
  end if;
  ... Data & 1 ...         -- The result is Data with byte 1 appended
  ... Data & (1,2,3,4) ... -- The result is Data with bytes 1,2,3,4 appended
  ... Data (3..5) ...      -- The result is the substring of Data from 3 to 5

end; -- Data destructed</lang>

Storage_Array is the "binary string" used for memory representation. For stream-oriented I/O, Ada provides an alternative "binary string" called Stream_Element_Array (LRM 13.13.1). When dealing with octets, programmers are encouraged to define a data type of their own to ensure that the byte is exactly 8 bits long. For example:

<lang Ada>type Octet is mod 2**8;
for Octet'Size use 8;
type Octet_String is array (Positive range <>) of Octet;</lang>

Alternatively:

<lang Ada>with Interfaces; use Interfaces;
...
type Octet is new Interfaces.Unsigned_8;
type Octet_String is array (Positive range <>) of Octet;</lang>

Note that all of these types provide all of the operations described above.
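For comparison only (this is not Ada): in Go, the built-in `byte` type is an alias for `uint8`. It is therefore always exactly 8 bits, and unsigned arithmetic on it wraps modulo 2^8, much like Ada's `type Octet is mod 2**8`. The names `octet`, `octetString` and `addOctet` below are hypothetical and used only for illustration.

```go
package main

import "fmt"

// octet mirrors Ada's "type Octet is mod 2**8". In Go, byte is an alias for
// uint8: exactly 8 bits, and unsigned arithmetic wraps modulo 256.
type octet = byte

// octetString is a hypothetical byte-string type built from octets.
type octetString []octet

// addOctet adds two octets; overflow wraps around modulo 256.
func addOctet(a, b octet) octet { return a + b }

func main() {
	s := octetString{250, 5, 255}
	fmt.Println(addOctet(s[0], s[2])) // 250 + 255 = 505, and 505 mod 256 = 249
}
```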

ALGOL 68

Translation of: Tcl

Works with: ALGOL 68 version Standard - no extensions to language used

Works with: ALGOL 68G version Any - tested with release mk15-0.8b.fc9.i386

<lang algol68># String creation # STRING a,b,c,d,e,f,g,h,i,j,l,r; a := "hello world"; print((a, new line));

# String destruction (for garbage collection) #

b := (); BEGIN

 LOC STRING lb := "hello earth";  # allocate off the LOC stack  #
 HEAP STRING hb := "hello moon"; # allocate out of the HEAP space #
 ~

END; # local variable "lb" has LOC stack space recovered at END #

# String assignment #

c := "a"+REPR 0+"b"; print (("string length c:", UPB c, new line));# ==> 3 #

# String comparison #

l := "ab"; r := "CD";

BOOL result; FORMAT summary = $""""g""" is "b("","NOT ")"lexicographically "g" """g""""l$ ;

result := l < r OR l LT r; printf((summary, l, result, "less than", r)); result := l <= r OR l LE r # OR l ≤ r #; printf((summary, l, result, "less than or equal to", r)); result := l = r OR l EQ r; printf((summary, l, result, "equal to", r)); result := l /= r OR l NE r # OR l ≠ r #; printf((summary, l, result, "not equal to", r)); result := l >= r OR l GE r # OR l ≥ r #; printf((summary, l, result, "greater than or equal to", r)); result := l > r OR l GT r; printf((summary, l, result, "greater than", r));

# String cloning and copying #

e := f;

# Check if a string is empty #

IF g = "" THEN print(("g is empty", new line)) FI; IF UPB g = 0 THEN print(("g is empty", new line)) FI;

# Append a byte to a string #

h +:= "A";

# Append a string to a string #

h +:= "BCD"; h PLUSAB "EFG";

# Prepend a string to a string - because STRING addition isn't commutative #

"789" +=: h; "456" PLUSTO h; print(("The result of prepends and appends: ", h, new line));

# Extract a substring from a string #

i := h[2:3]; print(("Substring 2:3 of ",h," is ",i, new line));

# Replace every occurrence of a byte (or a string) in a string with another string #

PROC replace = (STRING string, old, new, INT count)STRING: (

 INT pos;
 STRING tail := string, out;
 TO count WHILE string in string(old, pos, tail) DO
   out +:= tail[:pos-1]+new;
   tail := tail[pos+UPB old:]
 OD;
 out+tail

);

j := replace("hello world", "world", "planet", max int); print(("After replace string: ", j, new line));

INT offset = 7;

# Replace a character at an offset in the string #

j[offset] := "P"; print(("After replace 7th character: ", j, new line));

# Replace a substring at an offset in the string #

j[offset:offset+3] := "PlAN"; print(("After replace 7:10th characters: ", j, new line));

# Insert a string before an offset in the string #

j := j[:offset-1]+"INSERTED "+j[offset:]; print(("Insert string before 7th character: ", j, new line));

# Join strings #

a := "hel"; b := "lo w"; c := "orld"; d := a+b+c;

print(("a+b+c is ",d, new line));

# Pack a string into the target CPU's word #

BYTES word := bytes pack(d);

# Extract a CHAR from a CPU word #

print(("7th byte in CPU word is: ", offset ELEM word, new line))</lang> Output:

hello world
string length c:         +3
"ab" is NOT lexicographically less than "CD"
"ab" is NOT lexicographically less than or equal to "CD"
"ab" is NOT lexicographically equal to "CD"
"ab" is lexicographically not equal to "CD"
"ab" is lexicographically greater than or equal to "CD"
"ab" is lexicographically greater than "CD"
g is empty
g is empty
The result of prepends and appends: 456789ABCDEFG
Substring 2:3 of 456789ABCDEFG is 56
After replace string: hello planet
After replace 7th character: hello Planet
After replace 7:10th characters: hello PlANet
Insert string before 7th character: hello INSERTED PlANet
a+b+c is hello world
7th byte in CPU word is: w

AWK

<lang AWK>#!/usr/bin/awk -f

BEGIN {
    # string creation
    a="123\0 abc "; b="456\x09"; c="789";
    printf("abc=<%s><%s><%s>\n",a,b,c);

    # string comparison
    printf("(a==b) is %i\n",a==b)

    # string copying
    A = a; B = b; C = c;
    printf("ABC=<%s><%s><%s>\n",A,B,C);

    # check if string is empty
    if (length(a)==0) { printf("string a is empty\n"); } else { printf("string a is not empty\n"); }

    # append a byte to a string
    a=a"\x40";
    printf("abc=<%s><%s><%s>\n",a,b,c);

    # substring
    e = substr(a,1,6);
    printf("substr(a,1,6)=<%s>\n",e);

    # join strings
    d=a""b""c;
    printf("d=<%s>\n",d);
}</lang>

Output:

abc=<123 abc ><456	><789>
(a==b) is 0
ABC=<123 abc ><456	><789>
string a is not empty
abc=<123 abc @><456	><789>
substr(a,1,6)=<123 a>
d=<123 abc @456	789>

BASIC

ZX Spectrum Basic

<lang basic>10 REM create two strings
20 LET s$ = "Hello"
30 LET t$ = "Bob"
40 REM choose any random character
50 LET c = INT(RND*256)
60 REM add the character to the string
70 LET s$ = s$ + CHR$(c)
80 REM check if the string is empty
90 IF s$ = "" THEN PRINT "String is empty"
100 REM compare two strings
110 IF s$ = t$ THEN PRINT "Strings are the same"
120 REM print characters 2 to 4 of a string (a substring)
130 PRINT s$(2 TO 4)</lang>

BBC BASIC

<lang bbcbasic> A$ = CHR$(0) + CHR$(1) + CHR$(254) + CHR$(255) : REM assignment

     B$ = A$                                        : REM clone / copy
     IF A$ = B$ THEN PRINT "Strings are equal"      : REM comparison
     IF A$ = "" THEN PRINT "String is empty"        : REM Check if empty
     A$ += CHR$(128)                                : REM Append a byte
     S$ = MID$(A$, S%, L%)                          : REM Extract a substring
     C$ = A$ + B$                                   : REM Join strings
     
     REM To replace every occurrence of a byte:
     old$ = CHR$(1)
     new$ = CHR$(5)
     REPEAT
       I% = INSTR(A$, old$)
       IF I% THEN MID$(A$, I%, 1) = new$
     UNTIL I% = 0

</lang>

C

<lang c>#include <stdio.h>

#include <stdlib.h>
#include <string.h>

typedef struct str_t { size_t len, alloc; unsigned char *s; } bstr_t, *bstr;

#define str_len(s) ((s)->len)

bstr str_new(size_t len) { bstr s = malloc(sizeof(bstr_t)); if (len < 8) len = 8; s->alloc = len; s->s = malloc(len); s->len = 0; return s; }

void str_extend(bstr s) { size_t ns = s->alloc * 2; if (ns - s->alloc > 1024) ns = s->alloc + 1024; s->s = realloc(s->s, ns); s->alloc = ns; }

void str_del(bstr s) { free(s->s), free(s); }

int str_cmp(bstr l, bstr r) { int res, len = l->len; if (len > r->len) len = r->len;

if ((res = memcmp(l->s, r->s, len))) return res; if (l->len == r->len) return 0; return l->len > r->len ? 1 : -1; }

bstr str_dup(bstr src) { bstr x = str_new(src->len); memcpy(x->s, src->s, src->len); x->len = src->len; return x; }

bstr str_from_chars(const char *t) { if (!t) return str_new(0); size_t l = strlen(t); bstr x = str_new(l + 1); x->len = l; memcpy(x->s, t, l); return x; }

void str_append(bstr s, unsigned char b) { if (s->len >= s->alloc) str_extend(s); s->s[s->len++] = b; }

bstr str_substr(bstr s, int from, int to) { if (!to) to = s->len; if (from < 0) from += s->len; if (from < 0 || from >= s->len) return 0; if (to < from) to = from + 1; if (to > s->len) to = s->len; bstr x = str_new(to - from); x->len = to - from; memcpy(x->s, s->s + from, x->len); return x; }

bstr str_cat(bstr s, bstr s2) { while (s->alloc < s->len + s2->len) str_extend(s); memcpy(s->s + s->len, s2->s, s2->len); s->len += s2->len; return s; }

void str_swap(bstr a, bstr b) { size_t tz; unsigned char *ts; tz = a->alloc; a->alloc = b->alloc; b->alloc = tz; tz = a->len; a->len = b->len; b->len = tz; ts = a->s; a->s = b->s; b->s = ts; }

bstr str_subst(bstr tgt, bstr pat, bstr repl) { bstr tmp = str_new(0); int i; for (i = 0; i + pat->len <= tgt->len;) { if (memcmp(tgt->s + i, pat->s, pat->len)) { str_append(tmp, tgt->s[i]); i++; } else { str_cat(tmp, repl); i += pat->len; if (!pat->len) str_append(tmp, tgt->s[i++]); } } while (i < tgt->len) str_append(tmp, tgt->s[i++]); str_swap(tmp, tgt); str_del(tmp); return tgt; }

void str_set(bstr dest, bstr src) { while (dest->alloc < src->len) str_extend(dest); memcpy(dest->s, src->s, src->len); dest->len = src->len; }

int main() { bstr s = str_from_chars("aaaaHaaaaaFaaaaHa"); bstr s2 = str_from_chars("___."); bstr s3 = str_from_chars("");

str_subst(s, s3, s2); printf("%.*s\n", (int)s->len, s->s);

str_del(s); str_del(s2); str_del(s3);

return 0; }</lang>

C#

Works with: C sharp version 3.0

<lang csharp>using System;

class Program {

   static void Main()
   {
       //string creation
       var x = "hello world";

       //# mark string for garbage collection
       x = null;

       //# string assignment with a null byte
       x = "ab\0";
       Console.WriteLine(x);
       Console.WriteLine(x.Length); // 3

       //# string comparison
       if (x == "hello")
           Console.WriteLine("equal");
       else
           Console.WriteLine("not equal");

       if (x.CompareTo("bc") < 0)
           Console.WriteLine("x is lexicographically less than 'bc'");

       //# string cloning 
       var c = new char[3];
       x.CopyTo(0, c, 0, 3);
       object objecty = new string(c);
       var y = new string(c);

       Console.WriteLine(x == y);      //same as string.Equals
       Console.WriteLine(x.Equals(y)); //it overrides object.Equals

       Console.WriteLine(x == objecty); //object == compares references, returns false

       //# check if empty
       var empty = "";
       string nullString = null;
       var whitespace = "   ";
       if (nullString == null && empty == string.Empty && 
           string.IsNullOrEmpty(nullString) && string.IsNullOrEmpty(empty) &&
           string.IsNullOrWhiteSpace(nullString) && string.IsNullOrWhiteSpace(empty) &&
           string.IsNullOrWhiteSpace(whitespace))
           Console.WriteLine("Strings are null, empty or whitespace");

       //# append a byte
       x = "helloworld";
       x += (char)83;
       Console.WriteLine(x);

       //# substring
       var slice = x.Substring(5, 5);
       Console.WriteLine(slice);

       //# replace bytes
       var greeting = x.Replace("worldS", "");
       Console.WriteLine(greeting);

       //# join strings
       var join = greeting + " " + slice;
       Console.WriteLine(join);
   }

}</lang>

Common Lisp

String creation (garbage collection will handle its destruction), using a string literal or by coercing a list of characters to a string: <lang lisp> "string" (coerce '(#\s #\t #\r #\i #\n #\g) 'string) </lang>

String assignment <lang lisp> (defvar *string* "string") </lang>

Compare two strings <lang lisp> (equal "string" "string") </lang>

Copy a string <lang lisp> (copy-seq "string") </lang>

Check if a string is empty <lang lisp> (defun string-empty-p (string)
  (zerop (length string)))
</lang>

Append a byte (here a one-character string) to a string <lang lisp> (concatenate 'string "string" "b") </lang>

Extract a substring <lang lisp> (subseq "string" 2 6) ; => "ring" </lang>

String replacement isn't covered by the ANSI standard; it is probably best to use a user-defined replace-all function or the cl-ppcre library.

Joining strings works the same way as appending bytes: use concatenate.

Component Pascal

BlackBox Component Builder <lang oberon2> MODULE NpctBinaryString; IMPORT StdLog,Strings;

PROCEDURE Do*; VAR str: ARRAY 256 OF CHAR; pStr,pAux: POINTER TO ARRAY OF CHAR; b: BYTE; pIni: INTEGER; BEGIN (* String creation, on heap *) NEW(pStr,256); (* Garbage collectable *) NEW(pAux,256);

(* String assignment *) pStr^ := "This is a string on a heap"; pAux^ := "This is a string on a heap"; str := "This is other string";

(* String comparison *) StdLog.String("pStr = str:> ");StdLog.Bool(pStr$ = str$);StdLog.Ln; StdLog.String("pStr = pAux:> ");StdLog.Bool(pStr$ = pAux$);StdLog.Ln;

(* String cloning and copying *) NEW(pAux,LEN(pStr$) + 1);pAux^ := pStr$;

(* Check if a string is empty *) (* version 1 *) pAux^ := ""; StdLog.String("is empty pAux?(1):> ");StdLog.Bool(pAux$ = "");StdLog.Ln; (* version 2 *) pAux[0] := 0X; StdLog.String("is empty pAux?(2):> ");StdLog.Bool(pAux$ = "");StdLog.Ln; (* version 3 *) pAux[0] := 0X; StdLog.String("is empty pAux?(3):> ");StdLog.Bool(pAux[0] = 0X);StdLog.Ln; (* version 4 *) pAux^ := ""; StdLog.String("is empty pAux?(4):> ");StdLog.Bool(pAux[0] = 0X);StdLog.Ln;

(* Append a byte to a string *) NEW(pAux,256);pAux^ := "BBBBBBBBBBBBBBBBBBBBB"; b := 65;pAux[LEN(pAux$)] := CHR(b); StdLog.String("pAux:> ");StdLog.String(pAux);StdLog.Ln;

(* Extract a substring from a string *) Strings.Extract(pStr,0,16,pAux); StdLog.String("pAux:> ");StdLog.String(pAux);StdLog.Ln;

(* Replace every occurrence of a string with another string *) pAux^ := "a"; (* Pattern *) Strings.Find(pStr,pAux,0,pIni); WHILE pIni >= 0 DO Strings.Replace(pStr,pIni,LEN(pAux$),"one"); Strings.Find(pStr,pAux,pIni + 1,pIni); END; StdLog.String("pStr:> ");StdLog.String(pStr);StdLog.Ln;

(* Join strings *) pStr^ := "First string";pAux^ := "Second String"; str := pStr$ + "." + pAux$; StdLog.String("pStr + '.' + pAux:>");StdLog.String(str);StdLog.Ln END Do; END NpctBinaryString. </lang> Execute: ^Q NpctBinaryString.Do
Output:

pStr = str:>  $FALSE
pStr = pAux:>  $TRUE
is empty pAux?(1):>  $TRUE
is empty pAux?(2):>  $TRUE
is empty pAux?(3):>  $TRUE
is empty pAux?(4):>  $TRUE
pAux:> BBBBBBBBBBBBBBBBBBBBBA
pAux:> This is a string
pStr:> This is one string on one heonep
pStr + '.' + pAux:>First string.Second String

D

<lang d>import std.array: empty; import std.string: replace;

void main() {

   // String creation (destruction is usually handled by
   // the garbage collector)
   ubyte[] str1;

   // String assignments
   str1 = cast(ubyte[])"blah";
   // hex string, same as "\x00\xFB\xCD\x32\xFD\x0A"
   // whitespace and newlines are ignored
   str1 = cast(ubyte[])x"00 FBCD 32FD 0A";

   // String comparison
   ubyte[] str2;
   if (str1 == str2) {} // strings equal

   // String cloning and copying
   str2 = str1.dup; // copy entire string or array

   // Check if a string is empty
   if (str1.empty) {} // string empty
   if (str1.length) {} // string not empty
   if (!str1.length) {} // string empty

   // Append a ubyte to a string
   str1 ~= x"0A";
   str1 ~= 'a';

   // Extract a substring from a string
   str1 = cast(ubyte[])"blork";
   // this takes off the first and last bytes and
   // assigns them to the new ubyte string
   // This is just a light slice, no string data copied
   ubyte[] substr = str1[1 .. $-1];

   // Replace every occurrence of a ubyte (or a string)
   // in a string with another string
   str1 = cast(ubyte[])"blah";
   str1 = cast(ubyte[])replace(cast(char[])str1, "la", "al");

   // Join strings
   ubyte[] str3 = str1 ~ str2;

}</lang>

E

(Since the task is not a specific program, the code here consists of example REPL sessions, not a whole program.)

In E, binary data is represented as ELists (implemented as arrays or ropes) of integers; a String is strictly a character string. ELists come in Flex (mutable) and Const (immutable) varieties.
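The Flex/Const split has a rough analogue in Go. This sketch illustrates the idea and is not a translation of E: Go's `string` is an immutable byte sequence and `[]byte` a mutable one, and converting between them copies the bytes. The helper names `diverge` and `snapshot` are hypothetical, borrowed from E's method names.

```go
package main

import "fmt"

// diverge returns a mutable copy of an immutable byte string.
func diverge(s string) []byte { return []byte(s) }

// snapshot returns an immutable copy of a mutable byte string.
func snapshot(b []byte) string { return string(b) }

func main() {
	frozen := "\x01\x02\x03"
	flex := diverge(frozen)
	flex[0] = 0x7F // changes only the mutable copy
	again := snapshot(flex)
	flex[1] = 0 // does not affect the earlier snapshot
	fmt.Printf("%q %q %v\n", frozen, again, flex)
}
```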

To work with binary strings we must first have a byte type; this is a place where E shows its Java roots (to be fixed).

value: int8</lang>

There are several ways to create a FlexList; perhaps the simplest is: <lang e>? def bstr := [].diverge(int8)
# value: [].diverge()
? def bstr1 := [1,2,3].diverge(int8)
# value: [1, 2, 3].diverge()
? def bstr2 := [-0x7F,0x2,0x3].diverge(int8)
# value: [-127, 2, 3].diverge()</lang>
As E is a memory-safe, garbage-collected language, there is no explicit destruction. It is good practice, however, to work with immutable ConstLists when reasonable, especially when passing strings around.
There is no specific assignment between FlexLists; a reference may be passed in the usual manner, or the contents of one could be copied to another as shown below.
There is no comparison operation between FlexLists (since it would not be a stable ordering), but there is one between ConstLists. <lang e>? bstr1.snapshot() < bstr2.snapshot()
# value: false</lang>
To make an independent copy of a FlexList, simply .diverge() it again.
To check whether a list is empty, test its size: <lang e>? bstr1.size().isZero()
# value: false
? bstr.size().isZero()
# value: true</lang>
Appending a single element to a FlexList is done by .push(x): <lang e>? bstr.push(0)
? bstr
# value: [0].diverge()</lang>
Substrings, or runs, are always immutable and specified as start-end indexes (as opposed to first-last or start-count). Alternatively, one can copy an arbitrary portion of one list into another using replace(target range, source list, source range). <lang e>? bstr1(1, 2)
# value: [2]
? bstr.replace(0, bstr.size(), bstr2, 1, 3)
? bstr
# value: [2, 3].diverge()</lang>
Replacing must be written as an explicit loop; there is no built-in operation (though there is one for character strings). <lang e>? for i => byte ? (byte == 2) in bstr2 { bstr2[i] := -1 }
? bstr2
# value: [-127, -1, 3].diverge()</lang>
Two lists can be concatenated into a ConstList with +: bstr1 + bstr2. append appends to the end of a FlexList, and replace can be used to insert at the beginning or anywhere inside. <lang e>? bstr1.append(bstr2)
? bstr1
# value: [1, 2, 3, -127, 2, 3].diverge()</lang>

Erlang

<lang erlang>-module(binary_string).
-compile([export_all]).

%% Erlang has very easy handling of binary strings. Using
%% binary/bitstring syntax the various task features will be
%% demonstrated.

%% Erlang has GC so destruction is not shown.
test() ->
    Binary = <<0,1,1,2,3,5,8,13>>, % binaries can be created directly
    io:format("Creation: ~p~n",[Binary]),
    Copy = binary:copy(Binary), % They can also be copied
    io:format("Copy: ~p~n",[Copy]),
    Compared = Binary =:= Copy, % They can be compared directly
    io:format("Equal: ~p = ~p ? ~p~n",[Binary,Copy,Compared]),
    Empty1 = size(Binary) =:= 0, % The empty binary would have size 0
    io:format("Empty: ~p ? ~p~n",[Binary,Empty1]),
    Empty2 = size(<<>>) =:= 0, % The empty binary would have size 0
    io:format("Empty: ~p ? ~p~n",[<<>>,Empty2]),
    Substring = binary:part(Binary,3,3),
    io:format("Substring: ~p [~b..~b] => ~p~n",[Binary,3,5,Substring]),
    Replace = binary:replace(Binary,[<<1>>],<<42>>,[global]),
    io:format("Replacement: ~p~n",[Replace]),
    Append = <<Binary/binary,21>>,
    io:format("Append: ~p~n",[Append]),
    Join = <<Binary/binary,<<21,34,55>>/binary>>,
    io:format("Join: ~p~n",[Join]).

%% Since the task also asks that we show how these can be reproduced
%% rather than just using BIFs, the following are some example
%% recursive functions reimplementing some of the above.

%% Empty string
is_empty(<<>>) ->
    true;
is_empty(_) ->
    false.

%% Replacement
replace(Binary,Value,Replacement) ->
    replace(Binary,Value,Replacement,<<>>).

replace(<<>>,_,_,Acc) ->
    Acc;
replace(<<Value,Rest/binary>>,Value,Replacement,Acc) ->
    replace(Rest,Value,Replacement,<< Acc/binary, Replacement >>);
replace(<<Keep,Rest/binary>>,Value,Replacement,Acc) ->
    replace(Rest,Value,Replacement,<< Acc/binary, Keep >>).</lang>

Output:

<lang erlang>215> binary_string:test().
Creation: <<0,1,1,2,3,5,8,13>>
Copy: <<0,1,1,2,3,5,8,13>>
Equal: <<0,1,1,2,3,5,8,13>> = <<0,1,1,2,3,5,8,13>> ? true
Empty: <<0,1,1,2,3,5,8,13>> ? false
Empty: <<>> ? true
Substring: <<0,1,1,2,3,5,8,13>> [3..5] => <<2,3,5>>
Replacement: <<0,42,42,2,3,5,8,13>>
Append: <<0,1,1,2,3,5,8,13,21>>
Join: <<0,1,1,2,3,5,8,13,21,34,55>></lang>

Factor

Factor has a byte-array type which works exactly like other arrays, except only bytes can be stored in it. Comparisons on byte-arrays (like comparisons on arrays) are lexicographic.

To convert a string to a byte-array: <lang factor>"Hello, byte-array!" utf8 encode .</lang>

B{
    72 101 108 108 111 44 32 98 121 116 101 45 97 114 114 97 121 33
}

Reverse: <lang factor>B{ 147 250 150 123 } shift-jis decode .</lang>

"日本"

Forth

Counted strings are often used to store a string in memory. <lang forth>create cstr1 ," A sample string"
create cstr2 ," another string"
create buf 256 allot

cstr1 count buf place
s" and " buf +place
cstr2 count buf +place
buf count type  \ A sample string and another string</lang>
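This hypothetical Go model makes the layout of a counted string explicit. `countedString`, `place`, `plusPlace` and `count` are illustrative names, not a real library. The first byte holds the length, so a counted string can carry at most 255 bytes of text; that is why the sketch reports an error beyond that limit.

```go
package main

import (
	"errors"
	"fmt"
)

// countedString models a Forth counted string: byte 0 holds the length
// (0..255) and the text follows it.
type countedString []byte

// place stores src as a counted string, like Forth's PLACE.
func place(src []byte) (countedString, error) {
	if len(src) > 255 {
		return nil, errors.New("too long for a counted string")
	}
	cs := make(countedString, 1, 1+len(src))
	cs[0] = byte(len(src))
	return append(cs, src...), nil
}

// plusPlace appends src to cs and updates the length byte, like +PLACE.
func plusPlace(cs countedString, src []byte) (countedString, error) {
	n := int(cs[0]) + len(src)
	if n > 255 {
		return cs, errors.New("too long for a counted string")
	}
	cs = append(cs, src...)
	cs[0] = byte(n)
	return cs, nil
}

// count returns the text part, like Forth's COUNT ( addr -- addr+1 len ).
func (cs countedString) count() []byte { return cs[1 : 1+int(cs[0])] }

func main() {
	buf, _ := place([]byte("A sample string"))
	buf, _ = plusPlace(buf, []byte(" and "))
	buf, _ = plusPlace(buf, []byte("another string"))
	fmt.Println(string(buf.count()))
}
```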

All strings are binary strings, represented with a base address and a byte count. Most string functions operate on these address-length pairs.

Built-in string/memory functions:

COMPARE compares two strings
MOVE copies a string to another location
CMOVE and CMOVE> can copy chunks of bytes around within a string, either down or up.

Substrings are represented by a different pointer and count within a string.
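Go slices follow the same (pointer, count) idea, so a substring obtained by slicing shares storage with the original. The sketch below uses hypothetical helpers `view` and `detach`. It shows that a write through the view is visible in the original, while a copied substring is independent.

```go
package main

import "fmt"

// view returns s[from:to], sharing storage with s (like an addr/len pair).
func view(s []byte, from, to int) []byte { return s[from:to] }

// detach returns a copy of s[from:to] that no longer shares storage with s.
func detach(s []byte, from, to int) []byte { return append([]byte(nil), s[from:to]...) }

func main() {
	s := []byte("A sample string")
	v := view(s, 2, 8) // "sample", sharing s's storage
	v[0] = 'S'         // the write is visible through s
	fmt.Println(string(s))
	d := detach(s, 2, 8) // "Sample", in its own storage
	d[0] = 'x'           // s is unaffected
	fmt.Println(string(s), string(d))
}
```

One pitfall of sharing: appending to a view can overwrite bytes of the original, because the view keeps the rest of the underlying array as spare capacity. A full slice expression such as `s[from:to:to]` caps the capacity, so the next append copies instead.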

Other functions may be defined. <lang forth>: empty? ( str len -- ? ) nip 0= ;

: +c ( c str len -- ) + c! ;

: replace-bytes ( from to str len -- )

 bounds do
   over i c@ = if dup i c! then
 loop 2drop ;

</lang>

Go

<lang go>package main

import (

   "bytes"
   "fmt"

)

// Strings in Go allow arbitrary bytes. They are implemented basically as
// immutable byte slices and syntactic sugar. This program shows functions
// required by the task on byte slices, thus it mostly highlights what
// happens behind the syntactic sugar. The program does not attempt to
// reproduce the immutability property of strings, as that does not seem
// to be the intent of the task.

func main() {

   // Task point: String creation and destruction.
   // Strings are most often constructed from literals as in s := "binary"
   // With byte slices,
   b := []byte{'b', 'i', 'n', 'a', 'r', 'y'}
   fmt.Println(b) // output shows numeric form of bytes.
   // Go is garbage collected.  There are no destruction operations.

   // Task point: String assignment.
   // t = s assigns strings.  Since strings are immutable, it is irrelevant
   // whether the string is copied or not.
   // With byte slices, the same works,
   var c []byte
   c = b
   fmt.Println(c)

   // Task point: String comparison.
   // operators <, <=, ==, >=, and > work directly on strings comparing them
   // by lexicographic order.
   // With byte slices, there are standard library functions, bytes.Equal
   // and bytes.Compare.
   fmt.Println(bytes.Equal(b, c)) // prints true

   // Task point: String cloning and copying.
   // The immutable property of Go strings makes cloning and copying
   // meaningless for strings.
   // With byte slices though, it is relevant.  The assignment c = b shown
   // above does a reference copy, leaving both c and b based on the same
   // underlying data.  To clone or copy the underlying data,
   d := make([]byte, len(b)) // allocate new space
   copy(d, b)                // copy the data
   // The data can be manipulated independently now:
   d[1] = 'a'
   d[4] = 'n'
   fmt.Println(string(b)) // convert to string for readable output
   fmt.Println(string(d))

   // Task point: Check if a string is empty.
   // Most typical for strings is s == "", but len(s) == 0 works too.
   // For byte slices, "" does not work, len(b) == 0 is correct.
   fmt.Println(len(b) == 0)

   // Task point: Append a byte to a string.
   // The language does not provide a way to do this directly with strings.
   // Instead, the byte must be converted to a one-byte string first, as in,
   // s += string('z')
   // For byte slices, the language provides the append function,
   z := append(b, 'z')
   fmt.Printf("%s\n", z) // another way to get readable output

   // Task point: Extract a substring from a string.
   // Slicing syntax is the same for both strings and slices.
   sub := b[1:3]
   fmt.Println(string(sub))

   // Task point: Replace every occurrence of a byte (or a string)
   // in a string with another string.
   // Go supports this with similar library functions for strings and
   // byte slices.  Strings:  t = strings.Replace(s, "n", "m", -1).
   // The byte slice equivalent returns a modified copy, leaving the
   // original byte slice untouched,
   f := bytes.Replace(d, []byte{'n'}, []byte{'m'}, -1)
   fmt.Printf("%s -> %s\n", d, f)

   // Task point: Join strings.
   // Using slicing syntax again, with strings,
   // rem := s[:1] + s[3:] leaves rem == "bary".
   // Only the concatenation of the parts is different with byte slices,
   rem := append(append([]byte{}, b[:1]...), b[3:]...)
   fmt.Println(string(rem))

}</lang>

Output:

[98 105 110 97 114 121]
[98 105 110 97 114 121]
true
binary
banany
false
binaryz
in
banany -> bamamy
bary

Haskell

Note that any of the following functions can be assigned to 'variables' in a working program or could just as easily be written as one-off expressions. They are given here as they are to elucidate the workings of Haskell's type system; hopefully the type declarations will help beginners understand what's going on. Also note that there are likely more concise ways to express many of the functions below. However, I have opted for clarity here, as Haskell can be somewhat intimidating to the (currently) non-functional programmer. <lang haskell>import Text.Regex
{- The above import is needed only for the last function. It is used there
   purely for readability and conciseness -}

{- Assigning a string to a 'variable'. We're being explicit about it just for
   show. Haskell would be able to figure out the type of "world" -}
string = "world" :: String</lang>

<lang haskell>{- Comparing two given strings and returning a boolean result using a
   simple conditional -}
strCompare :: String -> String -> Bool
strCompare x y =

   if x == y
       then True
       else False</lang>

<lang haskell>{- As strings are equivalent to lists of characters in Haskell, test and
   see if the given string is an empty list -}
strIsEmpty :: String -> Bool
strIsEmpty x =

   if x == []
       then True
       else False</lang>

<lang haskell>{- This is the most obvious way to append strings, using the built-in (++)
   concatenation operator. Note the same would work to join any two strings
   (as 'variables' or as typed strings) -}
strAppend :: String -> String -> String
strAppend x y = x ++ y</lang>

<lang haskell>{- Take the specified number of characters from the given string -}
strExtract :: Int -> String -> String
strExtract x s = take x s</lang>

<lang haskell>{- Take a certain substring, specified by two integers (zero-based,
   inclusive), from the given string -}
strPull :: Int -> Int -> String -> String
strPull x y s = take (y-x+1) (drop x s)</lang>

<lang haskell>{- Much thanks to brool.com for this nice and elegant solution. Using an
   imported library (Text.Regex, from the regex-compat package), replace a
   given substring with another. Note that old is treated as a regular
   expression, so special characters in it must be escaped -}
strReplace :: String -> String -> String -> String
strReplace old new orig = subRegex (mkRegex old) orig new</lang>

Icon and Unicon

Icon and Unicon strings are variable length and unrestricted. See Logical Operations for ways to manipulate strings at the bit level. <lang Icon>s := "\x00"           # strings can contain any value, even nulls
s := "abc"            # create a string
s := &null            # destroy a string (well, abandon it for garbage collection)
v := s                # assignment
s == t                # expression s equals t
s << t                # expression s less than t
s <<= t               # expression s less than or equal to t
v := s                # strings are immutable, no copying or cloning are needed
s == ""               # equal empty string
*s = 0                # string length is zero
s ||:= "a"            # append a byte "a" to s via concatenation
t := s[2+:3]          # t is set to position 2 for 3 characters
s := replace(s,s2,s3) # IPL replace function
s := s1 || s2         # concatenation (joining) of strings</lang>

The Icon Programming Library provides the procedure replace in strings:

<lang Icon>procedure replace(s1, s2, s3) #: string replacement

  local result, i

  result := ""
  i := *s2
  if i = 0 then fail			# would loop on empty string

  s1 ? {
     while result ||:= tab(find(s2)) do {
        result ||:= s3
        move(i)
        }
     return result || tab(0)
     }

end</lang>

J

J's literal data type supports arbitrary binary data (strings are binary strings by default). J's semantics are pass by value (with garbage collection) with a minor exception (mapped files).

Example binary string creation

<lang j>   name=: ''</lang>

Example binary string deletion (removing all references to a string allows it to be deleted, in this case we give the name a numeric value to replace its prior string value):

Example binary string assignment

<lang j> name=: 'value'</lang>

Example binary string comparison

Example binary string cloning and copying

<lang j>   name1=: 'example'

  name2=: name1</lang>

Technically, though, it's the internal reference that is cloned, not the internal representation of the value. But operations which modify strings are copy-on-write, so this distinction is not visible without going outside the language.

Example check if a binary string is empty

<lang j> 0=#string</lang>

Example append a byte to a binary string

<lang j> string=: 'example'

  byte=: DEL
  string=: string,byte</lang>

Extract a substring from a binary string

<lang j> 3{.5}.'The quick brown fox runs...'</lang>

Replace every occurrence of a byte (or a string) in a string with another string

<lang j>require 'strings'
'The quick brown fox runs...' rplc ' ';' !!! '</lang>

Join strings

<lang j> 'string1','string2'</lang>

Note also: given an integer n, the corresponding byte value may be obtained by indexing into a., which is the ordered array of all bytes: <lang j> n{a.</lang>

Thus, the binary string containing bytes with numeric values 1 0 255 can be obtained this way: <lang j>1 0 255{a.</lang>

JavaScript

JavaScript has native support for binary strings: all strings are "binary" and they are not zero-terminated. Strictly speaking, however, a string is a sequence of 16-bit code units (values 0 to FFFF), not bytes, so the individual bytes are not directly visible. <lang JavaScript>//String creation
var str='';
//or
str2=new String();

//String assignment
str="Hello";
//or
str2=', Hey there'; //can use " or '
str=str+str2;//concatenates

//string deletion
delete str2;//returns true or false; true when it has been successfully deleted.
//It does not work on variables declared with the keyword 'var'. You don't have to declare variables with 'var'
//in non-strict JavaScript, but when you do, you cannot 'delete' them. In any case JavaScript is garbage collected,
//so a variable declared inside a function is reclaimed after the function returns and nothing refers to it.

//String comparison
str!=="Hello"; //!== is "not equal" -> returns true; there's also !=
str=="Hello, Hey there"; //returns true
//compares 'byte' by 'byte'
"Character Z">"Character A"; //returns true. With < or > the strings are compared character by character (by code unit)
//until the first difference; here 'Z' has char code 90 (decimal) and 'A' has 65, so the first string is "larger".

//String cloning and copying
string=str;//Strings are immutable, so plain assignment is enough and no separate clone is needed.
//If two variables must share one changeable "string", store it in an object and get/set it through that object.

//Check if a string is empty
Boolean(''); //returns false
''[0]; //returns undefined
''.charCodeAt(); //returns NaN
''==0; //returns true
''===0; //returns false
''==false; //returns true

//Append byte to String
str+="\x40";//using + before the equal sign on a string makes it equal to str=str+"\x40"

//Extract a substring from a string
//str is "Hello, Hey there@"
str.substr(3); //returns "lo, Hey there@"
str.substr(-5); //returns "here@"; a negative start counts back from the end
str.substr(7,9); //returns "Hey there": 9 characters starting at index 7
str.substring(3); //same as substr(3)
str.substring(-5); //negative values are treated as 0 by substring, so this returns the whole string
str.substring(7,9); //returns "He", whatever is between index 7 and index 9 (same as substring(9,7))

//Replace every occurrence of a byte with another string
str3="url,url,url,url,url";
str3.replace(/,/g,'\n') //regex; returns the same string with every , replaced by \n
str4=str3.replace(/./g,function(match){//a callback function can also be used; it is called for each match
    return match==','?'\n':match;//returns the replacement
});

//Join Strings
[str," ",str3].join(" "/*this is the character that will glue the strings*/)//we can join an array of strings
str3+str4;
str.concat('\n',str4); //concatenate them</lang>

Liberty BASIC

Liberty BASIC's strings are native byte strings. They can contain any byte sequence. They are not zero-terminated. They can be huge in size. <lang lb>
'string creation
s$ = "Hello, world!"

'string destruction - not needed because of garbage collection
s$ = ""

'string comparison
s$ = "Hello, world!"
If s$ = "Hello, world!" then print "Equal Strings"

'string copying
a$ = s$

'check If empty
If s$ = "" then print "Empty String"

'append a byte
s$ = s$ + Chr$(33)

'extract a substring
a$ = Mid$(s$, 1, 5)

'replace bytes
a$ = "Hello, world!"
for i = 1 to len(a$)
    if mid$(a$,i,1)="l" then
        a$=left$(a$,i-1)+"L"+mid$(a$,i+1)
    end if
next
print a$

'join strings
s$ = "Good" + "bye" + " for now."
</lang>

Lua

<lang lua>foo = 'foo'             -- foo holds the string 'foo'
bar = 'bar'
assert (foo == "foo")   -- Comparing string var to string literal
assert (foo ~= bar)
str = foo               -- Copy foo contents to str
if #str == 0 then       -- # operator returns string length
   print 'str is empty'
end
str = str .. string.char(50) -- Char concatenated with .. operator
substr = str:sub(1,3)   -- Extract substring from index 1 to 3, inclusively

str = "string string string string"
-- This function will replace all occurrences of 'replaced' in a string with 'replacement'
function replaceAll(str, replaced, replacement)
   local init = 1
   local a, b = str:find(replaced, init, true)   -- plain search, no Lua patterns
   while a do
      str = str:sub(1, a-1) .. replacement .. str:sub(b+1)
      init = a + #replacement                    -- resume after the inserted text
      a, b = str:find(replaced, init, true)
   end
   return str
end
str = replaceAll (str, 'ing', 'ong')
print (str)

str = foo .. bar -- Strings concatenate with .. operator</lang>

Mathematica

<lang Mathematica>(* String creation and destruction *)
BinaryString = {};
BinaryString =. ;
(* String assignment *)
BinaryString1 = {12,56,82,65}                (* -> {12,56,82,65} *)
BinaryString2 = {83,12,56,65}                (* -> {83,12,56,65} *)
(* String comparison *)
BinaryString1 === BinaryString2              (* -> False *)
(* String cloning and copying *)
BinaryString3 = BinaryString1                (* -> {12,56,82,65} *)
(* Check if a string is empty *)
BinaryString3 === {}                         (* -> False *)
(* Append a byte to a string *)
AppendTo[BinaryString3, 22]                  (* -> {12,56,82,65,22} *)
(* Extract a substring from a string *)
Take[BinaryString3, {2, 5}]                  (* -> {56,82,65,22} *)
(* Replace every occurrence of a byte (or a string) in a string with another string *)
BinaryString3 /. {22 -> Sequence[33, 44]}    (* -> {12,56,82,65,33,44} *)
(* Join strings *)
BinaryString4 = Join[BinaryString1, BinaryString2]   (* -> {12,56,82,65,83,12,56,65} *)</lang>

MATLAB / Octave

<lang Matlab>  a=['123',0,' abc '];
  b=['456',9];
  c='789';
  disp(a);
  disp(b);
  disp(c);

  % string comparison
  fprintf('(a==b) is %i\n',strcmp(a,b));

  % string copying
  A = a;
  B = b;
  C = c;
  disp(A);
  disp(B);
  disp(C);

  % check if string is empty
  if (length(a)==0)
      fprintf('\nstring a is empty\n');
  else
      fprintf('\nstring a is not empty\n');
  end

  % append a byte to a string
  a=[a,64];
  disp(a);

  % substring
  e = a(1:6);
  disp(e);

  % join strings
  d=[a,b,c];
  disp(d);</lang>
Output:

123 abc 
456	
789
(a==b) is 0
123 abc 
456	
789

string a is not empty
123 abc @
123 a
123 abc @456	789

OCaml

String creation and destruction

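String.create n returns a fresh string of length n, which initially contains arbitrary characters (since OCaml 4.02 this function is deprecated in favour of Bytes.create, which returns a mutable byte sequence): <lang ocaml># String.create 10 ;;
- : string = "\000\023\000\000\001\000\000\000\000\000"</lang>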

No destruction is needed; OCaml has a garbage collector.

OCaml strings can contain any of the 256 possible bytes, including the null character '\000'.

String assignment

<lang ocaml># let str = "some text" ;;
val str : string = "some text"

(* modifying a character; this needs mutable strings, i.e. OCaml before 4.06 or -unsafe-string *)
# str.[0] <- 'S' ;;
- : unit = ()</lang>

String comparison

<lang ocaml># str = "Some text" ;;
- : bool = true
# "Hello" > "Ciao" ;;
- : bool = true</lang>

String cloning and copying

<lang ocaml># String.copy str ;;
- : string = "Some text"</lang>

Check if a string is empty

<lang ocaml># let string_is_empty s = (s = "") ;;
val string_is_empty : string -> bool = <fun>
# string_is_empty str ;;
- : bool = false
# string_is_empty "" ;;
- : bool = true</lang>

Append a byte to a string

It is not possible to append a byte to a string in the sense of modifying the length of a given string, but we can use the concatenation operator to append a byte and return the result as a new string:

<lang ocaml># str ^ "!" ;;
- : string = "Some text!"</lang>

But OCaml has a module named Buffer for string buffers. This module implements string buffers that automatically expand as necessary. It provides accumulative concatenation of strings in quasi-linear time (instead of quadratic time when strings are concatenated pairwise).

<lang ocaml>let buf = Buffer.create 16 in   (* a buffer that grows as needed *)
Buffer.add_char buf c</lang>

Extract a substring from a string

<lang ocaml># String.sub str 5 4 ;;
- : string = "text"</lang>

Replace every occurrence of a byte (or a string) in a string with another string

using the Str module: <lang ocaml># #load "str.cma";;
# let replace str occ by =
    Str.global_replace (Str.regexp_string occ) by str
  ;;
val replace : string -> string -> string -> string = <fun>
# replace "The white dog let out a single, loud bark." "white" "black" ;;
- : string = "The black dog let out a single, loud bark."</lang>

Join strings

<lang ocaml># "Now just remind me" ^ " how the horse moves again?" ;;
- : string = "Now just remind me how the horse moves again?"</lang>

Pascal

Pascal's original strings were limited to 255 characters. Most implementations stored the string length in byte 0. Extensions exist for longer strings as well as for C-compatible null-terminated strings. See the examples below: <lang pascal>uses
 StrUtils;  (* for AnsiReplaceStr *)

const

 greeting = 'Hello';

var

 s1: string;
 s2: ansistring;
 s3: pchar;

begin { Assignments }

 s1 := 'Mister Presiden';  (* typo is on purpose. See below! *)

{ Comparisons }

 if s1 > 'a' then
   writeln ('The first letter of ', s1, ' is later than a');

{ Cloning and copying }

 s2 := greeting;

{ Check if a string is empty }

 if s1 = '' then
   writeln('This string is empty!');

{ Append a byte to a string }

 s1 := s1 + 't';

{ Extract a substring from a string }

 s3 := copy(S2, 2, 4);  (* s3 receives ello *)

{ String replacement } (* the unit StrUtils of the FreePascal rtl has AnsiReplaceStr *)

 s1 := AnsiReplaceStr('Thees ees a text weeth typos', 'ee', 'i');

{ Join strings}

 s3 := greeting + ' and how are you, ' + s1 + '?';

end.</lang>

Perl 6

<lang perl6># Perl 6 is perfectly fine with NUL *characters* in strings:
my Str $s = 'nema' ~ 0.chr ~ 'problema!';
say $s;

# However, Perl 6 makes a clear distinction between strings
# (i.e. sequences of characters), like your name, or …
my Str $str = "My God, it's full of chars!";

# … and sequences of bytes (called Bufs), for example a PNG image, or …
my Buf $buf = Buf.new(255, 0, 1, 2, 3);
say $buf;

# Strs can be encoded into Bufs …
my Buf $this = 'foo'.encode('ascii');

# … and Bufs can be decoded into Strs …
my Str $that = $this.decode('ascii');

# So it's all there. Nevertheless, let's solve this task explicitly
# in order to see some nice language features:

# We define a class …
class ByteStr {
    # … that keeps an array of bytes, and we delegate some
    # straight-forward stuff directly to this attribute:
    # (Note: "has byte @.bytes" would be nicer, but that is
    # not yet implemented in rakudo or niecza.)
    has Int @.bytes handles(< Bool elems gist push >);

    # A handful of methods …
    method clone() {
        self.new(:@.bytes);
    }

    method substr(Int $pos, Int $length) {
        self.new(:bytes(@.bytes[$pos .. $pos + $length - 1]));
    }

    method replace(*@substitutions) {
        my %h = @substitutions;
        @.bytes.=map: { %h{$_} // $_ }
    }
}

# A couple of operators for our new type:
multi infix:<cmp>(ByteStr $x, ByteStr $y) { $x.bytes cmp $y.bytes }
multi infix:<~>  (ByteStr $x, ByteStr $y) { ByteStr.new(:bytes($x.bytes, $y.bytes)) }

# create some byte strings (destruction not needed due to garbage collection)
my ByteStr $b0 = ByteStr.new;
my ByteStr $b1 = ByteStr.new(:bytes( 'foo'.ords, 0, 10, 'bar'.ords ));

# assignment ($b1 and $b2 contain the same ByteStr object afterwards):
my ByteStr $b2 = $b1;

# comparing:
say 'b0 cmp b1 = ', $b0 cmp $b1;
say 'b1 cmp b2 = ', $b1 cmp $b2;

# cloning:
my $clone = $b1.clone;
$b1.replace('o'.ord => 0);
say 'b1 = ', $b1;
say 'b2 = ', $b2;
say 'clone = ', $clone;

# to check for (non-)emptiness we evaluate the ByteStr in boolean context:
say 'b0 is ', $b0 ?? 'not empty' !! 'empty';
say 'b1 is ', $b1 ?? 'not empty' !! 'empty';

# appending a byte:
$b1.push: 123;

# extracting a substring:
my $sub = $b1.substr(2, 4);
say 'sub = ', $sub;

# replacing a byte:
$b2.replace(102 => 103);
say $b2;

# joining:
my ByteStr $b3 = $b1 ~ $sub;
say 'joined = ', $b3;</lang>

Output:

Note: The ␀ represents a NUL byte.

nema␀problema!
Buf:0x<ff 00 01 02 03>
b0 cmp b1 = Increase
b1 cmp b2 = Same
b1 = 102 0 0 0 10 98 97 114
b2 = 102 0 0 0 10 98 97 114
clone = 102 111 111 0 10 98 97 114
b0 is empty
b1 is not empty
sub = 0 0 10 98
103 0 0 0 10 98 97 114 123
joined = 103 0 0 0 10 98 97 114 123 0 0 10 98

PicoLisp

Byte strings are represented in PicoLisp as lists of numbers. They can be manipulated easily with the built-in list functionality.

I/O of raw bytes is done via the 'wr' (write) and 'rd' (read) functions. The following creates a file consisting of 256 bytes, with values from 0 to 255: <lang PicoLisp>: (out "rawfile"

  (mapc wr (range 0 255)) )</lang>

Looking at a hex dump of that file: <lang PicoLisp>: (hd "rawfile")
00000000  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F  ................
00000010  10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F  ................
00000020  20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F   !"#$%&'()*+,-./
00000030  30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F  0123456789:;<=>?
...</lang>
To read part of that file, an external tool like 'dd' might be used: <lang PicoLisp>: (in '(dd "skip=32" "bs=1" "count=16" "if=rawfile")

  (make
     (while (rd 1)
        (link @) ) ) )

-> (32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47)</lang>
Now such byte lists can be assigned the normal way ('let', 'setq' etc.), they can be compared with '=', '>', '>=' etc., and manipulated with all internal map-, filter-, concatenation-, reversal-, pattern matching, and other functions.

If desired, a string containing meaningful values can also be converted to a transient symbol, e.g. the example above: <lang PicoLisp>: (pack (mapcar char (32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47)))
-> " !\"#$%&'()*+,-./"</lang>

PL/I

<lang PL/I>/* PL/I has immediate facilities for all those operations except for */
/* replace.                                                            */
s = t;                      /* assignment                              */
s = t || u;                 /* catenation - append one or more bytes.  */
if length(s) = 0 then ...   /* test for an empty string.               */
if s = t then ...           /* compare strings.                        */
u = substr(t, i, j);        /* take a substring of t beginning at the  */
                            /* i-th character and continuing for j     */
                            /* characters.                             */
substr(u, i, j) = t;        /* replace j characters in u, beginning    */
                            /* with the i-th character.                */

/* In string t, replace every occurrence of string u with string v. */
replace: procedure (t, u, v);
   declare (t, u, v) character (*) varying;
   declare (k, p) fixed binary;

   p = 1;                           /* position where the search resumes */
   do until (k = 0);
      k = index(substr(t, p), u);
      if k > 0 then
         do;
            k = k + p - 1;          /* position of the match within t     */
            t = substr(t, 1, k-1) || v || substr(t, k+length(u));
            p = k + length(v);      /* skip the inserted text, so a v that */
         end;                       /* contains u cannot loop forever      */
   end;
end replace;</lang>

PureBasic

<lang PureBasic>;string creation
x$ = "hello world"

;string destruction
x$ = ""

;string comparison
If x$ = "hello world" : PrintN("String is equal") : EndIf

;string copying
y$ = x$

;check If empty
If x$ = "" : PrintN("String is empty") : EndIf

;append a byte
x$ = x$ + Chr(41)

;extract a substring
x$ = Mid(x$, 1, 5)

;replace bytes
x$ = ReplaceString(x$, "world", "earth")

;join strings
x$ = "hel" + "lo w" + "orld"</lang>

Python

2.x

Python 2.x's string type (str) is a native byte string type. They can contain any byte sequence - they're not zero-terminated. There is a separate type for Unicode data (unicode).

String creation

<lang python>s1 = "A 'string' literal \n"
s2 = 'You may use any of \' or " as delimiter'
s3 = """This text
   goes over several lines
       up to the closing triple quote"""</lang>

String assignment

There is nothing special about assignments:

<lang python>s = "Hello "
t = "world!"
u = s + t # + concatenates</lang>

String comparison

They're compared byte by byte, lexicographically:

<lang python>assert "Hello" == 'Hello'
assert '\t' == '\x09'
assert "one" < "two"
assert "two" >= "three"</lang>

String cloning and copying

Strings are immutable, so there is no need to clone/copy them. If you want to modify a string, you must create a new one with the desired contents. (There is another type, array, that provides a mutable buffer)
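A minimal sketch of that difference, assuming the built-in bytearray type (available since Python 2.6 and in 3.x) as the mutable buffer rather than the array module; the code avoids print so it runs unchanged under either version:

```python
# Immutable strings: "copying" only binds a second name to the same object.
s = b"Some text"
t = s
assert t is s                          # same object, and no copy is needed

# A mutable buffer must be copied explicitly to get an independent clone.
buf = bytearray(b"Some text")
clone = bytearray(buf)                 # buf[:] would also make a copy
clone[0] = ord("s")                    # mutate the clone in place
assert buf == bytearray(b"Some text")  # the original is unchanged
assert clone == bytearray(b"some text")
```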

Check if a string is empty

<lang python>if x == '': print "Empty string"
if not x: print "Empty string, provided you know x is a string"</lang>

Append a byte to a string

<lang python>txt = "Some text"
txt += '\x07'
# txt now refers to a new string containing "Some text\x07"</lang>

Extract a substring from a string

Strings are sequences, they can be indexed with s[index] (index is 0-based) and sliced s[start:stop] (all characters from s[start] up to, but not including, s[stop])

<lang python>txt = "Some more text"
assert txt[4] == " "
assert txt[0:4] == "Some"
assert txt[:4] == "Some"       # you can omit the starting index if 0
assert txt[5:9] == "more"
assert txt[5:] == "more text"  # omitting the second index means "to the end"</lang>

Negative indexes count from the end: -1 is the last byte, and so on:

<lang python>txt = "Some more text"
assert txt[-1] == "t"
assert txt[-4:] == "text"</lang>

Replace every occurrence of a byte (or a string) in a string with another string

Strings are objects and have methods, like replace:

<lang python>v1 = "hello world"
v2 = v1.replace("l", "L")
print v2 # prints heLLo worLd</lang>
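Since the task also asks for possible alternative implementations, here is a sketch of a hand-written replacement that scans for each occurrence and splices in the new text, in the spirit of the Icon and PL/I versions on this page. `replace_all` is a hypothetical helper name, not a library function; it works on byte strings (str in 2.x, bytes or bytearray in 3.x):

```python
def replace_all(data, old, new):
    """Hypothetical helper: return a copy of `data` in which every
    non-overlapping occurrence of `old` is replaced by `new`."""
    if not old:
        # An empty pattern would match everywhere; refuse it instead of looping.
        raise ValueError("old must not be empty")
    parts = []
    start = 0
    while True:
        pos = data.find(old, start)     # next match at or after `start`
        if pos < 0:
            break
        parts.append(data[start:pos])   # bytes before the match
        parts.append(new)               # the replacement text
        start = pos + len(old)          # resume after the match
    parts.append(data[start:])          # tail after the last match
    return b"".join(parts)
```

For example, `replace_all(b"hello world", b"l", b"L")` gives the same result as `b"hello world".replace(b"l", b"L")`. Collecting the pieces in a list and joining once avoids the quadratic cost of repeated concatenation, and because the search resumes after the inserted text, a replacement that contains the pattern cannot cause an endless loop.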

Join strings

If they're separate variables, use the + operator:

<lang python>v1 = "hello"
v2 = "world"
msg = v1 + " " + v2</lang>

If the elements to join are contained inside any iterable container (e.g. a list)

<lang python>items = ["Smith", "John", "417 Evergreen Av", "Chimichurri", "481-3172"]
joined = ",".join(items)
print joined

# output:
# Smith,John,417 Evergreen Av,Chimichurri,481-3172</lang>

The reverse operation (split) is also possible:

<lang python>line = "Smith,John,417 Evergreen Av,Chimichurri,481-3172"
fields = line.split(',')
print fields

# output:
# ['Smith', 'John', '417 Evergreen Av', 'Chimichurri', '481-3172']</lang>

3.x

Python 3.x has two binary string types: bytes (immutable) and bytearray (mutable). They can contain any byte sequence. They are completely separate from the string type (str). Most of the string operators and methods also work on bytes and bytearray.
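A short sketch of the practical difference between the two types: a bytes object rejects item assignment, while a bytearray can be changed and grown in place:

```python
frozen = b"abc"
try:
    frozen[0] = 65              # bytes is immutable ...
except TypeError:
    mutated = False             # ... so item assignment raises TypeError
else:
    mutated = True

buf = bytearray(frozen)         # copy into a mutable buffer
buf[0] = 65                     # 65 is the byte value of "A"
buf.extend(b"!")                # grow it in place
assert not mutated
assert buf == b"Abc!"
assert frozen == b"abc"         # the original bytes object is untouched
```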

To specify a literal immutable byte string (bytes), prefix a string literal with "b": <lang python>s1 = b"A 'byte string' literal \n"
s2 = b'You may use any of \' or " as delimiter'
s3 = b"""This text
   goes over several lines
       up to the closing triple quote"""</lang>

You can use the normal string escape sequences to encode special bytes.

Indexing a byte string results in an integer (the value of the byte at that position): <lang python>x = b'abc'
x[0] # evaluates to 97</lang>

Similarly, a byte string can be converted to and from a list of integers:

<lang python>x = b'abc'
list(x)              # evaluates to [97, 98, 99]
bytes([97, 98, 99])  # evaluates to b'abc'</lang>
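Putting it together, here is a sketch of each operation the task asks for in Python 3, using only built-in bytes and bytearray methods; the variable names are illustrative:

```python
# Creation: from a literal, from a list of byte values, or zero-filled.
data = bytes([0, 65, 255])
empty = b""
zeros = bytearray(4)                   # four 0x00 bytes
assert zeros == b"\x00\x00\x00\x00"

# Assignment and comparison (byte by byte, lexicographic).
other = b"\x00A\xff"
assert data == other
assert b"abc" < b"abd"

# Cloning: bytes is immutable and needs none; bytearray is copied explicitly.
clone = bytearray(data)
assert clone == data and clone is not data

# Check if empty.
assert not empty and len(empty) == 0

# Append a byte: in place on a bytearray, by concatenation for bytes.
clone.append(0x40)                     # 0x40 is "@"
extended = data + bytes([0x40])        # builds a new bytes object
assert clone == extended == b"\x00A\xff@"

# Extract a substring (a slice of bytes is again bytes).
assert data[1:3] == b"A\xff"

# Replace every occurrence of a byte sequence.
assert b"a-b-c".replace(b"-", b"::") == b"a::b::c"

# Join byte strings with a separator.
assert b",".join([b"x", b"y", b"z"]) == b"x,y,z"
```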

Racket

<lang racket>#lang racket

;; Byte strings can be created either by a function (b1) or as a literal
;; string (b2). No operation is needed for destruction due to garbage collection.
(define b1 (make-bytes 5 65)) ; b1 -> #"AAAAA"
(define b2 #"BBBBB")          ; b2 -> #"BBBBB"

;; String assignment. Note that b2 cannot be mutated since literal byte
;; strings are immutable.
(bytes-set! b1 0 66) ; b1 -> #"BAAAA"

;; Comparison. Less than & greater than are lexicographic comparison.
(bytes=? b1 b2) ; -> #f
(bytes<? b1 b2) ; -> #t
(bytes>? b1 b2) ; -> #f

;; Byte strings can be cloned by copying to a new one or by overwriting
;; an existing one.
(define b3 (bytes-copy b1)) ; b3 -> #"BAAAA"
(bytes-copy! b1 0 b2)       ; b1 -> #"BBBBB"

;; Byte strings can be appended to one another. A single byte is appended
;; as a length 1 string.
(bytes-append b1 b2)   ; -> #"BBBBBBBBBB"
(bytes-append b3 #"B") ; -> #"BAAAAB"

;; Substring
(subbytes b3 0)   ; -> #"BAAAA"
(subbytes b3 0 2) ; -> #"BA"

;; Regular expressions can be used to do replacements in a byte string
;; (or ordinary strings)
(regexp-replace #"B" b1 #"A")  ; -> #"ABBBB" (only the first one)
(regexp-replace* #"B" b1 #"A") ; -> #"AAAAA"

;; Joining strings
(bytes-join (list b2 b3) #" ") ; -> #"BBBBB BAAAA"</lang>

REXX

<lang REXX>/*REXX program shows ways to use and express binary strings.           */
dingsta='11110101'b               /*4 versions, bit str assignment.*/
dingsta="11110101"b               /*same as above.                 */
dingsta='11110101'B               /*same as above.                 */
dingsta='1111 0101'B              /*same as above.                 */

dingst2=dingsta                   /*clone 1 str to another (copy). */

other='1001 0101 1111 0111'b      /*another binary (bit) string.   */

if dingsta=other then say 'they are equal'       /*compare two strings.*/

if other=='' then say 'OTHER is empty.'          /*see if it's empty.  */
if length(other)==0 then say 'OTHER is empty.'   /*another version.    */

otherA=other || '$'               /*append a dollar sign to OTHER. */
otherB=other'$'                   /*same as above, with less fuss. */

guts=substr(c2b(other),10,3)      /*get the 10th through 12th bits.*/
                                  /*see sub below.   Some REXXes   */
                                  /*have C2B as a built-in function*/

new=changestr('A',other,"Z")      /*change the letter A to Z.      */

tt=changestr('~~',other,";")      /*change 2 tildes to a semicolon.*/

joined=dingsta || dingst2         /*join 2 strs together (concat). */
exit                              /*stick a fork in it, we're done.*/
/*─────────────────────────────────C2B subroutine───────────────────────*/
c2b: return x2b(c2x(arg(1)))      /*return the string as a binary string. */</lang>
Some older REXXes don't have a changestr bif, so one is included here ──► CHANGESTR.REX.

Ruby

A String object holds and manipulates an arbitrary sequence of bytes. There are also the Array#pack and String#unpack methods to convert data to binary strings. <lang ruby># string creation
x = "hello world"

# string destruction
x = nil

# string assignment with a null byte
x = "a\0b"
x.length # ==> 3

# string comparison
if x == "hello"
  puts "equal"
else
  puts "not equal"
end
y = 'bc'
if x < y
  puts "#{x} is lexicographically less than #{y}"
end

# string cloning
xx = x.dup
x == xx       # true, same length and content
x.equal?(xx)  # false, different objects

# check if empty
if x.empty?
  puts "is empty"
end

# append a byte
p x << "\07"

# substring
p xx = x[0..-2]
x[1,2] = "XYZ"
p x

# replace bytes
p y = "hello world".tr("l", "L")

# join strings
a = "hel"
b = "lo w"
c = "orld"
p d = a + b + c</lang>

Run BASIC

<lang runbasic>' Create string
s$ = "Hello, world"

' String destruction
s$ = ""

' String comparison
If s$ = "Hello, world" then print "Equal String"

' Copying string
a$ = s$

' Check If empty
If s$ = "" then print "String is MT"

' Append a byte
s$ = s$ + Chr$(65)

' Extract a substring
a$ = Mid$(s$, 1, 5) ' bytes 1 -> 5

'substitute string "world" with "universe"
a$ = "Hello, world"
for i = 1 to len(a$)
    if mid$(a$,i,5)="world" then
        a$=left$(a$,i-1)+"universe"+mid$(a$,i+5)
    end if
next
print a$

'join strings
s$ = "See " + "you " + "later."
print s$</lang>

Seed7

Seed7 strings are capable of holding binary data. The memory of Seed7 strings is managed automatically. String declaration:

var string: stri is "asdf";   # variable declaration
const string: stri is "jkl";  # constant declaration

String assignment

stri := "blah";

String comparison

stri1 =  stri2         # equal
stri1 <> stri2         # not equal
stri1 <  stri2         # less than
stri1 <= stri2         # less than or equal
stri1 >  stri2         # greater than
stri1 >= stri2         # greater than or equal
compare(stri1, stri2)  # return -1, 0 or 1, depending on the comparison.

String copying (same as assignment)

stri2 := stri1;

Check if a string is empty

stri = ""         # compare with ""
length(stri) = 0  # check length

Append a byte to a string

stri &:= 'a';

Extract a substring from a string

stri[startPos .. endPos]   # substring from startPos to endPos
stri[startPos ..]          # substring from startPos to the end of stri
stri[.. endPos]            # substring from the beginning of stri to endPos
stri[startPos len aLength]  # substring from startPos with maximum length of aLength

Replace every occurrence of a byte (or a string) in a string with another string

stri := replace(stri, "la", "al");

Join strings

stri3 := stri1 & stri2;

The string.s7i library contains more string functions.

Tcl

Tcl strings are binary safe, and a binary string is any string that only contains Unicode characters in the range \u0000–\u00FF. <lang tcl># string creation
set x "hello world"

# string destruction
unset x

# string assignment with a null byte
set x a\0b
string length $x ;# ==> 3

# string comparison
if {$x eq "hello"} {puts equal} else {puts "not equal"}
set y bc
if {$x < $y} {puts "$x is lexicographically less than $y"}

# string copying; cloning happens automatically behind the scenes
set xx $x

# check if empty
if {$x eq ""} {puts "is empty"}
if {[string length $x] == 0} {puts "is empty"}

# append a byte
append x \07

# substring
set xx [string range $x 0 end-1]

# replace bytes
set y [string map {l L} "hello world"]

# join strings
set a "hel"
set b "lo w"
set c "orld"
set d $a$b$c</lang>