Binary strings

From Rosetta Code
Revision as of 20:23, 19 May 2009 by rosettacode>Kevin Reid (Basic string manipulation functions moved to Binary string manipulation functions: Clarity; 'string' so often means 'character string')
Task
Binary strings
You are encouraged to solve this task according to the task description, using any language you may know.

Many languages have powerful and useful (binary safe) strings manipulation functions, while others haven't, making it harder for these languages to accomplish some kind of tasks. This task is about creating functions to handle binary strings (strings made of arbitrary bytes, i.e. byte strings according to Wikipedia) for those languages that haven't a builtin support for them. If your language of choice is not among these, show a possible alternative implementation for the functions or abilities already provided by the language. In particular the functions you need to create are:

  • String creation and destruction (when needed and if there's no garbage collection or similar mechanisms)
  • String assignment (there's no need to pretend the syntax of common types assignments)
  • String comparison
  • String cloning and copying
  • Check if a string is empty
  • Append a byte to a string
  • Extract a substring from a string
  • Replace every occurrences of a byte (or a string) in a string with another string
  • Join strings

Possible contexts of use: compression algorithms (like LZW compression), L-systems (manipulation of symbols), many more.

ALGOL 68

Translation of: Tcl
Works with: ALGOL 68 version Standard - no extensions to language used
Works with: ALGOL 68G version Any - tested with release mk15-0.8b.fc9.i386

<lang># String creation # STRING a,b,c,d,e,f,g,h,i,j,l,r; a := "hello world"; print((a, new line));

  1. String destruction (for garbage collection) #

b := (); BEGIN

 LOC STRING lb := "hello earth";  # allocate off the LOC stack  #
 HEAP STRING hb := "hello moon"; # allocate out of the HEAP space #
 ~

END; # local variable "lb" has LOC stack space recovered at END #

  1. String assignment #

c := "a"+REPR 0+"b"; print (("string length c:", UPB c, new line));# ==> 3 #

  1. String comparison #

l := "ab"; r := "CD";

BOOL result; FORMAT summary = $""""g""" is "b("","NOT ")"lexicographically "g" """g""""l$ ;

result := l < r OR l LT r; printf((summary, l, result, "less than", r)); result := l <= r OR l LE r # OR l ≤ r #; printf((summary, l, result, "less than or equal to", r)); result := l = r OR l EQ r; printf((summary, l, result, "equal to", r)); result := l /= r OR l NE r # OR l ≠ r #; printf((summary, l, result, "not equal to", r)); result := l >= r OR l GE r # OR l ≥ r #; printf((summary, l, result, "greater than or equal to", r)); result := l > r OR l GT r; printf((summary, l, result, "greater than", r));

  1. String cloning and copying #

e := f;

  1. Check if a string is empty #

IF g = "" THEN print(("g is empty", new line)) FI; IF UPB g = 0 THEN print(("g is empty", new line)) FI;

  1. Append a byte to a string #

h +:= "A";

  1. Append a string to a string #

h +:= "BCD"; h PLUSAB "EFG";

  1. Prepend a string to a string - because STRING addition isn't communitive #

"789" +=: h; "456" PLUSTO h; print(("The result of prepends and appends: ", h, new line));

  1. Extract a substring from a string #

i := h[2:3]; print(("Substring 2:3 of ",h," is ",i, new line));

  1. Replace every occurrences of a byte (or a string) in a string with another string #

PROC replace = (STRING string, old, new, INT count)STRING: (

 INT pos;
 STRING tail := string, out;
 TO count WHILE string in string(old, pos, tail) DO
   out +:= tail[:pos-1]+new;
   tail := tail[pos+UPB old:]
 OD;
 out+tail

);

j := replace("hello world", "world", "planet", max int); print(("After replace string: ", j, new line));

INT offset = 7;

  1. Replace a character at an offset in the string #

j[offset] := "P"; print(("After replace 7th character: ", j, new line));

  1. Replace a substring at an offset in the string #

j[offset:offset+3] := "PlAN"; print(("After replace 7:10th characters: ", j, new line));

  1. Insert a string before an offset in the string #

j := j[:offset-1]+"INSERTED "+j[offset:]; print(("Insert string before 7th character: ", j, new line));

  1. Join strings #

a := "hel"; b := "lo w"; c := "orld"; d := a+b+c;

print(("a+b+c is ",d, new line));

  1. Pack a string into the target CPU's word #

BYTES word := bytes pack(d);

  1. Extract a CHAR from a CPU word #

print(("7th byte in CPU word is: ", offset ELEM word, new line))</lang> Output:

hello world
string length c:         +3
"ab" is NOT lexicographically less than "CD"
"ab" is NOT lexicographically less than or equal to "CD"
"ab" is NOT lexicographically equal to "CD"
"ab" is lexicographically not equal to "CD"
"ab" is lexicographically greater than or equal to "CD"
"ab" is lexicographically greater than "CD"
g is empty
g is empty
The result of prepends and appends: 456789ABCDEFG
Substring 2:3 of 456789ABCDEFG is 56
After replace string: hello planet
After replace 7th character: hello Planet
After replace 7:10th characters: hello PlANet
Insert string before 7th character: hello INSERTED PlANet
a+b+c is hello world
7th byte in CPU word is: w

C

estrings.h <lang c>#ifndef ESTRINGS_H_

  1. define ESTRINGS_H_
  1. include <string.h>
  2. include <stdlib.h>
  3. include <stdbool.h>
  1. define BYTES_PER_BLOCK 128

struct StringStruct {

 char *bstring;
 size_t length;
 size_t blocks;

}; typedef struct StringStruct *String;


String newString(); String setString(String s, char *p, size_t len); String appendChar(String s, char c); int compareStrings(String s1, String s2); void destroyString(String s); String copyString(String to, String from); String cloneString(String s); bool stringIsEmpty(String s); String subString(String s, size_t from, size_t to); String replaceWith(String str, String ch, String repl); String joinStrings(String s1, String s2);

  1. endif</lang>

estrings.c <lang c>#include "estrings.h"

  1. include <stdio.h>
  2. define NOT_IMPLEMENTED_YET fprintf(stderr, "not implemented yet\n")</lang>

<lang c>String newString() {

 String t;
 t = malloc(sizeof(struct StringStruct));
 if ( t == NULL ) return NULL;
 t->length = 0;
 t->blocks = 1;
 t->bstring = malloc(BYTES_PER_BLOCK * t->blocks);
 if ( t->bstring == NULL ) { free(t); return NULL; }
 return t;

}</lang>

<lang c>static bool _fitString(String s, size_t len) {

 s->blocks = len/BYTES_PER_BLOCK + 1;
 s->bstring = realloc(s->bstring, s->blocks * BYTES_PER_BLOCK);
 if ( s->bstring != NULL ) return true;
 return false;

}</lang>

<lang c>String setString(String s, char *p, size_t len) {

  if ( s != NULL )
  {
     if ( p == NULL ) { s->length = 0; return s; }
     _fitString(s, len);
     if ( s->bstring != NULL )
     {
       memcpy(s->bstring, p, len);
       s->length = len;
     }
  }
  return s;

}</lang>

<lang c>String appendChar(String s, char c) {

  if ( s == NULL ) return NULL;
  _fitString(s, s->length + 1);
  s->length++;
  if ( s->bstring != NULL )
  {
     s->bstring[s->length-1] = c;
  }
  return s;

}</lang>

<lang c>int compareStrings(String s1, String s2) {

  if ( s1->length < s2->length ) return -1;
  if ( s1->length > s2->length ) return 1;
  return memcmp(s1->bstring, s2->bstring, s1->length);

}</lang>

<lang c>void destroyString(String s) {

  if ( s != NULL )
  {
     if ( s->bstring != NULL ) free(s->bstring);
     free(s);
  }

}</lang>

<lang c>String copyString(String to, String from) {

  if ( (to->bstring != NULL) && (from->bstring != NULL) )
  {
     to->blocks = from->blocks;
     to->length = from->length;
     to->bstring = realloc(to->bstring, to->blocks * BYTES_PER_BLOCK);
     memcpy(to->bstring, from->bstring, to->length);
  }
  return to;

}</lang>

<lang c>String cloneString(String s) {

  String ps = malloc(sizeof(struct StringStruct));
  if ( ps != NULL )
  {
     ps->length = s->length;
     ps->blocks = s->blocks;
     ps->bstring = malloc(s->blocks * BYTES_PER_BLOCK);
     if ( ps->bstring != NULL )
     {
        memcpy(ps->bstring, s->bstring, s->length);
     } else {
        free(ps); return NULL;
     }
  }
  return ps;

}</lang>

<lang c>bool stringIsEmpty(String s) {

 if ( s == NULL ) return true;
 if ( s->length == 0 ) return true;
 return false;

}</lang>

<lang c>String subString(String s, size_t from, size_t to) {

 String ss;
 if ( stringIsEmpty(s) || (to < from) || ( from >= s->length )) return newString();
 if ( (from == 0) && (to >= (s->length - 1) ) ) return cloneString(s);
 ss = newString();
 if ( ss == NULL ) return NULL;
 if ( _fitString(ss, to - from) ) {
   ss->length = to - from;
   memcpy(ss->bstring, s->bstring+from, ss->length);
 }
 return ss;

}</lang>

<lang c>String replaceWith(String str, String ch, String repl) {

 String d = NULL;
 int occ = 0, i, j;
 if ( stringIsEmpty(str) ) return NULL;
 if ( stringIsEmpty(ch) ) return cloneString(str);
 if ( ch->length > 1 ) {
   NOT_IMPLEMENTED_YET;
   return str;
 }
 for(i=0; i < str->length; i++) {
   if ( str->bstring[i] == ch->bstring[0] ) occ++;
 }
 if ( occ == 0 ) return cloneString(str);
 d = newString();
 if ( _fitString(d, str->length + occ * (repl->length - 1)) ) {
   d->length = str->length + occ * (repl->length - 1);
   for(i=0, j=0; i < str->length; i++) {
     if ( str->bstring[i] != ch->bstring[0] ) {
   d->bstring[j] = str->bstring[i];
   j++;
     } else {
   memcpy(d->bstring + j, repl->bstring, repl->length);
   j += repl->length;
     }
   }
 }
 return d;

}</lang>

<lang c>String joinStrings(String s1, String s2) {

 String d;
 
 if ( stringIsEmpty(s1) ) return cloneString(s2);
 if ( stringIsEmpty(s2) ) return cloneString(s1);
 d = newString();
 if ( _fitString(d, s1->length + s2->length) ) {
   memcpy(d->bstring, s1->bstring, s1->length);
   memcpy(d->bstring + s1->length, s2->bstring, s2->length);
   d->length = s1->length + s2->length;
 }
 return d;

}</lang>

<lang c>#undef NOT_IMPLEMENTED_YET</lang>

Haskell

Note that any of the following functions can be assigned to 'variables' in a working program or could just as easily be written as one-off expressions. They are given here as they are to elucidate the workings of Haskell's type system. Hopefully the type declarations will help beginners understand what's going on. Also note that there are likely more concise ways to express many of the below functions. However, I have opted for clarity here as Haskell can be somewhat intimidating to the (currently) non- functional programmer. <lang haskell> import Text.Regex {- The above import is needed only for the last function. It is used there purely for readability and conciseness -}

{- Assigning a string to a 'variable'. We're being explicit about it just for show. Haskell would be able to figure out the type of "world" -} string = "world" :: String</lang>

<lang haskell> {- Comparing two given strings and returning a boolean result using a simple conditional -} strCompare :: String -> String -> Bool strCompare x y =

   if x == y
       then True
       else False</lang>

<lang haskell> {- As strings are equivalent to lists of characters in Haskell, test and see if the given string is an empty list -} strIsEmpty :: String -> Bool strIsEmpty x =

   if x == []
       then True
       else False</lang>

<lang haskell> {- This is the most obvious way to append strings, using the built-in (++) concatenation operator Note the same would work to join any two strings (as 'variables' or as typed strings -} strAppend :: String -> String -> String strAppend x y = x ++ y</lang>

<lang haskell> {- Take the specified number of characters from the given string -} strExtract :: Int -> String -> String strExtract x s = take x s</lang>

<lang haskell> {- Take a certain substring, specified by two integers, from the given string -} strPull :: Int -> Int -> String -> String strPull x y s = take (y-x+1) (drop x s)</lang>

<lang haskell> {- Much thanks to brool.com for this nice and elegant solution. Using an imported standard library (Text.Regex), replace a given substring with another -} strReplace :: String -> String -> String -> String strReplace old new orig = subRegex (mkRegex old) orig new </lang>

Tcl

Tcl strings are binary safe. <lang tcl># string creation set x "hello world"

  1. string destruction

unset x

  1. string assignment with a null byte

set x a\0b string length $x ;# ==> 3

  1. string comparison

if {$x eq "hello"} {puts equal} else {puts "not equal"} set y bc if {$x < $y} {puts "$x is lexicographically less than $y"}

  1. string copying

set xx $x

  1. check if empty

if {$x eq ""} {puts "is empty"} if {[string length $x] == 0} {puts "is empty"}

  1. append a byte

append x \07

  1. substring

set xx [string range $x 0 end-1]

  1. replace bytes

set y [string map {l L} "hello world"]

  1. join strings

set a "hel" set b "lo w" set c "orld" set d $a$b$c</lang>