String length

From Rosetta Code
Revision as of 22:52, 16 April 2009 by ShinTakezou (→Byte Length: lang back to pre for a cmd line example; this way, the ø code is not ruined)
Task
String length
You are encouraged to solve this task according to the task description, using any language you may know.

In this task, the goal is to find the character and byte length of a string. This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters. For example, the character length of "møøse" is 5 but the byte length is 7 in UTF-8 and 10 in UTF-16 (not counting a byte-order mark).
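As a language-neutral illustration of these figures, here is a minimal sketch in Python 3 (Python is chosen only for illustration). It uses the "utf-16-le" codec, which fixes the byte order and therefore writes no byte-order mark:

```python
# Character length vs. byte length of the task's example string.
s = "m\u00f8\u00f8se"  # "møøse"; ø is U+00F8

char_length = len(s)                       # counts code points
utf8_length = len(s.encode("utf-8"))       # ø takes 2 bytes in UTF-8
utf16_length = len(s.encode("utf-16-le"))  # 2 bytes per character here, no BOM

print(char_length, utf8_length, utf16_length)  # 5 7 10
```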

Please mark your examples with ===Character Length=== or ===Byte Length===.

4D

Byte Length

<lang 4d>$length:=Length("Hello, world!")</lang>

ActionScript

Character Length

<lang actionscript>myStrVar.length</lang>

Ada

Works with: GCC version 4.1.2

Byte Length

<lang ada>

Str    : String := "Hello World";
Length : constant Natural := Str'Size / 8;

</lang> The 'Size attribute returns the size of an object in bits. Provided that a "byte" is understood to be an octet, the length in bytes is 'Size divided by 8. Note that an octet is not necessarily the machine storage unit; to make the program portable, use System.Storage_Unit instead of the "magic number" 8. System.Storage_Unit yields the number of bits in a storage unit on the current machine. Furthermore, the size of a string object is not the length of its contents in any unit of measurement: a String object may carry a "dope" that keeps the array bounds, and its size can even be 0 if the compiler optimized the object away. So in most cases "byte length" makes no sense in Ada.

Character Length

<lang ada>
Latin_1_Str    : String           := "Hello World";
UCS_16_Str     : Wide_String      := "Hello World";
Unicode_Str    : Wide_Wide_String := "Hello World";
Latin_1_Length : constant Natural := Latin_1_Str'Length;
UCS_16_Length  : constant Natural := UCS_16_Str'Length;
Unicode_Length : constant Natural := Unicode_Str'Length;
</lang> The attribute 'Length yields the number of elements of an array. Since strings in Ada are arrays of characters, 'Length is the string length. Ada supports strings of Latin-1 (String), 16-bit UCS-2 (Wide_String) and full 32-bit Unicode (Wide_Wide_String) characters. In the example above, the character length of all three strings is 11, while the sizes of the objects in bits differ.

ALGOL 68

Bits and Bytes Length

<lang algol> BITS bits := bits pack((TRUE, TRUE, FALSE, FALSE)); # packed array of BOOL #

BYTES bytes := bytes pack("Hello, world"); # packed array of CHAR #
print((
  " BITS and BYTES are fixed width:", new line,
  " bits width:", bits width, ", max bits: ", max bits, ", bits:", bits, new line,
  " bytes width: ",bytes width, ", UPB:",UPB STRING(bytes), ", string:", STRING(bytes),"!", new line
))

</lang> Output:

BITS and BYTES are fixed width:
bits width:        +32, max bits: TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT, bits:TTFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
bytes width:         +32, UPB:        +32, string:Hello, world!

Character Length

<lang algol> STRING str := "hello, world";

INT length := UPB str;
printf(($lx"Length of """g""" is "g(3)l$,str,length));

printf(($lx"STRINGS can start at -1, in which case LWB must be used:"l$));
STRING s := "abcd"[@-1];
print(("s:",s, ", LWB:", LWB s, ", UPB:",UPB s, ", LEN:",UPB s - LWB s + 1))

</lang> Output:

Length of "hello, world" is +12
STRINGS can start at -1, in which case LWB must be used:
s:abcd, LWB:         -1, UPB:         +2, LEN:         +4

AppleScript

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang applescript>count of "Hello World"</lang> Mac OS X 10.5 (Leopard) includes AppleScript 2.0 which uses only Unicode (UTF-16) character strings. This example has not been tested and may not work on previous versions of AppleScript. <lang applescript>

set inString to "Hello World" as Unicode text
set byteCount to 0
set idList to id of inString
repeat with incr in idList
  set byteCount to byteCount + 2
  if incr as integer > 65535 then
    set byteCount to byteCount + 2
  end if
end repeat
byteCount

</lang>

Character Length

<lang applescript>count of "Hello World"</lang> Or: <lang applescript>count "Hello World"</lang>

AWK

Byte Length

From within any code block: <lang awk> w=length("Hello, world!") # static string example

x=length("Hello," s " world!") # dynamic string example
y=length($1)                   # input field example
z=length(s)                    # variable name example</lang>

Ad hoc program from command line:

 echo "Hello, wørld!" | awk '{print length($0)}'   # 14

From executable script: (prints for every line arriving on stdin) <lang awk> #!/usr/bin/awk -f

{print"The length of this line is "length($0)}</lang>

BASIC

Character Length

Works with: QuickBasic version 4.5

BASIC only supports single-byte characters. The character "ø" is converted to "°" for printing to the console and length functions, but will still output to a file as "ø". <lang qbasic> INPUT a$

PRINT LEN(a$)</lang>

C

Byte Length

Works with: ANSI C
Works with: GCC version 3.3.3

<lang c>#include <string.h>

int main(void) {

 const char *string = "Hello, world!";
 size_t length = strlen(string);
        
 return 0;

}</lang> or by hand:

<lang c>#include <stddef.h> /* for size_t */

int main(void) {

 const char *string = "Hello, world!";
 size_t length = 0;

 const char *p = string; /* no need to cast away const */
 while (*p++ != '\0') length++;

 return 0;

}</lang>

or (for arrays of char only)

<lang c>#include <stdlib.h>

int main(void) {

 char const s[] = "Hello, world!";
 size_t length = sizeof s - 1;
 
 return 0;

}</lang>

Character Length

For wide character strings (usually Unicode uniform-width encodings such as UCS-2 or UCS-4):

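<lang c>#include <stdio.h>
#include <wchar.h>

int main(void) {

  const wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
  size_t length = wcslen(s);
  /* %d does not match size_t; cast to unsigned long and use %lu (ANSI C) */
  printf("Length in characters = %lu\n", (unsigned long) length);
  /* bytes used by the characters, excluding the terminating L'\0';
     sizeof(s) would give the size of the pointer, not of the string */
  printf("Length in bytes      = %lu\n", (unsigned long) (length * sizeof(wchar_t)));

  return 0;

}</lang>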

TODO: non-standard library calls for system multi-byte encodings, such as _mbcslen()

C++

Byte Length

Works with: ISO C++
Works with: g++ version 4.0.2

<lang cpp>#include <string> // (not <string.h>!)
using std::string;

int main() {

 string s = "Hello, world!";
 string::size_type length = s.length(); // option 1: In Characters/Bytes
 string::size_type size = s.size();     // option 2: In Characters/Bytes
 // In bytes same as above since sizeof(char) == 1
 string::size_type bytes = s.length() * sizeof(string::value_type); 

}</lang> For wide character strings:

<lang cpp>#include <string>
using std::wstring;

int main() {

 wstring s = L"\u304A\u306F\u3088\u3046";
 wstring::size_type length = s.length() * sizeof(wstring::value_type); // in bytes

}</lang>

Character Length

Works with: ISO C++
Works with: g++ version 4.0.2

For wide character strings:

<lang cpp>#include <string>
using std::wstring;

int main() {

 wstring s = L"\u304A\u306F\u3088\u3046";
 wstring::size_type length = s.length();

}</lang>

For narrow character strings and arbitrary locales:

Works with: ISO C++
Works with: g++ version 4.1.2 20061115 (prerelease) (SUSE Linux)

<lang cpp>#include <cstddef> // for std::size_t
#include <cwchar>  // for std::mbstate_t
#include <locale>
#include <string>

// give the character length for a given named locale
std::size_t char_length(std::string const& text, char const* locale_name) {

 // locales work on pointers; get length and data from string and
 // then don't touch the original string any more, to avoid
 // invalidating the data pointer
 std::size_t len = text.length();
 char const* input = text.data();
 // get the named locale
 std::locale loc(locale_name);
 // get the conversion facet of the locale
 typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_type;
 cvt_type const& cvt = std::use_facet<cvt_type>(loc);
 // allocate buffer for conversion destination
 std::size_t bufsize = cvt.max_length()*len;
 wchar_t* destbuf = new wchar_t[bufsize];
 wchar_t* dest_end;
 // do the conversion
 std::mbstate_t state = std::mbstate_t();
 cvt.in(state, input, input+len, input, destbuf, destbuf+bufsize, dest_end);
 // determine the length of the converted sequence
 std::size_t length = dest_end - destbuf;
 // get rid of the buffer
 delete[] destbuf;
 // return the result
 return length;

}</lang>

Example usage (note that the locale names are OS specific):

<lang cpp>#include <iostream>

int main() {

 // Tür (German for door) in UTF8
 std::cout << char_length("\x54\xc3\xbc\x72", "de_DE.utf8") << "\n"; // outputs 3
 // Tür in ISO-8859-1
 std::cout << char_length("\x54\xfc\x72", "de_DE") << "\n"; // outputs 3

}</lang>

Note that the strings are given as explicit hex sequences, so that the encoding used for the source code won't matter.
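The same idea (decode the bytes with a named encoding, then count the resulting characters) can be sketched in Python 3. Here the codec names "utf-8" and "iso-8859-1" stand in for the OS-specific locale names, and char_length is a hypothetical helper modeled on the C++ function above:

```python
def char_length(data: bytes, encoding: str) -> int:
    """Return the number of characters in `data` decoded with `encoding`.

    Hypothetical helper modeled on the C++ char_length; it raises
    UnicodeDecodeError if `data` is not valid in that encoding.
    """
    return len(data.decode(encoding))


# "Tür" (German for door), given as explicit byte sequences.
print(char_length(b"\x54\xc3\xbc\x72", "utf-8"))   # 3
print(char_length(b"\x54\xfc\x72", "iso-8859-1"))  # 3
```

Because the input is given as bytes, the result does not depend on the encoding of the source file, just as with the hex sequences in the C++ example.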

C#

Platform: .NET

Works with: C# version 1.0+

Character Length

<lang csharp>string s = "Hello, world!";
int characterLength = s.Length;</lang>

Byte Length

Strings in .NET are stored as UTF-16. <lang csharp>using System.Text;

string s = "Hello, world!";
int byteLength = Encoding.Unicode.GetByteCount(s);</lang> To get the number of bytes that the string would require in a different encoding, e.g., UTF-8: <lang csharp>int utf8ByteLength = Encoding.UTF8.GetByteCount(s);</lang>

Clean

Byte Length

Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.

<lang clean>import StdEnv

strlen :: String -> Int
strlen string = size string

Start = strlen "Hello, world!"</lang>

ColdFusion

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang cfm>#len("Hello World")#</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang cfm>#len("Hello World")#</lang>

Common Lisp

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang lisp>(length "Hello World")</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang lisp>(length "Hello World")</lang>

Component Pascal

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang pascal>LEN("Hello, World!")</lang>

D

Byte Length

<lang d>string i = readln().chomp();
writefln("Length: ", i.length);</lang>

Character Length

<lang d>string i = readln().chomp();
writefln("Character length: ", i.toUTF32().length);</lang>

E

Character Length

<lang e>"Hello World".size()</lang>

Forth

Works with: ANS Forth

Byte Length

Strings in Forth come in two forms, neither of which is the null-terminated form commonly used in the C standard library.

Counted string

A counted string is a single pointer to a short string in memory. The string's first byte is the count of the number of characters in the string. This is how symbols are stored in a Forth dictionary.

<lang forth>CREATE s ," Hello world"  \ create string "s"
s C@      ( -- length=11 )
s COUNT   ( addr len )     \ convert to a stack string, described below</lang>
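To make the memory layout concrete, here is a rough Python 3 model of a counted string as a bytes object whose first byte holds the length. The names make_counted and forth_count are hypothetical, and forth_count returns a slice rather than an address, so it only approximates what COUNT does:

```python
def make_counted(text: bytes) -> bytes:
    """Build a counted string: one count byte followed by the data."""
    if len(text) > 255:
        raise ValueError("a one-byte count limits counted strings to 255 bytes")
    return bytes([len(text)]) + text


def forth_count(counted: bytes):
    """Rough analogue of COUNT: return (data, length) instead of (addr, len)."""
    n = counted[0]  # like C@ on the string's address
    return counted[1:1 + n], n


s = make_counted(b"Hello world")
print(s[0])            # 11
print(forth_count(s))  # (b'Hello world', 11)
```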

Stack string

A string on the stack is represented by a pair of cells: the address of the string data and the length of the string data (in characters). The word COUNT converts a counted string into a stack string. The STRING utility wordset of ANS Forth works on these addr-len pairs. This representation has the advantages of not requiring null-termination, easy representation of substrings, and not being limited to 255 characters.

<lang forth>S" string" ( addr len) DUP . \ 6</lang>

Character Length

The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)

The following code counts the number of UTF-8 characters in a null-terminated string. It relies on the fact that all bytes of a UTF-8 character except the first have the binary bit pattern "10xxxxxx".

<lang forth>2 base !

: utf8+ ( str -- str )
 begin
   char+
   dup c@
   11000000 and
   10000000 <>
 until ;

decimal</lang>

<lang forth>: count-utf8 ( zstr -- n )

 0
 begin
   swap dup c@
 while
   utf8+
   swap 1+
 repeat drop ;</lang>
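The same continuation-byte rule can be sketched in Python 3. Unlike the Forth words, this hypothetical count_utf8 takes a bytes object of known length instead of a null-terminated string, and like them it assumes well-formed UTF-8 without validating it:

```python
def count_utf8(data: bytes) -> int:
    """Count UTF-8 characters by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0b11000000) != 0b10000000)


encoded = "møøse".encode("utf-8")
print(len(encoded), count_utf8(encoded))  # 7 5
```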

Haskell

Byte Length

It is not possible to determine the "byte length" of an ordinary string, because in Haskell, a string is a boxed list of unicode characters. So each character in a string is represented as whatever the compiler considers as the most efficient representation of a cons-cell and a unicode character, and not as a byte.

For efficient storage of sequences of bytes, there's Data.ByteString, which uses Word8 as a base type. Byte strings have an additional Data.ByteString.Char8 interface, which truncates each Unicode Char to 8 bits as soon as it is converted to a byte string. However, this is not adequate for the task, because truncation simply garbles characters outside Latin-1 instead of encoding them into, say, UTF-8.
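The difference between truncating and encoding can be seen in a short Python 3 sketch. The `& 0xFF` mask models the truncation to 8 bits described above; it is an illustration of the effect, not Haskell's implementation:

```python
s = "\u05d0"  # HEBREW LETTER ALEF, outside Latin-1

truncated = bytes(ord(c) & 0xFF for c in s)  # keep only the low 8 bits
encoded = s.encode("utf-8")                  # proper UTF-8 encoding

print(truncated)                     # b'\xd0'
print(encoded)                       # b'\xd7\x90'
print(encoded.decode("utf-8") == s)  # True: encoding round-trips
```

The truncated byte 0xD0 decodes (as Latin-1) to "Ð", so the original character is lost.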

There are several (non-standard, so far) Unicode encoding libraries available on Hackage. As an example, we'll use encoding-0.2, as Data.Encoding:

<lang haskell>import Data.Encoding
import Data.ByteString (ByteString)
import qualified Data.ByteString as B

strUTF8 :: ByteString
strUTF8 = encode UTF8 "Hello World!"

strUTF32 :: ByteString
strUTF32 = encode UTF32 "Hello World!"

strlenUTF8 = B.length strUTF8
strlenUTF32 = B.length strUTF32</lang>

Character Length

Works with: GHCi version 6.6
Works with: Hugs

The base type Char defined by the standard is already intended for (plain) Unicode characters.

<lang haskell>strlen = length "Hello, world!"</lang>

IDL

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Compiler: any IDL compiler should do

<lang idl>length = strlen("Hello, world!")</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang idl>length = strlen("Hello, world!")</lang>

Io

Byte Length

<lang io>"møøse" sizeInBytes</lang>

Character Length

<lang io>"møøse" size</lang>

J

Byte Length

<lang j>

  #     'møøse'

7 </lang>

Character Length

<lang j>

  #7 u: 'møøse'

5 </lang>

Java

Byte Length

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length method of String objects returns the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

<lang java5>String s = "Hello, world!"; int byteCount = s.length() * 2;</lang>

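Another way to find the byte length of a string is to encode it explicitly with the desired charset. Note that the "UTF-16" charset prepends a 2-byte byte-order mark, and that getBytes(String) can throw the checked UnsupportedEncodingException.

<lang java5>String s = "Hello, world!";
int byteCountUTF16 = s.getBytes("UTF-16").length; // 28: 2-byte BOM + 13 * 2
int byteCountUTF8 = s.getBytes("UTF-8").length;   // 13</lang>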

Character Length

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

The length method of String objects gives the number of 16-bit values used to encode a string. <lang java5>String s = "Hello, world!"; int length = s.length();</lang>

Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method. <lang java5>String str = "\uD834\uDD2A"; // U+1D12A
int length1 = str.length(); // 2
int length2 = str.codePointCount(0, str.length()); // 1</lang>
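For comparison, the same two counts in Python 3.3 or later, where len() counts code points; counting UTF-16 code units by encoding and halving the byte count is an illustrative choice, not the only way:

```python
s = "\U0001D12A"  # U+1D12A, outside the Basic Multilingual Plane

code_points = len(s)                           # 1 character
utf16_units = len(s.encode("utf-16-le")) // 2  # 2 code units (a surrogate pair)

print(code_points, utf16_units)  # 1 2
```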

JavaScript

Byte Length

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

<lang javascript>var s = "Hello, world!"; var byteCount = s.length * 2; //26</lang>

Character Length

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

JavaScript has no built-in way to determine how many characters are in a string. However, if the string contains only commonly used characters, the number of characters equals the number of 16-bit values used to represent them. <lang javascript>var str1 = "Hello, world!";
var len1 = str1.length; //13

var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; //2</lang>
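One common workaround is to count every 16-bit unit except low surrogates (0xDC00 to 0xDFFF), so that each surrogate pair counts once. Here is a Python 3 sketch of that rule, applied to a list of integers standing in for a string's code units; count_code_points is a hypothetical name, and it assumes the input contains no unpaired surrogates:

```python
def count_code_points(units):
    """Count characters in a sequence of UTF-16 code units.

    A surrogate pair (high 0xD800-0xDBFF, then low 0xDC00-0xDFFF) is
    counted once by skipping its low half.
    """
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


str1 = [ord(c) for c in "Hello, world!"]  # 13 BMP code units
str2 = [0xD834, 0xDD2A]                   # U+1D12A as a surrogate pair

print(len(str1), count_code_points(str1))  # 13 13
print(len(str2), count_code_points(str2))  # 2 1
```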

JudoScript

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang judoscript>// Store the length of "Hello World" in length
length = "Hello World".length();</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang judoscript>// Store the length of "Hello World" in length
length = "Hello World".length();</lang>

Logo

Logo is old enough that only single-byte character sets are supported. Modern versions of Logo may have enhanced character set support. <lang logo>print count "|Hello World|  ; 11
print count "møøse  ; 5
print char 248  ; ø - implies ISO-Latin character set</lang>

LSE64

Byte Length

LSE stores strings as arrays of characters in 64-bit cells plus a count. <lang lse>" Hello world" @ 1 + 8 * , # 96 = (11+1)*(size of a cell) = 12*8</lang>

Character Length

LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string. <lang lse>" Hello world" @ , # 11</lang>

Lua

Works with: Lua version 5.0+

Byte Length

<lang lua>str = "Hello world"
length = #str</lang>

or

<lang lua>str = "Hello world"
length = string.len(str)</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang lua>string = "Hello world"
length = #string</lang>

Mathematica

Character Length

<lang mathematica>StringLength["Hello world"]</lang>

Byte Length

<lang mathematica>StringByteCount["Hello world"]</lang>

MAXScript

Character Length

<lang maxscript>"Hello world".count</lang>

mIRC Scripting Language

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang mirc>alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang mirc>alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }</lang>

Modula-3

Byte Length

<lang modula3> MODULE ByteLength EXPORTS Main;

IMPORT IO, Fmt, Text;

VAR s: TEXT := "Foo bar baz";

BEGIN

 IO.Put("Byte length of s: " & Fmt.Int((Text.Length(s) * BYTESIZE(s))) & "\n");

END ByteLength. </lang>

Character Length

<lang modula3> MODULE StringLength EXPORTS Main;

IMPORT IO, Fmt, Text;

VAR s: TEXT := "Foo bar baz";

BEGIN

 IO.Put("String length of s: " & Fmt.Int(Text.Length(s)) & "\n");

END StringLength. </lang>

Oberon-2

Byte Length

<lang oberon2> MODULE Size;

  IMPORT Out;
  VAR s: LONGINT;
     string: ARRAY 5 OF CHAR;

BEGIN

  string := "Foo";
  s := LEN(string);
  Out.String("Size: ");
  Out.LongInt(s,0);
  Out.Ln;

END Size. </lang>

Output:

Size: 5

Character Length

<lang oberon2> MODULE Length;

  IMPORT Out, Strings;
  VAR l: INTEGER;
     string: ARRAY 5 OF CHAR;

BEGIN

  string := "Foo";
  l := Strings.Length(string);
  Out.String("Length: ");
  Out.Int(l,0);
  Out.Ln;

END Length. </lang>

Output:

Length: 3

Objective-C

Character Length

<lang objc>// Return the length in UTF-16 code units (here equal to the number of characters)
unsigned numberOfCharacters = [@"m\xf8\xf8se" length]; // 5</lang>

Byte Length

<lang objc>// Return the number of bytes depending on the encoding
unsigned numberOfBytes = [[@"m\xf8\xf8se" dataUsingEncoding:NSUTF8StringEncoding] length]; // 7</lang>

Works with: Mac OS X version 10.4+

<lang objc>// Return the number of bytes depending on the encoding
unsigned numberOfBytes = [@"m\xf8\xf8se" lengthOfBytesUsingEncoding:NSUTF8StringEncoding]; // 7</lang>

OCaml

In OCaml currently, characters inside the standard type string are bytes, and a single character taken alone has the same binary representation as the OCaml int (which is equivalent to a C long) which is a machine word.

For internationalization there is Camomile, a comprehensive Unicode library for OCaml. Camomile provides Unicode character type, UTF-8, UTF-16, and more...

Byte Length

Standard OCaml strings are sequences of bytes, conventionally ASCII or ISO 8859-1, so the function String.length returns the byte length, which is also the character length in these encodings: <lang ocaml>String.length "Hello world" ;;</lang>

Character Length

When using the UTF8 module of Camomile, the byte length of a UTF-8 encoded string is given by String.length and the character length by UTF8.length: <lang ocaml>String.length "møøse"
UTF8.length "møøse"</lang>

Octave

<lang octave>s = "string"; stringlen = length(s)</lang>

This gives the number of bytes, not of characters: for example, length("è") is 2 when "è" is encoded as UTF-8.

Perl

Byte Length

Works with: Perl version 5.8

Strings in Perl consist of characters. Measuring the byte length therefore requires conversion to some binary representation (called encoding, both noun and verb).

<lang perl>use utf8; # so we can use literal characters like ☺ in source
use Encode qw(encode);

print length encode 'UTF-8', "Hello, world! ☺";
# 17. The last character takes 3 bytes, the others 1 byte each.

print length encode 'UTF-16', "Hello, world! ☺";
# 32. 2 bytes for the BOM, then one byte pair for each of the 15 characters.</lang>

Character Length

Works with: Perl version 5.X

<lang perl>my $length = length "Hello, world!";</lang>

PHP

Byte Length

<lang php>$length = strlen('Hello, world!');</lang>

Character Length

<lang php>$length = mb_strlen('Hello, world!', 'UTF-8'); // or whatever encoding</lang>

PL/SQL

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang plsql>DECLARE

 string VARCHAR2( 50 ) := 'Hello, world!';
 stringlength NUMBER;

BEGIN

 stringlength := length( string );

END;</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang plsql>DECLARE

 string VARCHAR2( 50 ) := 'Hello, world!';
 stringlength NUMBER;

BEGIN

 stringlength := length( string );

END;</lang>

Pop11

Byte Length

Currently Pop11 supports only strings consisting of 1-byte units. Strings can carry arbitrary binary data, so the user can, for example, use UTF-8 (however, builtin procedures will treat each byte as a single character). The length function for strings returns the length in bytes:

<lang pop11>lvars str = 'Hello, world!'; lvars len = length(str);</lang>

Python

2.x

In Python 2.x, there are two types of strings: regular (8-bit) strings, and Unicode strings. Unicode string literals are prefixed with "u".

Byte Length

Works with: Python version 2.x

For 8-bit strings, the byte length is the same as the character length:

>>> len('ascii')
5

For Unicode strings, byte length depends on the encoding. Internally, Python uses 2 or 4 bytes per character for Unicode strings, depending on how it was built, but this internal representation is not visible to the user.

# The letter Alef
>>> len(u'\u05d0'.encode('utf-8'))
2
>>> len(u'\u05d0'.encode('iso-8859-8'))
1

Example from the problem statement: <lang python>#!/usr/bin/env python
# -*- coding: UTF-8 -*-

s = u"møøse"
assert len(s) == 5
assert len(s.encode('UTF-8')) == 7
assert len(s.encode('UTF-16')) == 12 # The 2 extra bytes are the leading byte-order mark (BOM).</lang>

Character Length

Works with: Python version 2.4

len() returns the number of characters in a unicode string or plain ascii string. To get the length of encoded string, you have to decode it first:

>>> len('ascii')
5
>>> len(u'\u05d0') # the letter Alef as unicode literal
1
>>> len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string
1

3.x

In Python 3.x, strings are Unicode strings.

Byte Length

Byte length depends on the encoding. Internally, CPython uses 2 or 4 bytes per character, depending on how it was built (since Python 3.3, it picks 1, 2 or 4 bytes per character for each string according to its contents), but this internal representation is not visible to the user.

You can use len() to get the length of a byte sequence. To get a byte sequence from a string, you have to encode it with the desired encoding:

# The letter Alef
>>> len('\u05d0'.encode('utf-8'))
2
>>> len('\u05d0'.encode('iso-8859-8'))
1

Example from the problem statement: <lang python>#!/usr/bin/env python
# -*- coding: UTF-8 -*-

s = "møøse"
assert len(s) == 5
assert len(s.encode('UTF-8')) == 7
assert len(s.encode('UTF-16')) == 12 # The 2 extra bytes are the leading byte-order mark (BOM).</lang>
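Python's "utf-16" codec writes a byte-order mark in the machine's native byte order. Naming the byte order explicitly ("utf-16-le" or "utf-16-be") omits the BOM, which yields the 10 bytes quoted in the task description:

```python
s = "møøse"

with_bom = s.encode("utf-16")        # BOM + code units in native byte order
without_bom = s.encode("utf-16-le")  # explicit byte order, no BOM

print(len(with_bom), len(without_bom))             # 12 10
print(with_bom[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True: the first 2 bytes are the BOM
```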

Character Length

len() returns the number of characters in a string. To get the length of an encoded byte sequence, you have to decode it first:

>>> len('ascii')
5
>>> len('\u05d0') # the letter Alef as unicode literal
1
>>> len(b'\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 byte sequence
1

Ruby

Byte Length

<lang ruby>string = "Hello world"
print string.length
# or
puts "Hello World".length</lang>

Character Length

<lang ruby>require 'active_support'
puts "Hello World".chars.length</lang>

Scheme

Byte Length

Works with: Gauche version 0.8.7 [utf-8,pthreads]

string-size is a Gauche-specific function. <lang scheme>(string-size "Hello world")</lang>

Character Length

Works with: Gauche version 0.8.7 [utf-8,pthreads]

string-length is defined in R5RS and R6RS. <lang scheme>(string-length "Hello world")</lang>

Seed7

Character Length

<lang seed7>length("Hello, world!")</lang>

Smalltalk

Byte Length

<lang smalltalk>string := 'Hello, world!'. string size.</lang>

Character Length

In GNU Smalltalk:

<lang smalltalk>string := 'Hello, world!'.
string numberOfCharacters.</lang>

This requires loading the Iconv package:

<lang smalltalk>PackageLoader fileInPackage: 'Iconv'</lang>

Standard ML

Works with: SML/NJ version 110.60
Works with: Moscow ML version 2.01
Works with: MLton version 20061107

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang sml>val strlen = size "Hello, world!";</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang sml>val strlen = size "Hello, world!";</lang>

Tcl

Byte Length

Basic version:

<lang tcl>string bytelength "Hello, world!"</lang>

Or, more elaborately (works with any Tcl 8.x interpreter; tested on 8.4.12):

<lang tcl>fconfigure stdout -encoding utf-8; # So that Unicode strings print correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in bytes is %d" $s1 [string bytelength $s1]]
puts [format "length of \"%s\" in bytes is %d" $s2 [string bytelength $s2]]</lang>

Character Length

Basic version:

<lang tcl>string length "Hello, world!"</lang>

Or, more elaborately (works with any Tcl 8.x interpreter; tested on 8.4.12):

<lang tcl>fconfigure stdout -encoding utf-8; # So that Unicode strings print correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in characters is %d" $s1 [string length $s1]]
puts [format "length of \"%s\" in characters is %d" $s2 [string length $s2]]</lang>

Toka

Byte Length

<lang toka>" hello, world!" string.getLength</lang>

UNIX Shell

Byte Length

With external utilities:

Works with: bourne shell

<lang bash>string='Hello, world!'
length=`echo -n "$string" | wc -c | tr -dc '0-9'`
echo $length # if you want it printed to the terminal</lang>

With SUSv3 parameter expansion modifier:

Works with: Almquist SHell
Works with: Bourne Again SHell version 3.2
Works with: Korn SHell version 5.2.14 99/07/13.2
Works with: Z SHell

<lang bash>string='Hello, world!'
length="${#string}"
echo $length # if you want it printed to the terminal</lang>

VBScript

Byte Length

<lang vbscript>LenB(string|varname)</lang>

Returns the number of bytes required to store a string in memory. Returns Null if string|varname contains Null.

Character Length

<lang vbscript>Len(string|varname)</lang>

Returns the number of characters in string|varname. Returns Null if string|varname contains Null.

XSLT

Character Length

<lang xml><?xml version="1.0" encoding="UTF-8"?></lang>

...

<lang xml><xsl:value-of select="string-length('møøse')" /> </lang>

xTalk

Works with: HyperCard

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang xtalk>put the length of "Hello World"</lang>

or

<lang xtalk>put the number of characters in "Hello World"</lang>

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

<lang xtalk>put the length of "Hello World"</lang>

or

<lang xtalk>put the number of characters in "Hello World"</lang>