String length

From Rosetta Code
You are encouraged to solve this task according to the task description, using any language you may know.
Task

Find the character and byte length of a string.

This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters.

By character, we mean an individual Unicode code point, not a user-visible grapheme containing combining characters.

For example, the character length of "møøse" is 5 but the byte length is 7 in UTF-8 and 10 in UTF-16.

Non-BMP code points (those between 0x10000 and 0x10FFFF) must also be handled correctly: answers should produce actual character counts in code points, not in code unit counts.

Therefore a string like "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" (consisting of the 7 Unicode characters U+1D518 U+1D52B U+1D526 U+1D520 U+1D52C U+1D521 U+1D522) is 7 characters long, not 14 UTF-16 code units; and it is 28 bytes long whether encoded in UTF-8 or in UTF-16.
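The figures above can be checked with a minimal Go sketch. The helper name `lengths` is hypothetical. The sketch gets each figure as follows:

- `utf8.RuneCountInString` counts code points.
- `len` gives the UTF-8 byte length, because Go strings hold UTF-8 bytes.
- The UTF-16 byte length is two bytes per code unit returned by `utf16.Encode`, with no byte order mark.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// lengths returns the code-point count, the UTF-8 byte length and the
// UTF-16 byte length (without a byte order mark) of s.
func lengths(s string) (codePoints, utf8Bytes, utf16Bytes int) {
	codePoints = utf8.RuneCountInString(s)
	utf8Bytes = len(s)                            // Go strings store UTF-8 bytes
	utf16Bytes = 2 * len(utf16.Encode([]rune(s))) // 2 bytes per UTF-16 code unit
	return
}

func main() {
	for _, s := range []string{
		"møøse",
		"\U0001D518\U0001D52B\U0001D526\U0001D520\U0001D52C\U0001D521\U0001D522",
	} {
		cp, b8, b16 := lengths(s)
		fmt.Printf("%s: %d code points, %d UTF-8 bytes, %d UTF-16 bytes\n", s, cp, b8, b16)
	}
}
```

For "møøse" this reports 5, 7 and 10; for the Fraktur string it reports 7, 28 and 28.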

Please mark your examples with ===Character Length=== or ===Byte Length===.

If your language is capable of providing the string length in graphemes, mark those examples with ===Grapheme Length===.

For example, the string "J̲o̲s̲é̲" ("J\x{332}o\x{332}s\x{332}e\x{301}\x{332}") has 4 user-visible graphemes, 9 characters (code points), and 14 bytes when encoded in UTF-8.


360 Assembly

Assembler 360 uses EBCDIC encoding, so one character is one byte. The length attribute L' acts as the length function in 360 assembler: it gives the length in bytes of the named field.

*        String length             06/07/2016
LEN CSECT
USING LEN,15 base register
LA 1,L'C length of C
XDECO 1,PG
XPRNT PG,12
LA 1,L'H length of H
XDECO 1,PG
XPRNT PG,12
LA 1,L'F length of F
XDECO 1,PG
XPRNT PG,12
LA 1,L'D length of D
XDECO 1,PG
XPRNT PG,12
LA 1,L'PG length of PG
XDECO 1,PG
XPRNT PG,12
BR 14 exit length
C DS C character 1
H DS H half word 2
F DS F full word 4
D DS D double word 8
PG DS CL12 string 12
END LEN
Output:
           1
           2
           4
           8
          12

4D

Byte Length

$length:=Length("Hello, world!")

ActionScript

Byte Length

This uses UTF-8 encoding. For other encodings, the ByteArray's writeMultiByte() method can be used.

 
package {
 
import flash.display.Sprite;
import flash.events.Event;
import flash.utils.ByteArray;
 
public class StringByteLength extends Sprite {
 
public function StringByteLength() {
if ( stage ) _init();
else addEventListener(Event.ADDED_TO_STAGE, _init);
}
 
private function _init(e:Event = null):void {
var s1:String = "The quick brown fox jumps over the lazy dog";
var s2:String = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
var s3:String = "José";
 
var b:ByteArray = new ByteArray();
b.writeUTFBytes(s1);
trace(b.length); // 43
 
b.clear();
b.writeUTFBytes(s2);
trace(b.length); // 28
 
b.clear();
b.writeUTFBytes(s3);
trace(b.length); // 5
}
 
}
 
}
 

Character Length

 
var s1:String = "The quick brown fox jumps over the lazy dog";
var s2:String = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
var s3:String = "José";
trace(s1.length, s2.length, s3.length); // 43, 14, 4
 

Ada

Works with: GCC version 4.1.2

Byte Length

Str    : String := "Hello World";
Length : constant Natural := Str'Size / 8;

The 'Size attribute returns the size of an object in bits. Provided that a "byte" is understood to be an octet, the length in bytes is 'Size divided by 8. An octet is not necessarily the machine storage unit, however. To make the program portable, use System.Storage_Unit instead of the "magic number" 8; it yields the number of bits in a storage unit on the current machine. Furthermore, the size of a string object is not the length of what the string contains, in whatever units. A String object may carry a "dope" holding the array bounds, and its size can even be 0 if the compiler optimized the object away. So in most cases "byte length" makes no sense in Ada.

Character Length

Latin_1_Str    : String           := "Hello World";
UCS_16_Str  : Wide_String  := "Hello World";
Unicode_Str  : Wide_Wide_String := "Hello World";
Latin_1_Length : constant Natural := Latin_1_Str'Length;
UCS_16_Length  : constant Natural := UCS_16_Str'Length;
Unicode_Length : constant Natural := Unicode_Str'Length;

The attribute 'Length yields the number of elements of an array. Since strings in Ada are arrays of characters, 'Length is the string length. Ada supports strings of three character types: Latin-1 (String), BMP/UCS-2 (Wide_String) and full Unicode (Wide_Wide_String). In the example above, the character length of all three strings is 11, while the sizes of the objects in bits differ.

Aime

Byte Length

length("Hello, World!")

ALGOL 68

Bits and Bytes Length

BITS bits := bits pack((TRUE, TRUE, FALSE, FALSE)); # packed array of BOOL #
BYTES bytes := bytes pack("Hello, world"); # packed array of CHAR #
print((
"BITS and BYTES are fixed width:", new line,
"bits width:", bits width, ", max bits: ", max bits, ", bits:", bits, new line,
"bytes width: ",bytes width, ", UPB:",UPB STRING(bytes), ", string:", STRING(bytes),"!", new line
))

Output:

BITS and BYTES are fixed width:
bits width:        +32, max bits: TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT, bits:TTFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
bytes width:         +32, UPB:        +32, string:Hello, world!

Character Length

STRING str := "hello, world";
INT length := UPB str;
printf(($"Length of """g""" is "g(3)l$,str,length));
 
printf(($l"STRINGS can start at -1, in which case LWB must be used:"l$));
STRING s := "abcd"[@-1];
print(("s:",s, ", LWB:", LWB s, ", UPB:",UPB s, ", LEN:",UPB s - LWB s + 1))

Output:

Length of "hello, world" is +12
STRINGS can start at -1, in which case LWB must be used:
s:abcd, LWB:         -1, UPB:         +2, LEN:         +4

Apex

 
String myString = 'abcd';
System.debug('Size of String: ' + myString.length());
 

AppleScript

Byte Length

count of "Hello World"

Mac OS X 10.5 (Leopard) includes AppleScript 2.0, which uses only Unicode (UTF-16) character strings. This example has been tested on OS X 10.8.5. A combining character has been added to the test string.

 
set inString to "Hello é̦世界"
set byteCount to 0
 
repeat with c in inString
set t to id of c
if ((count of t) > 0) then
repeat with i in t
set byteCount to byteCount + doit(i)
end repeat
else
set byteCount to byteCount + doit(t)
end if
end repeat
 
byteCount
 
on doit(cid)
set n to (cid as integer)
if n > 67108863 then -- 0x3FFFFFF
return 6
else if n > 2097151 then -- 0x1FFFFF
return 5
else if n > 65535 then -- 0xFFFF
return 4
else if n > 2047 then -- 0x07FF
return 3
else if n > 127 then -- 0x7F
return 2
else
return 1
end if
end doit
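The doit handler above maps each code point to the length of its UTF-8 encoding. Its 5- and 6-byte branches follow the original UTF-8 design. RFC 3629 restricts UTF-8 to code points up to U+10FFFF, and those never need more than 4 bytes. Below is a minimal Go sketch of the same per-code-point summation. It uses the standard library's `utf8.RuneLen` instead of hand-written thresholds. The helper name `utf8ByteCount` is hypothetical, and counting invalid code points as U+FFFD is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8ByteCount sums the UTF-8 sequence length of every code point in runes.
// utf8.RuneLen returns 1 to 4 for valid code points and -1 for surrogates or
// values above U+10FFFF; this sketch counts those as the 3-byte U+FFFD.
func utf8ByteCount(runes []rune) int {
	total := 0
	for _, r := range runes {
		n := utf8.RuneLen(r)
		if n < 0 {
			n = utf8.RuneLen(utf8.RuneError)
		}
		total += n
	}
	return total
}

func main() {
	for _, s := range []string{
		"møøse",
		"\U0001D518\U0001D52B\U0001D526\U0001D520\U0001D52C\U0001D521\U0001D522",
	} {
		// The summed length agrees with len, which measures the UTF-8 bytes directly.
		fmt.Println(utf8ByteCount([]rune(s)), len(s))
	}
}
```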

Character Length

count of "Hello World"

Or:

count "Hello World"

Applesoft BASIC

? LEN("HELLO, WORLD!")

AutoHotkey

Character Length

Msgbox % StrLen("Hello World")

Or:

String := "Hello World"
StringLen, Length, String
Msgbox % Length

AWK

Byte Length

From within any code block:

w=length("Hello, world!")      # static string example
x=length("Hello," s " world!") # dynamic string example
y=length($1) # input field example
z=length(s) # variable name example

Ad hoc program from command line:

 echo "Hello, wørld!" | awk '{print length($0)}'   # 14 with a byte-oriented awk such as mawk; gawk in a UTF-8 locale counts characters and prints 13

From executable script: (prints for every line arriving on stdin)

#!/usr/bin/awk -f
{print"The length of this line is "length($0)}

Axe

Axe supports two string encodings: a rough equivalent to ASCII, and a token-based format. These examples are for ASCII.

Byte Length

"HELLO, WORLD"→Str1
Disp length(Str1)▶Dec,i

Batch File

Byte Length

@echo off
setlocal enabledelayedexpansion
call :length %1 res
echo length of %1 is %res%
goto :eof
 
:length
set str=%~1
set cnt=0
:loop
if "%str%" equ "" (
set %2=%cnt%
goto :eof
)
set str=!str:~1!
set /a cnt = cnt + 1
goto loop

BASIC

Character Length

Works with: QBasic
Works with: Liberty BASIC
Works with: PowerBASIC version PB/CC, PB/DOS

BASIC only supports single-byte characters. The character "ø" is converted to "°" for printing to the console and length functions, but will still output to a file as "ø".

 INPUT a$
PRINT LEN(a$)

ZX Spectrum Basic

The ZX Spectrum needs line numbers:

10 INPUT a$
20 PRINT LEN(a$)

BBC BASIC

Character Length

      INPUT text$
PRINT LEN(text$)

Byte Length

      CP_ACP = 0
CP_UTF8 = &FDE9
 
textA$ = "møøse"
textW$ = " "
textU$ = " "
 
SYS "MultiByteToWideChar", CP_ACP, 0, textA$, -1, !^textW$, LEN(textW$)/2 TO nW%
SYS "WideCharToMultiByte", CP_UTF8, 0, textW$, -1, !^textU$, LEN(textU$), 0, 0
PRINT "Length in bytes (ANSI encoding) = " ; LEN(textA$)
PRINT "Length in bytes (UTF-16 encoding) = " ; 2*(nW%-1)
PRINT "Length in bytes (UTF-8 encoding) = " ; LEN($$!^textU$)

Output:

Length in bytes (ANSI encoding) = 5
Length in bytes (UTF-16 encoding) = 10
Length in bytes (UTF-8 encoding) = 7

Bracmat

The solutions work with UTF-8 encoded strings.

Byte Length

(ByteLength=
length
. @(!arg:? [?length)
& !length
);
 
out$ByteLength$𝔘𝔫𝔦𝔠𝔬𝔡𝔢

Answer:

28

Character Length

(CharacterLength=
length c
. 0:?length
& @( !arg
 :  ?
( %?c
& utf$!c:?k
& 1+!length:?length
& ~
)
 ?
)
| !length
);
 
out$CharacterLength$𝔘𝔫𝔦𝔠𝔬𝔡𝔢

Answer:

7

An improved version scans the input string character-wise rather than byte-wise, so string positions that cannot be the start of a UTF-8 character are never tried. The patterns [!p and [?p implement a ratchet mechanism. [!p marks the start of a character, and [?p remembers the end of the character, which becomes the start position of the next character.

(CharacterLength=
length c p
. 0:?length:?p
& @( !arg
 :  ?
( [!p %?c
& utf$!c:?k
& 1+!length:?length
)
([?p&~)
 ?
)
| !length
);
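The ratchet idea steps from the start of one character directly to the start of the next instead of trying every byte position. It can be sketched in Go with `utf8.DecodeRuneInString`, which reports the byte width of the character at the front of a string. This is an illustrative analogue, not a translation of the Bracmat pattern, and `countChars` is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countChars walks s one UTF-8 character at a time: each step jumps past the
// whole encoded character, so continuation bytes are never tried as start
// positions. A malformed byte decodes as utf8.RuneError with width 1.
func countChars(s string) int {
	count := 0
	for pos := 0; pos < len(s); {
		_, width := utf8.DecodeRuneInString(s[pos:])
		pos += width // the end of this character is the start of the next
		count++
	}
	return count
}

func main() {
	fmt.Println(countChars("\U0001D518\U0001D52B\U0001D526\U0001D520\U0001D52C\U0001D521\U0001D522")) // 7
}
```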

C

Byte Length

Works with: ANSI C
Works with: GCC version 3.3.3
#include <string.h>
 
int main(void)
{
const char *string = "Hello, world!";
size_t length = strlen(string);
 
return 0;
}

or by hand:

int main(void) 
{
const char *string = "Hello, world!";
size_t length = 0;
 
const char *p = string;
while (*p++ != '\0') length++;
 
return 0;
}

or (for arrays of char only)

#include <stdlib.h>
 
int main(void)
{
char s[] = "Hello, world!";
size_t length = sizeof s - 1;
 
return 0;
}

Character Length

For wide character strings. wchar_t is typically 16 bits holding UTF-16 on Windows, and 32 bits holding UTF-32 on Linux and most other Unix-like systems. wcslen counts wchar_t units:

#include <stdio.h>
#include <wchar.h>
 
int main(void)
{
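const wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
size_t length = wcslen(s); /* number of wchar_t units */
 
printf("Length in characters = %zu\n", length);
/* multiply the element count, not sizeof(s), which is the size of the pointer */
printf("Length in bytes = %zu\n", length * sizeof(wchar_t));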
 
return 0;
}

Dealing with raw multibyte string

The following code is written in UTF-8, and the environment locale is assumed to be UTF-8 too. Note that "møøse" is written directly in the source code for clarity, which is not a good idea in general. When passed NULL as its first argument, mbstowcs() stores nothing and returns the number of wide characters the string would convert to. This is the number of characters in the given string under the current locale.

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
 
int main()
{
setlocale(LC_CTYPE, "");
char moose[] = "møøse";
printf("bytes: %zu\n", sizeof(moose) - 1);
printf("chars: %d\n", (int)mbstowcs(0, moose, 0));
 
return 0;
}
Output:
bytes: 7
chars: 5

C++

Byte Length

Works with: ISO C++
Works with: g++ version 4.0.2
#include <string> // (not <string.h>!)
using std::string;
 
int main()
{
string s = "Hello, world!";
string::size_type length = s.length(); // option 1: In Characters/Bytes
string::size_type size = s.size(); // option 2: In Characters/Bytes
// In bytes same as above since sizeof(char) == 1
string::size_type bytes = s.length() * sizeof(string::value_type);
}

For wide character strings:

#include <string>
using std::wstring;
 
int main()
{
wstring s = L"\u304A\u306F\u3088\u3046";
wstring::size_type length = s.length() * sizeof(wstring::value_type); // in bytes
}

Character Length

Works with: C++98
Works with: g++ version 4.0.2

For wide character strings:

#include <string>
using std::wstring;
 
int main()
{
wstring s = L"\u304A\u306F\u3088\u3046";
wstring::size_type length = s.length();
}

For narrow character strings:

Works with: C++11
Works with: clang++ version 3.0
#include <iostream>
#include <codecvt>
int main()
{
std::string utf8 = "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9d\x84\x8b"; // U+007a, U+00df, U+6c34, U+1d10b
std::cout << "Byte length: " << utf8.size() << '\n';
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
std::cout << "Character length: " << conv.from_bytes(utf8).size() << '\n';
}
Works with: C++98
Works with: g++ version 4.1.2 20061115 (prerelease) (SUSE Linux)
#include <cwchar>  // for mbstate_t
#include <locale>
 
// give the character length for a given named locale
std::size_t char_length(std::string const& text, char const* locale_name)
{
// locales work on pointers; get length and data from string and
// then don't touch the original string any more, to avoid
// invalidating the data pointer
std::size_t len = text.length();
char const* input = text.data();
 
// get the named locale
std::locale loc(locale_name);
 
// get the conversion facet of the locale
typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_type;
cvt_type const& cvt = std::use_facet<cvt_type>(loc);
 
// allocate buffer for conversion destination
std::size_t bufsize = cvt.max_length()*len;
wchar_t* destbuf = new wchar_t[bufsize];
wchar_t* dest_end;
 
// do the conversion
mbstate_t state = mbstate_t();
cvt.in(state, input, input+len, input, destbuf, destbuf+bufsize, dest_end);
 
// determine the length of the converted sequence
std::size_t length = dest_end - destbuf;
 
// get rid of the buffer
delete[] destbuf;
 
// return the result
return length;
}

Example usage (note that the locale names are OS specific):

#include <iostream>
 
int main()
{
// Tür (German for door) in UTF8
std::cout << char_length("\x54\xc3\xbc\x72", "de_DE.utf8") << "\n"; // outputs 3
 
// Tür in ISO-8859-1
std::cout << char_length("\x54\xfc\x72", "de_DE") << "\n"; // outputs 3
}

Note that the strings are given as explicit hex sequences, so that the encoding used for the source code won't matter.

C#

Platform: .NET

Works with: C# version 1.0+

Character Length

string s = "Hello, world!";
int characterLength = s.Length; // counts UTF-16 code units: a non-BMP character counts as 2

Byte Length

Strings in .NET are stored as UTF-16. Encoding.Unicode is the UTF-16 (little-endian) encoding, so it gives the in-memory byte length.

using System.Text;
 
string s = "Hello, world!";
int byteLength = Encoding.Unicode.GetByteCount(s);

To get the number of bytes that the string would require in a different encoding, e.g., UTF8:

int utf8ByteLength = Encoding.UTF8.GetByteCount(s);

Clean

Byte Length

Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.

import StdEnv
 
strlen :: String -> Int
strlen string = size string
 
Start = strlen "Hello, world!"

Clojure

Byte Length

(def utf-8-octet-length #(-> % (.getBytes "UTF-8") count))
(map utf-8-octet-length ["møøse" "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" "J\u0332o\u0332s\u0332e\u0301\u0332"]) ; (7 28 14)
 
(def utf-16-octet-length (comp (partial * 2) count))
(map utf-16-octet-length ["møøse" "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" "J\u0332o\u0332s\u0332e\u0301\u0332"]) ; (10 28 18)
 
(def code-unit-length count)
(map code-unit-length ["møøse" "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" "J\u0332o\u0332s\u0332e\u0301\u0332"]) ; (5 14 9)

Character Length

(def character-length #(.codePointCount % 0 (count %)))
(map character-length ["møøse" "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" "J\u0332o\u0332s\u0332e\u0301\u0332"]) ; (5 7 9)

Grapheme Length

(def grapheme-length
#(->> (doto (java.text.BreakIterator/getCharacterInstance)
(.setText %))
(partial (memfn next))
repeatedly
(take-while (partial not= java.text.BreakIterator/DONE))
count))
(map grapheme-length ["møøse" "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" "J\u0332o\u0332s\u0332e\u0301\u0332"]) ; (5 7 4)

COBOL

Byte Length

FUNCTION BYTE-LENGTH(str)

Alternative, non-standard extensions:

Works with: GNU Cobol
LENGTH OF str
Works with: GNU Cobol
Works with: Visual COBOL
FUNCTION LENGTH-AN(str)

Character Length

FUNCTION LENGTH(str)

ColdFusion

Byte Length

 
<cfoutput>
<cfset str = "Hello World">
<cfset j = createObject("java","java.lang.String").init(str)>
<cfset t = j.getBytes()>
<p>#arrayLen(t)#</p>
</cfoutput>
 

Character Length

#len("Hello World")#

Common Lisp

Byte Length

In Common Lisp, there is no standard way to examine byte representations of characters, except perhaps to write a string to a file, then reopen the file as binary. However, specific implementations will have ways to do so. For example:

Works with: SBCL
(length (sb-ext:string-to-octets "Hello Wørld"))

returns 12.

Character Length

Common Lisp represents strings as sequences of characters, not bytes, so there is no ambiguity about the encoding. The length function always returns the number of characters in a string.

(length "Hello World")

returns 11, and

(length "Hello Wørld")

returns 11 too.

Component Pascal

Component Pascal's CHAR type is a 16-bit Unicode character, so each character of a string is represented by one 16-bit value.

Character Length

 
MODULE TestLen;
 
IMPORT Out;
 
PROCEDURE DoCharLength*;
VAR s: ARRAY 16 OF CHAR; len: INTEGER;
BEGIN
s := "møøse";
len := LEN(s$);
Out.String("s: "); Out.String(s); Out.Ln;
Out.String("Length of characters: "); Out.Int(len, 0); Out.Ln
END DoCharLength;
 
END TestLen.
 

In Component Pascal, the $ in LEN(s$) selects the sequence of characters up to, but not including, the terminating null character. LEN(s$) therefore returns the actual number of characters in the string rather than the number allocated for the variable.

Running the command TestLen.DoCharLength gives the following output:

s: møøse
Length of characters: 5

Byte Length

 
MODULE TestLen;
 
IMPORT Out;
 
PROCEDURE DoByteLength*;
VAR s: ARRAY 16 OF CHAR; len, v: INTEGER;
BEGIN
s := "møøse";
len := LEN(s$);
v := SIZE(CHAR) * len;
Out.String("s: "); Out.String(s); Out.Ln;
Out.String("Length of characters in bytes: "); Out.Int(v, 0); Out.Ln
END DoByteLength;
 
END TestLen.
 

Running the command TestLen.DoByteLength gives the following output:

s: møøse
Length of characters in bytes: 10

D

Byte Length

import std.stdio;
 
void showByteLen(T)(T[] str) {
writefln("Byte length: %2d - %(%02x%)",
str.length * T.sizeof, cast(ubyte[])str);
}
 
void main() {
string s1a = "møøse"; // UTF-8
showByteLen(s1a);
wstring s1b = "møøse"; // UTF-16
showByteLen(s1b);
dstring s1c = "møøse"; // UTF-32
showByteLen(s1c);
writeln();
 
string s2a = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
showByteLen(s2a);
wstring s2b = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
showByteLen(s2b);
dstring s2c = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
showByteLen(s2c);
writeln();
 
string s3a = "J̲o̲s̲é̲";
showByteLen(s3a);
wstring s3b = "J̲o̲s̲é̲";
showByteLen(s3b);
dstring s3c = "J̲o̲s̲é̲";
showByteLen(s3c);
}
Output:
Byte length:  7 - 6dc3b8c3b87365
Byte length: 10 - 6d00f800f80073006500
Byte length: 20 - 6d000000f8000000f80000007300000065000000

Byte length: 28 - f09d9498f09d94abf09d94a6f09d94a0f09d94acf09d94a1f09d94a2
Byte length: 28 - 35d818dd35d82bdd35d826dd35d820dd35d82cdd35d821dd35d822dd
Byte length: 28 - 18d501002bd5010026d5010020d501002cd5010021d5010022d50100

Byte length: 14 - 4accb26fccb273ccb265cc81ccb2
Byte length: 18 - 4a0032036f00320373003203650001033203
Byte length: 36 - 4a000000320300006f000000320300007300000032030000650000000103000032030000

Character Length

import std.stdio, std.range, std.conv;
 
void showCodePointsLen(T)(T[] str) {
writefln("Character length: %2d - %(%x %)",
str.walkLength(), cast(uint[])to!(dchar[])(str));
}
 
void main() {
string s1a = "møøse"; // UTF-8
showCodePointsLen(s1a);
wstring s1b = "møøse"; // UTF-16
showCodePointsLen(s1b);
dstring s1c = "møøse"; // UTF-32
showCodePointsLen(s1c);
writeln();
 
string s2a = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
showCodePointsLen(s2a);
wstring s2b = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
showCodePointsLen(s2b);
dstring s2c = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
showCodePointsLen(s2c);
writeln();
 
string s3a = "J̲o̲s̲é̲";
showCodePointsLen(s3a);
wstring s3b = "J̲o̲s̲é̲";
showCodePointsLen(s3b);
dstring s3c = "J̲o̲s̲é̲";
showCodePointsLen(s3c);
}
Output:
Character length:  5 - 6d f8 f8 73 65
Character length:  5 - 6d f8 f8 73 65
Character length:  5 - 6d f8 f8 73 65

Character length:  7 - 1d518 1d52b 1d526 1d520 1d52c 1d521 1d522
Character length:  7 - 1d518 1d52b 1d526 1d520 1d52c 1d521 1d522
Character length:  7 - 1d518 1d52b 1d526 1d520 1d52c 1d521 1d522

Character length:  9 - 4a 332 6f 332 73 332 65 301 332
Character length:  9 - 4a 332 6f 332 73 332 65 301 332
Character length:  9 - 4a 332 6f 332 73 332 65 301 332

Dc

Character Length

The following code outputs 5, which is the length of the string "abcde":

[abcde]Zp

Déjà Vu

Byte Length

Byte length depends on the encoding, which internally is UTF-8, but users of the language can only get at the raw bytes after encoding a string into a blob.

!. len !encode!utf-8 "møøse"
!. len !encode!utf-8 "𝔘𝔫𝔦𝔠𝔬𝔡𝔢"
Output:
7
28

Character Length

!. len "møøse"
!. len "𝔘𝔫𝔦𝔠𝔬𝔡𝔢"
Output:
5
7

E

Character Length

"Hello World".size()

Elena

Character Length

    #var s := "Hello, world!". // UTF-8 literal
#var ws := "Привет мир!"w. // UTF-16 literal
 
#var s_length := s length. // Number of UTF-8 characters
#var ws_length := ws length. // Number of UTF-16 characters
#var u_length := ws toArray length. //Number of UTF-32 characters
 

Byte Length

    #var s_byte_length := s toByteArray length. // Number of bytes
#var ws_byte_length := ws toByteArray length. // Number of bytes
 

Elixir

Byte Length

 
name = "J\x{332}o\x{332}s\x{332}e\x{301}\x{332}"
byte_size(name)
# => 14
 

Character Length

 
name = "J\x{332}o\x{332}s\x{332}e\x{301}\x{332}"
Enum.count(String.codepoints(name))
# => 9
 

Grapheme Length

 
name = "J\x{332}o\x{332}s\x{332}e\x{301}\x{332}"
String.length(name)
# => 4
 

Emacs Lisp

Character Length

(length "hello")
=> 5

Byte Length

(string-bytes "\U0001D518\U0001D52B\U0001D526")
=> 12

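string-bytes is the length of Emacs' internal representation. In Emacs 23 and later this is UTF-8; in earlier versions it was "emacs-mule". Note that "\u" takes exactly four hex digits, so characters outside the BMP must be written with "\U" and eight hex digits.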

Display Length

string-width is the displayed width of a string in the current frame and window. This is not the same as grapheme length since various Asian characters may display in 2 columns, depending on the type of tty or GUI.

(let ((str (apply 'string
(mapcar (lambda (c) (decode-char 'ucs c))
'(#x1112 #x1161 #x11ab #x1100 #x1173 #x11af)))))
(list (length str)
(string-bytes str)
(string-width str)))
=> (6 18 4) ;; in emacs 23 up

Erlang

Character Length

Strings are lists of integers in Erlang. So "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" is the list [120088,120107,120102,120096,120108,120097,120098].

9> U = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢".
[120088,120107,120102,120096,120108,120097,120098]
10> erlang:length(U).
7

Euphoria

Character Length

print(1,length("Hello World"))

F#

This is delegated to the standard .Net framework string and encoding functions.

Byte Length

open System.Text
let byte_length str = Encoding.UTF8.GetByteCount(str)

Character Length

"Hello, World".Length

Factor

Byte Length

Here are two words to compute the byte length of strings. The first one doesn't allocate new memory. The second one can easily be adapted to measure the byte length of encodings other than UTF-8.

: string-byte-length ( string -- n ) [ code-point-length ] map-sum ;
: string-byte-length-2 ( string -- n ) utf8 encode length ;

Character Length

length works on any sequence, and strings are sequences. A Factor string is a sequence of Unicode code points, so length gives the character count.

length

Fantom

Byte Length

A string can be converted into an instance of Buf to treat the string as a sequence of bytes according to a given charset: the default is UTF8, but 16-bit representations can also be used.

 
fansh> c := "møøse"
møøse
fansh> c.toBuf.size // find the byte length of the string in default (UTF8) encoding
7
fansh> c.toBuf.toHex // display UTF8 representation
6dc3b8c3b87365
fansh> c.toBuf(Charset.utf16LE).size // byte length in UTF16 little-endian
10
fansh> c.toBuf(Charset.utf16LE).toHex // display as UTF16 little-endian
6d00f800f80073006500
fansh> c.toBuf(Charset.utf16BE).size // byte length in UTF16 big-endian
10
fansh> c.toBuf(Charset.utf16BE).toHex // display as UTF16 big-endian
006d00f800f800730065
 

Character Length

 
fansh> c := "møøse"
møøse
fansh> c.size
5
 

Forth

Works with: ANS Forth

Byte Length

Strings in Forth come in two forms, neither of which are the null-terminated form commonly used in the C standard library.

Counted string

A counted string is a single pointer to a short string in memory. The string's first byte is the count of the number of characters in the string. This is how symbols are stored in a Forth dictionary.

CREATE s ," Hello world" \ create string "s"
s C@ ( -- length=11 )
s COUNT ( addr len ) \ convert to a stack string, described below

Stack string

A string on the stack is represented by a pair of cells: the address of the string data and the length of the string data (in characters). The word COUNT converts a counted string into a stack string. The STRING utility wordset of ANS Forth works on these addr-len pairs. This representation has the advantages of not requiring null-termination, easy representation of substrings, and not being limited to 255 characters.

S" string" ( addr len)
DUP . \ 6

Character Length

The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)

The following code counts the number of UTF-8 characters in a null-terminated string. It relies on the fact that every byte of a UTF-8 character except the first has the binary bit pattern "10xxxxxx".

2 base !
: utf8+ ( str -- str )
begin
char+
dup c@
11000000 and
10000000 <>
until ;
decimal
: count-utf8 ( zstr -- n )
0
begin
swap dup c@
while
utf8+
swap 1+
repeat drop ;
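The same continuation-byte rule gives a character count: count the bytes that do not match 10xxxxxx, since each such byte starts a character. Below is a minimal Go sketch. It assumes well-formed UTF-8 input; on malformed input the count is not meaningful. `countUTF8` is a hypothetical name.

```go
package main

import "fmt"

// countUTF8 counts the bytes of s that begin a UTF-8 character, i.e. every
// byte whose top two bits are not 10 (continuation bytes are 10xxxxxx).
func countUTF8(s string) int {
	n := 0
	for i := 0; i < len(s); i++ { // index loop: visits bytes, not runes
		if s[i]&0xC0 != 0x80 {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countUTF8("møøse"), len("møøse")) // 5 7
}
```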

FreeBASIC

' FB 1.05.0 Win64
 
Dim s As String = "moose" '' variable length ascii string
Dim f As String * 5 = "moose" '' fixed length ascii string (in practice a zero byte is appended)
Dim z As ZString * 6 = "moose" '' fixed length zero terminated ascii string
Dim w As WString * 6 = "møøse" '' fixed length zero terminated unicode string
 
' Variable length strings have a descriptor consisting of 3 Integers (12 bytes on 32 bit, 24 bytes on 64 bit systems)
' In order, the descriptor contains the address of the data, the memory currently used and the memory allocated
 
' In Windows, WString uses UCS-2 encoding (i.e. 2 bytes per character, surrogates are not supported)
' In Linux, WString uses UCS-4 encoding (i.e. 4 bytes per character)
 
' The Len function always returns the length of the string in characters
' The SizeOf function returns the bytes used (by the descriptor in the case of variable length strings)
 
Print "s : " ; s, "Character Length : "; Len(s), "Byte Length : "; Len(s); " (data)"
Print "s : " ; s, "Character Length : "; Len(s), "Byte Length : "; SizeOf(s); " (descriptor)"
Print "f : " ; f, "Character Length : "; Len(f), "Byte Length : "; SizeOf(f)
Print "z : " ; z, "Character Length : "; Len(z), "Byte Length : "; SizeOf(z)
Print "w : " ; w, "Character Length : "; Len(w), "Byte Length : "; SizeOf(w)
Print
Sleep
Output:
s : moose     Character Length :  5       Byte Length :  5  (data)
s : moose     Character Length :  5       Byte Length :  24 (descriptor)
f : moose     Character Length :  5       Byte Length :  6
z : moose     Character Length :  5       Byte Length :  6
w : møøse     Character Length :  5       Byte Length :  12

GAP

Length("abc");
# or same result with
Size("abc");

Gnuplot

Byte Length

print strlen("hello")
=> 5

Go

Byte Length

package main
 
import "fmt"
 
func main() {
m := "møøse"
u := "𝔘𝔫𝔦𝔠𝔬𝔡𝔢"
j := "J̲o̲s̲é̲"
fmt.Printf("%d %s % x\n", len(m), m, m)
fmt.Printf("%d %s %x\n", len(u), u, u)
fmt.Printf("%d %s % x\n", len(j), j, j)
}

Output:

7 møøse 6d c3 b8 c3 b8 73 65
28 𝔘𝔫𝔦𝔠𝔬𝔡𝔢 f09d9498f09d94abf09d94a6f09d94a0f09d94acf09d94a1f09d94a2
14 J̲o̲s̲é̲ 4a cc b2 6f cc b2 73 cc b2 65 cc 81 cc b2

Character Length

package main
 
import (
"fmt"
"unicode/utf8"
)
 
func main() {
m := "møøse"
u := "𝔘𝔫𝔦𝔠𝔬𝔡𝔢"
j := "J̲o̲s̲é̲"
fmt.Printf("%d %s %x\n", utf8.RuneCountInString(m), m, []rune(m))
fmt.Printf("%d %s %x\n", utf8.RuneCountInString(u), u, []rune(u))
fmt.Printf("%d %s %x\n", utf8.RuneCountInString(j), j, []rune(j))
}

Output:

5 møøse [6d f8 f8 73 65]
7 𝔘𝔫𝔦𝔠𝔬𝔡𝔢 [1d518 1d52b 1d526 1d520 1d52c 1d521 1d522]
9 J̲o̲s̲é̲ [4a 332 6f 332 73 332 65 301 332]

Grapheme Length

Go has no language or standard library features that recognize graphemes directly. For example, it does not provide functions implementing Unicode Standard Annex #29, Unicode Text Segmentation. It does, however, have convenient functions for recognizing Unicode character categories, so a common subset of graphemes is easy to recognize. Here is a solution that recognizes the category "Mn", which includes the combining characters used in the task example.

package main
 
import (
"fmt"
"unicode"
"unicode/utf8"
)
 
func main() {
m := "møøse"
u := "𝔘𝔫𝔦𝔠𝔬𝔡𝔢"
j := "J̲o̲s̲é̲"
fmt.Printf("%d %s %x\n", grLen(m), m, []rune(m))
fmt.Printf("%d %s %x\n", grLen(u), u, []rune(u))
fmt.Printf("%d %s %x\n", grLen(j), j, []rune(j))
}
 
func grLen(s string) int {
if len(s) == 0 {
return 0
}
gr := 1
_, s1 := utf8.DecodeRuneInString(s)
for _, r := range s[s1:] {
if !unicode.Is(unicode.Mn, r) {
gr++
}
}
return gr
}

Output:

5 møøse [6d f8 f8 73 65]
7 𝔘𝔫𝔦𝔠𝔬𝔡𝔢 [1d518 1d52b 1d526 1d520 1d52c 1d521 1d522]
4 J̲o̲s̲é̲ [4a 332 6f 332 73 332 65 301 332]

Groovy

Calculating "Byte-length" (by which one typically means "in-memory storage size in bytes") is not possible through the facilities of the Groovy language alone. Calculating "Character length" is built into the Groovy extensions to java.lang.String.

Character Length

println "Hello World!".size()

Output:

12

Note: The Java "String.length()" method also works in Groovy, but "size()" is consistent with usage in other sequential or composite types.

GW-BASIC

GW-BASIC only supports single-byte characters.

10 INPUT A$
20 PRINT LEN(A$)

Haskell

Byte Length

It is not possible to determine the "byte length" of an ordinary string, because in Haskell a String is a boxed list of Unicode characters. Each character is stored in whatever form the compiler considers most efficient for a cons cell holding a Unicode character, not as a byte.

For efficient storage of sequences of bytes, there's Data.ByteString, which uses Word8 as its base type. Byte strings have an additional Data.ByteString.Char8 interface, which truncates each Unicode Char to 8 bits as it is converted to a byte string. However, this is not adequate for the task. Truncation simply garbles characters outside Latin-1 instead of encoding them into, say, UTF-8.
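The effect of truncating each character to 8 bits, as described for Data.ByteString.Char8, can be illustrated outside Haskell. Keeping only the low byte of each code point preserves a Latin-1 character such as ø (U+00F8 becomes 0xF8). It turns U+1D518 into 0x18, a control character. Below is a minimal Go sketch, with the hypothetical helper `truncate8`, contrasting this with real UTF-8 encoding.

```go
package main

import "fmt"

// truncate8 keeps only the low 8 bits of every code point in s, mimicking a
// lossy character-to-byte conversion; it is not an encoding.
func truncate8(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		out = append(out, byte(r)) // the conversion discards the high bits
	}
	return out
}

func main() {
	for _, s := range []string{"m\u00f8\u00f8se", "\U0001D518"} {
		// truncated bytes | UTF-8 bytes
		fmt.Printf("% x | % x\n", truncate8(s), []byte(s))
	}
	// 6d f8 f8 73 65 | 6d c3 b8 c3 b8 73 65
	// 18 | f0 9d 94 98
}
```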

There are several (non-standard, so far) Unicode encoding libraries available on Hackage. As an example, we'll use encoding-0.2, as Data.Encoding:

import Data.Encoding
import Data.ByteString as B
 
strUTF8 :: ByteString
strUTF8 = encode UTF8 "Hello World!"
 
strUTF32 :: ByteString
strUTF32 = encode UTF32 "Hello World!"
 
strlenUTF8 = B.length strUTF8
strlenUTF32 = B.length strUTF32

Character Length

Works with: GHCi version 6.6
Works with: Hugs

The base type Char defined by the standard is already intended for (plain) Unicode characters.

strlen = length "Hello, world!"

HicEst

LEN("1 character == 1 byte") ! 21

Icon and Unicon

Character Length

   length := *s

Note: Neither Icon nor Unicon currently supports double-byte character sets.

IDL

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Compiler: any IDL compiler should do

length = strlen("Hello, world!")

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.
length = strlen("Hello, world!")

Io

Byte Length

"møøse" sizeInBytes

Character Length

"møøse" size

J

Byte Length

   #     'møøse'
7

Here we use the default encoding for character literals (8 bit wide literals).

Character Length

   #7 u: 'møøse'
5

Here we have used 16 bit wide character literals. See also the dictionary page for u:.

Java

Byte Length

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values.

The byte length depends on the encoding, so the simplest approach is to encode the string with an explicitly specified charset and measure the resulting byte array:

import java.nio.charset.StandardCharsets;

String s = "Hello, world!";
int byteCountUTF16 = s.getBytes(StandardCharsets.UTF_16).length;     // 28: includes a 2-byte byte-order mark (BOM)
int byteCountUTF16LE = s.getBytes(StandardCharsets.UTF_16LE).length; // 26: explicit byte order, so no BOM
int byteCountUTF8 = s.getBytes(StandardCharsets.UTF_8).length;       // 13

Character Length

Java encodes strings in UTF-16, which represents each character (code point) with one or two 16-bit code units. This is a variable-length encoding scheme. The most commonly used characters are represented by one 16-bit code unit, while rarer ones like some mathematical symbols are represented by two.

The length method of String objects is not the length of that String in characters. Instead, it only gives the number of 16-bit code units used to encode a string. This is not (always) the number of Unicode characters (code points) in the string.

String s = "Hello, world!";
int not_really_the_length = s.length(); // XXX: does not (always) count Unicode characters (code points)!

Since Java 1.5, the actual number of characters (code points) can be determined by calling the codePointCount method.

String str = "\uD834\uDD2A"; //U+1D12A
int not_really_the_length = str.length(); // value is 2, which is not the length in characters
int actual_length = str.codePointCount(0, str.length()); // value is 1, which is the length in characters

Grapheme Length

import java.text.BreakIterator;
 
public class Grapheme {
public static void main(String[] args) {
printLength("møøse");
printLength("𝔘𝔫𝔦𝔠𝔬𝔡𝔢");
printLength("J̲o̲s̲é̲");
}
 
public static void printLength(String s) {
BreakIterator it = BreakIterator.getCharacterInstance();
it.setText(s);
int count = 0;
while (it.next() != BreakIterator.DONE) {
count++;
}
System.out.println("Grapheme length: " + count+ " " + s);
}
}

Output:

Grapheme length: 5 møøse
Grapheme length: 7 𝔘𝔫𝔦𝔠𝔬𝔡𝔢
Grapheme length: 4 J̲o̲s̲é̲

JavaScript

Byte Length

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the UTF-16 byte length is twice that number.

var s = "Hello, world!";
var byteCount = s.length * 2; //26

Character Length

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

Before ECMAScript 2015, JavaScript had no built-in way to count the characters (code points) in a string. The length property counts 16-bit values, which equals the number of characters only if the string contains no characters outside the BMP. Since ECMAScript 2015, strings are iterable by code point, so Array.from(str).length gives the character count.

var str1 = "Hello, world!";
var len1 = str1.length; //13
 
var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; //2
var len3 = Array.from(str2).length; //1 (ES2015+: iterates by code point)

jq

jq strings are JSON strings and are therefore encoded as UTF-8. When given a JSON string, the length filter emits the number of Unicode codepoints that it contains:

$ cat String_length.jq
def describe:
"length of \(.) is \(length)";
 
("J̲o̲s̲é̲", "𝔘𝔫𝔦𝔠𝔬𝔡𝔢") | describe
 
$ jq -n -f String_length.jq
"length of J̲o̲s̲é̲ is 8"
"length of 𝔘𝔫𝔦𝔠𝔬𝔡𝔢 is 7"

Julia

Julia encodes strings as UTF-8, so the byte length (via sizeof) will be different from the string length (via length) only if the string contains non-ASCII characters.

Byte Length

sizeof("Hello, world!") # gives 13
sizeof("Hellö, wørld!") # gives 15

Character Length

length("Hello, world!") # gives 13
length("Hellö, wørld!") # gives 13

JudoScript

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.
//Store length of hello world in length and print it
. length = "Hello World".length();

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.
//Store length of hello world in length and print it
. length = "Hello World".length()

LabVIEW

Byte Length

LabVIEW is using a special variant of UTF-8, so byte length == character length.


Character Length

LV strlen.png


Lasso

Character Length

'Hello, world!'->size // 13
'møøse'->size // 5
'𝔘𝔫𝔦𝔠𝔬𝔡𝔢'->size // 7

Byte Length

'Hello, world!'->asBytes->size // 13
'møøse'->asBytes->size // 7
'𝔘𝔫𝔦𝔠𝔬𝔡𝔢'->asBytes->size // 28

LFE

Character Length

 
> (length "ASCII text")
10
> (length "𝔘𝔫𝔦𝔠𝔬𝔡𝔢 𝔗𝔢𝒙𝔱")
12
> (set encoded (binary ("𝔘𝔫𝔦𝔠𝔬𝔡𝔢 𝔗𝔢𝒙𝔱" utf8)))
#B(240 157 148 152 240 157 148 171 240 157 ...)
> (length (unicode:characters_to_list encoded 'utf8))
12
 

Byte Length

 
> (set encoded (binary ("𝔘𝔫𝔦𝔠𝔬𝔡𝔢 𝔗𝔢𝒙𝔱" utf8)))
#B(240 157 148 152 240 157 148 171 240 157 ...)
> (byte_size encoded)
45
> (set bytes (binary ("𝔘𝔫𝔦𝔠𝔬𝔡𝔢 𝔗𝔢𝒙𝔱")))
#B(24 43 38 32 44 33 34 32 23 34 153 49)
> (byte_size bytes)
12
> (set encoded (binary ("ASCII text" utf8)))
#B(65 83 67 73 73 32 116 101 120 116)
> (byte_size encoded)
10
 

Liberty BASIC

See BASIC

Lingo

Character Length

utf8Str = "Hello world äöü"
put utf8Str.length
-- 15

Byte Length

utf8Str = "Hello world äöü"
put bytearray(utf8Str).length
-- 18

Logo

Logo is so old that only ASCII encoding is supported. Modern versions of Logo may have enhanced character set support.

print count "|Hello World|  ; 11
print count "møøse  ; 5
print char 248  ; ø - implies ISO-Latin character set

LSE64

Byte Length

LSE stores strings as arrays of characters in 64-bit cells plus a count.

" Hello world" @ 1 + 8 * ,   # 96 = (11+1)*(size of a cell) = 12*8

Character Length

LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string.

" Hello world" @ ,   # 11

Lua

Works with: Lua version 5.0+

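In Lua, strings are sequences of bytes, so # and string.len always return the byte length, which equals the character length only for single-byte encodings. Lua 5.3 and later also provide utf8.len, which counts the code points in a UTF-8 string.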

Byte Length

str = "Hello world"
length = #str

or

str = "Hello world"
length = string.len(str)

Character Length

str = "Hello world"
length = #str

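or

str = "Hello world"
length = string.len(str)

With Lua 5.3 or later, utf8.len counts the code points of a UTF-8 string rather than its bytes:

str = "møøse"
length = utf8.len(str) -- 5, whereas #str is 7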

Maple

Character Length

length("Hello world");

Byte Length

nops(convert("Hello world",bytes));

Mathematica

Character Length

StringLength["Hello world"]

Byte Length

StringByteCount["Hello world"]

MATLAB

Character Length

>> length('møøse')
 
ans =
 
5

Byte Length

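MATLAB stores each element of a char array as a 16-bit (UTF-16) code unit, that is, 2 bytes. The expression below counts the hexadecimal digits of the character codes, which matches the UTF-16 byte length only when every character is below U+0100: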

>> numel(dec2hex('møøse'))
 
ans =
 
10

Maxima

s: "the quick brown fox jumps over the lazy dog";
slength(s);
/* 43 */

MAXScript

Character Length

"Hello world".count


Mercury

Mercury's C and Erlang backends use UTF-8 encoded strings; the Java and C# backends use the underlying UTF-16 encoding of those languages. The function string.length/1 returns the number of code units in a string in the target language's encoding. The function string.count_utf8_code_units/1 returns the number of UTF-8 code units in a string regardless of the target language.

Byte Length

:- module string_byte_length.
:- interface.
 
:- import_module io.
 
:- pred main(io::di, io::uo) is det.
 
:- implementation.
 
:- import_module list, string.
 
main(!IO) :-
Words = ["møøse", "𝔘𝔫𝔦𝔠𝔬𝔡𝔢", "J\x332\o\x332\s\x332\e\x301\\x332\"],
io.write_list(Words, "", write_length, !IO).
 
:- pred write_length(string::in, io::di, io::uo) is det.
 
write_length(String, !IO):-
NumBytes = count_utf8_code_units(String),
io.format("%s: %d bytes\n", [s(String), i(NumBytes)], !IO).

Output:

møøse: 7 bytes
𝔘𝔫𝔦𝔠𝔬𝔡𝔢: 28 bytes
J̲o̲s̲é̲: 14 bytes

Character Length

The function string.count_codepoints/1 returns the number of code points in a string.

:- module string_character_length.
:- interface.
 
:- import_module io.
 
:- pred main(io::di, io::uo) is det.
 
:- implementation.
 
:- import_module list, string.
 
main(!IO) :-
Words = ["møøse", "𝔘𝔫𝔦𝔠𝔬𝔡𝔢", "J\x332\o\x332\s\x332\e\x301\\x332\"],
io.write_list(Words, "", write_length, !IO).
 
:- pred write_length(string::in, io::di, io::uo) is det.
 
write_length(String, !IO) :-
NumChars = count_codepoints(String),
io.format("%s: %d characters\n", [s(String), i(NumChars)], !IO).

Output:

møøse: 5 characters
𝔘𝔫𝔦𝔠𝔬𝔡𝔢: 7 characters
J̲o̲s̲é̲: 9 characters

Metafont

Metafont cannot properly handle encodings other than ASCII, so it can only count the number of bytes in a string.

string s;
s := "Hello Moose";
show length(s);  % 11 (ok)
s := "Hello Møøse";
show length(s);  % 13 (number of bytes when the string is UTF-8 encoded,
 % since ø takes two bytes)

Note: in the lang tag, Møøse is Latin1-reencoded, showing up two bytes (as Latin1) instead of one

MIPS Assembly

This only supports ASCII encoding, so it'll return both byte length and char length.

 
.data
#.asciiz automatically adds the NULL terminator character, \0 for us.
string: .asciiz "Nice string you got there!"
 
.text
main:
la $a1,string #load the beginning address of the string.
li $a0,0 #clear the counter; registers are not guaranteed to start at zero
 
loop:
lb $a2,($a1) #load byte (i.e. the char) at $a1 into $a2
addi $a1,$a1,1 #increment $a1
beqz $a2,exit_procedure #see if we've hit the NULL char yet
addi $a0,$a0,1 #increment counter
j loop #back to start
 
exit_procedure:
li $v0,1 #set syscall to print integer
syscall
 
li $v0,10 #set syscall to cleanly exit EXIT_SUCCESS
syscall
 

mIRC Scripting Language

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.
alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

$utfdecode() converts a UTF-8 string to the locale encoding, replacing unrepresentable characters with question marks. Since mIRC is not yet fully Unicode aware, entering Unicode text through a dialog box will automatically convert it to ASCII.

alias utf8len { return $len($utfdecode($1)) }
alias stringlength2 {
var %name = Børje
echo -a %name is: $utf8len(%name) characters long!
}

Modula-3

Byte Length

MODULE ByteLength EXPORTS Main;
 
IMPORT IO, Fmt, Text;
 
VAR s: TEXT := "Foo bar baz";
 
BEGIN
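(* BYTESIZE(CHAR) is the size of one character; BYTESIZE(s) would be the size of the TEXT reference itself *)
IO.Put("Byte length of s: " & Fmt.Int(Text.Length(s) * BYTESIZE(CHAR)) & "\n");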
END ByteLength.

Character Length

MODULE StringLength EXPORTS Main;
 
IMPORT IO, Fmt, Text;
 
VAR s: TEXT := "Foo bar baz";
 
BEGIN
IO.Put("String length of s: " & Fmt.Int(Text.Length(s)) & "\n");
END StringLength.

NewLISP

Character Length

(set 'Str "møøse")
(println Str " is " (length Str) " characters long")

Nemerle

Both examples rely on .NET facilities, so they're almost identical to C#.

Character Length

def message = "How long am I anyways?";
def charlength = message.Length;

Byte Length

using System.Text;
 
def message = "How long am I anyways?";
def bytelength = Encoding.Unicode.GetByteCount(message);

Nim

Byte Length

var s: string = "Hello, world! ☺"
echo '"',s, '"'," has byte length: ", len(s)
 
# -> "Hello, world! ☺" has byte length: 17

Character Length

import unicode
 
var s: string = "Hello, world! ☺"
echo '"',s, '"'," has unicode char length: ", runeLen(s)
 
# -> "Hello, world! ☺" has unicode char length: 15

Oberon-2

Byte Length

MODULE Size;
 
IMPORT Out;
 
VAR s: LONGINT;
string: ARRAY 5 OF CHAR;
 
BEGIN
string := "Foo";
s := LEN(string);
Out.String("Size: ");
Out.LongInt(s,0);
Out.Ln;
END Size.

Output:

Size: 5

Character Length

MODULE Length;
 
IMPORT Out, Strings;
 
VAR l: INTEGER;
string: ARRAY 5 OF CHAR;
 
BEGIN
string := "Foo";
l := Strings.Length(string);
Out.String("Length: ");
Out.Int(l,0);
Out.Ln;
END Length.

Output:

Length: 3

Objective-C

To avoid any ambiguity about the encoding used in the string, we explicitly provide it in UTF-8 encoding. The string is "møøse" (ø encoded in UTF-8 is C3 B8 in hexadecimal).

Character Length

Objective-C encodes strings in UTF-16, which represents each character (code point) with one or two 16-bit code units. This is a variable-length encoding scheme. The most commonly used characters are represented by one 16-bit code unit, while "supplementary characters" are represented by two (called a "surrogate pair").

The length method of NSString objects is not the length of that string in characters. Instead, it only gives the number of 16-bit code units used to encode a string. This is not (always) the number of Unicode characters (code points) in the string.

// Return the length in characters
// XXX: does not (always) count Unicode characters (code points)!
unsigned int numberOfCharacters = [@"møøse" length]; // 5

Since Mac OS X 10.6, CFString has methods for converting between supplementary characters and surrogate pairs. However, the easiest way to get the number of characters is probably to encode the string in UTF-32 (a fixed-length encoding) and divide by 4:

NSString *s = @"møøse";
// An explicit byte order rules out a byte-order mark being counted
NSUInteger realCharacterCount = [s lengthOfBytesUsingEncoding: NSUTF32LittleEndianStringEncoding] / 4; // 5

Byte Length

Objective-C encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length method of NSString objects returns the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

int byteCount = [@"møøse" length] * 2; // 10

Another way to know the byte length of a string is to explicitly specify the charset we desire.

// Return the number of bytes depending on the encoding,
// here explicitly UTF-8
unsigned numberOfBytes =
[@"møøse" lengthOfBytesUsingEncoding: NSUTF8StringEncoding]; // 7

Objeck

All character string elements are 1 byte in size, so a string's byte size and length are the same.

Character Length

 
"Foo"->Size()->PrintLine();
 

Byte Length

 
"Foo"->Size()->PrintLine();
 

OCaml

In OCaml currently, characters inside the standard type string are bytes, and a single character taken alone has the same binary representation as the OCaml int (which is equivalent to a C long) which is a machine word.

For internationalization there is Camomile, a comprehensive Unicode library for OCaml. Camomile provides Unicode character type, UTF-8, UTF-16, and more...

Byte Length

Standard OCaml strings are sequences of bytes, traditionally interpreted as ASCII or ISO 8859-1, so the function String.length returns the byte length, which is also the character length in those encodings:

String.length "Hello world" ;;

Character Length

When using the UTF8 module of Camomile, String.length gives the byte length of a UTF-8 encoded string and UTF8.length gives its character length:

String.length "møøse"
UTF8.length "møøse"

Octave

s = "string";
stringlen = length(s)

This gives the number of bytes, not of characters. For example, length("è") is 2 when "è" is encoded as UTF-8.


Oforth

Oforth strings are UTF8 encoded.

The size method returns the number of UTF-8 characters in a string.

The basicSize method returns the number of bytes in a string.

OpenEdge/Progress

The codepage can be set independently for input / output and internal operations. The following examples are started from an iso8859-1 session and therefore need to use fix-codepage to adjust the string to utf-8.

Character Length

DEF VAR lcc AS LONGCHAR.
 
FIX-CODEPAGE( lcc ) = "UTF-8".
lcc = "møøse".
 
MESSAGE LENGTH( lcc ) VIEW-AS ALERT-BOX.

Byte Length

DEF VAR lcc AS LONGCHAR.
 
FIX-CODEPAGE( lcc ) = "UTF-8".
lcc = "møøse".
 
MESSAGE LENGTH( lcc, "RAW" ) VIEW-AS ALERT-BOX.

Oz

Byte Length

{Show {Length "Hello World"}}

Oz uses a single-byte encoding by default. So for normal strings, this will also show the correct character length.

PARI/GP

Character Length

Characters = bytes in Pari; the underlying strings are C strings interpreted as US-ASCII.

len(s)=#s; \\ Alternately, len(s)=length(s); or even len=length;

Byte Length

This works on objects of any sort, not just strings, and includes overhead.

len(s)=sizebyte(s);

Pascal

Byte Length

 
const
s = 'abcdef';
begin
writeln (length(s))
end.
 

Output:

6

Perl

Byte Length

Works with: Perl version 5.8

Strings in Perl consist of characters. Measuring the byte length therefore requires conversion to some binary representation (called encoding, both noun and verb).

use utf8; # so we can use literal characters like ☺ in source
use Encode qw(encode);
 
print length encode 'UTF-8', "Hello, world! ☺";
# 17. The last character takes 3 bytes, the others 1 byte each.
 
print length encode 'UTF-16', "Hello, world! ☺";
# 32: 2 bytes for the BOM, then 2 bytes for each of the 15 characters.

Character Length

Works with: Perl version 5.X
my $length = length "Hello, world!";

Grapheme Length

Since Perl 5.12, /\X/ matches an extended grapheme cluster. See "Unicode overhaul" in perl5120delta and also UAX #29.

Perl understands that "\x{1112}\x{1161}\x{11ab}\x{1100}\x{1173}\x{11af}" (한글) contains 2 graphemes, just like "\x{d55c}\x{ae00}" (한글). The longer string uses Korean combining jamo characters.

Works with: Perl version 5.12
use v5.12;
my $string = "\x{1112}\x{1161}\x{11ab}\x{1100}\x{1173}\x{11af}"; # 한글
my $len;
$len++ while ($string =~ /\X/g);
printf "Grapheme length: %d\n", $len;
Output:
Grapheme length: 2

Perl 6

Byte Length

say 'møøse'.encode('UTF-8').bytes;

Character Length

say 'møøse'.codes;

Grapheme Length

say 'møøse'.chars;

Phix

As yet there is no official support for character-wise processing of Unicode strings, but there is some incomplete code knocking about somewhere (I think the file is unicode.e).

Byte Length

constant s = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢"
?length(s)
Output:
28

PHP

Program saved as UTF-8 and run on Linux:

<?php
foreach (array('møøse', '𝔘𝔫𝔦𝔠𝔬𝔡𝔢', 'J̲o̲s̲é̲') as $s1) {
printf('String "%s" measured with strlen: %d mb_strlen: %s grapheme_strlen %s%s',
$s1, strlen($s1),mb_strlen($s1), grapheme_strlen($s1), PHP_EOL);
}
 

yields the result:

String "møøse" measured with strlen: 7 mb_strlen: 7 grapheme_strlen 5
String "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" measured with strlen: 28 mb_strlen: 28 grapheme_strlen 7
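String "J̲o̲s̲é̲" measured with strlen: 13 mb_strlen: 13 grapheme_strlen 4

Here mb_strlen was called without an encoding argument, so it used mbstring's internal encoding, which evidently was not UTF-8 on this system, and it returned the byte count. Calling mb_strlen($s1, 'UTF-8') counts code points instead.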

PicoLisp

(let Str "møøse"
(prinl "Character Length of \"" Str "\" is " (length Str))
(prinl "Byte Length of \"" Str "\" is " (size Str)) )

Output:

Character Length of "møøse" is 5
Byte Length of "møøse" is 7
-> 7

PL/I

declare WS widechar (13) initial ('Hello world.');
put ('Character length=', length (WS));
put skip list ('Byte length=', size(WS));
 
declare SM graphic (13) initial ('Hello world');
put ('Character length=', length(SM));
put skip list ('Byte length=', size(trim(SM)));

PL/SQL

LENGTH calculates length using characters as defined by the input character set. LENGTHB uses bytes instead of characters. LENGTHC uses Unicode complete characters. LENGTH2 uses UCS2 code points. LENGTH4 uses UCS4 code points.

Byte Length

DECLARE
string VARCHAR2(50) := 'Hello, world!';
stringlength NUMBER;
BEGIN
stringlength := LENGTHB(string);
END;

Character Length

DECLARE
string VARCHAR2(50) := 'Hello, world!';
stringlength NUMBER;
unicodelength NUMBER;
ucs2length NUMBER;
ucs4length NUMBER;
BEGIN
stringlength := LENGTH(string);
unicodelength := LENGTHC(string);
ucs2length := LENGTH2(string);
ucs4length := LENGTH4(string);
END;

Pop11

Byte Length

Currently Pop11 supports only strings consisting of 1-byte units. Strings can carry arbitrary binary data, so the user can, for example, store UTF-8 (however, builtin procedures will treat each byte as a single character). The length function for strings returns the length in bytes:

lvars str = 'Hello, world!';
lvars len = length(str);

PostScript

Character Length

 
(Hello World) length =
11
 

Potion

Character Length

"møøse" length print
"𝔘𝔫𝔦𝔠𝔬𝔡𝔢" length print
"J̲o̲s̲é̲" length print

PowerShell

Character Length

$s = "Hëlló Wørłð"
$s.Length

Byte Length

Translation of: C#

For UTF-16, which is the default in .NET and therefore PowerShell:

$s = "Hëlló Wørłð"
[System.Text.Encoding]::Unicode.GetByteCount($s)

For UTF-8:

[System.Text.Encoding]::UTF8.GetByteCount($s)

PureBasic

Character Length

 a = Len("Hello World") ;a will be 11

Byte Length

StringByteLength() returns the number of bytes required to store the string in memory in the given format. 'Format' can be #PB_Ascii, #PB_UTF8 or #PB_Unicode. PureBasic code can be compiled using either Unicode (2-byte) or ASCII (1-byte) encodings for strings. If 'Format' is not specified, the mode of the executable (Unicode or ASCII) is used.

Note: The number of bytes returned does not include the terminating Null-Character of the string. The size of the Null-Character is 1 byte for Ascii and UTF8 mode and 2 bytes for Unicode mode.

a = StringByteLength("ä", #PB_UTF8)    ;a will be 2
b = StringByteLength("ä", #PB_Ascii) ;b will be 1
c = StringByteLength("ä", #PB_Unicode) ;c will be 2
 

Python

2.x

In Python 2.x, there are two types of strings: regular (8-bit) strings, and Unicode strings. Unicode string literals are prefixed with "u".

Byte Length

Works with: Python version 2.x

For 8-bit strings, the byte length is the same as the character length:

print len('ascii')
# 5

For Unicode strings, the byte length depends on the encoding you choose. Since version 2.2, Python can be built to store either 2 or 4 bytes per character internally, but that representation is not visible to the user, so encode the string explicitly and measure the result:

# The letter Alef
print len(u'\u05d0'.encode('utf-8'))
# 2
print len(u'\u05d0'.encode('iso-8859-8'))
# 1

Example from the problem statement:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
s = u"møøse"
assert len(s) == 5
assert len(s.encode('UTF-8')) == 7
assert len(s.encode('UTF-16-BE')) == 10 # There are 3 different UTF-16 encodings: LE and BE are little endian and big endian respectively, the third one (without suffix) adds 2 extra leading bytes: the byte-order mark (BOM).

Character Length

Works with: Python version 2.4

len() returns the number of code units (not code points!) in a Unicode string, or the number of bytes in a plain 8-bit string. On a wide build, the number of code units equals the number of code points, but on a narrow build it does not. Most Linux distributions install the wide build by default; you can check the build at runtime with:

import sys
sys.maxunicode # 1114111 on a wide build, 65535 on a narrow build

To get the character length of an encoded string, you have to decode it first:

print len('ascii')
# 5
print len(u'\u05d0') # the letter Alef as unicode literal
# 1
print len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string
# 1
print hex(sys.maxunicode), len(u'\U0001F4A9')
# 0x10ffff 1

On a narrow build, len() gives the wrong answer for non-BMP chars

print hex(sys.maxunicode), len(u'\U0001F4A9') # unichr() would raise ValueError on a narrow build
# 0xffff 2

3.x

In Python 3.x, strings are Unicode strings, and a bytes type is available for storing an immutable sequence of bytes (there is also a mutable bytearray type).

Byte Length

You can use len() to get the length of a byte sequence.

print(len(b'Hello, World!'))
# 13

To get a byte sequence from a string, you have to encode it with the desired encoding:

# The letter Alef
print(len('\u05d0'.encode())) # the default encoding is utf-8 in Python3
# 2
print(len('\u05d0'.encode('iso-8859-8')))
# 1
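Because every byte that continues a multi-byte UTF-8 sequence has the bit pattern 10xxxxxx, the character count can also be read directly from the encoded bytes without decoding them. The following is a minimal sketch that assumes the input is valid UTF-8. The function name utf8_code_point_count is only illustrative, and malformed input is not detected.

```python
def utf8_code_point_count(data: bytes) -> int:
    """Count code points in valid UTF-8 bytes without decoding them.

    Each code point has exactly one lead byte; continuation bytes all
    match the bit pattern 10xxxxxx. Malformed input is not detected.
    """
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


fraktur = "\U0001D518\U0001D52B\U0001D526\U0001D520\U0001D52C\U0001D521\U0001D522"  # 𝔘𝔫𝔦𝔠𝔬𝔡𝔢
for text in ("møøse", fraktur):
    encoded = text.encode("utf-8")
    print(len(encoded), utf8_code_point_count(encoded))
# 7 5
# 28 7
```

Each line shows the UTF-8 byte length followed by the character count.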

Example from the problem statement:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
s = "møøse"
assert len(s) == 5
assert len(s.encode('UTF-8')) == 7
assert len(s.encode('UTF-16-BE')) == 10 # There are 3 different UTF-16 encodings: LE and BE are little endian and big endian respectively, the third one (without suffix) adds 2 extra leading bytes: the byte-order mark (BOM).
u="𝔘𝔫𝔦𝔠𝔬𝔡𝔢"
assert len(u.encode()) == 28
assert len(u.encode('UTF-16-BE')) == 28
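The task statement says that 𝔘𝔫𝔦𝔠𝔬𝔡𝔢 is 7 characters but 14 UTF-16 code units. A small sketch can show both counts side by side. It also recovers the code-point count from raw UTF-16 code units, which is the computation needed when text arrives as UTF-16 (from Java or JavaScript, say, or on a narrow Python build). The helpers utf16_code_units and code_points_from_utf16 are hypothetical names, and well-formed UTF-16 is assumed.

```python
def utf16_code_units(text: str) -> list:
    """Split text into UTF-16 code units (ints in 0..0xFFFF)."""
    data = text.encode("utf-16-le")  # explicit byte order, so no BOM
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_points_from_utf16(units) -> int:
    """Count code points in a sequence of well-formed UTF-16 code units.

    A non-BMP code point is a surrogate pair (high 0xD800-0xDBFF followed
    by low 0xDC00-0xDFFF), so every code point contributes exactly one
    unit that is not a low surrogate.
    """
    return sum(1 for unit in units if not 0xDC00 <= unit <= 0xDFFF)


examples = [
    "møøse",
    "\U0001D518\U0001D52B\U0001D526\U0001D520\U0001D52C\U0001D521\U0001D522",  # 𝔘𝔫𝔦𝔠𝔬𝔡𝔢
    "J\u0332o\u0332s\u0332e\u0301\u0332",  # J̲o̲s̲é̲ with a decomposed é
]
for s in examples:
    units = utf16_code_units(s)
    # code points, UTF-16 code units, code points recovered from units, UTF-8 bytes
    print(len(s), len(units), code_points_from_utf16(units), len(s.encode("utf-8")))
# 5 5 5 7
# 7 14 7 28
# 9 9 9 14
```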

Character Length

Since Python 3.3 (PEP 393), the internal storage of a string depends on its largest code point. Strings whose code points all lie in the Latin-1 range are stored with 8 bits per character. Strings whose code points all lie in the BMP are stored with 16 bits per character (UCS-2), and all others use 32 bits per character (UCS-4).

Thus Python avoids memory overhead for strings of only ASCII or Latin-1 characters while still handling all Unicode code points correctly. len() returns the number of characters (code points):

print(len("𝔘𝔫𝔦𝔠𝔬𝔡𝔢")) 
# 7
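The per-character storage described above can be observed on CPython 3.3 or later with sys.getsizeof. Lengthening a string by one character grows its reported size by 1, 2 or 4 bytes, depending on the widest code point it holds. This is a CPython implementation detail, not a language guarantee, and other implementations may report different sizes or none at all, so treat the following as an illustrative sketch. bytes_per_char is a hypothetical helper.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Bytes CPython adds per extra copy of the single character ch.

    Under PEP 393 the storage width is 1, 2 or 4 bytes per character,
    chosen from the largest code point in the string; the fixed header
    size cancels out when the two sizes are subtracted.
    """
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


for ch in ("a", "\u00f8", "\u05d0", "\U0001D518"):
    print(f"U+{ord(ch):04X}", bytes_per_char(ch))
# On CPython 3.3 or later this prints:
# U+0061 1
# U+00F8 1
# U+05D0 2
# U+1D518 4
```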

Up to Python 3.2, however, the result depended on the build option: Python used either 2 or 4 bytes per character internally.

len() returned the number of code units in a string, which could differ from the number of characters. On a narrow build this is not a reliable way to get the number of characters; you can only easily count code points on a wide build. Most Linux distributions installed the wide build by default; you can check the build at runtime with:

import sys
sys.maxunicode # 1114111 on a wide build, 65535 on a narrow build
print(len('ascii'))
# 5
print(len('\u05d0')) # the letter Alef as unicode literal
# 1

To get the length of an encoded byte sequence, you have to decode it first:

print(len(b'\xd7\x90'.decode('utf-8'))) # Alef encoded as utf-8 byte sequence
# 1
print(hex(sys.maxunicode), len('\U0001F4A9'))
# 0x10ffff 1

On a narrow build, len() gives the wrong answer for non-BMP chars

print(hex(sys.maxunicode), len('\U0001F4A9'))
# 0xffff 2
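Python's standard library does not segment text into grapheme clusters (UAX #29). For strings like J̲o̲s̲é̲, where every extra code point is a combining mark, a rough approximation is to count only the code points whose canonical combining class is 0, using unicodedata.combining. This is a sketch, not a correct grapheme count in general: it miscounts Hangul conjoining jamo, emoji sequences and similar cases. The helper approx_grapheme_count is hypothetical. When a full implementation is needed, the third-party regex module, whose \X pattern matches an extended grapheme cluster, is one option.

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Approximate grapheme count that skips combining marks.

    Counts code points whose canonical combining class is 0, which
    attaches every combining mark to the character before it. This is
    NOT full UAX #29 segmentation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


jose = "J\u0332o\u0332s\u0332e\u0301\u0332"  # J̲o̲s̲é̲, decomposed
print(len(jose), approx_grapheme_count(jose))  # 9 4
```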

R

Byte Length

a <- "m\u00f8\u00f8se"
print(nchar(a, type="bytes")) # print 7

Character Length

print(nchar(a, type="chars"))  # print 5

Racket

Using this definition:

(define str "J\u0332o\u0332s\u0332e\u0301\u0332")

on the REPL, we get the following:

Character Length

-> (printf "str has ~a characters" (string-length str))
str has 9 characters

Byte Length

-> (printf "str has ~a bytes in utf-8" (bytes-length (string->bytes/utf-8 str)))
str has 14 bytes in utf-8

REBOL

Rebol 2 does not natively support UCS (Unicode), so character and byte length are the same. See utf-8.r for an external UTF-8 library.

Rebol 3 natively supports UTF-8.

Byte Length

;; r2
length? "møøse"
 
;; r3
length? to-binary "møøse"

Character Length

;; r3
length? "møøse"

Retro

Byte Length

"møøse" getLength putn

Character Length

Retro does not have built-in support for Unicode, but counting of characters can be done with a small amount of effort.

chain: UTF8'
{{
 : utf+ ( $-$ )
[ 1+ dup @ %11000000 and %10000000 = ] while ;
 
 : count ( $-$ )
0 !here
repeat dup @ 0; drop utf+ here ++ again ;
---reveal---
 : getLength ( $-n )
count drop @here ;
}}
;chain
 
"møøse" ^UTF8'getLength putn

REXX

Classic REXX doesn't support Unicode, so character and byte length are the same. All characters (in strings) are stored as 8-bit bytes. Indeed, everything in REXX is stored as character strings.

Byte Length

/*REXX program displays the lengths  (in bytes/characters)  for various strings.        */
/* 1 */ /*a handy-dandy over/under scale.*/
/* 123456789012345 */
hello = 'Hello, world!'  ; say 'the length of HELLO is ' length(hello)
happy = 'Hello, world! ☺'  ; say 'the length of HAPPY is ' length(happy)
jose = 'José'  ; say 'the length of JOSE is ' length(jose)
nill = ''  ; say 'the length of NILL is ' length(nill)
null =  ; say 'the length of NULL is ' length(null)
sum = 5+1  ; say 'the length of SUM is ' length(sum)
/* [↑] is, of course, 6. */
/*stick a fork in it, we're done.*/

output

length of HELLO is  13
length of HAPPY is  15
length of  JOSE is  4
length of  NILL is  0
length of  NULL is  0
length of   SUM is  1

Ring

 
aString = "Welcome to the Ring Programming Language"
aStringSize = len(aString)
see "Character length: " + aStringSize
 

Ruby

Byte Length

Since Ruby 1.8.7, String#bytesize is the byte length.

Works with: Ruby version 1.8.7 or 1.9
# -*- coding: utf-8 -*-
 
puts "あいうえお".bytesize
# => 15

Character Length

Since Ruby 1.9, String#length (alias String#size) is the character length. The magic comment, "coding: utf-8", sets the encoding of all string literals in this file.

Works with: Ruby version 1.9
# -*- coding: utf-8 -*-
 
puts "あいうえお".length
# => 5
 
puts "あいうえお".size # alias for length
# => 5

Code Set Independence

The next examples show the byte length and character length of "møøse" in different encodings.

To run these programs, you must convert them to different encodings.

  • If you use Emacs: Paste each program into Emacs. The magic comment, like -*- coding: iso-8859-1 -*-, will tell Emacs to save with that encoding.
  • If your text editor saves UTF-8: Convert the file before running it. For example:
    $ ruby -pe '$_.encode!("iso-8859-1", "utf-8")' scratch.rb | ruby
Works with: Ruby version 1.9
# -*- coding: iso-8859-1 -*-
s = "møøse"
puts "Byte length: %d" % s.bytesize
puts "Character length: %d" % s.length

Output:

Byte length: 5
Character length: 5

# -*- coding: utf-8 -*-
s = "møøse"
puts "Byte length: %d" % s.bytesize
puts "Character length: %d" % s.length

Output:

Byte length: 7
Character length: 5

# -*- coding: gb18030 -*-
s = "møøse"
puts "Byte length: %d" % s.bytesize
puts "Character length: %d" % s.length

Output:

Byte length: 11
Character length: 5

Ruby 1.8

The next example works with both Ruby 1.8 and Ruby 1.9. In Ruby 1.8, strings have no encodings, and String#length is the byte length. In Ruby 1.8, regular expressions understand three multibyte encodings, selected by a flag:

  • /./n uses no multibyte encoding.
  • /./e uses EUC-JP.
  • /./s uses Shift-JIS or Windows-31J.
  • /./u uses UTF-8.

Then either string.scan(/./u).size or string.gsub(/./u, ' ').size counts the UTF-8 characters in string.

# -*- coding: utf-8 -*-
 
class String
  # Define String#bytesize for Ruby 1.8.6.
  unless method_defined?(:bytesize)
    alias bytesize length
  end
end
 
s = "文字化け"
puts "Byte length: %d" % s.bytesize
puts "Character length: %d" % s.gsub(/./u, ' ').size
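The regular-expression trick above counts UTF-8 characters without decoding the string. A minimal Python sketch of the same idea at the byte level (an illustration, not Ruby's implementation) relies on the fact that every code point in well-formed UTF-8 starts with exactly one byte that is not a continuation byte (10xxxxxx). The function name utf8_char_count is hypothetical, and the sketch does not validate its input.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points in well-formed UTF-8 data.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    starts a new code point, so counting them counts the code points.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


data = "文字化け".encode("utf-8")
print("Byte length:", len(data))                   # 12
print("Character length:", utf8_char_count(data))  # 4
```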

Run BASIC[edit]

input a$
print len(a$)

SAS[edit]

data _null_;
a="Hello, World!";
b=length(a);
put _all_;
run;

Scheme[edit]

Byte Length[edit]

Works with: Gauche version 0.8.7 [utf-8,pthreads]

string-size is a Gauche-specific function.

(string-size "Hello world")
Works with: PLT Scheme version 4.2.4
(bytes-length #"Hello world")

Character Length[edit]

Works with: Gauche version 0.8.7 [utf-8,pthreads]

string-length is a standard procedure defined in R5RS and R6RS.

  (string-length "Hello world")

Seed7[edit]

Character Length[edit]

length("Hello, world!")

SETL[edit]

Character Length[edit]

print(# "Hello, world!"); -- '#' is the cardinality operator. Works on strings, tuples, and sets.

Sidef[edit]

var str = "J\x{332}o\x{332}s\x{332}e\x{301}\x{332}";

Byte Length[edit]

UTF-8 byte length (default):

say str.bytes.len;       #=> 14

UTF-16 byte length (including the 2-byte byte-order mark that the UTF-16 encoding prepends):

say str.encode('UTF-16').bytes.len;      #=> 20

Character Length[edit]

say str.chars.len;    #=> 9

Grapheme Length[edit]

say str.graphs.len;   #=> 4
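For comparison, a Python 3 sketch (an addition, not Sidef) reproduces the same figures for this string. Python's "utf-16" codec also prepends a 2-byte byte-order mark, which accounts for 20 bytes instead of 18. Python's standard library has no Unicode (UAX #29) grapheme segmentation, so the grapheme count below is only a rough approximation that counts code points which are not combining marks. It happens to work for "J̲o̲s̲é̲" but not for cases such as emoji ZWJ sequences or Hangul jamo.

```python
import unicodedata

s = "J\u0332o\u0332s\u0332e\u0301\u0332"

utf8_bytes = len(s.encode("utf-8"))          # 14
utf16_bytes_bom = len(s.encode("utf-16"))    # 20: 2-byte BOM + 9 code units * 2
utf16_bytes = len(s.encode("utf-16-le"))     # 18: same units, no BOM
code_points = len(s)                         # 9

# Rough approximation only: treat every non-combining code point as the
# start of a new grapheme (NOT full UAX #29 segmentation).
approx_graphemes = sum(1 for c in s if unicodedata.combining(c) == 0)  # 4

print(utf8_bytes, utf16_bytes_bom, utf16_bytes, code_points, approx_graphemes)
```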

Scala[edit]

Library: Scala
 
object StringLength extends App {
  val s1 = "møøse"
  val s3 = List("\uD835\uDD18", "\uD835\uDD2B", "\uD835\uDD26",
    "\uD835\uDD20", "\uD835\uDD2C", "\uD835\uDD21", "\uD835\uDD22").mkString
  val s4 = "J\u0332o\u0332s\u0332e\u0301\u0332"

  List(s1, s3, s4).foreach { s =>
    // s.length counts UTF-16 code units; codePointCount counts code points,
    // so a non-BMP character (a surrogate pair) is counted once.
    val chars = s.codePointCount(0, s.length)
    val utf8 = s.getBytes("UTF-8").length
    val utf16 = s.getBytes("UTF-16LE").length
    println(s"The string: $s, characterlength= $chars UTF8bytes= $utf8 UTF16bytes= $utf16")
  }
}

Output:
The string: møøse, characterlength= 5 UTF8bytes= 7 UTF16bytes= 10
The string: 𝔘𝔫𝔦𝔠𝔬𝔡𝔢, characterlength= 7 UTF8bytes= 28 UTF16bytes= 28
The string: J̲o̲s̲é̲, characterlength= 9 UTF8bytes= 14 UTF16bytes= 18
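On the JVM a String is a sequence of UTF-16 code units, which is why String.length reports 14 for "𝔘𝔫𝔦𝔠𝔬𝔡𝔢". The following Python sketch (an illustration, not Java's implementation) shows how code points can be counted from raw UTF-16 code units. In well-formed UTF-16, every code point contributes exactly one unit that is not a low (trailing) surrogate. The helper names utf16_code_units and count_code_points are hypothetical.

```python
import struct


def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of ints (like a Java char[])."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def count_code_points(units):
    """Count code points in well-formed UTF-16 code units.

    A low surrogate (0xDC00..0xDFFF) only completes a pair that a high
    surrogate already started, so it is not counted separately.
    """
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


s3 = "\U0001D518\U0001D52B\U0001D526\U0001D520\U0001D52C\U0001D521\U0001D522"
units = utf16_code_units(s3)
print(len(units), count_code_points(units))  # 14 7
```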

Slate[edit]

'Hello, world!' length.

Smalltalk[edit]

Byte Length[edit]

string := 'Hello, world!'.
string size.

Character Length[edit]

In GNU Smalltalk:

string := 'Hello, world!'.
string numberOfCharacters.

This requires loading the Iconv package:

PackageLoader fileInPackage: 'Iconv'

SNOBOL4[edit]

Byte Length[edit]

 
output = "Byte length: " size(trim(input))
end
 

Character Length[edit]

As far as I know, this example works only with CSNOBOL4 by Phil Budne.

 
-include "utf.sno"
output = "Char length: " utfsize(trim(input))
end
 

Sparkling[edit]

Byte length[edit]

spn:1> sizeof "Hello, wørld!"
= 14

SQL[edit]

Byte length[edit]

SELECT LENGTH(CAST('møøse' AS BLOB));

Character length[edit]

SELECT LENGTH('møøse');
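The entry does not name a SQL dialect. Assuming SQLite, where LENGTH returns the number of characters for text and the number of bytes for a BLOB, both queries can be tried from Python's built-in sqlite3 module. An in-memory database uses UTF-8 as its text encoding by default, so the cast yields the UTF-8 bytes.

```python
import sqlite3

# Assumption: SQLite semantics (other databases spell these functions differently).
conn = sqlite3.connect(":memory:")
byte_len, char_len = conn.execute(
    "SELECT LENGTH(CAST('møøse' AS BLOB)), LENGTH('møøse')"
).fetchone()
conn.close()

print(byte_len, char_len)  # 7 5
```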

Standard ML[edit]

Byte Length[edit]

Works with: SML/NJ version 110.60
Works with: Moscow ML version 2.01
Works with: MLton version 20061107
val strlen = size "Hello, world!";

Character Length[edit]

Works with: SML/NJ version 110.74
val strlen = UTF8.size "Hello, world!";

Swift[edit]

Grapheme Length[edit]

Swift has a concept of "character" that goes beyond Unicode code points. A Character is a "Unicode grapheme cluster", which can consist of one or more Unicode code points.

To count "characters" (Unicode grapheme clusters):

Works with: Swift version 2.x
let numberOfCharacters = "møøse".characters.count  // 5
Works with: Swift version 1.2
let numberOfCharacters = count("møøse")            // 5
Works with: Swift version 1.0-1.1
let numberOfCharacters = countElements("møøse")    // 5

Character Length[edit]

To count Unicode code points:

Works with: Swift version 2.x
let numberOfCodePoints = "møøse".unicodeScalars.count           // 5
Works with: Swift version 1.2
let numberOfCodePoints = count("møøse".unicodeScalars)          // 5
Works with: Swift version 1.0-1.1
let numberOfCodePoints = countElements("møøse".unicodeScalars)  // 5

Byte Length[edit]

This depends on which encoding you want to use.

For length in UTF-8, count the number of UTF-8 code units:

Works with: Swift version 2.x
let numberOfBytesUTF8 = "møøse".utf8.count           // 7
Works with: Swift version 1.2
let numberOfBytesUTF8 = count("møøse".utf8)          // 7
Works with: Swift version 1.0-1.1
let numberOfBytesUTF8 = countElements("møøse".utf8)  // 7

For length in UTF-16, count the number of UTF-16 code units, and multiply by 2:

Works with: Swift version 2.x
let numberOfBytesUTF16 = "møøse".utf16.count * 2           // 10
Works with: Swift version 1.2
let numberOfBytesUTF16 = count("møøse".utf16) * 2          // 10
Works with: Swift version 1.0-1.1
let numberOfBytesUTF16 = countElements("møøse".utf16) * 2  // 10
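Multiplying by 2 works because every UTF-16 code unit is 16 bits. The number of code units depends on the code points: one for each code point in the BMP and a surrogate pair for each code point above U+FFFF. A small Python sketch of that rule (the function name utf16_unit_count is hypothetical) also checks it against Python's UTF-16LE encoder:

```python
def utf16_unit_count(s):
    # BMP code points (U+0000..U+FFFF) need one 16-bit code unit;
    # supplementary code points (U+10000..U+10FFFF) need a surrogate pair.
    return sum(2 if ord(c) > 0xFFFF else 1 for c in s)


examples = [
    "møøse",
    "\U0001D518\U0001D52B\U0001D526\U0001D520\U0001D52C\U0001D521\U0001D522",
]
for s in examples:
    units = utf16_unit_count(s)
    print(s, units, units * 2)  # code units, then byte length in UTF-16
```

For "møøse" this gives 5 units (10 bytes), and for "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" it gives 14 units (28 bytes).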

Tcl[edit]

Byte Length[edit]

Formally, Tcl does not guarantee any particular internal representation for its strings (the underlying implementation objects can hold strings in at least three different formats, converting between them as necessary), so a "byte length" can only be computed with respect to some user-selected encoding. For UTF-8, it is done like this:

string length [encoding convertto utf-8 $theString]

Thus, we have these examples:

set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
set enc utf-8
puts [format "length of \"%s\" in bytes is %d" \
$s1 [string length [encoding convertto $enc $s1]]]
puts [format "length of \"%s\" in bytes is %d" \
$s2 [string length [encoding convertto $enc $s2]]]
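To illustrate why the byte length only makes sense relative to an encoding, here is a Python sketch (not Tcl) that measures the same two strings in several encodings. The codec names are Python's, and the table name byte_lengths is illustrative.

```python
s1 = "hello, world"
s2 = "\u304A\u306F\u3088\u3046"  # おはよう (Hiragana)

# encoding name -> (bytes for s1, bytes for s2)
byte_lengths = {
    enc: (len(s1.encode(enc)), len(s2.encode(enc)))
    for enc in ("utf-8", "utf-16-le", "euc_jp", "shift_jis")
}
for enc, (n1, n2) in byte_lengths.items():
    print(f"{enc}: {n1} bytes, {n2} bytes")
```

The ASCII string is 12 bytes everywhere except UTF-16, and the four Hiragana take 3 bytes each in UTF-8 but 2 bytes each in the other three encodings.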

Character Length[edit]

Basic version:

string length "Hello, world!"

Or, more elaborately (works with any Tcl 8.x interpreter; tested on 8.4.12):

fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in characters is %d" $s1 [string length $s1]]
puts [format "length of \"%s\" in characters is %d" $s2 [string length $s2]]

TI-89 BASIC[edit]

The TI-89 uses a fixed 8-bit encoding, so there is no difference between character length and byte length.

■ dim("møøse")              5

Toka[edit]

Byte Length[edit]

" hello, world!" string.getLength

Trith[edit]

Character Length[edit]

"møøse" length

Byte Length[edit]

"møøse" size

TUSCRIPT[edit]

Character Length[edit]

 
$$ MODE TUSCRIPT
string="hello, world"
l=LENGTH (string)
PRINT "character length of string '",string,"': ",l
 

Output:

character length of string 'hello, world': 12

UNIX Shell[edit]

Byte Length[edit]

With external utility:[edit]

Works with: Bourne Shell
string='Hello, world!'
length=`expr "x$string" : '.*' - 1`
echo $length # if you want it printed to the terminal

With SUSv3 parameter expansion modifier:[edit]

Works with: Almquist SHell
Works with: Bourne Again SHell version 3.2
Works with: pdksh version 5.2.14 99/07/13.2
Works with: Z SHell
string='Hello, world!'
length="${#string}"
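echo $length # if you want it printed to the terminal

In a multibyte locale such as UTF-8, Bash and Z shell count characters rather than bytes with ${#string}; run the expansion under the C locale (for example, LC_ALL=C) to get the byte length.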

Vala[edit]

Character Length[edit]

 
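string s = "Hello, world!";
int characterLength = s.char_count(); // number of Unicode characters
int byteLength = s.length;            // string.length is the length in bytes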
 

VBA[edit]

Cf. VBScript (below).

VBScript[edit]

Byte Length[edit]

LenB(string|varname)

Returns the number of bytes required to store a string in memory. Returns null if string|varname is null.

Character Length[edit]

Len(string|varname)

Returns the number of characters in string|varname. Returns null if string|varname is null.

x86 Assembly[edit]

Byte Length[edit]

The following code uses AT&T syntax and was tested using AS (the portable GNU assembler) under Linux.

 
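.data
string: .asciz "Test"

.text
.globl main

main:
        pushl   %ebp
        movl    %esp, %ebp

        pushl   %edi            # %edi is callee-saved
        xorb    %al, %al        # byte to search for: the NUL terminator
        movl    $-1, %ecx       # maximum count: effectively unlimited
        movl    $string, %edi   # start of the string
        cld                     # scan forward
        repne   scasb           # decrement %ecx per byte until NUL is found
        not     %ecx            # %ecx = bytes scanned, including the NUL
        dec     %ecx            # exclude the NUL terminator
        popl    %edi

        # string length is stored in the %ecx register
        movl    %ecx, %eax      # also return it as the exit status

        leave
        ret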
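The not/dec pair works because %ecx starts at -1 and repne scasb decrements it once for every byte it examines, including the terminating NUL. This Python sketch (an illustration with a hypothetical function name, not generated from the assembly) mimics that 32-bit register arithmetic:

```python
MASK32 = 0xFFFFFFFF


def repne_scasb_length(buf):
    """Mimic the %ecx arithmetic of the AT&T assembly routine above."""
    ecx = -1 & MASK32                # movl $-1, %ecx
    i = 0
    while True:                      # repne scasb
        found = buf[i] == 0          # compare the byte with %al (0)
        ecx = (ecx - 1) & MASK32     # every iteration decrements %ecx
        i += 1
        if found or ecx == 0:        # stop on a match or when %ecx runs out
            break
    ecx = ~ecx & MASK32              # not %ecx: bytes scanned, including NUL
    ecx = (ecx - 1) & MASK32         # dec %ecx: drop the NUL terminator
    return ecx


print(repne_scasb_length(b"Test\0"))  # 4
```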
 

XPL0[edit]

include c:\cxpl\stdlib;
IntOut(0, StrLen("Character length = Byte length = String length = "))

Output:

49

XSLT[edit]

Character Length[edit]

<?xml version="1.0" encoding="UTF-8"?>
...
<xsl:value-of select="string-length('møøse')" />   <!-- 5 -->

xTalk[edit]

Works with: HyperCard

Byte Length[edit]

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.
put the length of "Hello World"

or

put the number of characters in "Hello World"

Character Length[edit]

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.
put the length of "Hello World"

or

put the number of characters in "Hello World"

Wren[edit]

Byte Length[edit]

System.print("møøse".bytes.count)
System.print("𝔘𝔫𝔦𝔠𝔬𝔡𝔢".bytes.count)
System.print("J̲o̲s̲é̲".bytes.count)
 

Character Length[edit]

System.print("møøse".count)
System.print("𝔘𝔫𝔦𝔠𝔬𝔡𝔢".count)
System.print("J̲o̲s̲é̲".count)
 

Yorick[edit]

Character Length[edit]

strlen("Hello, world!")

zkl[edit]

Although zkl handles 8-bit ASCII natively, it also understands UTF-8 sequences of up to 6 bytes. In a string, "\uabcd" (a 2-byte hex value) and "\U???;" (an n-byte hex value terminated by a semicolon) encode Unicode characters as UTF-8. It does not handle UTF-16 or 4-byte (UTF-32) Unicode. It also knows about the invalid spots in UTF-8 (such as overlong encodings).

If your terminal or editor handles UTF-8 (mine doesn't), you don't need the escapes: just put the Unicode characters in quotes (i.e. the editor inserts the UTF-8 bytes, which are non-zero).
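The "up to 6 bytes" refers to the original UTF-8 definition (RFC 2279). RFC 3629 later limited UTF-8 to 4 bytes (U+10FFFF). The lead byte of each sequence tells you how long the sequence is. This Python sketch (not zkl; the function names are hypothetical) walks a byte string using that rule. It does not check continuation bytes or reject overlong forms.

```python
def utf8_sequence_length(lead):
    """Length of a UTF-8 sequence, given its lead byte (RFC 2279 forms included)."""
    if lead < 0x80:
        return 1                     # 0xxxxxxx
    if 0xC0 <= lead < 0xE0:
        return 2                     # 110xxxxx
    if 0xE0 <= lead < 0xF0:
        return 3                     # 1110xxxx
    if 0xF0 <= lead < 0xF8:
        return 4                     # 11110xxx (RFC 3629 maximum)
    if 0xF8 <= lead < 0xFC:
        return 5                     # 111110xx (RFC 2279 only)
    if 0xFC <= lead < 0xFE:
        return 6                     # 1111110x (RFC 2279 only)
    raise ValueError(f"0x{lead:02X} is not a valid lead byte")


def count_utf8_sequences(data):
    """Count characters by jumping from one lead byte to the next."""
    i = count = 0
    while i < len(data):
        i += utf8_sequence_length(data[i])
        count += 1
    return count


print(count_utf8_sequences("\ufeff\u00A2 \u20ac".encode("utf-8")))  # 4
```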

Byte Length[edit]

"abc".len() //-->3
"\ufeff\u00A2 \u20ac".len() //-->9
Data(0,Int,"\ufeff\u00A2 \u20ac") //-->Data(9) (bytes)
"J\u0332o\u0332s\u0332e\u0301\u0332".len() //-->14
"\U1D518;\U1D52B;\U1D526;\U1D520;\U1D52C;\U1D521;\U1D522;".len() //-->28

Character Length[edit]

With len(8), UTF-8 characters (code points) are counted; modifiers such as the combining underline are counted as separate characters.

"abc".len(8) //-->3
"\ufeff\u00A2 \u20ac".len(8) //-->4 "BOM¢ €"
"\U1000;".len(8) //-->Exception thrown: ValueError(Invalid UTF-8 string)
"\uD800" //-->SyntaxError : Line 2: Bad Unicode constant (\uD800-\uDFFF)
"J\u0332o\u0332s\u0332e\u0301\u0332".len(8) //-->9 "J̲o̲s̲é̲"
"\U1D518;\U1D52B;\U1D526;\U1D520;\U1D52C;\U1D521;\U1D522;".len(8) //-->7 "𝔘𝔫𝔦𝔠𝔬𝔡𝔢"

https://en.wikipedia.org/wiki/Comparison_of_programming_languages_%28string_functions%29#length