String Byte Length: Difference between revisions

From Rosetta Code
#REDIRECT [[String length]]
{{Template:split-review}}
{{task}}

In this task, the goal is to find the <em>byte</em> length of a string. This means encodings like [[UTF-8]] may need to be handled specially, as there is not necessarily a one-to-one relationship between bytes and characters, and some languages recognize this. For example, the character length of "møøse" is 5 but the byte length is 7 in UTF-8 and 10 in UTF-16.
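The counts quoted above can be checked with a short Python 3 sketch. It is not part of the task itself; it assumes Python 3, where `str` holds Unicode code points and `encode()` produces the bytes. The `utf-16-le` codec is chosen here so that no byte-order mark is added to the count.

```python
# "møøse", written with escapes so the file's own encoding does not matter.
text = "m\u00f8\u00f8se"

char_length = len(text)                        # number of code points
utf8_length = len(text.encode("utf-8"))        # each ø takes 2 bytes in UTF-8
utf16_length = len(text.encode("utf-16-le"))   # 2 bytes per character, no BOM

print(char_length, utf8_length, utf16_length)  # 5 7 10
```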

For character length, see [[String Character Length]].

=={{header|4D}}==
$length:=Length("Hello, world!")

=={{header|ActionScript}}==
myStrVar.length

=={{header|Ada}}==
'''Compiler:''' GCC 4.1.2

Str : String := "Hello World";
Length : constant Natural := Str'Size / System.Storage_Unit;

The 'Size attribute returns the size of an object in bits. System.Storage_Unit is the number of bits in a storage element (a byte on most machines).

=={{header|AppleScript}}==
count of "Hello World"

=={{header|AWK}}==
From within any code block:
w=length("Hello, world!") # static string example
x=length("Hello," s " world!") # dynamic string example
y=length($1) # input field example
z=length(s) # variable name example
Ad hoc program from command line:
echo "Hello, wørld!" | awk '{print length($0)}' # 14 when awk counts bytes (ø is 2 bytes in UTF-8); a locale-aware awk may print 13
From executable script: (prints for every line arriving on stdin)
#!/usr/bin/awk -f
{print"The length of this line is "length($0)}

=={{header|C}}==
'''Standard:''' [[ANSI C]] (AKA [[C89]]):

'''Compiler:''' GCC 3.3.3

#include <string.h>
int main(void)
{
const char *string = "Hello, world!";
size_t length = strlen(string);
return 0;
}

or by hand:

int main(void)
{
const char *string = "Hello, world!";
size_t length = 0;
const char *p = string; /* no cast needed; keeps the const qualifier */
while (*p++ != '\0') length++;
return 0;
}

or (for arrays of char only)

#include <stdlib.h>
int main(void)
{
char const s[] = "Hello, world!";
size_t length = sizeof s - 1;
return 0;
}

=={{header|C++}}==

'''Standard:''' [[ISO C plus plus|ISO C++]] (AKA [[C plus plus 98|C++98]]):

'''Compiler:''' g++ 4.0.2

#include <string> // note: '''not''' <string.h>
int main()
{
std::string s = "Hello, world!";
std::string::size_type length = s.length(); // option 1: In Characters/Bytes
std::string::size_type size = s.size(); // option 2: In Characters/Bytes
// In bytes same as above since sizeof(char) == 1
std::string::size_type bytes = s.length() * sizeof(std::string::value_type);
}

For wide character strings:

#include <string>
int main()
{
std::wstring s = L"\u304A\u306F\u3088\u3046";
std::wstring::size_type length = s.length() * sizeof(std::wstring::value_type); // in bytes
}

=={{header|C sharp|C#}}==
'''Platform:''' [[.NET]]
'''Language Version:''' 1.0+

string s = "Hello, world!";
int clength = s.Length; // In characters
int blength = System.Text.Encoding.UTF8.GetBytes(s).Length; // In bytes, here for UTF-8.

=={{header|Clean}}==
Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.

import StdEnv
strlen :: String -> Int
strlen string = size string
Start = strlen "Hello, world!"

=={{header|ColdFusion}}==
#len("Hello World")#

=={{header|Common Lisp}}==
(length "Hello World")

=={{header|Component Pascal}}==
LEN("Hello, World!")

=={{header|Forth}}==
'''Interpreter:''' ANS Forth

Strings in Forth come in two forms, neither of which is the null-terminated form commonly used in the C standard library.

===Counted string===
A counted string is a single pointer to a short string in memory. The string's first byte is the count of the number of characters in the string. This is how symbols are stored in a Forth dictionary.

CREATE s ," Hello world" \ create string "s"
s C@ ( -- length=11 )
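As a rough illustration of the counted-string layout described above, here is a minimal Python 3 sketch. The helper names `make_counted` and `counted_length` are hypothetical and exist only for this example; they model a buffer whose first byte is the count, assuming ASCII data and a one-byte count.

```python
def make_counted(text: str) -> bytes:
    """Build a counted string: one count byte followed by the data (hypothetical helper)."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte count cannot exceed 255")
    return bytes([len(data)]) + data


def counted_length(counted: bytes) -> int:
    """Read the length from the first byte, much as C@ does in the Forth example."""
    return counted[0]


s = make_counted("Hello world")
print(counted_length(s))  # 11
print(len(s))             # 12: the count byte plus 11 data bytes
```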

===Stack string===
A string on the stack is represented by a pair of cells: the address of the string data and the length of the string data (in characters). The word '''COUNT''' converts a counted string into a stack string. The STRING utility wordset of ANS Forth works on these addr-len pairs. This representation has the advantages of not requiring null-termination, easy representation of substrings, and not being limited to 255 characters.

S" string" ( addr len)
DUP . \ 6

=={{header|Haskell}}==

It is not possible to determine the "byte length" of an ordinary string, because in Haskell a string is a boxed list of Unicode characters. Each character in a string is therefore stored in whatever form the compiler considers the most efficient representation of a cons-cell and a Unicode character, not as a byte.

For efficient storage of sequences of bytes, there's ''Data.ByteString'', which uses ''Word8'' as a base type. Byte strings have an additional ''Data.ByteString.Char8'' interface, which will truncate each Unicode ''Char'' to 8 bits as soon as it is converted to a byte string. However, this is not adequate for the task, because simple truncation will garble characters outside Latin-1 instead of encoding them into, say, UTF-8.
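The difference between truncating and encoding can be seen in a small Python 3 sketch. This is only an illustration of the general point, not of the Haskell libraries themselves: the masking with `0xFF` is an assumed stand-in for the 8-bit truncation described above.

```python
# "møøse ☺", written with escapes; the last character lies outside Latin-1.
text = "m\u00f8\u00f8se \u263a"

# Truncation: keep only the low 8 bits of every code point.
truncated = bytes(ord(ch) & 0xFF for ch in text)

# Proper encoding: variable-length UTF-8.
encoded = text.encode("utf-8")

print(len(truncated))                       # 7, one byte per character
print(len(encoded))                         # 11
print(truncated.decode("latin-1") == text)  # False, U+263A was cut down to 0x3A
print(encoded.decode("utf-8") == text)      # True, the round trip is lossless
```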

There are several (non-standard, so far) Unicode encoding libraries available on [http://hackage.haskell.org/ Hackage]. As an example, we'll use [http://hackage.haskell.org/packages/archive/encoding/0.2/doc/html/Data-Encoding.html encoding-0.2], as ''Data.Encoding'':

import Data.Encoding
import Data.ByteString as B
strUTF8 :: ByteString
strUTF8 = encode UTF8 "Hello World!"
strUTF32 :: ByteString
strUTF32 = encode UTF32 "Hello World!"
strlenUTF8 = B.length strUTF8
strlenUTF32 = B.length strUTF32

=={{header|IDL}}==
'''Compiler:''' any IDL compiler should do

length = strlen("Hello, world!")

=={{header|Java}}==

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length method of String objects returns the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

String s = "Hello, world!";
int byteCount = s.length() * 2;

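Another way to find the byte length of a string is to specify the desired charset explicitly. Note that Java's "UTF-16" charset writes a 2-byte byte-order mark when encoding, so the first count below is 28 rather than 26.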

String s = "Hello, world!";
int byteCountUTF16 = s.getBytes("UTF-16").length;
int byteCountUTF8 = s.getBytes("UTF-8").length;
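The relationship between 16-bit code units and bytes can be cross-checked outside Java with a Python 3 sketch. The function name `utf16_code_units` is hypothetical; it counts the 16-bit values that UTF-16 needs for a string, which is the quantity a UTF-16 string length reports. The example also shows the 2 extra bytes a byte-order mark adds when the plain `utf-16` codec is used.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units UTF-16 needs for text (hypothetical helper)."""
    return len(text.encode("utf-16-le")) // 2


ascii_text = "Hello, world!"
astral_text = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

print(utf16_code_units(ascii_text) * 2)  # 26 bytes, one code unit per character
print(len(astral_text))                  # 1 character ...
print(utf16_code_units(astral_text))     # ... but 2 code units (a surrogate pair)
print(len(ascii_text.encode("utf-16")))  # 28, the codec prepends a 2-byte BOM
```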

=={{header|JavaScript}}==
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

var s = "Hello, world!";
var byteCount = s.length * 2; //26

=={{header|JudoScript}}==
//Store length of hello world in length and print it
. length = "Hello World".length();

=={{header|LSE64}}==
LSE stores strings as arrays of characters in 64-bit cells plus a count.
" Hello world" @ 1 + 8 * , # 96 = (11+1)*(size of a cell) = 12*8

=={{header|Lua}}==
'''Interpreter:''' [[Lua]] 5.0 or later.

string="Hello world"
length=#string

=={{header|mIRC Scripting Language}}==
'''Interpreter:''' [[mIRC]]

alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

=={{header|OCaml}}==
'''Interpreter'''/'''Compiler:''' [[Ocaml]] 3.09

String.length "Hello world";;


=={{header|Perl}}==
'''Interpreter:''' [[perl]] 5.8

Strings in Perl consist of characters. Measuring the byte length therefore requires conversion to some binary representation (called encoding, both noun and verb).

use utf8; # so we can use literal characters like ☺ in source
use Encode qw(encode);
print length encode 'UTF-8', "Hello, world! ☺";
# 17. The last character takes 3 bytes, the others 1 byte each.
print length encode 'UTF-16', "Hello, world! ☺";
# 32. 2 bytes for the BOM, then one byte pair for each of the 15 characters.

=={{header|PHP}}==
$length = strlen('Hello, world!');

=={{header|PL/SQL|PL/SQL}}==
DECLARE
string VARCHAR2( 50 ) := 'Hello, world!';
stringlength NUMBER;
BEGIN
stringlength := length( string );
END;

=={{header|Pop11}}==
Currently Pop11 supports only strings consisting of 1-byte units. Strings can carry arbitrary binary data, so the user can, for example, use UTF-8 (however, built-in procedures will treat each byte as a single character). The length function for strings returns the length in bytes:

lvars str = 'Hello, world!';
lvars len = length(str);

=={{header|Python}}==
'''Interpreter:''' [[Python]] 2.x

Byte length depends on the encoding. Python uses 2 or 4 bytes per character internally for unicode strings, depending on how it was built. The internal representation is not relevant to the user; what matters is the length of the string after encoding.

# The letter Alef
>>> len(u'\u05d0'.encode('utf-8'))
2
>>> len(u'\u05d0'.encode('iso-8859-8'))
1

=={{header|Ruby}}==
string="Hello world"
print string.length
or
puts "Hello World".length

=={{header|Scheme}}==
(string-length "Hello world")

=={{header|Smalltalk}}==
string := 'Hello, world!'.
string size.

=={{header|Standard ML}}==
'''Interpreter:''' [[Standard ML of New Jersey | SML/NJ]] 110.60, [[Moscow ML]] 2.01 (January 2004)

'''Compiler:''' [[MLton]] 20061107

val strlen = size "Hello, world!";

=={{header|Tcl}}==
Basic version:

string bytelength "Hello, world!"

or, more elaborately (requires any 8.x '''Interpreter'''; tested on 8.4.12):

fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in bytes is %d" $s1 [string bytelength $s1]]
puts [format "length of \"%s\" in bytes is %d" $s2 [string bytelength $s2]]

=={{header|Toka}}==
" hello, world!" string.getLength

=={{header|UNIX Shell}}==
With external utilities:

'''Interpreter:''' any Bourne shell

string='Hello, world!'
length=`printf '%s' "$string" | wc -c | tr -dc '0-9'` # printf avoids the non-portable echo -n
echo $length # if you want it printed to the terminal

With SUSv3 parameter expansion modifier:

'''Interpreter:''' [[Almquist SHell]] (NetBSD 3.0), [[Bourne Again SHell]] 3.2, [[Korn SHell]] (5.2.14 99/07/13.2), [[Z SHell]]

string='Hello, world!'
length="${#string}"
echo $length # if you want it printed to the terminal


=={{header|VBScript}}==
LenB(string|varname)

Returns the number of bytes required to store a string in memory.
Returns null if string|varname is null.

=={{header|xTalk}}==
'''Interpreter:''' HyperCard

put the length of "Hello World"

or

put the number of characters in "Hello World"

Latest revision as of 19:32, 19 January 2008
