String Character Length

{{task}}
{{Template:split-review}}
In this task, the goal is to find the <em>character</em> length of a string. This means encodings like [[UTF-8]] need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters.
 
For byte length, see [[String Byte Length]].
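
As a rough illustration of the distinction, here is a minimal sketch in Python 3 (the choice of language and the variable names are assumptions made for this example, not part of the task). It shows the same text having different character and byte counts depending on the encoding:

```python
# Illustrative sketch (Python 3): character count vs. byte count.
text = "møøse"  # 5 characters, two of them non-ASCII

char_length = len(text)                       # counts Unicode code points
utf8_length = len(text.encode("utf-8"))       # counts bytes in UTF-8
utf16_length = len(text.encode("utf-16-le"))  # counts bytes in UTF-16 (no BOM)

print(char_length, utf8_length, utf16_length)
```

The character count stays the same whichever encoding is chosen, while the byte counts differ.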
 
=={{header|ActionScript}}==
myStrVar.length
 
=={{header|Ada}}==
'''Compiler:''' GCC 4.1.2
 
Str : String := "Hello World";
Length : constant Natural := Str'Length;
 
=={{header|ALGOL 68}}==
STRING str := "hello, world";
INT length := UPB str;
printf(($"Length of """g""" is "g(3)$,str,length))
Result:
Length of "hello, world" is +12
 
=={{header|AppleScript}}==
count of "Hello World"
Or:
count "Hello World"
 
=={{header|C}}==
'''Standard:''' [[ANSI C]] (AKA [[C89]]):
 
'''Compiler:''' GCC 3.3.3
 
#include <string.h>
int main(void)
{
const char *string = "Hello, world!";
size_t length = strlen(string);
return 0;
}
 
or by hand:
 
int main(void)
{
const char *string = "Hello, world!";
size_t length = 0;
const char *p = string;
while (*p++ != '\0') length++;
return 0;
}
 
or (for arrays of char only)
 
#include <stdlib.h>
int main(void)
{
char const s[] = "Hello, world!";
size_t length = sizeof s - 1;
return 0;
}
 
For wide character strings (usually Unicode):
 
#include <stdio.h>
#include <wchar.h>
int main(void)
{
const wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
size_t length;
length = wcslen(s);
printf("Length in characters = %lu\n", (unsigned long) length);
printf("Length in bytes = %lu\n", (unsigned long) (length * sizeof(wchar_t)));
return 0;
}
 
=={{header|Objective-C}}==
// Return the length in UTF-16 code units (equal to the character count for text within the Basic Multilingual Plane)
unsigned length = [@"Hello World!" length];
 
=={{header|C++}}==
 
'''Standard:''' [[ISO C plus plus|ISO C++]] (AKA [[C plus plus 98|C++98]]):
 
'''Compiler:''' g++ 4.0.2
 
#include <string> // note: '''not''' <string.h>
int main()
{
std::string s = "Hello, world!";
// length() and size() both count char elements, i.e. bytes, since sizeof(char) == 1
std::string::size_type length = s.length(); // option 1: In Characters/Bytes
std::string::size_type size = s.size(); // option 2: In Characters/Bytes
}
 
For wide character strings:
 
#include <string>
int main()
{
std::wstring s = L"\u304A\u306F\u3088\u3046";
std::wstring::size_type length = s.length();
}
 
=={{header|C sharp|C#}}==
'''Platform:''' [[.NET]]
'''Language Version:''' 1.0+
 
string s = "Hello, world!";
int clength = s.Length; // In characters (UTF-16 code units)
int blength = System.Text.Encoding.UTF8.GetBytes(s).Length; // In bytes (UTF-8 encoding)
 
=={{header|Clean}}==
 
Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.
 
import StdEnv
strlen :: String -> Int
strlen string = size string
Start = strlen "Hello, world!"
 
=={{header|ColdFusion}}==
#len("Hello World")#
 
=={{header|Common Lisp}}==
(length "Hello World")
 
=={{header|Component Pascal}}==
LEN("Hello, World!")
 
=={{header|E}}==
"Hello World".size()
 
=={{header|Forth}}==
The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)
 
'''Interpreter:''' ANS Forth
 
The following code counts the number of UTF-8 characters in a null-terminated string. It relies on the fact that every byte of a UTF-8 character except the first has the binary bit pattern "10xxxxxx".
 
2 base !
: utf8+ ( str -- str )
begin
char+
dup c@
11000000 and
10000000 <>
until ;
decimal
: count-utf8 ( zstr -- n )
0
begin
swap dup c@
while
utf8+
swap 1+
repeat drop ;
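
The same counting rule can be sketched in Python 3 for comparison. This is an illustrative assumption rather than part of the Forth solution: the function name count_utf8 is hypothetical, and the input is assumed to be well-formed UTF-8 (no validation is attempted).

```python
# Illustrative sketch (Python 3), not part of the Forth solution.
def count_utf8(data: bytes) -> int:
    """Count characters in UTF-8 data by skipping continuation bytes.

    Assumes `data` is well-formed UTF-8; no validation is done.
    """
    # A continuation byte has the bit pattern 10xxxxxx, i.e.
    # (byte & 0b11000000) == 0b10000000. Every other byte starts a character.
    return sum(1 for byte in data if (byte & 0b11000000) != 0b10000000)


# Four hiragana characters, each three bytes long in UTF-8.
print(count_utf8("\u304A\u306F\u3088\u3046".encode("utf-8")))
```

Counting the bytes that are not continuation bytes is equivalent to the Forth loop, which advances past one lead byte and all of its continuation bytes per character.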
 
=={{header|Haskell}}==
'''Interpreter:''' [[GHC | GHCi]] 6.6, [[Hugs]]
 
'''Compiler:''' [[GHC]] 6.6
 
The base type ''Char'' defined by the standard is already intended for (plain) Unicode characters.
 
strlen = length "Hello, world!"
 
=={{header|IDL}}==
'''Compiler:''' any IDL compiler should do
 
length = strlen("Hello, world!")
 
=={{header|Java}}==
 
Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
 
The length method of String objects gives the number of 16-bit values used to encode a string.
String s = "Hello, world!";
int length = s.length();
 
Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method.
String str = "\uD834\uDD2A"; //U+1D12A
int length1 = str.length(); //2
int length2 = str.codePointCount(0, str.length()); //1
 
=={{header|JavaScript}}==
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
 
JavaScript has no built-in way to determine how many characters are in a string. However, if the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.
var str1 = "Hello, world!";
var len1 = str1.length; //13
var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; //2
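
One possible way to recover the character count from the 16-bit values is to skip the second half of each surrogate pair. The following is a minimal sketch in Python 3 rather than JavaScript (an assumption made for this illustration); the function name count_code_points is hypothetical, and the input is assumed to be well-formed UTF-16.

```python
# Illustrative sketch (Python 3): count characters in a sequence of UTF-16
# code units, as a JavaScript string stores them.
def count_code_points(units):
    """Count code points in well-formed UTF-16 code units.

    Each surrogate pair (high 0xD800-0xDBFF followed by low 0xDC00-0xDFFF)
    encodes one character, so low surrogates are not counted.
    """
    return sum(1 for unit in units if not 0xDC00 <= unit <= 0xDFFF)


print(count_code_points([0xD834, 0xDD2A]))  # the U+1D12A example above
```

For strings made only of single 16-bit values the result equals the number of 16-bit values, matching the behaviour described above.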
 
=={{header|JudoScript}}==
//Store length of hello world in length and print it
. length = "Hello World".length();
 
=={{header|LSE64}}==
LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string.
" Hello world" @ , # 11
 
=={{header|Lua}}==
 
'''Interpreter:''' [[Lua]] 5.1 or later.
 
str = "Hello world"
length = #str -- number of bytes; equals the character count only for single-byte text
 
=={{header|MAXScript}}==
"Hello world".count
 
=={{header|mIRC Scripting Language}}==
'''Interpreter:''' [[mIRC]]
 
alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }
 
=={{header|OCaml}}==
'''Interpreter'''/'''Compiler:''' [[Ocaml]] 3.09
 
String.length "Hello world";;
 
 
=={{header|Perl}}==
'''Interpreter:''' [[Perl]] any 5.X
 
my $length = length "Hello, world!";
 
=={{header|PHP}}==
$length = strlen('Hello, world!');
 
=={{header|PL/SQL|PL/SQL}}==
DECLARE
string VARCHAR2( 50 ) := 'Hello, world!';
stringlength NUMBER;
BEGIN
stringlength := length( string );
END;
 
=={{header|Python}}==
'''Interpreter:''' [[Python]] 2.4
 
len() returns the number of characters in a unicode string or a plain ASCII string. To get the character length of an encoded byte string, decode it first:
<pre>
>>> len('ascii')
5
>>> len(u'\u05d0') # the letter Alef as unicode literal
1
>>> len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string
1
</pre>
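
For readers on Python 3, where string literals are Unicode by default and encoded data is held in a separate bytes type, a roughly equivalent sketch might look like this (the variable names are made up for the illustration):

```python
# Python 3 sketch of the interactive session above.
alef = "\u05d0"        # str literals are Unicode in Python 3
encoded = b"\xd7\x90"  # the same letter encoded as UTF-8 bytes

print(len("ascii"))                  # characters in a plain ASCII string
print(len(alef))                     # characters in a Unicode string
print(len(encoded))                  # bytes, not characters
print(len(encoded.decode("utf-8")))  # decode first to count characters
```

Calling len() on the bytes object gives the byte length, so the decode step is still what yields the character length.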
 
=={{header|Ruby}}==
'''Library:''' [[active_support]]
 
require 'active_support'
puts "Hello World".chars.length
 
=={{header|Scheme}}==
(string-length "Hello world")
 
=={{header|Seed7}}==
length("Hello, world!")
 
=={{header|Smalltalk}}==
string := 'Hello, world!'.
string size.
 
=={{header|Standard ML}}==
'''Interpreter:''' [[Standard ML of New Jersey | SML/NJ]] 110.60, [[Moscow ML]] 2.01 (January 2004)
 
'''Compiler:''' [[MLton]] 20061107
 
val strlen = size "Hello, world!";
 
=={{header|Tcl}}==
Basic version:
 
string length "Hello, world!"
 
A more elaborate version ('''Interpreter:''' any 8.X; tested on 8.4.12):
 
fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in characters is %d" $s1 [string length $s1]]
puts [format "length of \"%s\" in characters is %d" $s2 [string length $s2]]
 
=={{header|UNIX Shell}}==
With external utilities:
 
'''Interpreter:''' any [[Bourne Shell]]
 
string='Hello, world!'
length=`printf '%s' "$string" | wc -m | tr -dc '0-9'`
echo $length # if you want it printed to the terminal
 
With SUSv3 parameter expansion modifier:
 
'''Interpreter:''' [[Almquist SHell]] (NetBSD 3.0), [[Bourne Again SHell]] 3.2, [[Korn SHell]] (5.2.14 99/07/13.2), [[Z SHell]]
 
string='Hello, world!'
length="${#string}"
echo $length # if you want it printed to the terminal
 
 
=={{header|VBScript}}==
Len(string|varname)
 
Returns the length of the string|varname
Returns null if string|varname is null
 
=={{header|XSLT}}==
<?xml version="1.0" encoding="UTF-8"?>
...
<xsl:value-of select="string-length('møøse')" /> <!-- 5 -->
 
=={{header|xTalk}}==
'''Interpreter:''' HyperCard
 
put the length of "Hello World"
 
or
 
put the number of characters in "Hello World"