String Character Length: Difference between revisions
#REDIRECT [[String length]]

{{task}}
{{Template:split-review}}

In this task, the goal is to find the <em>character</em> length of a string. This means encodings like [[UTF-8]] need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters.

For byte length, see [[String Byte Length]].
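
The gap between bytes and characters can be illustrated with a minimal Python 3 sketch. The helper names <code>char_length</code> and <code>utf8_byte_length</code> are made up for this illustration, and UTF-8 is assumed as the storage encoding.

```python
# Illustrative sketch (Python 3): characters versus UTF-8 bytes.
# Both helper names are hypothetical, chosen only for this example.

def char_length(text: str) -> int:
    """Return the number of Unicode code points in text."""
    return len(text)


def utf8_byte_length(text: str) -> int:
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for sample in ("Hello, world!", "\u304a\u306f\u3088\u3046"):
        print(char_length(sample), utf8_byte_length(sample))
```

For pure ASCII text the two counts agree; for the hiragana sample each character takes three bytes in UTF-8, so the counts differ.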

=={{header|ActionScript}}==
 myStrVar.length

=={{header|Ada}}==
'''Compiler:''' GCC 4.1.2
 Str    : String := "Hello World";
 Length : constant Natural := Str'Length;

=={{header|AppleScript}}==
 count of "Hello World"
Or:
 count "Hello World"

=={{header|AWK}}==
From within any code block:
 w=length("Hello, world!")      # static string example
 x=length("Hello," s " world!") # dynamic string example
 y=length($1)                   # input field example
 z=length(s)                    # variable name example
Ad hoc program from command line:
 echo "Hello, world!" | awk '{print length($0)}'
From executable script: (prints for every line arriving on stdin)
 #!/usr/bin/awk -f
 {print"The length of this line is "length($0)}

=={{header|C}}==
'''Standard:''' [[ANSI C]] (AKA [[C89]]):

'''Compiler:''' GCC 3.3.3
 #include <string.h>
 int main(void)
 {
   const char *string = "Hello, world!";
   size_t length = strlen(string);
   return 0;
 }
or by hand:
 #include <stddef.h> /* for size_t */
 int main(void)
 {
   const char *string = "Hello, world!";
   size_t length = 0;
   const char *p = string; /* no cast needed; the string is only read */
   while (*p++ != '\0') length++;
   return 0;
 }
or (for arrays of char only)
 #include <stdlib.h>
 int main(void)
 {
   char const s[] = "Hello, world!";
   size_t length = sizeof s - 1;
   return 0;
 }
For wide character strings (usually Unicode):
 #include <stdio.h>
 #include <wchar.h>
 int main(void)
 {
   const wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
   size_t length;
   length = wcslen(s);
   /* size_t is cast to unsigned long because C89 printf has no size_t format */
   printf("Length in characters = %lu\n", (unsigned long) length);
   /* sizeof(s) would only give the size of the pointer, so multiply the length instead */
   printf("Length in bytes      = %lu\n", (unsigned long) (length * sizeof(wchar_t)));
   return 0;
 }

=={{header|Objective-C}}==
 // Return the length in UTF-16 code units (equal to the character count for text in the Basic Multilingual Plane)
 unsigned length = [@"Hello World!" length];

=={{header|C++}}==
'''Standard:''' [[ISO C plus plus|ISO C++]] (AKA [[C plus plus 98|C++98]]):

'''Compiler:''' g++ 4.0.2
 #include <string> // note: '''not''' <string.h>
 int main()
 {
   std::string s = "Hello, world!";
   // length() and size() both count char elements, i.e. bytes, since sizeof(char) == 1.
   // A multi-byte UTF-8 character is therefore counted once per byte.
   std::string::size_type length = s.length(); // option 1: In Characters/Bytes
   std::string::size_type size = s.size();     // option 2: In Characters/Bytes
 }
For wide character strings:
 #include <string>
 int main()
 {
   std::wstring s = L"\u304A\u306F\u3088\u3046";
   std::wstring::size_type length = s.length();
 }

=={{header|C sharp|C#}}==
'''Platform:''' [[.NET]]

'''Language Version:''' 1.0+
 string s = "Hello, world!";
 int clength = s.Length; // In characters (UTF-16 code units)
 int blength = System.Text.Encoding.UTF8.GetBytes(s).Length; // In bytes, when encoded as UTF-8

=={{header|Clean}}==
Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.
 import StdEnv
 strlen :: String -> Int
 strlen string = size string
 Start = strlen "Hello, world!"

=={{header|ColdFusion}}==
 #len("Hello World")#

=={{header|Common Lisp}}==
 (length "Hello World")

=={{header|Component Pascal}}==
 LEN("Hello, World!")

=={{header|E}}==
 "Hello World".size()

=={{header|Forth}}==
The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)

'''Interpreter:''' ANS Forth

The following code counts the number of UTF-8 characters in a null-terminated string. It relies on the fact that every byte of a UTF-8 character except the first has the binary bit pattern "10xxxxxx".
 2 base !
 : utf8+ ( str -- str )
   begin
     char+
     dup c@
     11000000 and
     10000000 <>
   until ;
 decimal
 : count-utf8 ( zstr -- n )
   0
   begin
     swap dup c@
   while
     utf8+
     swap 1+
   repeat drop ;
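
The same continuation-byte idea can be sketched in Python 3 for comparison. This is not a translation of the Forth words: it takes a <code>bytes</code> object rather than a null-terminated string, the function name <code>count_utf8_chars</code> is made up for this illustration, and the input is assumed to be well-formed UTF-8.

```python
# Illustrative sketch (Python 3): count UTF-8 characters by counting every
# byte that is NOT a continuation byte (bit pattern 10xxxxxx).
# Assumes well-formed UTF-8; malformed input is not detected.

def count_utf8_chars(data: bytes) -> int:
    """Return the number of characters encoded in UTF-8 data."""
    return sum(1 for byte in data if (byte & 0b11000000) != 0b10000000)


if __name__ == "__main__":
    encoded = "\u304a\u306f\u3088\u3046".encode("utf-8")
    print(len(encoded), count_utf8_chars(encoded))
```

Counting lead bytes instead of skipping continuation bytes one character at a time gives the same result and avoids any look-ahead.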

=={{header|Haskell}}==
'''Interpreter:''' [[GHC | GHCi]] 6.6, [[Hugs]]

'''Compiler:''' [[GHC]] 6.6
 strlen = length "Hello, world!"

=={{header|IDL}}==
'''Compiler:''' any IDL compiler should do
 length = strlen("Hello, world!")

=={{header|Java}}==
Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

The length method of String objects gives the number of 16-bit values used to encode a string.
 String s = "Hello, world!";
 int length = s.length();
Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method.
 String str = "\uD834\uDD2A"; //U+1D12A
 int length1 = str.length(); //2
 int length2 = str.codePointCount(0, str.length()); //1
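
The difference between the two counts can be reproduced outside Java with a small Python 3 sketch. The function names are made up for this illustration; encoding to UTF-16 and halving the byte count stands in for what <code>length()</code> reports, and Python's own <code>len</code> stands in for <code>codePointCount</code>.

```python
# Illustrative sketch (Python 3): UTF-16 code units versus code points.
# Function names are hypothetical, chosen only for this example.

def utf16_code_units(text: str) -> int:
    """Return how many 16-bit values text needs in UTF-16."""
    # "utf-16-le" is used so that no byte-order mark is prepended.
    return len(text.encode("utf-16-le")) // 2


def code_point_count(text: str) -> int:
    """Return the number of Unicode code points in text."""
    return len(text)


if __name__ == "__main__":
    sample = "\U0001D12A"  # U+1D12A, outside the Basic Multilingual Plane
    print(utf16_code_units(sample), code_point_count(sample))
```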

=={{header|JavaScript}}==
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

JavaScript has no built-in way to determine how many characters are in a string. However, if the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.
 var str1 = "Hello, world!";
 var len1 = str1.length; //13
 var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair
 var len2 = str2.length; //2
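
One way to recover the character count from a sequence of 16-bit values is to skip the second half of every surrogate pair. The following Python 3 sketch shows the idea on a plain list of integers; the function name <code>count_code_points</code> is made up for this illustration, and the input is assumed to be well-formed UTF-16 (every low surrogate follows a high surrogate).

```python
# Illustrative sketch (Python 3): count characters in a sequence of UTF-16
# code units by ignoring low surrogates (0xDC00-0xDFFF), which only ever
# appear as the second half of a surrogate pair in well-formed input.

def count_code_points(units) -> int:
    """Return the number of code points encoded by UTF-16 code units."""
    return sum(1 for unit in units if not 0xDC00 <= unit <= 0xDFFF)


if __name__ == "__main__":
    print(count_code_points([ord(c) for c in "Hello, world!"]))
    print(count_code_points([0xD834, 0xDD2A]))  # surrogate pair for U+1D12A
```

The same loop could be written in any language that exposes the individual 16-bit values of a string.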

=={{header|JudoScript}}==
 //Store length of hello world in length and print it
 . length = "Hello World".length();

=={{header|LSE64}}==
LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string.
 " Hello world" @ , # 11

=={{header|Lua}}==
'''Interpreter:''' [[Lua]] 5.1 or later (the # length operator is not available in 5.0).
 string="Hello world"
 length=#string

=={{header|MAXScript}}==
 "Hello world".count

=={{header|mIRC Scripting Language}}==
'''Interpreter:''' [[mIRC]]
 alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

=={{header|OCaml}}==
'''Interpreter'''/'''Compiler:''' [[Ocaml]] 3.09
 String.length "Hello world";;

=={{header|Perl}}==
'''Interpreter:''' [[Perl]] any 5.X
 my $length = length "Hello, world!";

=={{header|PHP}}==
 $length = strlen('Hello, world!');

=={{header|PL/SQL|PL/SQL}}==
 DECLARE
   string VARCHAR2( 50 ) := 'Hello, world!';
   stringlength NUMBER;
 BEGIN
   stringlength := length( string );
 END;

=={{header|Python}}==
'''Interpreter:''' [[Python]] 2.4

len() returns the number of characters in a unicode string, or the number of bytes in a plain (byte) string. To get the character length of an encoded string, decode it first:
<pre>
>>> len('ascii')
5
>>> len(u'\u05d0') # the letter Alef as unicode literal
1
>>> len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string
1
</pre>

=={{header|Ruby}}==
'''Library:''' [[active_support]]
 require 'active_support'
 puts "Hello World".chars.length

=={{header|Scheme}}==
 (string-length "Hello world")

=={{header|Seed7}}==
 length("Hello, world!")

=={{header|Smalltalk}}==
 string := 'Hello, world!'.
 string size.

=={{header|Standard ML}}==
'''Interpreter:''' [[Standard ML of New Jersey | SML/NJ]] 110.60, [[Moscow ML]] 2.01 (January 2004)

'''Compiler:''' [[MLton]] 20061107
 val strlen = size "Hello, world!";

=={{header|Tcl}}==
Basic version:
 string length "Hello, world!"
A more elaborate version ('''Interpreter:''' any 8.X; tested on 8.4.12):
 fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
 set s1 "hello, world"
 set s2 "\u304A\u306F\u3088\u3046"
 puts [format "length of \"%s\" in characters is %d" $s1 [string length $s1]]
 puts [format "length of \"%s\" in characters is %d" $s2 [string length $s2]]

=={{header|UNIX Shell}}==
With external utilities:

'''Interpreter:''' any [[Bourne Shell]]
 string='Hello, world!'
 # printf is used because echo -n is not portable; wc -m counts characters in the current locale (wc -c would count bytes)
 length=`printf '%s' "$string" | wc -m | tr -dc '0-9'`
 echo $length # if you want it printed to the terminal
With SUSv3 parameter expansion modifier:

'''Interpreter:''' [[Almquist SHell]] (NetBSD 3.0), [[Bourne Again SHell]] 3.2, [[Korn SHell]] (5.2.14 99/07/13.2), [[Z SHell]]
 string='Hello, world!'
 length="${#string}"
 echo $length # if you want it printed to the terminal

=={{header|VBScript}}==
 Len(string|varname)
Returns the length of the string|varname

Returns null if string|varname is null

=={{header|xTalk}}==
'''Interpreter:''' HyperCard
 put the length of "Hello World"
or
 put the number of characters in "Hello World"
Latest revision as of 19:31, 19 January 2008
Redirect to: