String Character Length: Difference between revisions

Revision as of 19:31, 19 January 2008

8,861 bytes removed, 16 years ago

m

Stupid case-sensitivity.

rosettacode>Mwn3d

Revision as of 23:03, 7 December 2007, by rosettacode>IanOsgood (verified that AWK example only does byte length)
Latest revision as of 19:31, 19 January 2008, by rosettacode>Mwn3d m (Stupid case-sensitivity.)
(7 intermediate revisions by 2 users not shown)
Line 1:
#REDIRECT [[String length]]
~~{{task}}~~
~~{{Template:split-review}}~~
In this task, the goal is to find the ''character'' length of a string. This means encodings like [[UTF-8]] need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters.
~~For byte length, see [[String Byte Length]].~~
~~=={{header|ActionScript}}==~~
~~myStrVar.length()~~
~~=={{header|Ada}}==~~
~~'''Compiler:''' GCC 4.1.2~~
~~Str : String := "Hello World";~~
~~Length : constant Natural := Str'Length;~~
~~=={{header|ALGOL 68}}==~~
~~STRING str := "hello, world";~~
~~INT length := UPB str;~~
~~printf(($"Length of """g""" is "g(3)$,str,length))~~
~~Result:~~
~~Length of "hello, world" is +12~~
~~=={{header|AppleScript}}==~~
~~count of "Hello World"~~
~~Or:~~
~~count "Hello World"~~
~~=={{header|C}}==~~
~~'''Standard:''' [[ANSI C]] (AKA [[C89]]):~~
~~'''Compiler:''' GCC 3.3.3~~
~~#include <string.h>~~
~~int main(void)~~
{
~~const char *string = "Hello, world!";~~
~~size_t length = strlen(string);~~
~~return 0;~~
}
~~or by hand:~~
~~int main(void)~~
{
~~const char *string = "Hello, world!";~~
~~size_t length = 0;~~
~~char *p = (char *) string;~~
~~while (*p++ != '\0') length++;~~
~~return 0;~~
}
~~or (for arrays of char only)~~
~~#include <stdlib.h>~~
~~int main(void)~~
{
~~char const s[] = "Hello, world!";~~
~~size_t length = sizeof s - 1;~~
~~return 0;~~
}
~~For wide character strings (usually Unicode):~~
~~#include <stdio.h>~~
~~#include <wchar.h>~~
~~int main(void)~~
{
~~wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */~~
~~size_t length;~~
~~length = wcslen(s);~~
~~printf("Length in characters = %d\n", length);~~
~~printf("Length in bytes = %d\n", sizeof(s) * sizeof(wchar_t));~~
~~return 0;~~
}
~~=={{header|Objective-C}}==~~
~~// Return the length in unicode characters~~
~~unsigned length = [@"Hello Word!" length];~~
~~=={{header|C++}}==~~
~~'''Standard:''' [[ISO C plus plus|ISO C++]] (AKA [[C plus plus 98|C++98]]):~~
~~'''Compiler:''' g++ 4.0.2~~
~~#include <string> // note: '''not''' <string.h>~~
~~int main()~~
{
~~std::string s = "Hello, world!";~~
~~// Always in characters == bytes since sizeof(char) == 1~~
~~std::string::size_type length = s.length(); // option 1: In Characters/Bytes~~
~~std::string::size_type size = s.size(); // option 2: In Characters/Bytes~~
}
~~For wide character strings:~~
~~#include <string>~~
~~int main()~~
{
~~std::wstring s = L"\u304A\u306F\u3088\u3046";~~
~~std::wstring::size_type length = s.length();~~
}
~~=={{header|C sharp|C#}}==~~
~~'''Platform:''' [[.NET]]~~
~~'''Language Version:''' 1.0+~~
~~string s = "Hello, world!";~~
~~int clength = s.Length; // In characters~~
~~int blength = System.Text.Encoding.GetBytes(s).length; // In Bytes.~~
~~==[[Clean]]==~~
~~[[Category:Clean]]~~
~~Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.~~
~~import StdEnv~~
~~strlen :: String -> Int~~
~~strlen string = size string~~
~~Start = strlen "Hello, world!"~~
~~=={{header|ColdFusion}}==~~
~~#len("Hello World")#~~
~~=={{header|Common Lisp}}==~~
~~(length "Hello World")~~
~~=={{header|Component Pascal}}==~~
~~LEN("Hello, World!")~~
~~=={{header|E}}==~~
~~"Hello World".size()~~
~~=={{header|Forth}}==~~
The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)
~~'''Interpreter:''' ANS Forth~~
The following code will count the number of UTF-8 characters in a null-terminated string. It relies on the fact that all bytes of a UTF-8 character except the first have the binary bit pattern "10xxxxxx".
~~2 base !~~
~~: utf8+ ( str -- str )~~
~~begin~~
~~char+~~
~~dup c@~~
~~11000000 and~~
~~10000000 <>~~
~~until ;~~
~~decimal~~
~~: count-utf8 ( zstr -- n )~~
0
~~begin~~
~~swap dup c@~~
~~while~~
~~utf8+~~
~~swap 1+~~
~~repeat drop ;~~
~~=={{header|Haskell}}==~~
~~'''Interpreter:''' [[GHC | GHCi]] 6.6, [[Hugs]]~~
~~'''Compiler:''' [[GHC]] 6.6~~
~~The base type ''Char'' defined by the standard is already intended for (plain) Unicode characters.~~
~~strlen = length "Hello, world!"~~
~~=={{header|IDL}}==~~
~~'''Compiler:''' any IDL compiler should do~~
~~length = strlen("Hello, world!")~~
~~=={{header|Java}}==~~
Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
~~The length method of String objects gives the number of 16-bit values used to encode a string.~~
~~String s = "Hello, world!";~~
~~int length = s.length();~~
~~Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method.~~
~~String str = "\uD834\uDD2A"; //U+1D12A~~
~~int length1 = str.length(); //2~~
~~int length2 = str.codePointCount(0, str.length()); //1~~
~~=={{header|JavaScript}}==~~
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two. JavaScript has no built-in way to determine how many characters are in a string. However, if the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.
~~var str1 = "Hello, world!";~~
~~var len1 = str1.length; //13~~
~~var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair~~
~~var len2 = str2.length; //2~~
~~=={{header|JudoScript}}==~~
~~//Store length of hello world in length and print it~~
~~. length = "Hello World".length();~~
~~=={{header|LSE64}}==~~
~~LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string.~~
~~" Hello world" @ , # 11~~
~~=={{header|Lua}}==~~
~~'''Interpreter:''' [[Lua]] 5.0 or later.~~
~~string="Hello world"~~
~~length=#string~~
~~=={{header|MAXScript}}==~~
~~"Hello world".count~~
~~=={{header|mIRC Scripting Language}}==~~
~~'''Interpreter:''' [[mIRC]]~~
~~alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }~~
~~=={{header|OCaml}}==~~
~~'''Interpreter'''/'''Compiler:''' [[Ocaml]] 3.09~~
~~String.length "Hello world";;~~
~~=={{header|Perl}}==~~
~~'''Interpreter:''' [[Perl]] any 5.X~~
~~my $length = length "Hello, world!";~~
~~=={{header|PHP}}==~~
~~$length = strlen('Hello, world!');~~
~~=={{header|PL/SQL|PL/SQL}}==~~
~~DECLARE~~
~~string VARCHAR2( 50 ) := 'Hello, world!';~~
~~stringlength NUMBER;~~
~~BEGIN~~
~~stringlength := length( string );~~
~~END;~~
~~=={{header|Python}}==~~
~~'''Interpreter:''' [[Python]] 2.4~~
~~len() returns the number of characters in a unicode string or plain ascii string. To get the length of an encoded string, you have to decode it first:~~
~~<pre>~~
~~>>> len('ascii')~~
5
~~>>> len(u'\u05d0') # the letter Alef as unicode literal~~
1
~~>>> len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string~~
1
~~</pre>~~
~~=={{header|Ruby}}==~~
~~'''Library:''' [[active_support]]~~
~~require 'active_support'~~
~~puts "Hello World".chars.length~~
~~=={{header|Scheme}}==~~
~~(string-length "Hello world")~~
~~=={{header|Seed7}}==~~
~~length("Hello, world!")~~
~~=={{header|Smalltalk}}==~~
~~string := 'Hello, world!'.~~
~~string size.~~
~~=={{header|Standard ML}}==~~
~~'''Interpreter:''' [[Standard ML of New Jersey | SML/NJ]] 110.60, [[Moscow ML]] 2.01 (January 2004)~~
~~'''Compiler:''' [[MLton]] 20061107~~
~~val strlen = size "Hello, world!";~~
~~=={{header|Tcl}}==~~
~~Basic version:~~
~~string length "Hello, world!"~~
~~or more elaborately, needs '''Interpreter''' any 8.X. Tested on 8.4.12.~~
~~fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly~~
~~set s1 "hello, world"~~
~~set s2 "\u304A\u306F\u3088\u3046"~~
~~puts [format "length of \"%s\" in characters is %d" $s1 [string length $s1]]~~
~~puts [format "length of \"%s\" in characters is %d" $s2 [string length $s2]]~~
~~=={{header|UNIX Shell}}==~~
~~With external utilities:~~
~~'''Interpreter:''' any [[Bourne Shell]]~~
~~string='Hello, world!'~~
~~length=`echo -n "$string" | wc -c | tr -dc '0-9'`~~
~~echo $length # if you want it printed to the terminal~~
~~With SUSv3 parameter expansion modifier:~~
~~'''Interpreter:''' [[Almquist SHell]] (NetBSD 3.0), [[Bourne Again SHell]] 3.2, [[Korn SHell]] (5.2.14 99/07/13.2), [[Z SHell]]~~
~~string='Hello, world!'~~
~~length="${#string}"~~
~~echo $length # if you want it printed to the terminal~~
~~=={{header|VBScript}}==~~
~~Len(string|varname)~~
~~Returns the length of the string|varname~~
~~Returns null if string|varname is null~~
~~=={{header|XSLT}}==~~
~~<?xml version="1.0" encoding="UTF-8"?>~~
~~...~~
~~<xsl:value-of select="string-length('møøse')" /> <!-- 5 -->~~
~~=={{header|xTalk}}==~~
~~'''Interpreter:''' HyperCard~~
~~put the length of "Hello World"~~
or
~~put the number of characters in "Hello World"~~
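The Forth, Java, and JavaScript entries above all turn on the difference between storage units (bytes or 16-bit values) and characters. As a rough illustration, not part of any revision of this page, the following Python 3 sketch counts UTF-8 characters by skipping the "10xxxxxx" continuation bytes, and counts UTF-16 code units for comparison. The function names are hypothetical, Python 3 is assumed (the page's own Python entry targets 2.4), and the byte-level counter assumes its input is valid UTF-8.

```python
def count_utf8_code_points(data: bytes) -> int:
    """Count characters in UTF-8 data by ignoring continuation bytes.

    Every byte of a multi-byte UTF-8 sequence except the first matches
    the bit pattern 10xxxxxx, so counting the bytes that do NOT match
    gives the number of code points. Assumes `data` is valid UTF-8.
    """
    return sum(1 for b in data if (b & 0b11000000) != 0b10000000)


def utf16_code_units(text: str) -> int:
    """Number of 16-bit values needed to encode `text` as UTF-16.

    "utf-16-le" is used so that no byte-order mark is prepended.
    """
    return len(text.encode("utf-16-le")) // 2


ohayou = "\u304A\u306F\u3088\u3046"      # the hiragana string from the C and Tcl entries
encoded = ohayou.encode("utf-8")
print(len(encoded))                      # 12 bytes
print(count_utf8_code_points(encoded))   # 4 characters
print(len(ohayou))                       # 4 (Python 3 str length counts code points)

clef = "\U0001D12A"                      # U+1D12A, the example from the Java entry
print(utf16_code_units(clef))            # 2 (a surrogate pair)
print(len(clef))                         # 1
```

Both counts are in code points. Neither function attempts to group combining sequences into user-perceived characters.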