String length: Difference between revisions

Add Ecstasy example
 
(8 intermediate revisions by 7 users not shown)
Line 579:
===Character Length===
{{works with|QBasic}}
 
{{works with|Liberty BASIC}}
 
{{works with|PowerBASIC|PB/CC, PB/DOS}}
 
Line 587 ⟶ 585:
<syntaxhighlight lang="qbasic"> INPUT a$
PRINT LEN(a$)</syntaxhighlight>
 
==={{header|ANSI BASIC}}===
ANSI BASIC requires line numbers.
<syntaxhighlight lang="basic">
10 INPUT A$
20 PRINT LEN(A$)
</syntaxhighlight>
 
==={{header|Applesoft BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
 
==={{header|BASIC256}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
 
==={{header|Chipmunk Basic}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
 
==={{header|MSX Basic}}===
{{works with|MSX BASIC|any}}
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
 
==={{header|Quite BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
 
==={{header|True BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
 
==={{header|Yabasic}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
 
==={{header|ZX Spectrum Basic}}===
Line 1,266 ⟶ 1,293:
# 8
print len "J̲o̲s̲é̲"
# 1
print len "😀"
</syntaxhighlight>
 
=={{header|Ecstasy}}==
<syntaxhighlight lang="ecstasy">
module StrLen {
@Inject Console console;
 
void run(String s = "José") {
console.print($|For the string {s.quoted()}:
| Character length: {s.size}
| UTF-8 byte length: {s.calcUtf8Length()}
);
}
}
</syntaxhighlight>
 
{{out}}
<pre>
For the string "José":
Character length: 4
UTF-8 byte length: 5
</pre>
 
=={{header|Elena}}==
Line 1,823 ⟶ 1,873:
int actual_length = str.codePointCount(0, str.length()); // value is 1, which is the length in characters</syntaxhighlight>
===Grapheme Length===
 
Requires JDK 20 or later<ref>https://bugs.openjdk.org/browse/JDK-8291660</ref>.
 
<syntaxhighlight lang="java">import java.text.BreakIterator;
 
Line 1,850 ⟶ 1,903:
 
=={{header|JavaScript}}==
 
===Byte length===
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.
 
<syntaxhighlight lang="javascript">
var s = "Hello, world!";
var byteCount = s.length * 2; // 26
</syntaxhighlight>
 
In Node.js, Buffer.byteLength gives the byte length for a chosen encoding directly (it is Node.js-specific, not part of ECMAScript).
 
<syntaxhighlight lang="javascript">
a = '👩‍❤️‍👩'
Buffer.byteLength(a, 'utf16le'); // 16
Buffer.byteLength(a, 'utf8'); // 20
Buffer.byteLength(s, 'utf16le'); // 26
Buffer.byteLength(s, 'utf8'); // 13
</syntaxhighlight>
 
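Outside Node.js, TextEncoder can be used to return the UTF-8 byte size. It is a Web API defined by the WHATWG Encoding Standard rather than by ECMAScript itself, and is available in browsers as well as in Node.js: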
 
<syntaxhighlight lang="javascript">
(new TextEncoder().encode(a)).length; // 20
(new TextEncoder().encode(s)).length; // 13
</syntaxhighlight>
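
Where neither Buffer nor TextEncoder is available, the UTF-8 byte size can also be derived from the code points. The following is a minimal sketch of that idea, not a standard API: the name utf8ByteLength is hypothetical, and it assumes an ES6 engine so that for...of walks the string one code point at a time.

```javascript
// Hypothetical helper (not a built-in): counts the bytes UTF-8 would need,
// without actually encoding the string.
function utf8ByteLength(str) {
  var bytes = 0;
  // for...of iterates by code point, so a surrogate pair is visited once.
  for (var ch of str) {
    var cp = ch.codePointAt(0);
    if (cp < 0x80) bytes += 1;          // ASCII
    else if (cp < 0x800) bytes += 2;    // e.g. Latin letters with diacritics
    else if (cp < 0x10000) bytes += 3;  // rest of the Basic Multilingual Plane
    else bytes += 4;                    // supplementary planes
  }
  return bytes;
}

utf8ByteLength("Hello, world!"); // 13
utf8ByteLength("Jos\u00E9");     // 5
utf8ByteLength("\uD834\uDD2A");  // 4
```

An unpaired surrogate is counted as 3 bytes here, which matches the size of the U+FFFD replacement character that TextEncoder would emit in its place.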
 
=== Unicode codepoint length ===
 
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
 
If the string contains only commonly used characters, the number of characters equals the number of 16-bit values used to represent them.
 
<syntaxhighlight lang="javascript">
var str1 = "Hello, world!";
var len1 = str1.length; // 13
 
var str2 = "\uD834\uDD2A"; // U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; // 2
</syntaxhighlight>
 
More generally, spread syntax (...) in an array literal iterates a string by Unicode code point rather than by 16-bit value, so it can be used to count code points:
 
<syntaxhighlight lang="javascript">
[...str2].length // 1
</syntaxhighlight>
 
=== Unicode grapheme length ===
 
Counting Unicode code points gives too large a number when the string contains combining characters such as diacritics or joiner sequences, so graphemes must be counted instead. The default granularity of Intl.Segmenter is grapheme.
 
<syntaxhighlight lang="javascript">
[...new Intl.Segmenter().segment(a)].length; // 1
</syntaxhighlight>
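
The three counts discussed in this section can be compared side by side. The sketch below is illustrative only: stringLengths is a hypothetical name, and it assumes an engine that provides Intl.Segmenter.

```javascript
// Hypothetical helper: reports the three notions of "length" for one string.
function stringLengths(str) {
  return {
    codeUnits: str.length,                // UTF-16 code units
    codePoints: [...str].length,          // string iteration is per code point
    graphemes: [...new Intl.Segmenter().segment(str)].length // default granularity
  };
}

stringLengths("\uD834\uDD2A"); // { codeUnits: 2, codePoints: 1, graphemes: 1 }
stringLengths("e\u0301");      // { codeUnits: 2, codePoints: 2, graphemes: 1 }
```

The second call uses "e" followed by a combining acute accent, a case where the code point count and the grapheme count differ.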
 
===ES6 destructuring/iterators===
 
ES6 provides several ways to get a string split into an array of code points instead of UTF-16 code units:
<syntaxhighlight lang="javascript">let
Line 3,012 ⟶ 3,106:
 
Unfortunately, only character length can be retrieved in this language.
 
=={{header|RPL}}==
RPL strings are all made of 8-bit characters.
"RPL" SIZE
 
=={{header|Ruby}}==
Line 3,615 ⟶ 3,713:
di ustrlen(s)
47</syntaxhighlight>
 
=={{header|Stringle}}==
The only current implementation of Stringle uses 8-bit character sets, so character length and byte length are always the same.
 
This prints the length of a string from input:
 
<syntaxhighlight lang="stringle">$ #$</syntaxhighlight>
 
=={{header|Swift}}==
Line 3,954 ⟶ 4,059:
=={{header|Wren}}==
===Byte Length===
<syntaxhighlight lang="wren">System.print("møøse".bytes.count)
System.print("𝔘𝔫𝔦𝔠𝔬𝔡𝔢".bytes.count)
System.print("J̲o̲s̲é̲".bytes.count)</syntaxhighlight>
Line 3,966 ⟶ 4,071:
 
===Character Length===
<syntaxhighlight lang="wren">System.print("møøse".count)
System.print("𝔘𝔫𝔦𝔠𝔬𝔡𝔢".count)
System.print("J̲o̲s̲é̲".count)</syntaxhighlight>
Line 3,979 ⟶ 4,084:
===Grapheme Length===
{{libheader|Wren-upc}}
<syntaxhighlight lang="wren">import "./upc" for Graphemes
 
System.print(Graphemes.clusterCount("møøse"))