String length: Difference between revisions
Add Ecstasy example
(8 intermediate revisions by 7 users not shown)
Line 579:
===Character Length===
{{works with|QBasic}}
{{works with|Liberty BASIC}}
{{works with|PowerBASIC|PB/CC, PB/DOS}}
Line 587 ⟶ 585:
<syntaxhighlight lang="qbasic"> INPUT a$
PRINT LEN(a$)</syntaxhighlight>
==={{header|ANSI BASIC}}===
ANSI BASIC requires line numbers.
<syntaxhighlight lang="basic">
10 INPUT A$
20 PRINT LEN(A$)
</syntaxhighlight>
==={{header|Applesoft BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
==={{header|BASIC256}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
==={{header|Chipmunk Basic}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
==={{header|MSX Basic}}===
{{works with|MSX BASIC|any}}
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
==={{header|Quite BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
==={{header|True BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
==={{header|Yabasic}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.
==={{header|ZX Spectrum Basic}}===
Line 1,266 ⟶ 1,293:
# 8
print len "J̲o̲s̲é̲"
# 1
print len "😀"
</syntaxhighlight>
=={{header|Ecstasy}}==
<syntaxhighlight lang="ecstasy">
module StrLen {
@Inject Console console;
void run(String s = "José") {
console.print($|For the string {s.quoted()}:
| Character length: {s.size}
| UTF-8 byte length: {s.calcUtf8Length()}
);
}
}
</syntaxhighlight>
{{out}}
<pre>
For the string "José":
Character length: 4
UTF-8 byte length: 5
</pre>
=={{header|Elena}}==
Line 1,823 ⟶ 1,873:
int actual_length = str.codePointCount(0, str.length()); // value is 1, which is the length in characters</syntaxhighlight>
===Grapheme Length===
Available since JDK 20<ref>https://bugs.openjdk.org/browse/JDK-8291660</ref>.
<syntaxhighlight lang="java">import java.text.BreakIterator;
Line 1,850 ⟶ 1,903:
=={{header|JavaScript}}==
===Byte length===
JavaScript encodes strings in UTF-16, which represents each code point with one or two 16-bit code units. The length property of a string gives the number of 16-bit code units used to encode it, so the UTF-16 byte count is twice that number.
<syntaxhighlight lang="javascript">
var s = "Hello, world!";
var byteCount = s.length * 2; // 26
</syntaxhighlight>
It is easier to use Buffer.byteLength (specific to Node.js, not part of ECMAScript).
<syntaxhighlight lang="javascript">
var a = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F469}"; // woman, ZWJ, heart + variation selector, ZWJ, woman
Buffer.byteLength(a, 'utf16le'); // 16
Buffer.byteLength(a, 'utf8'); // 20
Buffer.byteLength(s, 'utf16le'); // 26
Buffer.byteLength(s, 'utf8'); // 13
</syntaxhighlight>
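Outside Node.js, TextEncoder can be used to get the UTF-8 byte length. It is defined by the WHATWG Encoding Standard rather than by ECMAScript, and is available in browsers as well as in Node.js: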
<syntaxhighlight lang="javascript">
(new TextEncoder().encode(a)).length; // 20
(new TextEncoder().encode(s)).length; // 13
</syntaxhighlight>
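To show where these UTF-8 byte counts come from, here is a minimal sketch that adds up the bytes needed for each code point (1 to 4, depending on its range). The helper name <code>utf8ByteLength</code> is hypothetical and not part of any library; in practice TextEncoder or Buffer.byteLength is the better choice.

```javascript
// Hypothetical helper (not a built-in): count UTF-8 bytes per code point.
function utf8ByteLength(str) {
  var bytes = 0;
  for (var ch of str) {               // for...of iterates by code point
    var cp = ch.codePointAt(0);
    if (cp < 0x80) bytes += 1;        // ASCII
    else if (cp < 0x800) bytes += 2;  // e.g. Latin letters with diacritics
    else if (cp < 0x10000) bytes += 3; // rest of the Basic Multilingual Plane
    else bytes += 4;                  // astral code points (surrogate pairs in UTF-16)
  }
  return bytes;
}

utf8ByteLength("Hello, world!"); // 13
utf8ByteLength("Jos\u00E9");     // 5 (the precomposed é takes 2 bytes)
utf8ByteLength("\uD834\uDD2A");  // 4 (U+1D12A)
```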
=== Unicode codepoint length ===
JavaScript encodes strings in UTF-16, which represents each code point with one or two 16-bit code units. The most commonly used characters are represented by one code unit, while rarer ones, such as some mathematical symbols, are represented by two. The length property therefore counts code units, not code points.
<syntaxhighlight lang="javascript">
var str1 = "Hello, world!";
var len1 = str1.length; // 13
var str2 = "\uD834\uDD2A"; // U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; // 2
</syntaxhighlight>
More generally, strings iterate by code point, so spreading a string into an array literal with the spread syntax (<code>...</code>) yields one element per Unicode code point:
<syntaxhighlight lang="javascript">
[...str2].length // 1
</syntaxhighlight>
=== Unicode grapheme length ===
Counting Unicode code points gives the wrong result for text that uses combining characters such as diacritics or joiner sequences, so graphemes must be counted instead. The default granularity of Intl.Segmenter is grapheme.
<syntaxhighlight lang="javascript">
[...new Intl.Segmenter().segment(a)].length; // 1
</syntaxhighlight>
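The measures above can be compared side by side. The following is a minimal sketch that assumes an environment providing both TextEncoder and Intl.Segmenter (recent browsers or Node.js); the function name <code>stringLengths</code> is hypothetical.

```javascript
// Hypothetical helper: report the lengths discussed above for one string.
function stringLengths(str) {
  return {
    utf16Bytes: str.length * 2,                               // 2 bytes per UTF-16 code unit
    utf8Bytes: new TextEncoder().encode(str).length,          // encoded UTF-8 size
    codePoints: [...str].length,                              // string iteration is by code point
    graphemes: [...new Intl.Segmenter().segment(str)].length  // default granularity: grapheme
  };
}

stringLengths("Jos\u00E9"); // { utf16Bytes: 8, utf8Bytes: 5, codePoints: 4, graphemes: 4 }
stringLengths("e\u0301");   // { utf16Bytes: 4, utf8Bytes: 3, codePoints: 2, graphemes: 1 }
```

The second call uses "e" followed by U+0301 (combining acute accent), which is why the code point count and the grapheme count differ.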
===ES6 destructuring/iterators===
ES6 provides several ways to get a string split into an array of code points instead of UTF-16 code units:
<syntaxhighlight lang="javascript">let
Line 3,012 ⟶ 3,106:
Unfortunately, only character length can be retrieved in this language.
=={{header|RPL}}==
RPL strings are all made of 8-bit characters.
"RPL" SIZE
=={{header|Ruby}}==
Line 3,615 ⟶ 3,713:
di ustrlen(s)
47</syntaxhighlight>
=={{header|Stringle}}==
The only current implementation of Stringle uses 8-bit character sets, so character length and byte length are always the same.
This prints the length of a string from input:
<syntaxhighlight lang="stringle">$ #$</syntaxhighlight>
=={{header|Swift}}==
Line 3,954 ⟶ 4,059:
=={{header|Wren}}==
===Byte Length===
<syntaxhighlight lang="
System.print("𝔘𝔫𝔦𝔠𝔬𝔡𝔢".bytes.count)
System.print("J̲o̲s̲é̲".bytes.count)</syntaxhighlight>
Line 3,966 ⟶ 4,071:
===Character Length===
<syntaxhighlight lang="
System.print("𝔘𝔫𝔦𝔠𝔬𝔡𝔢".count)
System.print("J̲o̲s̲é̲".count)</syntaxhighlight>
Line 3,979 ⟶ 4,084:
===Grapheme Length===
{{libheader|Wren-upc}}
<syntaxhighlight lang="
System.print(Graphemes.clusterCount("møøse"))