String length

 
=={{header|JavaScript}}==
 
===Byte length===
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.
 
<syntaxhighlight lang="javascript">var s = "Hello, world!";
var s = "Hello, world!";
var byteCount = s.length * 2; // 26</syntaxhighlight>
===Character Length===
</syntaxhighlight>
 
It's easier to use Buffer.byteLength (Node.js specific, not ECMAScript), which returns the byte length of a string in a given encoding.
 
<syntaxhighlight lang="javascript">var str1 = "Hello, world!";
a = '👩‍❤️‍👩'
Buffer.byteLength(a, 'utf16le'); // 16
Buffer.byteLength(a, 'utf8'); // 20
Buffer.byteLength(s, 'utf16le'); // 26
Buffer.byteLength(s, 'utf8'); // 13
</syntaxhighlight>
 
Outside Node.js, the TextEncoder API (defined by the WHATWG Encoding standard and available in browsers and modern runtimes) can be used to obtain the UTF-8 byte size:
 
<syntaxhighlight lang="javascript">
(new TextEncoder().encode(a)).length; // 20
(new TextEncoder().encode(s)).length; // 13
</syntaxhighlight>
 
=== Unicode codepoint length ===
 
As noted above, UTF-16 represents each character with one or two 16-bit code units: the most commonly used characters take one unit, while rarer ones, such as some mathematical symbols, take two.
 
If the string only contains commonly used characters, the number of characters will equal the number of 16-bit values used to represent them.
 
<syntaxhighlight lang="javascript">var str1 = "Hello, world!";
<syntaxhighlight lang="javascript">
var len1 = str1.length; //13
var str1 = "Hello, world!";
var len1 = str1.length; // 13
 
var str2 = "\uD834\uDD2A"; // U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; // 2</syntaxhighlight>
</syntaxhighlight>
 
More generally, spread syntax in an array literal can be used to enumerate Unicode code points:
 
<syntaxhighlight lang="javascript">
[...str2].length; // 1
</syntaxhighlight>
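
Equivalently, the string iterator that spread syntax relies on can be consumed directly. A minimal sketch, using a hypothetical codePointLength helper:

<syntaxhighlight lang="javascript">
// Hypothetical helper: for...of iterates code points, not UTF-16 code units.
function codePointLength(s) {
    var count = 0;
    for (const cp of s) count++;
    return count;
}
codePointLength("\uD834\uDD2A"); // 1
codePointLength("Hello, world!"); // 13
</syntaxhighlight>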
 
=== Unicode grapheme length ===
 
Counting Unicode code points still gives the wrong size for strings containing combining sequences, such as ZWJ joining sequences or diacritics; in those cases graphemes must be counted instead. The default granularity of Intl.Segmenter is grapheme.
 
<syntaxhighlight lang="javascript">
[...new Intl.Segmenter().segment(a)].length; // 1
</syntaxhighlight>
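
For comparison, here is the same emoji sequence from above (two U+1F469 code points joined by U+200D and U+2764 U+FE0F) measured at all three levels:

<syntaxhighlight lang="javascript">
var a = '👩‍❤️‍👩';
a.length; // 8 UTF-16 code units
[...a].length; // 6 code points
[...new Intl.Segmenter().segment(a)].length; // 1 grapheme
</syntaxhighlight>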
 
===ES6 destructuring/iterators===
 
ES6 provides several ways to split a string into an array of code points rather than UTF-16 code units:
<syntaxhighlight lang="javascript">
let str = "\uD834\uDD2A"; // U+1D12A: one code point, two UTF-16 code units
let fromSpread = [...str].length; // 1 (spread consumes the code point iterator)
let fromArrayFrom = Array.from(str).length; // 1 (Array.from also uses the iterator)
let fromCodeUnits = str.length; // 2 (UTF-16 code units)
</syntaxhighlight>