String length: Difference between revisions
(→{{header|JavaScript}}: Unicode consideration)
=={{header|JavaScript}}==
===Byte length===
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes needed to store it in UTF-16 can be determined by doubling that number.
<syntaxhighlight lang="javascript">
var s = "Hello, world!";
var byteCount = s.length * 2; // 26
</syntaxhighlight>

It's easier to use Buffer.byteLength (Node.js specific, not part of ECMAScript):

<syntaxhighlight lang="javascript">
a = '👩❤️👩'; // a ZWJ emoji sequence: woman, ZWJ, heavy black heart, variation selector, ZWJ, woman
Buffer.byteLength(a, 'utf16le'); // 16
Buffer.byteLength(a, 'utf8'); // 20
Buffer.byteLength(s, 'utf16le'); // 26
Buffer.byteLength(s, 'utf8'); // 13
</syntaxhighlight>

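As a quick cross-check, the length-doubling approach from above gives the same UTF-16 byte counts for both strings (reusing the a and s variables defined in the earlier examples):

<syntaxhighlight lang="javascript">
a.length * 2; // 16, matches Buffer.byteLength(a, 'utf16le')
s.length * 2; // 26, matches Buffer.byteLength(s, 'utf16le')
</syntaxhighlight>
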
In pure ECMAScript, TextEncoder() can be used to return the UTF-8 byte size:

<syntaxhighlight lang="javascript">
(new TextEncoder().encode(a)).length; // 20
(new TextEncoder().encode(s)).length; // 13
</syntaxhighlight>

=== Unicode codepoint length ===
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
If the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.

<syntaxhighlight lang="javascript">
var str1 = "Hello, world!";
var len1 = str1.length; // 13

// "\uD835\uDD38" is U+1D538 (𝔸), an example character outside the Basic
// Multilingual Plane, so UTF-16 encodes it as a surrogate pair of two 16-bit values
var str2 = "\uD835\uDD38";
var len2 = str2.length; // 2
</syntaxhighlight>

More generally, the spread operator in an array literal can be used to enumerate Unicode code points:

<syntaxhighlight lang="javascript">
[...str2].length; // 1
</syntaxhighlight>

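Array.from() consumes the same string iterator, so it counts code points in the same way and can serve as an alternative to the spread syntax:

<syntaxhighlight lang="javascript">
Array.from(str2).length; // 1
</syntaxhighlight>
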
=== Unicode grapheme length ===
Counting Unicode code points returns the wrong size when a string uses combining characters, such as ZWJ joining sequences or diacritics, so we must count graphemes instead. The default granularity of Intl.Segmenter() is grapheme.

<syntaxhighlight lang="javascript">
[...new Intl.Segmenter().segment(a)].length; // 1
</syntaxhighlight>

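Counting code points for the same string shows the over-count that grapheme segmentation avoids (assuming a is the ZWJ emoji sequence from the byte-length examples):

<syntaxhighlight lang="javascript">
[...a].length; // 6 code points (woman, ZWJ, heart, variation selector, ZWJ, woman), but only 1 grapheme
</syntaxhighlight>
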
===ES6 destructuring/iterators===
ES6 provides several ways to get a string split into an array of code points instead of UTF-16 code units:

<syntaxhighlight lang="javascript">let |
<syntaxhighlight lang="javascript">let |