String length: Difference between revisions

Revision as of 11:18, 11 May 2023 by Lanky79 (→{{header|EMal}})

Latest revision as of 19:49, 30 April 2024 by Xtclang (Add Ecstasy example; 2,673 bytes added)
(8 intermediate revisions by 7 users not shown)
Line 579:
===Character Length===
{{works with|QBasic}}
{{works with|Liberty BASIC}}
{{works with|PowerBASIC|PB/CC, PB/DOS}}

Line 587 ⟶ 585:
<syntaxhighlight lang="qbasic">
INPUT a$
PRINT LEN(a$)</syntaxhighlight>

==={{header|ANSI BASIC}}===
ANSI BASIC requires line numbers.
<syntaxhighlight lang="basic">
10 INPUT A$
20 PRINT LEN(A$)
</syntaxhighlight>

==={{header|Applesoft BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.

==={{header|BASIC256}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.

==={{header|Chipmunk Basic}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.

==={{header|MSX Basic}}===
{{works with|MSX BASIC|any}}
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.

==={{header|Quite BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.

==={{header|True BASIC}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.

==={{header|Yabasic}}===
The [[#GW-BASIC|GW-BASIC]] solution works without any changes.

==={{header|ZX Spectrum Basic}}===

Line 1,266 ⟶ 1,293:
# 8
print len "J̲o̲s̲é̲"
# 1
print len "😀"
</syntaxhighlight>

=={{header|Ecstasy}}==
<syntaxhighlight lang="ecstasy">
module StrLen {
    @Inject Console console;

    void run(String s = "José") {
        console.print($|For the string {s.quoted()}:
                       |  Character length: {s.size}
                       |  UTF-8 byte length: {s.calcUtf8Length()}
                     );
    }
}
</syntaxhighlight>

{{out}}
<pre>
For the string "José":
  Character length: 4
  UTF-8 byte length: 5
</pre>

=={{header|Elena}}==

Line 1,823 ⟶ 1,873:
int actual_length = str.codePointCount(0, str.length()); // value is 1, which is the length in characters</syntaxhighlight>
===Grapheme Length===
Supported since JDK 20<ref>https://bugs.openjdk.org/browse/JDK-8291660</ref>.
<syntaxhighlight lang="java">import java.text.BreakIterator;

Line 1,850 ⟶ 1,903:
=={{header|JavaScript}}==
~~===Byte Length===~~
===Byte length===
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes in the UTF-16 encoding can be determined by doubling that number.
<syntaxhighlight lang="javascript">
~~var s = "Hello, world!";~~
var s = "Hello, world!";
~~var byteCount = s.length * 2; //26</syntaxhighlight>~~
var byteCount = s.length * 2; // 26
~~===Character Length===~~
</syntaxhighlight>

In Node.js it is easier to use Buffer.byteLength (Node.js-specific, not part of ECMAScript).
<syntaxhighlight lang="javascript">
var a = '👩‍❤️‍👩';
Buffer.byteLength(a, 'utf16le'); // 16
Buffer.byteLength(a, 'utf8'); // 20
Buffer.byteLength(s, 'utf16le'); // 26
Buffer.byteLength(s, 'utf8'); // 13
</syntaxhighlight>

Without Node.js-specific APIs, TextEncoder (defined by the WHATWG Encoding Standard rather than by ECMAScript, and available in browsers and Node.js) can be used to get the UTF-8 byte length:
<syntaxhighlight lang="javascript">
(new TextEncoder().encode(a)).length; // 20
(new TextEncoder().encode(s)).length; // 13
</syntaxhighlight>

=== Unicode codepoint length ===
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones, such as some mathematical symbols, are represented by two. ~~JavaScript has no built-in way to determine how many characters are in a string. However, if~~ If the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.
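The relationship between 16-bit values and code points described above can be sketched by counting code points by hand. This is only an illustrative sketch and not part of either revision: the helper name countCodePoints is hypothetical, and it relies solely on the standard String.prototype.codePointAt method.

```javascript
// Hypothetical helper (not from the original page): counts Unicode code
// points by walking the string's UTF-16 code units.
function countCodePoints(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    // codePointAt returns a value above 0xFFFF only when index i is the
    // start of a valid surrogate pair, i.e. one code point in two units.
    if (str.codePointAt(i) > 0xFFFF) {
      i++; // skip the second code unit of the pair
    }
    count++;
  }
  return count;
}

console.log(countCodePoints("Hello, world!")); // 13
console.log(countCodePoints("\uD834\uDD2A")); // 1
```

For strings made only of characters that fit in one 16-bit value, the result equals the length property; the two differ only when surrogate pairs are present.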
~~<syntaxhighlight lang="javascript">var str1 = "Hello, world!";~~
<syntaxhighlight lang="javascript">
~~var len1 = str1.length; //13~~
var str1 = "Hello, world!";
var len1 = str1.length; // 13
var str2 = "\uD834\uDD2A"; // U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; // 2
</syntaxhighlight>

More generally, the spread syntax in an array literal can be used to enumerate Unicode code points:
<syntaxhighlight lang="javascript">
[...str2].length // 1
</syntaxhighlight>

=== Unicode grapheme length ===
Counting Unicode code points gives the wrong size when the string contains combining characters such as joining sequences or diacritics, so graphemes must be counted instead. The default granularity of Intl.Segmenter is grapheme.
<syntaxhighlight lang="javascript">
[...new Intl.Segmenter().segment(a)].length; // 1
</syntaxhighlight>
~~var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair~~
~~var len2 = str2.length; //2</syntaxhighlight>~~
===ES6 destructuring/iterators===
ES6 provides several ways to get a string split into an array of code points instead of UTF-16 code units:
<syntaxhighlight lang="javascript">let

Line 3,012 ⟶ 3,106:
Unfortunately, only character length can be retrieved in this language.

=={{header|RPL}}==
RPL strings are all made of 8-bit characters.
 "RPL" SIZE

=={{header|Ruby}}==

Line 3,615 ⟶ 3,713:
di ustrlen(s)
47</syntaxhighlight>

=={{header|Stringle}}==
The only current implementation of Stringle uses 8-bit character sets, so character length and byte length are always the same.
This prints the length of a string from input:
<syntaxhighlight lang="stringle">$ #$</syntaxhighlight>

=={{header|Swift}}==

Line 3,954 ⟶ 4,059:
=={{header|Wren}}==
===Byte Length===
<syntaxhighlight lang="~~ecmascript~~wren">System.print("møøse".bytes.count)
System.print("𝔘𝔫𝔦𝔠𝔬𝔡𝔢".bytes.count)
System.print("J̲o̲s̲é̲".bytes.count)</syntaxhighlight>

Line 3,966 ⟶ 4,071:
===Character Length===
<syntaxhighlight lang="~~ecmascript~~wren">System.print("møøse".count)
System.print("𝔘𝔫𝔦𝔠𝔬𝔡𝔢".count)
System.print("J̲o̲s̲é̲".count)</syntaxhighlight>

Line 3,979 ⟶ 4,084:
===Grapheme Length===
{{libheader|Wren-upc}}
<syntaxhighlight lang="~~ecmascript~~wren">import "./upc" for Graphemes
System.print(Graphemes.clusterCount("møøse"))