Unicode strings: Difference between revisions

→‎{{header|Wren}}: Added a upc example.
(Added Wren)
Line 1,562:
 
=={{header|Wren}}==
{{libheader|Wren-upc}}
Wren source code files are interpreted as UTF-8 encoded and so it is easy to include Unicode strings within scripts.
 
Line 1,572 ⟶ 1,573:
However, string indexing (including methods which use or return an index) is done by byte offset as the use of code-point indexing is relatively inefficient.
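
As a minimal sketch of the byte-offset behaviour described above (the variable name <code>s</code> is illustrative, and the comments assume the usual UTF-8 encoding of "à" as two bytes):
<lang ecmascript>var s = "voilà"
System.print(s.count)        // counts code points: 5
System.print(s.bytes.count)  // counts bytes: 6, as "à" takes two bytes
System.print(s.indexOf("à")) // a byte offset, not a code-point index: 4
System.print(s[4])           // indexing by that byte offset yields "à"</lang>
Here the byte offset and the code-point position of "à" happen to coincide because every earlier character is a single byte; any multi-byte character earlier in the string would make them differ.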
 
The standard library does not support normalization, but the [https://rosettacode.org/wiki/Category:Wren-upc Wren-upc] module does allow one to split a string into ''user perceived characters'' (or ''graphemes'').
<lang ecmascript>var w = "voilà"
for (c in w) {
Line 1,589 ⟶ 1,590:
System.write("%(b) ") // prints the bytes as numbers
}
 
var zwe = "👨‍👩‍👧"
System.print("\n\n%(zwe) has:")
System.print(" %(zwe.bytes.count) bytes: %(zwe.bytes.toList.join(" "))")
System.print(" %(zwe.codePoints.count) code-points: %(zwe.codePoints.toList.join(" "))")
System.print(" %(Graphemes.clusterCount(zwe)) grapheme")</lang>
 
{{out}}
Line 1,601 ⟶ 1,607:
Its bytes are:
118 111 105 108 195 160
 
👨‍👩‍👧 has:
18 bytes: 240 159 145 168 226 128 141 240 159 145 169 226 128 141 240 159 145 167
5 code-points: 128104 8205 128105 8205 128103
1 grapheme
</pre>
 