Unicode strings: Difference between revisions



プリント 'これは実験です。';</lang>

=={{header|Wren}}==
Wren source files are interpreted as UTF-8, so Unicode strings can be included directly in scripts.

Although Unicode string literals can be written directly, identifiers and keywords are limited to ASCII letters, digits and underscores.

The String type represents an immutable sequence of bytes and is usually interpreted as UTF-8 but doesn't have to be. It has methods to represent a string as either a list of bytes or a list of Unicode code-points.

If strings are iterated directly, they are considered to be a list of Unicode 'characters'. Likewise, the 'count' property returns the number of such characters, not the number of bytes.

However, string indexing (including methods which take or return an index) works by byte offset, because indexing by code-point is relatively inefficient for a variable-length encoding such as UTF-8.
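
The difference between counting code-points and indexing by byte offset can be sketched as follows. This is a minimal illustration which assumes the behaviour of 'count', 'bytes', 'indexOf' and the subscript operator as described in the Wren core library documentation; the variable name 's' is arbitrary.
<lang ecmascript>var s = "voilà"              // 'à' is encoded as two bytes in UTF-8
System.print(s.count)        // number of code-points: 5
System.print(s.bytes.count)  // number of bytes: 6
System.print(s.indexOf("à")) // byte offset at which 'à' starts: 4
System.print(s[4])           // the code-point starting at byte offset 4: à</lang>
Offsets 4 and 5 both fall within the encoding of 'à', so only offset 4 marks the start of a complete code-point.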

The standard library does not support normalization but the [https://rosettacode.org/wiki/Category:Wren-upc Wren-upc] module does allow one to split a string into ''user perceived characters''.
<lang ecmascript>var w = "voilà"
for (c in w) {
    System.write("%(c) ") // prints the 5 Unicode 'characters'.
}
System.print("\nThe length of %(w) is %(w.count)")

System.print("\nIts code-points are:")
for (cp in w.codePoints) {
    System.write("%(cp) ") // prints the code-points as numbers
}

System.print("\n\nIts bytes are: ")
for (b in w.bytes) {
    System.write("%(b) ") // prints the bytes as numbers
}
System.print()</lang>

{{out}}
<pre>
v o i l à
The length of voilà is 5

Its code-points are:
118 111 105 108 224

Its bytes are:
118 111 105 108 195 160
</pre>


=={{header|zkl}}==