Unicode strings: Difference between revisions
(Added Wren)
プリント 'これは実験です。';</lang>
=={{header|Wren}}==
Wren source files are interpreted as UTF-8, so Unicode strings can be included directly in scripts.

Although string literals may contain Unicode text written directly, identifiers and keywords are limited to ASCII letters, digits and underscores.

The String type represents an immutable sequence of bytes which is usually, but not necessarily, interpreted as UTF-8. It can present a string either as a sequence of bytes or as a sequence of Unicode code points.

When a string is iterated directly, it is treated as a sequence of Unicode 'characters' (code points). Likewise, the 'count' property returns the number of such characters, not the number of bytes.

However, string indexing (including methods which take or return an index) uses byte offsets, because indexing by code point would be relatively inefficient.

The standard library does not support normalization, but the [https://rosettacode.org/wiki/Category:Wren-upc Wren-upc] module can split a string into ''user perceived characters''.
<lang ecmascript>var w = "voilà"
for (c in w) {
    System.write("%(c) ") // prints the 5 Unicode 'characters'.
}
System.print("\nThe length of %(w) is %(w.count)")
System.print("\nIts code-points are:")
for (cp in w.codePoints) {
    System.write("%(cp) ") // prints the code-points as numbers
}
System.print("\n\nIts bytes are: ")
for (b in w.bytes) {
    System.write("%(b) ") // prints the bytes as numbers
}
System.print()</lang>
{{out}}
<pre>
v o i l à
The length of voilà is 5

Its code-points are:
118 111 105 108 224

Its bytes are:
118 111 105 108 195 160
</pre>
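
The byte-offset indexing described above can be illustrated with a short sketch. It assumes that 'indexOf', the subscript operator and 'bytes' behave as described in the Wren core String documentation; the values in the comments follow from the byte listing above, in which 'à' occupies byte offsets 4 and 5.

<lang ecmascript>var w = "voilà"
var i = w.indexOf("à")
System.print(i)                    // 4: a byte offset, since 'v', 'o', 'i', 'l' take one byte each
System.print(w[i])                 // à: the code point which starts at byte offset 4
System.print(w[i + 1].bytes.count) // 1: offset 5 is mid-sequence, so a one-byte string is returned
System.print(w.count)              // 5: number of code points
System.print(w.bytes.count)        // 6: number of bytes</lang>

Because 'count' reports code points rather than bytes, 'bytes.count' is the safer upper bound when working with byte offsets.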
=={{header|zkl}}==