Unicode strings: Difference between revisions

@@ Line 1,560: / Line 1,560: @@
 プリント 'これは実験です。';</lang>
+=={{header|Wren}}==
+Wren source code files are interpreted as UTF-8 encoded and so it is easy to include Unicode strings within scripts.
+Although Unicode literals can be written directly, identifiers and keywords are limited to ASCII letters, digits and underscores.
+The String type represents an immutable sequence of bytes and is usually interpreted as UTF-8 but doesn't have to be. It has methods to represent a string as either a list of bytes or a list of Unicode code-points.
+Iterating a string directly yields its Unicode 'characters' (code-points), each as a string. Likewise, the 'count' property returns the number of such characters, not the number of bytes.
+However, string indexing (including methods which use or return an index) is done by byte offset, because locating the n-th code-point in variable-width UTF-8 requires scanning from the start of the string.
+The standard library does not support normalization but the [https://rosettacode.org/wiki/Category:Wren-upc Wren-upc] module does allow one to split a string into ''user perceived characters''.
+<lang ecmascript>var w = "voilà"
+for (c in w) {
+    System.write("%(c) ") // prints the 5 Unicode 'characters'.
+}
+System.print("\nThe length of %(w) is %(w.count)")
+System.print("\nIts code-points are:")
+for (cp in w.codePoints) {
+    System.write("%(cp) ") // prints the code-points as numbers
+}
+System.print("\n\nIts bytes are: ")
+for (b in w.bytes) {
+    System.write("%(b) ") // prints the bytes as numbers
+}
+System.print()</lang>
+{{out}}
+<pre>
+v o i l à
+The length of voilà is 5
+Its code-points are:
+118 111 105 108 224
+Its bytes are:
+118 111 105 108 195 160
+</pre>
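+The difference between counting by code-point and indexing by byte offset can be illustrated with a minimal sketch. The string and variable name are chosen only for illustration, and the values noted in the comments assume the String behaviour described above ('count' in code-points, indexes in bytes).
+<lang ecmascript>var s = "voilà!"
+System.print(s.count)        // 6: number of code-points
+System.print(s.bytes.count)  // 7: 'à' occupies two bytes in UTF-8
+System.print(s.indexOf("!")) // 6: a byte offset, not the code-point position (5)
+System.print(s[4])           // à: the code-point that starts at byte offset 4
+// Byte offset 5 falls in the middle of 'à', so s[5] is not a complete character;
+// inspect it as a byte rather than printing it as text.
+System.print(s[5].bytes[0])</lang>
+Because of this, an index returned by a method such as 'indexOf' should only be passed back to other byte-indexed operations and not treated as a character position.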
 =={{header|zkl}}==