Unicode strings: Difference between revisions
→{{header|Go}}: Update info on normalization support; add link to highly relevant official Go blog article; other tweaks
Line 533:
=={{header|Go}}==
Go source code is specified to be UTF-8 encoded.
This directly allows any Unicode code point in character and string literals.
Unicode is also allowed in identifiers such as variable and field names, with some restrictions.
The <code>string</code> data type represents a read-only sequence of bytes that conventionally, but not necessarily, holds UTF-8-encoded text.
A number of built-in features interpret <code>string</code>s as UTF-8. For example,
<lang go> var i int
var u rune
Line 547 ⟶ 551:
4 224
</pre>
Here 224 is the Unicode code point of the à character (U+00E0).
Note that <code>rune</code> is a predeclared type (an alias for <code>int32</code>) that can hold any Unicode code point.
In contrast,
Line 564 ⟶ 569:
5 160
</pre>
Here bytes 4 and 5 hold the two-byte UTF-8 encoding of à (0xC3 0xA0).
The expression <code>w[i]</code> in this case has type <code>byte</code> rather than <code>rune</code>.
A Go blog post covers this in more detail: [http://blog.golang.org/strings Strings, bytes, runes and characters in Go].
The heavily used standard packages <
The standard packages <code>unicode</code>, <code>unicode/utf8</code>, and <code>unicode/utf16</code> have additional functions.
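As a minimal sketch of what these packages offer, the following program uses a few functions from each. The helper names <code>runeCount</code>, <code>countLetters</code>, and <code>utf16Units</code> are made up for this illustration; only the calls into <code>unicode</code>, <code>unicode/utf8</code>, and <code>unicode/utf16</code> are standard library API.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf16"
	"unicode/utf8"
)

// runeCount (hypothetical helper) reports the number of code points in s.
// By contrast, len(s) counts bytes.
func runeCount(s string) int {
	return utf8.RuneCountInString(s)
}

// countLetters (hypothetical helper) counts the runes in s that
// Unicode classifies as letters.
func countLetters(s string) int {
	n := 0
	for _, r := range s {
		if unicode.IsLetter(r) {
			n++
		}
	}
	return n
}

// utf16Units (hypothetical helper) returns the UTF-16 code units for s.
// Code points outside the Basic Multilingual Plane become surrogate pairs.
func utf16Units(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	// \U0001F600 is an emoji outside the Basic Multilingual Plane.
	s := "voilà \U0001F600"

	fmt.Println(len(s), runeCount(s)) // 11 7
	fmt.Println(utf8.ValidString(s))  // true
	fmt.Println(countLetters(s))      // 5
	fmt.Println(len(utf16Units(s)))   // 8
}
```

The byte count and rune count differ because à takes two bytes and the emoji takes four in UTF-8, while the emoji takes two code units in UTF-16.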
Normalization support is available in the [[:Category:Go sub-repositories|sub-repository]] package <code>golang.org/x/text/unicode/norm</code> (formerly <code>code.google.com/p/go.text/unicode/norm</code>).
It contains a number of string manipulation functions that work with the four normalization forms NFC, NFD, NFKC, and NFKD.
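The normalization form type in this package provides <code>Reader</code> and <code>Writer</code> methods that wrap an <code>io.Reader</code> or <code>io.Writer</code> (returning an <code>io.Reader</code> and an <code>io.WriteCloser</code> respectively) to enable on-the-fly normalization during I/O.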
A Go blog post covers this in more detail: [http://blog.golang.org/normalization Text normalization in Go].
There is no built-in or automatic handling of byte order marks (which are at best unnecessary with UTF-8).
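Since byte order marks are not handled automatically, a program that may receive one has to remove it itself. The following is one possible sketch, assuming the text is already in a <code>string</code>; <code>stripBOM</code> is a hypothetical helper name, not a standard library function.

```go
package main

import (
	"fmt"
	"strings"
)

// bom is the byte order mark U+FEFF, which UTF-8 encodes as the
// three bytes EF BB BF.
const bom = "\uFEFF"

// stripBOM (hypothetical helper) removes a single leading byte order
// mark from s, if one is present, and otherwise returns s unchanged.
func stripBOM(s string) string {
	return strings.TrimPrefix(s, bom)
}

func main() {
	in := bom + "hello"
	out := stripBOM(in)
	fmt.Println(len(in), len(out)) // 8 5
	fmt.Println(out)               // hello
}
```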
=={{header|Haskell}}==