Unicode strings

=={{header|Raku}}==
(formerly Perl 6)

Raku programs and strings are all in Unicode and operate at a grapheme abstraction level, which is agnostic to underlying encodings or normalizations. (These are generally handled at program boundaries.) Opened files default to UTF-8 encoding. All Unicode character properties are in play, so any appropriate characters may be used as parts of identifiers, whitespace, or user-defined operators. For instance:
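A minimal sketch of what this permits. The variable names and the `⊕` operator (defined here, hypothetically, as addition modulo 10) are purely illustrative. The last lines show the grapheme level at work: `.chars` counts graphemes, while `.codes` counts codepoints.

```raku
# Identifiers may contain any Unicode letters.
my $año = 2021;
my $δ   = 0.5;

# A user-defined infix operator spelled with a non-ASCII symbol
# (hypothetical example: addition modulo 10).
sub infix:<⊕>($a, $b) { ($a + $b) % 10 }

say 7 ⊕ 5;          # 2
say $año, ' ', $δ;  # 2021 0.5

# Strings are measured in graphemes. "q" followed by U+0301 COMBINING ACUTE
# ACCENT has no precomposed form, so it stays two codepoints but is one character.
my $q = "q\x[0301]";
say $q.chars;       # 1
say $q.codes;       # 2
```

A base letter with a combining mark that has no precomposed form was chosen on purpose. A pair like "e" plus U+0301 would be composed to a single codepoint by NFC, so `.codes` would not show the difference.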




Raku tracks the Unicode Consortium's standard releases and is generally up to date with the latest
standard within a month or so of its release (currently at 13.1 as of May 2021).


* Supports the normalized forms NFC, NFD, NFKC, and NFKD, and character equivalence as specified in [http://unicode.org/reports/tr15/ Unicode technical report #15].
* Provides built-in routines to access character names and to do name-to-character, character-to-ordinal, and ordinal-to-character conversions.
* Works seamlessly with upper plane and private use plane character codepoints.
* Provides tools to deal with strings that contain invalid Unicode characters.
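The capabilities listed above can be sketched in a few lines. This is a minimal, unofficial illustration. The closing round trip assumes Rakudo's `utf8-c8` encoding, which is one way to carry bytes that are not valid UTF-8 through a string without losing them.

```raku
# Normalization: count codepoints in different normal forms.
my $e  = "\x[E9]";    # LATIN SMALL LETTER E WITH ACUTE (precomposed)
my $fi = "\x[FB01]";  # LATIN SMALL LIGATURE FI (compatibility character)
say $e.NFC.codes,  ' ', $e.NFD.codes;    # 1 2
say $fi.NFD.codes, ' ', $fi.NFKD.codes;  # 1 2

# Names, ordinals and characters.
say $e.uniname;                      # LATIN SMALL LETTER E WITH ACUTE
say $e.ord;                          # 233
say 233.chr eq $e;                   # True
say "\c[GREEK SMALL LETTER ALPHA]";  # α

# Upper-plane and private-use codepoints are ordinary characters.
say 0x1F600.chr.uniname;             # GRINNING FACE
say 0xF0000.chr.ord.base(16);        # F0000

# Invalid UTF-8 (0xFF can never occur in UTF-8) survives a round trip through
# Rakudo's utf8-c8 encoding, which maps such bytes to synthetic graphemes.
my $raw  = Buf.new(0x41, 0xFF, 0x42);
my $text = $raw.decode('utf8-c8');
say $text.encode('utf8-c8').list;    # the original bytes 65, 255, 66
```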



In general, it tries to make dealing with Unicode "just work".