Unicode strings: Difference between revisions

Line 1,215:

=={{header|Raku}}==

(formerly Perl 6)

Raku programs and strings are all in Unicode and operate at a grapheme abstraction level, which is agnostic to underlying encodings or normalizations. (These are generally handled at program boundaries.) Opened files default to UTF-8 encoding. All Unicode character properties are in play, so any appropriate characters may be used as parts of identifiers, whitespace, or user-defined operators. For instance:

Line 1,223:

Line 1,224:

Raku tracks the Unicode Consortium's standard releases and is generally up to the latest

standard within a month or so of its release. (currently at 12.1 as of ~~Nov.~~ ~~2019~~)

standard within a month or so of its release. (currently at 13.1 as of May 2021)

* Supports the normalized forms NFC, NFD, NFKC, and NFKD, and character equivalence as specified in [http://unicode.org/reports/tr15/ Unicode technical report #15].

Line 1,232:

Line 1,233:

* Provides built-in routines to access character names and to do name-to-character, character-to-ordinal, and ordinal-to-character conversions.

* Works seamlessly with upper plane and private use plane character codepoints.

* Provides tools to deal with strings that contain invalid Unicode ~~charters~~.

* Provides tools to deal with strings that contain invalid Unicode characters.
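A few of the points listed above can be sketched briefly. This is only an illustration, not part of the original entry: the sample strings and variable names are made up for the example, and the use of the <code>utf8-c8</code> encoding for malformed input is an assumption about which tool the last bullet has in mind.

```raku
# Illustrative sketch; the strings and variable names are invented for this example.
my $composed   = "\c[LATIN SMALL LETTER E WITH ACUTE]";  # written as one codepoint
my $decomposed = "e\c[COMBINING ACUTE ACCENT]";          # written as two codepoints

# Strings are compared at the grapheme level, whatever form they were written in.
say $composed eq $decomposed;   # True
say $decomposed.chars;          # 1 (one grapheme)
say $decomposed.NFD.elems;      # 2 (codepoints in the decomposed form)
say $decomposed.NFC.elems;      # 1 (codepoints in the composed form)

# Character names, character-to-ordinal and ordinal-to-character conversions.
say 'a'.uniname;                # LATIN SMALL LETTER A
say 'a'.ord;                    # 97
say 97.chr;                     # a

# An upper-plane codepoint is still a single character.
my $face = "\x[1F600]";
say $face.chars;                # 1
say $face.ord.base(16);         # 1F600
say $face.uniname;              # GRINNING FACE

# Assumption: utf8-c8 is one way to read bytes that are not valid UTF-8.
# 0xFF can never appear in well-formed UTF-8.
my $bytes = Buf.new(0x66, 0x6F, 0x6F, 0xFF);
my $text  = $bytes.decode('utf8-c8');
say $text.chars;
```

The normalization methods return code-point sequences rather than strings, which is why the example counts their elements with <code>.elems</code> while counting graphemes with <code>.chars</code>.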

In general, it tries to make dealing with Unicode "just work".