Unicode strings: Difference between revisions


Revision as of 17:24, 3 January 2023

447 bytes added, 1 year ago

→‎{{header|Raku}}: update about Unicode version support and string normalization

Noraj (ACCEIS)


Older revision: 16:35, 30 December 2022, Thundergnat, minor edit (→‎{{header|Raku}}: Even more Unicodey)
Newer revision: 17:24, 3 January 2023, Noraj (ACCEIS) (→‎{{header|Raku}}: update about Unicode version support and string normalization)
Line 1,220:

(formerly Perl 6)

Raku programs and strings are all in Unicode and operate at a grapheme abstraction level, which is agnostic to underlying encodings ~~or normalizations~~. (These are generally handled at program boundaries.) Opened files default to UTF-8 encoding. All Unicode character properties are in play, so any appropriate characters may be used as parts of identifiers, whitespace, or user-defined operators. For instance:

<syntaxhighlight lang="raku" line>sub prefix:<∛> (\𝐕) { 𝐕 ** ⅓ }

Line 1,227:

Non-Unicode strings are represented as Buf types rather than Str types, and Unicode operations may not be applied to Buf types without some kind of explicit conversion. Only ASCIIish operations are allowed on buffers.

As of the latest available version of Rakudo (the Raku compiler), 2022.12, the official [https://docs.raku.org/language/unicode Raku documentation on Unicode support] says:

~~Raku tracks the Unicode consortium standards releases and is generally up to the latest standard within a month or so of its release. (currently at 13.1 as of May 2021)~~

<blockquote>
Raku has a high level of support of Unicode, with the latest version supporting Unicode 12.1.
</blockquote>

So Unicode 13.0, 14.0 and 15.0 are not yet supported (or the documentation is outdated). However, Raku still supports the following Unicode features:

* Supports the normalized forms NFC, NFD, NFKC, and NFKD, and character equivalence as specified in [http://unicode.org/reports/tr15/ Unicode technical report #15].

Line 1,238 ⟶ 1,245:

* Works seamlessly with upper plane and private use plane character codepoints.
* Provides tools to deal with strings that contain invalid Unicode characters.

In general, it tries to make dealing with Unicode "just work".

Raku intends to support Unicode ~~even~~ better than Perl 5~~, which already does a great job in recent versions of accessing large swaths of Unicode spec. functionality~~.
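As a small illustration of the normalization forms and grapheme-level behavior mentioned in this section, the following sketch compares a precomposed "é" with its decomposed spelling. The variable names are invented for this example, and the comments describe what a recent Rakudo is expected to do; treat it as a sketch under those assumptions rather than as reference behavior.

<syntaxhighlight lang="raku">
# Two spellings of "é": precomposed U+00E9, and "e" followed by combining acute U+0301.
# (Variable names are hypothetical, chosen for this illustration only.)
my $composed   = "\x[E9]";
my $decomposed = "e\x[301]";

# Str values are compared at the grapheme level, so the two spellings are equal.
say $composed eq $decomposed;   # True
say $decomposed.chars;          # 1 (one grapheme)

# The explicit normalization forms give access to the underlying code points.
say $decomposed.NFC.elems;      # 1 (composed form)
say $decomposed.NFD.elems;      # 2 (base letter plus combining mark)
</syntaxhighlight>

When the exact code points matter, for example when exchanging data with another system, the <code>NFC</code>, <code>NFD</code>, <code>NFKC</code> and <code>NFKD</code> methods make the chosen form explicit.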
Raku improves on Perl 5 primarily by offering explicitly typed strings that always know which operations are sensical and which are not.

A very important distinctive characteristic of Raku to keep in mind is that it automatically applies normalization (Unicode NFC, Normalization Form C, i.e. canonical composition) to all strings by default, as showcased and explained on the [[String comparison#Unicode_normalization_by_default|String comparison page]].

=={{header|REXX}}==