String comparison: Difference between revisions

(→‎{{header|Raku}}: add title for Unicode NFC)
m (→‎Unicode normalization by default: A little less negative about following the Unicode spec)
=== Unicode normalization by default ===
 
Be aware that Raku applies normalization (Unicode NFC, that is, Normalization Form C, canonical composition) by default to all input and output except for file names. [https://docs.raku.org/language/unicode See docs]. Raku follows '''all''' of the Unicode spec, including parts that some people don't like. There are some graphemes for which the Unicode Consortium has specified that the NFC form is a different (though usually visually identical) grapheme. This is referred to in [https://www.unicode.org/reports/tr15 Unicode Standard Annex #15] as '''Canonical Equivalence'''. Raku adheres to that spec.
 
One case that people seem to get hung up on is the Kelvin symbol "K" (U+212A KELVIN SIGN) getting automatically converted to ASCII uppercase "K" (U+004B LATIN CAPITAL LETTER K).
<syntaxhighlight lang="raku" line>
say "\c[KELVIN SIGN]".uniname;
# => LATIN CAPITAL LETTER K

my $kelvin = "\c[KELVIN SIGN]";
my $k = "\c[LATIN CAPITAL LETTER K]";
say ($kelvin eq $k); # True, lexically equal
say ($kelvin eqv $k); # True, generically equal
say ($kelvin === $k); # True, identical objects
</syntaxhighlight>
 
In most programming languages the previous two objects wouldn't be equivalent, but since Raku follows the Unicode specification and applies normalization automatically, they show up as equivalent.
 
It's officially identified as a possible trap for string handling. [https://docs.raku.org/language/traps#All_text_is_normalized_by_default See docs].
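
The normalization described above can be observed by looking at the code points a string actually holds. The following is only a minimal sketch, assuming Rakudo; the combining-accent string and the use of the <code>Uni</code> type to hold a raw, un-normalized code point are illustrative additions, not part of the original example.

<syntaxhighlight lang="raku">
# A Str is stored in NFC, so the Kelvin sign is already an ordinary K here
say "\c[KELVIN SIGN]".ords;

# A base letter plus a combining accent is composed into a single code point
my $e-acute = "e\c[COMBINING ACUTE ACCENT]";
say $e-acute.chars;     # number of graphemes
say $e-acute.ords;      # code points after NFC composition
say $e-acute.NFD.list;  # code points after explicit decomposition

# A Uni holds code points exactly as given, without normalizing them
my $raw = Uni.new(0x212A);
say $raw.list;          # the original KELVIN SIGN code point
say $raw.NFC.list;      # the code point after explicit NFC normalization
</syntaxhighlight>

Comparing the output of <code>.list</code> and <code>.NFC.list</code> on the <code>Uni</code> value shows the same substitution that a <code>Str</code> undergoes implicitly.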
 
=={{header|Relation}}==