String comparison

=== Unicode normalization by default ===

Be aware that Raku applies Unicode normalization, specifically NFC (Normalization Form C, canonical composition), by default to all input and output except for file names [https://docs.raku.org/language/unicode See docs]. Here Raku simply follows '''all''' of the Unicode specification, including parts that some people don't like. For some characters, the Unicode Consortium specifies that the NFC form is a different (though usually visually identical) character; [https://www.unicode.org/reports/tr15 Unicode Standard Annex #15] calls this relationship '''Canonical Equivalence'''. Raku adheres to that specification.
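
Canonical equivalence is easier to see with a more common case than the Kelvin sign. The following is a minimal sketch that assumes a recent Rakudo, and the variable names are illustrative only. "é" can be written as the precomposed U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT. The two spellings are canonically equivalent, so after normalization they are the same string.
<syntaxhighlight lang="raku" line># Two canonically equivalent spellings of "é" (illustrative names)
my $precomposed = "\x[E9]";      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $decomposed  = "e\x[301]";    # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

say $precomposed eq $decomposed; # True: both strings are stored normalized
say $decomposed.chars;           # 1: a single grapheme
say $decomposed.NFD.list;        # (101 769): the canonical decomposition as codepoints</syntaxhighlight>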

One case that often trips people up is the Kelvin sign "K" (U+212A KELVIN SIGN), which is automatically converted to the ASCII uppercase "K" (U+004B LATIN CAPITAL LETTER K).
<syntaxhighlight lang="raku" line>say "\c[KELVIN SIGN]".uniname;
# => LATIN CAPITAL LETTER K

my $kelvin = "\c[KELVIN SIGN]";            # U+212A, normalized to U+004B when the string is created
my $k      = "\c[LATIN CAPITAL LETTER K]"; # U+004B
say ($kelvin eq $k);  # True, lexically equal
say ($kelvin eqv $k); # True, generically equal
say ($kelvin === $k); # True, identical objects</syntaxhighlight>

In most programming languages these two strings would not be equal. Because Raku follows the Unicode specification and applies normalization automatically, they compare as equal.
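
Sometimes the original codepoints matter, for example when comparing against data from a system that does not normalize. One option is Raku's built-in <code>Uni</code> type, which holds codepoints without normalizing them. The following is a minimal sketch that assumes a recent Rakudo. Normalization is applied again as soon as the value is converted back to a <code>Str</code>.
<syntaxhighlight lang="raku" line>my $raw = Uni.new(0x212A);   # KELVIN SIGN, stored unnormalized in a Uni
say $raw.list;               # (8490): the original codepoint survives
say $raw.NFC.list;           # (75): NFC maps it to LATIN CAPITAL LETTER K
say $raw.Str eq "K";         # True: converting to Str normalizes again</syntaxhighlight>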

It's officially identified as a possible trap for string handling [https://docs.raku.org/language/traps#All_text_is_normalized_by_default See docs].

=={{header|Relation}}==