Talk:Substring: Difference between revisions

== Unicode support ==
 
Most modern languages support Unicode in some form, usually UTF-16. However, none of the example code in any of these languages seems to work correctly with characters above the Basic Multilingual Plane. Why is that? Did the requirement to support all Unicode code points come after the examples were written?
 
: Back in August 2009, when this task was added, Unicode support was much spottier than it is today. The task asks that '''if''' the language uses UTF-8 or UTF-16, then it should support characters outside the BMP, but it doesn't require that to be explicitly shown. That is a failing, but the task is so old and has so many entries that it would be challenging to go back and change the requirements now. It is still kind of a crapshoot whether a particular language can seamlessly support upper-plane characters, though many do, and do it well. Take the Raku entry, for instance (last two subtasks slightly modified, as the "given letter/sequence" is no longer "given"):
 
: <syntaxhighlight lang="raku" line>for 'abcdefgh', '𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍', '𠘀𠘁𠘂𠘃𠘄𠘅𠘆', '👒🎩👩‍👩‍👦‍👦🧢🎓👨‍👧‍👧' -> $str {
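    my $n = 2;
    my $m = 3;
    say $str.substr($n, $m);                              # $m characters starting at index $n
    say $str.substr($n);                                  # from index $n to the end
    say $str.substr(0, *-1);                              # everything except the last character
    say $str.substr($str.index($str.comb[3]), $m);        # $m characters starting at a known character
    say $str.substr($str.index($str.substr($m,$n)), $m);  # $m characters starting at a known substring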
}</syntaxhighlight>
 
:yields
 
<pre style="margin-left:2em;">cde
cdefgh
abcdefg
def
def
𝖈𝖉𝖊
𝖈𝖉𝖊𝖋𝖌𝖍
𝖆𝖇𝖈𝖉𝖊𝖋𝖌
𝖉𝖊𝖋
𝖉𝖊𝖋
𠘂𠘃𠘄
𠘂𠘃𠘄𠘅𠘆
𠘀𠘁𠘂𠘃𠘄𠘅
𠘃𠘄𠘅
𠘃𠘄𠘅
👩‍👩‍👦‍👦🧢🎓
👩‍👩‍👦‍👦🧢🎓👨‍👧‍👧
👒🎩👩‍👩‍👦‍👦🧢🎓
🧢🎓👨‍👧‍👧
🧢🎓👨‍👧‍👧</pre>
 
: Seamlessly and transparently handles any valid Unicode glyph. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 12:19, 15 August 2023 (UTC)
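
: As a rough illustration of why entries written against UTF-16 code units go wrong above the BMP, here is a minimal Python sketch. It is not taken from any task entry: the helper names <code>utf16_units</code>, <code>substr_by_units</code> and <code>substr_by_code_points</code> are made up for this example, and the code-unit slicing only imitates what a naive UTF-16 implementation would do. It assumes nothing beyond the Python standard library.

```python
# Sketch only: compare slicing by UTF-16 code unit with slicing by code point.
# All helper names here are hypothetical, chosen for this illustration.

def utf16_units(s):
    """Return the string as a list of 16-bit UTF-16 code units."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

def substr_by_units(s, start, length):
    """Slice by UTF-16 code units, as a naive UTF-16 implementation would."""
    units = utf16_units(s)[start:start + length]
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    # errors="replace" keeps a split surrogate pair from raising an exception.
    return raw.decode("utf-16-le", errors="replace")

def substr_by_code_points(s, start, length):
    """Slice by Unicode code point (what Python's str indexing does)."""
    return s[start:start + length]

bmp = "abcdefgh"
astral = "𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍"  # every character is outside the BMP
# One family emoji: man, ZWJ, girl, ZWJ, girl (a single grapheme cluster).
family = "\U0001F468\u200D\U0001F467\u200D\U0001F467"

print(len(astral), len(utf16_units(astral)))   # 8 code points, 16 code units
print(substr_by_units(bmp, 2, 3))              # same answer either way inside the BMP
print(substr_by_code_points(astral, 2, 3))     # slices whole characters
print(substr_by_units(astral, 2, 3))           # cuts through a surrogate pair
print(len(family), len(utf16_units(family)))   # 5 code points, 8 code units
```

: Code-point slicing fixes the first problem but still splits the family emoji, since it is one grapheme made of five code points. Python's standard library has no grapheme segmentation as far as I know, which is the level the Raku example above works at.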