Talk:Substring: Difference between revisions

__TOC__
 
== why not have other cases ? ==
 
The individual subtasks here seem to cover only certain particular arbitrary use cases and not others. Why not have
* substring that starts at index n and ends at index m
* substring that starts at n places before the end of the string and is of length m
* and so on
 
 
Also, the last two subtasks seem very obscure and contrived. No language seems to have built-in methods for them. It seems that all the solutions basically (1) find the character or substring we are looking for, and (2) use the first subtask ("starting from n characters in and of m length") to get the result. Why not just make finding the character or substring a separate article? --[[Special:Contributions/76.173.203.32|76.173.203.32]] 09:28, 10 August 2009 (UTC)
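
The two-step pattern described above (find, then reuse the first subtask) can be sketched as follows. This is only an illustration in Python; the function names <code>substr</code>, <code>substr_from_char</code> and <code>substr_from_substring</code> are hypothetical and are not taken from any entry on the task page.

```python
def substr(s: str, n: int, m: int) -> str:
    """First subtask: start n characters in, take m characters."""
    return s[n:n + m]


def substr_from_char(s: str, ch: str, m: int) -> str:
    """Fourth subtask: (1) find the known character, (2) reuse substr."""
    # str.index raises ValueError if ch does not occur in s.
    return substr(s, s.index(ch), m)


def substr_from_substring(s: str, sub: str, m: int) -> str:
    """Fifth subtask: (1) find the known substring, (2) reuse substr."""
    return substr(s, s.index(sub), m)


if __name__ == "__main__":
    text = "abcdefgh"
    print(substr(text, 2, 3))
    print(substr_from_char(text, "d", 3))
    print(substr_from_substring(text, "de", 3))
```

In this sketch the last two helpers add nothing beyond a search call, which is the point the comment makes.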
 
[[User:Markhobley|Markhobley]] 23:30, 2 June 2011 (UTC)
 
 
==Substantial task changes affecting many examples==
 
The more examples there are for a task, the more effort it takes to change the essential task goals '''and get all the examples updated'''. This task is not a draft and has 60 examples. You need to weigh any change to the task definition against the ability to get most of the examples updated, and I think 60 examples is too many for a change that adds another requirement to the task description when the task description without it wasn't so bad.
 
What do others think? --[[User:Paddy3118|Paddy3118]] 04:35, 5 June 2011 (UTC)
:I agree. We are way too far into the task effort to make changes without discussion. Also, I'm not sure how many occasions there are to show a string minus the first character. In any case, we should talk about it first. --[[User:Mwn3d|Mwn3d]] 05:03, 5 June 2011 (UTC)
 
:: (regarding showing a string minus the 1<sup>st</sup> character): &nbsp; this is covered by the 1<sup>st</sup> task requirement. &nbsp; ''Showing'' that result isn't very common, but ''using'' a string starting with the 2<sup>nd</sup> is. -- [[User:Gerard Schildberger|Gerard Schildberger]] 18:52, 14 March 2013 (UTC)
 
 
==special cases==
 
I am wondering how many languages allow a zero length &nbsp; '''SUBSTR''' &nbsp; (or equivalent BIF).
 
Also, the 3<sup>rd</sup> task requirement: &nbsp; if the (original) string is a null string, how many language examples would handle that case?
 
I've been bit in the hinder too many times on that little ditty. &nbsp; &nbsp; "But, but, but, it never should've happened ..."
 
-- [[User:Gerard Schildberger|Gerard Schildberger]] 19:03, 14 March 2013 (UTC)
 
-----
 
Also, consider the PL/I version when the original string has a length of zero (a null string, if you will). &nbsp; What does the PL/I '''substr''' BIF do with a negative length (3<sup>rd</sup> argument)? -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 23:15, 5 October 2013 (UTC)
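
The two edge cases raised in this section (a zero-length substring, and the 3<sup>rd</sup> requirement applied to a null string) can be made concrete with a small sketch. It is written in Python purely for illustration and says nothing about how PL/I or REXX behave. The helper names are hypothetical, and rejecting negative arguments is an assumption of this sketch, not something the task requires.

```python
def substr(s: str, start: int, length: int) -> str:
    """Return `length` characters of `s` beginning at index `start`.

    Assumption of this sketch: negative arguments are rejected rather
    than silently reinterpreted.
    """
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    return s[start:start + length]


def all_but_last(s: str) -> str:
    """Third requirement: the whole string minus its last character.

    For a null string, len(s) - 1 is -1, which is the negative length
    asked about above. Clamping to zero keeps the call well defined.
    """
    return substr(s, 0, max(len(s) - 1, 0))


if __name__ == "__main__":
    print(repr(substr("abc", 1, 0)))   # zero-length request
    print(repr(all_but_last("abc")))
    print(repr(all_but_last("")))      # null-string input
```

Without the clamp, the naive length computation would pass -1 to <code>substr</code> for a null string, and this sketch would raise an error instead of returning an empty result.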
 
== Unicode support ==
 
Most modern languages support Unicode in some form, usually UTF-16. However, none of the example code in any of these languages seems to work correctly with characters above the Basic Multilingual Plane. Why is that? Did the requirement to support all Unicode code points come after the examples were written?
 
: Back in August 2009 when this task was added, Unicode support was much spottier than it is today. The task asks that '''if''' the language uses UTF-8 or UTF-16, then it should support characters outside the BMP, but doesn't require it to be explicitly shown. That is a failing, but the task is so old and has so many entries that it would be challenging to go back and change the requirements now. It is still kind of a crapshoot whether a particular language can seamlessly support upper plane characters, though many do, and do it well; the Raku entry, for instance: (last two subtasks slightly modified as the "given letter/sequence" is no longer "given")
 
: <syntaxhighlight lang="raku" line>for 'abcdefgh', '𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍', '𠘀𠘁𠘂𠘃𠘄𠘅𠘆', '👒🎩👩‍👩‍👦‍👦🧢🎓👨‍👧‍👧' -> $str {
my $n = 2;
my $m = 3;
say $str.substr($n, $m);
say $str.substr($n);
say $str.substr(0, *-1);
say $str.substr($str.index($str.comb[3]), $m);
say $str.substr($str.index($str.substr($m,$n)), $m);
}</syntaxhighlight>
 
:yields
 
<pre style="margin-left:2em;">cde
cdefgh
abcdefg
def
def
𝖈𝖉𝖊
𝖈𝖉𝖊𝖋𝖌𝖍
𝖆𝖇𝖈𝖉𝖊𝖋𝖌
𝖉𝖊𝖋
𝖉𝖊𝖋
𠘂𠘃𠘄
𠘂𠘃𠘄𠘅𠘆
𠘀𠘁𠘂𠘃𠘄𠘅
𠘃𠘄𠘅
𠘃𠘄𠘅
👩‍👩‍👦‍👦🧢🎓
👩‍👩‍👦‍👦🧢🎓👨‍👧‍👧
👒🎩👩‍👩‍👦‍👦🧢🎓
🧢🎓👨‍👧‍👧
🧢🎓👨‍👧‍👧</pre>
 
: Seamlessly and transparently handles any valid Unicode glyph. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 12:19, 15 August 2023 (UTC)
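
For comparison, the difference between indexing by UTF-16 code unit, by code point, and by grapheme can be sketched in Python. This is an illustration only: the function names are hypothetical, and the UTF-16 helper merely simulates what a code-unit-indexed language would do; it is not taken from any entry. Python's own <code>str</code> indexes by code point, so it copes with characters above the BMP but still splits ZWJ emoji sequences.

```python
FRAKTUR = "𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍"  # every letter lies outside the BMP

# Family emoji: woman, ZWJ, woman, ZWJ, boy, ZWJ, boy (one visible glyph).
FAMILY = "\U0001F469\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466"


def by_code_points(s: str, n: int, m: int) -> str:
    """Substring counted in Unicode code points (Python's native unit)."""
    return s[n:n + m]


def by_utf16_units(s: str, n: int, m: int) -> str:
    """Substring counted in UTF-16 code units, simulated via encoding.

    Each code unit is two bytes in UTF-16-LE. A slice that cuts a
    surrogate pair in half cannot be decoded cleanly, so undecodable
    parts are replaced instead of raising.
    """
    data = s.encode("utf-16-le")
    return data[2 * n:2 * (n + m)].decode("utf-16-le", errors="replace")


if __name__ == "__main__":
    # Both approaches agree on BMP-only text.
    print(by_code_points("abcdefgh", 2, 3), by_utf16_units("abcdefgh", 2, 3))
    # Above the BMP, the code-unit version starts and ends mid-character.
    print(by_code_points(FRAKTUR, 2, 3))
    print(by_utf16_units(FRAKTUR, 2, 3))
    # Code points are still not graphemes: the family is 7 code points.
    print(len(FAMILY), by_code_points(FAMILY, 0, 1))
```

The last line shows why code-point indexing alone does not reproduce the Raku output above: taking one "character" from the family emoji yields only its first member. As far as I know, the Python standard library has no grapheme segmentation, so matching that output would need extra code or a third-party package.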