Talk:Substring

why not have other cases?

The individual subtasks here seem to cover only certain particular arbitrary use cases and not others. Why not have:

substring that starts at index n and ends at index m
substring that starts at index n and ends at m places before the end of the string
substring that starts at n places before the end of the string and is of length m
and so on
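For comparison, each of the extra cases listed above is a one-liner once the start and end offsets are worked out. Here is a minimal Python sketch. The function names are hypothetical, indices are 0-based, and reading "ends at index m" as inclusive is an assumption:

```python
def substr_n_to_m(s: str, n: int, m: int) -> str:
    """Start at index n, end at index m (inclusive)."""
    return s[n:m + 1]


def substr_n_to_m_before_end(s: str, n: int, m: int) -> str:
    """Start at index n, stop m characters before the end."""
    # len(s) - m rather than -m: s[n:-0] would be empty when m == 0
    return s[n:len(s) - m]


def substr_n_before_end_len_m(s: str, n: int, m: int) -> str:
    """Start n characters before the end, take m characters.

    Assumes 0 <= n <= len(s); a larger n would wrap around.
    """
    start = len(s) - n
    return s[start:start + m]


s = "abcdefgh"
print(substr_n_to_m(s, 2, 4))              # cde
print(substr_n_to_m_before_end(s, 2, 3))   # cde
print(substr_n_before_end_len_m(s, 5, 3))  # def
```

The `len(s) - m` form is a deliberate choice: it sidesteps Python's `s[n:-0]` pitfall when m is zero.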

Also, the last two subtasks seem very obscure and contrived. No language seems to have built-in methods for them. All the solutions seem to be basically (1) find the character or substring we are looking for, and (2) use the first subtask ("starting from n characters in and of m length") to get the result. Why not just make finding the character or substring a separate task? --76.173.203.32 09:28, 10 August 2009 (UTC)
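As the comment says, the last two subtasks reduce to a search followed by the first subtask. Here is a minimal Python sketch of that composition. `substr_from` is a hypothetical name, and raising on a missing needle is my choice, not something the task specifies:

```python
def substr_from(s: str, needle: str, m: int) -> str:
    """m characters starting at the first occurrence of needle (a character or substring)."""
    i = s.find(needle)  # step 1: locate; str.find returns -1 if absent
    if i < 0:
        raise ValueError(f"{needle!r} not found in {s!r}")
    return s[i:i + m]   # step 2: the "start at n, length m" subtask with n = i


print(substr_from("abcdefgh", "d", 3))   # def
print(substr_from("abcdefgh", "de", 3))  # def
```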

> The individual subtasks here seem to cover only certain particular arbitrary use cases and not others.

I thought it would be overly repetitious and verbose to cover all cases.

> Also, the last two subtasks seem very obscure and contrived. No language seems to have built-in methods for them.

Yes, I think you're right. I expected Ruby to have this feature, but it turned out not to. That leaves XSLT as the only language I know of that does (http://www.zvon.org/xxl/XSLTreference/Output/function_substring-after.html), which is hardly significant enough to justify those two subtasks. If you're happy to make those changes, I'll support them.

Oligomous 17:48, 10 August 2009 (UTC)
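For reference, XSLT's `substring-after` (and its sibling `substring-before`) can be approximated in Python with `str.partition`. This sketch assumes XPath 1.0 semantics as I understand them: the result is an empty string when the needle is absent. The function names simply mirror the XPath ones:

```python
def substring_after(s: str, needle: str) -> str:
    """Text after the first occurrence of needle, or "" if needle is absent."""
    if needle == "":
        return s  # an empty needle "occurs" at position 0
    head, sep, tail = s.partition(needle)
    return tail if sep else ""


def substring_before(s: str, needle: str) -> str:
    """Text before the first occurrence of needle, or "" if needle is absent."""
    if needle == "":
        return ""
    head, sep, tail = s.partition(needle)
    return head if sep else ""


print(substring_after("1999/04/01", "/"))   # 04/01
print(substring_before("1999/04/01", "/"))  # 1999
```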

For what it's worth, the last Snobol4 subtask was incorrect, though it happened to return the right result. The break( ) pattern creates a character class like regex [ ], not a substring to match. Fixed. --Snoman 11:32, 12 July 2010 (UTC)
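The distinction matters because a character-set match can coincide with a substring match for some inputs, which is how a wrong pattern can still return the right result. Here is a rough Python illustration. `break_chars` only approximates SNOBOL4's BREAK: it returns the text before the first character found in the set, or None where BREAK would fail. Both function names are hypothetical:

```python
from typing import Optional


def break_chars(s: str, charset: str) -> Optional[str]:
    """Text before the first character of s that appears anywhere in charset."""
    for i, c in enumerate(s):
        if c in charset:
            return s[:i]
    return None  # BREAK fails when no character from the set occurs


def before_substring(s: str, sub: str) -> Optional[str]:
    """Text before the first occurrence of the literal substring sub."""
    i = s.find(sub)
    return s[:i] if i >= 0 else None


s = "abcdefgh"
print(break_chars(s, "de"), before_substring(s, "de"))  # abc abc  (coincidence)
print(break_chars(s, "ed"), before_substring(s, "ed"))  # abc None
```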

In the same way that we have "whole string minus last character", we also need "whole string minus first character" here, because a language may have a separate handler for removing a single leading character without needing to take the substring from character 2 to the end.

Markhobley 23:30, 2 June 2011 (UTC)

Substantial task changes affecting many examples

The more examples there are for a task, the more effort it takes to change the essential task goals and get all the examples updated. This task is not a draft and has 60 examples. You need to weigh any change to the task definition against the ability to get most of the examples updated, and I think 60 examples is too many for a change that adds another requirement to a task description that wasn't so bad without it.

What do others think? --Paddy3118 04:35, 5 June 2011 (UTC)

I agree. We are way too far into this task to make changes without discussion. Also, I'm not sure how many occasions there are to show a string minus the first character. In any case, we should talk about it first. --Mwn3d 05:03, 5 June 2011 (UTC)

(Regarding showing a string minus the 1st character): this is covered by the 1st task requirement. Showing that result isn't very common, but using the string starting from the 2nd character is. -- Gerard Schildberger 18:52, 14 March 2013 (UTC)
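In Python, for example, there is no dedicated single-character handler. Both variants are plain slices, and dropping the first character is just the first requirement with n = 1 (0-based). Here is a minimal sketch, where `drop_first` and `drop_last` are hypothetical names:

```python
def drop_first(s: str) -> str:
    # "Start at index n, to the end" with n = 1
    return s[1:]


def drop_last(s: str) -> str:
    # "Whole string minus last character"
    return s[:-1]


print(drop_first("abcdefgh"))  # bcdefgh
print(drop_last("abcdefgh"))   # abcdefg
print(repr(drop_first("")))    # '' (no error on an empty string)
```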

special cases

I am wondering how many languages allow a zero length SUBSTR (or equivalent BIF).

Also, the 3rd task requirement: if the (original) string is a null string, how many language examples would handle that case?

I've been bit in the hinder too many times on that little ditty. "But, but, but, it never should've happened ..."

-- Gerard Schildberger 19:03, 14 March 2013 (UTC)

Also, consider the PL/I version when the original string has a length of zero (a null string, if you will): what does the PL/I substr BIF do with a negative length (3rd argument)? -- Gerard Schildberger (talk) 23:15, 5 October 2013 (UTC)
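This sketch doesn't answer the PL/I question, but it shows how these special cases play out in one language. In Python, a zero-length slice and a slice of an empty string both quietly return "". A negative length, however, can silently wrap around to the end of the string. `naive_substr` and `checked_substr` are hypothetical names, and rejecting negative arguments is one possible policy, not a task requirement:

```python
def naive_substr(s: str, n: int, m: int) -> str:
    """Start at index n, length m, translated directly to a slice."""
    return s[n:n + m]


def checked_substr(s: str, n: int, m: int) -> str:
    """Same, but rejects a negative start or length instead of guessing."""
    if n < 0 or m < 0:
        raise ValueError(f"start {n} and length {m} must be non-negative")
    # Slicing clamps to the string bounds, so running past the end
    # (or starting on an empty string) yields "" rather than an error.
    return s[n:n + m]


print(repr(naive_substr("abcdef", 2, 0)))   # '' : zero-length substring
print(repr(naive_substr("", 2, 3)))         # '' : null original string
print(repr(naive_substr("abcdef", 1, -3)))  # 'bcd' : really s[1:-2], a wraparound
```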

Unicode support

Most modern languages support Unicode in some form, usually UTF-16. However, none of the example code in any of these languages seems to work correctly with characters above the Basic Multilingual Plane. Why is that? Did the requirement to support all Unicode code points come after the examples were written?

Back in August 2009 when this task was added, Unicode support was much spottier than it is today. The task asks that if the language uses UTF-8 or UTF-16, it should support characters outside the BMP, but doesn't require that support to be explicitly shown. That is a failing, but the task is so old and has so many entries that it would be challenging to go back and change the requirements now. It is still kind of a crapshoot whether a particular language can seamlessly support upper-plane characters, though many do, and do it well; the Raku entry, for instance: (last two subtasks slightly modified, as the "given letter/sequence" is no longer "given")

for 'abcdefgh', '𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍', '𠘀𠘁𠘂𠘃𠘄𠘅𠘆', '👒🎩👩‍👩‍👦‍👦🧢🎓👨‍👧‍👧' -> $str {
    my $n = 2;
    my $m = 3;
    say $str.substr($n, $m);
    say $str.substr($n);
    say $str.substr(0, *-1);
    say $str.substr($str.index($str.comb[3]), $m);
    say $str.substr($str.index($str.substr($m,$n)), $m);
}

yields

cde
cdefgh
abcdefg
def
def
𝖈𝖉𝖊
𝖈𝖉𝖊𝖋𝖌𝖍
𝖆𝖇𝖈𝖉𝖊𝖋𝖌
𝖉𝖊𝖋
𝖉𝖊𝖋
𠘂𠘃𠘄
𠘂𠘃𠘄𠘅𠘆
𠘀𠘁𠘂𠘃𠘄𠘅
𠘃𠘄𠘅
𠘃𠘄𠘅
👩‍👩‍👦‍👦🧢🎓
👩‍👩‍👦‍👦🧢🎓👨‍👧‍👧
👒🎩👩‍👩‍👦‍👦🧢🎓
🧢🎓👨‍👧‍👧
🧢🎓👨‍👧‍👧

Seamlessly and transparently handles any valid Unicode glyph. --Thundergnat (talk) 12:19, 15 August 2023 (UTC)
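Entries that fail above the BMP typically index UTF-16 code units rather than code points. Even code-point indexing is not enough for emoji built from several code points joined by ZERO WIDTH JOINER, which Raku treats as single graphemes. Here is a small Python illustration. `utf16_slice` is a hypothetical helper that mimics code-unit indexing; Python's own `str` indexes code points:

```python
def utf16_slice(s: str, start: int, length: int) -> str:
    """Slice by UTF-16 code units, as a naive UTF-16 substr would."""
    units = s.encode("utf-16-le")  # 2 bytes per code unit
    chunk = units[2 * start: 2 * (start + length)]
    # A split surrogate pair cannot be decoded; show it as U+FFFD
    return chunk.decode("utf-16-le", errors="replace")


frak = "𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍"  # 8 code points, all outside the BMP
print(len(frak), len(frak.encode("utf-16-le")) // 2)  # 8 16
print(frak[2:5])                # 𝖈𝖉𝖊 : code-point slicing is fine here
print(utf16_slice(frak, 2, 3))  # not 𝖈𝖉𝖊: the last unit is half of 𝖈's surrogate pair

# One "family" emoji: 4 people joined by 3 ZWJs = 7 code points
family = "\U0001F469\u200d\U0001F469\u200d\U0001F466\u200d\U0001F466"
hats = "\U0001F452\U0001F3A9" + family + "\U0001F9E2\U0001F393"
print(len(family))              # 7
print(hats[2:5] == family[:3])  # True: the slice cuts the family apart
```

Python's standard library has no grapheme-cluster segmentation. The third-party `regex` module's `\X` pattern is one way to get Raku-like behavior.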