Talk:Reverse a string

From Rosetta Code

Extra Credit?

Does any example go for the extra credit of handling Unicode combining characters? The requirement seems to have been introduced [here] by Kevin Reid, but I am not sure that even his [E example] goes for the extra credit. --Paddy3118 04:35, 28 July 2009 (UTC)

Nobody's tackled it since the requirement was introduced. It's moderately tricky too IIRC, as it gets into the whole problem of normalization of strings. —Donal Fellows 08:07, 28 July 2009 (UTC)
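To illustrate the normalization point, here is a minimal sketch using Python's standard unicodedata module. The "e plus combining acute" string is an illustrative assumption and is not part of the task; the second string is the task's example as spelled out later in this thread. It suggests that normalization alone does not make the problem go away, because not every base-plus-mark pair has a precomposed form.

```python
import unicodedata

# Illustrative input (not from the task): 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
# NFC folds the pair into the single precomposed code point U+00E9.
composed_len = len(composed)

# The task's example string: a, s, U+20DD, d, f, U+0305.
task_string = "as\u20dddf\u0305"
normalized_task = unicodedata.normalize("NFC", task_string)
# No precomposed forms exist for these pairs, so the length stays at 6 code points.
task_len = len(normalized_task)

print(composed_len, task_len)
```

So a reversal that normalizes first would still have to keep each mark with its base character.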
I've cooked up something that works for the given Unicode string in Python and I have tried to make it generic, but the more I read about Unicode, the more I know I don't know :-)     --Paddy3118 08:30, 28 July 2009 (UTC)

I have found the data table (license) that is embedded in the Python module online. Should I split the task and have the stretch goal as a task on its own? (Parse the table if needed, then reverse a Unicode string using the info from the table, or from an internal function that has the combining info.) --Paddy3118 09:20, 28 July 2009 (UTC)

About my addition: Yep, it's often tricky, and that's why I said "extra credit". The thing is, if you don't do it, you get nonsense from certain Unicode strings; this will become increasingly relevant as the world drifts away from the habits of ASCII-and-a-few-extras. So I added this to spread a little Unicode-handling-awareness (though this isn't even complete: there are also e.g. bidirectional formatting markers, which need even more complicated handling). And that's why I think it shouldn't be a separate task: it's not a different problem, it's more correctness (unless the string you're reversing is not really text, in which case you're looking for binary tasks). --Kevin Reid 12:03, 28 July 2009 (UTC)

I am so relieved not to have to delve any further into Unicode. Whilst doing my background reading, I could not help but think that the margins were littered with little arrows and, in much smaller text, "Here be Dragons". :-)   --Paddy3118 16:35, 28 July 2009 (UTC)
Which Unicode characters should be handled? For example, in ShinTakezou's resolution of the issue in R it was assumed that the Unicode character meant was neither of the specials, U+FFFC or U+FFFD; is that a reasonable assumption? -Russell Pierce 18:50, 28 July 2009 (UTC)
I realised that there could be further complications, but stopped at something that could handle Unicode of the type given, i.e. simple chars with optional simple composable chars.
To me, it is a reasonable assumption. In fact I have not used the example given in the task, which is a rather special one beyond common usage; handling Unicode as single characters (UTF-8 encoded or whatever) should not imply handling everything. I bet a lot of examples that handle only single-byte encodings would not work if the encoding used had special characters like those (i.e. ones that should be considered tied to an adjacent character): they require special handling beyond plain reversing, plus a few more cases. I would change the example string to stress the ability to handle multibyte encodings, rather than special composed characters in whichever single-byte or multibyte encoding. --ShinTakezou 22:38, 28 July 2009 (UTC)
Hi, there are two multibyte characters in the example given; I get the hex values of the characters as: '61', '73', '20dd', '64', '66', and '305'. --Paddy3118 01:50, 29 July 2009 (UTC)
That's right; but they are both "combining" characters (combining enclosing circle and combining overline); the problem is in their "combining" ability, since they should be considered together with the character they combine with. When one reverses them as if they were ordinary (multibyte) characters, the combination changes... --ShinTakezou 10:22, 29 July 2009 (UTC)
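For anyone wanting to check those values, a small Python sketch follows. The string literal is built from the hex values quoted above, and the variable names are just for illustration. It also shows why the two characters count as multibyte once encoded as UTF-8.

```python
# The example string, written with escapes so the combining marks are visible.
s = "as\u20dddf\u0305"

# Hex value of each code point, in order.
hex_values = [format(ord(ch), "x") for ch in s]
print(hex_values)  # ['61', '73', '20dd', '64', '66', '305']

# Number of bytes each code point needs in UTF-8.
utf8_lengths = [len(ch.encode("utf-8")) for ch in s]
print(utf8_lengths)  # [1, 1, 3, 1, 1, 2]
```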
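A quick way to see the combination change is to reverse the example code point by code point, as in this rough Python sketch (variable names are illustrative). After a naive reversal the overline ends up at the very start with nothing before it, and the enclosing circle now follows 'd' instead of 's'.

```python
import unicodedata

s = "as\u20dddf\u0305"

# Naive reversal: treats every code point as an independent character.
naive = s[::-1]

# Marks have a general category starting with 'M' (Mn, Mc or Me).
is_mark = [unicodedata.category(ch).startswith("M") for ch in naive]

# Which character each mark now follows (None if it has no predecessor).
follows = {
    unicodedata.name(ch): (naive[i - 1] if i > 0 else None)
    for i, ch in enumerate(naive)
    if is_mark[i]
}
print(follows)
```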

What is needed to get the extra credit

Oh wait, I had forgotten the years of heartache in the Python community before we got this far with Unicode. What you might need to get anywhere with the stretch goal would be:

  1. Handle UTF-8 character strings
  2. Handle multibyte Unicode strings
  3. Treat composed, possibly multibyte characters as an entity when reversing.
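
The three points above can be sketched in Python roughly as follows. This is only an approximation under stated assumptions: the function name reverse_keeping_marks is hypothetical, and "combining" is taken to mean any code point whose general category starts with 'M', which is a simplification of full Unicode grapheme cluster rules (it ignores, for example, zero-width joiners and bidirectional formatting markers).

```python
import unicodedata


def reverse_keeping_marks(text):
    """Reverse text, keeping each combining mark after its base character.

    Simplifying assumption: a code point whose general category starts
    with 'M' (Mn, Mc, Me) belongs to the character before it.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # Attach the mark to the cluster it modifies.
            clusters[-1] += ch
        else:
            # Start a new cluster with a base character.
            clusters.append(ch)
    return "".join(reversed(clusters))


# Points 1 and 2: start from UTF-8 bytes and decode them to code points.
raw = b"as\xe2\x83\x9ddf\xcc\x85"
original = raw.decode("utf-8")

# Point 3: reverse cluster by cluster rather than code point by code point.
result = reverse_keeping_marks(original)
print([format(ord(ch), "x") for ch in result])
# ['66', '305', '64', '73', '20dd', '61']
```

Reversing the result again gives back the original string, which a plain code-point reversal followed by this function would not.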