Talk:Reverse a string

:Whoa. Normally, when handling text, ASCII is assumed and Unicode must be asked for. The 'normal credit' task does not mention Unicode, so answers assuming ASCII are correct. Your extra credit extension gives characters that need Unicode, so Unicode handling is required to get the extra credit. I don't think you should now force the 'normal credit' task to require Unicode. --[[User:Paddy3118|Paddy3118]] 22:29, 29 July 2009 (UTC)
 
Um. If your character set includes Unicode, a reversing routine should handle it. If your character set does not include Unicode, the reversing routine need not handle it. --[[User:Kevin Reid|Kevin Reid]] 00:41, 30 July 2009 (UTC)
 
== Notes about Unicode combining characters ==
 
[[Ruby]] has the regular expression <tt>/\p{M}/</tt> which matches a combining mark. With this expression, I might be able to reverse a string while preserving the combining marks.
 
# The most relevant parts of [http://www.unicode.org/versions/Unicode6.0.0/ Unicode 6.0.0] seem to be section 3.6 "Combination" and section 3.12 "Conjoining Jamo Behavior".
# I am not yet certain whether to preserve "combining character sequences" or "grapheme clusters". My best guess for now is to preserve the combining character sequences (CCS), not the grapheme clusters.
# The regular expression for a CCS-or-char might look like <tt>/(?>#{base}\p{M}*|\p{M}+|.)/</tt> where <tt>#{base}</tt> is whatever regular expression matches a base character or extended base. The <tt>(?>...)</tt> atomic group prevents backtracking into the group, so the regexp always matches the longest possible CCS.
# I need some way with Ruby to comb a string for all matches of a regular expression. For example, with <tt>/[aeiou]./</tt> and <tt>"Rosetta Code"</tt>, I want <tt>["os", "et", "a ", "od"]</tt>. Then I would comb a string for CCS-or-char, reverse the array, join.
# Korean hangul is a special case. A group of 2 or 3 jamo characters might form an extended base (a syllable with a leading consonant, a medial vowel and perhaps a trailing consonant). Because a CCS may contain an extended base, I need some way to group jamos.
# I probably want a Korean test string. I must enter this string with jamo characters, not syllable characters, to test the code to group jamos.
# If EUC-KR has jamo characters, then the code should work with both EUC-KR and UTF-8.
# Avoid normalization. A normalization to NFC would replace some CCS with individual characters, but Unicode does not have individual characters for every possible sequence.
# Do I have a library that already does some of this?
 
--[[User:Kernigh|Kernigh]] 04:00, 31 January 2011 (UTC)
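
:The notes above can be sketched roughly as follows. This is only one possible reading of them, not a tested solution for the task: the names <tt>JAMO_SYLLABLE</tt>, <tt>BASE</tt>, <tt>CCS_OR_CHAR</tt> and <tt>reverse_keeping_ccs</tt> are hypothetical, the jamo ranges cover only the Hangul Jamo block (U+1100 to U+11FF), and the sample strings are assumptions. <tt>String#scan</tt> is the standard way to comb a string for all matches of a regexp.

```ruby
# Sketch only: every constant and method name here is hypothetical.

# A conjoining jamo syllable: one or more leading consonants, one or more
# vowels, then optional trailing consonants. Only the Hangul Jamo block
# (U+1100..U+11FF) is covered; the extended jamo blocks are ignored.
JAMO_SYLLABLE = /[\u1100-\u115F]+[\u1160-\u11A7]+[\u11A8-\u11FF]*/

# A base is a jamo syllable or any single character that is not a mark.
BASE = /#{JAMO_SYLLABLE}|\P{M}/

# A base followed by its combining marks, or a run of marks with no base.
# The atomic group (?>...) keeps the regexp from giving back part of a match.
CCS_OR_CHAR = /(?>#{BASE}\p{M}*|\p{M}+)/

# Comb the string for CCS-or-char pieces, reverse the array, join.
def reverse_keeping_ccs(str)
  str.scan(CCS_OR_CHAR).reverse.join
end

# String#scan returns every non-overlapping match as an array.
p "Rosetta Code".scan(/[aeiou]./)   # => ["os", "et", "a ", "od"]

# "as" + COMBINING ENCLOSING CIRCLE + "df" + COMBINING OVERLINE
with_marks = "as\u20DDdf\u0305"
puts reverse_keeping_ccs(with_marks) == "f\u0305ds\u20DDa"

# Two syllables entered as jamo characters, not as syllable characters.
jamo = "\u1112\u1161\u11AB\u1100\u1173\u11AF"
puts reverse_keeping_ccs(jamo) == "\u1100\u1173\u11AF\u1112\u1161\u11AB"
```

:No normalization is applied, so sequences that have no precomposed character are left as they are. Ruby 2.5 and later also provide <tt>String#grapheme_clusters</tt>, which splits on grapheme clusters rather than combining character sequences; whether that is the better unit here is the open question from note 2.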
 
:Have you considered [http://unicode.org/reports/tr9/ directionality]? --[[User:Rdm|Rdm]] 04:13, 31 January 2011 (UTC)
 
== Extra Credit ==
 
The extra credit task seems artificial.
 
Specifically, it uses Unicode, and I can see that demonstrating Unicode handling in string reversal could be a good thing.
 
However, a close look at the characters involved in the extra credit task shows that two of the character codes in the "reversed string" are ''required to be '''not reversed''''' from the order in which they appear in the original string. And while that could indeed be an interesting task, I do not think that it belongs in the "Reverse a string" task.
 
A task appropriate to this example might be better labeled "manipulate a string while retaining structures implied by the use of some contained Unicode characters", though a shorter name is probably possible.
 
--[[User:Rdm|Rdm]] 14:50, 22 April 2011 (UTC)
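
:To make the point about non-reversed codes concrete, here is a small sketch. The sample string is an assumption (two plain letters, a letter with a combining enclosing circle, a plain letter, and a letter with a combining overline), and <tt>hex_codes</tt> is a hypothetical helper. It compares a plain code point reversal with the order that keeps each mark after its base.

```ruby
# Assumed sample: "as" + COMBINING ENCLOSING CIRCLE + "df" + COMBINING OVERLINE.
original = "as\u20DDdf\u0305"

# String#reverse reverses character by character, so each combining mark
# ends up in front of the letter it used to follow.
naive = original.reverse

# The order that keeps each mark directly after its base letter.
wanted = "f\u0305ds\u20DDa"

# Hypothetical helper: list the code points of a string as U+XXXX labels.
def hex_codes(str)
  str.codepoints.map { |c| format("U+%04X", c) }
end

p hex_codes(original) # => ["U+0061", "U+0073", "U+20DD", "U+0064", "U+0066", "U+0305"]
p hex_codes(naive)    # => ["U+0305", "U+0066", "U+0064", "U+20DD", "U+0073", "U+0061"]
p hex_codes(wanted)   # => ["U+0066", "U+0305", "U+0064", "U+0073", "U+20DD", "U+0061"]
```

:In the wanted order, the pairs U+0073 U+20DD and U+0066 U+0305 appear in the same order as in the original, which is the non-reversal described above.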