Talk:Determine if a string has all the same characters: Difference between revisions

For old-style strings where one character equals one byte, this is not really a problem. With Unicode and multibyte characters it is, and [https://en.wikipedia.org/wiki/Unicode_equivalence Unicode equivalence] makes it much worse. How are languages with Unicode support expected to deal with this?
 
Wikipedia gives the example of the character "ñ", which can be encoded as U+00F1 or, alternatively, as U+006E followed by U+0303 (which also renders as "ñ"). In Python, the latter is a two-"character" string by default (that is, two code points), which can be converted to the single code point form with the unicodedata.normalize function. However, Notepad++ and MS Word correctly display both as ñ.
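To make the point concrete, here is a minimal Python sketch of a normalization-aware check. The function name all_same_characters and the choice of NFC as the default form are my own assumptions, not something the task prescribes. It also only compares code points after normalization, so a character with no precomposed form would still count as several "characters".

```python
import unicodedata

def all_same_characters(s: str, form: str = "NFC") -> bool:
    """Hypothetical helper: True if s, once normalized, repeats a single code point.

    The empty string is treated as trivially uniform (an assumption).
    """
    normalized = unicodedata.normalize(form, s)
    return len(set(normalized)) <= 1

composed = "\u00f1"      # ñ as one code point
decomposed = "n\u0303"   # n followed by COMBINING TILDE, two code points

# Mixing both encodings of the same visible character.
mixed = composed + decomposed + composed

naive_result = len(set(mixed)) <= 1          # compares raw code points
normalized_result = all_same_characters(mixed)
```

Here the naive comparison sees three distinct code points in the mixed string, while the normalized comparison sees a single repeated one.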
 
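And while we are at it, note that while "EEE" is a string which ''has all the same characters'', "EΕЕ" is not: its letters are a Latin E, a Greek capital epsilon and a Cyrillic capital Ie, which merely look alike.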
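A short Python sketch of the look-alike case, with the three code points written as escapes so they can be told apart. The variable names are only illustrative. As far as I can tell, normalization does not help here, since these are distinct letters rather than equivalent encodings of one letter.

```python
import unicodedata

# Latin E, Greek capital epsilon, Cyrillic capital Ie: visually near-identical.
lookalikes = "\u0045\u0395\u0415"

# Official Unicode names of each code point.
names = [unicodedata.name(ch) for ch in lookalikes]

# Even compatibility normalization (NFKC) keeps the three letters distinct.
distinct_after_nfkc = len(set(unicodedata.normalize("NFKC", lookalikes)))
```

Detecting this kind of confusion would need a separate table of confusable characters, which is beyond what unicodedata provides.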