Talk:Determine if a string has all the same characters

== What is a character? ==
:<blockquote>''Quote'' As an aside, I always thought of ''Thundergnat'' as quite the character, but he is much more than 8 bits... ''End Quote''</blockquote>
:I resemble that remark. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 23:47, 30 October 2019 (UTC)
 
::As far as Unicode is concerned, and bearing in mind that it isn't needed for the 'compulsory' examples Gerard has set anyway, I agree with the thrust of Thundergnat's argument: a 'character' should be defined in whatever way seems most natural for the language you're using.
 
::In the case of Go, a character (or 'rune', as we prefer to call it) is simply a Unicode code point stored as a 4-byte integer. String literals are encoded as UTF-8 and are not normalized by default (though there is a supplemental package which can do this). Consequently, a precomposed accented character is not the same as the corresponding unaccented character followed by a combining accent. Also, unlike Perl 6, there appears to be no easy way to deal with emoji ZWJ sequences at present, so I've had to be careful in the Go examples to use only emojis which are complete in themselves. --[[User:PureFox|PureFox]] ([[User talk:PureFox|talk]]) 17:11, 31 October 2019 (UTC)
:::OK, I'm fine with that. It means that different programs will give different results for the same input, but that seems to be the consensus, and we are not going to reimplement ICU, nor dumb down languages which are able to deal with Unicode. By the way, the languages I use (mostly Python, R and Stata) don't normalize by default either. [[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 18:23, 31 October 2019 (UTC)