Talk:String case: Difference between revisions

:I disagree: this is a question on the abstraction level the language provides. On a high level, a string is a collection of characters, and I really do not care how it is encoded internally. I may care when talking to the outside world via file or socket. On a low level, this is a sequence of bytes, which have to be interpreted according to a rule in order to know which character is represented. If a programming language mixes those two, you are in trouble, as you need to know the encoding in order for a string to be interpreted. Higher-level String datatypes should hide this (separate those two) and provide conversions. Smalltalk, Java, JS and many others do it. In Smalltalk, for example, I would write "(CharacterEncoder encoderFor: #'iso8859-5') encode: 'someString'" to get a string's particular encoding. The result is conceptually no longer a sequence of characters, but a sequence of bytes which represent those characters in that particular encoding. So, as soon as we ask for a particular encoding to be part of the task, we actually no longer talk about the language's String implementation and capabilities, but instead about the language's byte-collection support. Of course, I see the problem that in many low-level languages, these are the same. [[User:Cg|Cg]] 09:37, 25 January 2013 (UTC)
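Python 3 makes the same separation the comment describes: <code>str</code> is a sequence of characters, and asking for a particular encoding yields a distinct <code>bytes</code> object. A minimal sketch (the Cyrillic sample string and the ISO 8859-5 encoding are illustrative choices, not from the discussion):

```python
# str is a sequence of Unicode characters; encoding it yields bytes.
s = "строка"                       # "string" in Russian, illustrative sample
b = s.encode("iso8859-5")          # ask for a particular byte encoding

print(type(s).__name__)            # str
print(type(b).__name__)            # bytes
print(b.decode("iso8859-5") == s)  # True: decoding restores the characters
```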
 
::Conceptually, though, the issue of representing fat unicode characters is not much different from the issue of supporting different numeric types (consider, especially, the distinction between <code>signed char</code> and <code>float</code> in C). The significant differences between numeric types and character types are the conversion process, along with the issue that most non-ascii characters are represented by a sequence of bytes in utf-8 rather than a single byte. That said, in this context we are not talking about which storage formats the language uses to represent unicode characters - any language that can represent bytes can represent sequences of unicode characters. And we do not know, without knowing the language, whether 'String' represents a unicode type, or an ascii type, or whether it even exists in a particular language. In other words, I am inclined to consider Short Circuit's point of view to be more relevant here than Cg's disagreement. Still, I agree with both that expanding this to unicode will significantly increase the complexity of the task. Simply representing the translation between upper and lower case, in a language which does not implement that for you, will be bulky. --[[User:Rdm|Rdm]] 13:24, 25 January 2013 (UTC)
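To give a sense of the bulk involved: even the ASCII-only half of the problem, spelled out by hand the way a language without built-in case support would have to do it, takes a loop and a table (or arithmetic). A hypothetical <code>to_upper_ascii</code> sketch in Python:

```python
# ASCII-only case translation done by hand; hypothetical helper name.
def to_upper_ascii(s):
    # In ASCII, each lower-case letter sits exactly 32 code points
    # above its upper-case counterpart.
    return "".join(
        chr(ord(c) - 32) if "a" <= c <= "z" else c
        for c in s
    )

print(to_upper_ascii("hello, World!"))  # HELLO, WORLD!
```

The full Unicode mapping cannot be done with arithmetic like this; it needs the Unicode character database tables, which is exactly why it balloons the task.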
 
== C++ ==
 
: The current C++ code includes <algorithm>, which is an STL header? --[[User:Kernigh|Kernigh]] 02:17, 27 September 2011 (UTC)
 
== Unicode ==
 
I suggest adding an example to the task to show the effect of case change, in both directions, on Unicode characters. I added an example in the Stata task in Ancient Greek, and the result is not perfect. It isn't perfect in other languages either, and I suspect it depends on the underlying implementation of Unicode, but not only on that: Python seems to behave like Stata, but the Notepad++ text editor does not. As a side note, the example is the first sentence of the [https://en.wikipedia.org/wiki/Book_of_Genesis Book of Genesis].
[[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 09:29, 3 September 2017 (UTC)
: Good example – in the (traditionally Aramaic) lettering of the original Hebrew of that sentence, 'upper case' is not defined. [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 09:39, 3 September 2017 (UTC)
:: I expect that unicode case handling would belong in a different task, and would also tend to be language specific (depending on the significance of case for the task in question). See http://unicode.org/faq/casemap_charprop.html for some of the issues. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 10:08, 3 September 2017 (UTC)
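For what it's worth, Python 3.3 and later illustrates some of the issues from that FAQ directly (the word chosen here is just an illustrative Greek sample): the mapping is context-sensitive for final sigma, is not 1:1, and is not always reversible.

```python
# Unicode case mapping is neither 1:1 nor always reversible.
s = "ΣΊΣΥΦΟΣ"            # Greek "Sisyphus", all upper case
low = s.lower()

print(low)                # σίσυφος: the trailing Σ becomes final ς
print(low.upper() == s)   # True for this word, but not guaranteed in general
print("ß".upper())        # SS: one character uppercases to two
print("ß".casefold())     # ss: casefold is the tool for caseless comparison
```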