Talk:String Byte Length: Difference between revisions

no edit summary
(deleted bad C wchar_t example)
 
No edit summary
 
(2 intermediate revisions by 2 users not shown)
Line 1:
The C exampleand C++ examples of finding the length of a string of wide characters (wchar_t) waswere just plain wrong, so I deleted itthem. Regardless of string length, the exampleexamples would always show 16 because it was computing against the length of a ''pointer'' to a wchar_t, rather than the length of the string. --[[User:139.85.252.186|139.85.252.186]] 1617:5700, 23 April 2007 (EDT)
: I've undone the delete of the C++ example. While you were right about the C example (except it will show the product of a pointer size and a wchar_t size, which is not necessarily 16, but depends on the CPU and OS), the C++ example does not even contain a pointer (except hidden somewhere in the internals of C++), and std::wstring::length() returns the number of characters, not the size of a pointer (which would be quite pointless anyway). --[[User:Ce|Ce]] 17:40, 23 April 2007 (EDT)
 
 
The task should clarify what "byte length" mean. For C, with its
traditional representation of strings and notion of mulitbyte
encodings byte length makes perfect sense. But for other
string representations that is not clear: do we mean amount
of storage taken by string? However, C string length is
smaller than storage use (because of null terminator) and
other languages frequently add extra data to strings: tags,
count giving current length, count giving capacity. Also,
there is separate task devoted to size of variables, so
I would prefer avoid question of storage size here.
 
One can talk about "payload size": storage ocupated by characters
itself, ommiting control information. But if language uses
sophisticated reprezentation of string it may be quite difficult
to separate payload form control information.
 
IMHO the most sensible formulation (and having most practical
applications!) is to determine byte length of the printed
representation of the string in some external byte-oriented
encoding (say UTF-8). If one uses such interpretation then
many of current solutions will be incorrect.
Anonymous user