
Unicode strings

 
=={{header|Tcl}}==
All characters in Tcl are ''always'' Unicode characters, and the ordinary string operations (as listed elsewhere on Rosetta Code) are always performed on Unicode. Input and output characters are translated from and to the system's native encoding automatically; this can be overridden on a per-channel (file handle) basis via <code>fconfigure -encoding</code>. Source files can be written in encodings other than the native encoding (from Tcl 8.5 onwards, the encoding to use for a file can be controlled by the <code>-encoding</code> option to [[tclsh]], [[wish]] and <code>source</code>), though it is usually recommended that programmers maximize portability by writing in the ASCII subset and using the <code>\uXXXX</code> escape sequence for all other characters. Tcl does ''not'' handle byte-order marks by default, because doing so requires a deeper understanding of the application level (and sometimes the encoding information is available in metadata anyway, such as when handling HTTP connections).
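
The points above can be illustrated with a minimal sketch. The sample string and the scratch file name <code>unicode-demo.txt</code> are assumptions made up for this example; only standard Tcl commands are used.

```tcl
# Non-ASCII characters written with \uXXXX escapes, so the source file
# itself stays within the ASCII subset.
set s "caf\u00e9 \u03bb"

# String operations work on characters, not on bytes.
set charCount [string length $s]

# Converting to a specific encoding yields the corresponding byte sequence.
set bytes [encoding convertto utf-8 $s]
set byteCount [string length $bytes]

# Override the encoding on a per-channel basis.
# (unicode-demo.txt is a hypothetical scratch file for this sketch.)
set path [file join [pwd] unicode-demo.txt]
set f [open $path w]
fconfigure $f -encoding utf-8
puts -nonewline $f $s
close $f

# Read the file back with the same channel encoding.
set f [open $path r]
fconfigure $f -encoding utf-8
set readBack [read $f]
close $f
file delete $path

puts "$charCount characters, $byteCount bytes in UTF-8"
```

Setting the encoding explicitly on both the writing and the reading channel keeps the round trip independent of whatever the system's native encoding happens to be.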
 
The way in which characters are encoded in memory is not defined by the Tcl language (the implementation uses byte arrays, UTF-8 strings and UCS-2 arrays as appropriate), and the only characters with any restriction on their use in command or variable names are the ASCII parenthesis and colon characters. However, the <code>$var</code> shorthand syntax is much more restricted (to ASCII alphanumerics plus underscore only); other cases have to use the more verbose form: <code>[set funny–var–name]</code>.
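
A short sketch of the naming rules described above; the variable and command names are invented for illustration, and the non-ASCII characters are written as <code>\uXXXX</code> escapes so the example itself stays in the ASCII subset.

```tcl
# A variable whose name contains en dashes (U+2013).
set name "funny\u2013var\u2013name"
set $name 42

# The verbose form reads it without trouble.
set viaSet [set $name]

# The $ shorthand stops at the first character that is not an ASCII
# alphanumeric or underscore, so it reads the variable "funny" instead.
set funny "short"
set viaDollar "$funny\u2013var\u2013name"

# Command names may also use non-ASCII characters (here U+03BB, lambda).
proc \u03bb {x} {expr {$x * 2}}
set doubled [\u03bb 21]

puts "$viaSet / $viaDollar / $doubled"
```

Here <code>viaDollar</code> ends up holding the value of <code>funny</code> followed by the literal text, which is why the <code>[set ...]</code> form is needed for such names.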