Unicode strings: Difference between revisions
Langurmonkey (talk | contribs) |
Thundergnat (talk | contribs) (Rename Perl 6 -> Raku, alphabetize, minor clean-up) |
||
Line 543: | Line 543: | ||
}</lang> |
}</lang> |
||
=={{header|C#}}== |
=={{header|C sharp|C#}}== |
||
In C#, the native string representation is determined by the Common Language Runtime (CLR). In the CLR, the string data type is a sequence of char values, and the char data type represents a UTF-16 code unit. The native string representation is therefore essentially UTF-16, except that a string may contain code-unit sequences that are not well-formed UTF-16, such as unpaired or misordered high and low surrogates. |
In C#, the native string representation is determined by the Common Language Runtime (CLR). In the CLR, the string data type is a sequence of char values, and the char data type represents a UTF-16 code unit. The native string representation is therefore essentially UTF-16, except that a string may contain code-unit sequences that are not well-formed UTF-16, such as unpaired or misordered high and low surrogates. |
||
Line 592: | Line 592: | ||
Strings are UTF-16. |
Strings are UTF-16. |
||
=={{header|Elena}}== |
=={{header|Elena}}== |
||
ELENA supports both UTF-8 and UTF-16 strings; Unicode identifiers are also supported: |
ELENA supports both UTF-8 and UTF-16 strings; Unicode identifiers are also supported: |
||
Line 670: | Line 671: | ||
Unicode is built into Haskell, so it can be used in strings and function names. |
Unicode is built into Haskell, so it can be used in strings and function names. |
||
=={{header|J}}== |
=={{header|J}}== |
||
Line 747: | Line 747: | ||
Starting in J2SE 5 (1.5), Java has fairly convenient methods for dealing with true Unicode characters, even supplementary ones. Many methods that deal with characters have versions for both <code>char</code> and <code>int</code>. For example, <code>String</code> has the <code>codePointAt</code> method, analogous to the <code>charAt</code> method. |
Starting in J2SE 5 (1.5), Java has fairly convenient methods for dealing with true Unicode characters, even supplementary ones. Many methods that deal with characters have versions for both <code>char</code> and <code>int</code>. For example, <code>String</code> has the <code>codePointAt</code> method, analogous to the <code>charAt</code> method. |
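The difference between the <code>char</code> and <code>int</code> versions can be illustrated with a minimal sketch. The class name <code>CodePointDemo</code> and the helper <code>codePoints</code> are hypothetical names chosen for this example; only <code>String.codePointAt</code>, <code>String.codePointCount</code>, <code>String.charAt</code> and <code>Character.charCount</code> are standard library methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the name and the helper method are illustrative only.
final class CodePointDemo {

    // Collects the code points of s, treating each valid surrogate pair as one character.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // int version: reads a whole code point
            result.add(cp);
            i += Character.charCount(cp);   // 1 char for BMP, 2 for supplementary
        }
        return result;
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, written as a UTF-16 surrogate pair
        String s = "a\uD834\uDD1Eb";
        System.out.println(s.length());                       // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));  // 3 code points
        System.out.printf("%04X%n", (int) s.charAt(1));       // D834 (high surrogate only)
        System.out.printf("%X%n", s.codePointAt(1));          // 1D11E (full code point)
    }
}
```

Indexing is still by <code>char</code> position in both cases, which is why the loop advances by <code>Character.charCount</code> rather than by one.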
||
How broad and deep is the language's support for Unicode? What encodings (e.g. UTF-8, UTF-16, etc.) can be used? Is normalization supported? |
How broad and deep is the language's support for Unicode? What encodings (e.g. UTF-8, UTF-16, etc.) can be used? Is normalization supported? |
||
=={{header|jq}}== |
=={{header|jq}}== |
||
Line 855: | Line 855: | ||
> (unicode:characters_to_list encoded 'utf8) |
> (unicode:characters_to_list encoded 'utf8) |
||
"åäö ð" |
"åäö ð" |
||
</lang> |
</lang> |
||
=={{header|Lingo}}== |
=={{header|Lingo}}== |
||
Line 918: | Line 918: | ||
[[File:Unicode print locomotive basic.png]] |
[[File:Unicode print locomotive basic.png]] |
||
=={{header|M2000 Interpreter}}== |
=={{header|M2000 Interpreter}}== |
||
* How easy is it to present Unicode strings in source code? |
* How easy is it to present Unicode strings in source code? |
||
Line 1,055: | Line 1,056: | ||
The internal Unicode string representation is plain Unicode without any encoding. Two built-in functions support UTF-8 encoding ('string->bytes' and 'bytes->string'). |
The internal Unicode string representation is plain Unicode without any encoding. Two built-in functions support UTF-8 encoding ('string->bytes' and 'bytes->string'). |
||
=={{header|Perl}}== |
=={{header|Perl}}== |
||
Line 1,075: | Line 1,075: | ||
However, when your program interacts with the environment, you may still run into tricky spots if you have incompatible locale settings or your OS is not using Unicode; unfortunately, Perl has no control over that. |
However, when your program interacts with the environment, you may still run into tricky spots if you have incompatible locale settings or your OS is not using Unicode; unfortunately, Perl has no control over that. |
||
⚫ | |||
⚫ | Perl 6 programs and strings are all in Unicode and operate at a grapheme abstraction level, which is agnostic to underlying encodings or normalizations. (These are generally handled at program boundaries.) Opened files default to UTF-8 encoding. All Unicode character properties are in play, so any appropriate characters may be used as parts of identifiers, whitespace, or user-defined operators. For instance: |
||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | Perl 6 intends to support Unicode even better than Perl 5, which in recent versions already does a great job of exposing large swaths of the Unicode specification's functionality. Perl 6 improves on Perl 5 primarily by offering explicitly typed strings that always know which operations are sensible and which are not. |
||
=={{header|Phix}}== |
=={{header|Phix}}== |
||
Line 1,186: | Line 1,162: | ||
* Racket includes additional related functionality, such as some Unicode functions (normalization, etc.) and iconv-based I/O conversion for many other encodings. |
* Racket includes additional related functionality, such as some Unicode functions (normalization, etc.) and iconv-based I/O conversion for many other encodings. |
||
⚫ | |||
(formerly Perl 6) |
|||
⚫ | Perl 6 programs and strings are all in Unicode and operate at a grapheme abstraction level, which is agnostic to underlying encodings or normalizations. (These are generally handled at program boundaries.) Opened files default to UTF-8 encoding. All Unicode character properties are in play, so any appropriate characters may be used as parts of identifiers, whitespace, or user-defined operators. For instance: |
||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | Perl 6 intends to support Unicode even better than Perl 5, which in recent versions already does a great job of exposing large swaths of the Unicode specification's functionality. Perl 6 improves on Perl 5 primarily by offering explicitly typed strings that always know which operations are sensible and which are not. |
||
=={{header|REXX}}== |
=={{header|REXX}}== |