Unicode strings: Difference between revisions

{{header|ZX Spectrum Basic}}
Rdm (talk | contribs)
{{draft task}}
Demonstrate how one is expected to handle Unicode strings. Some example considerations: can a Unicode string be directly written in the source code? How does one do I/O with Unicode strings? Can these strings be manipulated easily? Can non-ASCII characters be used for keywords/identifiers/etc? What encodings (UTF-8, UTF-16, etc) can your language accept without much trouble?
=={{header|J}}==

Unicode characters can be represented directly in J strings:

<lang j> '♥♦♣♠'
♥♦♣♠</lang>

By default, they are represented as UTF-8, so each of these four characters occupies three bytes:

<lang j> #'♥♦♣♠'
12</lang>

However, they can be represented as UTF-16 instead, in which case each of these characters is a single 16-bit unit:

<lang j> 7 u:'♥♦♣♠'
♥♦♣♠
#7 u:'♥♦♣♠'
4</lang>

These forms are not treated as equivalent:

<lang j> '♥♦♣♠' -: 7 u:'♥♦♣♠'
0</lang>

I/O uses characters in whatever format they happen to be in.

See also: http://www.jsoftware.com/help/dictionary/duco.htm

Non-ASCII Unicode characters are not legal tokens or names in current versions of J.

=={{header|Perl}}==
In Perl, "Unicode" means "UTF-8". If you want to include UTF-8 characters in your source file, unless you have set the <code>PERL_UNICODE</code> environment variable correctly, you should do<lang Perl>use utf8;</lang> or you risk the parser treating the file as raw bytes.
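
As a minimal sketch of the difference this makes, the following script counts the same literal as characters and as bytes. The sample string and the variable names <code>$suits</code> and <code>$bytes</code> are illustrative assumptions, not part of the task, and the core <code>Encode</code> module is used here only to obtain the UTF-8 byte form:
<lang Perl>use strict;
use warnings;
use utf8;                       # tell the parser this source file is UTF-8
use Encode qw(encode);          # core module, used here only to count bytes

# Encode output so the characters print correctly on a UTF-8 terminal.
binmode STDOUT, ':encoding(UTF-8)';

my $suits = '♥♦♣♠';             # illustrative sample string
my $bytes = encode('UTF-8', $suits);

print "$suits\n";
print length($suits), "\n";     # 4: length counts characters
print length($bytes), "\n";     # 12: each of these characters takes 3 bytes in UTF-8</lang>
The script assumes it is saved as UTF-8. Without <code>use utf8;</code> the literal would typically be read as 12 separate bytes, so <code>length($suits)</code> would no longer report the character count.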