Unicode strings: Difference between revisions

}</lang>

=={{header|C sharp|C#}}==

In C#, the native string representation is determined by the Common Language Runtime (CLR). In the CLR, the string type is a sequence of <code>char</code> values, and the <code>char</code> type represents a UTF-16 code unit. The native string representation is therefore essentially UTF-16, except that a string may contain unpaired (lone) high or low surrogates, which are not valid UTF-16.

Strings are UTF-16.

=={{header|Elena}}==

ELENA supports both UTF-8 and UTF-16 strings; Unicode identifiers are also supported:

Unicode is built into Haskell, so it can be used in strings and function names.

=={{header|J}}==

Starting in J2SE 5 (1.5), Java has fairly convenient methods for dealing with true Unicode characters, even supplementary ones. Many methods that deal with characters have versions for both <code>char</code> and <code>int</code>. For example, <code>String</code> has the <code>codePointAt</code> method, analogous to the <code>charAt</code> method.
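The two families of methods differ on supplementary characters, each of which occupies two <code>char</code> values (a surrogate pair). The following is a minimal sketch, assuming Java 8 or later (for <code>String.codePoints()</code>); the class name <code>CodePointsDemo</code>, its helper <code>codePoints</code>, and the sample string are illustrative choices, not part of any library.

```java
public class CodePointsDemo {
    // Illustrative helper: returns the code points of s (String.codePoints() needs Java 8+).
    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // "a", then U+1D54F (a supplementary character, stored as a surrogate pair), then "b".
        String s = "a" + new String(Character.toChars(0x1D54F)) + "b";

        // length() counts UTF-16 code units (chars), not characters.
        System.out.println(s.length());                            // 4
        // codePointCount() counts Unicode code points.
        System.out.println(s.codePointCount(0, s.length()));       // 3

        // At index 1, charAt sees only the high surrogate, while
        // codePointAt combines the surrogate pair into one code point.
        System.out.println(Integer.toHexString(s.charAt(1)));      // d835
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1d54f

        // Character.isSupplementaryCodePoint reports whether a code point needs two chars.
        for (int cp : codePoints(s)) {
            System.out.printf("U+%04X supplementary=%b%n", cp, Character.isSupplementaryCodePoint(cp));
        }
    }
}
```

The last loop prints <code>U+0061 supplementary=false</code>, <code>U+1D54F supplementary=true</code> and <code>U+0062 supplementary=false</code>.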

How broadly/deeply does the language support Unicode? What encodings (e.g. UTF-8, UTF-16, etc.) can be used? Normalization?
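For the encoding and normalization questions, the standard library covers the common cases: <code>String.getBytes</code> and the <code>String(byte[], Charset)</code> constructor convert to and from a named encoding, and <code>java.text.Normalizer</code> provides the NFC, NFD, NFKC and NFKD forms. A minimal sketch, assuming Java 7 or later (for <code>StandardCharsets</code>); <code>EncodingDemo</code> and <code>utf8RoundTrip</code> are illustrative names.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class EncodingDemo {
    // Illustrative helper: encodes s as UTF-8 bytes and decodes them back.
    static String utf8RoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // a-ring, a-umlaut, o-umlaut, written as escapes to keep the source ASCII-only.
        String s = "\u00E5\u00E4\u00F6";

        // Each letter is 2 bytes in UTF-8 and one 2-byte code unit in UTF-16.
        // UTF_16BE is used because StandardCharsets.UTF_16 prepends a byte-order mark when encoding.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 6
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 6
        System.out.println(utf8RoundTrip(s).equals(s));                   // true

        // Normalization: precomposed U+00E9 versus "e" followed by combining U+0301.
        String composed = "\u00E9";
        String decomposed = "e\u0301";
        System.out.println(composed.equals(decomposed));                  // false
        System.out.println(Normalizer.normalize(decomposed, Normalizer.Form.NFC).equals(composed)); // true
        System.out.println(Normalizer.normalize(composed, Normalizer.Form.NFD).length());           // 2
    }
}
```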

=={{header|jq}}==

> (unicode:characters_to_list encoded 'utf8)

"åäö ð"

</lang>

=={{header|Lingo}}==

[[File:Unicode print locomotive basic.png]]

=={{header|M2000 Interpreter}}==

* How easy is it to present Unicode strings in source code?

The internal Unicode string representation is plain Unicode, independent of any encoding. Two built-in functions support UTF-8 encoding ('string->bytes' and 'bytes->string').

=={{header|Perl}}==

However, when your program interacts with the environment, you may still run into tricky spots if you have incompatible locale settings or if your OS does not use Unicode; unfortunately, Perl has no control over that.

=={{header|Perl 6}}==

Perl 6 programs and strings are all in Unicode and operate at a grapheme abstraction level, which is agnostic to underlying encodings or normalizations. (These are generally handled at program boundaries.) Opened files default to UTF-8 encoding. All Unicode character properties are in play, so any appropriate characters may be used as parts of identifiers, whitespace, or user-defined operators. For instance:

<lang perl6>sub prefix:<∛> (\𝐕) { 𝐕 ** (1/3) }
say ∛27; # prints 3</lang>

Non-Unicode strings are represented as Buf types rather than Str types, and Unicode operations may not be applied to Buf types without some kind of explicit conversion. Only ASCIIish operations are allowed on buffers.

Perl 6 tracks the Unicode Consortium's standard releases and is generally up to date with the latest standard within a month or so of its release (currently at 12.1 as of Nov. 2019).

* Supports the normalized forms NFC, NFD, NFKC, and NFKD, and character equivalence as specified in [http://unicode.org/reports/tr15/ Unicode technical report #15].

* Built-in routines provide access to character classifications (Letter, Numeric, White-space, etc.) and sub-classifications (Letter-lowercase, Letter-uppercase, Numeric-digit, etc.).

* Allows users to use any Unicode character that has a Numeric property '''as''' that numeric value.

* Provides Unicode aware upper-case, lower-case and fold-case routines.

* Implements the [https://unicode.org/reports/tr10/ Unicode technical standard #10] collation algorithm (though not all optional mappings are supported yet).

* Provides built-in routines to access character names and to do name-to-character, character-to-ordinal, and ordinal-to-character conversions.

* Works seamlessly with upper-plane and private-use-plane code points.

* Provides tools to deal with strings that contain invalid Unicode characters.

In general, it tries to make dealing with Unicode "just work".

Perl 6 intends to support Unicode even better than Perl 5, which in recent versions already does a great job of providing access to large swaths of the Unicode specification's functionality. Perl 6 improves on Perl 5 primarily by offering explicitly typed strings that always know which operations are sensible and which are not.

=={{header|Phix}}==

* Racket includes additional related functionality, such as Unicode functions (normalization, etc.) and iconv-based I/O that can read and write many other encodings.

=={{header|Raku}}==

(formerly Perl 6)
