Unicode strings: Difference between revisions

m (+Stata (not done yet))
 
=={{header|Stata}}==
 
See ''[https://www.stata.com/features/overview/unicode/ Unicode support]'' on the Stata website, and the help on [https://www.stata.com/help.cgi?unicode Unicode utilities]. Unicode support was added in Stata 14.
 
# How easy is it to present Unicode strings in source code?
:Any Unicode character can be included in source code. Code is stored as UTF-8 text files with the extension .do, .ado or .mata. The ''Results window'' can display Unicode characters as well.
# Can Unicode literals be written directly, or be part of identifiers/keywords/etc?
:Yes. Unicode characters can be part of variable names everywhere: in datasets, in scalar and matrix names, and in Mata variables.
# How well can the language communicate with the rest of the world?
:Stata datasets (extension .dta) are stored in UTF-8. I/O with CSV files can use any encoding supported by Java (see the list [https://docs.oracle.com/en/java/javase/11/intl/supported-encodings.html here]).
# Is it good at input/output with Unicode?
:Yes.
# Is it convenient to manipulate Unicode strings in the language?
:Stata has two families of string functions: legacy functions that treat strings as byte sequences, and Unicode-aware functions whose names are prefixed with "u". For instance, ''strtrim'' is the byte-oriented (ASCII) function and ''ustrtrim'' is its Unicode counterpart.
# How broad/deep does the language support Unicode?
:Unicode support is good. There is one missing function: while the [https://www.stata.com/help.cgi?uchar() uchar] function makes it easy to get the character for a numeric Unicode code point, the converse is not easy. However, it is possible to convert a Unicode string to ''escaped'' hex values, e.g. <code>ustrtohex("Ж")</code> returns "\u0416", and the converse operation is done with [https://www.stata.com/help.cgi?ustrunescape() ustrunescape].
# What encodings (e.g. UTF-8, UTF-16, etc) can be used?
:Data and code are stored in UTF-8. I/O with CSV data files can be done in any encoding supported by Java, which includes UTF-8, UTF-16 and UTF-32.
# Does it support normalization?
:Yes. See the help for the [https://www.stata.com/help.cgi?ustrnormalize() ustrnormalize] function. It supports the NFC, NFD, NFKC, NFKD and NFKCC forms.
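
The functions mentioned in the answers above can be tried in a short do-file. This is only a minimal sketch, assuming Stata 14 or later; the sample strings are illustrative choices, not part of the task description.

```stata
version 14

* Byte-oriented versus Unicode-aware length of the same string
display strlen("Ж")      // counts bytes of the UTF-8 encoding: 2
display ustrlen("Ж")     // counts Unicode characters: 1

* From a code point to a character (1046 is U+0416 in decimal)
display uchar(1046)

* Round trip through escaped hex values
display ustrtohex("Ж")
display ustrunescape("\u0416")

* Normalization: "e" followed by U+0301 (combining acute accent, decimal 769)
display ustrlen("e" + uchar(769))                         // 2 characters before NFC
display ustrlen(ustrnormalize("e" + uchar(769), "nfc"))   // 1 character after NFC
```

Comparing <code>strlen</code> with <code>ustrlen</code> on the same string shows the difference between the byte-oriented functions and their "u"-prefixed counterparts.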
 
=={{header|Tcl}}==