Unicode variable names

From Rosetta Code
Revision as of 13:37, 26 April 2015 by rosettacode>Axtens (8th, point 2)
Task
Unicode variable names
You are encouraged to solve this task according to the task description, using any language you may know.
  1. Describe, and give a pointer to documentation on, your language's use of characters beyond those of the ASCII character set in the naming of variables.
  2. Show how to:
  • Set a variable with a name including 'Δ' (the delta character) to 1
  • Increment it
  • Print its value.

8th

<lang forth> 1 var, Δ

Δ @ n:1+ Δ !

Δ @ . cr

\ unicode silliness

: 念 ' G:@ w:exec ;
: 店 ' G:! w:exec ;
: ਵਾਧਾ ' n:1+ w:exec ;
: الوداع ' G:bye w:exec ;
: キャリッジリターン ' G:cr w:exec ;
: प्रिंट ' G:. w:exec ;

Δ 念 ਵਾਧਾ Δ 店

Δ 念 प्रिंट キャリッジリターン الوداع

</lang>

ACL2

Variables in ACL2 cannot be modified in place. <lang Lisp>(let ((Δ 1))

    (1+ Δ))</lang>

Ada

As of Ada 2005, source code may contain characters of up to 32 bits. Unless you have made it the default, GNAT requires the -gnatW8 flag to understand that the code below is UTF-8; other encodings are possible.
<lang Ada>with Ada.Text_IO;

procedure main is

  Δ : Integer; 

begin

  Δ := 41;
  Δ := Δ + 1;
  Ada.Text_IO.Put_Line (Δ'Img);

end main;</lang>

Output:
 42

AutoHotkey

The earlier version of AutoHotkey (AutoHotkey Basic) produces an error, since it does not support Unicode. The code works in AutoHotkey_L Unicode (Lexikos Custom Build). Documentation: http://www.autohotkey.net/~Lexikos/AutoHotkey_L/docs/Variables.htm

Works with: AutoHotkey_L

<lang ahk>Δ = 1
Δ++
MsgBox, % Δ</lang>

Bracmat

Bracmat allows any sequence of non-zero bytes as a symbol and, therefore, as a variable name. Even the empty string is a variable, though a special one. If a symbol or variable name contains characters that have special meaning (operators, prefixes, parentheses, braces and the semicolon), it may be necessary to enclose it in quotes. Other special characters must be escaped C-style. See bracmat.html in the git-repo. The example below requires a terminal that supports UTF-8 encoded characters.
<lang bracmat>( (Δ=1) & 1+!Δ:?Δ & out$("Δ:" !Δ) );</lang>
Output:

Δ: 2

C

C has limited support for Unicode in variable names; see Annex D of the C standard.

C#

Section 2.4.2 of the C# Language Specification gives rules for identifiers. They correspond exactly to those recommended by the Unicode Standard Annex 31, except that underscore is allowed as an initial character (as is traditional in the C programming language), Unicode escape sequences are permitted in identifiers, and the "@" character is allowed as a prefix to enable keywords to be used as identifiers. <lang csharp>class Program {

   static void Main()
   {
       var Δ = 1;
       Δ++;
       System.Console.WriteLine(Δ);        
   }

}</lang>

Output:
2

Clojure

According to the current documentation, one should stick to naming with alphanumeric characters and *, +, !, -, _, and ? to avoid possible problems if future versions of Clojure decide to apply special meaning to a character.

That being said, it is not currently enforced, so while you probably shouldn't, you technically can.

<lang clojure>(let [Δ 1]

 (inc Δ))</lang>
Output:
2

Common Lisp

<lang lisp>(let ((Δ 1))

 (incf Δ))</lang>
Output:
2

D

D source files support four character encodings: ASCII, UTF-8, UTF-16 and UTF-32. <lang d>import std.stdio;

void main() {

   auto Δ = 1;
   Δ++;
   writeln(Δ);

}</lang>

Output:
2

You can use any of the following:

   Letters,
   digits,
   underscore (_),
   code points >= \u00A0 and < \uD800,
   code points > \uDFFF.

However, the following cannot be used:

   \u0024 ($),
   \u0040 (@) and
   \u0060 (`).

See: http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD

Delphi

For more information about naming identifiers (including variables) visit: Identifiers in Delphi
<lang Delphi>(* Compiled with Delphi XE *)
program UnicodeVariableName;

{$APPTYPE CONSOLE}

uses

 SysUtils;

var

 Δ: Integer;

begin

 Δ:= 1;
 Inc(Δ);
 Writeln(Δ);
 Readln;

end.</lang>

Déjà Vu

<lang dejavu>set :Δ 1
set :Δ ++ Δ
!. Δ</lang>

DWScript

<lang Delphi>var Δ : Integer;

Δ := 1;
Inc(Δ);
PrintLn(Δ);</lang>

Emacs Lisp

<lang Lisp>(setq Δ 1)
(setq Δ (1+ Δ))
(message "Δ is %d" Δ)</lang>

Variables are symbols, and symbol names can be any string. Source .el files can use any of the usual Emacs coding system specifications, so variable names can contain non-ASCII characters.

The byte compiler writes UTF-8 (older versions wrote emacs-mule) into .elc files, so any mixture of non-ASCII characters is preserved.

F#

As with C#, the F# Language Specification refers to Unicode Standard Annex #31 for identifier syntax, allowing Unicode letter characters.
<lang fsharp>let mutable Δ = 1
Δ <- Δ + 1
printfn "%d" Δ</lang>

Forth

Historically, Forth has worked only in ASCII (going so far as to reserve the eighth bit for symbol smudging), but some more modern implementations have extended character set support such as UTF-8.

Works with: GNU Forth version 0.7.0

<lang forth>variable ∆ 5 ∆ ! ∆ @ .</lang>

Go

Go source encoding is specified to be UTF-8. Allowable variable names are specified in the sections identifiers and Exported identifiers. <lang go>package main

import "fmt"

func main() {

   Δ := 1
   Δ++
   fmt.Println(Δ)

}</lang>

Output:
2

Haskell

Haskell variables must start with a lower case character, but Δ is an upper case delta. Lower case delta (δ) was therefore used as the first character, followed by an upper case delta as the second character of the variable name.

Also, Haskell does not allow mutable variables, so incrementing delta isn't possible. Instead, lower case psi was used to store the incremented value of delta, since tridents are cool.
<lang Haskell>main = print ψ

   where δΔ = 1
         ψ = δΔ + 1</lang>
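
The upper/lower case distinction that forces the δΔ spelling is a Unicode character property rather than an ASCII convention. A minimal sketch in Python, used here only as a convenient way to query the Unicode character database (the names upper_delta and lower_delta are made up for this illustration):

```python
import unicodedata

upper_delta = "\u0394"  # Δ GREEK CAPITAL LETTER DELTA
lower_delta = "\u03b4"  # δ GREEK SMALL LETTER DELTA

# General category "Lu" is an uppercase letter, "Ll" a lowercase letter.
print(unicodedata.category(upper_delta))   # Lu
print(unicodedata.category(lower_delta))   # Ll

# The two characters are case variants of each other.
print(upper_delta.lower() == lower_delta)  # True
```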

J

Variable names must consist of ASCII characters.

From the Dictionary page Alphabet and Words:

"The alphabet is standard ASCII, comprising digits, letters (of the English alphabet), the underline (used in names and numbers), ..."
"Names ... begin with a letter and may continue with letters, underlines, and digits."

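The quoted rule can be written down as a pattern. The sketch below is in Python, not J, and encodes only the sentence quoted above (a letter, then letters, underlines and digits); it ignores any other naming rules J may have, and the helper name looks_like_j_name is hypothetical:

```python
import re

# A letter first, then letters, underlines and digits; ASCII only by construction.
J_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def looks_like_j_name(name: str) -> bool:
    """Return True if `name` satisfies the quoted naming sentence."""
    return J_NAME.fullmatch(name) is not None

print(looks_like_j_name("delta1"))  # True
print(looks_like_j_name("Δ"))       # False: not an ASCII letter
print(looks_like_j_name("1delta"))  # False: starts with a digit
```
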
Java

<lang java>int Δ = 1;
double π = 3.141592;
String 你好 = "hello";
Δ++;
System.out.println(Δ);</lang>

Output:
2

JavaScript

<lang javascript>var ᾩ = "something";
var ĦĔĽĻŎ = "hello";
var 〱〱〱〱 = "too less";
var जावास्क्रिप्ट = "javascript"; // ok that's JavaScript in hindi
var KingGeorgeⅦ = "Roman numerals.";

console.log([ᾩ, ĦĔĽĻŎ, 〱〱〱〱, जावास्क्रिप्ट, KingGeorgeⅦ])</lang>

Output:
["something", "hello", "too less", "javascript", "Roman numerals."]

Julia

The Julia documentation on allowed variable names explicitly describes the wide variety of Unicode codepoints that are allowed:
<lang Julia>julia> Δ = 1 ; Δ += 1 ; Δ
2</lang>
The allowed identifiers also include sub/superscripts and combining characters (e.g. accent marks):
<lang julia>julia> Δ̂₂ = Δ^2
4</lang>
and the Julia interactive shells (and many editors) allow typing these symbols via tab-completion of their LaTeX abbreviations.

LOLCODE

The spec mandates that identifiers be alphanumeric. However, because YARNs are Unicode-aware, the SRS operator introduced in 1.3 can be used to access variables of arbitrary name.
<lang LOLCODE>I HAS A SRS "Δ" ITZ 1
SRS "Δ" R SUM OF SRS ":(394)" AN 1
VISIBLE SRS ":[GREEK CAPITAL LETTER DELTA]"</lang>

Output:
2

Lua

<lang Lua>local unicode = {}
unicode["Für"] = "for"
print(unicode["Für"])

unicode["garçon"] = "boy"
print(unicode["garçon"])

unicode["∆"]=1
print(unicode["∆"])</lang>

Mathematica

<lang Mathematica>Δ = 1; Δ++; Print[Δ]</lang>

Nemerle

From the Nemerle Reference Manual: "Programs are written using the Unicode character set, using the UTF-8 encoding." <lang Nemerle>using System.Console;

module UnicodeVar {

   Main() : void
   {
       mutable Δ = 1;
       Δ++;
       WriteLine($"Δ = $Δ");
   }

}</lang>

NetRexx

The NetRexx Language Definition section of the NetRexx documentation (netrexx.org/files/nrl3.pdf) describes the character set support within the language.
<lang NetRexx>/* NetRexx */
options replace format comments java crossref symbols nobinary

upperΔ = 1
Δupper = upperΔ
lowerδ = 2
δlower = lowerδ

say upperΔ '+' Δupper '= \-'
upperΔ = upperΔ + Δupper
say upperΔ

say lowerδ '+' δlower '= \-'
lowerδ = lowerδ + δlower
say lowerδ
say

-- Unicode works with the NetRexx built-in functions
Υππερ = '\u0391'.sequence('\u03a1') || '\u03a3'.sequence('\u03a9') -- ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
Λοωερ = '\u03b1'.sequence('\u03c1') || '\u03c3'.sequence('\u03c9') -- αβγδεζηθικλμνξοπρστυφχψω
say Υππερ'.Lower =' Υππερ.lower()
say Λοωερ'.Upper =' Λοωερ.upper()
say

-- Note: Even with unicode characters NetRexx variables are case-insensitive
numeric digits 12
δ = 20.0
π = Math.PI
θ = Π * Δ
σ = Θ ** 2 / (Π * 4) -- == Π * (Δ / 2) ** 2
say 'Π =' π', diameter =' δ', circumference =' Θ', area =' Σ

return
</lang>
Output:

1 + 1 = 2
2 + 2 = 4

ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ.Lower = αβγδεζηθικλμνξοπρστυφχψω
αβγδεζηθικλμνξοπρστυφχψω.Upper = ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ

Π = 3.141592653589793, diameter = 20.0, circumference = 62.8318530718, area = 314.159265359

Nim

From the spec: http://nim-lang.org/docs/manual.html#identifiers-keywords

<lang Nimrod>var Δ = 1
Δ.inc()
echo(Δ)</lang>

Objeck

As of 3.2, Objeck supports UTF-8 encoded I/O and stores characters in the runtime's native Unicode format. <lang objeck> class Test {

 function : Main(args : String[]) ~ Nil {
   Δ := 1;
   π := 3.141592;
   你好 := "hello";
   Δ += 1;
   Δ->PrintLine();
 }

} </lang>

PARI/GP

GP accepts only ASCII in strings and variable names.

PARI supports Unicode variable names only insofar as C does.

Perl

Requires Perl 5.8.1 at the minimum. See http://perldoc.perl.org/utf8.html

<lang perl>use utf8;

my $Δ = 1; $Δ++; print $Δ, "\n";</lang>

The $ sigil can be omitted by using an lvalue subroutine:

<lang perl>use utf8;

BEGIN {

   my $val;
   sub Δ () : lvalue {
       $val;
   }

}

Δ = 1; Δ++; print Δ, "\n";</lang>

or, with Perl 5.10 and the state modifier:

<lang perl>use utf8; use v5.10;

sub Δ () : lvalue {

   state $val;

}

Δ = 1; Δ++; say Δ;</lang>

One can have Unicode in identifier or subroutine names and also in package or class names. Use of Unicode for the last two purposes is, due to file and directory names, dependent on the filesystem.

Perl 6

Perl 6 is written in Unicode so, with narrow restrictions, nearly any Unicode letter can be used in identifiers.

See Perl 6 Synopsis 02. - http://perlcabal.org/syn/S02.html#Names
<lang perl6>my $Δ = 1;
$Δ++;
say $Δ;</lang>
Function and subroutine names can also use Unicode characters (as can methods, classes, packages, whatever...):
<lang perl6>my @ᐁ = (0, 45, 60, 90);

sub π { pi };

sub postfix:<°>($degrees) { $degrees * π / 180 };

for @ᐁ -> $ಠ_ಠ { say sin $ಠ_ಠ° };</lang>

PicoLisp

Variables are usually Internal Symbols, and their names may contain any UTF-8 character except null-bytes. White space and 11 special characters (see the reference) must be escaped with a backslash. Transient Symbols are often used as variables too; they follow the syntax of strings in other languages.
<lang PicoLisp>: (setq Δ 1)
-> 1
: Δ
-> 1
: (inc 'Δ)
-> 2
: Δ
-> 2</lang>

PHP

PHP is not made to support Unicode. UTF-16 (UCS-2) will not work because it adds null bytes before or after ASCII characters (depending on the endianness of the UTF-16). Because every script has to start with exactly <?php (in ASCII), the parser finds no match and just prints the <?php mark.

UTF-8 encodes ASCII characters as their ASCII byte values, so the <?php mark can be placed at the beginning as usual. PHP sees the document as some 8-bit encoding (like ISO-8859-1), but this does not matter: UTF-8 never uses bytes in the ASCII range for non-ASCII characters, so references to the variable stay consistent.

Documentation: mbstring.php4.req, language.variables.basics
<lang php><?php
$Δ = 1;
++$Δ;
echo $Δ;</lang>
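
The byte-level argument above can be checked directly. The following is a small Python sketch (Python is used only to inspect the bytes; it says nothing about PHP's internals, and the variable names are invented for the illustration):

```python
delta_utf8 = "Δ".encode("utf-8")
open_tag_utf8 = "<?php".encode("utf-8")
open_tag_utf16 = "<?php".encode("utf-16-le")

# Every byte of the UTF-8 form of Δ lies outside the ASCII range (0x00-0x7F).
print(delta_utf8.hex())                     # ce94
print(all(b >= 0x80 for b in delta_utf8))   # True

# In UTF-8 the opening tag is byte-for-byte the ASCII text.
print(open_tag_utf8 == b"<?php")            # True

# In UTF-16 null bytes are interleaved, so the ASCII tag no longer matches.
print(b"\x00" in open_tag_utf16)            # True
print(open_tag_utf16.startswith(b"<?php"))  # False
```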

Prolog

<lang prolog>% Unicode in predicate names:
是.            % be: means, approximately, "True".
不是 :- \+ 是.  % not be: means, approximately, "False". Defined as not 是.

% Unicode in variable names:
test(Garçon, Δ) :-
  Garçon = boy,
  Δ = delta.

% Call test2(1, Result) to have 2 assigned to Result.
test2(Δ, R) :- R is Δ + 1.</lang>

Putting this into use:
<lang prolog>?- 是.
true.

?- 不是.
false.

?- test(X,Y).
X = boy,
Y = delta.

?- test2(1,Result).
Result = 2.</lang>

Protium

This example is incomplete. Please ensure that it meets all task requirements and remove this message.

1. (working on it)

2. <lang protium><@ LETVARLIT>Δ|1</@> <@ ACTICRVAR>Δ</@> <@ SAYVAR>Δ</@></lang>
Using what Google Translate says is the Traditional Chinese for 'delta':
<lang protium><@ LETVARLIT>三角洲|1</@> <@ ACTICRVAR>三角洲</@> <@ SAYVAR>三角洲</@></lang>

Python

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.

Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131). For these characters, the classification uses the version of the Unicode Character Database as included in the unicodedata module.

Identifiers are unlimited in length. Case is significant.
<lang python>>>> Δx = 1
>>> Δx += 1
>>> print(Δx)
2
>>> </lang>
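
As a hedged illustration of the rules above, a Python 3 interpreter can report which Unicode Character Database version its unicodedata module carries and whether a given string would be accepted as an identifier. The candidate strings below are arbitrary examples chosen for this sketch:

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter;
# the value printed depends on the Python build.
print(unicodedata.unidata_version)

# str.isidentifier() applies the language's identifier rules to a string.
candidates = ["Δx", "x1", "_Δ", "1x", "Δ-x"]
results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Here "1x" is rejected because a digit cannot come first, and "Δ-x" because the hyphen is not an identifier character; the other three are accepted.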


R

See ?assign for details.

<lang Rsplus>f <- function(`∆`=1) `∆`+1

f(1)</lang>

Output:
[1] 2

Racket

Racket has virtually no restrictions on valid characters for identifiers. In particular, Unicode identifiers are supported.

<lang Racket>
#lang racket

;; Racket can use Unicode in identifier names
(define √ sqrt)
(√ 256) ; -> 16

;; and in fact the standard language makes use of some of these
(λ(x) x) ; -> an identity function

;; The required binding
(define Δ 1)
(set! Δ (add1 Δ))
(printf "Δ = ~s\n" Δ) ; prints "Δ = 2"
</lang>

Retro

This has been tested on Retro 11.0 running under OS X.
<lang Retro>variable Δ
1 !Δ
@Δ putn
1 +Δ
@Δ putn</lang>
Function and variable names are stored as strings, and UTF-8 is usable, as long as the host system allows it.

Ruby

This task requires Ruby 1.9. Multilingualization, or m17n, is a major new feature of Ruby 1.9. With m17n, identifiers can use non-ASCII characters. Ruby is a Code Set Independent (CSI) language, so it supports many different character encodings.

  1. Any non-ASCII characters require a magic comment to select the encoding.
  2. Ruby source code must be ASCII compatible. For example, SJIS and UTF-8 are ASCII compatible, but ISO-2022-JP and UTF-16LE are not compatible. So one can write the source file in UTF-8, but not in UTF-16LE.

A more complete reference is The design and implementation of Ruby M17N.

The next example uses a magic comment to select the Big5 encoding. Then it creates a local variable named Δ.

Works with: Ruby version 1.9

<lang ruby># -*- coding: big5 -*-
Δ = 1
Δ += 1
puts Δ</lang>

00000000  23 20 2d 2a 2d 20 63 6f  64 69 6e 67 3a 20 62 69  |# -*- coding: bi|
00000010  67 35 20 2d 2a 2d 0a a3  47 20 3d 20 31 0a a3 47  |g5 -*-.Δ = 1.Δ|
00000020  20 2b 3d 20 31 0a 70 75  74 73 20 a3 47 0a        | += 1.puts Δ.|
0000002e

The output is 2. One can also use non-ASCII characters in a method name. The next example selects the EUC-JP encoding and creates a method named ≦ with a parameter named ♯♭♪. Because ≦ is an ordinary method, not an operator, the program must use a dot to call the method.

Works with: Ruby version 1.9

<lang ruby># -*- coding: euc-jp -*-

class Numeric

 def ≦(♯♭♪)
   self <= ♯♭♪
 end

end

∞ = Float::INFINITY
±5 = [-5, 5]
p [(±5.first.≦ ∞),
   (±5.last.≦ ∞),
   (∞.≦ ∞)]</lang>
00000000  23 20 2d 2a 2d 20 63 6f  64 69 6e 67 3a 20 65 75  |# -*- coding: eu|
00000010  63 2d 6a 70 20 2d 2a 2d  0a 0a 63 6c 61 73 73 20  |c-jp -*-..class |
00000020  4e 75 6d 65 72 69 63 0a  20 20 64 65 66 20 a1 e5  |Numeric.  def ≦|
00000030  28 a2 f4 a2 f5 a2 f6 29  0a 20 20 20 20 73 65 6c  |(♯♭♪).    sel|
00000040  66 20 3c 3d 20 a2 f4 a2  f5 a2 f6 0a 20 20 65 6e  |f <= ♯♭♪.  en|
00000050  64 0a 65 6e 64 0a 0a a1  e7 20 3d 20 46 6c 6f 61  |d.end..∞ = Floa|
00000060  74 3a 3a 49 4e 46 49 4e  49 54 59 0a a1 de 35 20  |t::INFINITY.±5 |
00000070  3d 20 5b 2d 35 2c 20 35  5d 0a 70 20 5b 28 a1 de  |= [-5, 5].p [(±|
00000080  35 2e 66 69 72 73 74 2e  a1 e5 20 a1 e7 29 2c 0a  |5.first.≦ ∞),.|
00000090  20 20 20 28 a1 de 35 2e  6c 61 73 74 2e a1 e5 20  |   (±5.last.≦ |
000000a0  a1 e7 29 2c 0a 20 20 20  28 a1 e7 2e a1 e5 20 a1  |∞),.   (∞.≦ |
000000b0  e7 29 5d 0a                                       |)].|
000000b4

The output is [true, true, true] because the numbers -5, 5 and infinity are all less than or equal to infinity.
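
The byte values in the two hex dumps can be cross-checked outside Ruby. The sketch below uses Python's standard big5 and euc_jp codecs, on the assumption that they implement the same mappings as the encodings named in the magic comments; the dictionary names are invented for the illustration:

```python
# Characters from the two Ruby examples, grouped by the codec named in each magic comment.
samples = {
    "big5": ["Δ"],
    "euc_jp": ["≦", "∞", "±"],
}

# Map (codec, character) to the hex form of its encoded bytes.
encoded = {
    (codec, ch): ch.encode(codec).hex()
    for codec, chars in samples.items()
    for ch in chars
}

for (codec, ch), hex_bytes in encoded.items():
    print(codec, ch, hex_bytes)

# "ASCII compatible" means plain ASCII text keeps its ASCII bytes.
print("puts".encode("big5") == b"puts")       # True
print("puts".encode("utf-16-le") == b"puts")  # False
```

If the codecs agree with the dumps, the loop reports a347 for Δ in Big5 and a1e5, a1e7 and a1de for ≦, ∞ and ± in EUC-JP. The last two lines show why UTF-16LE fails the ASCII-compatibility requirement while Big5 passes it.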

Rust

Rust source encoding is specified to be UTF-8. Variable names must begin with a character that has the Unicode XID_Start property, and the remaining characters must have the XID_Continue property. (Note that flipping tables is not permitted under the current specification.)

Non-ASCII identifiers have been feature gated since version 0.9.

<lang Rust>// rustc 0.9 (7613b15 2014-01-08 18:04:43 -0800)

#[feature(non_ascii_idents)];

fn main() {

   let mut Δ:int = 1;
   Δ += 1;
   println!("{}", Δ);

}</lang>
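
To see why a name such as Δ passes an XID-style check while a flipped table does not, the following Python sketch may help. It is only an approximation: it relies on Python's own str.isidentifier(), which is likewise built on XID_Start and XID_Continue but also accepts a leading underscore, and it is not Rust's implementation. The helper first_bad_char is hypothetical.

```python
def first_bad_char(name: str):
    """Return the first character that stops `name` from being an identifier, or None."""
    for index, ch in enumerate(name):
        # The first character must be able to start an identifier; later ones
        # only need to be able to continue one, so they are probed after "a".
        probe = ch if index == 0 else "a" + ch
        if not probe.isidentifier():
            return ch
    return None

print(first_bad_char("Δ"))              # None
print(first_bad_char("Δ1"))             # None
print(first_bad_char("1Δ"))             # 1
print(first_bad_char("(╯°□°)╯︵┻━┻"))  # (
```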

Scala

<lang scala>var Δ = 1
val π = 3.141592
val 你好 = "hello"
Δ += 1
println(Δ)</lang>

Sidef

<lang ruby>var Δ = 1; Δ += 1; say Δ;</lang>

Output:
2

Swift

<lang swift>var Δ = 1
let π = 3.141592
let 你好 = "hello"
Δ++
println(Δ)</lang>

Output:
2

Tcl

Tcl variable names can include any character (the $var syntax can't, but that's just a shorthand for the operationally-equivalent [set var]). Thus, this script is entirely legal:
<lang tcl>set Δx 1
incr Δx
puts [set Δx]</lang>
However, this script only works smoothly if the “Δ” character is in the system's default encoding (thankfully more common than it used to be, as more and more systems use UTF-8 or UTF-16 as their default encodings), so normal Tcl practice is to stick to ASCII for identifier names.

It is also possible to encode characters using a \uXXXX substitution (each X is a hexadecimal digit), thus the Δx could be replaced throughout above by \u0394x; the result is a variable with exactly the same name as before. Doing this allows a script to be written with just ASCII characters, which tends to maximize portability across platforms.
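
The same idea, an ASCII-only escape that denotes exactly the same name, can be illustrated in Python, whose string literals happen to use the same \uXXXX notation. This is an analogy only; it is not Tcl code and does not exercise Tcl's substitution rules:

```python
escaped = "\u0394x"  # written with ASCII characters only
literal = "Δx"       # written with the character itself

# Both spellings denote the same two-character string.
print(escaped == literal)  # True
print(len(escaped))        # 2

# Going the other way: produce an ASCII-only spelling of the name.
print(literal.encode("ascii", "backslashreplace").decode("ascii"))  # \u0394x
```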