Unicode variable names
You are encouraged to solve this task according to the task description, using any language you may know.
- Task
- Describe, and give a pointer to documentation on your languages use of characters beyond those of the ASCII character set in the naming of variables.
- Show how to:
- Set a variable with a name including the 'Δ', (delta character), to 1
- Increment it
- Print its value.
- Related task
11l
V Δx = 1
Δx++
print(Δx)
- Output:
2
8th
Text in 8th is stored in the UTF-8 encoding, which means that the text is always represented correctly, even when due to other issues (e.g. font problems) it may appear incorrect.
To make the programmer’s task easier, 8th not only lets you actually enter any UTF-8 text, it also lets you use special “escapes” in your text to make it easier to enter obscure characters. Thus, for example, this string: "qu\u00e9" results in this: qué.
The words (e.g. “functions”) 8th provides to do string manipulation are also UTF-8 aware, which means you don’t have to worry about creating an invalid bit of UTF-8 encoded text (unless you deliberately do so).
-- Writing localized applications with 8th
1 var, Δ
Δ @ n:1+ Δ !
Δ @ . cr
\ unicode silliness
: 念 ' G:@ w:exec ;
: 店 ' G:! w:exec ;
: ਵਾਧਾ ' n:1+ w:exec ;
: الوداع ' G:bye w:exec ;
: キャリッジリターン ' G:cr w:exec ;
: प्रिंट ' G:. w:exec ;
Δ 念 ਵਾਧਾ Δ 店
Δ 念 प्रिंट キャリッジリターン
الوداع
ACL2
Variables in ACL2 cannot be modified in place.
(let ((Δ 1))
(1+ Δ))
Ada
As of Ada 2005, all source code can be made of up to 32bit characters. Unless you have made it a default, GNAT would require the -gnatW8 flag to understand you are using UTF8 for the code below, other encodings are possible.
with Ada.Text_IO;
procedure main is
Δ : Integer;
begin
Δ := 41;
Δ := Δ + 1;
Ada.Text_IO.Put_Line (Δ'Img);
end main;
- Output:
42
ALGOL 68
The definition of Algol 68 is character set independent. Not only variables may be represented in Unicode, all other language elements may be so represented. All that is required is that the representation language must be capable of being translated unambigously back and forth into the reference language.
The following excerpt from the Revised Report states this:
d) A program in the strict language must be represented in some "representation language" {9.3.a} chosen by the implementer. In most cases this will be the official "reference language".
(i) A program in a representation language is obtained by replacing the symbols of a program in the strict language by certain typographical marks {9 .3}.
(ii) Even the reference language allows considerable discretion to the implementer {9.4.a,b,c}. A restricted form of the reference language in which such freedom has not been exercised may be termed the "canonical form" of the language, and it is expected that this form will be used in algorithms intended for publication.
(iii) The meaning of a program in a representation language is the same as that of the program {in the strict language} from which it was obtained
The great majority of implementations have used EBCDIC, ASCII and an encoding of Cyrillic (precise encoding not known to me). wp:ALGOL_68 gives as an example:
Russian/Soviet example: In English Algol68's reverent case statement reads case ~ in ~ out ~ esac, in Cyrillic this reads выб ~ в ~ либо ~ быв.
AppleScript
In AppleScript, identifiers whose first and last characters are vertical bars (|) can contain any characters. The AppleScript Language Guide doesn't recommend their use as they can make scripts difficult to read, but they're perfectly legal.
set |Δ| to 1
set |Δ| to |Δ| + 1
return |Δ|
- Output:
2
Vertical bars can also be used to differentiate between identifiers and reserved words should the need arise. The bars are just a signal to the compiler to accept what's between them as an identifier and aren't stored with the identifier itself. They may disappear once the code's compiled if the decompiler can't see a reason to include them. Or indeed they may be added if a script compiled on one machine is decompiled on another where one of the identifiers clashes with a term defined in a library or OSAX installed on that machine.
Arturo
Arturo doesn't support words with non-ASCII characters.
However, you can set/get a variable with any name defined as a Unicode string, using the functions let
(or its alias :
) and var
.
"Δ": 1
"Δ": inc var "Δ"
print ["Delta =>" var "Δ"]
- Output:
Delta => 2
AutoHotkey
The earlier version of AutoHotkey (AutoHotkey Basic) will produce an error since it doesn't support Unicode. It is perfectly working in AutoHotkey_L Unicode (Lexikos Custom Build). Documentation: http://www.autohotkey.net/~Lexikos/AutoHotkey_L/docs/Variables.htm
Δ = 1
Δ++
MsgBox, % Δ
BaCon
This is a port from the C example. As mentioned there, C has limited support for Unicode variable names which is specified in the C standard, and BaCon, being a Basic-to-C converter, therefore has the same restrictions. The below example works with the CLang compiler.
PRAGMA COMPILER clang
DECLARE Δ TYPE int
Δ = 1
INCR Δ
PRINT Δ
- Output:
user@host $ bacon prog Converting 'prog.bac'... done, 10 lines were processed in 0.008 seconds. Compiling 'prog.bac'... clang -c prog.bac.c clang -o prog prog.bac.o -lbacon -lm Done, program 'prog' ready. user@host $ ./prog 2
Bracmat
Bracmat allows any sequence of non-zero bytes as symbol and therefore, as variable name. Even the empty string is a variable, though a special one. If a symbol/variable name contains characters that have special meaning (operators, prefixes, parentheses, braces and the semicolon) it may be necessary to enclose it in quotes. Other special characters must be escaped C-style. See bracmat.html in the git-repo. The example below requires a terminal that supports UTF-8 encoded characters.
( (Δ=1)
& 1+!Δ:?Δ
& out$("Δ:" !Δ)
);
Output:
Δ: 2
C
C has limited support for Unicode in variable names since C99, where the UCS escaped form is required but raw Unicode characters are optional. See Annex D of the C standard and cppreference.
Most modern compilers, as of 2020, support raw Unicode identifiers, given the file is encoded properly (typically UTF-8).
// Works for clang and GCC 10+
#include<stdio.h>
int main() {
int Δ = 1; // if unsupported, use \u0394
Δ++;
printf("%d",Δ);
return 0;
}
Output:
2
C#
Section 2.4.2 of the C# Language Specification gives rules for identifiers. They correspond exactly to those recommended by the Unicode Standard Annex 31, except that underscore is allowed as an initial character (as is traditional in the C programming language), Unicode escape sequences are permitted in identifiers, and the "@" character is allowed as a prefix to enable keywords to be used as identifiers.
class Program
{
static void Main()
{
var Δ = 1;
Δ++;
System.Console.WriteLine(Δ);
}
}
- Output:
2
C++
The C++ language allows the use of Unicode variable names.
For more information visit: https://en.cppreference.com/w/cpp/language/character_literal,
https://en.cppreference.com/w/cpp/language/identifiers and https://learnmoderncpp.com/2021/03/24/a-unicode-primer/
#include <cstdint>
#include <iostream>
#include <string>
int main() {
uint32_t Δ = 1;
double π = 3.14159;
std::string 你好 = "Hello";
Δ++;
std::cout << Δ << " :: " << π << " :: " << 你好 << std::endl;
}
- Output:
2 :: 3.14159 :: Hello
Clojure
According to the current documentation, one should stick to naming with alphanumeric characters and *, +, !, -, _, and ? to avoid possible problems if future versions of Clojure decide to apply special meaning to a character.
That being said, it is not currently enforced, so while you probably shouldn't, you technically can.
(let [Δ 1]
(inc Δ))
- Output:
2
Common Lisp
(let ((Δ 1))
(incf Δ))
- Output:
2
Crystal
Δ = 1
Δ += 1
puts Δ
- Output:
2
D
D source files support four character encodings: ASCII, UTF-8, UTF-16 and UTF-32.
import std.stdio;
void main() {
auto Δ = 1;
Δ++;
writeln(Δ);
}
- Output:
2
You can use any of the following:
Letters, digits, underscore (_), code points >= \u00A0 and < \uD800, code points > \uDFFF.
However, the following cannot be used:
\u0024 ($), \u0040 (@) and \u0060 (`).
See: http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
Delphi
For more information about naming identifiers (including variables) visit: Identifiers in Delphi
(* Compiled with Delphi XE *)
program UnicodeVariableName;
{$APPTYPE CONSOLE}
uses
SysUtils;
var
Δ: Integer;
begin
Δ:= 1;
Inc(Δ);
Writeln(Δ);
Readln;
end.
DWScript
var Δ : Integer;
Δ := 1;
Inc(Δ);
PrintLn(Δ);
Déjà Vu
set :Δ 1
set :Δ ++ Δ
!. Δ
EchoLisp
Symbol names can be any string including unicode characters. See the EchoLisp reference documentation.
(define ∆-🍒 1) → ∆-🍒
(set! ∆-🍒 (1+ ∆-🍒)) → 2
(printf "🔦 Look at ∆-🍒 : %d" ∆-🍒)
🔦 Look at ∆-🍒 : 2
Elena
ELENA 6.x:
public program()
{
var Δ := 1;
Δ := Δ + 1;
console.writeLine(Δ)
}
Emacs Lisp
(setq Δ 1)
(setq Δ (1+ Δ))
(message "Δ is %d" Δ)
Variables are symbols and symbol names can be any string. Source code .el
files can have all usual Emacs coding system specifications to give variables in non-ASCII.
The byte compiler writes utf-8
(or past versions wrote emacs-mule
) into .elc
so that any mixture of non-ASCII is preserved.
F#
As with C# the F# Language Specification refers to Unicode Standard Annex #31 for identifier syntax, allowing Unicode letter characters.
let mutable Δ = 1
Δ <- Δ + 1
printfn "%d" Δ
Factor
Variable names can contain any character, inlcuding unicode characters, as long as they don't parse as a string or a number.
USE: locals
[let
1 :> Δ!
Δ 1 + Δ!
Δ .
]
Forth
Historically, Forth has worked only in ASCII (going so far as to reserve the eighth bit for symbol smudging), but modern implementations (e.g., Gforth) allow UTF-8 in word names, strings and comments.
variable ∆
1 ∆ !
1 ∆ +!
∆ @ .
FreeBASIC
FreeBASIC does not allow non-ASCII characters in variable names or identifiers generally.
The only ASCII characters allowed are numerals (0-9), letters (a-z, A-Z) and the underscore(_).
However, identifiers cannot begin with a numeral.
If one wanted to use a Greek character such as Δ for a variable name, it would therefore have to be spelled out :
'FB 1.05.0 Win64
Var delta = 1
delta += 1
Print delta '' 2
Sleep
Frink
Frink can use Unicode variable names that meet certain constraints. Variable names that don't meet these constraints can still be parsed and displayed by specifying them as Unicode escapes: Unicode Variable Names
Δ = 1
Δ = Δ + 1
println[Δ]
Go
Go source encoding is specified to be UTF-8. Allowable variable names are specified in the sections identifiers and Exported identifiers.
package main
import "fmt"
func main() {
Δ := 1
Δ++
fmt.Println(Δ)
}
- Output:
2
Groovy
The Groovy solution for Arithmetic/Complex demonstrates a number of Unicode variable names
Haskell
Haskell variables must start with a lower case character, however Δ is an upper case delta. As such, lower case delta (δ) was used as the first character instead, followed by an upper case delta as the second character in the variable name.
Also, Haskell does not allow mutable variables, so incrementing delta isn't possible. Instead lower case psi was used to store the incremented value of delta since tridents are cool.
main = print ψ
where δΔ = 1
ψ = δΔ + 1
Insitux
(let Δ😄 1)
(inc Δ😄)
- Output:
2
J
Variable names must be comprised of ASCII characters.
From the Dictionary page Alphabet and Words:
- "The alphabet is standard ASCII, comprising digits, letters (of the English alphabet), the underline (used in names and numbers), ..."
- "Names ... begin with a letter and may continue with letters, underlines, and digits."
Java
int Δ = 1;
double π = 3.141592;
String 你好 = "hello";
Δ++;
System.out.println(Δ);
- Output:
2
JavaScript
var ᾩ = "something";
var ĦĔĽĻŎ = "hello";
var 〱〱〱〱 = "too less";
var जावास्क्रिप्ट = "javascript"; // ok that's JavaScript in hindi
var KingGeorgeⅦ = "Roman numerals.";
console.log([ᾩ, ĦĔĽĻŎ, 〱〱〱〱, जावास्क्रिप्ट, KingGeorgeⅦ])
- Output:
["something", "hello", "too less", "javascript", "Roman numerals."]
jq
Apart from the initial "$", the characters allowed in so-called "$ variables" in jq are restricted to alphanumeric characters and the underscore: [A-Za-z0-9_].
However, in practice, the keys of JSON objects can also be used as variable names. For example, in the following expression, "Δ" is in effect set to 1 and then its value is retrieved in the environment in which "Δ" has been set:
{ "Δ": 1 } | .["Δ"]
In jq 1.5 and later,
.["Δ"]
can be abbreviated to
."Δ"
Strictly speaking, variables in jq cannot be incremented (in fact, strictly speaking, jq does not have variables at all), but the equivalent operation is illustrated here:
{ "Δ": 1 } # initialization
| .["Δ"] += 1 # increment by 1
| .["Δ"] # emit the incremented value
Julia
The Julia documentation on allowed variable names explicitly describes the wide variety of Unicode codepoints that are allowed:
julia> Δ = 1 ; Δ += 1 ; Δ
2
The allowed identifiers also include sub/superscripts and combining characters (e.g. accent marks):
julia> Δ̂₂ = Δ^2
4
and the Julia interactive shells (and many editors) allow typing these symbols via tab-completion of their LaTeX abbreviations.
Kotlin
fun main(args: Array<String>) {
var Δ = 1
Δ++
print(Δ)
}
- Output:
2
Lily
var Δ = 1
Δ += 1
print(Δ.to_s())
Lingo
Since version 11, in Lingo/Director both native strings and scripts use UTF-8 encoding. Variable names support Unicode characters:
Δ = 1
Δ = Δ + 1
put Δ
-- 2
LiveCode
In LiveCode 7+ all characters are stored as unicode. This includes variable (container) names, although it does not seem to state this in the LC dictionary.
put 1 into Δ
add 1 to Δ
put Δ
-- result is 2
LOLCODE
The spec mandates that identifiers be alphanumeric. However, the fact that YARNs are Unicode-aware permits the use of the SRS operator introduced in 1.3 to utilize variables of arbitrary name.
I HAS A SRS "Δ" ITZ 1
SRS "Δ" R SUM OF SRS ":(394)" AN 1
VISIBLE SRS ":[GREEK CAPITAL LETTER DELTA]"
- Output:
2
Lua
Lua 5.3 supports UTF-8 encoding as documented here: https://www.lua.org/manual/5.3/manual.html#6.5 . However, this support is not strictly necessary for this task so long as the Lua script is edited using a UTF-8 enabled text editor.
∆ = 1
∆ = ∆ + 1
print(∆)
- Output:
2
This output was produced using LuaJIT, which implements Lua 5.1. This works because although Lua doesn't 'understand' the delta character, it still resolves to a consistent set of bytes. This string is "Γêå" in ASCII but the programmer does not need to be aware of that; the unicode variable name works just like any other.
M2000 Interpreter
case is not significant The language has Greek statements too
Δ=1
Δ++
Print Δ
ᾩ=3
Print ᾩ**2=ᾩ^2, ᾩ^2-1=8
Τύπωσε ᾩ**2=ᾩ^2, ᾩ^2-1=8 ' this is Print statement too
Print ᾡ=3
जावास्क्रिप्ट=100
जावास्क्रिप्ट++
Print "जावास्क्रिप्ट=";जावास्क्रिप्ट
ĦĔĽĻŎ$="hello"
Print ĦĔĽĻŎ$+ħĕľļŏ$="hellohello"
〱〱〱〱$="too less"
Print Left$(〱〱〱〱$, 3)="too"
c͓͈̃͂̋̈̆̽h̥̪͕ͣ͛̊aͨͣ̍͞ơ̱͔̖͖̑̽ș̻̥ͬ̃̈ͩ =100 : Print "c͓͈̃͂̋̈̆̽h̥̪͕ͣ͛̊aͨͣ̍͞ơ̱͔̖͖̑̽ș̻̥ͬ̃̈ͩ ="; c͓͈̃͂̋̈̆̽h̥̪͕ͣ͛̊aͨͣ̍͞ơ̱͔̖͖̑̽ș̻̥ͬ̃̈ͩ
Mathematica /Wolfram Language
Δ = 1;
Δ++;
Print[Δ]
Nemerle
From the Nemerle Reference Manual: "Programs are written using the Unicode character set, using the UTF-8 encoding."
using System.Console;
module UnicodeVar
{
Main() : void
{
mutable Δ = 1;
Δ++;
WriteLine($"Δ = $Δ");
}
}
NetRexx
The NetRexx Language Definition section of the NetRexx documentation (netrexx.org/files/nrl3.pdf) describes the character set support within the language.
/* NetRexx */
options replace format comments java crossref symbols nobinary
upperΔ = 1
Δupper = upperΔ
lowerδ = 2
δlower = lowerδ
say upperΔ '+' Δupper '= \-'
upperΔ = upperΔ + Δupper
say upperΔ
say lowerδ '+' δlower '= \-'
lowerδ = lowerδ + δlower
say lowerδ
say
-- Unicode works with the NetRexx built-in functions
Υππερ = '\u0391'.sequence('\u03a1') || '\u03a3'.sequence('\u03a9') -- ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
Λοωερ = '\u03b1'.sequence('\u03c1') || '\u03c3'.sequence('\u03c9') -- αβγδεζηθικλμνξοπρστυφχψω
say Υππερ'.Lower =' Υππερ.lower()
say Λοωερ'.Upper =' Λοωερ.upper()
say
-- Note: Even with unicode characters NetRexx variables are case-insensitive
numeric digits 12
δ = 20.0
π = Math.PI
θ = Π * Δ
σ = Θ ** 2 / (Π * 4) -- == Π * (Δ / 2) ** 2
say 'Π =' π', diameter =' δ', circumference =' Θ', area =' Σ
return
Output:
1 + 1 = 2 2 + 2 = 4 ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ.Lower = αβγδεζηθικλμνξοπρστυφχψω αβγδεζηθικλμνξοπρστυφχψω.Upper = ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ Π = 3.141592653589793, diameter = 20.0, circumference = 62.8318530718, area = 314.159265359
Nim
From the spec: https://nim-lang.org/docs/manual.html#lexical-analysis-identifiers-amp-keywords
var Δ = 1
inc Δ
echo Δ
Objeck
As of 3.2, Objeck supports UTF-8 encoded I/O and stores characters in the runtime's native Unicode format.
class Test {
function : Main(args : String[]) ~ Nil {
Δ := 1;
π := 3.141592;
你好 := "hello";
Δ += 1;
Δ->PrintLine();
}
}
Ol
Ol fully supports Unicode.
(define Δ 1)
(define Δ (+ Δ 1))
(print Δ)
- Output:
2
PARI/GP
GP accepts only ASCII in strings and variable names.
PARI supports Unicode variable names only insofar as C does.
PascalABC.NET
begin
var Δ := 1;
Δ += 1;
Print(Δ);
end.
Peloton
1. (working on it)
2.
<@ LETVARLIT>Δ|1</@>
<@ ACTICRVAR>Δ</@>
<@ SAYVAR>Δ</@>
Using what Google Translate says is the Traditional Chinese for 'delta'
<@ LETVARLIT>三角洲|1</@>
<@ ACTICRVAR>三角洲</@>
<@ SAYVAR>三角洲</@>
Perl
Requires Perl 5.8.1 at the minimum. See http://perldoc.perl.org/utf8.html
use utf8;
my $Δ = 1;
$Δ++;
print $Δ, "\n";
$
sigil can be omitted by using lvalue subroutine:
use utf8;
BEGIN {
my $val;
sub Δ () : lvalue {
$val;
}
}
Δ = 1;
Δ++;
print Δ, "\n";
or with Perl 5.10 and state modifier:
use utf8;
use v5.10;
sub Δ () : lvalue {
state $val;
}
Δ = 1;
Δ++;
say Δ;
One can have Unicode in identifier or subroutine names and also in package or class names. Use of Unicode for the last two purposes is, due to file and directory names, dependent on the filesystem.
Phix
Phix does not officially support unicode variable names, however it took me less than 5 minutes (changes, which are now permanent, labelled with "for rosettacode/unicode" in ptok.e and pttree.e, setting charset and identset respectively) to get the following to work, as long as the source file is stored using utf8 with a proper BOM, as supported by Notepad and the included Edita. I will happily add further character ranges as required/requested: I simply don't know what those ranges are, but I believe that no code points in utf8 should overlap existing ascii chars such as +-* etc.
with javascript_semantics integer Δ = 1 Δ += 1 ?Δ
- Output:
2
PHP
PHP is not made to support Unicode. UTF-16 (UCS-2) will not work because it adds null bytes before or after ASCII characters (depending on endianness of UTF-16). As every code has to start with <?php
(ASCII) exactly, the parser doesn't find the match and just prints <?php
mark.
UTF-8 uses ASCII values for bytes which can be represented as ASCII and as result it's possible to insert <?php
mark at beginning. PHP sees your document as some 8-bit encoding (like ISO-8859-1), but it doesn't matter because UTF-8 doesn't use ASCII ranges for its values and calls to the variable are consistent.
Documentation: mbstring.php4.req, language.variables.basics
<?php
$Δ = 1;
++$Δ;
echo $Δ;
PicoLisp
Variables are usually Internal Symbols, and their names may contain any UTF-8 character except null-bytes. White space, and 11 special characters (see the reference) must be escaped with a backslash. Transient Symbols are often used as variables too, they follow the syntax of strings in other languages.
: (setq Δ 1)
-> 1
: Δ
-> 1
: (inc 'Δ)
-> 2
: Δ
-> 2
Pike
#charset utf8
void main()
{
int Δ = 1;
Δ++;
write( Δ +"\n");
}
- Output:
2
PowerShell
$Δ = 2
$π = 3.14
$π*$Δ
Output:
6.28
Prolog
% Unicode in predicate names:
是. % be: means, approximately, "True".
不是 :- \+ 是. % not be: means, approximately, "False". Defined as not 是.
% Unicode in variable names:
test(Garçon, Δ) :-
Garçon = boy,
Δ = delta.
% Call test2(1, Result) to have 2 assigned to Result.
test2(Δ, R) :- R is Δ + 1.
Putting this into use:
?- 是.
true.
?- 不是.
false.
?- test(X,Y).
X = boy,
Y = delta.
?- test2(1,Result).
Result = 2.
Python
Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.
Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131). For these characters, the classification uses the version of the Unicode Character Database as included in the unicodedata module.
Identifiers are unlimited in length. Case is significant.
>>> Δx = 1
>>> Δx += 1
>>> print(Δx)
2
>>>
R
See ?assign
for details.
f <- function(`∆`=1) `∆`+1
f(1)
- Output:
[1] 2
Racket
Racket has virtually no restrictions on valid characters for identifiers. In particular, Unicode identifiers are supported.
#lang racket
;; Racket can use Unicode in identifier names
(define √ sqrt)
(√ 256) ; -> 16
;; and in fact the standard language makes use of some of these
(λ(x) x) ; -> an identity function
;; The required binding:
(define Δ 1)
(set! Δ (add1 Δ))
(printf "Δ = ~s\n" Δ) ; prints "Δ = 2"
Raku
(formerly Perl 6) Raku is written in Unicode so, with narrow restrictions, nearly any Unicode letter can be used in identifiers.
See the current Raku documentation on the topic here: https://docs.raku.org/language/syntax#Identifiers
my $Δ = 1;
$Δ++;
say $Δ;
Function and subroutine names can also use Unicode characters: (as can methods, classes, packages, whatever...)
my @ᐁ = (0, 45, 60, 90);
sub π { pi };
sub postfix:<°>($degrees) { $degrees * π / 180 };
for @ᐁ -> $ಠ_ಠ { say sin $ಠ_ಠ° };
sub c͓͈̃͂̋̈̆̽h̥̪͕ͣ͛̊aͨͣ̍͞ơ̱͔̖͖̑̽ș̻̥ͬ̃̈ͩ { 'HE COMES' }
See Also:
Egyptian division
Retro
'Δ var
#1 !Δ
@Δ n:put
&Δ v:inc
@Δ n:put
Function and variable names are stored as strings, and UTF-8 is usable, as long as the host system allows it.
REXX
Note: this REXX program only works with the R4 or ROO REXX interpreter under DOS or DOS under Windows.
This REXX program works because the R4 and ROO REXX interpreters support an extended character set.
/*REXX program (using the R4 REXX interpreter) which uses a Greek delta char).*/
'chcp' 1253 "> NUL" /*ensure we're using correct code page.*/
Δ=1 /*define delta (variable name Δ) to 1*/
Δ=Δ+1 /*bump the delta REXX variable by unity*/
say 'Δ=' Δ /*stick a fork in it, we're all done. */
output
Δ= 2
Ring
# Project : Unicode variable names
Δ = "Ring Programming Language"
see Δ + nl
Output:
Ring Programming Language
Ruby
Ruby supports about 100 encodings, the default being UTF-8.
Δ = 1
Δ += 1
puts Δ # => 2
Rust
Rust source encoding is specified to be UTF-8. Identifiers must begin with a character that has Unicode XID_start property and remaining characters must have the XID_Continue property. (Which means that ╯°□°╯︵┻━┻ is not permitted under current specification)
Non-ASCII identifiers are feature gated since version 0.9
#![feature(non_ascii_idents)]
#![allow(non_snake_case)]
fn main() {
let mut Δ: i32 = 1;
Δ += 1;
println!("{}", Δ);
}
S-lang
S-Lang documentation is decent, but there are imperfections, and this is a case in point. The "Identifiers" chapter in the primary doc says that an identifier must start with [A-Za-z$_] followed by zero or more of [A-Za-z0-9$_]. (The chapter is probably a holdover from S-Lang 1.)
But in reality, any non-ascii Unicode character is legal in identifiers, including the first character, as long as it's encoded in UTF-8. [As an aside, S-Lang includes iconv.sl to interface to the Gnu iconv library. The 0.99.18 ms-win version of the S-Lang-powered Jed programmer's editor includes this to handle the UTF-16 files popular with windows.]
The same document does give a hint in the "S-Lang 2 Interpreter NEWS" appendix, which says "Native support for Unicode via UTF-8 throughout the library" was added as of S-Lang 2.0.
As the doc isn't completely clear, I've used the requested non-ascii char not just in variable but also function and reference names, and tested under S-Lang versions 2.0.6 and pre2.3.1-23.
define ∆increment(∆ref) {
@∆ref++;
}
variable foo∆bar = 1;
foo∆bar++;
variable ∆bar = 1;
∆bar++;
∆increment(&∆bar);
% foo∆bar should be 2 and ∆bar should be 3.
print(foo∆bar);
print(∆bar);
- Output:
2 3
Scala
var Δ = 1
val π = 3.141592
val 你好 = "hello"
Δ += 1
println(Δ)
SenseTalk
put 1 into Δ
add 1 to Δ
put Δ
- Output:
2
Sidef
var Δ = 1;
Δ += 1;
say Δ;
- Output:
2
Stata
Here is how to create a macro, a scalar and a Mata variable named Δ:
sca Δ=10
sca Δ=Δ+1
di Δ
local Δ=20
local ++Δ
di `Δ'
mata
Δ=30
Δ++
Δ
end
Swift
var Δ = 1
let π = 3.141592
let 你好 = "hello"
Δ++
println(Δ)
- Output:
2
Tcl
Tcl variable names can include any character (the $var
syntax can't, but that's just a shorthand for the operationally-equivalent [set var]
). Also (in tcl 8.6, at least), the ${var}
syntax does work. Thus, this script is entirely legal:
set Δx 1
incr Δx
puts [set Δx]
puts ${Δx}
However, this script only works smoothly if the “Δ” character is in the system's default encoding (thankfully more common than it used to be, as more and more systems use UTF-8 or UTF-16 as their default encodings) so normal Tcl practice is to stick to ASCII for identifier names.
It is also possible to encode characters using a \uXXXX substitution (each X is a hexadecimal digit), thus the Δx
could be replaced throughout above by \u0394x
; the result is a variable with exactly the same name as before. Doing this allows a script to be written with just ASCII characters, which tends to maximize portability across platforms.
UNIX Shell
Δ=1
Δ=`expr $Δ + 1`
echo $Δ
- Output:
2
Vala
Vala has limited support for Unicode in variable names. This limitation comes from its source-to-source compilation to C.
VBA
- See Visual Basic
Visual Basic
Sub Main()
Dim Δ As Integer
Δ=1
Δ=Δ+1
Debug.Print Δ
End Sub
- Output:
2
Wren
The rules for identifiers are described here - basically they can contain letters, digits (not initially) or underscores but identifiers with leading underscores are reserved for field names.
Although it's perhaps not 100% clear from that description that only ASCII letters and digits are allowed, this in fact is the case as the following example shows.
The error description refers to the bytes in the UTF encoding of 'Δ' which can't appear (outside a string) in a Wren script.
var a = 3
var b = 2
var delta = a - b // ok
var Δ = delta // not ok
- Output:
[./unicode_ident line 4] Error: Invalid byte 0xce. [./unicode_ident line 4] Error: Invalid byte 0x94.
However, it is possible to simulate Unicode variable names to some extent by using the Var class in the above module which stores the names (which can be any valid string) in an internal map.
import "./trait" for Var
Var["Δ"] = 1
Var["Δ"] = Var["Δ"] + 1
System.print(Var["Δ"])
- Output:
2
Zig
Works with: 0.10.x, 0.11.x, 0.12.0-dev.1594+7048e9366
Links here refer to 0.10.1 documentation, but nothing changed in 0.11.0, and 0.12.0 is not released yet at the moment of writing.
Source code in Zig must be encoded in UTF-8, see this page from language reference.
Zig supports non-conforming identifiers (that includes space, use non-English alphabet, shadow reserved keyword etc.) via @"" syntax. See this page from language reference.
const std = @import("std");
pub fn main() void {
var @"Δ": i32 = 1;
@"Δ" += 1;
std.debug.print("{d}\n", .{@"Δ"});
}
- Output:
2
zkl
The short answer is zkl identifiers are a small subset of ASCII. This is enforced by the compiler. That said, the VM doesn't particularly care about names (although UTF-8 will cause sorting/etc issues). So ...
delta:="\U0394;"; // UTF-8 delta
klass:= // embryo(names, numFcns, numClasses, numParents, ...)
self.embryo(L("","",delta),0,0,0).cook();
klass.setVar(0,Ref(1)); // indirect set since delta not valid var name
klass.vars.println();
dv:=klass.setVar(0); // which actually gets the var, go figure
dv.inc(); // ie (*ptr)++
dv.value.println();
- Output:
L(L("Δ",Ref)) 2
- Programming Tasks
- Unicode
- 11l
- 8th
- ACL2
- Ada
- ALGOL 68
- AppleScript
- Arturo
- AutoHotkey
- BaCon
- Bracmat
- C
- C sharp
- C++
- Clojure
- Common Lisp
- Crystal
- D
- Delphi
- DWScript
- Déjà Vu
- EchoLisp
- Elena
- Emacs Lisp
- F Sharp
- Factor
- Forth
- FreeBASIC
- Frink
- Go
- Groovy
- Haskell
- Insitux
- J
- Java
- JavaScript
- Jq
- Julia
- Kotlin
- Lily
- Lingo
- LiveCode
- LOLCODE
- Lua
- M2000 Interpreter
- Mathematica
- Wolfram Language
- Nemerle
- NetRexx
- Nim
- Objeck
- Ol
- PARI/GP
- PascalABC.NET
- Peloton
- Peloton examples needing attention
- Examples needing attention
- Perl
- Phix
- PHP
- PicoLisp
- Pike
- PowerShell
- Prolog
- Python
- R
- Racket
- Raku
- Retro
- REXX
- Ring
- Ruby
- Rust
- S-lang
- Scala
- SenseTalk
- Sidef
- Stata
- Swift
- Tcl
- UNIX Shell
- Vala
- VBA
- Visual Basic
- Wren
- Wren-trait
- Zig
- Zkl
- 6502 Assembly/Omit
- 68000 Assembly/Omit
- 8080 Assembly/Omit
- 8086 Assembly/Omit
- AWK/Omit
- BASIC/Omit
- BBC BASIC/Omit
- Bc/Omit
- BLAST/Omit
- C++/Omit
- Dc/Omit
- EasyLang/Omit
- Erlang/Omit
- Free Pascal/Omit
- GUISS/Omit
- Icon/Omit
- Locomotive Basic/Omit
- Lotus 123 Macro Scripting/Omit
- Pascal/Omit
- Sed/Omit
- TUSCRIPT/Omit
- Unicon/Omit
- Z80 Assembly/Omit
- ZX Spectrum Basic/Omit