Idiomatically determine all the characters that can be used for symbols
{{task}}
{{omit from|6502 Assembly}}
{{omit from|Pascal|ISO standards define “reference tokens” so implementors are free to choose, e.g. * ∗ • ⋅}}
Idiomatically determine all the characters that can be used for ''symbols''.
The word ''symbols'' is meant to cover things like names of variables, procedures (i.e., named fragments of programs, functions, subroutines, routines), statement labels, events or conditions, and in general, anything a computer programmer can choose to ''name''; it is not restricted to this list. ''Identifiers'' might be another name for ''symbols''.
* [[Idiomatically_determine_all_the_lowercase_and_uppercase_letters|Idiomatically determine all the lowercase and uppercase letters]].
<br><br>
 
=={{header|ALGOL 68}}==
{{works with|ALGOL 68G|Any - tested with release 2.8.3.win32}}
...should also work with other Algol 68 implementations that use upper-stropping (reserved words in upper-case).<br>
There are a number of different types of symbols that can be defined in Algol 68 (informally as follows):<br>
- identifiers used for variables, constants, structure members, procedures<br>
- monadic operators<br>
- dyadic operators<br>
- mode indicants - used for type names
<br>
Monadic and dyadic operators can be symbolic or have "bold" names. Mode indicants also have "bold" names. When upper-stropping is used, bold words are formed from upper-case letters. Algol 68G also allows underscores in bold words - other implementations of Algol 68 may also allow digits.<br>
In the output, the characters shown for monadic and dyadic operators include the upper-case letters - these can't be mixed with symbols, e.g. +A and B- are not valid operator symbols. Additionally, the only valid two character operator name where ":" is the second character is "=:" - the surlaw operator, perhaps :).<br>
Symbolic operator symbols can be one or two characters, optionally suffixed with := or =:.<br>
The following uses the same approach as the AWK sample, though due to the different symbol types, rather more possible symbols have to be checked.<br>
The sample assumes Windows/Linux is the operating system and that the Algol 68 compiler/interpreter can be invoked with "a68". It should be possible to modify it for other operating systems/commands. Only 7-bit ASCII characters > space are considered.
<syntaxhighlight lang="algol68">
BEGIN # determine which characters can be in identifiers, etc. by trying to #
# compile test programs #
 
STRING source name = "_tmp.a68";
STRING a68 command = "a68 " + source name + " > _tmp.err 2>&1";
 
# attempts to compile the code with "%" substituted with id, #
# returns 0 if it compiled OK, non-zero otherwise #
PROC attempt compilation = ( STRING template, id )INT:
BEGIN
STRING code := "";
# replace "%" with the identifier in the template #
FOR t pos FROM LWB template TO UPB template DO
code +:= IF template[ t pos ] /= "%"
THEN template[ t pos ]
ELSE id
FI
OD;
# output the source file and try compiling it #
FILE source file;
BOOL open error := IF open( source file, source name, stand out channel ) = 0
THEN
# opened OK - file already exists and #
# will be overwritten #
FALSE
ELSE
# failed to open the file #
# - try creating a new file #
establish( source file, source name, stand out channel ) /= 0
FI;
IF open error
THEN # failed to open the file #
print( ( "Unable to open ", source name, newline ) );
stop
ELSE # file opened OK #
put( source file, ( code ) ); # write source #
close( source file );
system( a68 command ) # compile it #
FI
END # attempt compilation # ;
# attempt to construct all two-character symbols and determine whether #
# they are valid by attempting to compile a program containing them #
# only 7-bit ASCII characters > space are considered #
PROC try = ( STRING template, legend )VOID:
BEGIN
[ 0 : 127 ]BOOL first, second;
FOR i FROM LWB first TO UPB first DO
first[ i ] := second[ i ] := FALSE
OD;
FOR f FROM ABS " " + 1 TO UPB first DO
CHAR fc = REPR f;
IF attempt compilation( template, fc ) = 0
THEN
# this character can be the first character of a symbol #
first[ f ] := TRUE;
FOR s FROM ABS " " + 1 TO UPB second DO
IF NOT second[ s ]
THEN
# haven't found this is a valid second character #
# yet #
IF attempt compilation( template, fc + REPR s ) = 0
THEN
# compiled OK #
second[ s ] := TRUE
FI
FI
OD
FI
OD;
print( ( "Characters valid for ", legend, ":", newline ) );
print( ( " as first: " ) );
FOR c pos FROM LWB first TO UPB first DO
IF first[ c pos ]
THEN print( ( REPR c pos ) )
ELIF second[ c pos ]
THEN print( ( " " ) )
FI
OD;
print( ( newline ) );
print( ( " as other: " ) );
FOR c pos FROM LWB first TO UPB first DO
IF second[ c pos ]
THEN print( ( REPR c pos ) )
ELIF first[ c pos ]
THEN print( ( " " ) )
FI
OD;
print( ( newline ) )
END # try # ;
 
try( "BEGIN INT %; % := 1 END", "identifiers" );
try( "BEGIN OP % = ( INT a )INT: a; % 1 END", "monadic operators" );
try( "BEGIN PRIO % = 5; OP % = ( INT a, b )INT: a; 1 % 1 END", "dyadic operators" );
try( "BEGIN MODE % = INT; % x; x := 1 END", "mode indicants" )
 
END
</syntaxhighlight>
{{out}}
<pre>
Characters valid for identifiers:
as first: abcdefghijklmnopqrstuvwxyz
as other: 0123456789_abcdefghijklmnopqrstuvwxyz
Characters valid for monadic operators:
as first: !%& +- ?ABCDEFGHIJKLMNOPQRSTUVWXYZ^ ~
as other: * /<=> ABCDEFGHIJKLMNOPQRSTUVWXYZ _
Characters valid for dyadic operators:
as first: !%&*+-/ <=>?ABCDEFGHIJKLMNOPQRSTUVWXYZ^ ~
as other: * /:<=> ABCDEFGHIJKLMNOPQRSTUVWXYZ _
Characters valid for mode indicants:
as first: ABCDEFGHIJKLMNOPQRSTUVWXYZ
as other: ABCDEFGHIJKLMNOPQRSTUVWXYZ_
</pre>
 
=={{header|AWK}}==
<syntaxhighlight lang="awk"># usage: gawk -f Idiomatically_determine_all_the_characters_that_can_be_used_for_symbols.awk

# Writes a small test program that uses id as a parameter name and
# returns 1 if gawk accepts it, 0 otherwise.
function is_valid_identifier(id,    rc) {
    fn = "is_valid_identifier.awk"
    printf("function unused(%s) { arr[%s] = 1 }\n", id, id) >fn
    printf("BEGIN { exit(0) }\n") >>fn
    close(fn)

    rc = system("gawk -f is_valid_identifier.awk 2>errors")
    return rc == 0
}

BEGIN {
    for (i = 0; i <= 255; i++) {
        c = sprintf("%c", i)

        # 1st character
        if (is_valid_identifier(c))
            good1 = good1 c
        else
            bad1 = bad1 c

        # 2nd..nth character
        if (is_valid_identifier("_" c "_"))
            good2 = good2 c
        else
            bad2 = bad2 c
    }

    printf("1st character: %d bad, %d ok: %s\n",
        length(bad1), length(good1), good1)
    printf("2nd..nth char: %d bad, %d ok: %s\n",
        length(bad2), length(good2), good2)
    exit(0)
}</syntaxhighlight>
<p>output:</p>
<pre>
1st character: 203 bad, 53 ok: ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
2nd..nth char: 193 bad, 63 ok: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
</pre>
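
The "hand it to the language's own parser" approach used by the ALGOL 68 and AWK samples can also be sketched in-process. The following is a minimal Python sketch, not part of the original samples: it assumes Python's own <code>ast</code> module as the parser, and the helper names <code>parses_as_name</code> and <code>probe</code> are made up for illustration. Only 7-bit ASCII is probed.

```python
import ast


def parses_as_name(candidate: str) -> bool:
    """Return True if the parser sees `candidate` as exactly one name."""
    try:
        tree = ast.parse(candidate, mode="eval")
    except (SyntaxError, ValueError):
        # ValueError covers sources the parser rejects outright (e.g. NUL bytes).
        return False
    node = tree.body
    # Comparing the parsed name with the input rejects cases where the parser
    # silently dropped something, such as trailing whitespace or a comment.
    return isinstance(node, ast.Name) and node.id == candidate


def probe(lo: int = 0, hi: int = 127) -> tuple[str, str]:
    """Collect characters accepted as the first and as a following character."""
    chars = [chr(i) for i in range(lo, hi + 1)]
    first = "".join(c for c in chars if parses_as_name(c))
    follow = "".join(c for c in chars if parses_as_name("_" + c))
    return first, follow


if __name__ == "__main__":
    first, follow = probe()
    print("1st character:", first)
    print("2nd..nth char:", follow)
```

Checking that the parsed name equals the input plays the same role as the <code>"_" c "_"</code> wrapper in the AWK sample: it stops characters such as a space or a comment marker from being counted merely because the parser ignored them.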
 
=={{header|Delphi}}==
{{works with|Delphi|6.0}}
{{libheader|SysUtils,StdCtrls}}
This code tests all the printable ASCII characters to see whether they are valid in symbols. It uses the Delphi library function "IsValidIdent" to determine what the compiler will accept. It starts by testing single-character identifiers, which tests the first character of an identifier. It then tests the second character, which shows which characters are valid for the rest of an identifier.
 
<syntaxhighlight lang="Delphi">
 
 
procedure ShowValidSymbols(Memo: TMemo);
{Uses Delphi system tool "IsValidIdent" }
{To identify valid characters in identifiers}
var I: integer;
var TS: string;
var Good,Bad: string;
begin
{Test first characters in a symbol}
Good:=''; Bad:='';
for I:=$21 to $7F do
begin
TS:=Char(I);
if IsValidIdent(TS) then Good:=Good+TS
else Bad:=Bad+TS;
end;
Memo.Lines.Add('First Characters Allowed');
Memo.Lines.Add('Allowed: '+Good);
Memo.Lines.Add('Not Allowed: '+Bad);
{Test remaining characters in a symbol}
Good:=''; Bad:='';
for I:=$21 to $7F do
begin
TS:='A'+Char(I);
if IsValidIdent(TS) then Good:=Good+TS[2]
else Bad:=Bad+TS[2];
end;
Memo.Lines.Add('');
Memo.Lines.Add('Remaining Characters Allowed');
Memo.Lines.Add('Allowed: '+Good);
Memo.Lines.Add('Not Allowed: '+Bad);
end;
 
 
</syntaxhighlight>
{{out}}
<pre>
First Characters Allowed
Allowed: ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
Not Allowed: !"#$%&'()*+,-./0123456789:;<=>?@[\]^`{|}~
 
Remaining Characters Allowed
Allowed: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
Not Allowed: !"#$%&'()*+,-./:;<=>?@[\]^`{|}~
 
Elapsed Time: 9.048 ms.
 
</pre>
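
The same "ask a library predicate" idea can be sketched in Python, whose built-in <code>str.isidentifier</code> method plays a role comparable to IsValidIdent. This is only an illustrative sketch: the function name <code>valid_characters</code> is made up here, and the loop stops at <code>~</code> rather than including DEL as the Delphi loop does.

```python
def valid_characters(prefix: str = "") -> tuple[str, str]:
    """Split printable ASCII into characters accepted/rejected after `prefix`."""
    good, bad = [], []
    for code in range(0x21, 0x7F):  # '!' through '~'
        ch = chr(code)
        # An empty prefix tests the first position; any valid prefix tests the rest.
        (good if (prefix + ch).isidentifier() else bad).append(ch)
    return "".join(good), "".join(bad)


if __name__ == "__main__":
    for title, prefix in (("First", ""), ("Remaining", "A")):
        allowed, not_allowed = valid_characters(prefix)
        print(f"{title} characters allowed:     {allowed}")
        print(f"{title} characters not allowed: {not_allowed}")
```

Note that <code>str.isidentifier</code> does not reject reserved words; that does not matter here because no one- or two-character probe of this form is a Python keyword.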
 
 
=={{header|F_Sharp|F#}}==
If the purpose of this task is to determine what can be used as an identifier, then in F# the answer is anything, so long as you enclose it in double backticks:
<syntaxhighlight lang="fsharp">
let ``+`` = 5
printfn "%d" ``+``
</syntaxhighlight>
{{out}}
<pre>
{{output?}}
 
<syntaxhighlight lang="factor">USING: parser see ;
\ scan-word-name see</syntaxhighlight>
{{out}}
<pre>
</pre>
From this code we can see that any characters may be used in an identifier unless it parses as a string or a number.
 
=={{header|FreeBASIC}}==
<syntaxhighlight lang="vb">Dim As String*1 C1
Dim As Integer C
Print "First character set: ";
For C = 0 To 255
If (Chr(C) >= "A" And Chr(C) <="Z") Or Chr(C)="_" Then Print Chr(C);
Next
 
Print !"\nNext characters set: ";
For C = 0 To 255
C1 = Chr(C)
If (C1 >= "A" And C1 <= "Z") Or (C1 >= "0" And C1 <= "9") Or C1 = "_" Or (C1 >= "a" And C1 <= "z") Then Print C1;
Next C
 
Sleep</syntaxhighlight>
{{out}}
<pre>First character set: ABCDEFGHIJKLMNOPQRSTUVWXYZ_
Next characters set: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz</pre>
 
=={{header|Go}}==
This program uses the Go parser to check whether an identifier is indeed valid.
It checks separately which Unicode code points may appear at the beginning of an identifier, or in the remaining name.
Go allows the underscore, letters, and digits, with "letters" and "digits" defined by Unicode. The first character must be the underscore or a letter. To be exported, the first character must be an upper case letter, again as defined by Unicode.
The code assumes that the Go language does not have a keyword or otherwise reserved symbol of length 1,
or of length 2 starting with the underscore.

<syntaxhighlight lang="go">package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"strings"
	"unicode"
)

// isValidIdentifier reports whether the Go parser sees the string as a
// single identifier with exactly that name.
func isValidIdentifier(identifier string) bool {
	node, err := parser.ParseExpr(identifier)
	if err != nil {
		return false
	}
	ident, ok := node.(*ast.Ident)
	return ok && ident.Name == identifier
}

// runeRanges collects ascending code points and merges consecutive
// ones into ranges.
type runeRanges struct {
	ranges   []string
	hasStart bool
	start    rune
	end      rune
}

func (r *runeRanges) add(cp rune) {
	if !r.hasStart {
		r.hasStart = true
		r.start = cp
		r.end = cp
		return
	}

	if cp == r.end+1 {
		r.end = cp
		return
	}

	r.writeTo(&r.ranges)

	r.start = cp
	r.end = cp
}

// writeTo appends the range currently being built, if any.
func (r *runeRanges) writeTo(ranges *[]string) {
	if r.hasStart {
		if r.start == r.end {
			*ranges = append(*ranges, fmt.Sprintf("%U", r.end))
		} else {
			*ranges = append(*ranges, fmt.Sprintf("%U-%U", r.start, r.end))
		}
	}
}

func (r *runeRanges) String() string {
	ranges := r.ranges
	r.writeTo(&ranges)
	return strings.Join(ranges, ", ")
}

func main() {
	var validFirst runeRanges
	var validFollow runeRanges
	var validOnlyFollow runeRanges

	for r := rune(0); r <= unicode.MaxRune; r++ {
		first := isValidIdentifier(string([]rune{r}))
		follow := isValidIdentifier(string([]rune{'_', r}))
		if first {
			validFirst.add(r)
		}
		if follow {
			validFollow.add(r)
		}
		if follow && !first {
			validOnlyFollow.add(r)
		}
	}

	_, _ = fmt.Println("Valid first:", validFirst.String())
	_, _ = fmt.Println("Valid follow:", validFollow.String())
	_, _ = fmt.Println("Only follow:", validOnlyFollow.String())
}</syntaxhighlight>
{{out}}
<pre>
Valid first: U+0041-U+005A, U+005F, U+0061-U+007A, U+00AA, ..., U+00F8-U+02C1, U+02C6-U+02D1, ...
Valid follow: U+0030-U+0039, U+0041-U+005A, U+005F, U+0061-U+007A, U+00AA, ..., U+00F8-U+02C1, ..., U+2CEB0-U+2EBE0, U+2F800-U+2FA1D
Only follow: U+0030-U+0039, U+0660-U+0669, U+06F0-U+06F9, U+07C0-U+07C9, ..., U+1D7CE-U+1D7FF, U+1E950-U+1E959
</pre>
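
The range-merging step that produces the compact <code>U+XXXX-U+YYYY</code> listing can be illustrated on its own. The following Python sketch is a hedged illustration rather than a translation of the Go program: the names <code>compress</code> and <code>fmt_range</code> are hypothetical, and Python's <code>str.isidentifier</code> stands in for the Go parser, so the accepted code points are Python's, not Go's.

```python
from typing import Iterable, List


def fmt_range(start: int, end: int) -> str:
    """Format one range as U+XXXX or U+XXXX-U+YYYY."""
    if start == end:
        return f"U+{start:04X}"
    return f"U+{start:04X}-U+{end:04X}"


def compress(code_points: Iterable[int]) -> List[str]:
    """Merge ascending code points into runs of consecutive values."""
    ranges: List[str] = []
    start = end = None
    for cp in code_points:
        if start is None:
            start = end = cp          # first code point opens a run
        elif cp == end + 1:
            end = cp                  # extends the current run
        else:
            ranges.append(fmt_range(start, end))
            start = end = cp          # gap found: close the run, open a new one
    if start is not None:
        ranges.append(fmt_range(start, end))
    return ranges


if __name__ == "__main__":
    ascii_first = compress(cp for cp in range(0x80) if chr(cp).isidentifier())
    print("Valid first (ASCII only):", ", ".join(ascii_first))
```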
 
=={{header|Haskell}}==
 
Quotation from the Haskell 2010 language specification [https://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-140002]
 
<pre> varid → (small {small | large | digit | ' }) / reservedid
conid → large {small | large | digit | ' }
reservedid → case | class | data | default | deriving | do | else
| foreign | if | import | in | infix | infixl
| infixr | instance | let | module | newtype | of
| then | type | where | _
 
small → ascSmall | uniSmall | _
ascSmall → a | b | … | z
uniSmall → any Unicode lowercase letter
large → ascLarge | uniLarge
ascLarge → A | B | … | Z
uniLarge → any uppercase or titlecase Unicode letter
 
digit → ascDigit | uniDigit
ascDigit → 0 | 1 | … | 9
uniDigit → any Unicode decimal digit</pre>
 
An identifier consists of a letter followed by zero or more letters, digits, underscores, and single quotes. Identifiers are lexically distinguished into two namespaces: those that begin with a lowercase letter (variable identifiers) and those that begin with an upper-case letter (constructor identifiers). Identifiers are case sensitive: name, naMe, and Name are three distinct identifiers (the first two are variable identifiers, the last is a constructor identifier).
 
Underscore, “_”, is treated as a lowercase letter, and can occur wherever a lowercase letter can. However, “_” all by itself is a reserved identifier, used as wild card in patterns.
 
According to the specification we may give predicates for valid symbols and identifiers in Haskell:
<syntaxhighlight lang="haskell">import Data.Char

-- predicate for valid symbol
isSymbolic :: Char -> Bool
isSymbolic ch = isAlphaNum ch || ch `elem` "_'"

-- predicate for valid type constructor
isConId :: String -> Bool
isConId s = and [ not (null s)
                , isUpper (head s)
                , all isSymbolic (tail s) ]

-- predicate for valid identifier
-- (the underscore counts as a lowercase letter, per the specification)
isVarId :: String -> Bool
isVarId s = and [ not (null s)
                , isLower (head s) || head s == '_'
                , all isSymbolic (tail s)
                , not (isReserved s) ]

-- predicate for reserved words
isReserved :: String -> Bool
isReserved s = elem s [ "case", "class", "data", "default", "deriving", "do"
                      , "else", "foreign", "if", "import", "in", "infix"
                      , "infixl", "infixr", "instance", "let", "module"
                      , "newtype", "of", "then", "type", "where", "_" ]</syntaxhighlight>
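
For comparison, the grammar quoted above can be followed more literally by working from Unicode general categories. The Python sketch below is an illustration under stated assumptions: it maps ''uniSmall'' to category Ll, ''uniLarge'' to Lu or Lt, and ''uniDigit'' to Nd, and the function name <code>classify</code> is made up for this example.

```python
import unicodedata

RESERVED = frozenset({
    "case", "class", "data", "default", "deriving", "do", "else",
    "foreign", "if", "import", "in", "infix", "infixl", "infixr",
    "instance", "let", "module", "newtype", "of", "then", "type",
    "where", "_",
})


def is_small(ch: str) -> bool:
    # small -> ascSmall | uniSmall | _
    return ch == "_" or unicodedata.category(ch) == "Ll"


def is_large(ch: str) -> bool:
    # large -> any uppercase or titlecase letter
    return unicodedata.category(ch) in ("Lu", "Lt")


def is_tail(ch: str) -> bool:
    # {small | large | digit | '}
    return (is_small(ch) or is_large(ch)
            or unicodedata.category(ch) == "Nd" or ch == "'")


def classify(s: str):
    """Return "varid", "conid", or None if s is neither."""
    if not s or not all(is_tail(ch) for ch in s[1:]):
        return None
    if is_small(s[0]):
        return None if s in RESERVED else "varid"
    if is_large(s[0]):
        return "conid"
    return None


if __name__ == "__main__":
    for word in ("name", "naMe", "Name", "_", "case", "x'", "2x"):
        print(repr(word), classify(word))
```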
 
=={{header|J}}==
J is defined in terms of ASCII, but that would not prevent it from being ported to other environments. But we can still use J's parser to determine if a specific character combination is a single, legal word:
 
<syntaxhighlight lang="j"> a.#~1=#@;: ::0:"1 'b',.a.,.'c'
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz</syntaxhighlight>
 
Here, [http://www.jsoftware.com/help/dictionary/dadot.htm a.] is the set of characters we are testing. We prefix each of these with an arbitrary letter, suffix each with an arbitrary character, and then count how many parsed tokens are formed by the result. If the token count is 1, then that character was a legal word-forming character.
=={{header|Java}}==
{{works with|Java|8}}
<syntaxhighlight lang="java">import java.util.function.IntPredicate;
import java.util.stream.IntStream;

System.out.println("...");
}
}</syntaxhighlight>
 
<pre>Java Identifier start: $ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz¢£¤¥ªµºÀÁÂÃÄÅÆÇÈÉÊ...
 
To generate a string of such characters idiomatically:
<syntaxhighlight lang="jq">[range(0;128) | [.] | implode | select(test("[A-Za-z0-9$_]"))] | add</syntaxhighlight>
 
jq 1.5 also allows ":" as a joining character in the form "module::name".
Therefore, assuming the availability in jq of the test/1 builtin, the test
in jq for whether a character can appear literally in a jq identifier or key is:
<syntaxhighlight lang="jq">test("[^\u0000-\u0007F]")</syntaxhighlight>
 
===Symbols===
The following function screens for characters by "\p" class:
<syntaxhighlight lang="jq">def is_character(class):
test( "\\p{" + class + "}" );</syntaxhighlight>
For example, to test whether a character is a Unicode letter, symbol or numeric character:
<syntaxhighlight lang="jq">is_character("L") or is_character("S") or is_character("N")</syntaxhighlight>
 
An efficient way to count the number of Unicode characters within a character class is
to use the technique illustrated by the following function:
<syntaxhighlight lang="jq">def count(class; m; n):
reduce (range(m;n) | [.] | implode | select( test( "\\p{" + class + "}" ))) as $i
(0; . + 1);</syntaxhighlight>
 
For example the number of Unicode "symbol" characters can be obtained by evaluating:
<syntaxhighlight lang="jq">count("S"; 0; 1114112)</syntaxhighlight>
The result is 3958.
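
The counting technique can be cross-checked outside jq. The Python sketch below is only an illustration: it assumes the standard <code>unicodedata</code> module, and the function name <code>count</code> simply mirrors the jq definition. The total for the full range depends on the Unicode version bundled with the interpreter, so it need not match the figure above.

```python
import unicodedata


def count(major_class: str, lo: int, hi: int) -> int:
    """Count code points in [lo, hi) whose general category starts with major_class."""
    return sum(
        1
        for cp in range(lo, hi)
        if unicodedata.category(chr(cp)).startswith(major_class)
    )


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    print("Symbol characters:", count("S", 0, 0x110000))
```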
 
 
For example, x2 is a valid identifier, but 2x is not: it is interpreted as 2 times the identifier x. In Julia, the Symbol() function turns a string into a symbolic token. So, for example:
<syntaxhighlight lang="julia">
for i in 1:0x200000 - 1
Symbol("x" * Char(i))
end
</syntaxhighlight>
 
This loop runs without error for all values below 0x200000, but fails if the range is extended to include 0x200000.
 
A Kotlin label name is a valid identifier followed by an @ symbol and an annotation name is an identifier preceded by an @ symbol.
<syntaxhighlight lang="scala">// version 1.1.4-3

typealias CharPredicate = (Char) -> Boolean

printChars("Kotlin Identifier ignorable: ", 0, 0x10FFFF, 25,
Character::isIdentifierIgnorable, true)
}</syntaxhighlight>
 
{{out}}
Kotlin Identifier ignorable: [0][1][2][3][4][5][6][7][8][14][15][16][17][18][19][20][21][22][23][24][25][26][27][127][128]...
</pre>
 
=={{header|Lua}}==
From the 5.4 reference manual: "Names (also called identifiers) in Lua can be any string of Latin letters, Arabic-Indic digits, and underscores, not beginning with a digit and not being a reserved word."
<syntaxhighlight lang="lua">function isValidIdentifier(id)
local reserved = {
["and"]=true, ["break"]=true, ["do"]=true, ["else"]=true, ["elseif"]=true, ["end"]=true,
["false"]=true, ["for"]=true, ["function"]=true, ["goto"]=true, ["if"]=true, ["in"]=true,
["local"]=true, ["nil"]=true, ["not"]=true, ["or"]=true, ["repeat"]=true, ["return"]=true,
["then"]=true, ["true"]=true, ["until"]=true, ["while"]=true }
return id:find("^[a-zA-Z_][a-zA-Z0-9_]*$") ~= nil and not reserved[id]
end
vfc, vsc = {}, {}
for i = 0, 255 do
local c = string.char(i)
if isValidIdentifier(c) then vfc[#vfc+1]=c end
if isValidIdentifier("_"..c) then vsc[#vsc+1]=c end
end
print("Valid First Characters: " .. table.concat(vfc))
print("Valid Subsequent Characters: " .. table.concat(vsc))</syntaxhighlight>
{{out}}
<pre>Valid First Characters: ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
Valid Subsequent Characters: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz</pre>
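
The rule quoted from the manual (a pattern plus a reserved-word list) carries over directly to other languages. The following Python sketch restates it with the standard <code>re</code> module; the names <code>is_valid_lua_name</code> and <code>valid_chars</code> are hypothetical, and the reserved-word set is copied from the Lua sample above.

```python
import re

# Reserved words as listed in the Lua sample above.
LUA_RESERVED = frozenset({
    "and", "break", "do", "else", "elseif", "end", "false", "for",
    "function", "goto", "if", "in", "local", "nil", "not", "or",
    "repeat", "return", "then", "true", "until", "while",
})

NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_lua_name(name: str) -> bool:
    """Apply the manual's rule: matching pattern and not a reserved word."""
    return NAME_PATTERN.fullmatch(name) is not None and name not in LUA_RESERVED


def valid_chars(prefix: str = "") -> str:
    """Characters 0..255 that yield a valid name when appended to prefix."""
    return "".join(chr(i) for i in range(256) if is_valid_lua_name(prefix + chr(i)))


if __name__ == "__main__":
    print("Valid First Characters:     ", valid_chars())
    print("Valid Subsequent Characters:", valid_chars("_"))
```

Using <code>fullmatch</code> avoids the need for explicit <code>^</code> and <code>$</code> anchors and keeps a trailing newline from slipping through.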
 
=={{header|Mathematica}}/{{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">chars = Characters[FromCharacterCode[Range[0, 1114111]]];
out = Reap[Do[
If[Quiet[Length[Symbol[c]] == 0],
Sow[c]
]
,
{c, chars}
]][[2, 1]];
Print["Possible 1st characters: ", out // Length]
out = Reap[Do[
If[Quiet[Length[Symbol["a" <> c]] == 0],
Sow[c]
]
,
{c, chars}
]][[2, 1]];
Print["Possible 2nd-nth characters: ", out // Length]</syntaxhighlight>
{{out}}
In Wolfram Language almost all characters (1114112 code points were tested) can be used in variable/function names. Since over a million characters are allowed, the program prints the length of the list 'out' rather than the characters themselves:
<pre>Possible 1st characters: 1113704
Possible 2nd-nth characters: 1113726</pre>
 
=={{header|Nim}}==
As regards identifiers, there exists a general rule which describes how they can be formed. For this rule, the following program prints the allowed starting characters and the allowed characters:
 
<syntaxhighlight lang="nim">import sequtils, strutils
 
echo "Allowed starting characters for identifiers:"
echo toSeq(IdentStartChars).join()
echo ""
echo "Allowed characters in identifiers:"
echo toSeq(IdentChars).join()</syntaxhighlight>
 
{{out}}
<pre>Allowed starting characters for identifiers:
ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
 
Allowed characters in identifiers:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz</pre>
 
But Nim is a lot more flexible and allows using Unicode symbols in identifiers provided these are letters and digits. Thus, the following program is valid:
 
<syntaxhighlight lang="nim">var à⁷ = 3
echo à⁷</syntaxhighlight>
 
Using escape character <code>`</code>, it is possible to override the rules and to include any character in an identifier and even to use a keyword as identifier. Here is an example of the possibilities:
 
<syntaxhighlight lang="nim">var `const`= 3
echo `const`
 
proc `<`(a, b: int): bool =
echo a, " ", b
system.`<`(a, b)
 
echo 4 < 7
 
proc `Π`(a: varargs[int]): int =
result = 1
for n in a: result *= n
 
echo Π(4, 5, 7)
 
var `1` = 2
echo `1`
</syntaxhighlight>
 
=={{header|Ol}}==
Absolutely any Unicode or ANSI character can be used as part of a symbol name. There are only a few limitations, which depend on the form of symbol declaration.

1. A direct symbol declaration (in the form of quote or ') must not start with a control code (the first 32 characters), a digit, or @. Subsequent characters in the symbol must be neither control codes nor @.

2. A direct symbol declaration (in the form of ||) must not contain the character |.

3. Functional symbol creation (in the form of string->symbol) has no limitations.
 
=={{header|ooRexx}}==
Although this program does not use any feature that is not in Classic Rexx,
it is included here to show what characters are valid for symbols in ooRexx.
<syntaxhighlight lang="oorexx">/*REXX program determines what characters are valid for REXX symbols.*/
/* copied from REXX version 2 */
Parse Version v

symbol_characters=symbol_characters || c /* add to list. */
end
say 'symbol characters:' symbol_characters /*display all */</syntaxhighlight>
{{out}}
<pre>REXX-ooRexx_4.2.0(MT)_32-bit 6.04 22 Feb 2014
=={{header|PARI/GP}}==
The only symbols that can be used in variable names (including function names as a special case) are a-z, A-Z, 0-9, and the underscore. Additionally, the first character must be a letter. (That is, they must match this regex: <code>[a-zA-Z][a-zA-Z0-9_]*</code>.)
<syntaxhighlight lang="parigp">v=concat(concat([48..57],[65..90]),concat([97..122],95));
apply(Strchr,v)</syntaxhighlight>
{{out}}
<pre>%1 = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "_"]</pre>
 
=={{header|Perl}}==
<syntaxhighlight lang="perl"># When not using the "use utf8" pragma, any word character in the ASCII range is allowed.
# the loop below returns: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
for $i (0..0x7f) {

$c = chr($_);
print $c if $c =~ /\p{Word}/;
}</syntaxhighlight>
 
=={{header|Perl 6}}==
Any Unicode character or combination of characters can be used for symbols in Perl 6. Here's some counting rods and some cuneiform:
<lang perl6>sub postfix:<𒋦>($n) { say "$n trilobites" }
 
sub term:<𝍧> { unival('𝍧') }
 
𝍧𒋦</lang>
{{out}}
<pre>8 trilobites</pre>
 
And here is a Zalgo-text symbol:
 
<lang perl6>sub Z̧̔ͩ͌͑̉̎A̢̲̙̮̹̮͍̎L̔ͧ́͆G̰̬͎͔̱̅ͣͫO͙̔ͣ̈́̈̽̎ͣ ($n) { say "$n COMES" }
 
 
Z̧̔ͩ͌͑̉̎A̢̲̙̮̹̮͍̎L̔ͧ́͆G̰̬͎͔̱̅ͣͫO͙̔ͣ̈́̈̽̎ͣ 'HE'</lang>
{{out}}
<pre>HE COMES</pre>
 
Of course, as in other languages, most of the characters you'll typically see in names are going to be alphanumerics from ASCII (or maybe Unicode), but that's a convention, not a limitation, due to the syntactic category notation demonstrated above, which can introduce any sequence of characters as a term or operator.
 
Actually, the above is a slight prevarication. The syntactic category notation does not allow you to use whitespace in the definition of a new symbol. But that leaves many more characters allowed than not allowed. Hence, it is much easier to enumerate the characters that <em>cannot</em> be used in symbols:
<lang perl6>say .fmt("%4x"),"\t", uniname($_)
if uniprop($_,'Z')
for 0..0x1ffff;</lang>
{{out}}
<pre> 20 SPACE
a0 NO-BREAK SPACE
1680 OGHAM SPACE MARK
2000 EN QUAD
2001 EM QUAD
2002 EN SPACE
2003 EM SPACE
2004 THREE-PER-EM SPACE
2005 FOUR-PER-EM SPACE
2006 SIX-PER-EM SPACE
2007 FIGURE SPACE
2008 PUNCTUATION SPACE
2009 THIN SPACE
200a HAIR SPACE
2028 LINE SEPARATOR
2029 PARAGRAPH SEPARATOR
202f NARROW NO-BREAK SPACE
205f MEDIUM MATHEMATICAL SPACE
3000 IDEOGRAPHIC SPACE</pre>
We enforce the whitespace restriction to prevent insanity in the readers of programs.
That being said, even the whitespace restriction is arbitrary, and can be bypassed by deriving a new grammar and switching to it. We view all other languages as dialects of Perl 6, even the insane ones. <tt>:-)</tt>
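
The list of excluded characters above can be reproduced from the Unicode database in other languages as well. The Python sketch below is an illustration only: it assumes the standard <code>unicodedata</code> module, takes "property Z" to mean a general category beginning with Z, and uses a made-up function name, <code>separators</code>. The exact list depends on the Unicode version in use.

```python
import unicodedata


def separators(lo: int = 0, hi: int = 0x1FFFF):
    """Return (code point, name) pairs whose general category is Zs, Zl or Zp."""
    return [
        (cp, unicodedata.name(chr(cp), "<unnamed>"))
        for cp in range(lo, hi + 1)
        if unicodedata.category(chr(cp)).startswith("Z")
    ]


if __name__ == "__main__":
    for cp, name in separators():
        print(f"{cp:4x}\t{name}")
```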
 
=={{header|Phix}}==
Translation of AWK, extended with separation of ansi and utf8 handling
<!--<syntaxhighlight lang="phix">(notonline)-->
<lang Phix>without js -- file i/o, system_exec, \t and \r chars
function run(string ident)
    integer fn = open("test.exw","w")
    printf(fn,"object %s",ident)
    close(fn)
    return system_exec("p -batch test.exw")
end function

function check(integer lo, hi)
    string ok1 = "", ok2 = ""
    integer ng1 = 0, ng2 = 0
    for ch=lo to hi do
        printf(1,"%d/%d...\r",{ch,hi})
        if find(ch,"\t\r\n \0\x1A;") then
            ng1 += 1
            ng2 += 1
        else
            string c = sprintf("%c",ch)
            if run(c)==0 then ok1 &= c else ng1 += 1 end if
            if run("_"&c)==0 then ok2 &= c else ng2 += 1 end if
        end if
    end for
    return {{ng1,length(ok1),ok1},
            {ng2,length(ok2),ok2}}
end function

sequence r = check(0,127)
printf(1,"ansi characters:\n===============\n")
printf(1,"1st character: %d no good, %d OK %s\n",r[1])
printf(1,"2nd..nth char: %d no good, %d OK %s\n\n",r[2])
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ansi characters:\n===============\n"</span><span style="color: #0000FF;">)</span>
r = check(128,255)
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"1st character: %d bad, %d OK %s\n"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">r</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">])</span>
integer ok8 = 0, ng8 = 0
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"2nd..nth char: %d bad, %d OK %s\n\n"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">r</span><span style="color: #0000FF;">[</span><span style="color: #000000;">2</span><span style="color: #0000FF;">])</span>
sequence good = ""
<span style="color: #000000;">r</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">check</span><span style="color: #0000FF;">(</span><span style="color: #000000;">128</span><span style="color: #0000FF;">,</span><span style="color: #000000;">255</span><span style="color: #0000FF;">)</span>
for i=#80 to #10FFFF do
<span style="color: #004080;">integer</span> <span style="color: #000000;">ok8</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">ng8</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span>
if i<#D800 or i>#DFFF then
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">#80</span> <span style="color: #008080;">to</span> <span style="color: #000000;">#10FFFF</span> <span style="color: #008080;">do</span>
printf(1,"#%x/#10FFFF...\r",i)
<span style="color: #008080;">if</span> <span style="color: #000000;">i</span><span style="color: #0000FF;"><</span><span style="color: #000000;">#D800</span> <span style="color: #008080;">or</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">></span><span style="color: #000000;">#DFFF</span> <span style="color: #008080;">then</span>
string utf8 = utf32_to_utf8({i})
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"#%x/#10FFFF...\r"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">i</span><span style="color: #0000FF;">)</span>
bool ok = true
<span style="color: #004080;">string</span> <span style="color: #000000;">utf8</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">utf32_to_utf8</span><span style="color: #0000FF;">({</span><span style="color: #000000;">i</span><span style="color: #0000FF;">})</span>
if not find(utf8[1],r[1][3]) then
<span style="color: #004080;">bool</span> <span style="color: #000000;">ok</span> <span style="color: #0000FF;">=</span> <span style="color: #004600;">true</span>
ok = false
<span style="color: #008080;">if</span> <span style="color: #008080;">not</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">utf8</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">],</span><span style="color: #000000;">r</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">][</span><span style="color: #000000;">3</span><span style="color: #0000FF;">])</span> <span style="color: #008080;">then</span>
else
<span style="color: #000000;">ok</span> <span style="color: #0000FF;">=</span> <span style="color: #004600;">false</span>
for j=2 to length(utf8) do
<span style="color: #008080;">else</span>
if not find(utf8[j],r[2][3]) then
<span style="color: #008080;">for</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">=</span><span style="color: #000000;">2</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">utf8</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
ok = false
<span style="color: #008080;">if</span> <span style="color: #008080;">not</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">utf8</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">],</span><span style="color: #000000;">r</span><span style="color: #0000FF;">[</span><span style="color: #000000;">2</span><span style="color: #0000FF;">][</span><span style="color: #000000;">3</span><span style="color: #0000FF;">])</span> <span style="color: #008080;">then</span>
exit
<span style="color: #000000;">ok</span> <span style="color: #0000FF;">=</span> <span style="color: #004600;">false</span>
end if
<span style="color: #008080;">exit</span>
end for
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
end if
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
if ok then
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
ok8 += 1
<span style="color: #008080;">if</span> <span style="color: #000000;">ok</span> <span style="color: #008080;">then</span>
good &= utf8&", "
<span style="color: #000000;">ok8</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span>
else
<span ng8 +style="color: 1#008080;">else</span>
<span style="color: #000000;">ng8</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span>
end if
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
end if
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
end for
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
printf(1,"utf8 characters: \n===============\n")
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"utf8 characters: \n===============\n"</span><span style="color: #0000FF;">)</span>
printf(1,"good:%d, bad:%d\n",{ok8,ng8})
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"bad:%,d, good:%,d\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">ng8</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ok8</span><span style="color: #0000FF;">})</span>
if platform()=LINUX then
<!--</syntaxhighlight>-->
-- (comes out gibberish on a windows console...)
printf(1,"%s\n",{good})
end if</lang>
{{out}}
<pre>
ansi characters:
===============
1st character: 75 bad, 53 OK ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
2nd..nth char: 65 bad, 63 OK 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz

utf8 characters:
===============
bad:0, good:1,111,936
</pre>
Note that versions prior to 0.8.1 only permit a mere 48 utf8 characters, running the same code on 0.7.9 gave me
<pre>
utf8 characters:
===============
bad:1,111,888, good:48
</pre>
Note that ptok.e (part of the compiler) currently contains the following:
<syntaxhighlight lang="phix">charset[#80] = LETTER -- more unicode
charset[#88] = LETTER -- more unicode
charset[#94] = LETTER -- for rosettacode/unicode (as ptok.e is not stored in utf8)
charset[#9A] = LETTER -- for rosettacode/unicode
charset[#A3] = LETTER -- for rosettacode/unicode
charset[#BB] = LETTER -- for rosettacode/unicode
charset[#CE] = LETTER -- for rosettacode/unicode
charset[#CF] = LETTER
charset[#E2] = LETTER</syntaxhighlight>
If that is extended (with more utf-8 handling) then obviously the output will change.<br>
I am a little surprised at just how few ad-hoc utf8 characters have been supported so far.
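
The UTF-8 pass in the Phix entry accepts a code point when its first UTF-8 byte is a legal first character and every following byte is a legal subsequent character. The byte-wise test can be sketched in Python (not Phix) as below. The names <code>utf8_accepted</code> and <code>count_utf8</code> are made up for this illustration, and the byte sets passed in are a hypothetical stand-in for the sets the Phix code derives by actually running the compiler.

```python
def utf8_accepted(cp, first_ok, rest_ok):
    """True if each UTF-8 byte of code point cp is allowed in its position."""
    data = chr(cp).encode("utf-8")
    return data[0] in first_ok and all(b in rest_ok for b in data[1:])


def count_utf8(first_ok, rest_ok):
    """Return (bad, good) over U+0080..U+10FFFF, skipping the surrogates."""
    good = bad = 0
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no UTF-8 encoding
        if utf8_accepted(cp, first_ok, rest_ok):
            good += 1
        else:
            bad += 1
    return bad, good


if __name__ == "__main__":
    # Assumption for illustration only: every byte 0x80..0xFF is legal
    # both as a first and as a subsequent identifier character.
    high_bytes = set(range(0x80, 0x100))
    bad, good = count_utf8(high_bytes, high_bytes)
    print(f"bad:{bad:,}, good:{good:,}")
```

Under that assumption every non-surrogate code point from U+0080 upwards passes, which gives 1,114,112 - 128 - 2,048 = 1,111,936 accepted code points.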
 
=={{header|Python}}==
See [[Idiomatically_determine_all_the_lowercase_and_uppercase_letters#Python|String class isidentifier]].
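
A minimal sketch of the <code>isidentifier</code> approach, assuming Python 3 and scanning only the first 256 code points to keep the output short. <code>str.isidentifier</code> is the standard string method; the helper names <code>identifier_start_chars</code> and <code>identifier_part_chars</code> are hypothetical.

```python
def identifier_start_chars(limit=256):
    """Characters below limit that can start a Python identifier."""
    return "".join(chr(cp) for cp in range(limit) if chr(cp).isidentifier())


def identifier_part_chars(limit=256):
    """Characters below limit that can appear after the first character."""
    # Prefixing "_" turns the check into "is this legal in a later position?"
    return "".join(chr(cp) for cp in range(limit)
                   if ("_" + chr(cp)).isidentifier())


if __name__ == "__main__":
    print("start:", identifier_start_chars())
    print("part: ", identifier_part_chars())
```

Raising <code>limit</code> to <code>0x110000</code> would cover all of Unicode, at the cost of a much longer listing.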
 
=={{header|Quackery}}==
 
<syntaxhighlight lang="quackery">[ $ "0123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrS"
$ QsTtUuVvWwXxYyZz()[]{}<>~=+-*/^\|_.,:;?!'"`%@&#$Q
join ] constant is tokenchars ( --> $ )
( The first non-whitespace character after the word $
(pronounced "string") is deemed to be the delimiter
for the string that follows it. In the first string
the conventional character " is used, so cannot
appear as a character in that string. In the second
string all the reasonable delimiters are used, so Q
is used as the delimiter.
As it is not possible to make a string that uses all
the characters, two strings are concatenated (join)
to make the string during compilation. (Which is why
$ "0...S" $ Qs...$Q join is nested (inside [ ... ])
and followed by the word constant, which causes the
nest to be evaluated during compilation.)
Regardless of operating system, Quackery only knows
the characters in the string tokenchars, plus space
and carriage return.
The characters in tokenchars are in QACSFOT order
(the Quackery Arbitrary Character Sequence For
Ordered Text) which it uses for string comparison,
but the valid tokens (which is all of them) will
be printed by alltokens in the order native to the
operating system. (In this instance, Unicode.) )
 
[ tokenchars find
tokenchars found ] is validtoken ( c --> b )
 
[ 256 times
[ i^ validtoken if [ i^ emit ] ] ] is alltokens ( --> )
alltokens</syntaxhighlight>
 
'''Output:'''
 
<pre>!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~</pre>
 
=={{header|Racket}}==
That's too much to be printing out here... call <code>(main)</code> yourself, at home.
 
<syntaxhighlight lang="racket">#lang racket
;; Symbols that don't need to be specially quoted:
(printf "~s~%" '(a a-z 3rd ...---... .hidden-files-look-like-this))
(when (zero? (modulo i 80)) (newline))
(display (list->string (list c)))))
</syntaxhighlight>
 
{{out}}
(3 i have a space i've got a quote in me i'm not a "dot on my own", but my neighbour is! . λ my characters aren't even mapped in unicode 􎑃)</pre>
The output of <code>(main)</code> is massive, and probably not dissimilar to Tcl's (anyone want to compare?)
 
=={{header|Raku}}==
(formerly Perl 6)
Any Unicode character or combination of characters can be used for symbols in Raku. Here's some counting rods and some cuneiform:
<syntaxhighlight lang="raku" line>sub postfix:<𒋦>($n) { say "$n trilobites" }
 
sub term:<𝍧> { unival('𝍧') }
 
𝍧𒋦</syntaxhighlight>
{{out}}
<pre>8 trilobites</pre>
 
And here is a Zalgo-text symbol:
 
<syntaxhighlight lang="raku" line>sub Z̧̔ͩ͌͑̉̎A̢̲̙̮̹̮͍̎L̔ͧ́͆G̰̬͎͔̱̅ͣͫO͙̔ͣ̈́̈̽̎ͣ ($n) { say "$n COMES" }
 
 
Z̧̔ͩ͌͑̉̎A̢̲̙̮̹̮͍̎L̔ͧ́͆G̰̬͎͔̱̅ͣͫO͙̔ͣ̈́̈̽̎ͣ 'HE'</syntaxhighlight>
{{out}}
<pre>HE COMES</pre>
 
Of course, as in other languages, most of the characters you'll typically see in names are going to be alphanumerics from ASCII (or maybe Unicode), but that's a convention, not a limitation, due to the syntactic category notation demonstrated above, which can introduce any sequence of characters as a term or operator.
 
Actually, the above is a slight prevarication. The syntactic category notation does not allow you to use whitespace in the definition of a new symbol. But that leaves many more characters allowed than not allowed. Hence, it is much easier to enumerate the characters that <em>cannot</em> be used in symbols:
<syntaxhighlight lang="raku" line>say .fmt("%4x"),"\t", uniname($_)
if uniprop($_,'Z')
for 0..0x1ffff;</syntaxhighlight>
{{out}}
<pre> 20 SPACE
a0 NO-BREAK SPACE
1680 OGHAM SPACE MARK
2000 EN QUAD
2001 EM QUAD
2002 EN SPACE
2003 EM SPACE
2004 THREE-PER-EM SPACE
2005 FOUR-PER-EM SPACE
2006 SIX-PER-EM SPACE
2007 FIGURE SPACE
2008 PUNCTUATION SPACE
2009 THIN SPACE
200a HAIR SPACE
2028 LINE SEPARATOR
2029 PARAGRAPH SEPARATOR
202f NARROW NO-BREAK SPACE
205f MEDIUM MATHEMATICAL SPACE
3000 IDEOGRAPHIC SPACE</pre>
We enforce the whitespace restriction to prevent insanity in the readers of programs.
That being said, even the whitespace restriction is arbitrary, and can be bypassed by deriving a new grammar and switching to it. We view all other languages as dialects of Raku, even the insane ones. <tt>:-)</tt>
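
For comparison, the same enumeration can be sketched in Python rather than Raku. This assumes that the <code>uniprop($_,'Z')</code> test above corresponds to the Unicode general categories Zs, Zl and Zp. The function name <code>separator_code_points</code> is made up for this illustration, and the exact list depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def separator_code_points(limit=0x20000):
    """Code points below limit whose general category starts with Z."""
    return [cp for cp in range(limit)
            if unicodedata.category(chr(cp)).startswith("Z")]


if __name__ == "__main__":
    # Same layout as the Raku listing: hex code point, tab, character name.
    for cp in separator_code_points():
        print(f"{cp:4x}\t{unicodedata.name(chr(cp))}")
```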
 
=={{header|REXX}}==
===version 1===
<syntaxhighlight lang="rexx">/*REXX program determines what characters are valid for REXX symbols. */
@= /*set symbol characters " " */
do j=0 for 2**8 /*traipse through all the chars. */
 
say ' symbol characters: ' @ /*display all symbol characters.*/
/*stick a fork in it, we're done.*/</syntaxhighlight>
Programming note: &nbsp; REXX allows any symbol to begin a (statement) label, but variables can't begin with a period ('''.''') or a numeric digit.
<br><br>All examples below were executed on an (ASCII) PC using Windows/XP and Windows/7 with code page 437 in a DOS window.
</pre>
I've added version 2 which should work correctly for all Rexx interpreters and compilers
<syntaxhighlight lang="rexx">/*REXX program determines what characters are valid for REXX symbols.*/
/* version 1 adapted for general acceptance */
Parse Version v
end
say 'symbol characters:' symbol_characters /*display all */
</syntaxhighlight>
{{out}} for some interpreters
Note that $#@ are not valid symbol characters for ooRexx.
REXX-Regina_3.8.2(MT) 5.00 22 Jun 2014
symbol characters: !#$.0123456789?@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
</pre>
 
=={{header|RPL}}==
The RPL character set is an 8-bit character set, sometimes referred to simply as "ECMA-94" in documentation, although it is for the most part a variant of ISO/IEC 8859-1 / ECMA-94. See the related [https://en.wikipedia.org/wiki/RPL_character_set Wikipedia entry] for more details.
≪ "" "'A '"
1 255 '''FOR''' c
3 c CHR REPL
'''IFERR''' DUP STR→ '''THEN''' DROP
'''ELSE'''
'''IF''' 'A' SAME NOT '''THEN''' SWAP c CHR + SWAP '''END'''
'''END'''
'''NEXT''' DROP
≫ '<span style="color:blue">SYMBOLS</span>' STO
{{out}}
<pre>
1: "!$%&.0123456789?ABCDEFGHIJKLMNOPQRSTUVWXYZ\abcdefghijklmnopqrstuvwxyz~∇∑▶πα→←↓↑γδεηθλρστωΔΠΩ▬∞ ¡¢£¤¥¦§¨©ª¬­®¯°±²³´µ¶·¸¹º¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"
</pre>
 
=={{header|Scala}}==
{{Out}}Best seen running in your browser either by [https://scalafiddle.io/sf/ZyPkGW8/0 ScalaFiddle (ES aka JavaScript, non JVM)] or [https://scastie.scala-lang.org/4XdxscWGTtyw9MDQXCtRdg Scastie (remote JVM)].
<syntaxhighlight lang="scala">object IdiomaticallyDetermineSymbols extends App {
 
private def print(msg: String, limit: Int, p: Int => Boolean, fmt: String) =
print("Unicode Identifier part : ", 25, cp => Character.isUnicodeIdentifierPart(cp), "[%d]")
 
}</syntaxhighlight>
 
=={{header|Tcl}}==
Tcl permits ''any'' character to be used in a variable or command name (subject to the restriction that <code>::</code> is a namespace separator and, for variables only, a <code>(…)</code> sequence is an array reference). The set of characters that can be used after <code>$</code> is more restricted, excluding many non-letter-like symbols, but still large. It is ''recommended practice'' to only use ASCII characters for variable names as this makes scripts more resistant to the majority of encoding problems when transporting them between systems, but the language does not itself impose such a restriction.
<syntaxhighlight lang="tcl">for {set c 0;set printed 0;set special {}} {$c <= 0xffff} {incr c} {
set ch [format "%c" $c]
set v "_${ch}_"
puts "All Unicode characters legal in names"
}
puts "Characters legal after \$: $special"</syntaxhighlight>
{{out}}
Only the first 256 characters are displayed:
<pre>All Unicode characters legal in names
Characters legal after $: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ İ ı IJ ij Ĵ ĵ Ķ ķ ĸ Ĺ Ł ł Ń ń Ņ ņ Ň ň ʼn Ŋ ŋ Ō ō Ŏ ŏ Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ş š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź ƪ Ƶ ƺ ǀ ǁ ǂ ǃ DŽ Dž dž LJ Lj lj NJ Nj nj Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ Ǡ ǡ Ǣ ǣ Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ ǰ DZ Dz dz Ǵ ǵ Ƕ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ ...</pre>
 
=={{header|Wren}}==
In Wren, identifiers consist of upper and lower case letters, digits or underscores, and must begin with either a letter or an underscore. Only ASCII letters and digits can be used, though Wren supports Unicode for other purposes.
 
Identifiers which begin with underscores can only be used as instance field names (one underscore) or static field names (two or more underscores).
<syntaxhighlight lang="wren">for (i in 97..122) System.write(String.fromByte(i))
for (i in 65..90) System.write(String.fromByte(i))
System.print("_")</syntaxhighlight>
 
{{out}}
<pre>
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_
</pre>
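
The rule stated above can also be written as a regular expression. The following is a Python sketch rather than Wren, and <code>is_wren_identifier</code> is a hypothetical helper name. It encodes only the prose rule given in this entry, not the Wren compiler's actual lexer, and it does not exclude reserved words.

```python
import re

# ASCII letter or underscore first, then ASCII letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_wren_identifier(name):
    """True if name matches the identifier rule described in the text."""
    return IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["count", "_field", "__static", "9lives", "naïve"]:
        print(candidate, is_wren_identifier(candidate))
```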
 
=={{header|XPL0}}==
Paraphrasing code from the compiler's parser:
<syntaxhighlight lang="xpl0">char C, C1;
[Text(0, "First character set: ");
for C:= 0 to 255 do
if C>=^A then if C<=^Z ! C=^_ then
ChOut(0, C);
CrLf(0);
Text(0, "Next characters set: ");
for C:= 0 to 255 do
[if C>=^a & C<=^z then C1:= C & $DF \to uppercase
else C1:= C;
case of
C1>=^A & C1<=^Z, C1>=^0 & C1<=^9, C1=^_ :
ChOut(0, C)
other [];
];
CrLf(0);
]</syntaxhighlight>
 
{{out}}
<pre>
First character set: ABCDEFGHIJKLMNOPQRSTUVWXYZ_
Next characters set: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
</pre>
 
=={{header|zkl}}==
zkl only supports ASCII, although other character sets might be finessed.
<syntaxhighlight lang="zkl">[0..255].filter(fcn(n){
try{ Compiler.Compiler.compileText("var "+n.text) }
catch{ False }
}).apply("text").concat()</syntaxhighlight>
{{out}}
<pre>