Idiomatically determine all the characters that can be used for symbols: Difference between revisions

Rename Perl 6 -> Raku, alphabetize, minor clean-up
(→‎{{header|Go}}: Better intro, interesting code to the top)
(Rename Perl 6 -> Raku, alphabetize, minor clean-up)
Line 44:
2nd..nth char: 193 NG, 63 OK 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
</pre>
 
=={{header|F_Sharp|F#}}==
Well, if the purpose of this task is to determine what can be used as an identifier then in F# anything so long as you enclose it in double backticks so:
Line 367 ⟶ 368:
print $c if $c =~ /\p{Word}/;
}</lang>
 
=={{header|Perl 6}}==
Any Unicode character or combination of characters can be used for symbols in Perl 6. Here's some counting rods and some cuneiform:
<lang perl6>sub postfix:<𒋦>($n) { say "$n trilobites" }
 
sub term:<𝍧> { unival('𝍧') }
 
𝍧𒋦</lang>
{{out}}
<pre>8 trilobites</pre>
 
And here is a Zalgo-text symbol:
 
<lang perl6>sub Z̧̔ͩ͌͑̉̎A̢̲̙̮̹̮͍̎L̔ͧ́͆G̰̬͎͔̱̅ͣͫO͙̔ͣ̈́̈̽̎ͣ ($n) { say "$n COMES" }
 
 
Z̧̔ͩ͌͑̉̎A̢̲̙̮̹̮͍̎L̔ͧ́͆G̰̬͎͔̱̅ͣͫO͙̔ͣ̈́̈̽̎ͣ 'HE'</lang>
{{out}}
<pre>HE COMES</pre>
 
Of course, as in other languages, most of the characters you'll typically see in names are going to be alphanumerics from ASCII (or maybe Unicode), but that's a convention, not a limitation, due to the syntactic category notation demonstrated above, which can introduce any sequence of characters as a term or operator.
 
Actually, the above is a slight prevarication. The syntactic category notation does not allow you to use whitespace in the definition of a new symbol. But that leaves many more characters allowed than not allowed. Hence, it is much easier to enumerate the characters that <em>cannot</em> be used in symbols:
<lang perl6>say .fmt("%4x"),"\t", uniname($_)
if uniprop($_,'Z')
for 0..0x1ffff;</lang>
{{out}}
<pre> 20 SPACE
a0 NO-BREAK SPACE
1680 OGHAM SPACE MARK
2000 EN QUAD
2001 EM QUAD
2002 EN SPACE
2003 EM SPACE
2004 THREE-PER-EM SPACE
2005 FOUR-PER-EM SPACE
2006 SIX-PER-EM SPACE
2007 FIGURE SPACE
2008 PUNCTUATION SPACE
2009 THIN SPACE
200a HAIR SPACE
2028 LINE SEPARATOR
2029 PARAGRAPH SEPARATOR
202f NARROW NO-BREAK SPACE
205f MEDIUM MATHEMATICAL SPACE
3000 IDEOGRAPHIC SPACE</pre>
We enforce the whitespace restriction to prevent insanity in the readers of programs.
That being said, even the whitespace restriction is arbitrary, and can be bypassed by deriving a new grammar and switching to it. We view all other languages as dialects of Perl 6, even the insane ones. <tt>:-)</tt>
 
=={{header|Phix}}==
Line 542 ⟶ 495:
(3 i have a space i've got a quote in me i'm not a "dot on my own", but my neighbour is! . λ my characters aren't even mapped in unicode 􎑃)</pre>
The output to <code>(main)</code> is massive, and probably not dissimilar to Tcl's (anyone want to compare?)
 
=={{header|Perl 6Raku}}==
(formerly Perl 6)
Any Unicode character or combination of characters can be used for symbols in Perl 6. Here's some counting rods and some cuneiform:
<lang perl6>sub postfix:<𒋦>($n) { say "$n trilobites" }
 
sub term:<𝍧> { unival('𝍧') }
 
𝍧𒋦</lang>
{{out}}
<pre>8 trilobites</pre>
 
And here is a Zalgo-text symbol:
 
<lang perl6>sub Z̧̔ͩ͌͑̉̎A̢̲̙̮̹̮͍̎L̔ͧ́͆G̰̬͎͔̱̅ͣͫO͙̔ͣ̈́̈̽̎ͣ ($n) { say "$n COMES" }
 
 
Z̧̔ͩ͌͑̉̎A̢̲̙̮̹̮͍̎L̔ͧ́͆G̰̬͎͔̱̅ͣͫO͙̔ͣ̈́̈̽̎ͣ 'HE'</lang>
{{out}}
<pre>HE COMES</pre>
 
Of course, as in other languages, most of the characters you'll typically see in names are going to be alphanumerics from ASCII (or maybe Unicode), but that's a convention, not a limitation, due to the syntactic category notation demonstrated above, which can introduce any sequence of characters as a term or operator.
 
Actually, the above is a slight prevarication. The syntactic category notation does not allow you to use whitespace in the definition of a new symbol. But that leaves many more characters allowed than not allowed. Hence, it is much easier to enumerate the characters that <em>cannot</em> be used in symbols:
<lang perl6>say .fmt("%4x"),"\t", uniname($_)
if uniprop($_,'Z')
for 0..0x1ffff;</lang>
{{out}}
<pre> 20 SPACE
a0 NO-BREAK SPACE
1680 OGHAM SPACE MARK
2000 EN QUAD
2001 EM QUAD
2002 EN SPACE
2003 EM SPACE
2004 THREE-PER-EM SPACE
2005 FOUR-PER-EM SPACE
2006 SIX-PER-EM SPACE
2007 FIGURE SPACE
2008 PUNCTUATION SPACE
2009 THIN SPACE
200a HAIR SPACE
2028 LINE SEPARATOR
2029 PARAGRAPH SEPARATOR
202f NARROW NO-BREAK SPACE
205f MEDIUM MATHEMATICAL SPACE
3000 IDEOGRAPHIC SPACE</pre>
We enforce the whitespace restriction to prevent insanity in the readers of programs.
That being said, even the whitespace restriction is arbitrary, and can be bypassed by deriving a new grammar and switching to it. We view all other languages as dialects of Perl 6, even the insane ones. <tt>:-)</tt>
 
=={{header|REXX}}==
Line 620 ⟶ 622:
 
}</lang>
 
=={{header|Tcl}}==
Tcl permits ''any'' character to be used in a variable or command name (subject to the restriction that <code>::</code> is a namespace separator and, for variables only, a <code>(…)</code> sequence is an array reference). The set of characters that can be used after <code>$</code> is more restricted, excluding many non-letter-like symbols, but still large. It is ''recommended practice'' to only use ASCII characters for variable names as this makes scripts more resistant to the majority of encoding problems when transporting them between systems, but the language does not itself impose such a restriction.
10,327

edits