Idiomatically determine all the characters that can be used for symbols: Difference between revisions

Wherrera (talk | contribs)
Petelomax (talk | contribs)
We enforce the whitespace restriction to prevent insanity in the readers of programs.
That being said, even the whitespace restriction is arbitrary, and can be bypassed by deriving a new grammar and switching to it. We view all other languages as dialects of Perl 6, even the insane ones. <tt>:-)</tt>

=={{header|Phix}}==
{{trans|AWK}}
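<lang Phix>-- Write a one-line program declaring ident as a variable, then compile and run it.
-- A zero exit code means the compiler accepted ident as an identifier.
function run(string ident)
    integer fn = open("test.exw","w")
    printf(fn,"object %s",ident)
    close(fn)
    return system_exec("p -batch test.exw")
end function

string ok1 = "", ok2 = ""   -- characters accepted in 1st / 2nd..nth position
integer ng1 = 0, ng2 = 0    -- characters rejected in 1st / 2nd..nth position
for ch=0 to 255 do
    printf(1,"checking %d/255...\r",ch)
    if find(ch,"\t\r\n ") then
        -- whitespace can never be part of an identifier
        ng1 += 1
        ng2 += 1
    else
        string c = sprintf("%c",ch)
        if run(c)==0 then ok1 &= c else ng1 += 1 end if
        -- prefix with "_" so that c is tested as a non-leading character
        if run("_"&c)==0 then ok2 &= c else ng2 += 1 end if
    end if
end for
printf(1,"1st character: %d no good, %d OK %s\n",{ng1,length(ok1),ok1})
printf(1,"2nd..nth char: %d no good, %d OK %s\n",{ng2,length(ok2),ok2})</lang>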
{{out}}
<pre>
1st character: 194 no good, 62 OK ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzÇêöÜú╗╬¤Ô
2nd..nth char: 181 no good, 75 OK �0123456789;ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzÇêöÜú╗╬¤Ô
</pre>
Note that ptok.e (part of the compiler) currently contains the following:
<lang Phix>charset[#80] = LETTER -- more unicode
charset[#88] = LETTER -- more unicode
charset[#94] = LETTER -- for rosettacode/unicode (as ptok.e is not stored in utf8)
charset[#9A] = LETTER -- for rosettacode/unicode
charset[#A3] = LETTER -- for rosettacode/unicode
charset[#BB] = LETTER -- for rosettacode/unicode
charset[#CE] = LETTER -- for rosettacode/unicode
charset[#CF] = LETTER
charset[#E2] = LETTER</lang>
If that table is extended (with more UTF-8 handling), the output above will change accordingly.
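
As a rough cross-check, the count of valid first characters can be tallied without invoking the compiler. The following is only a sketch: the constant <tt>extra</tt> copies the nine byte values from the ptok.e excerpt above, and it assumes (as the output suggests, but the excerpt does not show) that the ASCII letters and the underscore are the only other characters marked as LETTER.
<lang Phix>-- Sketch only: "extra" and "nfirst" are illustrative names, not part of ptok.e.
constant extra = {#80,#88,#94,#9A,#A3,#BB,#CE,#CF,#E2}
integer nfirst = 0
for ch=0 to 255 do
    if (ch>='A' and ch<='Z')
    or (ch>='a' and ch<='z')
    or ch='_'
    or find(ch,extra) then
        nfirst += 1
    end if
end for
printf(1,"expected valid first characters: %d\n",nfirst)</lang>
Under those assumptions the tally is 26 + 26 + 1 + 9 = 62, which agrees with the "62 OK" reported for the first character above.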


=={{header|Python}}==