Idiomatically determine all the characters that can be used for symbols: Difference between revisions
We enforce the whitespace restriction to prevent insanity in the readers of programs.

That being said, even the whitespace restriction is arbitrary, and can be bypassed by deriving a new grammar and switching to it. We view all other languages as dialects of Perl 6, even the insane ones. <tt>:-)</tt>

=={{header|Phix}}==
{{trans|AWK}}
<lang Phix>-- Write a one-line program declaring a variable named ident, then compile it.
-- A zero exit code is taken to mean the identifier was accepted.
function run(string ident)
    integer fn = open("test.exw","w")
    printf(fn,"object %s",ident)
    close(fn)
    return system_exec("p -batch test.exw")
end function

string ok1 = "", ok2 = ""   -- characters accepted in 1st / later positions
integer ng1 = 0, ng2 = 0    -- counts of rejected characters
for ch=0 to 255 do
    printf(1,"checking %d/255...\r",ch)
    if find(ch,"\t\r\n ") then
        -- whitespace can never be part of an identifier, so skip the test
        ng1 += 1
        ng2 += 1
    else
        string c = sprintf("%c",ch)
        -- test c as the first character, then as a subsequent one (after "_")
        if run(c)==0 then ok1 &= c else ng1 += 1 end if
        if run("_"&c)==0 then ok2 &= c else ng2 += 1 end if
    end if
end for
printf(1,"1st character: %d no good, %d OK %s\n",{ng1,length(ok1),ok1})
printf(1,"2nd..nth char: %d no good, %d OK %s\n",{ng2,length(ok2),ok2})</lang>
{{out}}
<pre>
1st character: 194 no good, 62 OK ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzÇêöÜú╗╬¤Ô
2nd..nth char: 181 no good, 75 OK �0123456789;ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzÇêöÜú╗╬¤Ô
</pre>
Note that ptok.e (part of the compiler) currently contains the following:
<lang Phix>charset[#80] = LETTER -- more unicode
charset[#88] = LETTER -- more unicode
charset[#94] = LETTER -- for rosettacode/unicode (as ptok.e is not stored in utf8)
charset[#9A] = LETTER -- for rosettacode/unicode
charset[#A3] = LETTER -- for rosettacode/unicode
charset[#BB] = LETTER -- for rosettacode/unicode
charset[#CE] = LETTER -- for rosettacode/unicode
charset[#CF] = LETTER
charset[#E2] = LETTER</lang>
If that list is extended (with more UTF-8 handling), the output above will change accordingly.
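
The nine non-ASCII characters at the end of each output line appear to be these nine bytes as displayed by the console. The following Python sketch illustrates that reading. It assumes the console used code page 850, which the entry does not state; the code page is inferred only because it reproduces the characters shown. The names <code>letter_bytes</code> and <code>render</code> are illustrative and are not part of Phix.

```python
# Byte values flagged as LETTER in the ptok.e excerpt above.
letter_bytes = [0x80, 0x88, 0x94, 0x9A, 0xA3, 0xBB, 0xCE, 0xCF, 0xE2]


def render(byte_values, codepage="cp850"):
    """Decode raw bytes the way a console using `codepage` would show them.

    The cp850 default is an assumption inferred from the output above.
    """
    return bytes(byte_values).decode(codepage)


if __name__ == "__main__":
    # The extra characters that follow "xyz" in the output.
    print(render(letter_bytes))
    # 26 upper case + 26 lower case + underscore + the extra bytes
    # gives the "62 OK" reported for the first character.
    print(26 + 26 + 1 + len(letter_bytes))
```

Under that assumption the decoded string matches the tail of the output, and the count matches the 62 reported for the first character.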

=={{header|Python}}==