Idiomatically determine all the characters that can be used for symbols: Difference between revisions

end function
 
function check(integer lo, hi)
    string ok1 = "", ok2 = ""
    integer ng1 = 0, ng2 = 0
    for ch=lo to hi do
        printf(1,"%d/%d...\r",{ch,hi})
        -- whitespace, NUL, Ctrl-Z and ';' are counted as no good without running a test
        if find(ch,"\t\r\n \0\x1A;") then
            ng1 += 1
            ng2 += 1
        else
            string c = sprintf("%c",ch)
            -- c as the first character of an identifier
            if run(c)==0 then ok1 &= c else ng1 += 1 end if
            -- c as a subsequent character (after a leading '_')
            if run("_"&c)==0 then ok2 &= c else ng2 += 1 end if
        end if
    end for
    return {{ng1,length(ok1),ok1},
            {ng2,length(ok2),ok2}}
end function
sequence r = check(0,127)
printf(1,"ansi characters:\n===============\n")
printf(1,"1st character: %d NG, %d OK %s\n",r[1])
printf(1,"2nd..nth char: %d NG, %d OK %s\n\n",r[2])
r = check(128,255)
integer ok8 = 0, ng8 = 0
sequence good = ""
for i=#80 to #10FFFF do
    if i<#D800 or i>#DFFF then -- skip the surrogate range
        printf(1,"checking #%x/#10FFFF...\r",i)
        string utf8 = utf32_to_utf8({i})
        bool ok = true
        -- the first byte must be valid as a 1st character...
        if not find(utf8[1],r[1][3]) then
            ok = false
        else
            -- ...and every following byte as a 2nd..nth character
            for j=2 to length(utf8) do
                if not find(utf8[j],r[2][3]) then
                    ok = false
                    exit
                end if
            end for
        end if
        if ok then
            ok8 += 1
            good &= utf8&", "
        else
            ng8 += 1
        end if
    end if
end for
printf(1,"utf8 characters:\n===============\n")
printf(1,"good:%d, bad:%d\n",{ok8,ng8})
if platform()=LINUX then
    -- (comes out gibberish on a windows console...)
    printf(1,"%s\n",{good})
end if</lang>
{{out}}
<pre>
ansi characters:
===============
1st character: 75 NG, 53 OK ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
2nd..nth char: 65 NG, 63 OK 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz

utf8 characters:
===============
good:48, bad:1111888
΀, Έ, Δ, Κ, Σ, λ, π, ψ, ϔ, Ϛ, ϣ, ϻ,  ,  , —, ‚, ‣, ※, ∀, ∈, ∔, √, ∣, ∻, ─, ┈, └, ┚, ┣, ┻, ⚀, ⚈, ⚔, ⚚, ⚣, ⚻, ⣀, ⣈, ⣔, ⣚, ⣣, ⣻, ⻀, ⻈, ⻔, ⻚, ⻣, ⻻,
</pre>
Note that ptok.e (part of the compiler) currently contains the following:
<lang Phix>charset[#80] = LETTER -- more unicode
-- ... (lines omitted) ...
charset[#CF] = LETTER
charset[#E2] = LETTER</lang>
If that is extended (with more utf-8 handling) then obviously the output will change.<br>
I am a little surprised at just how few ad-hoc utf8 characters have been supported so far.
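The small count can be cross-checked by hand. The sketch below is not part of ptok.e or of the test above: the three byte sets are assumptions, read off the utf8 encodings of the 48 characters in the "good" list (two 2-byte lead bytes, one 3-byte lead byte, six continuation bytes), and the variable names are made up for illustration.
<lang Phix>-- Sketch only: the byte sets are assumptions read off the "good" list
-- in the output above, not copied from ptok.e.
sequence lead2 = {#CE,#CF},                 -- lead bytes of 2-byte sequences
         lead3 = {#E2},                     -- lead bytes of 3-byte sequences
         tails = {#80,#88,#94,#9A,#A3,#BB}  -- continuation bytes
integer nt = length(tails)
-- every lead2 byte pairs with any tail; every lead3 byte with any two tails
integer expected = length(lead2)*nt + length(lead3)*nt*nt
-- code points tested: #80..#10FFFF, less the surrogates #D800..#DFFF
integer tested = (#10FFFF-#80+1) - (#DFFF-#D800+1)
printf(1,"good:%d, bad:%d\n",{expected,tested-expected})</lang>
This gives 2*6 + 1*6*6 = 48 good out of 1111936 tested, which agrees with the good:48, bad:1111888 reported above. Under the same assumptions, accepting just one further continuation byte would raise the count to 2*7 + 7*7 = 63.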
 
=={{header|Python}}==