UTF-8 encode and decode: Difference between revisions
(+perl) |
m (Lingo added) |
||
Line 777: | Line 777: | ||
€ EURO SIGN U+020AC E2 82 AC € |
||
𝄞 MUSICAL SYMBOL G CLEF U+1D11E F0 9D 84 9E 𝄞 |
||
</pre> |
|||
=={{header|Lingo}}==
Since UTF-8 is Lingo's native string encoding, and UTF-8 strings can be read into byte arrays (and vice versa), UTF-8 encoding and decoding are built in.<br />
<br />
Relevant Lingo functions are:<br />
charToNum(string): converts a single-character string to its Unicode code point (int)<br />
numToChar(int): converts a Unicode code point (int) to a single-character string<br />
byteArray(string): creates a byte array of the UTF-8 bytes for a string<br />
byteArray.toHexString(start, length): returns a hex string representation of the byte array (e.g. for printing)<br />
<br />
Some simple demo code:
<lang Lingo>chars = ["A", "ö", "Ж", "€", "𝄞"]
put "Character   Unicode (int)   UTF-8 encoding (hex)"
repeat with c in chars
  ba = byteArray(c)
  put col(c, 12) & col(charToNum(c), 16) & ba.toHexString(1, ba.length)
end repeat</lang>
Helper function for table formatting:
<lang Lingo>on col (val, len)
  str = string(val)
  repeat with i = str.length+1 to len
    put " " after str
  end repeat
  return str
end</lang>
{{out}}
<pre>
-- "Character   Unicode (int)   UTF-8 encoding (hex)"
-- "A           65              41"
-- "ö           246             c3 b6"
-- "Ж           1046            d0 96"
-- "€           8364            e2 82 ac"
-- "𝄞           119070          f0 9d 84 9e"
</pre>
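As a cross-check from outside Lingo, the same table can be reproduced with a short Python sketch. This is only an illustration: the helper names <code>utf8_encode</code>, <code>utf8_decode</code> and <code>table_rows</code> are hypothetical and not part of Lingo or of the task description. The sketch relies only on Python's built-in <code>str.encode</code>, <code>bytes.decode</code> and <code>ord</code>, and it also exercises the decoding direction, which the Lingo demo above does not show.

```python
# Sketch only: helper names are hypothetical, chosen for illustration.
CHARS = ["A", "ö", "Ж", "€", "𝄞"]


def utf8_encode(ch):
    """Return the UTF-8 bytes of a single-character string."""
    return ch.encode("utf-8")


def utf8_decode(data):
    """Return the string represented by a UTF-8 byte sequence."""
    return data.decode("utf-8")


def table_rows(chars):
    """Build one row per character: character, code point, hex bytes."""
    rows = []
    for ch in chars:
        encoded = utf8_encode(ch)
        # Round trip: decoding the bytes must give back the character.
        assert utf8_decode(encoded) == ch
        hex_bytes = " ".join(f"{b:02x}" for b in encoded)
        # Column widths 12 and 16 mirror the col() calls in the Lingo demo.
        rows.append(f"{ch:<12}{ord(ch):<16}{hex_bytes}")
    return rows


if __name__ == "__main__":
    print("Character   Unicode (int)   UTF-8 encoding (hex)")
    for row in table_rows(CHARS):
        print(row)
```

The code points and byte sequences produced this way should agree with the Lingo output above, since both depend only on the UTF-8 definition.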
</pre> |
||