UTF-8 encode and decode: Difference between revisions
Line 781: | Line 781: | ||
=={{header|Lingo}}== |
=={{header|Lingo}}== |
||
Since UTF-8 is Lingo's native string encoding, and UTF-8 strings can be read into byteArrays (and vice versa), UTF-8 encoding and decoding is built in.<br /> |
Since UTF-8 is Lingo's native string encoding, and UTF-8 strings can be read into byteArrays (and vice versa), UTF-8 encoding and decoding is built in.<br /> |
||
<br /> |
|||
Relevant Lingo functions are:<br /> |
Relevant Lingo functions are:<br /> |
||
charToNum(string): converts a single-character string to its Unicode code point (int)<br /> |
|||
numToChar(int): converts a Unicode code point (int) to a single-character string<br /> |
|||
byteArray(string): creates byte array of UTF-8 bytes for string<br /> |
|||
byteArray.toHexString(start, length): returns hex string representation of byte array (e.g. for printing) |
|||
Some simple demo code: |
|||
<lang Lingo>chars = ["A", "ö", "Ж", "€", "𝄞"] |
<lang Lingo>chars = ["A", "ö", "Ж", "€", "𝄞"] |
||
put "Character Unicode (int) UTF-8 |
put "Character Unicode (int) UTF-8 (hex) Decoded" |
||
repeat with c in chars |
repeat with c in chars |
||
ba = bytearray(c) |
ba = bytearray(c) |
||
put col(c, 12) & col(charToNum(c), 16) & ba.toHexString(1, ba.length) |
put col(c, 12) & col(charToNum(c), 16) & col(ba.toHexString(1, ba.length), 14) & ba.readRawString(ba.length) |
||
end repeat</lang> |
end repeat</lang> |
||
Helper function for table formatting |
Helper function for table formatting |
||
Line 803: | Line 803: | ||
{{out}} |
{{out}} |
||
<pre> |
<pre> |
||
-- "Character Unicode (int) UTF-8 |
-- "Character Unicode (int) UTF-8 (hex) Decoded" |
||
-- "A 65 41" |
-- "A 65 41 A" |
||
-- "ö 246 c3 b6" |
-- "ö 246 c3 b6 ö" |
||
-- "Ж 1046 d0 96" |
-- "Ж 1046 d0 96 Ж" |
||
-- "€ 8364 e2 82 ac" |
-- "€ 8364 e2 82 ac €" |
||
-- "𝄞 119070 f0 9d 84 9e" |
-- "𝄞 119070 f0 9d 84 9e 𝄞" |
||
</pre> |
</pre> |
||
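The functions listed above also allow a round trip that starts from a code point rather than from a character. The following is a minimal, untested sketch that uses only the functions named in this section. The variable names cp, c and ba are illustrative, and it assumes, as the demo code does, that readRawString reads from the start of a freshly created byte array.

<lang Lingo>-- Hypothetical round trip for one code point (names cp, c, ba are illustrative)
cp = 8364                          -- U+20AC, the euro sign
c = numToChar(cp)                  -- code point -> single-character string
ba = byteArray(c)                  -- string -> UTF-8 bytes
put ba.toHexString(1, ba.length)   -- hex representation of the UTF-8 bytes
put ba.readRawString(ba.length)    -- UTF-8 bytes -> decoded string
put (charToNum(c) = cp)            -- string -> code point, compared with the input</lang>

Starting from numToChar rather than a string literal keeps the example independent of the source file's own encoding, which may be convenient when testing code points that are hard to type.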