UTF-8 encode and decode: Difference between revisions

(+perl)
m (Lingo added)
Line 777: Line 777:
€ EURO SIGN U+020AC E2 82 AC €
𝄞 MUSICAL SYMBOL G CLEF U+1D11E F0 9D 84 9E 𝄞
</pre>

=={{header|Lingo}}==
Since UTF-8 is Lingo's native string encoding, and strings can be converted to byteArrays and back, UTF-8 encoding and decoding are built in.<br />
<br />
Relevant Lingo functions are:<br />
charToNum(string): converts a single-character string to its Unicode code point (int)<br />
numToChar(int): converts a Unicode code point (int) to a single-character string<br />
byteArray(string): creates a byte array holding the UTF-8 bytes of the string<br />
byteArray.toHexString(start, length): returns a hex string representation of the byte array (e.g. for printing)<br />
<br />
Some simple demo code:
<lang Lingo>chars = ["A", "ö", "Ж", "€", "𝄞"]
put "Character   Unicode (int)   UTF-8 encoding (hex)"
repeat with c in chars
  ba = bytearray(c)
  put col(c, 12) & col(charToNum(c), 16) & ba.toHexString(1, ba.length)
end repeat</lang>
Helper function for table formatting (pads a value with spaces to a fixed column width):
<lang Lingo>on col (val, len)
  str = string(val)
  repeat with i = str.length+1 to len
    put " " after str
  end repeat
  return str
end</lang>
{{out}}
<pre>
-- "Character   Unicode (int)   UTF-8 encoding (hex)"
-- "A           65              41"
-- "ö           246             c3 b6"
-- "Ж           1046            d0 96"
-- "€           8364            e2 82 ac"
-- "𝄞           119070          f0 9d 84 9e"
</pre>
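The demo above only shows the encoding direction. As a minimal sketch of the decoding direction, the bytes can be read back into a string. This assumes that the byteArray method readRawString(length, charSet) is available (Director 11.5 or later) and that the read position has to be reset to 1 before reading; neither is confirmed by the text above, and the "utf-8" argument may be optional.
<lang Lingo>-- Sketch only: round-trip each character through its UTF-8 bytes
chars = ["A", "ö", "Ж", "€", "𝄞"]
repeat with c in chars
  ba = byteArray(c)     -- encode: string -> UTF-8 bytes
  ba.position = 1       -- assumption: rewind so reading starts at the first byte
  s = ba.readRawString(ba.length, "utf-8")  -- decode: UTF-8 bytes -> string
  put s
end repeat</lang>
If these assumptions hold, each line printed should be the original character again.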