UTF-8 encode and decode: Difference between revisions

=={{header|Lingo}}==
Since UTF-8 is Lingo's native string encoding, and strings can be converted to byte arrays of their UTF-8 bytes (and vice versa), UTF-8 encoding and decoding are built in.<br />
<br />
Relevant Lingo functions are:<br />
- charToNum(string): converts a single-character string to its Unicode code point (int)<br />
- numToChar(int): converts a Unicode code point (int) to a single-character string<br />
- byteArray(string): creates a byte array holding the UTF-8 bytes of the string<br />
- byteArray.toHexString(start, length): returns a hex string representation of the byte array (e.g. for printing)<br />
- byteArray.readRawString(length): reads the given number of bytes from the byte array and decodes them back into a string<br />
Some simple demo code:
<lang Lingo>chars = ["A", "ö", "Ж", "€", "𝄞"]
put "Character Unicode (int) UTF-8 encoding (hex) Decoded"
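repeat with c in chars
  -- encode: the byte array holds the UTF-8 bytes of c
  ba = byteArray(c)
  -- decode: readRawString() turns the bytes back into a string
  put col(c, 12) & col(charToNum(c), 16) & col(ba.toHexString(1, ba.length), 14) & ba.readRawString(ba.length)
end repeat</lang>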
Helper function for table formatting
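The body of the helper is not included in this excerpt. As a minimal sketch, assuming col(val, len) only has to convert its argument to a string and pad it with trailing spaces to len characters (an assumption for illustration, not necessarily the original implementation), it could look like this:
<lang Lingo>-- Hypothetical sketch of col(): returns val as a string,
-- padded with trailing spaces to at least len characters
on col (val, len)
  str = string(val)
  repeat while str.length < len
    put " " after str
  end repeat
  return str
end</lang>
Values that are already len characters or longer are returned unpadded, so an overlong entry shifts the following columns rather than being cut off.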
{{out}}
<pre>
-- "Character Unicode (int) UTF-8 encoding (hex) Decoded"
-- "A 65 41 A"
-- "ö 246 c3 b6 ö"
-- "Ж 1046 d0 96 Ж"
-- "€ 8364 e2 82 ac €"
-- "𝄞 119070 f0 9d 84 9e 𝄞"
</pre>
 