UTF-8 encode and decode: Difference between revisions

€ -> ["\xE2", "\x82", "\xAC"]
𝄞 -> ["\xF0", "\x9D", "\x84", "\x9E"]
</pre>
=={{header|Swift}}==
In Swift, a UnicodeScalar is a single Unicode scalar value (a code point other than a surrogate), whereas a Character is a grapheme cluster that may consist of several UnicodeScalars, usually because of combining characters.
<lang Swift>import Foundation
 
// Encode a single Unicode scalar as its UTF-8 byte sequence.
func encode(_ scalar: UnicodeScalar) -> Data {
    return Data(String(scalar).utf8)
}
 
// Decode UTF-8 bytes back into a scalar; returns nil if the bytes are not valid UTF-8.
func decode(_ data: Data) -> UnicodeScalar? {
    guard let string = String(data: data, encoding: .utf8) else {
        assertionFailure("Failed to convert data to a valid String")
        return nil
    }
    assert(string.unicodeScalars.count == 1, "Data should contain one scalar!")
    return string.unicodeScalars.first
}
 
// Round-trip each scalar and print its code point and UTF-8 bytes in hexadecimal.
for scalar in "AöЖ€𝄞".unicodeScalars {
    let bytes = encode(scalar)
    let formattedBytes = bytes.map({ String($0, radix: 16) }).joined(separator: " ")
    let decoded = decode(bytes)!
    print("character: \(decoded), code point: U+\(String(scalar.value, radix: 16)), \tutf-8: \(formattedBytes)")
}
</lang>
{{out}}
<pre>
character: A, code point: U+41, utf-8: 41
character: ö, code point: U+f6, utf-8: c3 b6
character: Ж, code point: U+416, utf-8: d0 96
character: €, code point: U+20ac, utf-8: e2 82 ac
character: 𝄞, code point: U+1d11e, utf-8: f0 9d 84 9e
</pre>
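The distinction between Character and UnicodeScalar drawn above can be illustrated with a minimal sketch. It is not part of the original solution; the sample value (the letter "e" followed by U+0301 COMBINING ACUTE ACCENT) and the names <code>combined</code> and <code>scalars</code> are chosen only for illustration.
<lang Swift>// Illustrative value: "e" followed by U+0301 COMBINING ACUTE ACCENT.
// Swift treats the pair as one Character (a single grapheme cluster).
let combined: Character = "e\u{301}"

// The same Character is made up of two Unicode scalars.
let scalars = Array(combined.unicodeScalars)

print(String(combined).count)        // 1
print(scalars.count)                 // 2
print(Array(String(combined).utf8))  // [101, 204, 129]
</lang>
Because one Character can expand to several scalars, a function such as <code>encode</code> above, which takes a single UnicodeScalar, would have to be called once per scalar to cover a combined character like this one.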
 