UTF-8 encode and decode: Difference between revisions

m
syntax highlighting fixup automation
m (syntax highlighting fixup automation)
Line 24:
{{trans|Python}}
 
<langsyntaxhighlight lang="11l">F unicode_code(ch)
R ‘U+’hex(ch.code).zfill(4)
 
Line 33:
V chars = [‘A’, ‘ö’, ‘Ж’, ‘€’]
L(char) chars
print(‘#<11 #<15 #<15’.format(char, unicode_code(char), utf8hex(char)))</langsyntaxhighlight>
 
{{out}}
Line 45:
 
=={{header|8th}}==
<langsyntaxhighlight lang="8th">
hex \ so bytes print nicely
 
Line 81:
 
bye
</syntaxhighlight>
</lang>
Output:<pre>
A 41
Line 97:
 
=={{header|Action!}}==
<langsyntaxhighlight Actionlang="action!">TYPE Unicode=[BYTE bc1,bc2,bc3]
BYTE ARRAY hex=['0 '1 '2 '3 '4 '5 '6 '7 '8 '9 'A 'B 'C 'D 'E 'F]
 
Line 246:
StrUnicode(res) PutE()
OD
RETURN</langsyntaxhighlight>
{{out}}
[https://gitlab.com/amarok8bit/action-rosetta-code/-/raw/master/images/UTF-8_encode_and_decode.png Screenshot from Atari 8-bit computer]
Line 269:
{{works with|Ada|Ada|2012}}
 
<langsyntaxhighlight Adalang="ada">with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Integer_Text_IO;
Line 317:
end;
end loop;
end UTF8_Encode_And_Decode;</langsyntaxhighlight>
{{out}}
<pre>Character Unicode UTF-8 encoding (hex)
Line 334:
Despite this complexity, what actually gets generated is highly optimizable C code. Note that the following demonstration requires no ATS-specific library support whatsoever.
 
<syntaxhighlight lang="ats">(*
<lang ATS>(*
 
UTF-8 encoding and decoding in ATS2.
Line 2,111:
 
println! ("SUCCESS")
end</langsyntaxhighlight>
 
{{out}}
Line 2,118:
 
=={{header|AutoHotkey}}==
<langsyntaxhighlight AutoHotkeylang="autohotkey">Encode_UTF(hex){
Bytes := hex>=0x10000 ? 4 : hex>=0x0800 ? 3 : hex>=0x0080 ? 2 : hex>=0x0001 ? 1 : 0
Prefix := [0, 0xC0, 0xE0, 0xF0]
Line 2,152:
DllCall("msvcrt.dll\" v, "Int64", value, "Str", s, "UInt", OutputBase, "CDECL")
return s
}</langsyntaxhighlight>
Examples:<langsyntaxhighlight AutoHotkeylang="autohotkey">data =
(comment
0x0041
Line 2,168:
}
MsgBox % output
return</langsyntaxhighlight>
{{out}}
<pre>
Line 2,181:
=={{header|BaCon}}==
BaCon supports UTF8 natively.
<langsyntaxhighlight lang="bacon">DECLARE x TYPE STRING
 
CONST letter$ = "A ö Ж € 𝄞"
Line 2,190:
FOR x IN letter$
PRINT x, TAB$(1), "U+", HEX$(UCS(x)), TAB$(2), COIL$(LEN(x), HEX$(x[_-1] & 255))
NEXT</langsyntaxhighlight>
{{out}}
<pre>Char Unicode UTF-8 (hex)
Line 2,201:
 
=={{header|C}}==
<syntaxhighlight lang="c">
<lang C>
#include <stdio.h>
#include <stdlib.h>
Line 2,312:
return 0;
}
</syntaxhighlight>
</lang>
Output
<syntaxhighlight lang="text">
Character Unicode UTF-8 encoding (hex)
----------------------------------------
Line 2,323:
𝄞 U+1d11e f0 9d 84 9e
 
</syntaxhighlight>
</lang>
 
=={{header|C sharp|C#}}==
 
<langsyntaxhighlight lang="csharp">using System;
using System.Text;
 
Line 2,354:
20AC € E2-82-AC
1D11E 𝄞 F0-9D-84-9E */
</syntaxhighlight>
</lang>
 
 
Line 2,362:
Helper functions
 
<langsyntaxhighlight lang="lisp">
(defun ascii-byte-p (octet)
"Return t if octet is a single-byte 7-bit ASCII char.
Line 2,403:
when (= (nth i templates) (logand (nth i bitmasks) octet))
return i)))
</syntaxhighlight>
</lang>
 
Encoder
 
<langsyntaxhighlight lang="lisp">
(defun unicode-to-utf-8 (int)
"Take a unicode code point, return a list of one to four UTF-8 encoded bytes (octets)."
Line 2,438:
;; return the list of UTF-8 encoded bytes.
byte-list))
</syntaxhighlight>
</lang>
 
Decoder
 
<langsyntaxhighlight lang="lisp">
(defun utf-8-to-unicode (byte-list)
"Take a list of one to four utf-8 encoded bytes (octets), return a code point."
Line 2,462:
(error "calculated number of bytes doesnt match the length of the byte list")))
(error "first byte in the list isnt a lead byte"))))))
</syntaxhighlight>
</lang>
 
The test
 
<langsyntaxhighlight lang="lisp">
(defun test-utf-8 ()
"Return t if the chosen unicode points are encoded and decoded correctly."
Line 2,481:
;; return t if all are t
(every #'= unicodes-orig unicodes-test)))
</syntaxhighlight>
</lang>
 
Test output
 
<langsyntaxhighlight lang="lisp">
CL-USER> (test-utf-8)
character A, code point: 41, utf-8: 41
Line 2,493:
character 𝄞, code point: 1D11E, utf-8: F0 9D 84 9E
T
</syntaxhighlight>
</lang>
 
=={{header|D}}==
<langsyntaxhighlight Dlang="d">import std.conv;
import std.stdio;
 
Line 2,508:
writefln("%s %7X [%(%X, %)]", c, unicode, bytes);
}
}</langsyntaxhighlight>
 
{{out}}
Line 2,520:
=={{header|Elena}}==
ELENA 4.x :
<langsyntaxhighlight lang="elena">import system'routines;
import extensions;
Line 2,557:
"𝄞".printAsString().printAsUTF8Array().printAsUTF32();
console.printLine();
}</langsyntaxhighlight>
{{out}}
<pre>
Line 2,567:
 
=={{header|F_Sharp|F#}}==
<langsyntaxhighlight lang="fsharp">
// Unicode character point to UTF8. Nigel Galloway: March 19th., 2018
let fN g = match List.findIndex (fun n->n>g) [0x80;0x800;0x10000;0x110000] with
Line 2,574:
|2->[0xe0+(g&&&0xf000>>>12);0x80+(g&&&0xfc0>>>6);0x80+(g&&&0x3f)]
|_->[0xf0+(g&&&0x1c0000>>>18);0x80+(g&&&0x3f000>>>12);0x80+(g&&&0xfc0>>>6);0x80+(g&&&0x3f)]
</syntaxhighlight>
</lang>
{{out}}
<pre>
Line 2,590:
{{works with|gforth|0.7.9_20191121}}
{{works with|lxf|1.6-982-823}}
<langsyntaxhighlight lang="forth">: showbytes ( c-addr u -- )
over + swap ?do
i c@ 3 .r loop ;
Line 2,602:
\ can also be written as
\ 'A' test 'ö' test 'Ж' test '€' test '𝄞' test
</syntaxhighlight>
</lang>
{{out}}
<pre>
Line 2,614:
If you also want to see the implementation of <code>xc!+</code> and <code>xc@+</code>, here it is (<code>u8!+</code> is the UTF-8 implementation of <code>xc!+</code>, and likewise for <code>u8@+</code>):
 
<langsyntaxhighlight lang="forth">-77 Constant UTF-8-err
 
$80 Constant max-single-byte
Line 2,635:
REPEAT $7F xor 2* or r>
BEGIN over $80 u>= WHILE tuck c! 1+ REPEAT nip ;
</syntaxhighlight>
</lang>
 
=={{header|Go}}==
===Implementation===
This implementation is missing all checks for invalid data and so is not production-ready, but illustrates the basic UTF-8 encoding scheme.
<langsyntaxhighlight lang="go">package main
 
import (
Line 2,741:
rune(b[3]&mbMask)
}
}</langsyntaxhighlight>
{{out}}
<pre>
Line 2,752:
 
===Library/language===
<langsyntaxhighlight lang="go">package main
 
import (
Line 2,777:
fmt.Printf("%-7c U+%04X\t%-12X\t%c\n", codepoint, codepoint, encoded, decoded)
}
}</langsyntaxhighlight>
{{out}}
<pre>
Line 2,789:
 
Alternately:
<langsyntaxhighlight lang="go">package main
 
import (
Line 2,810:
fmt.Printf("%-7c U+%04X\t%-12X\t%c\n", codepoint, codepoint, encoded, decoded)
}
}</langsyntaxhighlight>
{{out}}
<pre>
Line 2,823:
=={{header|Groovy}}==
{{trans|Java}}
<langsyntaxhighlight lang="groovy">import java.nio.charset.StandardCharsets
 
class UTF8EncodeDecode {
Line 2,849:
}
}
}</langsyntaxhighlight>
{{out}}
<pre>Char Name Unicode UTF-8 encoded Decoded
Line 2,862:
Example makes use of [http://hackage.haskell.org/package/bytestring <tt>bytestring</tt>] and [http://hackage.haskell.org/package/text <tt>text</tt>] packages:
 
<langsyntaxhighlight lang="haskell">module Main (main) where
import qualified Data.ByteString as ByteString (pack, unpack)
Line 2,889:
(printf "U+%04X" codepoint :: String)
(intercalate " " (map (printf "%02X") values))
codepoint'</langsyntaxhighlight>
{{out}}
<pre>
Line 2,903:
=={{header|J}}==
'''Solution:'''
<langsyntaxhighlight lang="j">utf8=: 8&u: NB. converts to UTF-8 from unicode or unicode codepoint integer
ucp=: 9&u: NB. converts to unicode from UTF-8 or unicode codepoint integer
ucp_hex=: hfd@(3 u: ucp) NB. converts to unicode codepoint hexadecimal from UTF-8, unicode or unicode codepoint integer</langsyntaxhighlight>
 
'''Examples:'''
<langsyntaxhighlight lang="j"> utf8 65 246 1046 8364 119070
AöЖ€𝄞
ucp 65 246 1046 8364 119070
Line 2,923:
1d11e
utf8@dfh ucp_hex utf8 65 246 1046 8364 119070
AöЖ€𝄞</langsyntaxhighlight>
 
=={{header|Java}}==
{{works with|Java|7+}}
<langsyntaxhighlight lang="java">import java.nio.charset.StandardCharsets;
import java.util.Formatter;
 
Line 2,956:
}
}
}</langsyntaxhighlight>
{{out}}
<pre>
Line 2,969:
=={{header|JavaScript}}==
An implementation in ECMAScript 2015 (ES6):
<langsyntaxhighlight lang="javascript">
/***************************************************************************\
|* Pure UTF-8 handling without detailed error reporting functionality. *|
Line 3,029:
?( m&0x07)<<18|( n&0x3f)<<12|( o&0x3f)<<6|( p&0x3f)<<0
:(()=>{throw'Invalid UTF-8 encoding!'})()
</syntaxhighlight>
</lang>
The testing inputs:
<langsyntaxhighlight lang="javascript">
const
str=
Line 3,050:
:[ [ a,b,c]]
,inputs=zip3(str,cps,cus);
</syntaxhighlight>
</lang>
The testing code:
<langsyntaxhighlight lang="javascript">
console.log(`\
${'Character'.padEnd(16)}\
Line 3,068:
${`[${[...utf8encode(cp)].map(n=>n.toString(0x10))}]`.padEnd(16)}\
${utf8decode(cu).toString(0x10).padStart(8,'U+000000')}`)
</syntaxhighlight>
</lang>
and finally, the output from the test:
<pre>
Line 3,084:
 
'''Preliminaries'''
<langsyntaxhighlight lang="jq"># input: a decimal integer
# output: the corresponding binary array, most significant bit first
def binary_digits:
Line 3,098:
.result += .power * $b
| .power *= 2)
| .result;</langsyntaxhighlight>
'''Encode to UTF-8'''
<langsyntaxhighlight lang="jq"># input: an array of decimal integers representing the utf-8 bytes of a Unicode codepoint.
# output: the corresponding decimal number of that codepoint.
def utf8_encode:
Line 3,120:
end
| map(binary_to_decimal)
end;</langsyntaxhighlight>
'''Decode an array of UTF-8 bytes'''
<langsyntaxhighlight lang="jq"># input: an array of decimal integers representing the utf-8 bytes of a Unicode codepoint.
# output: the corresponding decimal number of that codepoint.
def utf8_decode:
Line 3,140:
else $d[0][-3:] + $d[1][$mb:] + $d[2][$mb:] + $d[3][$mb:]
end
| binary_to_decimal ;</langsyntaxhighlight>
'''Task'''
<langsyntaxhighlight lang="jq">def task:
[ "A", "ö", "Ж", "€", "𝄞" ][]
| . as $glyph
Line 3,150:
| "Glyph \($glyph) => \($encoded) => \($decoded) => \([$decoded]|implode)" ;
 
task</langsyntaxhighlight>
{{out}}
<pre>
Line 3,165:
Julia supports by default UTF-8 encoding.
 
<langsyntaxhighlight lang="julia">for t in ("A", "ö", "Ж", "€", "𝄞")
enc = Vector{UInt8}(t)
dec = String(enc)
println(dec, " → ", enc)
end</langsyntaxhighlight>
 
{{out}}
Line 3,179:
 
=={{header|Kotlin}}==
<langsyntaxhighlight lang="scala">// version 1.1.2
 
fun utf8Encode(codePoint: Int) = String(intArrayOf(codePoint), 0, 1).toByteArray(Charsets.UTF_8)
Line 3,198:
System.out.printf("%-${n}s %c\n", s, decoded)
}
}</langsyntaxhighlight>
 
{{out}}
Line 3,212:
=={{header|langur}}==
{{works with|langur|0.8.4}}
<langsyntaxhighlight lang="langur">writeln "character Unicode UTF-8 encoding (hex)"
 
for .cp in "AöЖ€𝄞" {
Line 3,219:
val .utf8rep = join " ", map f $"\.b:X02;", .utf8
writeln $"\.cpstr:-11; U+\.cp:X04:-8; \.utf8rep;"
}</langsyntaxhighlight>
 
{{out}}
Line 3,237:
- byteArray.toHexString (intStart, intLen): returns hex string representation of byte array (e.g. for printing)<br />
- byteArray.readRawString (intLen, [strCharSet="UTF-8"]): reads a fixed number of bytes as a string
<langsyntaxhighlight Lingolang="lingo">chars = ["A", "ö", "Ж", "€", "𝄞"]
put "Character Unicode (int) UTF-8 (hex) Decoded"
repeat with c in chars
ba = bytearray(c)
put col(c, 12) & col(charToNum(c), 16) & col(ba.toHexString(1, ba.length), 14) & ba.readRawString(ba.length)
end repeat</langsyntaxhighlight>
Helper function for table formatting
<langsyntaxhighlight Lingolang="lingo">on col (val, len)
str = string(val)
repeat with i = str.length+1 to len
Line 3,250:
end repeat
return str
end</langsyntaxhighlight>
{{out}}
<pre>
Line 3,263:
=={{header|Lua}}==
{{works with|Lua|5.3}}
<syntaxhighlight lang="lua">
<lang Lua>
-- Accept an integer representing a codepoint.
-- Return the values of the individual octets.
Line 3,298:
end
end
</syntaxhighlight>
</lang>
{{out}}
<pre>
Line 3,319:
 
=={{header|M2000 Interpreter}}==
<syntaxhighlight lang="m2000 interpreter">
<lang M2000 Interpreter>
Module EncodeDecodeUTF8 {
a$=string$("Hello" as UTF8enc)
Line 3,339:
}
EncodeDecodeUTF8
</syntaxhighlight>
</lang>
{{out}}
<pre>
Line 3,354:
 
=={{header|Mathematica}}/{{header|Wolfram Language}}==
<langsyntaxhighlight Mathematicalang="mathematica">utf = ToCharacterCode[ToString["AöЖ€", CharacterEncoding -> "UTF8"]]
ToCharacterCode[FromCharacterCode[utf, "UTF8"]]</langsyntaxhighlight>
{{out}}
<pre>{65, 195, 182, 208, 150, 226, 130, 172}
Line 3,367:
For this purpose, using sequences or bytes is not natural. Here is a way to proceed using the module “unicode”.
 
<langsyntaxhighlight Nimlang="nim">import unicode, sequtils, strformat, strutils
 
const UChars = ["\u0041", "\u00F6", "\u0416", "\u20AC", "\u{1D11E}"]
Line 3,387:
r = s.toRune
# Display.
echo &"""{uchar:>5} U+{r.int.toHex(5)} {s.map(toHex).join(" ")}"""</langsyntaxhighlight>
 
{{out}}
Line 3,400:
In this section, we provide two procedures to convert a Unicode code point to a UTF-8 sequence of bytes and conversely, without using the module “unicode”. We provide also a procedure to convert a sequence of bytes to a string in order to print it. The algorithm is the one used by the Go solution.
 
<langsyntaxhighlight Nimlang="nim">import sequtils, strformat, strutils
 
const
Line 3,472:
# Display.
echo &"""{s.toString:>5} U+{c.int.toHex(5)} {s.map(toHex).join(" ")}"""
</syntaxhighlight>
</lang>
 
{{out}}
Line 3,478:
 
=={{header|Perl}}==
<langsyntaxhighlight lang="perl">#!/usr/bin/perl
use strict;
use warnings;
Line 3,497:
} split //, $utf8;
print "\n";
} @chars;</langsyntaxhighlight>
 
{{out}}
Line 3,515:
As requested in the task description:
 
<!--<langsyntaxhighlight Phixlang="phix">-->
<span style="color: #008080;">constant</span> <span style="color: #000000;">tests</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">#0041</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">#00F6</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">#0416</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">#20AC</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">#1D11E</span><span style="color: #0000FF;">}</span>
Line 3,528:
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"#%04x -> {%s} -> {%s}\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">codepoint</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">hex</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"#%02x"</span><span style="color: #0000FF;">),</span><span style="color: #000000;">hex</span><span style="color: #0000FF;">(</span><span style="color: #000000;">r</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"#%04x"</span><span style="color: #0000FF;">)})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<!--</langsyntaxhighlight>-->
 
{{out}}
Line 3,540:
 
=={{header|Processing}}==
<langsyntaxhighlight lang="java">import java.nio.charset.StandardCharsets;
 
Integer[] code_points = {0x0041, 0x00F6, 0x0416, 0x20AC, 0x1D11E};
Line 3,567:
tel_1 += 30; tel_2 = 50;
}
}</langsyntaxhighlight>
 
 
Line 3,574:
The encoding and decoding procedure are kept simple and designed to work with an array of 5 elements for input/output of the UTF-8 encoding for a single code point at a time. It was decided not to use a more elaborate example that would have been able to operate on a buffer to encode/decode more than one code point at a time.
 
<langsyntaxhighlight lang="purebasic">#UTF8_codePointMaxByteCount = 4 ;UTF-8 encoding uses only a maximum of 4 bytes to encode a codepoint
 
Procedure UTF8_encode(x, Array encoded_codepoint.a(1)) ;x is codepoint to encode, the array will contain output
Line 3,685:
Print(#CRLF$ + #CRLF$ + "Press ENTER to exit"): Input()
CloseConsole()
EndIf</langsyntaxhighlight>
Sample output:
<pre> Unicode UTF-8 Decoded
Line 3,698:
 
=={{header|Python}}==
<langsyntaxhighlight lang="python">
#!/usr/bin/env python3
from unicodedata import name
Line 3,715:
chars = ['A', 'ö', 'Ж', '€', '𝄞']
for char in chars:
print('{:<11} {:<36} {:<15} {:<15}'.format(char, name(char), unicode_code(char), utf8hex(char)))</langsyntaxhighlight>
{{out}}
<pre>Character Name Unicode UTF-8 encoding (hex)
Line 3,725:
 
=={{header|Racket}}==
<langsyntaxhighlight lang="racket">#lang racket
 
(define char-map
Line 3,741:
(map (curryr number->string 16) bites)
(bytes->string/utf-8 (list->bytes bites))
name)))</langsyntaxhighlight>
{{out}}
<pre>#\A A (41) A LATIN-CAPITAL-LETTER-A
Line 3,753:
{{works with|Rakudo|2017.02}}
Pretty much all built in to the language.
<syntaxhighlight lang="raku" perl6line>say sprintf("%-18s %-36s|%8s| %7s |%14s | %s\n", 'Character|', 'Name', 'Ordinal', 'Unicode', 'UTF-8 encoded', 'decoded'), '-' x 100;
 
for < A ö Ж € 𝄞 😜 👨‍👩‍👧‍👦> -> $char {
printf " %-5s | %-43s | %6s | %-7s | %12s |%4s\n", $char, $char.uninames.join(','), $char.ords.join(' '),
('U+' X~ $char.ords».base(16)).join(' '), $char.encode('UTF8').list».base(16).Str, $char.encode('UTF8').decode;
}</langsyntaxhighlight>
{{out}}
<pre>Character| Name | Ordinal| Unicode | UTF-8 encoded | decoded
Line 3,773:
=={{header|Ruby}}==
 
<langsyntaxhighlight lang="ruby">
character_arr = ["A","ö","Ж","€","𝄞"]
for c in character_arr do
Line 3,781:
puts ""
end
</syntaxhighlight>
</lang>
{{out}}
<pre>
Line 3,807:
 
=={{header|Rust}}==
<langsyntaxhighlight lang="rust">fn main() {
let chars = vec!('A', 'ö', 'Ж', '€', '𝄞');
chars.iter().for_each(|c| {
Line 3,817:
});
}
</syntaxhighlight>
</lang>
{{out}}
<pre>
Line 3,829:
=={{header|Scala}}==
=== Imperative solution===
<langsyntaxhighlight lang="scala">object UTF8EncodeAndDecode extends App {
 
val codePoints = Seq(0x0041, 0x00F6, 0x0416, 0x20AC, 0x1D11E)
Line 3,851:
printf(s"%-${w}c %-36s %-7s %-${16 - w}s%c%n",
codePoint, Character.getName(codePoint), leftAlignedHex, s, utf8Decode(bytes))
}</langsyntaxhighlight>
=== Functional solution===
<langsyntaxhighlight lang="scala">import java.nio.charset.StandardCharsets
 
object UTF8EncodeAndDecode extends App {
Line 3,878:
 
println(s"\nSuccessfully completed without errors. [total ${scala.compat.Platform.currentTime - executionStart} ms]")
}</langsyntaxhighlight>
=== Composable and testable solution===
<langsyntaxhighlight lang="scala">package example
 
object UTF8EncodeAndDecode extends TheMeat with App {
Line 3,913:
 
}
</syntaxhighlight>
</lang>
 
=={{header|Seed7}}==
<langsyntaxhighlight lang="seed7">$ include "seed7_05.s7i";
include "unicode.s7i";
include "console.s7i";
Line 3,934:
hex(utf8) rpad 22 <& utf8ToStri(utf8));
end for;
end func;</langsyntaxhighlight>
 
{{out}}
Line 3,948:
 
=={{header|Sidef}}==
<langsyntaxhighlight lang="ruby">func utf8_encoder(Number code) {
code.chr.encode('UTF-8').bytes.map{.chr}
}
Line 3,961:
assert_eq(n, decoded.ord)
say "#{decoded} -> #{encoded}"
}</langsyntaxhighlight>
{{out}}
<pre>
Line 3,973:
=={{header|Swift}}==
In Swift there's a difference between UnicodeScalar, which is a single unicode code point, and Character which may consist out of multiple UnicodeScalars, usually because of combining characters.
<langsyntaxhighlight Swiftlang="swift">import Foundation
 
func encode(_ scalar: UnicodeScalar) -> Data {
Line 3,994:
print("character: \(decoded), code point: U+\(String(scalar.value, radix: 16)), \tutf-8: \(formattedBytes)")
}
</syntaxhighlight>
</lang>
{{out}}
<pre>
Line 4,006:
=={{header|Tcl}}==
Note: Tcl can handle Unicodes only up to U+FFFD, i.e. the Basic Multilingual Plane (BMP, 16 bits wide). Therefore, the fifth test fails as expected.
<langsyntaxhighlight Tcllang="tcl">proc encoder int {
set u [format %c $int]
set bytes {}
Line 4,025:
lappend res [encoder $test] -> [decoder [encoder $test]]
puts $res
}</langsyntaxhighlight>
<pre>
0x0041 41 -> A
Line 4,037:
While perhaps not as readable as the above, this version handles beyond-BMP codepoints by manually composing the utf-8 byte sequences and emitting raw bytes to the console. <tt>encoding convertto utf-8</tt> command still does the heavy lifting where it can.
 
<langsyntaxhighlight Tcllang="tcl">proc utf8 {codepoint} {
scan $codepoint %llx cp
if {$cp < 0x10000} {
Line 4,064:
set utf8 [utf8 $codepoint]
puts "[format U+%04s $codepoint]\t$utf8\t[hexchars $utf8]"
}</langsyntaxhighlight>
 
{{out}}<pre>U+0041 A 41
Line 4,074:
 
=={{header|VBA}}==
<langsyntaxhighlight VBlang="vb">Private Function unicode_2_utf8(x As Long) As Byte()
Dim y() As Byte
Dim r As Long
Line 4,195:
Debug.Print String$(8 - Len(s), " "); s
Next cpi
End Sub</langsyntaxhighlight>{{out}}<pre>ch unicode UTF-8 encoded decoded
A 41 41 41
ö F6 C3 B6 F6
Line 4,204:
 
=={{header|Vlang}}==
<langsyntaxhighlight lang="vlang">import encoding.hex
fn decode(s string) ?[]u8 {
return hex.decode(s)
Line 4,216:
println("${codepoint:-7} U+${codepoint:04X}\t${encoded:-12}\t${decoded.bytestr()}")
}
}</langsyntaxhighlight>
{{out}}
<pre>Char Unicode UTF-8 encoded Decoded
Line 4,228:
=={{header|Wren}}==
The utf8_decode function was translated from the Go entry.
<langsyntaxhighlight lang="ecmascript">import "/fmt" for Fmt
 
var utf8_encode = Fn.new { |cp| String.fromCodePoint(cp).bytes.toList }
Line 4,267:
var uni = String.fromCodePoint(cp2)
System.print("%(Fmt.s(-11, uni)) %(Fmt.s(-37, test[0])) U+%(Fmt.s(-8, Fmt.Xz(4, cp2))) %(utf8)")
}</langsyntaxhighlight>
 
{{out}}
Line 4,281:
 
=={{header|zkl}}==
<langsyntaxhighlight lang="zkl">println("Char Unicode UTF-8");
foreach utf,unicode_int in (T( T("\U41;",0x41), T("\Uf6;",0xf6),
T("\U416;",0x416), T("\U20AC;",0x20ac), T("\U1D11E;",0x1d11e))){
Line 4,290:
 
println("%s %s %9s %x".fmt(char,char2,"U+%x".fmt(unicode_int),utf_int));
}</langsyntaxhighlight>
Int.len() --> number of bytes in int. This could be hard coded because UTF-8
has a max of 6 bytes and (0x41).toBigEndian(6) --> 0x41,0,0,0,0,0 which is
10,333

edits