Read a file character by character/UTF8: Difference between revisions

added REXX version 2 (show my understanding of this task)
m (→‎{{header|REXX}}: added whitespace to a REXX statement. -- ~~~~)
Line 56:
</lang>
=={{header|REXX}}==
===version 1===
REXX has no built-in support for UTF-8 encoded multi-byte characters; its character functions operate on single bytes.
<br><br>The task's requirement stated that '''EOF''' was to be returned upon reaching the end-of-file, so this programming example was written as a subroutine (procedure).
Line 95 ⟶ 96:
18 character, (hex,char) 454F46 EOF
</pre>
===version 2===
<lang rexx>/* REXX ---------------------------------------------------------------
* 29.12.2013 Walter Pachl
* read one utf8 character at a time
* see http://de.wikipedia.org/wiki/UTF-8#Kodierung
* sorry this is in German but the encoding table should be obvious
*--------------------------------------------------------------------*/
oid='utf8.txt';'erase' oid /* first create file containing utf8 chars*/
Call charout oid,'79'x
Call charout oid,'C3A4'x
Call charout oid,'C2AE'x
Call charout oid,'E282AC'x
Call charout oid,'F09D849E'x
Call lineout oid
fid='utf8.txt' /* then read it and show the contents */
Do While chars(fid)>0
  c8=get_utf8char(fid)
  Say left(c8,4) c2x(c8)
  End
Say 'EOF'
Exit
 
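get_utf8char: Procedure
  Parse Arg f
  c=charin(f)                        /* read the lead byte            */
  b=c2b(c)                           /* its 8 bits as a string        */
  If left(b,1)=0 Then                /* 0xxxxxxx: single-byte char    */
    Nop
  Else Do
    p=pos('0',b)                     /* count of leading 1 bits, +1   */
    Do i=1 To p-2                    /* read the continuation bytes   */
      If chars(f)=0 Then Do          /* file ends inside a sequence   */
        Say 'illegal contents in file' f
        Leave
        End
      c=c||charin(f)
      End
    End
  Return c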
 
c2b: Return x2b(c2x(arg(1)))</lang>
output:
<pre>y 79
ä C3A4
® C2AE
€ E282AC
𝄞 F09D849E
EOF</pre>
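Version 2 derives the length of each sequence from the position of the first 0 bit in the lead byte. The following is a minimal sketch of that step on its own. The routine name <code>utf8len</code> and the sample lead bytes are illustrative assumptions, not part of the program above. Unlike version 2, it returns 0 for a byte that cannot start a sequence.
<lang rexx>/* REXX: sequence length implied by a UTF-8 lead byte (sketch)        */
leads='79 C3 E2 F0 80'               /* sample lead bytes in hex      */
Do i=1 To words(leads)
  h=word(leads,i)
  Say h utf8len(x2c(h))
  End
Exit

utf8len: Procedure                   /* hypothetical helper           */
  b=x2b(c2x(arg(1)))                 /* the byte as 8 bits            */
  p=pos('0',b)                       /* position of the first 0 bit   */
  If p=1 Then Return 1               /* 0xxxxxxx: one byte            */
  If p<3 | p>5 Then Return 0         /* continuation or invalid byte  */
  Return p-1                         /* 110x->2, 1110->3, 11110->4    */</lang>
For the sample bytes this should show lengths 1, 2, 3 and 4, followed by 0 for <code>80</code>, which is a continuation byte.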
 
=={{header|Ruby}}==