Read a file character by character/UTF8: Difference between revisions
Revision as of 00:16, 1 April 2015
m (→{{header|J}}) |
(→{{header|zkl}}: rewrite) |
Line 668:
=={{header|zkl}}==
zkl doesn't know much about UTF-8 or Unicode, but it can test whether a string or number is valid UTF-8. This code uses that test to build a state machine that decodes a byte stream into UTF-8 characters.
<lang zkl>fcn readUTF8c(chr,s=""){ // transform UTF-8 character stream
   s+=chr; // accumulate bytes until they form a valid character
   try{ s.len(8); return(s) }
   catch{ if(s.len()>6) throw(__exception) } // 6 bytes max for UTF-8
   return(Void.Again,s); // call me again with s & another character
}</lang>
When used to modify a zkl iterator, readUTF8c can consume anything streamable (files, strings, lists, etc.) and supports foreach, map, look ahead, push back, etc.
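For readers unfamiliar with zkl, the accumulate-and-test idea behind readUTF8c can be sketched in Python. This is only an analogue, not the zkl implementation: the names <code>read_utf8_chars</code> and <code>MAX_UTF8_BYTES</code> are made up for illustration, and Python's built-in UTF-8 decoder stands in for zkl's validity test.

```python
MAX_UTF8_BYTES = 6  # mirrors the 6-byte cap in the zkl code; RFC 3629 UTF-8 needs at most 4


def read_utf8_chars(byte_stream):
    """Yield one decoded character at a time from an iterable of ints (0-255)."""
    pending = bytearray()
    for b in byte_stream:
        pending.append(b)
        try:
            # Succeeds only when the pending bytes form a complete, valid sequence.
            ch = pending.decode("utf-8")
        except UnicodeDecodeError:
            if len(pending) > MAX_UTF8_BYTES:
                raise ValueError("Invalid UTF-8 string")
            continue  # incomplete so far: keep accumulating
        pending.clear()
        yield ch
    if pending:
        # The stream ended in the middle of a character.
        raise ValueError("Invalid UTF-8 string")
```

With this sketch, <code>"".join(read_utf8_chars("-->\u20AC123".encode("utf-8")))</code> gives back <code>-->€123</code>, while a stream that never becomes valid raises <code>ValueError</code> once more than six bytes have piled up.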
<lang zkl>fcn utf8Walker(obj){
obj.walker(
.tweak(
   s+=b.text;
}</lang>
<lang zkl>s:="-->\u20AC123"; // --> e2,82,ac,31,32,33 == -->€123
Line 692 ⟶ 694:
ValueError : Invalid UTF-8 string
</pre>
If you wish to push a UTF-8 stream through one or more functions, you can
<lang zkl>
   try{ s.len(8); return(s) }
   catch{ if(s.len()>6) throw(__exception) } // 6 bytes max for UTF-8
}</lang>
<lang zkl>w.pump(3,List,readUTF8,"print")</lang>
{{out}}<pre>-->€123</pre>
and returns a list of the eight UTF-8 characters (with newline).
Or, if file "foo.txt" contains the characters:
<lang zkl>File("foo.txt","rb").howza(3).pump(List,readUTF8c,"print");</lang>
produces the same result.
{{omit from|AWK}}