Read a file character by character/UTF8: Difference between revisions

Line 668:
 
=={{header|zkl}}==
zkl doesn't know much about UTF-8 or Unicode but is able to test whether a string or number is valid UTF-8 or not. This code uses that to build a state machine to decode a byte stream into UTF-8 characters.
<lang zkl>fcn readUTF8c(chr,s=""){ // transform UTF-8 character stream
   s+=chr;  // append the next character to the partial sequence
   try{ s.len(8); return(s) }  // s is valid UTF-8: emit it
   catch{ if(s.len()>6) throw(__exception) } // 6 bytes max for UTF-8
   return(Void.Again,s);  // call me again with s & another character
}</lang>
Used to modify a zkl iterator, it can consume any stream-able (files, strings, lists, etc) and provides support for foreach, map, look ahead, push back, etc.
<lang zkl>fcn utf8Walker(obj){
   obj.walker(3)  // read characters
   .tweak(readUTF8c)  // and convert to UTF-8
}</lang>
<lang zkl>s:="-->\u20AC123"; // --> e2,82,ac,31,32,33 == -->€123
Line 692 ⟶ 694:
ValueError : Invalid UTF-8 string
</pre>
If you wish to push a UTF-8 stream through one or more functions, you can use the same state machine:
<lang zkl>stream:=Data(Void,s,"\n").howza(3); // character stream
stream.pump(List,readUTF8c,"print")</lang>
{{out}}<pre>-->€123</pre>
and returns a list of the eight UTF-8 characters (with newline).
Or, if file "foo.txt" contains the characters:
<lang zkl>File("foo.txt","rb").howza(3).pump(List,readUTF8c,"print");</lang>
produces the same result.
 
{{omit from|AWK}}