Read a file character by character/UTF8

}
}
}</lang>

=={{header|J}}==

Reading a file a character at a time is antithetical not only to the architecture of J, but to the architecture and design of most computers and most file systems. Nevertheless, this can be a useful concept if you're building your own hardware. So let's model it...

First, we know that the first octet (8-bit value) in a UTF-8 sequence tells us the length of the sequence needed to represent that character. Specifically, we can convert that octet to binary and count its leading 1s to find the length of the sequence in octets (except that the length is always at least 1 octet, since an ASCII octet starts with a 0 and so has no leading 1s).

<lang J>u8len=: 1 >. 0 i.~ (8#2)#:a.&i.@{.   NB. {. takes the first octet as a scalar, so a 1-octet list is counted correctly</lang>
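To see the rule at work, here is a short J session. It is a sketch using only J primitives, not part of the original entry. The octets 65 (A), 195 (0xC3, which starts é), 226 (0xE2, which starts €) and 240 (0xF0, which starts a 4-octet sequence) are shown in binary, and then the leading 1s of each row are counted and clamped to at least 1, which is the same computation u8len performs on a single octet:

<lang J>   (8#2) #: 65 195 226 240                 NB. lead octets of 1-, 2-, 3- and 4-octet sequences
0 1 0 0 0 0 0 1
1 1 0 0 0 0 1 1
1 1 1 0 0 0 1 0
1 1 1 1 0 0 0 0
   1 >. (i.&0)"1 (8#2) #: 65 195 226 240   NB. index of the first 0 in each row, at least 1
1 2 3 4</lang>

Here "1 applies the first-zero search to each row of the table separately.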

So now, we can use indexed file read (1!:11) to read a UTF-8 character starting at a specific file index. We read the first octet, and then read as many additional octets as that first octet says we need. If that's not possible, we return EOF:

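<lang J>indexedread1u8=:4 :0
  NB. x: 0-based byte index into the file, y: file name
  try.
    octet0=. 1!:11 y;x,1                    NB. read the lead octet
    octet0,1!:11 y;(x+1),<:u8len octet0     NB. append its continuation octets, if any
  catch.
    'EOF'                                   NB. a read failed (normally: index past the end of the file)
  end.
)</lang>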

The length of the result tells us what to add to the file index to find the next available file index for reading.

Of course, this is massively inefficient: every character costs two separate file reads. So if someone ever asks you to do this, make sure you ask them "Why?", because the answer to that question is going to be important (and might suggest a completely different implementation).

Note also that it would make more sense to return an empty string, instead of the string 'EOF', when we reach the end of the file. But that is out of scope for this task.


=={{header|Java}}==