Walter Dörwald wrote:
> The stateful decoder has a little problem: At least three bytes
> have to be available from the stream until the StreamReader
> decides whether these bytes are a BOM that has to be skipped.
> This means that if the file only contains "ab", the user will
> never see these two characters.

This can be improved, of course: If the first byte is "a",
it most definitely is *not* a UTF-8 signature.

So we only need a second byte for the characters between U+F000
and U+FFFF, and a third byte only for the characters
U+FEC0...U+FEFF. But with the first byte being \xef, we need three
bytes *anyway*, so we can always decide with the first byte only
whether we need to wait for three bytes.

> A solution for this would be to add an argument named final to
> the decode and read methods that tells the decoder that the
> stream has ended and the remaining buffered bytes have to be
> handled now.

Shouldn't an empty read from the underlying stream be taken
as an EOF?

Regards,
Martin
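The prefix check discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual StreamReader code: the class name SigSkippingDecoder and its attributes are hypothetical, and the `final` flag follows the argument proposed in the quoted text. The sketch holds bytes back only while they could still be the start of the signature \xef\xbb\xbf; anything else is passed straight to an ordinary incremental UTF-8 decoder.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf", the UTF-8 signature


class SigSkippingDecoder:
    """Hypothetical sketch: skip a leading UTF-8 signature while
    buffering input only as long as it might still be that signature."""

    def __init__(self):
        self._pending = b""    # bytes held back while the question is open
        self._decided = False  # True once we know whether a signature is present
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def decode(self, data, final=False):
        if self._decided:
            return self._decoder.decode(data, final)
        buf = self._pending + data
        if len(buf) < len(BOM) and BOM.startswith(buf) and not final:
            # buf is a proper prefix of the signature: wait for more bytes.
            self._pending = buf
            return ""
        # Either buf cannot be a signature, or enough bytes have arrived,
        # or the caller said the stream has ended.
        self._decided = True
        self._pending = b""
        if buf.startswith(BOM):
            buf = buf[len(BOM):]
        return self._decoder.decode(buf, final)


if __name__ == "__main__":
    d = SigSkippingDecoder()
    print(repr(d.decode(b"ab")))  # first byte is "a", so nothing is held back

    d = SigSkippingDecoder()
    for chunk in (b"\xef", b"\xbb", b"\xbfab"):
        print(repr(d.decode(chunk)))
```

With this approach a file containing only "ab" is delivered immediately, because the first byte already rules out a signature. Input that stops after \xef or \xef\xbb still needs some end-of-stream signal (here `final=True`) before the held-back bytes are handled; at that point they form a truncated sequence, and the underlying decoder reports them as an error.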