M.-A. Lemburg wrote:

>> [...]
>>With the UTF-8-SIG codec, it would apply to all operation modes of
>>the codec, whether stream-based or from strings. Whether or not to
>>use the codec would be the application's choice.
>
> I'd suggest to use the same mode of operation as we have in
> the UTF-16 codec: it removes the BOM mark on the first call
> to the StreamReader .decode() method and writes a BOM mark
> on the first call to .encode() on a StreamWriter.
>
> Note that the UTF-16 codec is strict w/r to the presence
> of the BOM mark: you get a UnicodeError if a stream does
> not start with a BOM mark. For the UTF-8-SIG codec, this
> should probably be relaxed to not require the BOM.

I've started writing such a codec. Making the BOM optional
on decoding definitely simplifies the implementation.

Bye,
   Walter Dörwald
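
For illustration, the behaviour discussed above can be sketched against the
"utf-8-sig" codec as it ships in Python's standard library today. This
assumes the shipped codec reflects the outcome of this thread; it is not
the draft implementation mentioned in the message. The variable names are
made up for the example.

```python
import codecs
import io

text = "h\u00e9llo"

# Encoding with utf-8-sig prepends the UTF-8 BOM (bytes EF BB BF).
encoded = text.encode("utf-8-sig")

# Decoding strips a leading BOM if there is one ...
with_bom = encoded.decode("utf-8-sig")
# ... and also accepts input that has no BOM at all (the "relaxed" mode).
without_bom = text.encode("utf-8").decode("utf-8-sig")

# Stream mode: the StreamWriter emits the BOM on the first write only.
buf = io.BytesIO()
writer = codecs.getwriter("utf-8-sig")(buf)
writer.write("a")
writer.write("b")
stream_bytes = buf.getvalue()

# The StreamReader drops the BOM again when it starts decoding.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(stream_bytes))
stream_text = reader.read()

# For contrast, the UTF-16 StreamReader insists on a BOM.
utf16_strict = False
try:
    codecs.getreader("utf-16")(io.BytesIO("ab".encode("utf-16-le"))).read()
except UnicodeError:
    utf16_strict = True
```

Because the BOM is optional on decoding, the same codec can be pointed at
files written with or without the signature, which is the application-level
choice described in the quoted text.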