Guido van Rossum wrote:
> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
> talk. And for the other two, perhaps it would make more sense to have
> a separate encoding-guessing function that takes a binary stream and
> returns a text stream wrapping it with the proper encoding?
>
Alternatively, have a universal UTF-8/16/32 encoding, ie one that expects UTF-8, with or without BOM, or UTF-16/32 with BOM.

>
> On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
>> Hi,
>>
>> Builtin open() function is unable to open an UTF-16/32 file starting with a
>> BOM if the encoding is not specified (raise an unicode error). For an UTF-8
>> file starting with a BOM, read()/readline() returns also the BOM whereas the
>> BOM should be "ignored".
>>
>> See recent issues related to reading an UTF-8 text file including a BOM: #7185
>> (csv) and #7519 (ConfigParser). Such file can be opened in unicode mode with
>> the UTF-8-SIG encoding, but it's possible to do better.
>>
>> I propose to improve open() (TextIOWrapper) by using the BOM to choose the
>> right encoding. I think that only files opened in read only mode should
>> support this new feature. *Read* the BOM in a *write* only file would cause
>> unexpected behaviours.
>>
>> Since my proposition changes the result TextIOWrapper.read()/readline() for
>> files starting with a BOM, we might introduce an option to open() to enable
>> the new behaviour. But is it really needed to keep the backward compatibility?
>>
>> I wrote a proof of concept attached to the issue #7651. My patch only changes
>> the behaviour of TextIOWrapper for reading files starting with a BOM. It
>> doesn't work yet if a seek() is used before the first read.
>>
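
For illustration only, the separate encoding-guessing function Guido describes (binary stream in, text stream out) might look roughly like the sketch below. The name open_text_guessing_bom and its default parameter are hypothetical, not part of the io module or of the patch in #7651. The sketch assumes a seekable binary stream and falls back to plain UTF-8 when no BOM is found.

```python
import codecs
import io

# BOM table. UTF-32 is checked before UTF-16 because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def open_text_guessing_bom(binary_stream, default="utf-8"):
    """Hypothetical helper: wrap a seekable binary stream in a text stream.

    The encoding is chosen from a leading BOM if one is present,
    otherwise `default` is used (an assumption of this sketch).
    """
    start = binary_stream.tell()
    head = binary_stream.read(4)   # the longest BOM is 4 bytes
    binary_stream.seek(start)      # rewind so the codec sees the BOM itself

    encoding = default
    for bom, name in _BOMS:
        if head.startswith(bom):
            encoding = name
            break

    # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM,
    # so it does not show up in the decoded text.
    return io.TextIOWrapper(binary_stream, encoding=encoding)


if __name__ == "__main__":
    raw = io.BytesIO(codecs.BOM_UTF16_LE + "hello".encode("utf-16-le"))
    print(open_text_guessing_bom(raw).read())
```

One known ambiguity: a UTF-16-LE file whose first character after the BOM is U+0000 has the same first four bytes as a UTF-32-LE BOM, so a guess based only on the BOM cannot tell the two apart.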