> On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
>> Hi,
>>
>> Builtin open() function is unable to open an UTF-16/32 file starting with a
>> BOM if the encoding is not specified (raise an unicode error). For an UTF-8
>> file starting with a BOM, read()/readline() returns also the BOM whereas the
>> BOM should be "ignored".
>>
[...]
>

I had similar issues too (please read below ;o) ...

On Thu, Jan 7, 2010 at 7:52 PM, Guido van Rossum <guido at python.org> wrote:
> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
> talk. And for the other two, perhaps it would make more sense to have
> a separate encoding-guessing function that takes a binary stream and
> returns a text stream wrapping it with the proper encoding?
>

About guessing the encoding, I experienced this issue while I was
developing a Trac plugin. What I was doing is as follows:

- I guessed the MIME type + charset encoding using Trac MIME API
  (it was a CSV file encoded using UTF-16)
- I read the file using `open`
- Then wrapped the file using `codecs.EncodedFile`
- Then used `csv.reader`

... and still get the BOM in the first value of the first row in the CSV file.

{{{
#!python

>>> mimetype
'utf-16-le'
>>> ef = EncodedFile(f, 'utf-8', mimetype)
}}}

IMO I think I am +1 for leaving `open` just like it is, and use module
`codecs` to deal with encodings, but I am strongly -1 for returning
the BOM while using `EncodedFile` (mainly because encoding is
explicitly supplied in ;o)

> --Guido
>

CMIIW anyway ...

--
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
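
For reference, the BOM behaviour described in the message above can be reproduced with a minimal sketch. It assumes Python 3 and a made-up in-memory CSV sample; `raw` and `first_cell` are hypothetical names, not taken from the original message, and `io.TextIOWrapper` stands in for `codecs.EncodedFile` to keep the example short. The point it illustrates: naming the byte order explicitly (`'utf-16-le'`) leaves the BOM in the decoded text as U+FEFF, whereas the generic `'utf-16'` codec reads the BOM to pick the byte order and drops it.

```python
import codecs
import csv
import io

# Hypothetical sample: a tiny CSV file encoded as UTF-16-LE with a leading BOM.
raw = codecs.BOM_UTF16_LE + "name,value\r\nspam,1\r\n".encode("utf-16-le")


def first_cell(data, encoding):
    """Decode `data` with `encoding` and return the first CSV cell."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return next(csv.reader(text))[0]


# Explicit byte order: the codec does not interpret the BOM, so it
# survives as U+FEFF at the start of the first cell.
with_bom = first_cell(raw, "utf-16-le")

# Generic codec: the BOM selects the byte order and is consumed.
without_bom = first_cell(raw, "utf-16")

print(repr(with_bom), repr(without_bom))
```

The same distinction exists for UTF-8: decoding with `'utf-8'` keeps a leading BOM, while `'utf-8-sig'` strips it.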