On Thu, Jan 7, 2010 at 11:55 PM, Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:
> I'm saying that the BOM itself isn't enough to detect that the file is actually UTF-8.

And I'm saying that it is, with as much certainty as we can ever guess the encoding of a file.

> If (for whatever reason: explicitly specified, guessed in some other way) the file's encoding is determined to be something else, the bytes comprising the BOM should be decoded as normal. It's just that the UTF-8 decoding of the BOM at the start of a file should be "".

Sure, a Latin-1-encoded file could start with the same pattern that is a UTF-8-encoded BOM. But at that point, a UTF-16-encoded file is also valid Latin-1. The question was in the context of encoding-guessing; if we're guessing, a UTF-8-encoded BOM cannot signify anything else but UTF-8. (Ditto for UTF-16 and UTF-32 BOMs.)

-- 
--Guido van Rossum (python.org/~guido)
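
For illustration only, the guessing rule discussed above might be sketched roughly as follows. The function name guess_encoding_from_bom and the table _BOMS are hypothetical, not part of any proposal in this thread; the sketch assumes Python 3 and uses only the BOM constants and codec names from the standard codecs module.

```python
import codecs

# Check the longer BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so a UTF-16 LE file whose first character is
# U+0000 would be misread as UTF-32. That ambiguity is inherent in guessing.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(data, default=None):
    """Return a codec name if data starts with a known BOM, else default.

    The returned codecs ("utf-8-sig", "utf-16", "utf-32") consume the BOM
    while decoding, so it does not show up in the decoded text.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default
```

With this sketch, guess_encoding_from_bom(codecs.BOM_UTF8 + b"abc") returns "utf-8-sig", and decoding those bytes with that codec gives "abc". If the encoding is instead explicitly given as Latin-1, the same three leading bytes decode as ordinary characters, which is the case Glyph describes.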