On Saturday 09 January 2010 02:12:28, MRAB wrote:
> What about listing the possible encodings? It would try each in turn
> until it found one where the BOM matched or had no BOM:
>
> my_file = open(filename, 'r', encoding='UTF-8-sig|UTF-16|UTF-8')
>
> or is that taking it too far?

Yes, you're taking it too far :-) Checking the BOM is reliable, whereas *guessing* the charset from the byte stream alone can only be a heuristic. Guessing a charset is a complex problem; there are third-party libraries for that, like the chardet project.

-- 
Victor Stinner
http://www.haypocalc.com/
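A BOM is a fixed byte prefix, so detecting one is an exact comparison rather than a guess. The following is a minimal sketch, assuming Python 3's standard `codecs` module. The names `encoding_from_bom` and `open_with_bom` and the UTF-8 fallback are hypothetical choices for illustration. They are not an API proposed in this thread.

```python
import codecs

# Known BOMs mapped to codec names that also strip the BOM when decoding.
# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM (FF FE 00 00)
# starts with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(data, default="utf-8"):
    """Return a codec name chosen from the BOM at the start of `data`.

    If no BOM is present, return `default` unchanged: no guessing is done.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return default


def open_with_bom(filename, default="utf-8"):
    """Hypothetical helper: open a text file using its BOM, if any."""
    with open(filename, "rb") as f:
        head = f.read(4)  # the longest BOM (UTF-32) is 4 bytes
    return open(filename, "r", encoding=encoding_from_bom(head, default))
```

For example, `encoding_from_bom("abc".encode("utf-16"))` returns `"utf-16"`, and decoding the same bytes with that codec yields `"abc"` without the BOM. When no BOM is found, the sketch falls back to a fixed default instead of inspecting the content. A statistical detector such as chardet would be the place to plug in a guess.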