On 9 January 2014 10:07, Ben Finney <ben+python at benfinney.id.au> wrote:
> Kristján Valur Jónsson <kristjan at ccpgames.com> writes:
>
>> Believe it or not, sometimes you really don't care about encodings.
>> Sometimes you just want to parse text files.
>
> Files don't contain text, they contain bytes. Bytes only become text
> when filtered through the correct encoding.
>
> Python should not guess the encoding if it's unknown. Without the right
> encoding, you don't get text, you get partial or complete gibberish.
>
> So, if what you want is to parse text and not get gibberish, you need to
> *tell* Python what the encoding is. That's a brute fact of the world of
> text in computing.

Set the mode to "rb", process it as binary. Done.

See http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html
for details.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
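
A minimal sketch of the "rb" approach suggested above, assuming the file's structural characters (newlines, "=", "#") are plain ASCII and the unknown encoding is ASCII-compatible; it would not work for something like UTF-16. The function names and the sample data are hypothetical, made up for illustration only.

```python
import os
import tempfile


def parse_key_values(path):
    """Parse 'key = value' lines from a file opened in binary mode.

    Keys and values are kept as bytes; nothing is ever decoded, so the
    encoding of the values does not need to be known.
    """
    result = {}
    with open(path, "rb") as f:
        for line in f:  # binary files iterate line by line, split on b"\n"
            line = line.strip()
            if not line or line.startswith(b"#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition(b"=")
            if sep:  # ignore lines that have no "=" at all
                result[key.strip()] = value.strip()
    return result


def demo():
    # Hypothetical sample: ASCII markup, values in some unknown 8-bit encoding.
    data = b"# comment\nname = Kristj\xe1n\ncity=Reykjav\xedk\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        return parse_key_values(tmp.name)
    finally:
        os.remove(tmp.name)
```

The non-ASCII bytes pass through untouched, so they can be written back out exactly as they came in. Displaying them to a user as text would still require knowing the encoding.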