On 2014-01-09 00:07, Ben Finney wrote:
> Kristján Valur Jónsson <kristjan at ccpgames.com> writes:
>
>> Believe it or not, sometimes you really don't care about encodings.
>> Sometimes you just want to parse text files.
>
> Files don't contain text, they contain bytes. Bytes only become text
> when filtered through the correct encoding.
>
> Python should not guess the encoding if it's unknown. Without the right
> encoding, you don't get text, you get partial or complete gibberish.
>
> So, if what you want is to parse text and not get gibberish, you need to
> *tell* Python what the encoding is. That's a brute fact of the world of
> text in computing.
>
>> Python 3 forces you to think about abstract concepts like encodings
>> when all you want is to open that .txt file on the drive and extract
>> some phone numbers and merge in some email addresses. What encoding
>> does the file have? Do I care? Must I care?
>
> Yes, you must.
>
>> Python forcing you to think about this is like the cashier at the
>> hardware store who won't let you buy the hammer you brought to the
>> cash register because you don't know what wood its handle is made of.
>
> The cashier is making a mistake: the hammer, regardless of the wood in
> the handle, still functions just fine as a hammer. Hence, the question
> is unimportant to the purpose.
>
On the other hand:

"I need a new battery."

"What kind of battery?"

"I don't care!"

> The same is not true of changing the encoding for text. The encoding
> matters, and the programmer needs to care.
>
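
For illustration only (this sketch is not from the thread): the quoted claim that bytes only become text through the correct encoding can be checked in a few lines of Python 3. The sample string and the variable names are assumptions chosen for the example. The same bytes are decoded under three assumed encodings; one gives the intended text, one silently gives gibberish, and one fails outright.

```python
# Bytes as they might sit on disk: a name written out as UTF-8.
raw = "Kristj\u00e1n".encode("utf-8")      # b'Kristj\xc3\xa1n'

# Told the right encoding, Python recovers the original text.
as_utf8 = raw.decode("utf-8")

# A wrong guess does not necessarily fail. Latin-1 maps every byte to
# some character, so the result is plausible-looking gibberish.
as_latin1 = raw.decode("latin-1")

# Another wrong guess fails loudly, because bytes >= 0x80 are not ASCII.
try:
    as_ascii = raw.decode("ascii")
except UnicodeDecodeError:
    as_ascii = None

print(repr(as_utf8), repr(as_latin1), repr(as_ascii))
```

The Latin-1 case is the one that bites in practice: no exception is raised, so the damage goes unnoticed until someone reads the output. Passing the encoding explicitly when opening a file, as in `open(path, encoding="utf-8")`, is the usual way to tell Python what the bytes mean.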