On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote: > Similarly I'd expect (from experience) that a programmer using > Python to want to take the same approach, sticking with unencoded > data in nearly all situations. Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having being decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.) This means that Python3 programs can become *more* fragile in the face of random data you encounter out in the real world, rather than less fragile, which was the goal of the whole exercise. The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :) James -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20100622/7f537c1e/attachment-0001.html>
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4