On 6/21/2010 1:29 PM, Guido van Rossum wrote:
> Actually, the big problem with Python 2 is that if you mix str and
> unicode, things work or crash depending on whether any of the str
> objects involved contain non-ASCII bytes.
>
> If one API decides to upgrade to Unicode, the result, when passed to
> another API, may well cause a UnicodeError because not all arguments
> have had the same treatment.
>
>> Now, the APIs are neither safe nor aware -- if you pass bytes in, you get
>> unpredictable results back.
>
> This seems an overgeneralization of a particular bug. There are APIs
> that are strictly text-in, text-out. There are others that are
> bytes-in, bytes-out. Let's call all those *pure*. For some operations
> it makes sense that the API is *polymorphic*, by which I mean that
> text-in causes text-out, and bytes-in causes bytes-out. All of these
> are fine.
>
> Perhaps there are more situations where a polymorphic API would be
> helpful. Such APIs are not always so easy to implement, because they
> have to be careful with literals or other constants (and even more so
> mutable state) used internally -- but it can be done, and there are
> plenty of examples in the stdlib.
>
> The real problem apparently lies in (what I believe is only a few
> rare) APIs that are text-or-bytes-in and always-text-out (or
> always-bytes-out). Let's call them *hybrid*. Clearly, mixing hybrid
> APIs in a stream of pure or polymorphic API calls is a problem,
> because they turn a pure or polymorphic overall operation into a
> hybrid one.
>
> There are also text-in, bytes-out or bytes-in, text-out APIs that are
> intended for encoding/decoding of course, but these are in a totally
> different class.
>
> Abstractly, it would be good if there were as few as possible hybrid
> APIs, many pure or polymorphic APIs (which it should be in a
> particular case is a pragmatic choice), and a limited number of
> encoding/decoding APIs, which should generally be invoked at the edges
> of the program (e.g., I/O).

Nice summary of part of the 'why' for Python 3.

> I still believe that the instances of bytes silently
> succeeding *some* of the time refers to specific bugs in specific
> APIs, either intentional because of misguided compatibility desires,
> or accidental in the haste of trying to convert the entire stdlib to
> Python 3 in a finite time.

I think http://bugs.python.org/issue5468 reports one aspect of haste:
missing encoding and errors parameters. But it has not gotten much
attention.

--
Terry Jan Reedy
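
The four API categories named in the quoted text (pure, polymorphic,
hybrid, and encoding/decoding) can be illustrated with a small Python 3
sketch. None of these functions come from the stdlib or from the
original thread; the names pure_text_title, strip_trailing_slash,
hybrid_join, and to_wire are hypothetical and exist only to show the
shape of each category. The latin-1 decoding inside hybrid_join is
likewise an arbitrary assumption chosen for the illustration.

```python
def pure_text_title(s):
    """Pure: text in, text out. Anything else is rejected up front."""
    if not isinstance(s, str):
        raise TypeError("expected str, got %s" % type(s).__name__)
    return s.title()


def strip_trailing_slash(path):
    """Polymorphic: str in gives str out, bytes in gives bytes out."""
    # The internal literal has to match the argument type; this is the
    # "careful with literals" cost mentioned in the quoted text.
    if isinstance(path, str):
        slash = "/"
    elif isinstance(path, bytes):
        slash = b"/"
    else:
        raise TypeError("expected str or bytes, got %s" % type(path).__name__)
    return path.rstrip(slash)


def hybrid_join(a, b):
    """Hybrid (the problematic kind): str or bytes in, always str out."""
    # Silently decoding bytes hides the type of the input from the caller.
    # latin-1 is an arbitrary choice made for this illustration.
    if isinstance(a, bytes):
        a = a.decode("latin-1")
    if isinstance(b, bytes):
        b = b.decode("latin-1")
    return a + "/" + b


def to_wire(text, encoding="utf-8", errors="strict"):
    """Encoding API: text in, bytes out, meant for the edges of a program."""
    return text.encode(encoding, errors)
```

A call chain built only from the first two kinds keeps its type from
end to end. Putting hybrid_join in the middle of a bytes pipeline turns
the result into str, so the next bytes-only step fails or misbehaves,
which is the mixing problem described above. Note that to_wire exposes
both encoding and errors, the pair of parameters the linked issue
reports as missing in some places.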