Guido van Rossum wrote:
>>M.-A. Lemburg wrote:
>>
>>>Now that more and more codecs become available and the scope
>>>of those codecs goes far beyond only encoding from Unicode to
>>>strings and back, I am tempted to open up that restriction,
>>>thereby opening up u.encode() for applications that wish to
>>>use other codecs that return e.g. Unicode objects as well.
>>>[...]
>>>Note that codecs are not restricted in what they can return
>>>for their .encode() or .decode() method, so any object
>>>type is acceptable, including subclasses of str or
>>>unicode, buffers, mmapped files, etc.
>>
>>+1. I find it surprising that the restriction exists. I would have
>>thought u.encode('foo') would pretty transparently wrap the foo
>>codec's .encode().
>>
>>This is also a good reminder that type checking of the result of
>>codec or unicode .encode() calls is prudent, anytime.
>
>
> May I make one tiny objection?  I don't know if it's enough to stop
> this (I value it at -0.5 at most), but this will make reasoning about
> types harder.  Given that approaches like StarKiller and IronPython
> are likely the best way to get near-C speed for Python, I'd like the
> standard library at least to make life easy for their approach.
>
> The issue is that currently the type inferencer can know that the
> return type of u.encode(s) is 'unicode', assuming u's type is
> 'unicode'.  But with the proposed change, the return type will depend
> on the *value* of s, and I don't know how easy it is for the type
> inferencers to handle that case -- likely, a type inferencer will have
> to give up and say it returns 'object'.
>

If you use something like the Cartesian product algorithm (which is what
StarKiller uses), then a new return type is inferred for a method for
each different call signature.  But this pretty much only works with
Python code, since you need full access to the source to do the analysis
again.
With the Unicode stuff being done in C, you would have to just take the
lowest-common-denominator result, which would be 'object', since you
can't reanalyze the execution path for different call signatures unless
someone wants to take on the pain of type inferring C code.
Alternatively, this type of case could be taken into consideration when
developing a type inferencing framework that deals with C code, but that
just seems painful and overly complicated.

-Brett
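
As a rough illustration of the two points above (this is not code from
the thread): the first helper shows a return type that depends on the
*value* of the codec name, using Python 3's codecs.encode() as a
stand-in for the proposed u.encode() behaviour. The second part is a toy
sketch of the per-call-signature idea behind the Cartesian product
algorithm. The names encoded_type, ToyInferencer, infer_call and double
are hypothetical, and "analysis" is faked by simply running the function
on sample arguments, which a real inferencer would not do.

```python
import codecs


def encoded_type(data, codec_name):
    """Return the type name of codecs.encode(data, codec_name)."""
    return type(codecs.encode(data, codec_name)).__name__


class ToyInferencer:
    """Hypothetical sketch: one cached return type per argument-type tuple."""

    def __init__(self):
        # (function name, argument type names) -> inferred return type name
        self.templates = {}

    def infer_call(self, func, *args, analyzable=True):
        # Code whose source cannot be analysed (e.g. implemented in C)
        # gets the lowest-common-denominator answer.
        if not analyzable:
            return "object"
        key = (func.__name__, tuple(type(a).__name__ for a in args))
        if key not in self.templates:
            # Assumption: executing the function stands in for
            # re-analysing its source for this call signature.
            self.templates[key] = type(func(*args)).__name__
        return self.templates[key]


def double(x):
    """Pure-Python function whose return type follows its argument type."""
    return x + x


if __name__ == "__main__":
    # Same argument types (str, str), different result types.
    print(encoded_type("abc", "utf-8"))
    print(encoded_type("abc", "rot_13"))

    inferencer = ToyInferencer()
    print(inferencer.infer_call(double, 1))
    print(inferencer.infer_call(double, "a"))
    print(inferencer.infer_call(codecs.encode, "abc", "utf-8", analyzable=False))
```

Note that both encoded_type() calls have the argument types (str, str),
so a cache keyed only on argument types cannot tell them apart; that is
the value-dependence Guido objects to.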