> If we agree to merge the semantics of the two APIs, then str()
> would have to change too: is this desirable? (IMHO, yes)

Not clear. Which is why I'm backing off from my initial support for
merging the two.

I believe unicode() (which is really just an interface to
PyUnicode_FromEncodedObject()) currently already does too much. In
particular, this whole business with calling __str__ on instances seems
to me to be unnecessary. I think it should *only* bother to look for
something that supports the buffer interface (checking for regular
strings only as a tiny optimization), or existing unicode objects.

> Here's what we could do:
>
> a) merge the semantics of unistr() into unicode()
> b) apply the same semantics in str()
> c) remove unistr() -- how's that for a short-lived builtin ;)
>
> About the semantics:
>
> These should be backward compatible to str() in that everything
> that worked before should continue to work after the merge.
>
> A strawman for processing str() and unicode():
>
> 1. strings/Unicode is passed back as-is

I hope you mean str() passes 8-bit strings back as-is, and unicode()
passes Unicode strings back as-is, right?

> 2. tp_str is tried
> 3. the method __str__ is tried

Shouldn't have to: instances should define tp_str, and all the magic
for calling __str__ should be there. I don't understand why it's not
done that way; probably just for historical reasons. I also don't
think __str__ should be tried for non-instance types.

But, more seriously, I believe tp_str or __str__ shouldn't be tried at
all by unicode().

> 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer)
> 5. for str(): Unicode return values are converted to strings using
>    the default encoding
>    for unicode(): Unicode return values are passed back as-is;
>    string return values are decoded according to the
>    encoding parameter
> 6. the return object is type-checked: str() will always return
>    a string object, unicode() always a Unicode object
>
> Note that passing back Unicode is only allowed in case no encoding
> was given. Otherwise an exception is raised: you can't decode
> Unicode.
>
> As an extension we could add encoding and error parameters to str()
> as well. The result would be either an encoding of Unicode objects
> passed back by tp_str or __str__ or a recoding of string objects
> returned by checks 2, 3 or 4.

Naaaah!

> If we agree to take this approach, then we should remove the
> unistr() Python API before the alpha ships.

Frankly, I believe we need more time to sort this out, and therefore I
propose to remove the unistr() built-in before the release. Marc,
would you do the honors?

--Guido van Rossum (home page: http://www.python.org/~guido/)
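Guido's suggestion that unicode() should *only* accept existing Unicode objects or objects supporting the buffer interface, with 8-bit strings checked first as a tiny optimization, can be sketched in modern Python 3 terms. This is a hypothetical illustration, not the actual CPython implementation: it assumes 8-bit strings map to `bytes`, Unicode objects map to `str`, and the buffer interface is reached through `memoryview`. The function name `buffer_only_unicode` and its `"ascii"` default (standing in for the interpreter's default encoding) are likewise assumptions.

```python
def buffer_only_unicode(obj, encoding="ascii", errors="strict"):
    """Hypothetical unicode() that never consults tp_str or __str__.

    Accepts existing Unicode objects (str), 8-bit strings (bytes), or any
    object exposing the buffer interface; everything else is rejected.
    """
    # Existing Unicode objects are passed back as-is.
    if isinstance(obj, str):
        return obj
    # Tiny optimization: plain 8-bit strings skip the buffer machinery.
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    # Otherwise only the buffer interface is tried; __str__ is ignored.
    try:
        with memoryview(obj) as view:
            data = view.tobytes()
    except TypeError:
        raise TypeError(
            f"need a string or buffer, got {type(obj).__name__}"
        ) from None
    return data.decode(encoding, errors)


class Greeting:
    def __str__(self):
        return "hello"


print(buffer_only_unicode(b"caf\xc3\xa9", "utf-8"))  # café
print(buffer_only_unicode(bytearray(b"spam")))       # spam
try:
    buffer_only_unicode(Greeting())
except TypeError as exc:
    # Greeting defines __str__ but no buffer, so it is rejected.
    print("rejected:", exc)
```

The design choice this illustrates is that an instance defining `__str__` is rejected rather than silently converted, which is exactly the behavior change Guido is arguing for.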