Guido van Rossum wrote:
>
> > If we agree to merge the semantics of the two APIs, then str()
> > would have to change too: is this desirable ? (IMHO, yes)
>
> Not clear. Which is why I'm backing off from my initial support for
> merging the two.
>
> I believe unicode() (which is really just an interface to
> PyUnicode_FromEncodedObject()) currently already does too much. In
> particular this whole business with calling __str__ on instances seems
> to me to be unnecessary. I think it should *only* bother to look for
> something that supports the buffer interface (checking for regular
> strings only as a tiny optimization), or existing unicode objects.

Hmm, unicode() should (just like str()) take an object and convert
it to a Unicode string. Since many objects either don't support
the tp_str slot (instances don't for some reason -- just like they
don't tp_call), I had to add some special cases to make Python
instances compatible to Unicode in the same way str() does.

What I think is really needed is a concept for "stringification"
in Python. We currently have these schemes:

1. tp_str
2. method __str__ (not only of Python instances, but any object)
3. character buffer interface

These three could easily be unified into the tp_str slot: e.g.
tp_str could do the necessary magic to call __str__ or the
buffer interface.

Note that the same is true for e.g. tp_call -- the special cases
we have in ceval.c for the different builtin callable objects
would not be necessary if they would implement tp_call.

> > Here's what we could do:
> >
> > a) merge the semantics of unistr() into unicode()
> > b) apply the same semantics in str()
> > c) remove unistr() -- how's that for a short-living builtin ;)
> >
> > About the semantics:
> >
> > These should be backward compatible to str() in that everything
> > that worked before should continue to work after the merge.
> >
> > A strawman for processing str() and unicode():
> >
> > 1. strings/Unicode is passed back as-is
>
> I hope you mean str() passes 8-bit strings back as-is, unicode()
> passes Unicode strings back as-is, right?

Right.

> > 2. tp_str is tried
> > 3. the method __str__ is tried
>
> Shouldn't have to -- instances should define tp_str and all the magic
> for calling __str__ should be there. I don't understand why it's not
> done that way, probably just for historical reasons. I also don't
> think __str__ should be tried for non-instance types.

Ok.

> But, more seriously, I believe tp_str or __str__ shouldn't be tried at
> all by unicode().

Hmm, but how would you implement generic conversion to Unicode then ?
We'll need some way for instances (and other types) to provide a
conversion to Unicode. Some time ago we discussed this issue and
came to the conclusion that tp_str should be allowed to return
Unicode data instead of inventing a new tp_unicode slot for this
purpose.

> > 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer)
> > 5. for str(): Unicode return values are converted to strings using
> >    the default encoding
> >    for unicode(): Unicode return values are passed back as-is;
> >    string return values are decoded according to the
> >    encoding parameter
> > 6. the return object is type-checked: str() will always return
> >    a string object, unicode() always a Unicode object
> >
> > Note that passing back Unicode is only allowed in case no encoding
> > was given. Otherwise an exception is raised: you can't decode
> > Unicode.
> >
> > As extension we could add encoding and error parameters to str()
> > as well. The result would be either an encoding of Unicode objects
> > passed back by tp_str or __str__ or a recoding of string objects
> > returned by checks 2, 3 or 4.
>
> Naaaah!

Would be nice for symmetry and useful in the light of making
Unicode the only string type in Py4k ;-)

> > If we agree to take this approach, then we should remove the
> > unistr() Python API before the alpha ships.
>
> Frankly, I believe we need more time to sort this out, and therefore I
> propose to remove the unistr() built-in before the release. Marc,
> would you do the honors?

Ok. I'll remove the builtin and the docs, but will leave the
PyObject_Unicode() API enabled.

--
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
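
For illustration, the strawman steps 1-6 quoted above could be modeled roughly as follows. This is only a sketch in present-day Python, where `bytes` stands in for the 8-bit string type and `str` for Unicode. The names `to_unicode`, `to_bytes`, `_raw` and `DEFAULT_ENCODING` are hypothetical, the tp_str and __str__ steps collapse into a single lookup, and an object that only inherits `object.__str__` is treated as having no string conversion of its own. None of this is the actual PyObject_Unicode() implementation.

```python
DEFAULT_ENCODING = "ascii"  # assumption: stand-in for the interpreter default


def _raw(obj):
    """Steps 1-4: obtain a bytes or str value from obj."""
    # Step 1: string types are passed through untouched.
    if isinstance(obj, (bytes, str)):
        return obj
    # Steps 2 and 3: tp_str / __str__. The method is called directly so
    # that a bytes result is not rejected, as str(obj) would do today.
    str_method = type(obj).__str__
    if str_method is not object.__str__:
        return str_method(obj)
    # Step 4: character buffer interface, modeled here with memoryview.
    # Objects without a buffer raise TypeError at this point.
    return memoryview(obj).tobytes()


def to_unicode(obj, encoding=None, errors="strict"):
    """Model of unicode(): always returns a Unicode (str) object."""
    value = _raw(obj)
    if isinstance(value, str):
        # Unicode may only be passed back when no encoding was given.
        if encoding is not None:
            raise TypeError("decoding Unicode is not supported")
        return value
    if isinstance(value, bytes):
        # Step 5: 8-bit data is decoded according to the encoding parameter.
        return value.decode(encoding or DEFAULT_ENCODING, errors)
    # Step 6: the return object is type-checked.
    raise TypeError("conversion returned a non-string object")


def to_bytes(obj):
    """Model of str(): always returns an 8-bit string (bytes) object."""
    value = _raw(obj)
    if isinstance(value, str):
        # Step 5: Unicode is converted using the default encoding.
        return value.encode(DEFAULT_ENCODING)
    if isinstance(value, bytes):
        return value
    # Step 6: the return object is type-checked.
    raise TypeError("conversion returned a non-string object")
```

With this model, an instance whose `__str__` returns Unicode data passes through `to_unicode()` unchanged, while `to_bytes()` encodes the same data with the default encoding, which is the symmetry the strawman aims for.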