> Martin v. Loewis wrote:
> > "M.-A. Lemburg" <mal@lemburg.com> writes:
> >
> >
> >>The fact that StringIO works with Unicode (and then only in the
> >>case where you *only* pass Unicode to it) is more an implementation
> >>detail than a true feature.
> >
> > It's a true feature. You explicitly fixed that feature in
> >
> > revision 1.20
> > date: 2002/01/06 17:15:05; author: lemburg; state: Exp; lines: +8 -5
> > Restore Python 2.1 StringIO.py behaviour: support concatenating
> > Unicode string snippets to larger Unicode strings.
> >
> > This fix should also go into Python 2.2.1.
> >
> > after you broke it in
> >
> > revision 1.19
> > date: 2001/09/24 17:34:52; author: lemburg; state: Exp; lines: +4 -1
> > branches: 1.19.12;
> > StringIO patch #462596: let's [c]StringIO accept read buffers on
> > input to .write() too.
>
> I doubt that it's a true feature. The fact that I broke it
> in the above patch by introducing the str(data) call in
> StringIO.py suggests that whoever complained about this change
> was using an implementation detail rather than a documented
> and originally intended feature of StringIO.
>
> If you need something like StringIO for Unicode then I would
> suggest to create a similar object which then only deals with
> Unicode, e.g. UnicodeIO.

But since StringIO already works for Unicode, why bother?

> cStringIO could then be extended to also support such an object
> by using the same trick as SRE does to support two native
> types (putting the code into a .h file and then including
> it twice).

(Off-topic: each time I fix a bug twice, once in stringobject.c and
once in unicodeobject.c, I wish we'd done that for string and unicode
objects. But it's too late now, and also may not be realistic given
some different implementation choices.)

> Back to the original question. I don't have a problem with
> leaving in the Unicode support in StringIO's .write() method,
> but the introduction of the Unicode print support should not
> rely on this detail.

Agreed.

> Instead someone wanting to write Unicode
> only to a StringIO like object should be directed to UnicodeIO.
>
> Now, to satisfy the request of the poster who wanted support for
> __unicode__ in PyFile_WriteObject() we need to add something
> which lets PyFile_WriteObject() determine whether to look
> for __unicode__ or not (per default, it passes through
> Unicode objects as-is and applies str() to all other objects).
>
> I like the idea of using the .encoding attribute as flag
> for this. What I don't like is that setting it to None
> should be used for Unicode-only streams (ones that take
> Unicode on input and use Unicode on output). To me,
> .encoding = None would signal: this stream doesn't do anything
> to the input data and passes it to the output stream as-is.

But I'm not sure that's a useful feature. Maybe encoding=None could
mean the current StringIO behavior. <0.5 wink>

> Much better, IMHO, would be to use .encoding = 'unicode'
> on Unicode-only streams such as the mentioned UnicodeIO
> object.

Yes. (Except 'unicode' is not an encoding name, right? Maybe it
should be?)

> In summary, StringIO objects should not implement .encoding
> while a new Unicode-only stream-like object UnicodeIO
> should have .encoding = 'unicode'.
>
> The same could then be done with the corresponding cStringIO
> objects.
>
> PS: Some may not know, but the obvious way of fixing printing
> of Unicode by adding a tp_print slot implementation does not
> work, since that slot takes a FILE* pointer as file "object"
> which, of course, cannot include any additional information
> such as the encoding.

Yes, tp_print is only an optimization for tp_repr and tp_str when
writing to a "real" file object.

--Guido van Rossum (home page: http://www.python.org/~guido/)
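
For readers following the thread, here is a minimal sketch of the dispatch being discussed, under stated assumptions. It is written for present-day Python, where str is already the Unicode type, so it illustrates the idea and is not the Python 2.2 code. UnicodeIO is the name proposed above, but its body here is invented, and write_object() is a hypothetical pure-Python stand-in for the decision PyFile_WriteObject() would have to make. It is not the C API.

```python
class UnicodeIO:
    """Hypothetical text-only in-memory stream (name taken from the thread)."""

    # Flag value suggested in the thread for Unicode-only streams.
    encoding = 'unicode'

    def __init__(self):
        self._chunks = []

    def write(self, data):
        # Accept text only; refuse anything else rather than coercing it.
        if not isinstance(data, str):
            raise TypeError("UnicodeIO.write() expects text, got %s"
                            % type(data).__name__)
        self._chunks.append(data)

    def getvalue(self):
        return ''.join(self._chunks)


class PlainSink:
    """Hypothetical stream with no .encoding attribute at all."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


def write_object(stream, obj):
    """Assumed dispatch rule: consult __unicode__ only on flagged streams."""
    if isinstance(obj, str):
        # Text objects are passed through as-is.
        stream.write(obj)
        return
    to_text = getattr(type(obj), '__unicode__', None)
    if getattr(stream, 'encoding', None) == 'unicode' and to_text is not None:
        stream.write(to_text(obj))
    else:
        # Default behaviour described in the thread: apply str().
        stream.write(str(obj))


class Greeting:
    """Example object offering both conversions."""

    def __unicode__(self):
        return 'gr\u00fc\u00df'

    def __str__(self):
        return 'gruss'


text_out = UnicodeIO()
write_object(text_out, Greeting())   # goes through __unicode__

plain_out = PlainSink()
write_object(plain_out, Greeting())  # falls back to str()
```

The flag lives on the stream, so objects need no knowledge of where they are being printed, and streams without the attribute keep the existing str() behaviour.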