Guido van Rossum wrote:
>
> [GvR]
> > > Hm... There's also the problem that there's no easy way to do Unicode
> > > I/O. I'd like to have a way to turn a particular file into a Unicode
> > > output device (where the actual encoding might be UTF-8 or UTF-16 or a
> > > local encoding), which should mean that writing Unicode objects to the
> > > file should "do the right thing" (in particular should not try to
> > > coerce it to an 8-bit string using the default encoding first, like
> > > print and str() currently do) and that writing 8-bit string objects to
> > > it should first convert them to Unicode using the default encoding
> > > (meaning that at least ASCII strings can be written to a Unicode file
> > > without having to specify a conversion). I suppose that reading from
> > > a "Unicode file" should always return a Unicode string object (even if
> > > the actual characters read all happen to fall in the ASCII range).
> > >
> > > This requires some serious changes to the current I/O mechanisms; in
> > > particular str() needs to be fixed, or perhaps a ustr() needs to be
> > > added that is used in certain cases. Tricky, tricky!
>
> [MAL]
> > It's not all that tricky since you can write a StreamRecoder
> > subclass which implements this. AFAIR, I posted such an implementation
> > on i18n-sig.
> >
> > BTW, one of my patches on SF adds unistr(). Could be that it's
> > time to apply it :-)
>
> Adding unistr() and StreamRecoder isn't enough. The problem is that
> when you set sys.stdout to a StreamRecoder, the print statement
> doesn't do the right thing! Try it. print u"foo" will work, but
> print u"\u1234" will fail because print always applies the default
> encoding.

Hmm, that's due to PyFile_WriteObject() calling PyObject_Str(). Perhaps we ought to let it call PyObject_Unicode() (which you can find in the patch on SF) instead for Unicode objects. That way the file-like object's .write() method will be given a Unicode object and StreamRecoder could then do the trick.
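As a rough illustration of the "Unicode file" behavior described above, here is a minimal sketch in modern Python 3, not the Python 2 APIs under discussion. `UnicodeFile` and `unistr` are hypothetical names chosen for this example; this `unistr` is not the implementation from the SF patch, and ASCII is assumed to stand in for the "default encoding". Unicode text is encoded exactly once with the file's own encoding, 8-bit (`bytes`) input is first decoded with the default encoding, and `read()` always returns `str`.

```python
import codecs
import io

DEFAULT_ENCODING = "ascii"  # assumption: stands in for Python 2's default encoding


def unistr(obj, default_encoding=DEFAULT_ENCODING):
    """Hypothetical unistr(): always return a Unicode (str) object.

    Byte strings are decoded with the default encoding; everything else
    goes through str(), which already returns Unicode in Python 3.
    """
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).decode(default_encoding)
    return str(obj)


class UnicodeFile:
    """Hypothetical wrapper turning a binary stream into a 'Unicode file'."""

    def __init__(self, stream, encoding="utf-8"):
        self.stream = stream      # underlying binary stream (e.g. io.BytesIO)
        self.encoding = encoding  # on-the-wire encoding: UTF-8, UTF-16, a local codec...
        # Incremental codecs emit a UTF-16 BOM only once and keep partial
        # multi-byte sequences between read() calls.
        self._encoder = codecs.getincrementalencoder(encoding)()
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def write(self, obj):
        # Unicode passes through untouched; 8-bit strings are converted with
        # the default encoding first. Either way the text is encoded exactly
        # once, with this file's encoding.
        text = unistr(obj)
        self.stream.write(self._encoder.encode(text))
        return len(text)

    def read(self, size=-1):
        # Always returns str, even if every character read is ASCII.
        data = self.stream.read(size)
        final = size is None or size < 0 or not data
        return self._decoder.decode(data, final=final)


out = UnicodeFile(io.BytesIO(), encoding="utf-8")
out.write("\u1234")        # Unicode object: encoded as UTF-8
out.write(b"foo")          # 8-bit string: decoded as ASCII, then encoded
print("\u1234", file=out)  # print() hands str objects to out.write()
```

Python 3's `print()` converts its arguments with `str()` and passes the resulting text objects to the file's `write()`, so a wrapper like this receives Unicode directly; that is roughly the effect the `PyObject_Unicode()` change proposed above aims for.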
Haven't tried this, but it could work (the paths objects take through Python to get printed are somewhat strange at times -- there are just so many different possibilities and special cases that it becomes hard to tell from just looking at the code).

> The required changes to print are what's tricky. Whether we even need
> unistr() depends on the solution we find there.

I think we'll need PyObject_Unicode() and unistr() one way or another. Those two APIs simply complement PyObject_Str() and str() in that they always return Unicode objects and do the necessary conversion based on the input object type.

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/