On Sat, 10 Feb 2001, Andy Robinson wrote:
> > So far, no one has commented on this idea.
> >
> > I would like to go ahead and check in a patch which passes through
> > Unicode objects to the file-object's .write() method while leaving
> > the standard str() call for all other objects in place.
> >
> I'm behind this in principle.  Here's an example of why:
>
> >>> tokyo_utf8 = "東京"  # the kanji for Tokyo, trust me...
> >>> print tokyo_utf8  # this is 8-bit and prints fine
> 東京
> >>> tokyo_uni = codecs.utf_8_decode(tokyo_utf8)[0]
> >>> print tokyo_uni  # try to print the kanji
> Traceback (innermost last):
>   File "<interactive input>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)

Something like the following looks reasonable to me; the added
complexity is that the file object now remembers an encoder/decoder
pair in its state (the API might give the appearance of remembering
just the codec name, but we want to avoid doing codecs.lookup() on
every write), and uses it whenever write() is passed a Unicode object.

    >>> file = open('outputfile', 'w', 'utf-8')
    >>> file.encoding
    'utf-8'
    >>> file.write(tokyo_uni)  # tokyo_utf8 gets written to file
    >>> file.close()

Open questions:

  - If an encoding is specified, should file.read() then
    always return Unicode objects?

  - If an encoding is specified, should file.write() only
    accept Unicode objects and not bytestrings?

  - Is the encoding attribute mutable?  (I would prefer not,
    but then how to apply an encoding to sys.stdout?)

Side question: I noticed that the Lib/encodings directory supports
quite a few code pages, including Greek and Russian, but there are no
ISO-2022 CJK or JIS codecs.  Is this just because no one felt like
writing one, or is there a reason not to include one?  It seems to me
it might be nice to include some codecs for the most common CJK
encodings -- that recent note on the popularity of Python in Korea
comes to mind.
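
To make the "remember the encoder in the file object's state" idea
above concrete, here is a rough sketch.  It is written against current
Python 3 (where str is the Unicode type and bytes is the 8-bit type),
not the interpreter discussed in this thread.  The class name
EncodingWriter and its attributes are made up for illustration; only
codecs.lookup() and io.BytesIO are real library calls.  It assumes
that bytestrings are passed through untouched, which is just one
possible answer to the open questions above.

```python
import codecs
import io


class EncodingWriter:
    """Hypothetical wrapper that caches a codec's encoder at construction."""

    def __init__(self, stream, encoding):
        self.stream = stream
        info = codecs.lookup(encoding)  # looked up once, not on every write
        self.encoding = info.name       # expose only the codec name
        self._encode = info.encode      # cached stateless encoder

    def write(self, data):
        # Unicode text goes through the cached encoder;
        # bytestrings are written as-is (an assumption, see above).
        if isinstance(data, str):
            data, _consumed = self._encode(data)
        self.stream.write(data)


buf = io.BytesIO()                      # stands in for a real binary file
out = EncodingWriter(buf, "utf-8")
out.write("\u6771\u4eac")               # the kanji for Tokyo
out.write(b"\n")                        # 8-bit data passes straight through
result = buf.getvalue()
```

Whether write() should reject bytestrings instead of passing them
through is exactly the second open question; the sketch would need
only a raise in place of the pass-through branch.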
-- ?!ng

Happiness comes more from loving than being loved; and often when our
affection seems wounded it is only our vanity bleeding. To love, and
to be hurt often, and to love again--this is the brave and happy life.
    -- J. E. Buchrose