> Open questions:
>
> - If an encoding is specified, should file.read() then
>   always return Unicode objects?
>
> - If an encoding is specified, should file.write() only
>   accept Unicode objects and not bytestrings?
>
> - Is the encoding attribute mutable? (I would prefer not,
>   but then how to apply an encoding to sys.stdout?)

Right now, codecs.open returns an instance of codecs.StreamReaderWriter, not a native file object. It has methods that look like the ones on a file, but they typically accept or return Unicode strings instead of binary ones. This feels right to me and is what Java does; if you want to switch encoding on sys.stdout, you are not really doing anything to the file object, just switching the wrapper you use.

There is much discussion on the i18n sig about 'unifying' binary and Unicode strings at the moment.

> Side question: i noticed that the Lib/encodings directory supports
> quite a few code pages, including Greek, Russian, but there are no
> ISO-2022 CJK or JIS codecs. Is this just because no one felt like
> writing one, or is there a reason not to include one? It seems to
> me it might be nice to include some codecs for the most common CJK
> encodings -- that recent note on the popularity of Python in Korea
> comes to mind.

There have been 3 contributions to Asian codecs on the i18n sig in the last six months (pythoncodecs.sourceforge.net) - one C, two J and one K - but some authors are uncomfortable with Python-style licenses. They need tying together into one integrated package with a test suite. After a 5-month-long project which tied me up, I have finally started looking at this.

The general feeling was that the Asian codecs package should be an optional download, but if we can get them fully tested and do some compression magic it would be nice to get them in the box one day.

- Andy Robinson
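The point above about switching the wrapper rather than changing the file object can be sketched roughly as follows. This is a minimal illustration written for current Python 3 (where `str` is Unicode text), not the code under discussion in the thread. `io.BytesIO` stands in for a byte-oriented stream such as the one beneath sys.stdout, and the helper name `wrap_stream` is hypothetical.

```python
import codecs
import io


def wrap_stream(binary_stream, encoding):
    """Return a codecs.StreamWriter that encodes text onto binary_stream.

    Hypothetical helper for illustration; only codecs.getwriter is a
    real library call here.
    """
    writer_class = codecs.getwriter(encoding)
    return writer_class(binary_stream)


# Stand-in for an underlying byte stream (an assumption for the sketch).
raw = io.BytesIO()

# Wrap the stream so that text written to it is encoded as Latin-1.
out = wrap_stream(raw, "latin-1")
out.write("caf\u00e9\n")

# "Switching encoding" means replacing the wrapper; the underlying
# stream object is untouched and keeps the bytes already written.
out = wrap_stream(raw, "utf-8")
out.write("caf\u00e9\n")

result = raw.getvalue()
```

The same text ends up as different bytes depending on which wrapper was active at the time of the write, while `raw` itself never carries an encoding attribute.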