> My feeling on the unicode proposal and its implementation is that most
> of the changes can be integrated directly into JPython without breaking
> any existing JPython code. One thing concerns me though:
>
> open("out", "wb").write(u"hello")

(Note that the file is opened in *binary* mode; in text mode, this
would write the 5 bytes of "hello".)

> This writes 10 bytes to the file "out".
>
> I have two problems with that:
>
> 1. In Java, files are always byte-based. To move from unicode chars to
> bytes some kind of encoder must always be applied. It is also strange to
> see the actual byte layout of the data, which in my "out" file seems to
> be platform dependent. Is that the case? If it is, then the
> write(u"..") strikes me as somewhat random (unknown).
>
> 2. To get this behavior under JPython, it is necessary to introduce a
> new string type which in all other aspects is equal to the existing
> string type. Only when passed to file.write should the new string type
> return a faked representation of its memory. When a normal string is
> passed to .write, some byte representation of the string is written to
> the file. I would prefer that in JPython a unicode string is the same as
> a normal string (type("") == type(u"")).
>
> Perhaps the real reason for my dislike of this feature of the unicode
> implementation is based on my (from Java) assumption that a unicode
> character is an atomic data type.

Hm, I agree that it's not a great feature. On the other hand, it's hard
to decide what to do instead without breaking other corners of the
Unicode design. Could we leave this implementation-dependent?

--Guido van Rossum (home page: http://www.python.org/~guido/)
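
For illustration only, here is a minimal sketch of the "an encoder must always be applied" point from the quoted message. It is written for present-day Python 3, not the implementation under discussion, and it assumes the 10 bytes mentioned above come from a 2-byte-per-character internal buffer. The variable names are hypothetical.

```python
import sys

text = "hello"

# Naming the encoding explicitly fixes the byte layout on every platform.
utf16_le = text.encode("utf-16-le")  # 2 bytes per character, low byte first
utf16_be = text.encode("utf-16-be")  # 2 bytes per character, high byte first
utf8 = text.encode("utf-8")          # 1 byte per character for ASCII text

# Assumption: writing a raw 2-byte-per-character buffer would follow the
# host's byte order, which is what makes the output platform dependent.
native = utf16_le if sys.byteorder == "little" else utf16_be

print(len(utf16_le), len(utf16_be), len(utf8))
print(utf16_le == utf16_be)
```

The two 16-bit encodings have the same length but differ byte for byte, so a file written from the raw buffer on one machine may not read back correctly on another unless the byte order is agreed on.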