I have recently released errata-07, which improves JPython's ability to handle Unicode characters as well as binary data read from and written to Python files. The conversions can be described as:

- I/O to a file opened in binary mode will read/write the low 8 bits of each char. Writing Unicode chars >0xFF will cause silent truncation [*].

- I/O to a file opened in text mode will push each character through the default encoding for the platform (in addition to handling CR/LF issues).

This breaks completely with python1.6a2, but I believe it is close to the expectations of Java users. (The current JPython-1.1 behavior is completely useless for both characters and binary data; it only barely manages to handle 7-bit ASCII.)

In JPython (with the errata) we can do:

    f = open("test207.out", "w")
    f.write("\x20ac")     # On my w2k platform this writes 0x80 to the file.
    f.close()
    f = open("test207.out", "r")
    print hex(ord(f.read()))
    f.close()

    f = open("test207.out", "wb")
    f.write("\x20ac")     # On all platforms this writes 0xAC to the file.
    f.close()
    f = open("test207.out", "rb")
    print hex(ord(f.read()))
    f.close()

With the output of:

    0x20ac
    0xac

I do not expect anything like this in CPython. I just hope that all Unicode advice given on c.l.py comes with the modifier that JPython might do it differently.

regards,
finn

http://sourceforge.net/project/filelist.php?group_id=1842

[*] Silent overflow is bad, but it is at least twice as fast as having to check each char for overflow.
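P.S. The binary-mode rule above (keep only the low 8 bits of each char, truncating silently) can be sketched as follows. This is an illustration in modern Python syntax, not the JPython implementation; `to_binary_bytes` is a hypothetical helper name:

    # Sketch of the "low 8 bits per char" binary-mode rule described above.
    # Hypothetical helper, not part of JPython; modern Python 3 syntax.
    def to_binary_bytes(s):
        """Return one byte per character, keeping only the low 8 bits.

        Chars > 0xFF are silently truncated, matching the binary-mode
        behavior described in the post (no per-char overflow check)."""
        return bytes(ord(c) & 0xFF for c in s)

    data = to_binary_bytes("\u20ac")   # Euro sign, U+20AC
    print(hex(data[0]))                # prints 0xac

So U+20AC written in binary mode comes back as the single byte 0xAC, which is exactly the second line of output in the session above.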