Hello from Japan,

On 16 Jan 2003 11:05:55 +0100
martin@v.loewis.de (Martin v. Löwis) wrote:

> "M.-A. Lemburg" <mal@lemburg.com> writes:
>
> > Thoughts ?
>
> I'm in favour of adding support for Japanese codecs, but I wonder
> whether we shouldn't incorporate the C version of the Japanese codecs
> package instead, despite its size.

I also vote for JapaneseCodec.

As for its size, the JapaneseCodec package is much larger because it
contains both a C version and a pure Python version. The C version part
of JapaneseCodec is about 160 KB (compiled on the Windows platform), and
I don't think that makes a difference.

> *If* Suzuki's code is incorporated, I'd like to get independent
> confirmation that it is actually correct. I know Tamito has taken many
> iterations until it was correct, where "correct" is a somewhat fuzzy
> term, since there are some really tricky issues for which there is no
> single one correct solution (like whether \x5c is a backslash or a Yen
> sign, in these encodings).

Yes, Tamito's JapaneseCodec has been used for years by many Japanese
users, while I've never heard of Suzuki's.

> mapping tables are extracted from Java, through Jython.
>
> I also dislike absence of the cp932 encoding in Suzuki's codecs. The
> suggestion to equate this to "mbcs" on Windows is not convincing, as
> a) "mbcs" does not mean cp932 on all Windows installations, and b)
> cp932 needs to be processed on other systems, too.

Agreed.

> I *think* cp932
> could be implemented as a delta to shift-jis, as shown in
>
> http://hp.vector.co.jp/authors/VA003720/lpproj/test/cp932sj.htm
>
> (although I wonder why they don't list the backslash issue as a
> difference between shift-jis and cp932)
>

http://www.ingrid.org/java/i18n/unicode-utf8.html may be a better
reference. This page is written in English, in UTF-8.

--------------------------
Atsuo Ishimoto
ishimoto@gembook.org
Homepage: http://www.gembook.jp
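
The idea of treating cp932 as a delta to shift-jis can be illustrated
with a short sketch. It assumes a Python interpreter whose standard
library ships codecs named "shift_jis" and "cp932", which this thread
does not confirm. The sample bytes and the helper names decode_or_none
and delta are hypothetical and chosen only for illustration.

```python
# Sketch only. Assumption: the interpreter provides both the
# "shift_jis" and "cp932" codecs; the helpers below are illustrative.

SAMPLES = {
    "backslash / yen byte": b"\x5c",
    "wave dash vs. fullwidth tilde": b"\x81\x60",
    "NEC circled digit one": b"\x87\x40",
}


def decode_or_none(data, encoding):
    """Decode data, returning None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def delta(samples):
    """Return {label: (shift_jis result, cp932 result)} for every
    sample on which the two codecs disagree."""
    differences = {}
    for label, data in samples.items():
        sj = decode_or_none(data, "shift_jis")
        cp = decode_or_none(data, "cp932")
        if sj != cp:
            differences[label] = (sj, cp)
    return differences


if __name__ == "__main__":
    for label, (sj, cp) in delta(SAMPLES).items():
        print(f"{label}: shift_jis={sj!r} cp932={cp!r}")
```

Only the samples on which the two codecs disagree are printed. How the
\x5c byte is treated depends on the codec implementation in use, so the
sketch reports it rather than assuming an answer.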