Martin v. Löwis wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>
>>Thoughts ?
>
> I'm in favour of adding support for Japanese codecs, but I wonder
> whether we shouldn't incorporate the C version of the Japanese codecs
> package instead, despite its size.

I was suggesting to make Suzuki's codecs the default. That doesn't
prevent Tamito's codecs from working, since these are inside a package.

If someone wants the C codecs, we should provide them as separate
download right alongside of the standard distro (as discussed
several times before).

Note that the C codecs are not as easy to modify to special needs
as the Python ones. While this may seem unnecessary I've heard
from a few people that especially companies tend to extend the
mappings with their own set of company specific code points.

> I would also suggest that it might be more worthwhile to expose
> platform codecs, which would give us all CJK codecs on a number of
> major platforms, with a minimum increase in the size of the Python
> distribution, and with very good performance.

+1

We already have this on Windows (via the mbcs codec). If you could
contribute your iconv codecs under the PSF license we'd go a long
way in that direction on Unix as well.

> *If* Suzuki's code is incorporated, I'd like to get independent
> confirmation that it is actually correct.

Since he built the codecs on the mappings in Java, this looks
like enough third party confirmation already.

> I know Tamito has taken many
> iterations until it was correct, where "correct" is a somewhat fuzzy
> term, since there are some really tricky issues for which there is no
> single one correct solution (like whether \x5c is a backslash or a Yen
> sign, in these encodings). I notice (with surprise) that the actual
> mapping tables are extracted from Java, through Jython.

Indeed. I think that this kind of approach is a good one in the
light of the "correctness" problems you mention above. It also
helps with the compatibility side.

> I also dislike absence of the cp932 encoding in Suzuki's codecs. The
> suggestion to equate this to "mbcs" on Windows is not convincing, as
> a) "mbcs" does not mean cp932 on all Windows installations, and b)
> cp932 needs to be processed on other systems, too. I *think* cp932
> could be implemented as a delta to shift-jis, as shown in
>
> http://hp.vector.co.jp/authors/VA003720/lpproj/test/cp932sj.htm
>
> (although I wonder why they don't list the backslash issue as a
> difference between shift-jis and cp932)

As always: contributions are welcome :-)

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/
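For illustration only, here is a minimal sketch of the "delta on top of a base codec" idea raised in the thread. It is not Suzuki's or Tamito's code. The names `DELTA_DECODE`, `is_lead_byte` and `decode_with_delta` are hypothetical, the delta table holds a single illustrative entry rather than the real cp932 differences, and the lead-byte ranges are the usual Shift_JIS ones, assumed here. The same mechanism could carry company-specific code points by passing a different table.

```python
# Hypothetical delta table: byte sequences whose mapping differs from, or is
# missing in, the base codec. The single entry below is illustrative only.
DELTA_DECODE = {
    b"\x87\x40": "\u2460",  # CIRCLED DIGIT ONE (a vendor extension row)
}


def is_lead_byte(b):
    """Return True if b starts a two-byte sequence (assumed Shift_JIS ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def decode_with_delta(data, base="shift_jis", delta=DELTA_DECODE):
    """Decode bytes, consulting the delta table first and the base codec second."""
    out = []
    i = 0
    while i < len(data):
        # Take one or two bytes depending on whether this is a lead byte.
        width = 2 if is_lead_byte(data[i]) else 1
        chunk = bytes(data[i:i + width])
        if chunk in delta:
            out.append(delta[chunk])
        else:
            # Fall back to the base codec; it raises UnicodeDecodeError
            # for sequences it does not know.
            out.append(chunk.decode(base))
        i += width
    return "".join(out)
```

A company-specific extension would then be a matter of supplying its own table, for example `decode_with_delta(data, delta={b"\xf0\x40": "\ue000"})`, where the private-use target is again only an assumed example. Decoding character by character in Python is slow, which is one side of the size-versus-speed trade-off discussed above.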