Hye-Shik Chang wrote:
> I have planned a few things to update in cjkcodecs before 2.4 alpha1
> is out. If you have any opinions or objections, please tell me.
>
> 1. Update JIS X 0213 to its first amendment (a.k.a. JIS X 0213:2004)
>    This will introduce three new encodings: euc-jis-2004, shift_jis-2004
>    and iso-2022-jp-2004. They are not so different from their
>    preceding encodings but we may need to keep both versions due
>    to incompatibilities and the encoding name change. (This won't bloat
>    code size a lot. I expect it around 3~5K.)

+1

> 2. Merge two or three similar C codecs into one. We currently have one C
>    codec for each Python codec. I have got an
>    idea to merge them into several similar groups so that many common
>    parts of the .so binaries will be saved:
>
>    _codecs_jacodecs_1.so: euc-jp, shift-jis, iso-2022-jp,
>                           iso-2022-jp-1, iso-2022-jp-ext
>    _codecs_jacodecs_2.so: euc-jisx0213, shift-jisx0213, iso-2022-jp-3,
>                           euc-jis-2004, shift-jis-2004,
>                           iso-2022-jp-2004
>    _codecs_jacodecs_3.so: iso-2022-jp-2
>    _codecs_kocodecs_1.so: euc-kr, johab, iso-2022-kr
>    _codecs_kocodecs_2.so: cp949
>    _codecs_zhcodecs_1.so: gb2312, gbk, gb18030, hz
>    _codecs_zhcodecs_2.so: big5, cp950

+1, but why not put all Japanese codecs into one module, and ditto for
the Korean and Chinese ones?

Note that today's OS linkers will only mmap those pieces of code into
the process memory that are actually needed by the application, so even
though the size of the modules increases, the application process
memory footprint is likely not to increase.

> 3. Split some mapping keeper modules into a few group-based modules. This
>    will save memory and space for those who need only legacy codecs like
>    "euc-kr only".
>
>    _codecs_mapdata_ko_KR ->
>        _codecs_komapdata_1.so: KS X 1001
>        _codecs_komapdata_2.so: cp949
>
>    _codecs_mapdata_ja_JP ->
>        _codecs_jamapdata_1.so: JIS X 0208, JIS X 0212
>        _codecs_jamapdata_2.so: JIS X 0213:2000 and :2004
>
>    _codecs_mapdata_zh_CN ->
>        _codecs_zhmapdata_1.so: gb2312, gbk, gb18030
>
>    _codecs_mapdata_zh_TW ->
>        _codecs_zhmapdata_2.so: big5, cp950
>

-1

See above: this is static C data, so splitting these won't really
buy the user anything. If you don't believe this, compare the resident
size of Python with and without unicodedata loaded. The difference
on my machine is a measly 30kB, not the 250kB of the complete
module.

> If these sound acceptable to python-dev people, they will be
> implemented as CJKCodecs 1.1 first and imported into Python later
> (before 2.4a1).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 16 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
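
The comparison suggested above (resident size of Python with and without
unicodedata loaded) could be tried roughly as follows. This is only a
sketch under several assumptions: it needs a Unix-like system (the
resource module), it reads peak resident set size (ru_maxrss) as a
stand-in for resident size, and the names peak_rss and CHILD are made up
for this illustration. Numbers vary between runs and platforms, so do
not expect it to reproduce the 30kB figure.

```python
import subprocess
import sys

# Script run in a fresh interpreter: optionally import one module, then
# print this process's peak resident set size as reported by getrusage().
CHILD = """
import resource, sys
if len(sys.argv) > 1:
    __import__(sys.argv[1])
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""


def peak_rss(module=None):
    """Return ru_maxrss of a fresh interpreter, optionally after importing `module`."""
    args = [sys.executable, "-c", CHILD]
    if module is not None:
        args.append(module)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    base = peak_rss()
    loaded = peak_rss("unicodedata")
    # Units are platform-dependent: kilobytes on Linux, bytes on macOS.
    print("baseline:", base)
    print("with unicodedata:", loaded)
    print("difference:", loaded - base)
```

Each measurement runs in its own interpreter so that an earlier import
cannot influence the baseline. Averaging several runs would give a
steadier difference.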