Martin v. Löwis wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>
>>I was talking about the *installed* size, ie. the size
>>of the package in site-packages:
>
> Right. And we are trying to tell you that this is irrelevant when
> talking about the size increase to be expected when JapaneseCodecs is
> incorporated into Python.

Why is it irrelevant? If it were irrelevant, Fredrik wouldn't
have invested so much time in trimming down the footprint of the
Unicode database.

What we need here is a generic approach which works for more than
just the Japanese codecs. I believe that those codecs could provide
a good basis for more codecs from the Asian locale, but before adding
megabytes of mapping tables, I'd prefer to settle on a good design
first.

>>Hisao's approach uses a single table which fits into 58kB Python
>>source code. Boil that down to a static C table and you'll end up
>>with something around 10-20kB for static C data.
>
> How did you obtain this number?

By looking at the code. It uses Unicode literals to define the table.

>>Hisao still builds a dictionary using this data, but perhaps
>>that step could be avoided using the same techniques that Fredrik
>>used in boiling down the size of the unicodedata module (which holds
>>the Unicode Database).
>
> Perhaps, yes. Have you studied the actual data to see whether these
> techniques might help or not?

It's just a hint: mapping tables are all about fast lookup vs. memory
consumption, and that's what Fredrik's approach of decomposition does
rather well (Tamito already uses such an approach).

cdb would provide an alternative approach, but there are licensing
problems...

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/
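
To illustrate the lookup-speed vs. memory trade-off behind the
"decomposition" idea mentioned in the message above, here is a minimal
sketch of a two-level table split. It is not Fredrik's or Tamito's actual
code: the names split_table and lookup, the block size and the toy data
are all assumptions made up for this example.

```python
def split_table(table, shift):
    """Split a flat table into (index, blocks), block size 2**shift.

    Identical blocks are stored only once, which is where the memory
    saving comes from when the table is sparse or repetitive.
    (Hypothetical helper, for illustration only.)
    """
    size = 1 << shift
    if len(table) % size:
        raise ValueError("table length must be a multiple of the block size")
    index = []    # one entry per block: offset of that block in `blocks`
    blocks = []   # concatenation of the distinct blocks
    seen = {}     # block contents -> offset in `blocks`
    for start in range(0, len(table), size):
        block = tuple(table[start:start + size])
        offset = seen.get(block)
        if offset is None:
            offset = len(blocks)
            seen[block] = offset
            blocks.extend(block)
        index.append(offset)
    return index, blocks


def lookup(index, blocks, shift, code):
    """Two array accesses instead of one dictionary lookup."""
    mask = (1 << shift) - 1
    return blocks[index[code >> shift] + (code & mask)]


# Toy data (an assumption): 4096 code points, only one run of 64 is mapped.
flat = [0] * 4096
for i in range(0x300, 0x340):
    flat[i] = 0x3000 + i

index, blocks = split_table(flat, shift=6)

# The decomposed form must return exactly what the flat table holds.
assert all(lookup(index, blocks, 6, c) == flat[c] for c in range(len(flat)))

# Entries in the flat table vs. entries in index + blocks.
print(len(flat), len(index) + len(blocks))  # prints: 4096 192
```

Whether real codec mapping tables compress this well depends on how the
mapped code points cluster, which is the question raised in the quoted
reply.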