"M.-A. Lemburg" <mal@lemburg.com> writes:

> I was talking about the *installed* size, ie. the size
> of the package in site-packages:

Right. And we are trying to tell you that this is irrelevant when
talking about the size increase to be expected when JapaneseCodecs is
incorporated into Python.

> degas site-packages/japanese# du
> 337     ./c
> 1252    ./mappings
> 88      ./python
> 8       ./aliases

You should ignore mappings and python in your counting, they are not
needed.

> I wonder whether it wouldn't be possible to use the same tricks
> Hisao used in his codec for a C version.

I believe it does use the same tricks. It's just that the
JapaneseCodecs package supports a number of widely-used encodings
which Hisao's package does not support.

> The source code size is not that important. The install size
> is and even more the memory footprint.

Computing the memory footprint is very difficult, of course.

> Hisao's approach uses a single table which fits into 58kB Python
> source code. Boil that down to a static C table and you'll end up
> with something around 10-20kB for static C data.

How did you obtain this number?

> Hisao does still builds a dictionary using this data, but perhaps
> that step could be avoided using the same techniques that Fredrik
> used in boiling down the size of the unicodedata module (which holds
> the Unicode Database).

Perhaps, yes. Have you studied the actual data to see whether these
techniques might help or not?

Regards,
Martin
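
For readers unfamiliar with the kind of table compression alluded to
above, here is a minimal sketch of a generic two-stage table split:
the flat table is cut into fixed-size blocks, identical blocks are
stored once, and a small index maps each block position to its stored
copy. This is an illustration only, not the actual unicodedata or
JapaneseCodecs code. The names split_table and lookup, the block size,
and the sample data are all hypothetical, and whether a real codec
mapping compresses well depends on how repetitive its blocks are.

```python
def split_table(values, shift):
    """Split a flat table into (index, blocks), storing each distinct block once."""
    size = 1 << shift
    # Pad with zeros so the length is a multiple of the block size.
    padded = list(values) + [0] * (-len(values) % size)
    index = []    # one entry per block position: number of the stored block
    blocks = []   # concatenation of the distinct blocks
    seen = {}     # block contents -> number of the stored block
    for start in range(0, len(padded), size):
        block = tuple(padded[start:start + size])
        if block not in seen:
            seen[block] = len(blocks) >> shift
            blocks.extend(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, shift, i):
    """Return the value the original flat table held at position i."""
    mask = (1 << shift) - 1
    return blocks[(index[i >> shift] << shift) + (i & mask)]


# Hypothetical sparse mapping: only 64 of 4096 entries are non-zero.
table = [0] * 4096
for cp in range(0x100, 0x140):
    table[cp] = cp + 0x8000

index, blocks = split_table(table, shift=6)
```

With this made-up data the 4096-entry table reduces to an index plus
two distinct 64-entry blocks, and lookup() returns the same values as
indexing the original table directly. A real mapping table is much
less regular, so the gain has to be measured on the actual data.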