Vladimir Marangozov wrote:
>
> > > [me]
> > >
> > > Perhaps it would make sense to move the Unicode database to the
> > > Python side (write it in Python)? Or init the database dynamically
> > > in the unicodedata module on import? It's quite big, so if it's
> > > possible to avoid the static declaration (and if the unicodedata module
> > > is enabled by default), I'd vote for a dynamic initialization of the
> > > database from reference (Python ?) file(s).
> >
> > [Marc-Andre]
> >
> > The unicodedatabase module contains the Unicode database
> > as static C data - this makes it shareable among (Python)
> > processes.
>
> The static data is shared if the module is a shared object (.so).
> If unicodedata is not a .so, then you'll have a separate copy of the
> database in each process.

Uhm, comparing Python 1.5 with the current CVS Python, I get these figures on Linux:

Executing: ./python -i -c '1/0'

Python 1.5: 1208 kB / 728 kB (resident/shared)
Python CVS: 1280 kB / 808 kB (resident/shared)

Not much of a change if you ask me, and the CVS version has the unicodedata module linked statically... so there's got to be some sharing and load-on-demand going on behind the scenes: this is what I was referring to when I mentioned static C data. The OS can deal with these sharing techniques and delayed loads much better than anything we could implement on top of it in C or Python. But perhaps this is Linux-specific...

> > Python modules don't provide this feature: instead a dictionary
> > would have to be built on import which would increase the heap
> > size considerably. Those dicts would *not* be shareable.
>
> I haven't mentioned dicts, have I? I suggested that the entries in the
> C version of the database be rewritten in Python (or a text file).
> The unicodedata module would, in its init function, allocate memory
> for the database and would populate it before returning "import okay"
> to Python -- this is one way to init the db dynamically, among others.
I'm leaving this as an exercise to the interested reader ;-) Really, if you have better ideas for the unicodedata module, please go ahead.

> As to sharing the database among different processes, this is a classic
> IPC problem, which has nothing to do with the static C declaration of the db.
> Or, hmmm, one of us is royally confused <wink>.

Could you check this on other platforms? Perhaps Linux is doing more than other OSes in this field.

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
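For anyone checking the resident/shared figures, here is a minimal sketch for Linux. It assumes that reading /proc/self/statm is an acceptable stand-in for whatever tool produced the numbers above, which the message does not name. The second and third fields of that file give the resident and shared sizes in pages, and the script converts them to kB. The helper name statm_kb is hypothetical.

```python
import os


def statm_kb(pid="self"):
    """Return (resident_kb, shared_kb) for a process (hypothetical helper).

    Linux-only assumption: /proc/<pid>/statm holds sizes in pages as
    'size resident shared text lib data dt'.
    """
    with open(f"/proc/{pid}/statm") as f:
        fields = f.read().split()
    page_size = os.sysconf("SC_PAGE_SIZE")  # bytes per page
    resident_pages, shared_pages = int(fields[1]), int(fields[2])
    # Convert page counts to kB using integer arithmetic.
    return resident_pages * page_size // 1024, shared_pages * page_size // 1024


if __name__ == "__main__":
    import unicodedata  # import the module under discussion before measuring

    resident, shared = statm_kb()
    print(f"{resident} kB / {shared} kB (resident/shared)")
```

On Linux, the "shared" field counts resident pages backed by files or shared memory. That is the kind of OS-level sharing the message attributes to static C data. Other platforms lack /proc/<pid>/statm, so there the numbers would have to come from a tool such as ps or top.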