On Wed, 15 Mar 2000, Vladimir Marangozov wrote:

> > [me]
> > >
> > > Perhaps it would make sense to move the Unicode database on the
> > > Python side (write it in Python)? Or init the database dynamically
> > > in the unicodedata module on import? It's quite big, so if it's
> > > possible to avoid the static declaration (and if the unicodedata
> > > module is enabled by default), I'd vote for a dynamic initialization
> > > of the database from reference (Python?) file(s).
>
> [Marc-Andre]
> >
> > The unicodedatabase module contains the Unicode database
> > as static C data - this makes it shareable among (Python)
> > processes.
>
> The static data is shared if the module is a shared object (.so).
> If unicodedata is not a .so, then you'll have a separate copy of the
> database in each process.

Nope. A shared module means that multiple executables can share the code. Whether the const data resides in an executable or a .so, the OS will map it into read-only memory and share it across all processes.

> > Python modules don't provide this feature: instead a dictionary
> > would have to be built on import which would increase the heap
> > size considerably. Those dicts would *not* be shareable.
>
> I haven't mentioned dicts, have I? I suggested that the entries in the
> C version of the database be rewritten in Python (or a text file).
> The unicodedata module would, in its init function, allocate memory
> for the database and would populate it before returning "import okay"
> to Python -- this is one way to init the db dynamically, among others.

This would place all that data into the per-process heap. Definitely not shared, and definitely a big hit for each Python process.

> As to sharing the database among different processes, this is a classic
> IPC pb, which has nothing to do with the static C declaration of the db.
> Or, hmmm, one of us is royally confused <wink>.

This isn't IPC. It is sharing of some constant data.
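The "static C data" approach under discussion can be sketched as a minimal, hypothetical example (the real unicodedatabase module is a generated table many orders of magnitude larger; the record layout and category codes below are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* A tiny, hypothetical slice of a Unicode property table. Because it is
 * declared static const, the compiler places it in the read-only data
 * segment; the OS maps that segment read-only and shares the physical
 * pages across every process running this executable or loading this
 * .so. No heap allocation happens at import time. */
typedef struct {
    unsigned int code;      /* code point */
    unsigned char category; /* e.g. 1 = letter, 2 = digit (illustrative) */
} _Record;

static const _Record _db[] = {
    { 0x0030, 2 },  /* DIGIT ZERO */
    { 0x0041, 1 },  /* LATIN CAPITAL LETTER A */
    { 0x0391, 1 },  /* GREEK CAPITAL LETTER ALPHA */
};

/* Binary search over the sorted table; returns 0 if not found. */
static unsigned char lookup_category(unsigned int code)
{
    size_t lo = 0, hi = sizeof(_db) / sizeof(_db[0]);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (_db[mid].code == code)
            return _db[mid].category;
        if (_db[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

By contrast, the dynamic-init alternative would parse a text file at import time and build this table on the heap, giving each process its own private copy of the same bytes.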
The most effective way to manage this is through const C data. The OS will manage it properly. And sorry, David, but mmap'ing a file will simply add complexity. As jcw mentioned, the OS is pretty much doing this anyhow when it deals with a const data segment in your executable.

I don't believe this is Linux-specific. This kind of stuff has been done for a *long* time on other platforms, too.

Side note: the most effective way of exposing this const data up to Python (without shoving it onto the heap) is through buffers created via:

    PyBuffer_FromMemory(ptr, size)

This allows the data to reside in const, shared memory while it is also exposed up to Python.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
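The key property of PyBuffer_FromMemory (a Python 2.x C API call; later CPython versions offer PyMemoryView_FromMemory for the same purpose) is that it wraps an existing pointer/size pair without copying the bytes. The zero-copy idea can be sketched in plain C with a hypothetical view struct standing in for the Python buffer object:

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical stand-in for what PyBuffer_FromMemory(ptr, size) does:
 * it records a pointer and a length but never copies the bytes. The
 * underlying const data therefore stays in the shared read-only segment;
 * only this small view object needs per-process storage. */
typedef struct {
    const void *ptr;
    size_t size;
} BufferView;

static BufferView view_from_memory(const void *ptr, size_t size)
{
    BufferView v;
    v.ptr = ptr;   /* same address as the const data -- no copy made */
    v.size = size;
    return v;
}

/* The const database: lives in the read-only data segment, shared
 * across processes by the OS. */
static const unsigned char unicode_db[] = { 0x01, 0x02, 0x03, 0x04 };
```

The equivalent move in an extension module's init function would be to call PyBuffer_FromMemory on the static table and store the resulting buffer object in the module's dict, so Python code reads the shared pages directly.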