[Martin von Loewis] > I was looking into making pymalloc the default a few weeks ago, when I > noticed that allocations is pretty efficient on Linux. I studied glibc > malloc a bit to see that it is indeed quite clever about allocation, > even in the presence of MT locks (with per-arena locks, and recording > a per-thread arena in a thread-local variable). Is this Doug Lea's malloc implementation <ftp://g.oswego.edu/pub/misc/malloc.c>? Can you tell whether it was the one Roeland used in his "before" test case run? > I believe the main difference in deallocation speed (which I didn't > notice then) comes from the attempt to combine subsequent memory > blocks in glibc; pymalloc doesn't attempt to do so. That would be my guess; I traced bad deallocation performance to free() coalescing in a different system long ago, and it seems a common problem. > Also, there *is* a lock acquire/release cycle in glibc malloc, which > may account for some overhead, too. Sure. Doug Lea sez (see above) "If you are using malloc in a concurrent program, you would be far better off obtaining ptmalloc ..."; slapping a lock around a thread-naive malloc is merely correct. > Looking at obmalloc.c, I see one aspect that I'd consider > questionable: detection of pool-vs-system-malloc uses a heuristics, > namely > > offset = (off_t )p & POOL_SIZE_MASK; > pool = (poolp )((block *)p - offset); > if (pool->pooladdr != pool || pool->magic != (uint )POOL_MAGIC) { > _SYSTEM_FREE(p); > return; > } > > So if the pool header has the pool magic and points to itself, it > really is a pool header. That may result in a system malloc being > recognized as a pool malloc, under rare circumstances. As a result, > the system heap will be corrupted. Yes. I believe Vladimir offered to buy us lunch if that ever happened in a real program, so he's betting at least US$4 that it won't -- and nothing else in Python is backed by that much hard cash <wink>. The tradeoff is that he gets a lot of comparative storage efficiency out of this trick, and some speed too (no per-object doubly-linked free lists to maintain, no need to store size fields). I don't see a way to make it bulletproof without giving that up, short of over-allocating "large requests" enough so that a hidden pool_header struct can be stuffed at a POOL_SIZE-aligned address. Then the above could become offset = (off_t )p & POOL_SIZE_MASK; pool = (poolp )((block *)p - offset); assert(pool->pooladdr == pool); /* pool->magic member no longer exists; use, e.g., pool->szidx == (uint)-1 as a flag to mean "this block came from the system malloc". */ if (pool->szidx == (uint)-1)) { _SYSTEM_FREE(p); return; } POOL_SIZE is 4KB, though, and that's a lot of wasted space. OTOH, a "large object arena" could be introduced too, and carved up much like the current one. I suppose the bottom line is that you just can't make address tricks bulletproof unless you arrange-- by hook or by crook --to control the addresses you pass out (so you never call malloc at all, or wrap what it does). > Another aspect that may be troubling is that obmalloc never returns > memory to the system. Ah, but it does for "large objects" obtained via malloc() -- they get free()d individually. > It just detects free pools, and is willing to rearrange a free pool for > use with a different object size. Of course, there is little chance > that a complete arena will ever become empty, so this is probably not > that bad. It's also nothing really new, and *could* lead to an improvement: the hope was that PyMalloc could be made fast enough that Python could drop all its fiddly little type-specific free lists. Those never return memory either, and, worse, can't even recycle memory across types; for example, if some phase of your program uses a million floats at one time, and the next phase only retains their sum, you're stuck forever with space for a million (minus 1 <wink>) dead float objects, sitting in floatobject.c's block_list.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4