FYI, I implemented the optimizations Vladimir and I discussed here.

Next, _PyMalloc_DebugDumpStats() is an entry point you can call in a debug build (or when PYMALLOC_DEBUG is enabled in a release build) to get a snapshot of pymalloc's internal structures. Perhaps it should be available in a release build without PYMALLOC_DEBUG too -- as is, you need PYMALLOC_DEBUG to get at it, and *because* PYMALLOC_DEBUG is enabled, every allocation is bumped by 16 bytes to make room for PYMALLOC_DEBUG's memory decorations.

Here's sample output (recently greatly improved), from near the tail end of a debug-build run of the test suite:

    Small block threshold = 256, in 32 size classes.
    pymalloc malloc+realloc called 4414692 times.

    class   num bytes   num pools   blocks in use  avail blocks
    -----   ---------   ---------   -------------  ------------
        5          48         773           64932             0
        6          56         266           19028           124
        7          64         288           18122            22
        8          72         124            6914            30
        9          80         178            8873            27
       10          88          41            1867            19
       11          96          28            1170             6
       12         104          21             798            21
       13         112          16             543            33
       14         120          11             359             4
       15         128           8             228            20
       16         136           5             141             4
       17         144           5             114            26
       18         152          13             295            43
       19         160           6             144             6
       20         168         138            3292            20
       21         176           5              96            19
       22         184           4              76            12
       23         192           3              43            20
       24         200           3              42            18
       25         208           3              40            17
       26         216           3              43            11
       27         224           2              29             7
       28         232           3              32            19
       29         240           2              21            11
       30         248           2              31             1
       31         256           2              21             9

    31 arenas * 262144 bytes/arena    = 8126464
    0 unused pools * 4096 bytes       = 0
    # bytes in allocated blocks       = 7796144
    # bytes in available blocks       = 69056
    # bytes lost to pool headers      = 62496
    # bytes lost to quantization      = 71792
    # bytes lost to arena alignment   = 126976
    Total                             = 8126464

Running the Unicode tests vastly increases the number of smallest blocks in use. The hump in the 168-byte class is due to small dicts.

Feel lightly encouraged to try calling this in your real programs now, and strongly encouraged after the memory-API rework is complete. Try very hard not to read too much into the test suite <wink>.
All I take from the above is that memory utilization is excellent; that fragmentation is trivial (e.g., in the 56-byte class, 124 available blocks * 56 bytes/block = 6944 bytes, more than a whole 4096-byte pool, so in an ideal world we *could* get away with 265 pools of this size instead of 266); and that the waste from tossing away "the ends" of arenas to get pool-aligned pools ("arena alignment", 126976 bytes here) is significant compared to the other kinds of pure waste in pymalloc, but that overall waste is low. ("Quantization" is the bytes lost because the usable space in a pool often isn't an exact multiple of the pool's block size.) Note that there's no accounting here for what's lost due to returning 8-byte aligned addresses.