On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote:

> Now that you brought up specific numbers, I tried to verify them,
> and found them correct (although a bit unfortunate); please see my
> test script below. Up to 21800 interned strings, the dict takes (only)
> 384KiB. It then grows, requiring 1536KiB. Whether or not having 22k
> interned strings is "typical", I still don't know.
>
> Wrt. your proposed change, I would be worried about maintainability,
> in particular if it would copy parts of the set implementation.

I connected to a random one of our processes, which has been running
for a typical amount of time and is currently at ~300MB RSS.

(gdb) p *(PyDictObject*)interned
$2 = {ob_refcnt = 1, ob_type = 0x8121240, ma_fill = 97239,
      ma_used = 95959, ma_mask = 262143, ma_table = 0xa493c008, ....}

Going from 3MB to 2.25MB isn't much, but it's not nothing, either.

I'd be skeptical of cache-performance arguments, given that the strings
used in any particular bit of code should be spread pretty much evenly
throughout the hash table, and 3MB is solidly bigger than any L2 cache
I know of.

You should be able to get meaningful numbers out of a C profiler, but
I'd be surprised to see the act of interning take a noticeable amount
of time.

-jake
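
P.S. A back-of-the-envelope sketch of where the 3MB figure comes from
(Martin's actual test script wasn't included here; the per-entry sizes
below are my assumption for a 32-bit Python 2.x build, where a dictentry
is 12 bytes -- me_hash, me_key, me_value -- and a setentry is 8 bytes --
hash, key):

    # Estimate table memory for the interned dict in the gdb dump above.
    ma_mask = 262143             # from the gdb output
    slots = ma_mask + 1          # table size is always a power of two

    dict_table = slots * 12     # hash + key + value pointers
    set_table = slots * 8       # hash + key only; no value slot

    print("dict table: %.2f MiB" % (dict_table / 1048576.0))  # -> 3.00
    print("set table:  %.2f MiB" % (set_table / 1048576.0))   # -> 2.00

The set comes out at 2.0 MiB rather than the 2.25MB cited above, so the
quoted figure presumably assumes some additional per-entry overhead;
either way, the saving is the dropped value pointer, about a third of
the table.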