> At pycon, I have been looking into Python startup time.
>
> I found that CVS-Python allocates roughly 12,000 string objects on
> startup, whereas Python 2.2 only allocates 8,000 string objects. In
> either case, most strings come from unmarshalling string objects,
> and the increase is (probably) due to the increased number of
> modules loaded at startup (up from 26 to 34).

But is this really where the time goes? On my home box (~11K
pystones/second) I can allocate 12K strings in 17 msec.

> The string objects allocated during unmarshalling are often quickly
> discarded after being allocated, as they are identifiers, and get
> interned - so only the interned version of the string survives, and
> the second copy is deallocated.
>
> I'd like to change the marshal format to perform sharing of equal
> strings, instead of marshalling the same identifiers multiple times.
> To do so, a dictionary of strings is created on marshalling and a
> list is created on unmarshalling, and a new marshal code for
> string-backreference would be added.
>
> What do you think?

Feels like a rather dicey incompatible change to marshal, and rather a
lot of work unless you know it is going to make a significant change.

It seems that marshalling would have to become a two-pass thing,
unless you want to limit that dict/list to function scope, in which
case I'm not sure it'll make much of a difference.

--Guido van Rossum (home page: http://www.python.org/~guido/)
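
For readers trying to picture the quoted proposal (a dictionary of strings on the writing side, a list on the reading side, and a back-reference code), here is a minimal toy sketch. It is not the real marshal format: the names dump_strings and load_strings, the "s"/"r" codes, and the tuple-based token stream are all hypothetical, chosen only for illustration. It also says nothing about whether the change would speed up startup.

```python
# Toy illustration only; this is NOT the real marshal format.
# dump_strings / load_strings and the "s"/"r" codes are hypothetical names.

def dump_strings(strings):
    """Encode a sequence of strings, emitting a back-reference for repeats."""
    seen = {}   # writer-side dict: string -> index of its first occurrence
    out = []
    for s in strings:
        if s in seen:
            out.append(("r", seen[s]))   # back-reference to an earlier string
        else:
            seen[s] = len(seen)
            out.append(("s", s))         # first occurrence: write the full string
    return out


def load_strings(tokens):
    """Decode the token stream produced by dump_strings."""
    table = []   # reader-side list, filled in order of first occurrence
    result = []
    for code, payload in tokens:
        if code == "s":
            table.append(payload)
            result.append(payload)
        elif code == "r":
            result.append(table[payload])   # reuse the object already loaded
        else:
            raise ValueError(f"unknown code: {code!r}")
    return result


names = ["self", "name", "self", "value", "name", "self"]
encoded = dump_strings(names)
decoded = load_strings(encoded)
```

In this toy, repeated identifiers decode to the very same string object rather than to a fresh copy, which is the allocation the quoted proposal aims to avoid. Indices are handed out in order of first occurrence, so the writer's dict and the reader's list stay in step here; whether that carries over to the nested objects the real marshaller handles is not something the sketch settles.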