> > Slots can get you back most of this, but not all.  Dict lookup is
> > already extremely tight code, and when I profiled this, most of the
> > time was spent there -- twice as many lookup calls using new-style
> > classes than for classic classes.
>
> As I've said, and as Oren later demonstrated with code, the cost of a
> namespace dict lookup now is more in the layers of function call overhead
> than in the actual lookup.  We could whittle that down in Oren-like ways,
> although I'd rather we spent whatever time we can devote to stuff like this
> on advancing one of the more-general optimization schemes that were a hot
> topic before the Python conference.

Here's some info taken from a profile of a program that requests an
instance attribute of a new-style class without slots or properties
ten million times (using a for-loop over xrange(100000) with 100
attribute lookups (a.foo) in the for-loop body).

The following functions are called for each attribute lookup:

    #calls  seconds  name

       3     1.72    lookdict_string
       1     1.17    PyObject_GenericGetAttr
       1     1.10    _PyType_Lookup
       3     1.00    PyDict_GetItem
       1     0.45    _PyObject_GetDictPtr
       1     0.38    PyObject_GetAttr

      10     5.82    Subtotal

             3.28    eval_frame (one call!)

             9.10    Total

Here, "seconds" is the total time spent in that function over the
whole run, i.e. in 10 million times the number of calls shown.  In
addition, the program spent 3.28 seconds in 500 calls to eval_frame,
I assume nearly all of it in the one call that corresponds to the
body of the test function, so I've added that.

The call graph is as follows:

    eval_frame ->                       (10 million times)
      PyObject_GetAttr ->
        PyObject_GenericGetAttr ->
          _PyObject_GetDictPtr
          _PyType_Lookup ->
            PyDict_GetItem ->
              lookdict_string
            PyDict_GetItem ->
              lookdict_string
          PyDict_GetItem ->
            lookdict_string

If we want to be really aggressive about this, I suppose we could
inline all of that in PyObject_GenericGetAttr, for the case that the
name passes the PyString_CheckExact test and has a pre-calculated hash.
In particular, PyDict_GetItem then pretty much boils down to
"mp->ma_lookup(mp, key, hash)->me_value".  That should cut out 5
function calls.

A quick small gain would be to inline just the call to
_PyObject_GetDictPtr.  (I tried this; it saves about 2% on the total
running time of this particular test when not using the profiler.)
An intermediate gain would be to inline the call to _PyType_Lookup.

Here's the code I profiled:

============================================================================
class C(object):
    pass

def main():
    a = C()
    a.foo = 42
    for i in xrange(100000):
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo

main()
============================================================================

If I add __slots__ = ['foo'] to the class definition, here's the call
graph I get (prefixed with the total seconds for each function; each
function is called exactly once per attribute lookup in this case):

    3.22  eval_frame ->                 (10 million times)
    0.33    PyObject_GetAttr ->
    1.05      PyObject_GenericGetAttr ->
    0.35        PyDescr_IsData
    0.36        member_get ->
    0.15          descr_check ->
    0.27            PyObject_IsInstance
    0.44          PyMember_GetOne
    0.49        _PyType_Lookup ->
    0.35          PyDict_GetItem ->
    1.17            lookdict_string

    8.18  Total

This profile points out a bug in descr_check!  It calls
PyObject_IsInstance, which is a very general routine and hence
relatively expensive.
But descr_check's call to it always passes a genuine PyTypeObject as
the second argument, and we can in-line this by writing
PyObject_TypeCheck(obj, descr->d_type); that's a macro that may call
PyType_IsSubtype but in this case never needs to, saving about 6% on
the total running time of this particular test when not using the
profiler.

--Guido van Rossum (home page: http://www.python.org/~guido/)
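
The difference between the two call graphs above (three lookdict_string
calls without slots, one with) can be illustrated with a rough
pure-Python model of the lookup order.  This is only a sketch under
assumptions: type_lookup and generic_getattr are hypothetical names
standing in for _PyType_Lookup and PyObject_GenericGetAttr, the probe
counter just counts modelled dict lookups, and the model ignores
everything else the C code does (including the type attribute cache
that later CPython versions added).

```python
_MISSING = object()


def type_lookup(tp, name, probes):
    """Hypothetical stand-in for _PyType_Lookup: one dict probe per MRO class."""
    for klass in tp.__mro__:
        probes[0] += 1
        value = vars(klass).get(name, _MISSING)
        if value is not _MISSING:
            return value
    return _MISSING


def generic_getattr(obj, name):
    """Simplified model of the lookup order; returns (value, dict probes)."""
    probes = [0]
    tp = type(obj)
    descr = type_lookup(tp, name, probes)

    # A data descriptor found on the type (e.g. a slot) wins immediately.
    if descr is not _MISSING:
        dtype = type(descr)
        is_data = hasattr(dtype, "__set__") or hasattr(dtype, "__delete__")
        if hasattr(dtype, "__get__") and is_data:
            return dtype.__get__(descr, obj, tp), probes[0]

    # Otherwise probe the instance dict, if the object has one.
    inst_dict = getattr(obj, "__dict__", None)
    if inst_dict is not None:
        probes[0] += 1
        if name in inst_dict:
            return inst_dict[name], probes[0]

    # Fall back to a non-data descriptor or plain class attribute.
    if descr is not _MISSING:
        if hasattr(type(descr), "__get__"):
            return type(descr).__get__(descr, obj, tp), probes[0]
        return descr, probes[0]
    raise AttributeError(name)


class Plain(object):
    pass


class Slotted(object):
    __slots__ = ['foo']


if __name__ == "__main__":
    for cls in (Plain, Slotted):
        inst = cls()
        inst.foo = 42
        print(cls.__name__, generic_getattr(inst, "foo"))
```

In this model the plain instance costs two misses along the MRO (the
class and object) plus one hit in the instance dict, while the slotted
instance finds its descriptor on the first probe of the class dict,
which matches the number of PyDict_GetItem calls in each profile.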