[Kevin Jacobs] Nice job, Kevin! You learned a lot in a hurry here. I'll try to fill in some blanks. > ... > lst = [] > for i in range(100000): > lst.append( (1,) ) > > The key ingredients are: > > 1) A method is called on a container (rather than __setitem__ or > __setattr__). > > 2) A new object is allocated while the method object lives on the Python > VM stack, as shown by the disassembled bytecodes: > > 40 LOAD_FAST 1 (lst) > 43 LOAD_ATTR 3 (append) > 46 LOAD_CONST 2 (1) > 49 BUILD_TUPLE 1 > 52 CALL_FUNCTION 1 > > These ingredients combine in the following way to trigger quadratic-time > behavior in the Python garbage collector: > > * First, the LOAD_ATTR on "lst" for "append" is called, and a > PyCFunction is returned from this code in descrobject.c:method_get: > > return PyCFunction_New(descr->d_method, obj); > > Thus, a _new_ PyCFunction is allocated every time the method is > requested. In outline, so far all that has been true since 0 AP (After Python). > * This new method object is added to generation 0 of the garbage > collector, which holds a reference to "lst". It's a bug in 2.2.1 that the method object isn't getting added to gen0. It is added in current CVS. > * The BUILD_TUPLE call may then trigger a garbage collection cycle. > > * Since the "append" method is in generation 0, Yes. > the reference traversal must also follow all objects within "lst", > even if "lst" is in generation 1 or 2. According to me, it's a bug that it does so in current CVS, and a bug that's been in cyclic gc forever. This kind of gc scheme isn't "supposed to" chase old objects (there's no point to doing so -- if there is a reclaimable cycle in the young generation, the cycle is necessarily composed of pure young->young pointers, so chasing a cross-generation pointer can't yield any useful info). It's not a *semantic* error if it chases old objects too, but it does waste time, and can (as it does here) yank old objects back to a younger generation. I attached a brief patch to your bug report that stops this. > This traversal requires time linear in the number of > objects in "lst", thus increasing the overall time complexity of the > code to quadratic in the number of elements in "lst". Yes. Do note that this class of program is quadratic-time anyway, just because the rare gen2 traversals have to crawl over an ever-increasing lst too. BTW, the "range(100000)" in your test program also gets crawled over every time a gen2 collection occurs! That's why Neil made them rare <wink>. > Also note that this is a much more general problem than this > small example. Sure, although whether it's still "a real problem" after the patch is open to cost-benefit ridicule <wink>. > It can affect many types of objects in addition to methods, including > descriptors, iterator objects, and any other object that contains a "back > reference". > > So, what can be done about this.... One simple solution would be to not > traverse some "back references" if we are collecting objects in generation > 0. > > This will avoid traversing virtually all of these ephemoral > objects that will trigger such expensive behavior. If they live long > enough to pass through to generation one or two, then clearly they > should be traversed. > > So, what do all of you GC gurus think? Provided that my analysis > is sound, I can rapidly propose a patch to demonstrate this approach if > there is sufficient positive sentiment. Seeing a patch is the only way I'd understand your intent. You can understand my intent by reading my patch <wink>. > ... > > PS: I have not looked into why this doesn't happen in Python 2.2.x or > before. It's a bug in 2.2.1 (well, two bugs, if Neil accepts my claim that the patch I put up "fixes a bug" too). In 2.1, method objects hadn't yet been added to cyclic gc.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4