We've got multiple segfault problems associated with weakref callbacks, and multiple problems of that kind coming from subtype_dealloc alone. Here's a piece of test_weakref.py in my 2.4 checkout; the first part of the test got fixed yesterday; the second part has not been checked in yet, because it still fails (in a release build it corrupts memory and that may not be visible; in a debug build it reliably segfaults, due to double deallocation): """ def test_sf_bug_840829(self): # "weakref callbacks and gc corrupt memory" # subtype_dealloc erroneously exposed a new-style instance # already in the process of getting deallocated to gc, # causing double-deallocation if the instance had a weakref # callback that triggered gc. # If the bug exists, there probably won't be an obvious symptom # in a release build. In a debug build, a segfault will occur # when the second attempt to remove the instance from the "list # of all objects" occurs. import gc class C(object): pass c = C() wr = weakref.ref(c, lambda ignore: gc.collect()) del c # There endeth the first part. It gets worse. del wr c1 = C() c1.i = C() wr = weakref.ref(c1.i, lambda ignore: gc.collect()) c2 = C() c2.c1 = c1 del c1 # still alive because c2 points to it # Now when subtype_dealloc gets called on c2, it's not enough just # that c2 is immune from gc while the weakref callbacks associated # with c2 execute (there are none in this 2nd half of the test, btw). # subtype_dealloc goes on to call the base classes' deallocs too, # so any gc triggered by weakref callbacks associated with anything # torn down by a base class dealloc can also trigger double # deallocation of c2. del c2 """ There are two identifiable (so far) problems in subtype_dealloc (note that these have nothing to do with Jim's current woes -- those are a different problem with weakref callbacks, and he hasn't yet hit the problems I'm talking about here -- but he will, eventually). 1. A weakref callback can resurrect self, but the code isn't aware of that now. It's not *easy* to resurrect self, and we probably thought it wasn't possible, but it is: if self is in a dead cycle, and the weakref callback invokes a method of an object in that cycle, self is visible to the callback (following the cycle links), and so self can get resurrected by the callback. The callback doesn't have to specifically try to resurrect self, it can happen as a side effect of resurrecting anything in the cycle from which self is reachable. 2. Unlike other dealloc routines, subtype_delloc leaves the object, with refcnt 0, tracked by gc. That's the cause of the now seemingly endless sequence of ways to provoke double deallocation: when a weakref callback is invoked at any time while subtype_dealloc is executing (whether the callback is associated with self, or with anything that dies as a result of any base class cleanup calls), and if gc happens to trigger while the callback is executing, and self happens to be in a generation gc is collecting, then the tracked refcount=0 self looks like garbage to gc, so gc does incref call tp_clear decref on it, and the decref knocks the refcount back down to 0 again thus triggering another deallocation (while the original deallocation is still in progress). To avoid #2, one of these two must be true: A. self is untracked at the time gc happens. B. self has a refcount > 0 at the time gc happens (e.g., the usual "temporarily resurrect" trick). I checked in a 0-byte change yesterday that repaired the first half of the test case, using #A (I simply moved the line that retracks self below the *immediate* weakref callback). But that same approach can't work for the rest of subtype_dealloc, for reasons you explained in a comment at the end of the function. Doing something of the #B flavor appears so far to work (meaning it fixes the rest of the test case, and hasn't triggered a new problem yet): """ Index: typeobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/typeobject.c,v retrieving revision 2.251 diff -u -r2.251 typeobject.c --- typeobject.c 12 Nov 2003 20:43:28 -0000 2.251 +++ typeobject.c 13 Nov 2003 17:57:07 -0000 @@ -667,6 +667,17 @@ goto endlabel; } + /* We're still not out of the woods: anything torn down by slots + * or a base class dealloc may also trigger gc in a weakref callback. + * For reasons explained at the end of the function, we have to + * keep self tracked now. The only other way to make gc harmless + * is to temporarily resurrect self. We couldn't do that before + * calling PyObject_ClearWeakRefs because that function raises + * an exception if its argument doesn't have a refcount of 0. + */ + assert(self->ob_refcnt == 0); + self->ob_refcnt = 1; + /* Clear slots up to the nearest base with a different tp_dealloc */ base = type; while ((basedealloc = base->tp_dealloc) == subtype_dealloc) { @@ -693,6 +704,8 @@ _PyObject_GC_UNTRACK(self); /* Call the base tp_dealloc() */ + assert(self->ob_refcnt == 1); + self->ob_refcnt = 0; assert(basedealloc); basedealloc(self); """ I'm not sure those asserts *can't* trigger, though (well, actually, I'm sure they can, if a weakref callback resurrects self -- but that's a different problem), and the code is getting <heh> obscure. Maybe that comes with the territory. So fresh eyeballs would help. The problems with resurrection are related to Jim's problem, in that tp_clear can leave behind insane objects, and those can kill us whether a callback provokes the insanity directly (as in Jim's case), or a resurrected insane object gets provoked sometime later. I sketched a different scheme for solving those in a long msg yesterday (it doesn't involve subtype_dealloc; it involves changing gc to be much more aware of the problems weakref callbacks can create).
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4