I think I have a reasonably elegant scheme to nail this. It's best described as modifications to what cyclic gc already does. So here's a summary, with the new steps identified by [NEW]: Cyclic gc first finds the maximal subset S of the objects in the current generation such that no object in S is reachable from outside of S. S is the (possibly empty) set of cyclic trash in the current generation. Next partition S into 5 ([NEW] -- is 3 today) disjoint sets: 1. Objects with __del__ methods. 2. Objects not in #1 reachable from an object in #1. 3. [NEW] Objects not in #1 or #2 with an associated weakref callback. 4. [NEW] Objects not in #1, #2 or #3 reachable from an object in #3. 5. Objects not in one of the other sets. Then: A. Call tp_clear on each object in set 5 (set 5 may mutate while this is going on, so that needs some care). If an object's refcount isn't 0 after calling its tp_clear, move it to the next older generation (that doesn't preclude that a later tp_clear in this step may reclaim it). B. [NEW] Invoke the callbacks associated with the objects still in set 3. This also needs some care, as the deallocations occurring in step #A may remove objects from set 3, or even just remove the weak references to them so that the objects in set 3 are still there, but no longer have an associated callback. I expect we'd have to contrive code to make that happen, but we have to be safe against every possibility. The callbacks invoked during this step may also remove callbacks from objects in set 3 we haven't yet gotten to, or even add new callbacks to objects in sets 1 through 4. C. [NEW] Move the objects still remaining in sets 3 and 4 to the youngest generation. D. Move the objects still remaining in set 1 to gc.garbage. E. Move the objects still remaining in set 2 to the next (older) generation. That's telegraphic, and is bursting with subtleties. Here are notes on the new subtleties: + A key observation is that running weakref callbacks on the objects in set 3 can't have any effect on the objects in set 5, nor can the states of the objects in set 5 affect what a callback may want to do. This is so because no object in set 5 is reachable from an object in set 3: a callback can neither consult nor alter a set 5 object. So clearing set 5 first (in step A) is harmless, and should allow most cyclic trash in most programs to get collected ASAP. + Clearing the objects in set 5 first is desirable also because doing so may break enough links that objects in sets 1 thru 4 get deallocated naturally (meaning via the usual refcount-falls-to-0 route). Note that it's quite possible that objects in sets 1 thru 4 are reachable from objects in set 5 -- it's the other direction where reachability can't hold (by construction of the partition, not by luck). + By the start of B, tp_clear hasn't been called on anything reachable from sets 3 or 4, so the callbacks "see" wholly intact objects. Nothing visible to the callbacks has been torn down: __dicts__ are still fully populated, __mro__ slots are still as they were, etc. Step B doesn't do any tp_clear itself either, so the only mutations that occur are those performed by the callbacks. If a callback destroys a piece of state some other callback wanted, that's entirely on the user's head. + Because a weakref callback destroys itself after it's called, in non-pathological programs no object in set 3 or 4 will have a weakref callback associated with it at the end of step B. We cannot go on to call tp_clear on these objects, because the instant the first callback returns, we have no idea anymore which of these objects are still part of cyclic trash (the callbacks can resurrect any or all of them, ditto add new callbacks to any/all). Determining whether they are still trash requires doing live/dead analysis over from scratch. Simply moving them into *some* generation ensures that they'll get analyzed again on a future run of cyclic gc. Moving them into the youngest generation is done because they almost certainly are (in almost all programs, almost all of the time) still cyclic trash, and without new weakref callbacks. Putting them in the youngest generation allows them to get reclaimed on the next gc invocation. In steady state for a sane program creating a sustained stream of cyclic trash with associated weakref callbacks, this delays their collection by one gc invocation: the reclamation throughput should equal the rate of trash creation, but there's a one-invocation reclamation latency introduced at the start. There's no new latency in invoking the callbacks. + Because we still won't collect cyclic trash with __del__ methods, or cyclic trash reachable from such trash, we do the partitioning in such a way that weakref callbacks on such trash don't get called at all -- we're not even going to try to reclaim them, so it may be surprising if their callbacks get invoked. OTOH, it may be desired that their callbacks get invoked despite that gc will never try to reclaim them on its own. Tough luck. The callbacks will get invoked if and when the user breaks enough cycles in gc.garbage to avoid running afoul of the __del__ restriction. Objections? Great <wink> objections are of two kinds: (1) it won't work; and (2) it can't be sold for a bugfix release. Note that 2.3.2 is segfaulting today, so *something* has to be done for a bugfix release. I don't believe this scheme alters any defined semantics, and to the contrary makes it possible to say for the first time that objects visible to callbacks are never in mysteriously (and undefinedly so) partly-destroyed states. Objecting that the order of callback invocation isn't defined doesn't hold, because the order isn't defined in 2.3.2 either. Tempting as it may be, a scheme that refused to collect cyclic trash with associated weakref callbacks would be an incompatible change; Jim also has a use case for that (a billion lines of Zope3 <wink>).
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4