[Bernhard Herzog]
> Wouldn't it be possible to call the callbacks of all weakrefs that
> point to a cycle about to be destroyed before that destruction begins?

Yes, but GC couldn't also go on to call tp_clear then -- without deeper changes, the objects would have to leak.

Suppose objects I and J have (strong) references to each other -- they form a two-object cycle. Suppose I also holds a weakref to J, with a callback to a method of I. Suppose the cycle becomes unreachable.

GC detects that. It can also (with small changes to current code) detect that J has a weakref-associated callback, and invoke it. But when the callback returns, GC must stop trying to make progress: at that point it knows absolutely nothing anymore about the object graph, because there's absolutely nothing a callback can't do. In particular, because the callback in the example is a method of I, it has full access to I (via the callback's "self" argument), and because I has a strong reference to J, it also has full access to J. The callback can resurrect either or both the objects, and/or install new weakref callbacks on either or both, or even break the strong-reference cycle manually so that normal refcounting completely destroys I before the callback returns (although there's an obscure technical reason for why the callback can't completely destroy J before it returns -- I and J are different in this one respect).

If GC went on to, for example, execute tp_clear on I or J, tp_clear can leave behind an accessible (if the callback resurrected it) insane object, where "insane" means one that a user -- whether in innocence or by hostile design doesn't matter -- can exploit to crash the interpreter. For example, Jim has proven that a new-style class object is insane in this way after its tp_clear is invoked, and it's extremely easy to provoke one into segfaulting. Of course that's right out -- we're trying to repair a current segfault, not supply subtler ways to create segfaults.

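For concreteness, the I/J arrangement described above can be set up in a few lines of Python. This is only a sketch: the class names follow the example, while on_partner_dead, watch, probe and build_unreachable_cycle are names made up here. It deliberately does not assume whether the callback gets invoked, since that depends on the interpreter version and is the very thing under discussion. It only checks that the cycle is reclaimed.

```python
import gc
import weakref


class J:
    """The object that is watched through a weakref."""


class I:
    """Holds a strong reference to J and a weakref to J with a callback."""

    def __init__(self, partner):
        self.partner = partner       # strong reference I -> J
        partner.partner = self       # strong reference J -> I (closes the cycle)
        # The callback is a bound method of I, so it receives I as "self"
        # and can reach J through self.partner.
        self.watch = weakref.ref(partner, self.on_partner_dead)

    def on_partner_dead(self, ref):
        # Placeholder (hypothetical): a real callback could do anything here.
        pass


def build_unreachable_cycle():
    """Build the I/J cycle, drop all outside references, and collect it.

    Returns a callback-free probe weakref so the caller can observe
    whether the cycle was reclaimed.
    """
    j = J()
    i = I(j)
    probe = weakref.ref(i)   # no callback; used only for observation
    del i, j                 # the cycle is now unreachable
    gc.collect()
    return probe
```

Calling build_unreachable_cycle()() returns None once the cycle has been collected.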
We also have to do this within the boundaries of what can be sold for a bugfix release, so gross changes in semantics are also right out. In particular, we've never said that tp_clear has to leave an object in a usable state, so it would be a hard sell to start to demand that in a bugfix release.

Still, I want this to work. There's a saving grace here that __del__ methods don't have: if a __del__ method resurrects an object, there's nothing to stop the __del__ method from getting called again (when the refcount falls to 0 again). But weakref callbacks are *already* one-shot things: a given weakref callback destroys itself as part of the process of getting invoked. So once we've invoked a weakref callback for J, that callback is history. Sick code *in* the callback could install *another* weakref callback on J, so we have to be careful, but J's original callbacks are gone forever, and in almost all code will leave J callback-free.

As above, GC cannot go on to call tp_clear after invoking a callback. However, after invoking all the callbacks, it *could* start another "mini" gc cycle, taking the list of cyclic trash as its starting point (as "the generation" to be collected). This is the only way it can know what the post-callback state of the object graph is. In all sane code, this mini-gc run will discover that (a) all this stuff is still cyclic trash, and (b) none of it has weakref-callbacks anymore. *Then* it's safe to run through the list calling tp_clear methods.

In sick code (code that resurrects objects via a weakref callback, or registers new weakref callbacks to dead objects via a weakref callback), the mini gc run will automatically remove the resurrected objects from current consideration (they'll move to an older generation as a matter of course). It may even discover that nothing is trash anymore. If so, no harm done: because we haven't called tp_clear on anything, nothing has been damaged.

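The one-shot property of weakref callbacks that this plan leans on can be observed from Python. A minimal sketch, assuming a CPython recent enough to expose the __callback__ attribute on weakrefs; Target and demo_one_shot are names invented for the illustration.

```python
import gc
import weakref


class Target:
    """Plain object that supports weak references."""


def demo_one_shot():
    calls = []
    obj = Target()
    # The callback is invoked with the weakref object itself as its argument.
    ref = weakref.ref(obj, calls.append)
    del obj        # the last strong reference goes away
    gc.collect()   # only matters on implementations without refcounting
    gc.collect()   # further collections cannot fire the callback again
    return (len(calls),                # how many times the callback ran
            ref() is None,             # the referent is gone
            calls[0] is ref,           # the callback received the weakref
            ref.__callback__ is None)  # the callback has been dropped
```

demo_one_shot() returns (1, True, True, True): the callback ran exactly once and is no longer attached to the weakref.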
If there's some trash left with (necessarily) new weakref callbacks, we're back to where we started. We *could* proceed the same way then, but I'm afraid that would give actively hostile code a way to put gc into a never-ending loop. Instead I'd simply move those objects into the next generation, and let gc end then. Again, because we haven't called tp_clear on anything, nothing has been damaged in this case either.

A subtlety: instead of doing the "mini gc pass", why not just move the leftover objects into an older generation and let gc return right away then? The problem: any weakref callback in any cyclic trash would stop a complete invocation of gc from removing any trash then. A perfectly ordinary, non-hostile program, that happened to create lots of weakref callbacks in cyclic trash could then get into a state where every time gc runs, it finds one of these things, and despite that the app never does anything sick (like resurrecting in a callback), gc would never make any progress. The true purpose of the "mini gc pass" is to ensure that gc does make progress in sane code, and no matter how quickly and sustainedly it creates dead cycles containing weakref callbacks.

Terminology subtlety: the "mini" in "mini gc pass" refers to that the generation it starts with is presumably small, not to that this pass has an especially easy time of it. It still has to do all the work of deducing liveness and deadness from scratch. There are no shortcuts it can take here, simply because there's nothing a callback can't do. However, this pass should go quickly: it starts with what *was* entirely trash in cycles, and it's probably still entirely trash in cycles. This is maximally easy for Python's kind of cyclic gc (it chases all and only the objects in the dead cycles then -- it doesn't have to visit any objects outside the dead cycles, *unless* the cycles aren't truly dead anymore).

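The control flow proposed above (invoke callbacks, redo liveness on the trash list, then either clear or defer) can be sketched as a toy model in pure Python. Everything here is an assumption made for illustration: Obj, reachable, collect, run_sane and run_sick are invented names, liveness is computed by tracing from an explicit root list rather than by CPython's refcount-based method, callbacks receive the object itself rather than a weakref, and the "defer" step moves all leftover trash to the older generation.

```python
class Obj:
    """Toy heap object: strong references plus pending one-shot callbacks."""

    def __init__(self, name):
        self.name = name
        self.refs = []         # strong references to other Obj instances
        self.callbacks = []    # pending weakref-style callbacks
        self.cleared = False   # set once the toy "tp_clear" has run


def reachable(roots):
    """Return the set of objects reachable from roots via strong refs."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(obj.refs)
    return seen


def collect(young, roots, older):
    """Collect the `young` generation; return the objects that were cleared."""
    live = reachable(roots)
    trash = [o for o in young if o not in live]
    older.extend(o for o in young if o in live)

    # Step 1: invoke each callback once. They are detached before being
    # called, which models their one-shot nature.
    for obj in trash:
        pending, obj.callbacks = obj.callbacks, []
        for callback in pending:
            callback(obj)

    # Step 2: the "mini" pass. Liveness is recomputed from scratch because
    # a callback may have changed anything.
    live = reachable(roots)
    still_trash = [o for o in trash if o not in live]
    older.extend(o for o in trash if o in live)   # resurrected objects

    # Step 3: new callbacks appeared on trash, so defer rather than loop.
    if any(o.callbacks for o in still_trash):
        older.extend(still_trash)
        return []

    # Step 4: only now is it safe to run the toy "tp_clear".
    for obj in still_trash:
        obj.refs.clear()
        obj.cleared = True
    return still_trash


def make_cycle():
    i, j = Obj("I"), Obj("J")
    i.refs.append(j)
    j.refs.append(i)
    return i, j


def run_sane():
    i, j = make_cycle()
    log = []
    j.callbacks.append(lambda o: log.append(o.name))
    older = []
    cleared = collect([i, j], roots=[], older=older)
    return [o.name for o in cleared], log, older


def run_sick():
    i, j = make_cycle()
    roots = []
    j.callbacks.append(roots.append)   # resurrects J, and I through J
    older = []
    cleared = collect([i, j], roots, older)
    return cleared, sorted(o.name for o in older), i.cleared or j.cleared
```

In the sane run both objects are cleared after the callback has fired once. In the sick run the callback resurrects the cycle, nothing is cleared, and both objects end up in the older generation undamaged.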
So for sane programs, the mini gc pass adds gc time proportional to the number of pointers in the dead cycles, independent of the total number of objects. All cyclic trash found by all gc invocations consumes a little more time too, because we have to ask each trash object whether it has an associated weakref callback. In most programs, most of the time, the answer will be "no".
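The per-object question "does it have an associated weakref callback?" would be answered in C inside the collector, but a rough Python-level analogue gives the flavour. This is a sketch, not how gc does it: has_weakref_callback and Target are invented names, it relies on weakref.getweakrefs and the __callback__ attribute of recent CPython versions, and it ignores weakref proxies for simplicity.

```python
import weakref


class Target:
    """Plain object that supports weak references."""


def has_weakref_callback(obj):
    """True if any live weakref.ref to obj carries a callback.

    Simplification: proxy objects are skipped, since attribute access on
    a proxy is forwarded to the referent.
    """
    return any(
        ref.__callback__ is not None
        for ref in weakref.getweakrefs(obj)
        if isinstance(ref, weakref.ref)
    )
```

For an object with no weakrefs, or only callback-free ones, the answer is False, which is the common case mentioned above.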