Since the spring of 2003, I have been developing Crux, which is a computer program for phylogenetic inferencing (bioinformatics) research. In March of 2004, I switched Crux to using Python from having used a different embeddable interpreter. For the most part, I have been very happy with Python, but Python's garbage collector has been a major source of frustration. Below, I describe my trials and tribulations with Python's GC. I also offer some suggestions for changes to Python; if any of the proposed changes receive favorable feedback, I am willing to help develop patches to Python. Naturally, if I am somehow abusing Python, and there are better ways to do things, I'd be happy to hear how to improve Crux. The important aspect of Crux is that it deals with trees. These trees are unrooted (there is no up or down), and multifurcating (nodes can have an arbitrary number of neighboring nodes). Thus, the trees are self-referential, and without the cyclic GC capabilities of Python, there would be little hope of making these trees integrate well with Python. Following is a diagram that illustrates the linkage between various objects for a simple tree. Crux exposes all of the components of the trees as Python objects. All lines in the diagram represent bi-directional references (except for the T-->N line). Every object refers to the tree object; those lines are left out in order to reduce clutter. T: Tree N N N: Node \ / E: Edge R R R: Ring \ / E E \ / R---------R / \ / \ / \ / \ | \ / | | \ / | | T--->N | | | | \ | / \ | / \----R----/ | E | R | N At the C (not Python object) level, the R-E-R construct is actually a set of structures that are allocated/deallocated as a single unit. Edges are *always* connected to two rings, so there's no point in allocating these separately. Also, lone ring objects actually are rings with one member; they refer to themselves (prev and next pointers). That should be enough information to understand the problems I encountered. 1) I originally had lone rings refer to themselves (ob_refcnt started out at 3; 2 self-references and one reference held by the associated edge). This didn't work. It appears that the cyclic GC does not actually calculate the number of live references to objects (references that are reachable by traversing all objects accessible from the root set); instead it assumes that if tp_clear() doesn't drop the reference count to a number that equals the number of references from live objects, there must still be references from live objects. Unfortunately, visit()ing self does not work, so there is no way to convince Python that all references are from unreachable objects. Working around this in Crux requires a lot of extra reference counting complexity, because there are three different cases for reference counts, depending on how many members there are in a ring (1, 2, or 3+ members). 2) This issue is really just a different manifestation of issue (1). At the C (not Python object) level, each node only actually stores a pointer to a single member of the associated ring. Given a single ring member, it is possible to traverse the ring and reach all other ring members. As mentioned in issue (1), the cyclic GC expects tp_traverse() to call visit() once for each reference held. It is not enough for a node to visit() one ring member; it must visit() all ring members, in order for the GC to count how many references are from unreachable objects, versus reachable from the root set. In summary, issues (1) and (2) are due to how the cyclic GC does the "marking" phase of mark/sweep GC. My expectation of mark/sweep GC is that it should be sufficient to assure that all objects reachable from the root set are visit()ed at least once; it should not be important how many times each unreachable object is visit()ed. I don't have a deep enough understanding of the Python interpreter to give a detailed suggestion for improvement. I suspect that determining the root set is not an easy operation; if this is the case, then I think we're stuck with the current design. If determining the root set *is* easy (or even possible), then I would suggest using a standard mark/sweep collector, where unreachable objects are scheduled for destruction. tp_traverse(), tp_clear(), and tp_dealloc() would retain the same structure; the only difference would be the logic that determines which objects can be destroyed. 3) A strange thing can happen when tp_clear() is called. Suppose that an edge object is being cleared, and it drops its references to the associated rings. If ob_refcnt of one of the rings drops to 0 as a result, Python will tp_dealloc() the ring *right* *now*, without ever calling tp_clear() on the ring. That means that I have to keep track of whether tp_clear() has been called on objects, if it is at all important that tp_clear() be called, so that I can manually do so in tp_dealloc(), if necessary. It is in my opinion reasonable to have cleanup code in tp_clear(), with the assumption that it will be called precisely once, but Python makes me do extra work to make sure that this happens. This should be pretty easy to change. A single bit per object is needed to keep track of whether tp_clear() has been called. I think this only needs to be done for object types that support cyclic GC. 4) There is no way to control the order in which objects are tp_dealloc()ed. This is a problem for the R-E-R construct, since at a low level, these objects are always linked together. What I would like to do is defer tp_dealloc() on the edge until after both rings have been deleted. Instead, I am forced to use a reference-counted deletion function. Not calling self->ob_type->tp_free() on the edge in tp_dealloc() until later is not a reasonable option, because this defers deletion of the edge until a later round of garbage collection. This could be addressed in the Python interpreter by paying heed to the return value of tp_dealloc(). If the return value is non-zero, move the object to the end of the list of objects to be destroyed, so that destruction is tried later. This allows the module to impose its own destruction ordering. I look forward to feedback. Thank you, Jason Evans
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4