>>>>> "TP" == Tim Peters <tim_one@email.msn.com> writes: TP> [Jeremy Hylton]> >> So the real problem is defining some reasonable semantics for >> comparison of recursive objects. TP> I think this is exactly a graph isomorphism problem, since TP> Python always compares "by value" (so isomorphism is the natural TP> generalization). I'm not familiar with any algorithms for the graph isomorphism problem, but I took a stab at a simple comparison algorithm. The idea is to detect comparisons that would cross back-edges in the object graphs. Instead of starting a new comparison, assume they are the same. If, in fact, the objects are not the same, they must differ in some other way; some other part of the comparison will fail. TP> This isn't hard (!= tedious, alas) to define or to implement TP> naively, but a straightforward implementation would be very TP> expensive at runtime compared to the status quo. That's why TP> "real languages" <wink> would rather suffer an infinite loop. TP> It's expensive because there's no cheap way to know whether you TP> have a loop in an object. My first attempt at implementing this is expensive. I maintain a dictionary that contains all the object pairs that are currently being compared. Specifically, the dictionary is used to implement a set of object id pairs. Every call to PyObject_Compare will add a new pair to the dictionary when it is called and remove it when it returns (except for a few trivial cases). A naive patch is included below. It does seem to involve a big performance hit -- more than 10% slower on pystone. It also uses a lot of extra space. Note that the patch has all its initialization code inline in PyObject_Compare; moving that elsewhere will help a little. It also use a bunch of function calls where macros would be more efficient. TP> An anal compromise would be to run comparisons full speed TP> without trying to detect loops, but if the recursion got "too TP> deep" break out and start over with an expensive alternative TP> that does check for loops. The later requires machinery similar TP> to copy.deepcopy's. It looks like the anal compromise might be necessary. I'll re-implement the patch more carefully and see what the real effect on performance is. Jeremy Index: object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 239c239 < "__repr__ returned non-string (type %.200s)", --- > "__repr__ returned non-string (type %s)", 276c276 < "__str__ returned non-string (type %.200s)", --- > "__str__ returned non-string (type %s)", 300a301,328 > static PyObject *cmp_state_key = NULL; > > static PyObject* > cmp_state_make_pair(v, w) > PyObject *v, *w; > { > PyObject *pair = PyTuple_New(2); > if (pair == NULL) > return NULL; > if ((long)v <= (long)w) { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); > } else { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); > } > return pair; > } > > void > cmp_state_clear_pair(dict, key) > PyObject *dict, *key; > { > PyDict_DelItem(dict, key); > Py_DECREF(key); > } > > 305a334,336 > PyObject *tstate_dict, *cmp_dict, *pair; > int result; > 311a343,376 > tstate_dict = PyThreadState_GetDict(); > if (tstate_dict == NULL) { > PyErr_BadInternalCall(); > return -1; > } > /* fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v, > v->ob_type->tp_name, (long)w, w->ob_type->tp_name); > */ > /* XXX should initialize elsewhere */ > if (cmp_state_key == NULL) { > cmp_state_key = PyString_InternFromString("compare_state"); > cmp_dict = PyDict_New(); > if (cmp_dict == NULL) > return -1; > PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); > } else { > cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key); > if (cmp_dict == NULL) > return NULL; > PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); > } > > pair = cmp_state_make_pair(v, w); > if (pair == NULL) { > PyErr_BadInternalCall(); > return -1; > } > if (PyDict_GetItem(cmp_dict, pair)) { > /* already comparing these objects. assume they're > equal until shown otherwise > */ > Py_DECREF(pair); > return 0; > } 316a382,384 > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } 317a386 > cmp_state_clear_pair(cmp_dict, pair); 329a399,401 > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } 344a417 > cmp_state_clear_pair(cmp_dict, pair); 350,364c423,425 < else if (PyUnicode_Check(v) || PyUnicode_Check(w)) { < int result = PyUnicode_Compare(v, w); < if (result == -1 && PyErr_Occurred() && < PyErr_ExceptionMatches(PyExc_TypeError)) < /* TypeErrors are ignored: if Unicode coercion < fails due to one of the arguments not < having the right type, we continue as < defined by the coercion protocol (see < above). Luckily, decoding errors are < reported as ValueErrors and are not masked < by this technique. */ < PyErr_Clear(); < else < return result; < } --- > cmp_state_clear_pair(cmp_dict, pair); > if (PyUnicode_Check(v) || PyUnicode_Check(w)) > return PyUnicode_Compare(v, w); 372c433,434 < if (vtp->tp_compare == NULL) --- > if (vtp->tp_compare == NULL) { > cmp_state_clear_pair(cmp_dict, pair); 374c436,439 < return (*vtp->tp_compare)(v, w); --- > } > result = (*vtp->tp_compare)(v, w); > cmp_state_clear_pair(cmp_dict, pair); > return result;
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4