[Martin v. Loewis] > ... > When stepping through the code, I also missed support for the > relationship between identity and equality. E.g. in > PyObject_RichCompare, I'd expect > > if (v == w) { > switch (op) > case Py_EQ:case Py_LE:case Py_GE: > Py_INCREF(Py_True); > return Py_True; > case Py_NE:case Py_LT:case Py_GT: > Py_INCREF(Py_False); > return Py_False; > } > } > > That would not help in your case, of course. I don't even know how > frequent comparing identical objects is in real life - but this is > something that PyObject_Compare has that PyObject_RichCompare > currently doesn't. Guido insisted (with cause <wink>) on these four pairs as being equivalent: x < y iff y > x x <= y y >= x x == y y == x x != y y != x but beyond that, in the presence of rich comparisons, agreed not to make any other assumptions about what those pixel-bags "mean". In particular, there's no implication that "x <= y" iff "x < y or x == y", or that "x < y" implies "x != y", etc. Applying that to the above leaves you with nothing but if (v == w && op == Py_EQ) /* then return Py_True */ Which is about all PyObject_Compare's if (v == w) return 0; assumes too. So I don't see much future in that. [later, a patch to fill in the richcmp slot for strings] > +static PyObject* > +string_richcompare(PyStringObject *a, PyStringObject *b, int op) > +{ > + int c; > + PyObject *result; > + if (!PyString_Check(b)) { > + result = Py_NotImplemented; > + goto out; > + } > + if (op == Py_EQ) { > + if (a->ob_size != b->ob_size) { > + result = Py_False; > + goto out; > + } > +#ifdef CACHE_HASH > + if (a->ob_shash != b->ob_shash > + && a->ob_shash != -1 > + && b->ob_shash != -1) { > + result = Py_False; > + goto out; > + } > +#endif > + } > + c = string_compare(a, b); > + switch (op) { > + case Py_LT: c = c < 0; break; > + case Py_LE: c = c <= 0; break; > + case Py_EQ: c = c == 0; break; > + case Py_NE: c = c != 0; break; > + case Py_GT: c = c > 0; break; > + case Py_GE: c = c >= 0; break; > + default: > + result = Py_NotImplemented; > + goto out; > + } > + result = c ? Py_True : Py_False; > + out: > + Py_INCREF(result); > + return result; [and that yields about an 8% speedup in the "<" case] That looks on the right track, but maybe at the wrong level: why is it necessary? That is, the bulk of the "smarts" here in the switch stmt are type-independent: if there's no specific implementation of individual comparisons, but there is a tp_compare, then the switch stmt applies verbatim to *any* such type. Do we have to fill in the richcmp slot for everything to get Python to realize that? I mean "just about everything", too: while, e.g., ceval special-cases "<" for ints, that doesn't do sorting or max or min etc on ints a lick of good (they don't go thru the COMPARE_OP opcode then, but thru the general comparison routines). The "speed problem" appears to be: + COMPARE_OP calls cmp_outcome() + which calls PyObject_RichCompare() + which calls do_richcmp() + which calls try_rich_compare() (unsuccessfully now, successfully after your patch) which fails to find a richcmp slot on either operand (now) so says "not implemented" + then calls try_3way_to_rich_compare() + which calls try_3way_compare() + which finally calls the tp_compare slot + then runs exactly the same switch (op) { case Py_LT: c = c < 0; break; case Py_LE: c = c <= 0; break; case Py_EQ: c = c == 0; break; case Py_NE: c = c != 0; break; case Py_GT: c = c > 0; break; case Py_GE: c = c >= 0; break; } result = c ? Py_True : Py_False; switch as your patch and things unwind. So we've got 7 function calls there, not even counting calls to PyErr_Occurred() and PyObject_IsTrue(), all to find about 3 machine instructions that actually do the compare <wink>. You got an 8% speedup for one type by tricking the switch stmt into appearing 3 calls earlier. What if the implementation were smarter, and did it for *all* relevant types even a call or two before that? I don't see any reason "in principle" that compares couldn't be much faster, and via the usual gimmicks: bigger, smarter functions that remember what they've already determined so don't need to figure it out over and over again, and fast paths to favor common cases at the expense of comparisons from Mars. One thing to note here: the workhorse comparisons are "like strings" in having no *logical* need for richcmps at all; and the objects for which richcmps were introduced were numerical arrays, which can much better afford a longer code path to *find* them (one matrix compare will trigger many vanilla element compares anyway, so even for arrays it's much more important that the *latter* be fast). The code now is approximately backwards in that respect (it takes gobs of work before we even *look* for a cmp now -- indeed, if a type has both cmp and richcmp slots now, and we're doing an explict "cmp" compare, the code now tries to *simulate* cmp first via a long sequence of richcmp calls!). I don't have time to uglify this code, but Python would benefit from it. and-no-matter-what-guido-may-say<wink>-ly y'rs - tim
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4