Wondering about the trick of copysort not copying a singly-referenced list I decided to try it out in a tiny extension module, and, yes, it is just as trivial as one might wish (haven't dealt with optional args to sort, just wanting to check performance &c): static PyObject* copysort(PyObject* self, PyObject* args) { PyObject *sequence, *listresult; if(!PyArg_ParseTuple(args, "O", &sequence)) return 0; if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) { listresult = sequence; Py_INCREF(listresult); } else { listresult = PySequence_List(sequence); } if(listresult) { if(PyList_Sort(listresult) == -1) { Py_DECREF(listresult); listresult = 0; } } return listresult; } and performance on an equally trivial testcase: x = dict.fromkeys(range(99999)) def looponsorted1(x): keys = x.keys() keys.sort() for k in keys: pass def looponsorted2(x, c=copysort.copysort): for k in c(x.keys()): pass turns out to be identical between the two _with_ The Trick (4.4e+04 usec with timeit.py -c on my box) while without The Trick copysort would slow down to about 5.5e+04 usec. But, this reminds me -- function filter, in bltinmodule.c, uses just about the same trick too (to modify in-place when possible rather than making a new list -- even though when it does make a new list it's an empty one, not a copy, so the gain is less). There must be other cases of applicability which just haven't been considered. So... Shouldn't The Trick be embodied in PySequence_List itself...? So, the whole small tricky part above: if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) { listresult = sequence; Py_INCREF(listresult); } else { listresult = PySequence_List(sequence); } would collapse to a single PySequence_List call -- *AND* potential calls from Python code such as "x=list(somedict.keys())" might also be speeded up analogously... [Such a call looks silly when shown like this, but in some cases one might not know, in polymorphic use, whether a method returns a new or potentially shared list, or other sequence, and a call to list() on the method's result then may be needed to ensure the right semantics in all cases]. Is there any hidden trap in The Trick that makes it unadvisable to insert it in PySequence_List? Can't think of any, but I'm sure y'all will let me know ASAP what if anything I have overlooked...;-). One might even be tempted to reach down all the way to PyList_GetSlice, placing THERE The Trick in cases of all-list slicing of a singly-referenced list (PyList_GetSlice is what PySequence_List uses, so it would also get the benefit), but that might be overdoing it -- and encouraging list(xxx) instead of xxx[:], by making the former a bit faster in some cases, would be no bad thing IMHO (already I'm teaching newbies to prefer using list(...) rather than ...[:] strictly for legibility and clarity, being able to mention possible future performance benefits might well reinforce the habit...;-). Alex
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4