After the last round of discussion, I was left with the idea that the
best thing we could do to help destructive iteration is to introduce a
{}.popitem() that returns an arbitrary (key, value) pair and deletes it.

I wrote about this:

> > One more concern: if you repeatedly remove the *first* item, the hash
> > table will start looking lopsided.  Since we don't resize the hash
> > table on deletes, maybe picking an item at random (but not using an
> > expensive random generator!) would be better.

and Tim replied:

> Which is the reason SETL doesn't specify *which* set item is removed: if
> you always start looking at "the front" of a dict that's being consumed, the
> dict fills with turds without shrinking, you skip over them again and again,
> and consuming the entire dict is still quadratic time.
>
> Unfortunately, while using a random start point is almost always quicker
> than that, the expected time for consuming the whole dict remains quadratic.
>
> The clearest way around that is to save a per-dict search finger, recording
> where the last search left off.  Start from its current value.  Failure if
> it wraps around.  This is linear time in non-pathological cases (a
> pathological case is one in which it isn't linear time <wink>).

I've implemented this, except that I use a static variable for the finger
instead of a per-dict finger.  I'm concerned about adding 4-8 extra bytes
to each dict object for a feature that most dictionaries never need.

So, instead, I use a single shared finger.  This works just as well as
long as it is used for a single dictionary.  For multiple dictionaries
(either used by the same thread or in different threads), it'll work
almost as well, although it's possible to construct a pathological
example that behaves quadratically.  An easy instance of such a
pathological case is to call popitem() for two identical dictionaries
in lock step.

Comments, please!  We could:

- Live with the pathological cases.
- Forget the whole thing, and then also forget about firstkey() etc.,
  which have the same problem, only worse.

- Fix the algorithm.  Maybe jumping criss-cross through the hash table
  like lookdict does would improve matters; but I don't understand the
  math used for that ("Cycle through GF(2^n)-{0}" ???).

I've placed a patch on SourceForge:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102733&group_id=5470

The algorithm is:

static PyObject *
dict_popitem(dictobject *mp, PyObject *args)
{
	static int finger = 0;	/* shared across all dicts */
	int i;
	dictentry *ep;
	PyObject *res;

	if (!PyArg_NoArgs(args))
		return NULL;
	if (mp->ma_used == 0) {
		PyErr_SetString(PyExc_KeyError,
				"popitem(): dictionary is empty");
		return NULL;
	}
	i = finger;
	if (i >= mp->ma_size)
		i = 0;
	/* Scan forward (wrapping) for the first occupied slot. */
	while ((ep = &mp->ma_table[i])->me_value == NULL) {
		i++;
		if (i >= mp->ma_size)
			i = 0;
	}
	finger = i+1;	/* resume here next time */
	res = PyTuple_New(2);
	if (res != NULL) {
		PyTuple_SET_ITEM(res, 0, ep->me_key);
		PyTuple_SET_ITEM(res, 1, ep->me_value);
		Py_INCREF(dummy);
		ep->me_key = dummy;
		ep->me_value = NULL;
		mp->ma_used--;
	}
	return res;
}

--Guido van Rossum (home page: http://www.python.org/~guido/)
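[Editor's note: to make Tim's complexity argument concrete, here is a
small Python sketch, not the C patch itself, that simulates consuming a
toy open-addressed table with and without a search finger.  All names
(`consume_from_front`, `consume_with_finger`, `table`) are illustrative;
`None` plays the role of an empty or "turd" slot.]

```python
# Toy simulation of two popitem() scanning strategies.
# `table` is a list; None marks an empty/deleted slot ("turd").
# This is an illustration only, not the CPython implementation.

def consume_from_front(table):
    """Always restart the scan at slot 0: total work is quadratic,
    because every pop re-skips all the turds left by earlier pops."""
    probes = 0
    remaining = sum(1 for slot in table if slot is not None)
    while remaining:
        i = 0
        while table[i] is None:   # skip turds from the front, again
            i += 1
            probes += 1
        table[i] = None           # "delete" the item, leaving a turd
        remaining -= 1
    return probes

def consume_with_finger(table):
    """Resume the scan where the last search left off: each slot is
    probed O(1) times overall, so total work is linear."""
    probes = 0
    finger = 0
    remaining = sum(1 for slot in table if slot is not None)
    while remaining:
        i = finger
        while table[i % len(table)] is None:
            i += 1
            probes += 1
        table[i % len(table)] = None
        finger = i + 1            # remember where to start next time
        remaining -= 1
    return probes

n = 1000
front_probes = consume_from_front(list(range(n)))
finger_probes = consume_with_finger(list(range(n)))
assert front_probes > finger_probes   # front-scan probes far more
assert finger_probes < 2 * n          # finger version stays linear
```

On a full table of n items, the front-scan strategy performs
0 + 1 + ... + (n-1) skip probes in total, while the finger version never
revisits a slot it has already passed.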
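[Editor's note: at the Python level, the destructive-iteration pattern
the proposal enables looks like this.  dict.popitem() did make it into
the language; exactly *which* pair it returns is the point under
discussion above, so the example below sorts the results rather than
relying on any particular order.]

```python
# Destructive iteration: consume a dict one (key, value) pair at a time.
d = {'a': 1, 'b': 2, 'c': 3}
items = []
while d:
    # popitem() removes and returns some (key, value) pair;
    # the dict shrinks on every call.
    items.append(d.popitem())

assert d == {}
assert sorted(items) == [('a', 1), ('b', 2), ('c', 3)]
```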