Guido van Rossum wrote: > > After the last round of discussion, I was left with the idea that the > best thing we could do to help destructive iteration is to introduce a > {}.popitem() that returns an arbitrary (key, value) pair and deletes > it. I wrote about this: > > > > One more concern: if you repeatedly remove the *first* item, the hash > > > table will start looking lobsided. Since we don't resize the hash > > > table on deletes, maybe picking an item at random (but not using an > > > expensive random generator!) would be better. > > and Tim replied: > > > Which is the reason SETL doesn't specify *which* set item is removed: if > > you always start looking at "the front" of a dict that's being consumed, the > > dict fills with turds without shrinking, you skip over them again and again, > > and consuming the entire dict is still quadratic time. > > > > Unfortunately, while using a random start point is almost always quicker > > than that, the expected time for consuming the whole dict remains quadratic. > > > > The clearest way around that is to save a per-dict search finger, recording > > where the last search left off. Start from its current value. Failure if > > it wraps around. This is linear time in non-pathological cases (a > > pathological case is one in which it isn't linear time <wink>). > > I've implemented this, except I use a static variable for the finger > intead of a per-dict finger. I'm concerned about adding 4-8 extra > bytes to each dict object for a feature that most dictionaries never > need. So, instead, I use a single shared finger. This works just as > well as long as this is used for a single dictionary. For multiple > dictionaries (either used by the same thread or in different threads), > it'll work almost as well, although it's possible to make up a > pathological example that would work qadratically. > > An easy example of such a pathological example is to call popitem() > for two identical dictionaries in lock step. > > Comments please! We could: > > - Live with the pathological cases. > > - Forget the whole thing; and then also forget about firstkey() > etc. which has the same problem only worse. > > - Fix the algorithm. Maybe jumping criss-cross through the hash table > like lookdict does would improve that; but I don't understand the > math used for that ("Cycle through GF(2^n)-{0}" ???). That algorithm is really a gem which you should know, so let me try to explain it. Intro: A little story about finite field theory (very basic). ------------------------------------------------------------- For every prime p and every power p^n, there exists a Galois Field ( GF(p^n) ), which is a finite field. The additive group is called "elementary Abelian", it is commutative, and it looks a little like a vector space, since addition works in cycles modulo p for every p cell. The multiplicative group is cyclic, and it never touches 0. Cyclic groups are generated by a single primitive element. The powers of that element make up all the other elements. For all elements of the multiplication group GF(p^n)* the equality x^(p^n -1) == 1 . A generator element is therefore a primitive (p^n-1)th root of unity. >From another point of view, the elements of GF(p^n) can be seen as coefficients of polynomials over GF(p). It can be easily shown that every generator of the multiplicative group is an irreducible polynomial of degree n with coefficients in GF(p). An irreducible polynomial over a field has the property not to vanish for any value of the field. It has no zero in the field. By adjoining such a zero to the field, we turn it into an extension field: GF(p^n). Now on the dictionary case. --------------------------- The idea is to conceive every non-zero bit pattern as coefficients of a polynomial over GF(2)[x]. We need to find an irreducible polynomial of degee n over the prime field GF(2). There exists a primitive n'th root µ of unity in GF(2^n) which generates every non-zero bit pattern of length n, being coefficients of a polynomial over µ. That means, every given bit pattern can be seen as some power of µ. µ enumerates the whole multiplicative group, and the given pattern is just one position in that enumeration. We can go to the next position in this cycle simply by multiplying the pattern by µ. But since we are dealing with polynomials over µ, this multiplication isn't much more that adding one to very exponent in the polynomial, hence a left shift of our pattern. Adjusting the overflow of this pattern involves a single addition, which is just an XOR in GF(2^n). Example: p=2 n=3 G = GF(8) = GF(2^3) ---------------------------------------- """ Since 8 = 2^3, the prime field is GF(2) and we need to find a monic irreducible cubic polynomial over that field. Since the coefficients can only be 0 and 1, the list of irreducible candidates is easily obtained. x^3 + 1 x^3 + x + 1 x^3 + x^2 + 1 x^3 + x^2 + x + 1 Now substituting 0 gives 1 in all cases, and substituting 1 will give 0 only if there are an odd number of x terms, so the irreducible cubics are just x^3 + x + 1 and x^3 + x^2 + 1. Now the multiplicative group of this field is a cyclic group of order 7 and so every nonidentity element is a generator. Letting µ be a root of the first polynomial, we have µ^3 + µ + 1 = 0, or µ^3 = µ + 1, so the powers of µ are: µ^1 = µ µ^2 = µ^2 µ^3 = µ + 1 µ^4 = µ^2 + µ µ^5 = µ^2 + µ + 1 µ^6 = µ^2 + 1 µ^7 = 1 """ We could of course equally choose the second polynomial with an isomorphic result. The above example was taken from http://www-math.cudenver.edu/~wcherowi/courses/finflds.html Note that finding the irreducible polynomial was so easy since a reducible cubic always has a linear factor. In the general case, we have to check against all possible subpolynomials or use much more of the theory. Application of the example to the dictionary algorithm (DA) ----------------------------------------------------------- We stay in GF(8) and use the above example. The maximum allowed pattern value in our system is 2^n - 1. This is the variable "mask" in the program. We assume a given non-zero bit pattern with coefficients (a2, a1, a0) and write down a polynomial in µ for it: p = a2*µ^2 + a1*µ + a0 To obtain the next pattern in the group's enumeration, we multiply by the generator polynomial µ: p*µ = a2*µ^3 + a1*µ^2 + a0*µ In the program, this is line 210: incr = incr << 1; a simple shift. But in the case that a2 is not zero, we get an overflow, and we have to fold the bit back, by the following identity: µ^3 = µ+1 That means, we have to subtract µ^3 from the pattern and add µ+1 instead. But since addition and subtraction is identical in GF(2), we just add the whole polynomial. >From a different POV, we just add zero here, since µ^3 + µ + 1 = 0 The full progam to perform the polynomial multiplication gets down to just a shift, a test and an XOR: incr = incr << 1; if (incr > mask) incr ^= mp->ma_poly; Summary ======= For every power n of two, we can find a generator element µ of GF(2^n). Every non-zero bit pattern can be taken as coefficient of a polynomial in µ. The powers of µ reach all these patterns. Therefore, each pattern *is* some power of µ. By multiplication with µ we can reach every possible pattern exactly once. Since these patterns are used as distances from the primary hash-computed slot modulo 2^n, and the distances are never zero, all slots can be reached. ----------------------------------- Appendix, on the use of finger: ------------------------------- Instead of using a global finger variable, you can do the following (involving a cast from object to int) : - if the 0'th slot of the dict is non-empty: return this element and insert the dummy element as key. Set the value field to the Dictionary Algorithm would give for the removed object's hash. This is the next finger. - else: treat the value field of the 0'th slot as the last finger. If it is zero, initialize it with 2^n-1. Repetitively use the DA until you find an entry. Save the finger in slot 0 again. This dosn't cost an extra slot, and even when the dictionary is written between removals, the chance to loose the finger is just 1:(2^n-1) on every insertion. ciao - chris -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4