[Moshe Zadka] > ... > Let me note that almost everything Greg Wilson wants to do can be done > via a Python class implementing a set using a dictionary mapping to None. > Almost? > > * No builitin syntax: import Set;Set(1,2,3) instead of {1,2,3} > * Convertors: if we want list/tuple to have a semblance of efficiency, > we'll need to cache the element list as a list when accessed by > index. Like everyone else, I have a Set class too. Its __getitem__ is # so "for x in set:" works # see also set.tolist() def __getitem__(self, i): if i == 0: self.keys = self.d.keys() return self.keys[i] Of course I don't *want* set[i] to mean anything at all; this is just a hack to work with the current iteration protocol (i.e., Set.__getitem__ makes no sense except in that it must be implemented so that "for x in set" works today). The one thing I could never get used to in Aaron's kjbuckets implementation of sets is that for/in did not work as expected. > * Two different constructors: set() for building from sequence, Set() > for building from elements. Might be confusing. My Set.__init__: def __init__(self, seq=[]): self.d = d = {} for x in seq: d[x] = 1 That is, the constructor takes a sequence. Same thing in Icon, which added a set type late in the game. No problem in practice. > * Only possible elements of a set are immutables. OTOH, I'm not sure > how Greg intends to implement his sets if these sets are allowed > to contain mutable elements. This needs to be addressed, but several approaches are possible. For example, my Set class allows for Sets of Sets (& so on), because I needed them, and Sets are certainly mutable. Now immutable objects are required because dicts require immutable objects as keys. However, a class (like Set!) can fool dicts, by supplying a __hash__ and a __cmp__ "that work". Dicts can be compared directly, so __cmp__ is no problem. The difficulty comes with the __hash__. My solution was to "freeze" a Set the instant __hash__ was invoked: this allows mutation up until the point a hash is captured, and disallows it thereafter. Concretely, def __hash__(self): if self.frozen: hashcode = self.hashcode else: # The hash code must not depend on the order of the # keys. self.frozen = 1 hashcode = 0 _hash = hash for x in self.d.keys(): hashcode = hashcode ^ _hash(x) self.hashcode = hashcode return hashcode and all the mutating methods check self.frozen, raising ValueError if the target set is currently frozen. For example, # updating union def unionu(self, other): if self.frozen: raise ValueError(_frozenmsg) self.d.update(other.d) Oddly enough, I almost never got a "frozen" error in practice; it seems that by the time I was ready to build a set of sets, the constituent sets were at their final values; and the few times I did get a frozen error, it was actually a logic bug (I was mutating the set in error). It's hard to guess whether that's unique to me <0.5 wink>. Icon has a different approach entirely: an Icon set acts like a Python dict *would* act if you inserted the id()s of the keys rather than the keys themselves; that is, Icon's sets (and dicts) are based on object identity, not object value. Since object identity is invariant regardless of whether the object is mutable, a hash table implementation has no difficulties. I find identity semantics for sets less useful than value semantics, though. [Eric Raymond] > ... > Within a few hours after I see [rich comparisons] in the beta I'll > have a very rich Set class for the standard library. It includes > all the standard boolean operations, symmetric difference, powerset, > and useful overloads for most of the standard operators. As the > header comment says: > > # A set-algebra module for Python > # > # The functions work on any sequence type and return lists. > # The set methods can take a set or any sequence type as an argument. > ... I'm afraid this is a good example of why a set type is unlikely to make it into the std distribution: whenever a data structure is added to Python, the agitation for variations is endless. For example, returning a list makes your flavor of sets unsuitable (because inefficient) for many applications (e.g., there's no efficient way to test a list for element membership). "Almost all" set implementations I've seen for Python use dicts for that reason. Another set of complaints will be due to some people wanting functional versions of the set operations, while others want mutating versions. My own Set class supplies both, because both are really needed in practice; e.g., the updating union was shown above; the functional union builds on that: # functional union def union(self, other): answer = self.__copy__() answer.unionu(other) return answer (subtlety: the functional union has no problem with "frozen" sets; Set.__copy__ always returns a melted set). Since it's Greg's PEP, it's up to him to make everyone happy <wink> -- but since there's so much variation possible, I'm not sure Guido will ever bless any Set class as "the" Set class. BTW, one lesson to take from SETL: a vital set operation in practice is a mutating "pick a 'random' element E from the set, remove it from the set, and return it". An enormous number of set algorithms are of the form while not S.empty(): pick some element from S deal with it, possibly mutating S This operation can't be done efficiently in Python code if the set is represented by a dict (the best you can do is materialize the full list of keys first, and pick one of those). That means my Set class often takes quadratic time for what *should* be linear-time algorithms. if-the-set-type-is-a-toy-there's-no-need-to-put-it-in-the- distribution-but-if-it's-not-a-toy-it-can't-be-written- in-python-today-ly y'rs - tim
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4