Okay, so i was going to wait until i had an implementation before bringing this up here, but it seems the other discussion has come round to sets already so this might be a good time. After Eric asked about sets at dinner (e.g. having 'x in dict' do 'dict.has_key(x)') i wondered about what it would take to support sets. Guido's response to 'x in dict' was that it was inconsistent since 'x in list' searches the values of the list, while the proposed 'x in dict' would search the keys. A friend (Mark Miller, designer of E -- see www.erights.org) and i had chatted about this before, and one idea that was suggested was to have the mapping map each member of the set to itself. Then 'in' would give the expected behaviour. E also manages to overload | and & so that they do useful things with mappings-used- as-dictionaries as well as union and intersection for mappings- used-as-sets. I expect, however, that & will be rarely used on mappings-used-as-dictionaries; more common is the desire for |, which Python has solved nicely with dict.update() (also better because it is clearly non-commutative). So i think the clearest thing to do is to make sets a separate built-in type. Here's the interface i was thinking of: >>> s = {1, 5, 7} # no colons means a set >>> type(s) <type 'set'> >>> s # sets are printed in hashed order {7, 1, 5} >>> {1: 4, 5, 7} # mixing pairs and singletons is an error File "<stdin>", line 1 {1: 4, 5, 7} ^ SyntaxError: invalid syntax >>> {3, 4: 5, 7} # mixing pairs and singletons is an error File "<stdin>", line 1 {3, 4: 5, 7} ^ SyntaxError: invalid syntax >>> 5 in s # membership 1 >>> 6 in s 0 >>> t = s # sets are mutable, so this is an alias >>> t.append(2) # like list.append(), accepts one new member >>> s {7, 5, 2, 1} >>> u = s.copy() # make a copy >>> u.append(9) >>> u.append([3]) # set members must be hashable Traceback (innermost last): File "<stdin>", line 1, in ? TypeError: unhashable type >>> u {7, 5, 9, 2, 1} >>> u.extend(range(5)) # like list.extend(), accepts a sequence or set >>> u {9, 7, 5, 4, 3, 2, 1, 0} >>> u.remove(7) # like list.remove() >>> s {7, 5, 2, 1} >>> s | {3, 5} # union {7, 5, 3, 2, 1} >>> s & {1, 2, 3} # intersection {2, 1} >>> s <= u # subset 1 >>> s >= s 1 >>> u <= s 0 >>> for i in s: print i # iterate in hashed order 7 5 2 1 >>> l = list(s) >>> l.sort() >>> for i in l: print i # iterate in sorted order 1 2 5 7 >>> s.clear() # for completeness, i suppose >>> s >>> s[3] # probably best not to permit subscripting Traceback (innermost last): File "<stdin>", line 1, in ? TypeError: unsubscriptable object >>> s[5:7] Traceback (innermost last): File "<stdin>", line 1, in ? TypeError: unsliceable object In short, sets get append, extend, remove # from lists clear, copy # from dictionaries and the operators &, |, in, <=, >=, <, >, == They could be implemented like dicts with just keys and no values. Two open issues: 1. How to spell the empty set? Possibilities include {,} or {-} or a constant in __builtins__. __builtins__ would probably get a conversion function 'set' (soon to be a type-object/constructor like int, list, etc.), so you could just say 'set()'. 2. How do sets sort? We could insert them somewhere in the hierarchy of types next to lists and tuples. Currently () > [] > {}, so we could have () > [] > {,} > {} for example. More tricky is the question of what to do with the partial ordering created by >, >=, etc. a. The answer is the same as whatever answer we get when Python supports rich comparisons and instances can have partial orderings too. b. We can make >, >= behave like dictionary >, >= so that there is a defined sort order, and put the subset/superset functionality in a method. Option a. would be nicest since there are other good uses for rich comparisons, and in b. we get comparison operators that don't really do anything useful. (Side note: Do we achieve a. just by divorcing __cmp__ from >, >=, etc.? sorting would use __cmp__, while > and < would look for __gt__, __lt__, etc., and if not found, fall back on __cmp__. __cmp__ contracts to be stable [a.__cmp__(b) always has the same outcome for a given a and b], reflexive [if a == b, then a.__cmp__(b) == 0], consistent [if a.__cmp__(c) == 0 and b.__cmp__(d) == 0, then a.__cmp__(b) == c.__cmp__(d)], symmetric [a.__cmp__(b) + b.__cmp__(a) == 0 for all a, b], and transitive [if a.__cmp__(b) > 0 and b.__cmp__(c) > 0, then a.__cmp__(c) > 0; if a.__cmp__(b) == 0 and b.__cmp__(c) == 0, then a.__cmp__(c) == 0], but __gt__, __lt__ need make no such promises. Side side note: which of these promises, if any, should we ask __gt__ et al to make? Stability, at least?) ((Side side side note: in E, Mark Miller also ran into the problem of spelling different kinds of "equals"es. For object identity we have "is", for content comparison we have "==". If we need a new operator for magnitude equality i suggest "<=>", as used in E.)) Well, it may be a lot of words, but it's not much good unless you are going to use it in practice. I think i would have good use for it, but i would like to hear your opinions. Would you use such a thing? -- ?!ng
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4