[Neil Schemenauer] > I think this argues that if sets are added to the core they > should be implemented as an extension type with the speed of > dictionaries and the memory usage of lists. Basicly, we would > use the implementation of PyDict but drop the values. They'll be slower than dicts and take more memory than lists then. WRT memory, dicts cache the hash code with each entry for speed (so double the memory of a list even without the value field), and are never more than 2/3 full anyway. The dict implementation also gets low-level speed benefits out of using both the key and value fields to characterize the nature of a slot (the key field is NULL iff the slot is virgin; the value field is NULL iff the slot is available (virgin or dummy)). Dummy slots can be avoided (and so also the need for runtime code to distinguish them from active slots) by using a hash table of pointers to linked lists-- or flex vectors, or linked lists of small vectors --instead, and in most ways that leads to much simpler code (no more fiddling with dummies, no more probe-sequence hassles, no more boosting the size before the table is full). But without fine control over the internals of malloc, that takes even more memory in the end. Interesting twist: "a dict" *is* "a set", but a set of (key, value) pairs further constrained so that no two elements have the same key. So any set implementation can be used as-is to implement a dict as a set of 2-tuples, customizing the hash and "is equal" functions to look at just the tuples' first elements. The was the view taken by SETL in 1969, although their "map" (dict) type was eventually optimized to get away from actually constructing 2-tuples. Indeed, SETL eventually grew an elaborate optional type declaration sublanguage, allowing the user to influence many details of its many internal set-storage schemes; e.g., from pg 399 of "Programming With Sets: An Introduction to SETL": For example, we can declare [I'm putting their keywords in UPPERCASE for, umm, clarity] successors: LOCAL MMAP(ELMT b) REMOTE SET(ELMT b); This declaration specifies that for each x in b the image set successors{x} is stored in the element block of x, and that this image set is always to be represented as a bit vector. Similarly, the declaration successors: LOCAL MMAP(ELMT b) SPARSE SET(ELMT b); specifies that for each x in b the image set successors{x} is to be stored as a hash table containing pointers to elements of b. Note that the attribute LOCAL cannot be used for image sets of multivalued maps, This follows from the remarks in section 10.4.3 on the awkwardness of making local objects into subparts of composite objects. Clear? Snort. Here are some citations lifted from the web for their experience in trying to make these kinds of decisions by magic: @article{dewar:79, title="Programming by Refinement, as Exemplified by the {SETL} Representation Sublanguage", author="Robert B. K. Dewar and Arthur Grand and Ssu-Cheng Liu and Jacob T. Schwartz and Edmond Schonberg", journal=toplas, year=1979, month=jul, volume=1, number=1, pages="27--49" } @article{schonberg:81, title="An Automatic Technique for Selection of Data Structures in {SETL} Programs", author="Edmond Schonberg and Jacob T. Schwartz and Micha Sharir", journal=toplas, year=1981, month=apr, volume=3, number=2, pages="126--143" } @article{freudenberger:83, title="Experience with the {SETL} Optimizer", author="Stefan M. Freudenberger and Jacob T. Schwartz and Micha Sharir", pages="26--45", journal=toplas, year=1983, month=jan, volume=5, number=1 } If someone wanted to take sets seriously today, a better approach would be to define a minimal "set interface" ("abstract base class" in C++ terms), then supply multiple implementations of that interface, letting the user choose directly which implementation strategy they want for each of their sets. And people are doing just that in the C++ and Java worlds; e.g., http://developer.java.sun.com/developer/onlineTraining/ collections/Collection.html#SetInterface Curiously, the newer Java Collections Framework (covering multiple implementations of list, set, and dict interfaces) gave up on thread-safety by default, because it cost too much at runtime. Just another thing to argue about <wink>. we're-not-exactly-pioneers-here-ly y'rs - tim
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4