[Christopher Petrilli] > .... > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. How do you know that? I've used large sets in Python happily without resorting to C or kjbuckets (which is really aiming at fast operations on *graphs*, in which area it has no equal). Everyone (except Eric <wink>) uses dicts to implement sets in Python, and "most" set operations can work at full C speed then; e.g., assuming both sets have N elements: membership testing O(1) -- it's just dict.has_key() element insertion O(1) -- dict[element] = 1 element removal O(1) -- del dict[element] union O(N), but at full C speed -- dict1.update(dict2) intersection O(N), but at Python speed (the only 2.1 dog in the bunch!) choose some element and remove it took O(N) time and additional space in 2.0, but is O(1) in both since dict.pop() was introduced iteration O(N), with O(N) additional space using dict.keys(), or O(1) additional space using dict.pop() repeatedly What are you going to do in C that's faster than using a Python dict for this purpose? Most key set operations are straightforward Python dict 1-liners then, and Python dicts are very fast. kjbuckets sets were slower last time I timed them (several years ago, but Python dicts have gotten faster since then while kjbuckets has been stagnant). There's a long tradition in the Lisp world of using unordered lists to represent sets (when the only tool you have is a hammer ... <0.5 wink>), but it's been easy to do much better than that in Python almost since the start. Even in the Python list world, enormous improvements for large sets can be gotten by maintaining lists in sorted order (then most O(N) operations drop to O(log2(N)), and O(N**2) to O(N)). Curiously, though, in 2.1 we can still use a dict-set for complex numbers, but no longer a sorted-list-set! Requiring a total ordering can get in the way more than requiring hashability (and vice versa -- that's a tough one). measurement-is-the-measure-of-all-measurable-things-ly y'rs - tim
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4