[Armin Rigo] >>>> from heapq import * >>>> def isorted(iterable): >>>> heap = list(iterable) >>>> heapify(heap) >>>> while heap: >>>> yield heappop(heap) [Raymond Hettinger] >>> In terms of memory, I think list.sort() always beats the above >>> implementation. [Tim] >> That can't be -- the heap method only requires a fixed (independent >> of N) and small amount of working storage. list.sort() may need to >> allocate O(N) additional temp bytes under the covers (to create a >> working area for doing merges; it can be expected to allocate 2*N >> temp bytes for a random array of len N, which is its worst case; if >> there's a lot of pre-existing order in the input array, it can >> sometimes get away without allocating any temp space). [Raymond] > The isorted() generator shown above operates on a copy of the data > while list.sort() works in-place. So, my take on it was the > isorted() always used 2*N while list.sort() used 2*N only in the > worst case. Ah. But that's comparing apples and donkeys: Armin's example works on any iterable, while list.sort() only works on lists. I assumed that by "list.sort()" you meant "the obvious method *based* on list.sort() also accepting any iterable", i.e., def isorted(iterable): copy = list(iterable) copy.sort() for x in copy: yield x Then it's got all the space overhead of the list copy in Armin's version, plus the additional hidden temp memory allocated by sort. Something to note: most applications that only want the "first N" or "last N" values in sorted order know N in advance, and that's highly exploitable. David Eppstein and I had a long thread about that here a while back. The example of implementing an "N-best queue" in the heapq test suite is a much better use of heaps when N is known, accepting an iterable directly (without turning it into a list first), and using storage for only N items. When N is (as is typical) much smaller than the total number of elements, that method can beat the pants off list.sort() even with the Python implementation of heaps. Indeed, Guido and I used that method for production code in Zope's full-text search subsystem (find the N best matches to a search query over some 10-200K documents). David presented a method that ran even faster, provided it was coded just right, based on doing quicksort-like partitioning steps on a buffer of about 3*N values. That also uses total space proportional to N (independent of the total number of incoming elements). A heap-based N-best queue would probably beat that again now that heaps are implemented in C. OTOH, if we implemented a quicksort-like partitioning routine in C too ... (it also suffers from gobs of fiddly little integer arithmetic and simple array indexing, which screams in C).
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4