In article <001901c30aab$bf31c060$b6b8958d@oemcomputer>, "Raymond Hettinger" <python@rcn.com> wrote: > > I'd very much like to see the current heapq replaced with a different > > interface in time for 2.3. I believe that an opaque object is better, > > since it allows more flexibility later. > > I'm quite pleased with the version already in CVS. It is a small > masterpiece of exposition, sophistication, simplicity, and speed. > A class based interface is not necessary for every algorithm. It has some elegance, but omits basic operations that are necessary for many heap-based algorithms and are not provided by this interface. Specifically, the three algorithms that use heaps in my upper-division undergraduate algorithms classes are heapsort (for which heapq works fine, but you would generally want to use L.sort() instead), Dijkstra's algorithm (and its relatives such as A* and Prim), which needs the ability to decrease keys, and event-queue-based plane sweep algorithms (e.g. for finding all crossing pairs in a set of line segments) which need the ability to delete items from other than the top. To see how important the lack of these operations is, I decided to compare two implementations of Dijkstra's algorithm. The priority-dict implementation from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/119466 takes as input a graph, coded as nested dicts {vertex: {neighbor: edge length}}. This is a variation of a graph coding suggested in one of Guido's essays that, as Raymond suggests, avoids using a separate class based interface. Here's a simplification of my dictionary-based Dijkstra implementation: def Dijkstra(G,start,end=None): D = {} # dictionary of final distances P = {} # dictionary of predecessors Q = priorityDictionary() # est.dist. of non-final vert. Q[start] = 0 for v in Q: D[v] = Q[v] for w in G[v]: vwLength = D[v] + G[v][w] if w not in D and (w not in Q or vwLength < Q[w]): Q[w] = vwLength P[w] = v return (D,P) Here's a translation of the same implementation to heapq (untested since I'm not running 2.3). Since there is no decrease in heapq, nor any way to find and remove old keys, I changed the algorithm to add new tuples for each new key, leaving the old tuples in place until they bubble up to the top of the heap. def Dijkstra(G,start,end=None): D = {} # dictionary of final distances P = {} # dictionary of predecessors Q = [(0,None,start)] # heap of (est.dist., pred., vert.) while Q: dist,pred,v = heappop(Q) if v in D: continue # tuple outdated by decrease-key, ignore D[v] = dist P[v] = pred for w in G[v]: heappush(Q, (D[v] + G[v][w], v, w)) return (D,P) My analysis of the differences between the two implementations: - The heapq version is slightly complicated (the two lines if...continue) by the need to explicitly ignore tuples with outdated priorities. This need for inserting low-level data structure maintenance code into higher-level algorithms is intrinsic to using heapq, since its data is not structured in a way that can support efficient decrease key operations. - Since the heap version had no way to determine when a new key was smaller than an old one, the heapq implementation needed two separate data structures to maintain predecessors (middle elements of tuples for items in queue, dictionary P for items already removed from queue). In the dictionary implementation, both types of items stored their predecessors in P, so there was no need to transfer this information from one structure to another. - The dictionary version is slightly complicated by the need to look up old heap keys and compare them with the new ones instead of just blasting new tuples onto the heap. So despite the more-flexible heap structure of the dictionary implementation, the overall code complexity of both implementations ends up being about the same. - Heapq forced me to build tuples of keys and items, while the dictionary based heap did not have the same object-creation overhead (unless it's hidden inside the creation of dictionary entries). On the other hand, since I was already building tuples, it was convenient to also store predecessors in them instead of in some other structure. - The heapq version uses significantly more storage than the dictionary: proportional to the number of edges instead of the number of vertices. - The changes I made to Dijkstra's algorithm in order to use heapq might not have been obvious to a non-expert; more generally I think this lack of flexibility would make it more difficult to use heapq for cookbook-type implementation of textbook algorithms. - In Dijkstra's algorithm, it was easy to identify and ignore outdated heap entries, sidestepping the inability to decrease keys. I'm not convinced that this would be as easy in other applications of heaps. - One of the reasons to separate data structures from the algorithms that use them is that the data structures can be replaced by ones with equivalent behavior, without changing any of the algorithm code. The heapq Dijkstra implementation is forced to include code based on the internal details of heapq (specifically, the line initializing the heap to be a one element list), making it less flexible for some uses. The usual reason one might want to replace a data structure is for efficiency, but there are others: for instance, I teach various algorithms classes and might want to use an implementation of Dijkstra's algorithm as a testbed for learning about different priority queue data structures. I could do that with the dictionary-based implementation (since it shows nothing of the heap details) but not the heapq one. Overall, while heapq was usable for implementing Dijkstra, I think it has significant shortcomings that could be avoided by a more well-thought-out interface that provided a little more functionality and a little clearer separation between interface and implementation. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4