Christopher Petrilli wrote: > > Neil Schemenauer [nas@arctrix.com] wrote: > > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > > > Unfortunately, for me, a Python implementation of Sets is only > > > interesting academicaly. Any time I've needed to work with them at a > > > large scale, I've needed them *much* faster than Python could achieve > > > without a C extension. > > > > I think this argues that if sets are added to the core they > > should be implemented as an extension type with the speed of > > dictionaries and the memory usage of lists. Basicly, we would > > use the implementation of PyDict but drop the values. > > This is effectively the implementation that Zope has for Sets. Except we use sorted collections with binary search for sets. I think that a simple hash-based set would make alot of sense. > In > addition we have "buckets" that have scores on them (which are > implemented as a modified BTree). > > Unfortunately Jim Fulton (who wrote all the code for that level) is in > a meeting, but I hope he'll comment on the implementation that was > chosen for our software. We have a number of special needs: - Scalability is critical. We make some special opimizations, like sets of integers and mapping objects with integer keys and values. In these cases, data are stored using C int arrays, allowing very efficient data storage and manipulation, especially when using integer keys. - We need to spread data over multiple database records. Our data structures may be hundreds of megabytes in size. We have ZODB-aware structures that use multiple independently stored database objects. - Range searches are very common, and under some circomstances, sorted collections and BTrees can have very little overhead compared to dictionaries. For this reason, out mapping objects and sets have been based on BTrees and sorted collections. Unfortunately, our current BTree implementation has a flaw that causes excessive number of objects to be updated when items are added and removed. (Each BTree internal node keeps track of the number of objects contained in it.) Also, out current sets are limited to integers and cannot be spread over multiple database records. We are completing a new BTree implementation that overcomes these limitations. IN this implementation, we will provide sets as value-less BTrees. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4