Jim Fulton wrote: > > Christopher Petrilli wrote: > > > > Neil Schemenauer [nas@arctrix.com] wrote: > > > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > > > > Unfortunately, for me, a Python implementation of Sets is only > > > > interesting academicaly. Any time I've needed to work with them at a > > > > large scale, I've needed them *much* faster than Python could achieve > > > > without a C extension. > > > > > > I think this argues that if sets are added to the core they > > > should be implemented as an extension type with the speed of > > > dictionaries and the memory usage of lists. Basicly, we would > > > use the implementation of PyDict but drop the values. > > > > This is effectively the implementation that Zope has for Sets. > > Except we use sorted collections with binary search for sets. > > I think that a simple hash-based set would make alot of sense. > > > In > > addition we have "buckets" that have scores on them (which are > > implemented as a modified BTree). > > > > Unfortunately Jim Fulton (who wrote all the code for that level) is in > > a meeting, but I hope he'll comment on the implementation that was > > chosen for our software. > > We have a number of special needs: > > - Scalability is critical. We make some special opimizations, > like sets of integers and mapping objects with integer keys > and values. In these cases, data are stored using C int arrays, > allowing very efficient data storage and manipulation, especially > when using integer keys. > > - We need to spread data over multiple database records. Our data > structures may be hundreds of megabytes in size. We have ZODB-aware > structures that use multiple independently stored database objects. > > - Range searches are very common, and under some circomstances, > sorted collections and BTrees can have very little overhead > compared to dictionaries. For this reason, out mapping objects > and sets have been based on BTrees and sorted collections. > > Unfortunately, our current BTree implementation has a flaw that > causes excessive number of objects to be updated when items are > added and removed. (Each BTree internal node keeps track of the number > of objects contained in it.) Also, out current sets are limited > to integers and cannot be spread over multiple database records. > > We are completing a new BTree implementation that overcomes these > limitations. IN this implementation, we will provide sets as > value-less BTrees. You may want to check out a soon to be released new mx package: mxBeeBase. This is an on-disk b+tree implementation which supports data files up to 2GB on 32-bit platforms. Here's a preview: http://www.lemburg.com/python/mxBeeBase.html (The links on that page are not functional.) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4