[Raymond Hettinger] > No, I meant that it is *much* faster to do an indexed write to a > pre-allocated list than to use anything relying on PyList_Append() or > ins1(). All of those individual calls to realloc are expensive. The > 50 to 1 dilution of that expense is how it beats Knuth's two stack > trick. That's fine. As I said before, I've never had a use for a queue in Python where the two-stack trick (coded in Python, even) wasn't more than fast enough for me. If I did, there are a universe of possibilities, including Marc-Andre's mxQueue. That doesn't use Python lists; doesn't call realloc on every queue put (only when it's out of room, which is "never" when a queue reaches steady state); and, when it does realloc, overallocates by a much larger amount than Python's list type. > Guess which of these two statements is faster and by how much: > > list(itertools.repeat(None, 5000)) > list(tuple(itertools.repeat(None, 5000))) I did guess, and was correct to 6 significant decimal digits <wink>. > The answer is important because it points the way to faster list > comprehensions and other list building applications. That Python's builtin list calls realloc() on every append is pretty gross, but it's also a tradeoff, saving the bytes in the list header that would be required to remember how many allocated-but-unused slots exist. If you've got a million tiny lists (and some apps do), burning an additional 8MB for something your app doesn't really need isn't attractive (on a 32-bit box today, the list header is 16 bytes; pymalloc aligns to 8-byte boundaries, so the next possible size is 24 bytes). list's very small overallocation (in the ballpark of 5%) is also aimed at minimizing memory burden. Speed isn't everything, tradeoffs are hard, and something loses when something else is favored. (BTW, I never would have considered implementing a resizable list type *without* keeping a field to record the amount of overallocation, so this isn't the tradeoff I would have made <wink>.) > ... > And my interest is in a collections module which should probably > published elsewhere (too much negativity on the subject here). A collections *package* is a fine idea. I say "package" instead of "module" because there are literally hundreds of container types, and queues and bags are a tiny corner of that space. An example dear to my heart is BTrees (I maintain Zope's BTree implementation), which is a wonderful way to get an associative mapping sorted by key value. BTrees have a large API all by themselves, so it wouldn't make sense to try to squash them into a module along with 50 other container types. > I am frankly surprised that so many here think that a collections > module would be bad for the language. I was more of the impression that bags didn't have fans <wink>. > Go figure. It's not like these are new, untried ideas -- the goal > was supposed be improving productivity by providing higher level > tools. A fast underlying implementation was just icing on the cake. I'm strongly in favor of a collections package.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4