Hi! This is a rant against the optimization trend of the Python interpreter. Sorting a list of 100000 integers in random order takes: * 0.75 seconds in Python 2.1 * 0.51 seconds in Python 2.2 * 0.46 seconds in Python 2.3 Tim Peters did a great job optimizing list.sort(). If I try with a simple, non-stable pure Python quicksort implementation, in Python 2.3: * 4.83 seconds * 0.21 seconds with Psyco First step towards world domination of high-level languages :-) The reason that Psyco manages to outperform the C implementation is not that gcc is a bad compiler (it is about 10 times better than Psyco's). The reason is that the C implementation must use a generic '<' operator to compare elements, while the Psyco version quickly figures out that it can expect to find ints in the list; it still has to check this assumption, but this is cheap and then the comparison is done with a single machine instruction. Similarily, here are some results about the heapq module, which is rewritten in C in the CVS tree for Python 2.4: l = [random.random() for x in range(200000)] heapq.heapify(l) This code executes on my laptop in: * 1.96 seconds on Python 2.3 (pure Python) * 0.18 seconds on Python 2.4cvs (rewritten in C) * 0.16 seconds on Python 2.3 with Psyco So this is not so much a plug for Psyco as a rant against the current trend of rewriting standard modules in C. Premature optimization and all that. Worse, and more importantly, the optimization starts to become visible to the programmer. Iterators, for example, are great in limited cases but I consider their introduction a significant complication in the language; before, you could expect that some function from which you would expect a sequence returned a list. Python was all lists and dicts, with dicts used as namespaces here and there. Nowadays you have to be careful. Moreover, it is harder to explain: >>> zip([1,2,3], [4,5,6]) # easy to understand and explain [(1, 4), (2, 5), (3, 6)] >>> enumerate([6,7,8,9]) # uh ? <enumerate object at 0x401a102c> I know you can always do list(_). My point is that this is a user-visible optimization. enumerate() should return a normal list, and it should be someone else's job to ensure that it is correctly optimized away if possible (and I'm not even talking about Psyco, it could be done in the current Python implementation with a reasonable amount of effort). Protesting-ly yours, Armin
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4