On Sat, Mar 12, 2011 at 3:44 PM, Guido van Rossum <guido at python.org> wrote: > I recently advised a Googler who was sorting a large dataset and > running out of memory. My analysis of the situation was that he was > sorting a huge list of short lines of the form "shortstring,integer" > with a key function that returned a tuple of the form ("shortstring", > integer). As Raymond pointed out, a change I made for 3.2 significantly shrinks the memory footprint of sorting with a key (although it's still more memory-intensive than sorting with cmp). He could reduce the memory footprint further by sorting in two passes instead of using a tuple, leveraging the fact that Python guarantees a stable sort. In 3.2 or later, this technique will require roughly twice as much memory as just storing the list: biglist.sort(key=lambda s: int(s.split(',')[1])) # Sort by the integer biglist.sort(key=lambda s: s.split(',')[0]) # Sort by the shortstring I think the use cases are pretty narrow where there's plenty of memory for storing the list but not enough to store two copies. -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20110313/93fb7a0f/attachment.html>
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4