> [Phillip J. Eby] > > What expense? The extra memory overhead for the index? I suppose > > so. > > Yes, that is an expense. Partly because of the extra memory space in > len(list) temp tuples, but mostly because space allocated for integer > objects is immortal. That is, > > range(10000000) > > grabs space for 1000000 distinct integer objects that's never reused for any > other kind of object, and so does stuffing a million distinct int objects > into a temp DSU list. Note that this is very different from doing > > for i in xrange(1000000): > > which allocates space for only three integer objects (100000, the current > value of i, and preceding value of i), and keeps reusing it. > > A cleverer implementation might be able to avoid permanently ratcheting the > space devoted to int objects. > > > But if you *don't* want that behavior, you can still DSU manually, no? > > I hope so <wink>. After reading this exchange, I'm not sure I agree with Tim about the importance of avoiding to compare the full records. Certainly the *cost* of that comparison doesn't bother me: I expect it's usually going to be a tuple or list of simple types like ints and strings, and comparing those is pretty efficient. Sure, there's a pattern that requires records with equal to remain in the original order, but that seems an artefact of a pattern used only for external sorts (where the sorted data is too large to fit in memory -- Knuth Vol. III is full of algorithms for this, but they seem mostly of historical importance in this age of Gigabyte internal memory). The pattern is that if you want records sorted by zipcode and within each zipcode sorted by name, you first sort by name and then do a stable sort by zipcode. This was common in the days of external sorts. (External sorts are still common in some application domains, of course, but I doubt that you'd find much Python being used there for the main sort.) But using Raymond's proposal, you can do that by specifying a tuple consisting of zipcode and name as the key, as follows: myTable.sort(key = lambda rec: (rec.zipcode, rec.name)) This costs an extra tuple, but the values in the tuple are not new, so it costs less space than adding the index int (not even counting the effect of the immortality of ints). And tuples aren't immortal. (To be honest, for tuples of length < 20, the first 2000 of a given size *are* immortal, but that's a strictly fixed amount of wasted memory.) Given that this solution isn't exactly rocket science, I think the default behavior of Raymond's original proposal (making the whole record the second sort key) is fine. --Guido van Rossum (home page: http://www.python.org/~guido/)
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4