In an effort to save time on email (ya, right ...), I wrote up a pretty
detailed overview of the "timsort" algorithm.  It's attached.

all-will-be-revealed-ly y'rs  - tim


/*-----------------------------------------------------------------------------
A stable natural mergesort with excellent performance on many flavors of
lightly disordered arrays, and as fast as samplesort on random arrays.

In a nutshell, the main routine marches over the array once, left to right,
alternately identifying the next run, and then merging it into the previous
runs.  Everything else is complication for speed, and some measure of memory
efficiency.

Runs
----
count_run() returns the # of elements in the next run.  A run is either
"ascending", which means non-decreasing:

    a0 <= a1 <= a2 <= ...

or "descending", which means strictly decreasing:

    a0 > a1 > a2 > ...

Note that a run is always at least 2 long, unless we start at the array's
last element.

The definition of descending is strict, because the main routine reverses
a descending run in-place, transforming a descending run into an ascending
run.  Reversal is done via the obvious fast "swap elements starting at each
end, and converge at the middle" method, and that can violate stability if
the slice contains any equal elements.  Using a strict definition of
descending ensures that a descending run contains distinct elements.

If an array is random, it's very unlikely we'll see long runs.  Much of the
rest of the algorithm is geared toward exploiting long runs, and that takes
a fair bit of work.  That work is a waste of time if the data is random, so
if a natural run contains less than MIN_MERGE_SLICE elements, the main loop
artificially boosts it to MIN_MERGE_SLICE elements, via binary insertion
sort applied to the right number of array elements following the short
natural run.  In a random array, *all* runs are likely to be MIN_MERGE_SLICE
long as a result, and merge_at() short-circuits the expensive stuff in that
case.
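To make that concrete, here's a rough Python sketch of the run detection and
boosting just described.  count_run() and MIN_MERGE_SLICE are the names used
above; the helper names, the bodies, and the value 32 are only illustrative
stand-ins for the real C code:

    from bisect import bisect_right

    MIN_MERGE_SLICE = 32       # illustrative value; the text only names the constant

    def count_run(a, lo, hi):
        # Return (run length, is_descending) for the run starting at a[lo].
        # "Ascending" means non-decreasing; "descending" means strictly
        # decreasing, so reversing it can't reorder equal elements.
        if lo + 1 == hi:
            return 1, False
        n = 2
        if a[lo + 1] < a[lo]:                                  # strictly descending
            while lo + n < hi and a[lo + n] < a[lo + n - 1]:
                n += 1
            return n, True
        while lo + n < hi and not (a[lo + n] < a[lo + n - 1]): # non-decreasing
            n += 1
        return n, False

    def binary_insertion_sort(a, lo, hi, start):
        # Sort a[lo:hi] in place, given that a[lo:start] is already sorted.
        for i in range(start, hi):
            pivot = a[i]
            pos = bisect_right(a, pivot, lo, i)   # only "<" needed; inserting
            a[pos + 1:i + 1] = a[pos:i]           # after equals keeps it stable
            a[pos] = pivot

    def next_run(a, lo, hi):
        # Identify the next run; reverse it if descending; if it's shorter
        # than MIN_MERGE_SLICE, boost it with binary insertion sort.
        n, descending = count_run(a, lo, hi)
        if descending:
            a[lo:lo + n] = reversed(a[lo:lo + n])
        if n < MIN_MERGE_SLICE:
            forced = min(MIN_MERGE_SLICE, hi - lo)
            binary_insertion_sort(a, lo, lo + forced, lo + n)
            n = forced
        return n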
The Merge Pattern
-----------------
In order to exploit regularities in the data, we're merging on natural run
lengths, and they can become wildly unbalanced.  But that's a Good Thing
for this sort!

Stability constrains permissible merging patterns.  For example, if we have
3 consecutive runs of lengths

    A:10000  B:20000  C:10000

we dare not merge A with C first, because if A, B and C happen to contain
a common element, it would get out of order wrt its occurrence(s) in B.  The
merging must be done as (A+B)+C or A+(B+C) instead.

So merging is always done on two consecutive runs at a time, and in-place,
although this may require some temp memory (more on that later).

When a run is identified, its base address and length are pushed on a stack
in the MergeState struct.  merge_collapse() is then called to see whether it
should merge it with preceding run(s).  We would like to delay merging as
long as possible in order to exploit patterns that may come up later, but we
would like to do merging as soon as possible to exploit that the run just
found is still high in the memory hierarchy.  We also can't delay merging
"too long" because it consumes memory to remember the runs that are still
unmerged, and the stack has a fixed size.

What turned out to be a good compromise maintains two invariants on the
stack entries, where A, B and C are the lengths of the three rightmost
not-yet merged slices:

1.  A > B+C
2.  B > C

Note that, by induction, #2 implies the lengths of pending runs form a
decreasing sequence.  #1 implies that, reading the lengths right to left,
the pending-run lengths grow at least as fast as the Fibonacci numbers.
Therefore the stack can never grow larger than about log_base_phi(N)
entries, where phi = (1+sqrt(5))/2 ~= 1.618.  Thus a small # of stack slots
suffice for very large arrays.

If A <= B+C, the smaller of A and C is merged with B, and the new run
replaces the A,B or B,C entries; e.g., if the last 3 entries are

    A:30  B:20  C:10

then B is merged with C, leaving

    A:30  BC:30

on the stack.  Or if they were

    A:500  B:400  C:1000

then A is merged with B, leaving

    AB:900  C:1000

on the stack.

In both examples, the stack configuration still violates invariant #2, and
merge_at() goes on to continue merging runs until both invariants are
satisfied.  As an extreme case, suppose we didn't do the MIN_MERGE_SLICE
gimmick, and natural runs were of lengths 128, 64, 32, 16, 8, 4, 2, and 2.
Nothing would get merged until the final 2 was seen, and that would trigger
7 perfectly balanced (both runs involved have the same size) merges.

The thrust of these rules when they trigger merging is to balance the run
lengths as closely as possible, while keeping a low bound on the number of
runs we have to remember.  This is maximally effective for random data,
where all runs are likely to be of (artificially forced) length
MIN_MERGE_SLICE, and then we get a sequence of perfectly balanced merges.
OTOH, the reason this sort is so good for lightly disordered data has to do
with wildly unbalanced run lengths.
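Here's a minimal Python sketch of that stack discipline.  It tracks only the
pending run lengths, and assumes a merge_at(i) helper that merges the runs
in stack slots i and i+1 and collapses their two entries into one; the real
MergeState keeps (base address, length) pairs:

    def merge_collapse(runs, merge_at):
        # Keep merging at the top of the run stack until both invariants
        # hold.  `runs` is a list of pending run lengths, oldest first.
        while len(runs) > 1:
            n = len(runs) - 2                      # slot of B in ... A, B, C
            if n > 0 and runs[n - 1] <= runs[n] + runs[n + 1]:
                if runs[n - 1] < runs[n + 1]:      # A <= B+C: merge the smaller
                    n -= 1                         # of A and C with B
                merge_at(n)
            elif runs[n] <= runs[n + 1]:           # B <= C: invariant #2 violated
                merge_at(n)
            else:
                break                              # both invariants hold

Fed the lengths 30, 20, 10 from the first example, this sketch merges B with
C and then, since 30 <= 30 still violates invariant #2, merges again,
leaving a single run of length 60 on the stack.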
Merge Memory
------------
Merging adjacent runs of lengths A and B in-place is very difficult.
Theoretical constructions are known that can do it, but they're too
difficult and slow for practical use.  But if we have temp memory equal to
min(A, B), it's easy.

If A is smaller, copy A to a temp array, leave B alone, and then we can do
the obvious merge algorithm left to right, from the temp area and B,
starting the stores into where A used to live.  There's always a free area
in the original area comprising a number of elements equal to the number
not yet merged from the temp array (trivially true at the start; proceed by
induction).  The only tricky bit is that if a comparison raises an
exception, we have to remember to copy the remaining elements back in from
the temp area, lest the array end up with duplicate entries from B.

If B is smaller, much the same, except that we need to merge right to left,
starting the stores at the right end of where B used to live.

In all, then, we need no more than N/2 temp array slots.

A refinement:  When we're about to merge adjacent runs A and B, we first do
a form of binary search (more on that later) to see where B[0] should end
up in A.  Elements in A preceding that point are already in their final
positions, effectively shrinking the size of A.  Likewise we also search to
see where A[-1] should end up in B, and elements of B after that point can
also be ignored.  This cuts the amount of temp memory needed by the same
amount.  It may not pay, though.

Merge Algorithms
----------------
When merging runs of lengths A and B, if A/2 <= B <= 2*A (i.e., they're
within a factor of two of each other), we do the usual straightforward
one-at-a-time merge.  This can take up to A+B comparisons.  If the data is
random, there's very little potential for doing better than that.  If there
are a great many equal elements, we can do better than that, but there's no
way to know whether there *are* a great many equal elements short of doing
a great many additional comparisons (we only use "<" in sort), and that's
too expensive when it doesn't pay.

If the sizes of A and B are out of whack, we can do much better.  The
Hwang-Lin merging algorithm is very good at merging runs of mismatched
lengths if the data is random, but I believe it would be a mistake to try
that here.  As explained before, if we really do have random data, we're
almost certainly going to stay in the A/2 <= B <= 2*A case.

Instead we assume that wildly different run lengths correspond to *some*
sort of clumpiness in the data.  Without loss of generality, assume A is
the shorter run.  We first look for A[0] in B.  We do this via "galloping",
comparing A[0] in turn to B[0], B[1], B[3], B[7], ..., B[2**j - 1], ...,
until finding the k such that B[2**(k-1) - 1] < A[0] <= B[2**k - 1].  This
takes at most log2(B) comparisons, and, unlike a straight binary search,
favors finding the right spot early in B.  Why that's important may <wink>
become clear later.

After finding such a k, the region of uncertainty is reduced to
2**(k-1) - 1 consecutive elements, and a straight binary search requires
exactly k-1 comparisons to nail it.

Now we can copy all the B's up to that point in one chunk, and then copy
A[0].  If the data really is clustered, the new A[0] (what was A[1] at the
start) is likely to belong near the start of what remains of the B run.
That's why we gallop first instead of doing a straight binary search:  if
the new A[0] really is near the start of the remaining B run, galloping
will find it much quicker.  OTOH, if we're wrong, galloping + binary search
never takes more than 2*log2(B) compares, so can't become a disaster.  If
the clumpiness comes in distinct clusters, gallop + binary search also
adapts nicely to that.

I first learned about the galloping strategy in a related context; do a
Google search to find this paper available online:

    "Adaptive Set Intersections, Unions, and Differences" (2000)
    Erik D. Demaine, Alejandro López-Ortiz, J. Ian Munro

and its followup(s).
-----------------------------------------------------------------------------*/
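For concreteness, here's a small Python sketch of the gallop-then-binary-
search step described under "Merge Algorithms".  gallop_left is an
illustrative name, and bisect_left stands in for the hand-rolled binary
search in the real C code:

    from bisect import bisect_left

    def gallop_left(x, b):
        # Return the leftmost index i with x <= b[i] in the sorted list b
        # (len(b) if x is greater than everything in b).  Probes b[0], b[1],
        # b[3], b[7], ..., b[2**k - 1] until overshooting, then binary-
        # searches the remaining gap of 2**(k-1) - 1 unexamined elements.
        n = len(b)
        if n == 0 or x <= b[0]:
            return 0
        prev = 0                     # last probed index with b[prev] < x
        ofs = 1                      # probe offsets 1, 3, 7, 15, ... = 2**k - 1
        while ofs < n and b[ofs] < x:
            prev = ofs
            ofs = 2 * ofs + 1
        # Now b[prev] < x, and x <= b[ofs] if ofs is still in range.
        return bisect_left(b, x, prev + 1, min(ofs, n))

In the merge described above, with A the shorter run, the first
gallop_left(A[0], B) elements of B could then be copied to the output in
one chunk before A[0] is emitted.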