Some optimizations to the TopBy method

I had planned to present the results of performance testing here, comparing my TopBy extension method against the existing LINQ alternative (OrderByDescending() followed by Take()). But while testing I came up with a couple of optimizations that provide slight performance increases. So I’ll show you those optimizations today, and then follow up with the performance tests in the next day or two.

My original code used a 3-heap because testing of my DHeap class showed that a 3-heap is almost always faster than a binary heap. But when I made the decision to use a 3-heap, I didn’t take into account the mix of operations this particular heap actually performs. If you recall from my discussion of the d-heap, heap insertions (the BubbleUp method) get faster as d increases, so inserting into a 3-heap is faster than inserting into a binary heap. But heap removals (the SiftDown method) get slower as d increases, because sifting down has to find the smallest of up to d children at every level, while bubbling up compares against just one parent.
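To make that asymmetry concrete, here’s a minimal sketch of a d-ary SiftDown (illustrative only, not the code from my DHeap class), using the same Compare and Swap helpers as the selector code below. The inner loop that locates the smallest of up to d children is the work that grows with d; BubbleUp has no counterpart to it.

    // Sketch: sift the item at ix down in a d-ary min-heap.
    private void SiftDownDAry(int ix, int d)
    {
        var child = (ix * d) + 1;               // first child of ix
        while (child < _count)
        {
            // Find the smallest of the (up to d) children. This costs up to
            // d - 1 comparisons per level, and it's the part that gets more
            // expensive as d grows.
            var smallest = child;
            var limit = Math.Min(child + d, _count);
            for (var c = child + 1; c < limit; ++c)
            {
                if (Compare(c, smallest) < 0)
                    smallest = c;
            }
            // Stop when the item at ix is no larger than its smallest child.
            if (Compare(smallest, ix) >= 0)
                break;
            Swap(ix, smallest);
            ix = smallest;
            child = (ix * d) + 1;
        }
    }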

The HeapSelector<TElement> class I presented last time never does a heap insert! The BubbleUp method doesn’t even exist in the code. The DoSelection method creates an array of the first numItems items and heapifies it, and every subsequent change to the heap is made by sifting new elements down. The result is that my SiftDown method does more comparisons than a binary heap’s would, and there’s no corresponding reduction in comparisons during heap insertion to offset them.
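As a reminder, the shape of DoSelection is roughly this (a sketch of the structure described above, not the exact code from the last post; ExtractKeys stores an item and its extracted keys at the given slot):

    private void DoSelection()
    {
        // srcEnumerator enumerates the source sequence, as in the loop
        // shown later in this post.
        // Read the first numItems items, storing each item and its keys.
        _count = 0;
        while (_count < _numItems && srcEnumerator.MoveNext())
        {
            ExtractKeys(srcEnumerator.Current, _count);
            ++_count;
        }

        // Build the heap in place. From here on the only heap operation
        // is SiftDown; nothing is ever bubbled up.
        Heapify();

        // ... the loop over the remaining items is shown later in this post.
    }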

The Heapify method in a 3-heap is faster than in a binary heap, and better locality of reference with the 3-heap offsets some of the lost performance in SiftDown, but overall the 3-heap is slightly slower with this mix of operations.

So I modified the code to use a binary heap. That simplifies the SiftDown method and provides a small increase (about 2%) in speed. Here are the modified Heapify and SiftDown methods.

    private void Heapify()
    {
        // Establish the heap property in place by sifting down every node
        // that can have children, working back from the middle of the array
        // to the root.
        for (int i = _count / 2; i >= 0; --i)
        {
            SiftDown(i);
        }
    }

    private void SiftDown(int ix)
    {
        var child = (ix*2) + 1;                 // left child of ix
        while (child < _count)
        {
            // Pick the smaller of the two children.
            if (child+1 < _count && Compare(child, child + 1) > 0)
                ++child;
            // Stop when the item at ix is no larger than that child.
            if (Compare(child, ix) >= 0)
                break;
            Swap(ix, child);
            ix = child;
            child = (ix*2) + 1;
        }
    }
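The Compare and Swap helpers referenced above aren’t shown in this post. Based on how they’re used, they look something like the sketch below: Compare walks _criteria (the array of ordering criteria that shows up again in ExtractCompare later in this post) over the keys stored for two slots, and Swap exchanges both the items and their stored keys. SwapKeys is my name for a hypothetical per-criterion helper; the real class may organize this differently.

    // Sketch: compare the keys stored for slots x and y, criterion by
    // criterion, stopping at the first criterion whose keys differ.
    private int Compare(int x, int y)
    {
        var rslt = 0;
        for (var i = 0; rslt == 0 && i < _criteria.Length; ++i)
        {
            rslt = _criteria[i].CompareKeys(x, y);
        }
        return rslt;
    }

    // Sketch: exchange the items at slots x and y along with their keys.
    // SwapKeys is a hypothetical per-criterion helper.
    private void Swap(int x, int y)
    {
        var temp = _items[x];
        _items[x] = _items[y];
        _items[y] = temp;
        foreach (var c in _criteria)
        {
            c.SwapKeys(x, y);
        }
    }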

The other optimization has to do with extracting keys from records before comparing to see if an item should go onto the heap. The original code looks like this:

    // For each remaining item . . .
    while (srcEnumerator.MoveNext())
    {
        ExtractKeys(srcEnumerator.Current, _numItems);
        if (Compare(_numItems, 0) > 0)
        {
            // The current item is larger than the smallest item.
            // So move the current item to the root and sift down.
            Swap(0, _numItems);
            SiftDown(0);
        }
    }

That extracts all of the keys for an item and then compares the item against the first (smallest) item on the heap. If the new item is larger than the smallest item, then the smallest item is replaced by the new item.
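ExtractKeys itself isn’t shown in this post. Each element of _criteria represents one ordering clause (OrderBy, ThenBy, and so on); judging from the calls above and below, it holds a key selector, a comparer, and the extracted keys indexed by slot. ExtractKeys, then, presumably looks something like this sketch:

    // Sketch: store the item at slot ix and extract all of its keys.
    private void ExtractKeys(TElement item, int ix)
    {
        _items[ix] = item;
        foreach (var criterion in _criteria)
        {
            criterion.ExtractKey(item, ix);
        }
    }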

One of the things I learned when working with heap selection some time ago is that with typical data, especially when you’re selecting a small percentage of the total items, only a very small number of items are actually placed on the heap. In most cases the vast majority of items examined are smaller than the smallest item on the heap. That being true, it’s useful to perform as little work as possible to determine if an item will go onto the heap.
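To put a rough number on that: if the items arrive in random order, the item at position i (for i greater than the heap size k) belongs among the largest k seen so far with probability k/i, so the expected number of root replacements is roughly k × ln(n/k). Selecting the top 100 of 1,000,000 randomly ordered items, for example, should replace the root only about 100 × ln(10,000) ≈ 920 times; the other 999,000-plus items fail that first comparison and never touch the heap.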

The other observation is that, when multiple sort keys are specified, the majority of items will differ in the first key. If you’re sorting people by last name and then first name, most of them will have different last names. It makes no sense, then, to extract the first name key if it will never be compared. If we had a method that extracted only the keys it needs for the comparison, we would spend a lot less time extracting keys.

With that in mind, I created the ExtractCompare method:

    // Extract item's keys into slot ix one at a time, comparing slots x and y
    // after each extraction. (In the calling code, ix and x are the same slot.)
    private int ExtractCompare(TElement item, int ix, int x, int y)
    {
        var rslt = 0;
        var i = 0;
        while (rslt == 0 && i < _criteria.Length)
        {
            // Extract this key
            _criteria[i].ExtractKey(item, ix);
            rslt = _criteria[i].CompareKeys(x, y);
            ++i;
        }
        // If the item isn't larger, we can short circuit
        if (rslt <= 0) return rslt;

        // The new item compares larger, so it will be placed on the heap.
        // Extract the rest of the keys
        _items[ix] = item;
        while (i < _criteria.Length)
        {
            _criteria[i].ExtractKey(item, ix);
            ++i;
        }
        return rslt;
    }

You can see here that if the new item is no larger than the item it’s being compared to (which in this case is the smallest item on the heap), the remaining keys aren’t even extracted. Again, why extract a key that you won’t use?

Using this new method requires a small change in the selection loop:

    while (srcEnumerator.MoveNext())
    {
        if (ExtractCompare(srcEnumerator.Current, _numItems, _numItems, 0) > 0)
        {
            // The current item is larger than the smallest item.
            // So move the current item to the root and sift down.
            Swap(0, _numItems);
            SiftDown(0);
        }
    }

The performance increase from this optimization isn’t very noticeable when you have just one ordering clause (i.e. no ThenBy or ThenByDescending). My testing shows that it’s only two to three percent in the general case. The effect becomes much more evident when you have multiple sort keys, or if extracting the keys is expensive.

I did have one reservation about this optimization. Somebody could write a key selector method whose side effects other parts of the program (perhaps subsequent key selections) depend on, and those side effects will now occur less often. I can’t think of a good reason to write a program that way, though. After giving it quite a bit of thought, I concluded that depending on a key selector’s side effects is already so fraught with peril that this change adds no meaningful risk. Key selection shouldn’t modify global state.
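To make the concern concrete, here’s a contrived illustration (Person is a hypothetical type, not something from the library):

    // Contrived: a key selector whose side effect other code relies on.
    var seen = new HashSet<string>();
    Func<Person, string> firstNameKey = p =>
    {
        seen.Add(p.FirstName);      // side effect
        return p.FirstName;
    };
    // With ExtractCompare, a secondary key selector like this runs only when
    // the earlier keys tie and the item is headed for the heap, so 'seen'
    // no longer ends up containing every first name in the source sequence.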

Together, these two optimizations resulted in a performance increase of five to seven percent in the single-sort-key case. Not huge by any standard, but worthwhile considering the simplicity of the changes.

Performance comparisons next time. Really.