Last time I introduced the TopBy
extension method, which I want to implement for LINQ to Objects. On the surface of it, doing such a thing seems simple enough: just implement the IOrderedEnumerable interface. That turns out to be more involved than you might expect.
Documentation for IOrderedEnumerable
isn’t particularly helpful. The brief description says:
Represents a sorted sequence.
There are no remarks or other detailed information. The interface has two methods: CreateOrderedEnumerable<TKey>
and GetEnumerator
.
If you’ve worked with LINQ at all, you know what GetEnumerator
is supposed to do: generate the sequence. More correctly, it returns an enumerator that iterates through the collection, but something has to generate the items in the collection, and in keeping with the lazy nature of LINQ, we typically generate those items during enumeration.
Documentation for CreateOrderedEnumerable<TKey> isn’t much help, either. The brief description says:
Performs a subsequent ordering on the elements of an
IOrderedEnumerable<TElement>
according to a key.
The prototype tells us that there are three parameters: a key selector, a comparer, and a flag to indicate if the subsequent ordering should be ascending or descending. And, the only remark is:
The functionality provided by this method is like that provided by ThenBy or ThenByDescending, depending on whether descending is
true
orfalse
. They both perform a subordinate ordering of an already sorted sequence of typeIOrderedEnumerable<TElement>
.
Not so helpful. The only clue in the documentation that even hints at an implementation strategy is in the Remarks section of ThenBy
and ThenByDescending
:
ThenBy
andThenByDescending
are defined to extend the typeIOrderedEnumerable<TElement>
, which is also the return type of these methods. This design enables you to specify multiple sort criteria by applying any number ofThenBy
orThenByDescending
methods.
The idea is easy enough. We want to create a chain of comparison operations so that we can apply code that does essentially this:
if first keys are equal then
if second keys are equal then
if third keys are equal then
... etc.
It all starts with the TopBy
method, which has two overloads:
public static IOrderedEnumerable<TSource> TopBy<TSource, TKey>(
this IEnumerable<TSource> source,
int numItems,
Func<TSource, TKey> keySelector);
public static IOrderedEnumerable<TSource> TopBy<TSource, TKey>(
this IEnumerable<TSource> source,
int numItems,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer);
There are corresponding TopByDescending
methods, as well, which have the same parameters.
Those methods return an IOrderedEnumerable<TSource>
, so I need to create a class that implements the IOrderedEnumerable<TSource>
interface. I called that class HeapOrderedEnumerable<TElement>
. In its skeletal form, it looks like this:
internal abstract class HeapOrderedEnumerable<TElement> : IOrderedEnumerable<TElement>
{
public IOrderedEnumerable<TElement> CreateOrderedEnumerable<TKey>(
Func<TElement, TKey> keySelector,
IComparer<TKey> comparer,
bool descending)
{
throw new NotImplementedException();
}
public IEnumerator<TElement> GetEnumerator()
{
throw new NotImplementedException();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
If you think about it a bit, that HeapOrderedEnumerable<TElement>
class isn’t sufficient because the key type varies. What we really need is a HeapOrderedEnumerable<TElement,TKey>
that we derive from HeapOrderedEnumerable<TElement>
.
internal class HeapOrderedEnumerable<TElement, TKey> : HeapOrderedEnumerable<TElement>
{
}
TopBy
, then, creates and returns a HeapOrderedEnumerable<TSource, TKey>
. A reference to that object is passed to ThenBy
, which in turns calls the CreateOrderedEnumerable
method to create additional sort criteria. It’s up to us to figure out how to chain those multiple sort criteria together so that when GetEnumerator
is called, items are selected appropriately.
Clear as mud, right?
The HeapOrderedEnumerable<TElement, TKey>
instance has to save the parameters that are passed to its constructor so that they can be used when it’s time to generate the sequence. The class also has to save a reference to its parent so that we can properly construct the chain of comparers when the time comes. Some of the information to be saved (the source list, number of items to select, and the parent pointer) has to be accessible by the base class. The rest can be private to the derived class.
With that information, we can create internal data definitions and constructors for the classes. First, the base class:
// Internal data and constructor for HeapOrderedEnumerable<TElement>
private readonly IEnumerable<TElement> _source;
private readonly int _numItems;
internal HeapOrderedEnumerable<TElement> Parent;
protected HeapOrderedEnumerable(
IEnumerable<TElement> source,
int numItems)
{
_source = source;
_numItems = numItems;
}
Note that the Parent
isn’t set by the constructor. We’ll come back to it in a minute.
The derived class, HeapOrderedEnumerable<TElement, TKey>
, saves the key selector, the comparer, and the descending flag.
internal class HeapOrderedEnumerable<TElement, TKey> : HeapOrderedEnumerable<TElement>
{
private readonly Func<TElement, TKey> _keySelector;
private readonly IComparer<TKey> _comparer;
private readonly bool _descending;
internal HeapOrderedEnumerable(
IEnumerable<TElement> source,
int numItems,
Func<TElement, TKey> keySelector,
IComparer<TKey> comparer,
bool descending) : base(source, numItems)
{
_keySelector = keySelector;
_comparer = comparer ?? Comparer<TKey>.Default;
_descending = descending;
}
}
Now we can write the CreateOrderedEnumerable
method. All it does is create a HeapOrderedEnumerable<TElement, TKey>
, and link it to the parent so that we have a chain that we can work with when GetEnumerator
is called.
public IOrderedEnumerable<TElement> CreateOrderedEnumerable<TKey>(
Func<TElement, TKey> keySelector,
IComparer<TKey> comparer, bool descending)
{
var oe = new HeapOrderedEnumerable<TElement, TKey>(
_source, _numItems, keySelector, comparer, descending);
oe.Parent = this;
return oe;
}
And, finally, the TopBy
and TopByDescending
methods.
public static IOrderedEnumerable<TSource> TopBy<TSource, TKey>(
this IEnumerable<TSource> source,
int numItems,
Func<TSource, TKey> keySelector)
{
return new HeapOrderedEnumerable<TSource, TKey>(source, numItems, keySelector, null, false);
}
public static IOrderedEnumerable<TSource> TopBy<TSource, TKey>(
this IEnumerable<TSource> source,
int numItems,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer)
{
return new HeapOrderedEnumerable<TSource, TKey>(source, numItems, keySelector, comparer, false);
}
public static IOrderedEnumerable<TSource> TopByDescending<TSource, TKey>(
this IEnumerable<TSource> source,
int numItems,
Func<TSource, TKey> keySelector)
{
return new HeapOrderedEnumerable<TSource, TKey>(source, numItems, keySelector, null, true);
}
public static IOrderedEnumerable<TSource> TopByDescending<TSource, TKey>(
this IEnumerable<TSource> source,
int numItems,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer)
{
return new HeapOrderedEnumerable<TSource, TKey>(source, numItems, keySelector, comparer, true);
}
Note that I don’t have to define the ThenBy
or ThenByDescending
methods. Those methods are already defined by LINQ. They call CreateOrderedEnumerable
on the IOrderedEnumerable
reference that’s passed as the first parameter. So, if there’s a TopBy
followed by ThenBy
, the HeapOrderedEnumerable<TElement>
that TopBy
created gets called. If the first function is OrderBy
, then the IOrderedEnumerable
implementation created by that is called.
The code now will create a chain of comparers from a TopBy
or TopByDescending
followed by any number of ThenBy
or ThenByDescending
calls. The chain is built backwards, but that’s okay; it’s easy enough to reverse the chain when it comes time to enumerate items.
Enumerating the items involves extracting and saving keys, and doing the actual heap selection. In the next post I’ll talk a bit about key selection, and start building the GetEnumerator
method. I might be able to finish next time, but there’s a lot of material to cover and it might be too big for a single post. If so, we’ll finish up after that.