Using LINQ’s ToLookup to replace dictionary creation code

I often find myself having to build a dictionary from a list of items, with each entry in the dictionary containing a subset of the list. For example, given a list of people and the shifts they’ve worked, I need to group them by shift so that all the people who worked the morning shift are grouped together. I can then operate on those sublists separately.

Rather than illustrating with a list of employees and shifts they’ve worked, imagine you have a list of words, and you want to group them by their first letter. So given the words (ant, aardvark, baboon, giraffe, tortoise, gorilla, turtle), you’d end up with a list of lists that looks something like this:

A – ant, aardvark
B – baboon
G – giraffe, gorilla
T – tortoise, turtle

You could then query the data structure to get a list of all the words that begin with a particular letter.

The traditional way to do this in C# would be to create a Dictionary<char, List<string>>, and populate it as you enumerate the words list. For example:

    private Dictionary<char, List<string>> BuildWordsByFirstChar(IEnumerable<string> words)
    {
        var wordsByChar = new Dictionary<char, List<string>>();
        foreach (var word in words)
        {
            List<string> l;
            var key = word[0];
            if (!wordsByChar.TryGetValue(key, out l))
            {
                // Key doesn't exist in dictionary.
                // Make a new entry with an empty list.
                l = new List<string>();
                wordsByChar.Add(key, l);
            }
            l.Add(word);
        }
        return wordsByChar;
    }

You can then query the resulting dictionary data structure to get the list of words that start with a particular character, or you can iterate over the dictionary keys. For example, this code will display all of the keys and their associated words:

    var wordsByChar = BuildWordsByFirstChar(_words);
    foreach (var c in wordsByChar.OrderBy(kvp => kvp.Key))
    {
        Console.Write(c.Key + ": ");
        foreach (var w in c.Value)
        {
            Console.Write(w + ",");
        }
        Console.WriteLine();
    }

I’ve written such dictionary-building code countless times in the past. It’s tedious. It’s also unnecessary, because you can do the same thing with LINQ in a single line of code:

    var wbcLookup = _words.ToLookup(s => s[0]);

That creates a Lookup<char, string> (returned through the ILookup<char, string> interface), which is very similar to the Dictionary<char, List<string>> that the other code creates. For example, code to output the contents is almost identical:

    foreach (var c in wbcLookup.OrderBy(g => g.Key))
    {
        Console.Write(c.Key + ": ");
        foreach (var w in c)
        {
            Console.Write(w + ",");
        }
        Console.WriteLine();
    }

There are differences, however. Whereas indexing the dictionary with a non-existent key throws a KeyNotFoundException, the Lookup returns an empty sequence. In addition, the value returned by the dictionary is a List<string>, whereas the value returned by the lookup is of type IEnumerable<string>, so you can enumerate it but not modify it.
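
To make the missing-key difference concrete, here is a minimal sketch. The class name, the short word list, and the 'z' key are made up for illustration, and the dictionary is filled by hand just to keep the example short.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class MissingKeyDemo
    {
        static void Main()
        {
            var words = new[] { "ant", "aardvark", "baboon" };

            // Hand-filled dictionary standing in for the hand-built one.
            var dict = new Dictionary<char, List<string>>
            {
                ['a'] = new List<string> { "ant", "aardvark" },
                ['b'] = new List<string> { "baboon" }
            };

            ILookup<char, string> lookup = words.ToLookup(w => w[0]);

            // No word starts with 'z'. The lookup's indexer simply
            // returns an empty sequence.
            Console.WriteLine(lookup['z'].Count());   // prints 0
            Console.WriteLine(lookup.Contains('z'));  // prints False

            // The dictionary's indexer throws for the same key.
            try
            {
                var missing = dict['z'];
            }
            catch (KeyNotFoundException)
            {
                Console.WriteLine("KeyNotFoundException");
            }

            // TryGetValue is the non-throwing way to probe a dictionary.
            Console.WriteLine(dict.TryGetValue('z', out _));  // prints False
        }
    }

With the lookup you can enumerate the result for any key without checking first. With the dictionary you need TryGetValue or ContainsKey to avoid the exception.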

Lookup, by the way, implements IEnumerable<IGrouping<TKey, TElement>> and adds access by key on top of it. You can get much of the above functionality without using ToLookup at all, but rather by just grouping the elements like this:

    var groupedWords = _words.GroupBy(s => s[0]);

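That gives you a sequence of groups, but you can’t access a group by key as you can with Lookup. GroupBy is also lazily evaluated, so the grouping is redone each time you enumerate it, whereas ToLookup builds its result once. If you really need to create a Dictionary<char, List<string>>, you can build it from the grouping, like this: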

    var wbcDict = groupedWords.ToDictionary(c => c.Key, c => c.ToList());

Or, if you have a Lookup:

    var wbcDict = wbcLookup.ToDictionary(c => c.Key, c => c.ToList());

But I’ve found that in most cases a Lookup will do and there’s no need to create a Dictionary.

The primary difference between ToDictionary and ToLookup is that ToDictionary is designed to create a one-to-one mapping from a list, and ToLookup creates a one-to-many mapping.
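
One practical consequence of that difference is how each method treats repeated keys. The following is a small sketch with a made-up class name and word list: ToLookup gathers the words that share a first letter, while ToDictionary with the same key selector throws an ArgumentException as soon as it sees a duplicate key.

    using System;
    using System.Linq;

    static class DuplicateKeyDemo
    {
        static void Main()
        {
            var words = new[] { "ant", "aardvark", "baboon" };

            // ToLookup collects every word that shares a key.
            var lookup = words.ToLookup(w => w[0]);
            Console.WriteLine(string.Join(", ", lookup['a']));  // prints ant, aardvark

            // ToDictionary requires unique keys, so the second word
            // starting with 'a' makes it throw.
            try
            {
                var dict = words.ToDictionary(w => w[0]);
            }
            catch (ArgumentException)
            {
                Console.WriteLine("ToDictionary rejected the duplicate key");
            }
        }
    }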

The main point here is that with ToLookup, I can replace the dozen or more lines of procedural dictionary creation code with a single line of code that more clearly states the intention. And if I absolutely need a Dictionary, I can just call ToDictionary on the result.
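
As a rough illustration of that replacement, here is one way the original method could be rewritten with the same signature. The WordIndex class name and the Main method are only scaffolding for the example. Like the hand-written version, it assumes none of the words is an empty string.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class WordIndex
    {
        // Same signature as the hand-written version, one expression long.
        public static Dictionary<char, List<string>> BuildWordsByFirstChar(IEnumerable<string> words) =>
            words.ToLookup(w => w[0])
                 .ToDictionary(g => g.Key, g => g.ToList());

        static void Main()
        {
            var words = new[] { "ant", "aardvark", "baboon", "giraffe", "tortoise", "gorilla", "turtle" };
            var wordsByChar = BuildWordsByFirstChar(words);

            // Print each key with its words, in key order.
            foreach (var entry in wordsByChar.OrderBy(kvp => kvp.Key))
            {
                Console.WriteLine(entry.Key + ": " + string.Join(", ", entry.Value));
            }
        }
    }

Existing callers that expect a Dictionary<char, List<string>> keep working, and the grouping logic is reduced to the key selector.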

Looks like a win to me.