I often find myself having to build a dictionary from a list of items, with each entry in the dictionary containing a subset of the list. For example, given a list of people and the shifts they’ve worked, I need to group them by shift so that all the people who worked the morning shift are grouped together. I can then operate on those sublists separately.
Rather than illustrating with a list of employees and the shifts they’ve worked, imagine you have a list of words, and you want to group them by their first letter. So given the words (ant, aardvark, baboon, giraffe, tortoise, gorilla, turtle), you’d end up with a set of groups that looks something like this:
A – ant, aardvark
B – baboon
G – giraffe, gorilla
T – tortoise, turtle
You could then query the data structure to get a list of all the words that begin with a particular letter.
The traditional way to do this in C# would be to create a Dictionary<char, List<string>>, and populate it as you enumerate the words list. For example:
private Dictionary<char, List<string>> BuildWordsByFirstChar(IEnumerable<string> words)
{
    var wordsByChar = new Dictionary<char, List<string>>();
    foreach (var word in words)
    {
        List<string> l;
        var key = word[0];
        if (!wordsByChar.TryGetValue(key, out l))
        {
            // Key doesn't exist in dictionary.
            // Make a new entry with an empty list.
            l = new List<string>();
            wordsByChar.Add(key, l);
        }
        l.Add(word);
    }
    return wordsByChar;
}
You can then query the resulting dictionary data structure to get the list of words that start with a particular character, or you can iterate over the dictionary keys. For example, this code will display all of the keys and their associated words:
var wordsByChar = BuildWordsByFirstChar(_words);
foreach (var c in wordsByChar.OrderBy(kvp => kvp.Key))
{
    Console.Write(c.Key + ": ");
    foreach (var w in c.Value)
    {
        Console.Write(w + ",");
    }
    Console.WriteLine();
}
I’ve written dictionary-building code like this countless times in the past. It’s tedious. It’s also unnecessary, because you can do the same thing with LINQ in a single line of code:
var wbcLookup = _words.ToLookup(s => s[0]);
That creates a Lookup<char, string>, which is very similar to the Dictionary<char, List<string>> that the other code creates. For example, code to output the contents is almost identical:
foreach (var c in wbcLookup.OrderBy(g => g.Key))
{
    Console.Write(c.Key + ": ");
    foreach (var w in c)
    {
        Console.Write(w + ",");
    }
    Console.WriteLine();
}
There are differences, however. Whereas attempting to access a non-existent key through the dictionary’s indexer will result in a KeyNotFoundException, the Lookup will return an empty sequence. In addition, the value returned by the dictionary is a List<string>, whereas the value returned by the lookup is of type IEnumerable<string>.
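To make the missing-key difference concrete, here is a minimal sketch. It assumes C# 9 or later (top-level statements), and the tiny sample data and variable names are mine, chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sample data; none of these words start with 'z'.
var words = new[] { "ant", "aardvark", "baboon" };

var dict = new Dictionary<char, List<string>>
{
    ['a'] = new List<string> { "ant", "aardvark" },
    ['b'] = new List<string> { "baboon" },
};
ILookup<char, string> lookup = words.ToLookup(w => w[0]);

// Lookup: indexing a missing key yields an empty sequence, not an exception.
int zCountInLookup = lookup['z'].Count();
bool lookupHasZ = lookup.Contains('z');

// Dictionary: the indexer throws for a missing key.
bool dictThrew = false;
try
{
    _ = dict['z'];
}
catch (KeyNotFoundException)
{
    dictThrew = true;
}

// TryGetValue is the non-throwing way to probe a dictionary.
bool dictHasZ = dict.TryGetValue('z', out _);

Console.WriteLine($"lookup['z'] count: {zCountInLookup}, Contains('z'): {lookupHasZ}");
Console.WriteLine($"dict['z'] threw: {dictThrew}, TryGetValue('z'): {dictHasZ}");
```

If you need to tell "no such key" apart from "key with no items", Contains on the lookup gives you that without relying on an exception.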
Lookup<TKey, TElement>, by the way, implements IEnumerable<IGrouping<TKey, TElement>>, so it is essentially a sequence of groups with keyed access added on top. You can get much of the above functionality without using ToLookup at all, but rather by just grouping the elements like this:
var groupedWords = _words.GroupBy(s => s[0]);
That gives you a sequence of groups, but you can’t fetch a group by its key as you can with Lookup. If you really need to create a Dictionary<char, List<string>>, you can build it from the grouping, like this:
var wbcDict = groupedWords.ToDictionary(c => c.Key, c => c.ToList());
Or, if you have a Lookup:
var wbcDict = wbcLookup.ToDictionary(c => c.Key, c => c.ToList());
But I’ve found that in most cases a Lookup will do and there’s no need to create a Dictionary.
The primary difference between ToDictionary and ToLookup is that ToDictionary is designed to create a one-to-one mapping from a list, and ToLookup creates a one-to-many mapping.
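A quick way to see the one-to-one versus one-to-many distinction is to feed both methods a list with repeated keys. This is only a sketch with made-up sample data, again assuming C# 9 or later for the top-level statements:

```csharp
using System;
using System.Linq;

// Two of these words share the key 'a'.
var words = new[] { "ant", "aardvark", "baboon" };

// ToDictionary expects each key to appear once; a repeated key
// causes an ArgumentException.
bool toDictionaryThrew = false;
try
{
    _ = words.ToDictionary(w => w[0]);
}
catch (ArgumentException)
{
    toDictionaryThrew = true;
}

// ToLookup collects every element that shares a key into one group.
var byFirstChar = words.ToLookup(w => w[0]);
int wordsUnderA = byFirstChar['a'].Count();

Console.WriteLine($"ToDictionary threw on duplicate key: {toDictionaryThrew}");
Console.WriteLine($"Words under 'a' in the lookup: {wordsUnderA}");
```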
The main point here is that with ToLookup, I can replace the dozen or more lines of procedural dictionary creation code with a single line of code that more clearly states the intention. And if I absolutely need a Dictionary, I can just call ToDictionary on the result.
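Going back to the shifts example I started with, the same one-liner applies. The sketch below is hypothetical: the names, the shift labels, and the tuple shape are all invented for illustration, and it uses the ToLookup overload that takes an element selector so that each group holds just the names:

```csharp
using System;
using System.Linq;

// Hypothetical data: (person, shift worked).
var shiftsWorked = new[]
{
    (Name: "Ana",  Shift: "morning"),
    (Name: "Ben",  Shift: "evening"),
    (Name: "Cleo", Shift: "morning"),
};

// Key by shift; keep only the name as the grouped element.
var namesByShift = shiftsWorked.ToLookup(s => s.Shift, s => s.Name);

string morningCrew = string.Join(", ", namesByShift["morning"]);
int nightCount = namesByShift["night"].Count(); // nobody worked nights in this data

Console.WriteLine("morning: " + morningCrew);
Console.WriteLine("night shift head count: " + nightCount);
```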
Looks like a win to me.