Cleaning up some code

One thing about inheriting a large code base that’s been worked on by many people over a long period of time is that it’s usually full of learning opportunities. Or perhaps teaching opportunities. Whatever, the other day I ran into a real whopper.

Imagine you have a list of names and associated identification numbers. Items in the list are instances of this class:

    public class Item
    {
        public string Name {get; set;}
        public int Id {get; set;}
    }

We now need a method that, given a list of items and a name, will search the list and return the identifier. Or, if the name doesn’t exist in the list, will return the identifier of the default item, named “Default”.

Ignore for the moment that expecting there to be a default item in the list is almost certainly a bad design decision. Assume that the program you’re working with has that restriction and there’s nothing you can do about it.

I’ve changed the names and simplified things a little bit by removing details of the class that aren’t relevant to the example, but that’s essentially the problem that the code I ran into had to solve. Here’s how the programmer approached that problem.

    public int GetIdFromName(List items, string name)
    {
        int returnValue;
	
        if (items.Where(x => x.Name.ToLower() == name.ToLower()).Count() > 0)
        {
            returnValue = items.Where(x => x.Name.ToLower() == name.ToLower()).FirstOrDefault().Id;
	}
        else
        {
            returnValue = items.Where(x => x.Name.ToLower() == "Default".ToLower()).FirstOrDefault().Id;
        }
        return returnValue;
    }

That code, although it fulfills the requirements, is flawed in many ways. First of all, it’s entirely too complicated and verbose for what it does. Remember, the high level task can be expressed by this pseudocode:

    if (list contains an item with the specified name)
        return that item's Id field
    else
        return the Id field of the item with name "Default"

Let’s look at that piece by piece.

Consider the first if statement:

    if (items.Where(x => x.Name.ToLower() == name.ToLower().Count()) > 0))

This code enumerates the entire list, counting the number of items whose Name property matches the supplied name parameter. But all we really want to know is if there is at least one matching member in the list. There’s no need to count them all. So rather than checking for Count() > 0, we can call Any, which will stop enumerating the first time it finds a match.

    if (items.Where(x => x.Name.ToLower() == name.ToLower()).Any())

And in fact there’s an Any overload that takes a predicate. So we can get rid of the Where, too:

    if (items.Any(x => x.Name.ToLower() == name.ToLower()))

So, if there is an item, then the code scans the list again to get the matching item. That line, too, has an extraneous Where that we can replace with a FirstOrDefault(predicate). So the first lines become:

    if (items.Any(x => x.Name.ToLower() == name.ToLower()))
    {
        returnValue = items.FirstOrDefault(x => x.Name.ToLower() == name.ToLower()).Id;
    }

That simplifies things a bit, but it seems silly to go through the list one time looking to see if something is there, and then go through it again to actually pick out the item. Much better to do a single scan of the list:

    int returnValue;
    var item = items.FirstOrDefault(x => x.Name.ToLower() == name.ToLower());
    if (item != null)
    {
        returnValue = item.Id;
    }

In the else part, we can replace the Where(predicate).FirstOrDefault with FirstOrDefault(predicate), just as above. If we know that an item with the name “Default” will always be in the list, we can replace the call to FirstOrDefault with a call to First:

    else
    {
        returnValue = items.First(x => x.Name.ToLower() == "Default".ToLower()).Id;
    }

I have several problems with the expression: x => x.Name.ToLower() == name.ToLower(). First, I don’t like having to write it twice in that method. It’d be too easy to make a mistake and have the expressions end up doing different things. Second, the == operator for strings always uses the current culture: “current” meaning the CultureInfo settings on the machine that’s currently running the program. That’s not typically a problem, but different cultures have different case conversion rules. It’s best to use an invariant culture for things like this. See my article for more information about why.

So what I would do is create a Func<string, bool> that does the comparison, and pass it as a predicate to First and FirstOrDefault. Like this:

    var comparer =
        new Func<string, string, bool>((s1,s2) =>
            string.Compare(s1, s2, StringComparison.InvariantCultureIgnoreCase) == 0);

    var item = items.FirstOrDefault(x => comparer(x.Name, name));
    if (item != null)
    {
        returnValue = item.Id;
    }
    else
    {
        returnValue = items.First(x => comparer(x.Name, "Default")).Id;
    }
    return returnValue;

Neat, right? Except that I’ve been burned enough in the past not to trust statements like, “There will always be a ‘Default’ item.” In my expreience, “always” too often becomes “sometimes” or “not at all”. I like to think that I’m cautious. Others tend to call me paranoid or at minimum overly concerned with unlikely possibilities. Whatever the case, this code will die horribly if there is no “Default” item; it crashes with NullReferenceException.

As I’ve said before, exceptions should point the finger at the culprit. In this case, a missing “Default” item means that the list is improperly formatted, which probably points to some kind of data corruption. The code should throw an exception that identifies the real problem rather than relying on the runtime to throw a misleading NullReferenceException. I would change the code that explicitly checks for the missing “Default” case, and make it throw a meaningful exception.

The completed code looks like this:

    public int GetIdFromName(List items, string name)
    {
        var comparer =
            new Func<string, string, bool>((s1,s2) => 
                string.Compare(s1, s2, StringComparison.InvariantCultureIgnoreCase) == 0);

        var item = items.FirstOrDefault(x => comparer(x.Name, name));
        if (item != null)
        {
            return item.Id;
        }

        // Check for Default
        item = items.FirstOrDefault(x >= comparer(x.Name, "Default"));
        if (item != null)
        {
            return item.Id;
        }

        // Neither is there. Throw a meaningful exception.

        throw new InvalidOperationException("The Items list does not contain a default item.");
    }

I like that code because it’s easy to read and is very explicit in its error checking and in the message it outputs if there’s a problem. It’s a few more lines of C#, but it’s a whole lot easier to understand and prove correct, and it handles the lack of a “Default” much more reasonably.

That solution is intellectually dissatisfying, though, because it enumerates the list twice if the requested name isn’t found. If I don’t use LINQ, I can easily do this with a single scan of the list:

    public int GetIdFromName2(List<Item> items, string name)
    {
        var comparer =
            new Func<string, string, bool>((s1,s2) =>
                string.Compare(s1, s2, StringComparison.InvariantCultureIgnoreCase) == 0);

        Item defaultItem = null;

        foreach (var item in items)
        {
            if (comparer(item.Name, name))
            {
                return item.Id;
            }
            if (defaultItem == null && comparer(item.Name, "Default"))
            {
                defaultItem = item;
            }
        }
        
        if (defaultItem != null)
        {
            return defaultItem.Id;
        }

        throw new InvalidOperationException("The Items list does not contain a default item.");
    }

Try as I might, I can’t come up with a simple LINQ solution that scans the list only once. The best I’ve come up with is a complicated call to Enumerable.Aggregate that looks something like this:

    public int GetIdFromName3(List items, string name)
    {
        var comparer =
            new Func<string, string, bool>(
                (s1, s2) => string.Compare(s1, s2, StringComparison.InvariantCultureIgnoreCase) == 0);

        Item foo = null;
        var result =
            items.Aggregate(
                new {item = foo, isDefault = false},
                (current, next) =>
                {
                    if (current.item == null)
                    {
                        if (comparer(next.Name, name)) return new {item = next, isDefault = false};
                        if (comparer(next.Name, "Default")) return new {item = next, isDefault = true};
                        return current;
                    }
                    // current item is not null.
                    // if it's a default item, then check to see if the next item is a match for name.
                    if (current.isDefault && comparer(next.Name, name)) return new {item = next, isDefault = false};

                    // otherwise just return the current item
                    return current;
                });
        if (result.item != null)
            return result.item.Id;

        throw new InvalidOperationException("The Items list does not contain a default item.");
    }

That should work, but it’s decidedly ugly and requires a full scan of the list. If I saw that in production code, I’d probably tell the programmer to rewrite it. Of the two other LINQ solutions I considered, one involves sorting with a custom comparer that puts the desired item at the front of the result, and the other involves calling ToLookup to create a lookup table. Both are similarly ugly, require a lot of extra memory, and also require a full scan of the list.

If you can come up with a single, simple LINQ expression that fulfills the requirements, I’d sure like to see it.