Subtraction is not the same as comparison

I tracked down a bug the other day that really surprised me. Again, not because of the code itself, but rather that the person who wrote the code should have known better.

If you want to sort a list of a user-defined type in C# (and other languages), you need a comparison method. There are several different ways to do that, perhaps the simplest being to create a comparison delegate. For example, if you have this class definition:

    public class Foo
    {
        public int Id {get; private set;}
        public string Name {get; set;}
        // constructor and other stuff here . . .
    }

And a list of those:

List<Foo> MyList = GetFooThings(...); Now, imagine you want to sort the list in place by Id. You can write:

    MyList.Sort((x,y) => { return x.Id.CompareTo(y.Id); });

CompareTo will return a value that is:

  • Less than 0 if x.Id < y.Id
  • Equal to 0 if x.Id = y.Id
  • Greater than 0 if x.Id > y.Id

The code I ran into was similar, except the programmer thought he’d “optimize” the function. Rather than incurring the overhead of calling Int32.CompareTo, he took a shortcut:

    MyList.Sort((x,y)) => { return x.Id - y.Id; });

His thinking here is that the result of subtracting the second argument from the first will contain the correct value. And that’s true most of the time. It’s not at all surprising that his testing passed. After all, 12 - 3 returns a positive number, 300 - 123456 returns a negative number, and 99 - 99 returns 0.

But he forgot about negative numbers and integer overflow. Subtracting a negative number is like adding a positive number. That is, 4 - (-1) = 5. And in the artificial world of computers, there are limits to what we can express. The largest 32-bit signed integer we can express is Int32.MaxValue, or 2,147,483,647. If you add 1 to that number, the result is Int.MinValue, or -2,147,483,648. So the result returned by the comparison delegate above, when x is a very large number and y is a sufficiently small negative number is incorrect. It will report, for example, that 2,147,483,647 is smaller than -1!

There’s a reason that Int32.CompareTo is written the way it is. Here it is, directly from Microsoft’s Reference Source:

    public int CompareTo(int value) {
        // Need to use compare because subtraction will wrap
        // to positive for very large neg numbers, etc.
        if (m_value < value) return -1;
        if (m_value > value) return 1;
        return 0;
    }

It seems like I run into these fundamental errors more often now than I did in the past. I’m not yet sure of the reason. I’ve been active in the online programming community for 30 years, and have seen a lot of code written by programmers of all skill levels. Early on, the average skill level of the programmers I met online was much higher than it is today. These days, every Java programming student has access to Stack Overflow, and I see a lot of these rookie mistakes. But I also see many such mistakes made by more experienced programmers.

I’m wondering if the difference is one of education. When I started programming, we learned assembly language early on and actually used it to write real programs. Two’s complement integer representation was a lesson we learned early, and integer overflow was something we dealt with on a daily basis. We would not have made this mistake. No, we made much more interesting mistakes.

This error can be caught, by the way. If you enable runtime overflow checking in C#, the runtime will throw an exception if the result of an arithmetic operation results in overflow or underflow. But that check is almost always turned off because it can have a large negative impact on performance, and in many places we depend on integer overflow for our algorithms to work correctly. It’s possible to disable overflow checking for a specific expression or block of code, but in my experience that functionality is rarely used. You can also disable the check globally but enable it for specific expressions or blocks of code, but doing so requires that you understand the issue. And if a programmer understood the issue, he wouldn’t have written the erroneous code in the first place.

Errors like this can exist in a working program for years, and then crop up at the most inopportune time when somebody uses the code in a new way or presents data that the original programmer didn’t envision. As much as our languages and tools have evolved over the past 30 years, in some ways we’re still distressingly close to the metal.