(Originally posted in somewhat different form on 11/21/2016.)
All too often, I run across integer comparison functions that work, but have a latent bug. It’s not a bug that you’ll run into very often but when you do run into it, it’s potentially catastrophic.
A common idiom for comparison is to have a function that, given two integers x
and y
will return a negative number if x < y
, a positive number if x > y
, and 0 if x == y
. For example:
Compare(1, 2)
returns-1
because1 < 2
Compare(1, 1)
returns0
because1 == 1
Compare(1, 0)
returns1
because1 > 0
The correct way to implement such a function is with cascading comparisons, like this:
static int WorkingCompare(int x, int y)
{
if (x > y) return 1;
if (x < y) return -1;
return 0;
}
A clever programmer might figure out that you can get the same result by subtracting y from x. After all, (1-2) == -1
, (1-1) == 0
, and (1-0) == 1
. So the comparison function becomes:
static int BrokenCompare(int x, int y)
{
return x - y;
}
The aforementioned clever programmer will probably test it with a few mixtures of positive and negative numbers, get the right answers, and drop it into his program. Why not? After all, it’s less code and a single subtraction is going to be a whole lot faster than a couple of comparisons and branches. Right?
Except that it’s broken. It doesn’t always work. Imagine if x = Int.MinValue
and y = 1
. Or x = int.MaxValue-100
and y = -1000
. Integer overflow rears its ugly head and you get the wrong answer! Don’t believe me? Here’s a program to test it.
static void Main(string[] args)
{
DoCompare(1, 2);
DoCompare(-1, -2);
DoCompare(int.MinValue, 4);
DoCompare(int.MaxValue-100, -1000);
}
static void DoCompare(int x, int y)
{
Console.WriteLine("Compare {0} and {1}.", x, y);
Console.WriteLine(" Working comparison: {0}", WorkingCompare(x, y));
Console.WriteLine(" Broken comparison: {0}", BrokenCompare(x, y));
}
And the output:
Compare 1 and 2.
Working comparison: -1
Broken comparison: -1
Compare -1 and -2.
Working comparison: 1
Broken comparison: 1
Compare -2147483648 and 4.
Working comparison: -1
Broken comparison: 2147483644 // Incorrect: -2147483648 is not greater than 4.
Compare 2147483547 and -1000.
Working comparison: 1
Broken comparison: -2147482749 // Incorrect: 2147483547 is not less than -1000
If x
is sufficiently negative and y
is sufficiently positive, or x
is sufficiently positive and y
is sufficiently negative, integer overflow will result in an incorrect result. It’s almost a certainty that the clever programmer who wrote that broken comparison function never even considered integer overflow.
“But,” you say, “I know my program won’t be dealing with numbers that big (or small).” Yeah, sure. Not now. But it might in the future. Worse, that comparison function is probably in some class that three years from now will be lifted out of your program into another that doesn’t have the same limits on the input data. Or some other programmer is looking for an example of how to write a custom comparer for his class, sees yours, and copies it to his program. Everything seems fine. Until, a few weeks or months or years later the program runs into this bug and something breaks. Horribly. Perhaps it’s the custom comparer passed to a sorting function and things aren’t sorted correctly. Or, perhaps worse, the sort doesn’t terminate because a > b
and b > c
, but for some odd reason a < c
. I’ve seen something similar happen. Let me tell you that tracking down that bug wasn’t fun.
Part of our job as programmers is to write code that’s bulletproof whenever practical. And I’m not talking bulletproof just now, but also in the future. We can’t foresee all possible scenarios, but code re-use is a very common thing. Sometimes that means that we forego optimization opportunities because the risk of implementing them is too high.
Not that I’ve seen many programs in which the small difference in the speed of an integer comparison function makes a big enough difference that I’d risk implementing broken code. If I ever did find myself in that situation (and there would have to be a very compelling reason), I would certainly put some big scary comment in there, explaining the limitations and warning others not to use the function for general purposes.