Use your data types!

Imagine there’s a program that makes API calls through a proxy to another service. One of the parameters you pass to that proxy is a timeout value: the maximum time that the proxy should wait before giving up and returning with a timeout error. The function call would be something like:

    result = callProxy("SomeService", "ServiceFunction", {parameters}, timeout);

The timeout value is defined in code as:

    double timeout = 10;

And that code works for months or years until some programmer is in there editing things one day and says to himself, “A timeout of 10 milliseconds? That’s crazy. We should give the service at least a full second to respond.” He promptly changes the timeout value to 1000 and goes on his merry way. All the unit tests pass, the code is pushed, and continues to work for more months or years.

Until it doesn’t.

One fateful morning at about 02:00, you’re shocked out of bed by your pager screaming that there’s a problem. Bleary-eyed, you stumble to the computer, log in, and find that your site has stopped serving requests. After a long search, you discover that “SomeService” is experiencing high CPU usage, but you can’t figure out why that would make your program stop serving requests. After all, your program should be getting a timeout error after a second, and the error handling code looks like it works.

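At some point you discover the problem: timeout is actually expressed in seconds, not milliseconds. So rather than waiting the original 10 seconds, let alone the one second the editor intended, the proxy is waiting 1,000 seconds: 16 minutes and 40 seconds. Every call to the overloaded service hangs for that long, which is why your program stopped serving requests.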
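To make the arithmetic concrete, here is a minimal sketch of what that bare number means under each reading. The TimeoutMath class and its AsSeconds helper are hypothetical names made up for this illustration; only TimeSpan itself comes from .NET.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the original program.
public static class TimeoutMath
{
    // Interprets a bare number as seconds, which is how the proxy in the story read it.
    public static TimeSpan AsSeconds(double value) => TimeSpan.FromSeconds(value);

    public static void Main()
    {
        Console.WriteLine(AsSeconds(10));                    // 00:00:10 (original value)
        Console.WriteLine(AsSeconds(1000));                  // 00:16:40 (value after the edit)
        Console.WriteLine(TimeSpan.FromMilliseconds(1000));  // 00:00:01 (what the editor thought he wrote)
    }
}
```

The same literal, 1000, is either one second or almost seventeen minutes depending on a unit that appears nowhere in the code.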

Yes, something like this actually has happened. It wasn’t pretty.

Granted, quite a few things went wrong here, among them:

  • There was no comment describing the units that the timeout represents.
  • The timeout variable should have been named timeoutInSeconds.
  • The programmer should have looked more closely at the code before changing it.
  • The code reviewer should have caught the error.
  • The automated tests should have simulated a service timeout.

The error should have been caught. But it wasn’t.

A much more effective way to avoid an error like that is to use descriptive data types. If you’re expressing a time span, then don’t use a unit-less value like an integer or a floating-point number. Use a data type intended for expressing time spans. In C#, for example, you’d use the TimeSpan structure, like this:

    TimeSpan timeout = TimeSpan.FromSeconds(10);

It would be very difficult for a programmer to look at that and think, “10 milliseconds isn’t long enough.” And there’s absolutely no need for a comment here because the code says exactly what it does: it creates a timeout value of 10 seconds. It’s nearly impossible to screw this up. If on the off chance some drunken coder did screw that up, the person doing the code review almost certainly wouldn’t.
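The benefit grows if the function being called takes the type as well. What follows is only a sketch under assumptions: the Proxy class and this CallProxy signature are hypothetical stand-ins for the proxy call in the story, not a real library, and the body fakes the network call so the example runs on its own.

```csharp
using System;

// Hypothetical stand-in for the proxy client in the story; not a real library.
public static class Proxy
{
    // The timeout parameter carries its own units, so a caller cannot pass a bare number.
    public static string CallProxy(string service, string function, TimeSpan timeout)
    {
        if (timeout <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(timeout), "Timeout must be positive.");

        // A real implementation would make the network call here.
        // This sketch only reports what it was asked to do.
        return $"{service}.{function} timeout={timeout.TotalSeconds}s";
    }

    public static void Main()
    {
        TimeSpan timeout = TimeSpan.FromSeconds(10);
        Console.WriteLine(CallProxy("SomeService", "ServiceFunction", timeout));

        // The next line does not compile: there is no implicit conversion from int to TimeSpan.
        // CallProxy("SomeService", "ServiceFunction", 1000);
    }
}
```

With a signature like this, the compiler rejects the mistaken edit from the story before any reviewer has to notice it. If the underlying API still insists on a number, you can convert once at that boundary with timeout.TotalSeconds or timeout.TotalMilliseconds and keep TimeSpan everywhere else.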

These types of errors happen more often than you might like to think. Many of us don’t see these because we work in small companies on relatively new code, where the current employees are also the people who wrote the original code. Things are different when you work at a large company where people move around a lot, and the code you’re working on has been there for 10 years and the programmers who wrote it are off doing other things, or have left the company.

Do yourself and the people who will maintain your code in the future a favor. Use descriptive data types just like you use descriptive variable names. Unit-less values are errors waiting to happen. And I guarantee you don’t want to be the person who made this type of error when that error happens at 2 AM on a large retail website, where every minute of downtime means millions of dollars in lost sales. Nor do you want to be the person who has to track down such an error in those circumstances.