Unwise conventional wisdom, Part 1: Locks are slow

In two different discussions recently I had somebody tell me, “locks are slow.” One person’s comment was “Locks should be avoided whenever possible. They’re slow.” This is a bit of conventional wisdom that’s been around for decades and seems to be getting more prevalent now that more programmers are finding themselves working with multithreaded programs.

And, like all too much conventional wisdom, it’s just flat wrong.

lock is a cooperative synchronization primitive that is used in computer programs to provide mutually exclusive access to a shared resource. Yeah, that’s a mouthful. Let me put it into simpler terms.

When I was in Boy Scouts, we’d sit around the campfire at night and tell stories. The person who held the “speaking stick” was the only one allowed to talk. My holding the stick didn’t actually prevent anybody else from talking. Rather, we all agreed on the convention: he who holds the stick talks. Everybody else listens and waits his turn. The stick was a cooperative mutual exclusion device.

Programming locks work the same way. All threads of execution agree that they will abide by the rules and wait their turn to get the stick before accessing whatever resource is being protected. This is very important because many things in the computer do not react well if two different processes try to access them at the same time. Let me give you another real-world example.

Imagine that you and a co-worker both need to update an Excel spreadsheet that’s stored in a shared location. You take a copy of the file and start making your changes. Your co-worker does the same thing. You complete your changes and copy the new file over to the shared location. Ten minutes later, your co-worker copies his changed version of the document, overwriting the changes that you just made. The same kind of thing can happen inside a computer program, but it happens billions of times faster.

And so we use locks and other synchronization primitives to prevent such things from happening. Locks are common because they’re very simple to understand, easy to use, and quite effective. There are potential problems with locks, but any tool can be misused.

So let’s get back to the conventional wisdom. Are locks really slow? Rather than guess, let’s see how long it takes to acquire a lock. The first bit of code executes a loop, incrementing a variable one million times. The second code snippet does the same thing, but acquires a lock each time before incrementing the variable and then releases the lock afterwards. The code samples are in C#.

int trash = 0;
for (int i = 0; i < 1000000; ++i)
{
    ++trash;
}

// Using a lock
int trash = 0;
object lockobj = new object();
for (int i = 0; i < 1000000; ++i)
{
    lock (lockobj)
    {
        ++trash;
    }
}

Timing those two bits of code reveals that the second takes almost one-tenth of a second longer than the first. So it takes approximately 0.10 seconds to acquire and release a lock one million times. That’s about 10 million locks per second, or 0.10 microseconds (100 nanoseconds) per lock. (It’s actually closer to 80 nanoseconds, but 100 is a nice round number.) I know, I can hear the performance-sensitive screaming already, “Oh my bleeding eyeballs! 100 nanoseconds! That’s like 400 clock cycles! That’s an eternity to my super fast machine!

And he’s right. To a computer running at 4 GHz, 400 nanoseconds is a pretty long time to spend doing nothing. But locks don’t execute in isolation. They’re there to protect a resource, and accessing or updating that resource takes time–usually a whole lot longer than it takes to acquire the lock. For example, let’s say we have this method that takes, on average, about one microsecond (that’s 1,000 nanoseconds) to execute when there is no lock.

void UpdateMyDataStructure()
{
    // do cool stuff here that takes a microsecond
}

Then we add a lock so only one thread at a time can be doing that cool thing.

static object lockobj = new object();
void UpdateMyDataStructure()
{
    lock (lockobj)
    {
        // do cool stuff here that takes a microsecond
    }
}

It still takes 100 nanoseconds to acquire the lock when it’s not contended (that is, when no other thread already has the lock), but doing so only adds 10 percent to the total runtime of the method. I know, I know, more bleeding eyeballs. Programmers lose sleep over 10 percent losses in performance. Dedicated optimizers will go to great lengths to save even five percent, and here I’m talking about 10 percent like it’s nothing. Let’s talk about that a bit because there are several issues to consider.

There’s no doubt that the lock is adding 10 percent to the method’s total runtime. But does it really matter? If your program calls that method once per second, then in a million seconds (about 12 days) acquiring the lock will have cost an extra second in run time. The 10 percent performance penalty in that one method is irrelevant compared to the total runtime of the program.

We also can’t forget that there are multiple threads calling that same method. Sometimes one thread will already be holding the lock when another thread tries to acquire it. In that case, the thread coming in will have to wait up to one microsecond before it can acquire the lock, meaning that executing the method can take a whopping 2,100 nanoseconds! (1,000 nanoseconds for the first thread to complete, 100 nanoseconds to clear the lock, and another 1,000 nanoseconds to do its own cool stuff.) By now my friend’s eyeballs have exploded.

And things only get worse as you add more threads and call the method more often. But again, does it matter? At an average of 1,100 nanoseconds per call, that method can be called more than 900,000 times per second. Without the lock, you can call that method a million times per second. It seems to me that if 90% of your time is spent on one method, you have a much bigger problem than the lock. Your entire program is dependent on the performance of this one method. That’s true whether or not you have multiple threads accessing it.

And don’t forget the most important point: the lock or something like it is required. You’re protecting a resource that you’ve determined will not support simultaneous access by multiple threads. If you remove the lock, the program breaks.

The conventional wisdom that locks are slow is based on two things. From the optimizer’s point of view, a lock is slow because it adds to the amount of time required to execute a particular bit of code, and doesn’t provide any immediately recognizable benefit. It’s just extra machine cycles. The optimizer doesn’t care that the time required for the lock is a miniscule portion of the total program runtime.

The other reason locks are considered slow is because an application can become “gated” by a lock. That is, all of the threads in the application spend an inordinate amount of time doing nothing while waiting to acquire a lock on a critical resource. Therefore, concludes the programmer who’s profiling the application, the lock must be slow. It doesn’t occur to him that the lock isn’t the problem. Any other means of synchronizing access to the critical resource would exhibit similar problems. The problem is designing the program to require mutually exclusive access to a shared resource in a performance-critical part of the code.

That may seem like a fine distinction to some, but there is an important difference. It’s one thing to complain if I were to install an 800 pound steel door on your linen closet because I felt like it, and something else entirely to complain if I did it because you told me to. If you design something that has to use a lock, then don’t get upset at the lock when it turns out to be inappropriate for the task at hand.

There are many alternatives to locks. There have been some important advances recently in lock-free concurrent data structures like linked lists, queues, stacks, and dictionaries. The concurrent versions aren’t as fast as their exclusive access counterparts, but they’re faster to access than if you were to protect the non-concurrent version with some sort of lock. The other primary alternative is to redesign the program to remove the requirement for exclusive access. How that’s done is highly application dependent, but often requires a combination of buffering and favoring infrequent long delays over frequent short delays.

To recap: locks are not slow. Used correctly, a lock provides safe, effective, and efficient mutually exclusive access to a shared resource. When it appears that a lock is slow, it’s because you have gated your application on access to that shared resource. If that happens, the problem is not with the lock, but with the design of the application or of the shared resource.