Multicore crisis?

There’s been some talk recently of the next “programming crisis”: multicore computing. I’ll agree that we should be concerned, but I don’t think we’re anywhere near the crisis point. Before I address that specifically, I think it’s instructive to review the background: why multicore processors exist, how they affect existing software, and the issues involved in writing code to make use of multiple cores.

Moore’s Law has been quoted and misquoted so often that it’s almost a cliché. His original statement was simply an observation about the rate at which transistor counts on integrated circuits were increasing, along with his expectation that the trend would continue for at least 10 years. That was in 1965. The trend has continued, and there’s no indication that it will slow.

Some people think Moore’s Law has become something of a self-fulfilling prophecy: because we believe that it’s possible, somehow we strive to make it so. One wonders what would have happened if Moore had said that he expected the rate of growth itself to increase. Would transistor densities have grown even faster than they did?

Self-fulfilling prophecy or not, it’s almost certain that the trend of increasing transistor densities will continue (it has through 2007), and that as a result we’ll get ever more powerful CPUs as well as faster, higher-capacity RAM. Absolute processor speed as measured by clock rate will continue to increase, but not at the astounding rates that we saw up to 2005 or so. Quantum effects and current leakage have put a damper on the rate of growth there. Better materials will solve the problem (indeed, they are already solving it), but absent a fundamental breakthrough by the chemists working on the problem, clock speeds won’t double every 18 months the way they did in the recent past. The Clock Speed Timeline graph makes this quite evident.

Today’s trend is towards multiple cores on a single processor, running at a somewhat slower clock rate. The machine I’m writing this on, for example, has a quad-core Intel Xeon processor running at 2 GHz. That clock speed is somewhat slower than you can get in a high-end Pentium, but the multiple cores provide more total computing power. Quad-core processors today are quite common. Intel demonstrated an 80-core chip in February of 2007 and promised to deliver it within five years. I fully expect to have a 256-core processor in my desktop computer ten years from now.

The trend towards multiple cores and very slowly increasing clock rates has some interesting ramifications for software developers. In the past, we have depended on more RAM and faster processors to give us some very nice performance boosts. All indications are that the amount of available RAM and the size of on-chip caches will continue to grow, but we can no longer count on processor speed doubling every couple of years. Unless we learn to write programs that use multiple cores, we will soon reach a very real performance ceiling.

Not all applications can benefit from multiple cores, but you’d be surprised at how many can. And even in those cases when a single program can’t make use of multiple cores, users still benefit from having a multicore processor because the machine is better at multi-tasking. Imagine running four virtual machines on one computer, for example. If the computer has a single processor core, all four virtual machines and all of the operating system services share that one core. On a quad-core processor, the work load is spread out over all four cores. The result is more processor cycles per virtual machine, meaning that all four virtual machines should run faster.

Software systems that consist of multiple mostly-independent processes can make good use of multicore processors without any modification. Consider a system consisting of two services that are constantly running. On a single-core computer, only one can actually be working at a time. You could almost double performance simply by upgrading to a dual-core processor. Such software systems are quite common, and they require no code changes in order to benefit immediately from the new multicore processor designs.

Contrary to popular belief, writing code that is explicitly multi-threaded–designed to take advantage of multiple cores–isn’t necessarily a huge step up in complexity. Such code can be much more complex than single-threaded code, but it doesn’t have to be. Some programs are more multi-threaded than others. I’ve found it useful to think of programs in terms of the following four levels of complexity:

  1. No explicit multithreading.
  2. Infrequent, mostly independent asynchronous tasks.
  3. Loosely coupled cooperating tasks.
  4. Tightly coupled cooperating tasks.

Obviously, it’s impossible to draw exact boundaries between the levels, and many programs will use features found in two or more of the levels. In general, I would classify a program by the highest level of multi-threading features that it uses.

Level 1 requires little in the way of explanation. This is the most common type of application in use today. In a batch mode program, execution proceeds sequentially from start to finish. In a GUI program, user interface events and processing execute on the same thread. This type of application has served us well over the years.

Most Windows programmers have some experience with the next level of complexity. A GUI application that performs background processing and periodically updates the display is an example of this type of program. Typically, the program starts a background task, which from time to time raises events that the GUI thread handles in order to update the display. Data and process synchronization between tasks is limited to the event handlers that respond to those asynchronous events. Modern development environments make it very easy to create such programs. These programs benefit from multiple processor cores because the background thread can operate independently of the GUI thread, making the GUI much more responsive.
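
To make Level 2 concrete, here’s a minimal sketch in Java using Swing’s SwingWorker class. The ReportBuilder name and the work loop are hypothetical; the point is that doInBackground() runs on a worker thread while process() runs on the GUI thread, so the event handler is the only place the two meet.

    import java.util.List;
    import javax.swing.JLabel;
    import javax.swing.SwingWorker;

    // Hypothetical background task that periodically reports progress
    // back to the GUI thread.
    class ReportBuilder extends SwingWorker<Void, Integer> {
        private final JLabel status;

        ReportBuilder(JLabel status) { this.status = status; }

        @Override
        protected Void doInBackground() {
            for (int pct = 0; pct <= 100; pct += 10) {
                doSomeWork();  // stand-in for the real processing
                publish(pct);  // raises an event for the GUI thread
            }
            return null;
        }

        @Override
        protected void process(List<Integer> chunks) {
            // Runs on the GUI (event dispatch) thread.
            status.setText(chunks.get(chunks.size() - 1) + "% complete");
        }

        private void doSomeWork() { /* compute-bound work goes here */ }
    }

Starting the task is a one-liner: new ReportBuilder(statusLabel).execute().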

I have found the third level of complexity–loosely coupled cooperating tasks–to be a very useful and relatively simple way to make use of multiple cores. The idea is to construct a program that operates in an assembly line fashion. For example, consider a program that gathers input, does some complex processing of the input data, and then generates some output. Many such programs are processor bound. If you structure the program such that it maintains an input queue, a pool of independent worker threads, and an output queue, then there is little danger of running into the problems that often plague more complex programs. You have to supply synchronization (mutual exclusion locks, or similar) on the input and output queues, but the worker threads operate independently. Using this technique on a quad-core processor, it’s possible to get an almost 4x increase in throughput over a single-core processor, with very little danger of running into resource contention issues.
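
Here’s a bare-bones sketch of that structure in Java, using the standard java.util.concurrent package. The String payload and the transform() body are placeholders for the real input records and processing; the queues supply all of the synchronization, so the worker threads share no other state.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch of the input-queue / worker-pool / output-queue structure.
    class Pipeline {
        final BlockingQueue<String> input = new LinkedBlockingQueue<>();
        final BlockingQueue<String> output = new LinkedBlockingQueue<>();

        void start(int workers) {
            ExecutorService pool = Executors.newFixedThreadPool(workers);
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    try {
                        while (true) {
                            String item = input.take();   // queue supplies the locking
                            output.put(transform(item));  // workers share no other state
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // allow clean shutdown
                    }
                });
            }
        }

        String transform(String item) {
            return item.toUpperCase(); // stand-in for the complex processing
        }
    }

A producer thread feeds the input queue and a consumer drains the output queue; because the only shared structures are the two (already thread-safe) queues, scaling to more cores is mostly a matter of increasing the worker count.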

Written correctly, programs that have multiple tightly-coupled cooperating tasks make the best possible use of processor resources. However, explicitly coding thread synchronization is perhaps the most difficult type of programming imaginable. Forgetting to lock a resource before accessing it can lead to unexplained crashes or data corruption. Holding a lock for too long can create a performance bottleneck. Locks that are too fine-grained increase complexity and the chance of deadlock; locks that are too coarse will stall worker threads. Race conditions are endemic. Even if you get such a program working, a small change will often cause new, unanticipated problems. Writing this kind of code is hard. You’re much better off re-thinking your approach to the problem and casting it as a Level 3 problem. Whatever price you pay in performance will be repaid many times over in increased reliability and reduced development time.
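
To illustrate just one of those hazards, here is the classic deadlock in miniature: two threads that acquire the same two locks in opposite orders. The sleeps merely make the bad interleaving likely; in real code the same bug can hide for months.

    // Two threads acquiring the same two locks in opposite orders.
    class DeadlockDemo {
        static final Object lockA = new Object();
        static final Object lockB = new Object();

        public static void main(String[] args) {
            new Thread(() -> {
                synchronized (lockA) {        // thread 1 holds A...
                    pause(50);
                    synchronized (lockB) { }  // ...and waits for B
                }
            }).start();
            new Thread(() -> {
                synchronized (lockB) {        // thread 2 holds B...
                    pause(50);
                    synchronized (lockA) { }  // ...and waits for A: deadlock
                }
            }).start();
        }

        static void pause(long ms) {
            try { Thread.sleep(ms); } catch (InterruptedException e) { }
        }
    }

A fixed global lock-ordering discipline prevents this particular failure, but it’s exactly the kind of rule that nothing in the compiler enforces.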

If you’re writing a Level 3 or Level 4 program, you should very seriously consider using an existing multi-tasking library if at all possible. Doing so will require that you think about your problem differently, but you get to leverage a lot of known-working code that is almost certainly more robust than anything you’re likely to write yourself in the time allotted. Two good examples of such libraries are the Parallel Extensions to .NET 3.5 and the Java Parallel Processing Framework, and similar libraries exist for many other programming environments. Although still in their infancy, these libraries promise to greatly simplify the move to multicore. If you’re contemplating development of a program that makes good use of multiple cores, you definitely should learn about any parallel computing libraries that support your platform.
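
Neither of those libraries is shown here, but as a taste of the library-first approach, Java’s built-in java.util.concurrent package already lets you hand a batch of tasks to a known-working scheduler instead of managing threads yourself. The squaring task is a trivial stand-in for real work.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hand a batch of independent tasks to the library's scheduler.
    class BatchExample {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 1; i <= 8; i++) {
                final long n = i;
                tasks.add(() -> n * n);  // each task runs on some pool thread
            }

            // The library handles scheduling, synchronization, and results.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                System.out.println(f.get());
            }
            pool.shutdown();
        }
    }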

So, back to the crisis. Bob Warfield over at SmoothSpan Blog has had, and continues to have, quite a lot to say about it, and many others share his sentiments. I, on the other hand, don’t think we’re anywhere near the crisis point, nor do I think we’re likely to get there. While it’s true that most current software isn’t multicore ready, software developers have understood for several years now that they need to begin writing applications that take advantage of multiple processor cores. It’s likely that some shops have taken an ad hoc approach to the problem, and they’re probably suffering from the issues I pointed out above. It’s also likely that many (and, I would hope, most) development shops have done the prudent thing and adopted a parallel computing library that takes care of the difficult areas, leaving the programmers to worry about their specific applications. Doing so is no different from adopting an operating system, development environment, GUI library, report generator, or any other third-party component, something that development shops have long experience with.

In short, the multicore “crisis” that the doomsayers are warning us about is almost a non-issue. It will require a small amount of programmer retraining, and there will undoubtedly be a temporary plateau in the rate at which application performance increases, but in a very short time we’ll again have mainstream applications that push all this fancy hardware to its limits.