First rule of optimization: Don’t

A recent question on Stack Overflow reads:

Working with legacy code, I’ve found a lot of statements (more than 500) like this:
bool isAEqualsB = (a == b) ? true : false;
Does it make any sense to rewrite it like this?
bool isAEqualsB = (a == b);
Or will it be optimized?

When you see a question like this, you have to wonder what’s going through this programmer’s mind. There are several things wrong with the thought process that led to this question.

First, this is legacy code, which usually means that it’s older code written by somebody who is no longer around and that nobody else understands. It’s orphaned code. Most likely it’s also working code. The new programmer assigned to the code is either tasked with making a modification, or he’s just trying to understand it.

Second, the programmer is worried about optimizing something that he has not shown to be a bottleneck. Even if neither the C# compiler nor the JIT compiler optimizes the code, it’s highly unlikely that making the proposed change will have a significant effect on the program’s performance. This is especially true if the variables being compared are not primitive types. If they’re strings or other reference types (or value types that overload the == operator), then the function call to do the comparison will overwhelm any possible performance gain from restructuring the code.
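If you really want to know whether the two forms differ, measure them. Here is a minimal sketch of such a measurement. The class name, method names, array size, and run count are all made up for illustration, and a Stopwatch loop like this is a rough check, not a substitute for profiling the real program.

```csharp
using System;
using System.Diagnostics;

static class EqualityTiming
{
    // Counts equal adjacent pairs using the ternary form from the question.
    public static int CountWithTernary(int[] values)
    {
        int count = 0;
        for (int i = 1; i < values.Length; i++)
        {
            bool isEqual = (values[i - 1] == values[i]) ? true : false;
            if (isEqual) count++;
        }
        return count;
    }

    // The same loop using the direct comparison.
    public static int CountDirect(int[] values)
    {
        int count = 0;
        for (int i = 1; i < values.Length; i++)
        {
            bool isEqual = (values[i - 1] == values[i]);
            if (isEqual) count++;
        }
        return count;
    }

    // Runs the action several times and returns the fastest elapsed time,
    // which reduces (but does not eliminate) JIT warm-up and scheduling noise.
    public static TimeSpan BestOf(int runs, Action action)
    {
        TimeSpan best = TimeSpan.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            if (sw.Elapsed < best) best = sw.Elapsed;
        }
        return best;
    }

    public static void Main()
    {
        // Fixed seed so that both loops see the same data on every run.
        Random rng = new Random(12345);
        int[] values = new int[10000000];
        for (int i = 0; i < values.Length; i++) values[i] = rng.Next(4);

        int ternaryCount = 0, directCount = 0;
        TimeSpan ternary = BestOf(5, () => ternaryCount = CountWithTernary(values));
        TimeSpan direct = BestOf(5, () => directCount = CountDirect(values));

        Console.WriteLine("ternary: {0:F1} ms (count {1})", ternary.TotalMilliseconds, ternaryCount);
        Console.WriteLine("direct:  {0:F1} ms (count {1})", direct.TotalMilliseconds, directCount);
    }
}
```

Build it in release mode and run it outside the debugger. Whatever numbers you get apply only to int comparisons in a tight loop on your machine, which is probably not what those 500 statements look like.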

Making changes to working code, especially code that you don’t understand, is a risky business. Sure, this looks like a straightforward modification, although I doubt that the actual variable names are a and b. You might even be able to create an editor macro that will make the change globally in the project. But it still takes time to develop the macro, run it, make sure it compiles, and test the code to ensure that it still works. And the risk of making an error along the way, although small, is non-zero.
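To give an idea of what such a macro involves, here is a rough sketch written as a regular expression replacement. The class name and the pattern are my own, not anything from the question. The pattern deliberately handles only the simplest case, a condition with no nested parentheses, and leaves every other line untouched.

```csharp
using System;
using System.Text.RegularExpressions;

static class TernaryRewriter
{
    // Matches "(<condition>) ? true : false" where <condition> contains no
    // parentheses of its own. Conditions with nested parentheses, such as
    // method calls, do not match and are left for a human to look at.
    private static readonly Regex RedundantTernary =
        new Regex(@"\(([^()]+)\)\s*\?\s*true\s*:\s*false\b");

    // Returns the line with any matching ternary reduced to its condition.
    public static string Rewrite(string line)
    {
        return RedundantTernary.Replace(line, "($1)");
    }

    public static void Main()
    {
        // A simple condition is rewritten.
        Console.WriteLine(Rewrite("bool isAEqualsB = (a == b) ? true : false;"));
        // A condition containing a method call is left unchanged.
        Console.WriteLine(Rewrite("bool same = (x.Equals(y)) ? true : false;"));
    }
}
```

Even this small pattern took some thought about nested parentheses and identifiers that merely begin with "false", and it knows nothing about comments or string literals. Every line it changes still has to be compiled and tested.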

There is a small readability benefit to be had. The second form of the expression above is easier to read than the first. However, that benefit does not offset the time and risk associated with making the change. The time you put into making this change will never be repaid, not in readability and not in performance. It’s a complete waste of time.

It’s interesting to note that the C# compiler does not optimize the first expression. That is, the IL from the first expression is:

  IL_0000:  ldarg.0
  IL_0001:  ldarg.1
  IL_0002:  beq.s      IL_0007
  IL_0004:  ldc.i4.0
  IL_0005:  br.s       IL_0008
  IL_0007:  ldc.i4.1
  IL_0008:  stloc.0

And for the second expression:

  IL_0009:  ldarg.0
  IL_000a:  ldarg.1
  IL_000b:  ceq
  IL_000d:  stloc.1
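For reference, here is a sketch of the kind of C# method that would produce IL like the two listings above: two arguments and two locals. The class and method names are made up, and I'm assuming int parameters. The exact IL you get depends on the compiler version and optimization settings.

```csharp
using System;

static class TernaryVersusDirect
{
    // Both forms in one method, so that each result gets its own local.
    // The int parameter type is an assumption for illustration.
    public static bool BothFormsAgree(int a, int b)
    {
        bool fromTernary = (a == b) ? true : false;
        bool fromDirect = (a == b);
        return fromTernary == fromDirect;
    }

    public static void Main()
    {
        // The two forms always agree, whether or not the operands are equal.
        Console.WriteLine(BothFormsAgree(1, 1));
        Console.WriteLine(BothFormsAgree(1, 2));
    }
}
```

You can look at the IL for a compiled assembly with a disassembler such as ildasm.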

Looking at the second block of code, you might think that the evaluation doesn’t require any branches. That’s true of the IL, but the IL isn’t what the processor runs. What matters is the native code that the JIT compiler generates for the ceq instruction at runtime.

The .NET Boolean type (the C# bool is just an alias) is an 8-bit value, and the C# compiler stores 1 for true and 0 for false. This is different from C, in which any non-zero value is considered to be true. So the native code for ceq has to turn the result of a comparison into exactly 0 or 1. The straightforward way to do that uses a branch:

    cmp ax,bx
    jz equal
    mov al, 0
    jmp done
equal:
    mov al, 1
done:

The JIT compiler doesn’t have to generate it that way, though. On x86 processors, the SETcc family of instructions writes 0 or 1 to a byte register based on the flags that the comparison set, so the same result is available without a branch:

    cmp ax,bx
    sete al

Either way, the difference between the two forms amounts to a handful of instructions at most.

Lessons to learn from this:

Restrict your optimization efforts to things you know are performance bottlenecks.
Modifying code is a time-consuming and risky business. Be sure that your modifications are worth the cost of making them.

Jim's Random Notes
