Bit twiddling for the win!

(I don’t recall if I wrote about this before. I really need to resurrect my old blog entries.)

This was an interesting puzzle to work on. Some of the programmers I subsequently explained it to just got that glazed look and figured I was invoking black magic. Or higher math.

Imagine you have a list of structures, each of which consists of a string key, two integer values, and some other stuff. In essence:

struct Foo
{
    public string Name;
    public int key1;
    public int key2;
    // other stuff that's not relevant to this discussion
}

For this discussion, let’s assume that int is a 32-bit signed integer.

You have a lot of these Foo structures, more than you can keep in memory, but for reasons we don’t need to get into, you maintain an index that’s sorted by key1 descending and key2 ascending. That is, in C#:

index = list.OrderByDescending(x => x.key1)
            .ThenBy(x => x.key2);

Except that (again, for reasons we don’t need to discuss) the key has to be a single value rather than two separate values.

Packing two int values into a long is no problem. But doing it so that the result sorts with key1 descending and key2 ascending? When I first heard this proposed I wasn’t sure that it was possible. But there was a glimmer of an idea . . .

What I’m about to discuss depends on something called “two’s complement arithmetic,” which defines how most computers these days work with positive and negative integers. I’m going to illustrate with 4-bit numbers but the concepts are the same for any size integer value, including the 32-bit and 64-bit numbers we work with regularly.

With 4 bits we can express 16 possible integer values. Unsigned, that’s values 0 through 15. Signed, using the two’s complement convention, the values we can represent are -8 through +7. The table below shows that in detail.

Binary                       Binary
Value   Unsigned  Signed     Value   Unsigned  Signed
0000    0         0          1000    8         -8
0001    1         +1         1001    9         -7
0010    2         +2         1010    10        -6
0011    3         +3         1011    11        -5
0100    4         +4         1100    12        -4
0101    5         +5         1101    13        -3
0110    6         +6         1110    14        -2
0111    7         +7         1111    15        -1

When viewed as signed numbers, the high bit (the bit in the leftmost position) is called the sign bit. When it’s set to 1, the number is considered negative.

Now, the problem. We’re given two signed integers, call them key1 and key2, each of which is four bits in length. We want to combine them into a single 8-bit value so that a natural sort (i.e. sorting the single 8-bit quantity) will result in the records being ordered with key1 descending and key2 ascending.

The first thought is to just pack key1 into the high four bits, and key2 into the low four bits, and treat it as an unsigned quantity. For example, imagine you have these records:

                         Binary      Unsigned
key1 = 4, key2 = 1     0100 0001       65
key1 = 4, key2 = 7     0100 0111       71
key1 = 4, key2 = -8    0100 1000       72
key1 = 4, key2 = -1    0100 1111       79
key1 = -5, key2 = 2    1011 0010       178

Sorting as unsigned quantities, the ordering is wonky. Both key1 and key2 go from 0 to 7, then from -8 to -1.
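
In code, the naive packing for this 4-bit illustration might look like this sketch (the & 0xF masks truncate each signed value to its 4-bit pattern):

static byte combineNaive(int key1, int key2)
{
    // key1 in the high nibble, key2 in the low nibble
    return (byte)(((key1 & 0xF) << 4) | (key2 & 0xF));
}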

We need to do some bit twiddling. If we flip the sign bit on key2 and then sort, the table becomes:

                         Binary      Unsigned
key1 = 4, key2 = -8    0100 0000       64
key1 = 4, key2 = -1    0100 0111       71
key1 = 4, key2 = 1     0100 1001       73
key1 = 4, key2 = 7     0100 1111       79
key1 = -5, key2 = 2    1011 1010      186

key2 is now in ascending order! That is, for records that have key1 = 4, the key2 values are in ascending order.

So how do we make key1 sort in descending order? More bit twiddling!

Consider what happens if you flip all of the bits in key1:

Decimal  Binary  Inverted  Result    Decimal  Binary  Inverted  Result
   -8     1000     0111       7         0      0000     1111      -1
   -7     1001     0110       6         1      0001     1110      -2
   -6     1010     0101       5         2      0010     1101      -3
   -5     1011     0100       4         3      0011     1100      -4
   -4     1100     0011       3         4      0100     1011      -5
   -3     1101     0010       2         5      0101     1010      -6
   -2     1110     0001       1         6      0110     1001      -7
   -1     1111     0000       0         7      0111     1000      -8

If we invert all of the bits in key1, the largest key1 produces the smallest result and vice versa, so a natural sort orders the keys in descending order. One subtlety: read the Result columns as signed values. The trick works when the combined quantity is compared as a signed number, which is exactly what we get below by returning a signed long and sorting it naturally.

It’s not magic at all. Given an understanding of two’s complement arithmetic and a little study, there’s nothing to it. I won’t claim to be the first person to come up with this, but I did develop it independently.

The explanation above is for two 4-bit quantities combined into a single 8-bit quantity, but it works just as well with two 32-bit quantities combined into a 64-bit quantity.

Putting it all together, we come up with this C# function. It should be easy enough to port to any other language that allows bit twiddling:

public static long getKey(int key1, int key2)
{
    long key = (uint)~key1;  // invert all the bits of key1
    key <<= 32;

    // flip the sign bit of key2
    uint flipped = (uint)key2 ^ 0x80000000;
    key |= flipped;
    return key;
}
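
With that function, the two-key ordering from the top of the post (reusing the same list) collapses to a single-key sort:

index = list.OrderBy(x => getKey(x.key1, x.key2));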

And of course you can write the reverse function to retrieve the two original keys from the combined key.
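
Here’s a sketch of what that reverse might look like. XOR is its own inverse, and inverting the bits twice gets you back where you started:

public static void splitKey(long key, out int key1, out int key2)
{
    key1 = ~(int)(key >> 32);              // re-invert the high 32 bits
    key2 = (int)((uint)key ^ 0x80000000);  // re-flip the sign bit of the low 32 bits
}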

It really isn’t that difficult

I’ve been contributing to Stack Overflow for 14 years: pretty much ever since it started. And every year there are Computer Science students who come up with novel ways of screwing up parsing and evaluating arithmetic expressions.

It’s a problem common to all Computer Science curricula: given a string that contains an arithmetic expression (something like 2*(3+1)/4), write a program that evaluates the expression and outputs the answer. There are multiple ways to solve the problem, but the easiest as far as I’m concerned is the shunting yard algorithm, developed by Edsger Dijkstra. It’s an elegantly simple algorithm that’s easy to understand and almost trivial to implement once you do.

Here’s the funny thing. Computer Science courses often teach postfix expression evaluation (i.e. evaluating “Reverse Polish Notation” like 2 3 1 + * 4 /) as an example of how to use the stack data structure. Then they teach the shunting yard and show that by employing two stacks you can turn an infix expression into a postfix expression. (For example, 2*(3+1)/4 becomes 2 3 1 + * 4 /.) Then they give the students an assignment that says, “Write a program to evaluate an infix expression.”

The students dutifully go home and write a program that glues the two examples together: the first part converts the infix to postfix, and the second part evaluates the postfix expression to get the answer. Which works. But it’s kind of silly, because a few simple changes to the output step of the shunting yard let you evaluate the expression directly, without ever converting to postfix. In my experience, very few students figure this out.
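
To make it concrete, here’s a minimal sketch of the direct-evaluation variant. Where the classic conversion would output an operator to the postfix stream, this version immediately applies it to an operand stack. It handles nonnegative integers, the four basic operators, and parentheses; whitespace handling, error checking, and unary minus are left out to keep the idea visible:

using System.Collections.Generic;

static class InfixEvaluator
{
    static int Precedence(char op) => (op == '+' || op == '-') ? 1 : 2;

    // Pop two operands, apply the operator, push the result.
    static void Apply(Stack<long> operands, char op)
    {
        long b = operands.Pop(), a = operands.Pop();
        operands.Push(op switch
        {
            '+' => a + b,
            '-' => a - b,
            '*' => a * b,
            _   => a / b,
        });
    }

    public static long Evaluate(string expr)
    {
        var operands = new Stack<long>();
        var operators = new Stack<char>();
        int i = 0;
        while (i < expr.Length)
        {
            char c = expr[i];
            if (char.IsDigit(c))
            {
                long value = 0;
                while (i < expr.Length && char.IsDigit(expr[i]))
                    value = value * 10 + (expr[i++] - '0');
                operands.Push(value);
                continue;   // i has already advanced past the number
            }
            if (c == '(')
            {
                operators.Push(c);
            }
            else if (c == ')')
            {
                while (operators.Peek() != '(')     // apply back to the matching '('
                    Apply(operands, operators.Pop());
                operators.Pop();                    // discard the '('
            }
            else    // a binary operator
            {
                // Where infix-to-postfix conversion would OUTPUT the
                // pending operators, apply them to the operand stack.
                while (operators.Count > 0 && operators.Peek() != '('
                       && Precedence(operators.Peek()) >= Precedence(c))
                    Apply(operands, operators.Pop());
                operators.Push(c);
            }
            i++;
        }
        while (operators.Count > 0)
            Apply(operands, operators.Pop());
        return operands.Pop();
    }
}

InfixEvaluator.Evaluate("2*(3+1)/4") returns 2, and at no point does a postfix expression ever exist.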

I actually had a job interview where they asked me to write an expression evaluator. I wrote it to evaluate directly from infix, without going through the postfix step. Those whiteboard inquisitions are always weird. At one point, as I was furiously scribbling on the whiteboard, the interviewer asked me where my postfix output was. When I told him that I wouldn’t be producing any, he insisted that I must. I don’t argue with an interviewer (more on that in a separate entry) and usually take their advice, but this time I told him to just sit tight; I knew this was going to work.

The post-coding walkthrough was fun. Usually it’s a formality in which I prove to the interviewer that the code I wrote works as intended. The interviewer already knows whether it works or not, because he’s seen the same algorithm written dozens of different times, knows all the different ways it can be screwed up, and knows what to expect from the walkthrough. But this time the interviewer was following along intently as I explained how it all worked. A senior software developer at a major company had never before seen the shunting yard implemented without that unnecessary conversion to postfix. Weird.

I wonder why this oddity exists. Do Computer Science instructors not tell their students that there’s a way to do the evaluation directly? Or do the instructors themselves not know? And then there’s the other question: why would legions of C.S. students every year come to Stack Overflow looking for the answer to a question that’s easily answered with a simple Google search?

Well, okay, I have an answer to the last one: laziness. It’s easier to post a question asking, “What’s wrong with my code?” than it is to work out how what the code actually does differs from what it’s supposed to do.

But that’s the way we’ve always done it!

Some years back I did a lot of research on and experimentation with the binary heap data structure. During that time I noticed that publicly available implementations placed the root of the heap at array[1], even in languages where arrays are 0-based. I found that incredibly annoying and wondered why they did that.

As far as I’ve been able to determine, the binary heap data structure originated with Robert W. Floyd’s “Treesort” (Algorithm 113), published in the August 1962 edition of Communications of the ACM. As with all algorithms published in CACM at the time, it was expressed in Algol: a language with 1-based arrays. The calculation to find the left child of any parent is childIndex = parentIndex * 2. The right child is (parentIndex * 2) + 1, and the parent of any node can be found by dividing the node index by 2.

When C gained popularity as a teaching language in the 1980s (for reasons that escape me, but I digress), authors of algorithms texts converted their code from Pascal (again, 1-based arrays) to C (0-based arrays). And instead of taking the time to modify the index calculations, whoever converted the code made a terrible decision: keep the 1-based indexing and allocate an extra element in the array.

It’s almost certain that the person who converted the code was smart enough to change the index calculations. I think the reason they didn’t was laziness: they’d have to change the code and likely also change the text to match the code. So they took the easy route. And made a mess.

Programmers in C-derived languages (C++, Java, C#, JavaScript, Python, etc.) learn early on to start counting at 0 because arrays start at 0. If you want an array that holds 100 items, you allocate int a[100], and the indexes are from 0 through 99. This is ingrained in our brains very early on. And then comes this one data structure, the binary heap, in which, if you want to hold 100 items, you have to allocate 101! The element at index 0 isn’t used.

That’s confusing, especially to beginning programmers who are the primary consumers of those algorithms texts.

Some argue that placing the root at array[1] makes the child and parent calculations faster. And that’s likely true. In a 0-based heap, the left child node index is at (parentIndex * 2) + 1, and the parent index of any child is (childIndex - 1)/2. There’s an extra addition when calculating the left child index, and an extra subtraction when calculating the parent. The funny thing is that in Algol, Pascal, and other languages with 1-based arrays, those “extra” operations existed but were hidden. The compiler inserted them when converting the 1-based indexing of the language to the 0-based indexing of the computer.
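
For reference, here are both flavors of the arithmetic from the paragraphs above, as C# helpers (a sketch, not taken from any particular text):

// 0-based heap: root at array[0], no wasted element.
static int LeftChild(int parent)  => 2 * parent + 1;
static int RightChild(int parent) => 2 * parent + 2;
static int Parent(int child)      => (child - 1) / 2;

// 1-based heap: root at array[1], array[0] unused.
static int LeftChild1(int parent)  => 2 * parent;
static int RightChild1(int parent) => 2 * parent + 1;
static int Parent1(int child)      => child / 2;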

But in the larger context of what a binary heap does, those two instructions make almost no difference in the running time. I no longer have my notes from when I made the measurements myself, but as I recall the results were noisy; the difference was so small as to be insignificant. It certainly wasn’t large enough, in my opinion, to justify replacing the code. If my program’s performance is gated on the performance of my binary heap, then I’ll replace the binary heap with a different type of priority queue altogether: a pairing heap, perhaps, or a skip list. The few nanoseconds potentially saved by starting my heap at index 1 just aren’t worth making the change.

The C++ STL priority_queue, the Java PriorityQueue, the .NET PriorityQueue, and Python’s heapq all use 0-based binary heaps. The people who wrote those packages understand performance considerations. If there were a significant performance gain in going with a 1-based binary heap, they would have done so. That they went with 0-based heaps should tell you that any performance gain from a 1-based heap is illusory.

I am appalled that well-known authors of introductory algorithms texts have universally made this error. They should be the first to insist on clear code that embraces the conventions of the languages in which they present their code. They should be ashamed of themselves for continuing to perpetuate this abomination.

First, prove that it’s possible

This builds on my previous post about finding a working solution first, before worrying about optimization.

I have been fortunate to work on many interesting problems in my career, and I have used that approach every time: first, find something that works. Maybe it works with a subset of the actual data, or solves the problem only for specific cases. Maybe the first version takes two full days to produce a solution using only 0.1% of the data. Or perhaps it’s only a partial solution.

That’s okay! As they say, Rome wasn’t built in a day.

Software development involves a certain amount of discovery: figuring out how to do things. The first part of figuring out how to do something is determining if it’s even possible. You need an existence proof. Once you have an existence proof, everything else is just optimization.

Don’t laugh. I’m not trying to minimize the amount of work that might be involved in optimizing a general solution. Rather, I’m saying that no amount of optimization can possibly help if you don’t know whether a solution is possible.

As a senior developer, I’ve dealt with this too many times. Management decides they want a new feature and immediately builds a team to start implementing it, not stopping to ask themselves whether a solution is possible. In one memorable case the problem they wanted to solve was essentially a 10,000 node traveling salesman problem to which they wanted the optimum solution. They built a team with a half dozen programmers and set them to work building the backend infrastructure and asked me to lead the development effort. It didn’t take me very long to determine that what they wanted to do was impossible. It took months to convince them that we could provide a “good enough” solution (at a huge cost, I might add), but that we couldn’t provide a guaranteed optimum solution at any price.

Programmers often aren’t any better. Faced with a problem, many programmers immediately begin searching their brains for all the optimization tricks they’ve learned over the years. That’s all well and good, except they often forget the first step: determining whether a solution is even possible.

Upon reflection, that is one of the key differences between a computer science curriculum and a programming job. An undergraduate computer science curriculum teaches one how to solve problems that have known solutions. The basic assumption behind every assignment is that a solution does in fact exist. The student’s task is to discover and implement that solution. But in industry we are often faced with problems for which there is no known solution. We have to determine whether a solution is even possible.

I’ll admit that, even now, I usually approach new problems with the assumption that a solution is possible. But, having been around the block a time or three, I’m prepared for the possibility that no solution exists, or none that fits the constraints under which I’m working. That’s where things get really interesting. Anybody can write a program to find “the answer.” It takes an entirely different set of skills to write a program that selects the “best” or “good enough” answer within the constraints of time and budget.

Find a solution first

I’ve mentioned before that I contribute to answering questions on Stack Overflow. Something I see all too often is a question asking for “the most efficient way” to solve a first- or second-semester homework problem. It’s clear from the content of the question that the person asking has absolutely no idea how to approach finding a solution.

The best advice I ever got about computer programming, and I was fortunate to get this advice very early on, was essentially “Find a solution. Any solution. Prove that it works. Then find a more efficient solution.” I got that advice 45 years ago and it’s never failed me. In fact, the times I’ve ignored that advice and gone straight to looking for an efficient solution have been some of the most difficult.

There are two primary benefits to finding a simple (non-optimum) solution:

  1. You gain insight into the problem by observing how your solution works.
  2. You have a known-good solution against which you can test any optimized solution.

Don’t discount the power of those benefits, especially in entry-level computer science work. The assignment is, essentially, “solve the problem.” A simple solution fulfills the goals of the assignment. Do that first! Then, if you’re curious (or can get extra points), spend some time looking for a more efficient method.
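
The second benefit deserves a concrete illustration. Here’s a minimal sketch of using the simple solution as a test oracle for the optimized one; SimpleSolve and FastSolve are hypothetical stand-ins for your two implementations:

using System;
using System.Linq;

// Hypothetical placeholders: substitute your known-good solution
// and your optimized candidate.
static long SimpleSolve(int[] data) => data.Sum(x => (long)x);
static long FastSolve(int[] data) => data.Sum(x => (long)x);

var rng = new Random(12345);    // fixed seed so any failure reproduces
for (int trial = 0; trial < 1000; trial++)
{
    int[] input = Enumerable.Range(0, 100)
                            .Select(_ => rng.Next(-1000, 1000))
                            .ToArray();
    if (FastSolve(input) != SimpleSolve(input))
        throw new Exception($"FastSolve disagrees with the oracle on trial {trial}");
}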

This advice helps not only in school, but also once you get out in the workforce. Very often you will be tasked with solving a problem like a data conversion: a one-time job. The simple solution might take a day’s worth of effort, combined manual work and computer processing. Or you could spend a week developing an optimized solution that will convert the data in an hour of processing time. Which do you choose? I can tell you up front that your employer will be much happier if you get it done tomorrow rather than next week.

Sometimes the non-optimum solution is good enough. Implement it and move on.

You can write a working program and then spend a bit more time making it fast. Or, you can write a fast program and spend the rest of your career trying to make it work.