Bit twiddling for the win!

(I don’t recall if I wrote about this before. I really need to resurrect my old blog entries.)

This was an interesting puzzle to work on. Some of the programmers I subsequently explained it to just got that glazed look and figured I was invoking black magic. Or higher math.

Imagine you have a list of structures, each of which consists of a string key, two integer values, and some other stuff. In essence:

struct Foo
    string Name;
    int key1;
    int key2;
    // other stuff that's not relevant to this discussion

For this discussion, let’s assume that int is a 32-bit signed integer.

You have a lot of these Foo structures, more than can keep in memory, but for reasons we don’t need to get into you maintain an index that’s sorted by key1 descending, and key2 ascending. That is, in C#:

index = list.OrderByDescending(x -> x.key1)
            .ThenBy(x->x.key2);

Except that (again, for reasons we don’t need to discuss) the key has to be a single value rather than two separate values.

Packing two int values into a long is no problem. But doing it so that the result sorts with key1 descending and key2 ascending? When I first heard this proposed I wasn’t sure that it was possible. But there was a glimmer of an idea . . .

What I’m about to discuss depends on something called “two’s complement arithmetic,” which defines how most computers these days work with positive and negative integers. I’m going to illustrate with 4-bit numbers but the concepts are the same for any size integer value, including the 32-bit and 64-bit numbers we work with regularly.

With 4 bits we can express 16 possible integer values. Unsigned, that’s values 0 through 15. Signed, using the two’s complement convention, the values we can represent are -8 through +7. The table below shows that in detail.

Binary                       Binary
Value   Unsigned  Signed     Value   Unsigned  Signed
0000    0         0          1000    8         -8
0001    1         +1         1001    9         -7
0010    2         +2         1010    10        -6
0011    3         +3         1011    11        -5
0100    4         +4         1100    12        -4
0101    5         +5         1101    13        -3
0110    6         +6         1110    14        -2
0111    7         +7         1111    15        -1

When viewed as signed numbers, the high bit (the bit in the leftmost position) is called the sign bit. When it’s set to 1, the number is considered negative.

Now, the problem. We’re given two signed integers, call them key1 and key2, each of which is four bits in length. We want to combine them into a single 8-bit value so that a natural sort (i.e. sorting the single 8-bit quantity) will result in the records being ordered with key1 descending and key2 ascending.

The first thought is to just pack key1 into the high four bits, and key2 into the low four bits, and treat it as an unsigned quantity. For example, imagine you have these records:

                         Binary      Unsigned
key1 = 4, key2 = 1     0100 0001       65
key1 = 4, key2 = 7     0100 0111       71
key1 = 4, key2 = -8    0100 1000       72
key1 = 4, key2 = -1    0100 1111       79
key1 = -5, key2 = 2    1011 0010       178

Sorting as unsigned quantities, the ordering is wonky. Both key1 and key2 go from 0 to 7, then from -8 to -1.

We need to do some bit twiddling. If we flip the sign bit on key2 and then sort, the table becomes:

                         Binary      Unsigned
key1 = 4, key2 = -8    0100 0000       64
key1 = 4, key2 = -1    0100 0111       71
key1 = 4, key2 = 1     0100 1001       73
key1 = 4, key2 = 7     0100 1111       79
key1 = -5, key2 = 2    1011 1010       90

key2 is now in ascending order! That is, for records that have key1 = 4, the key2 values are in ascending order.

So how do we make key1 sort in descending order? More bit twiddling!

Consider what happens if you flip all of the bits in key1:

Decimal  Binary  Inverted  Result   Decimal  Binary Inverted  Result
  -8      1000    0111        7        0      0000    1111      -1
  -7      1001    0110        6        1      0001    1110       -2
  -6      1010    0101        5        2      0010    1101       -3
  -5      1011    0100        4        3      0011    1100       -4
  -4      1100    0011        3        4      0100    1011       -5
  -3      1101    0010        2        5      0101    1010       -6
  -2      1110    0001        1        6      0110    1001       -7
  -1      1111    0000        0        7      0111    1000       -8 

If we invert all of the bits in key1, then a natural sort orders the keys in descending order.

It’s not magic at all. Given an understanding of two’s complement arithmetic and a little study, there’s nothing to it. I won’t claim to be the first person to come up with this, but I did develop it independently.

The explanation above is for two 4-bit quantities combined into a single 8-bit quantity, but it works just as well with two 32-bit quantities combined into a 64-bit quantity.

Putting it all together, we come up with this C# function. It should be easy enough to port to any other language that allows bit twiddling:

public static long getKey(int key1, int key2)
{
    long key = (uint)~key1;  // invert all the bits of key1
    key <<= 32;

    // flip the sign bit of key2
    uint flipped = (uint)key2 ^ 0x80000000;
    key |= flipped;
    return key;
}

And of course you can write the reverse function to retrieve the two original keys from the combined key.