(I don’t recall if I wrote about this before. I really need to resurrect my old blog entries.)
This was an interesting puzzle to work on. Some of the programmers I subsequently explained it to just got that glazed look and figured I was invoking black magic. Or higher math.
Imagine you have a list of structures, each of which consists of a string key, two integer values, and some other stuff. In essence:
struct Foo
string Name;
int key1;
int key2;
// other stuff that's not relevant to this discussion
For this discussion, let’s assume that int
is a 32-bit signed integer.
You have a lot of these Foo structures, more than can keep in memory, but for reasons we don’t need to get into you maintain an index that’s sorted by key1
descending, and key2
ascending. That is, in C#:
index = list.OrderByDescending(x -> x.key1)
.ThenBy(x->x.key2);
Except that (again, for reasons we don’t need to discuss) the key has to be a single value rather than two separate values.
Packing two int
values into a long
is no problem. But doing it so that the result sorts with key1
descending and key2
ascending? When I first heard this proposed I wasn’t sure that it was possible. But there was a glimmer of an idea . . .
What I’m about to discuss depends on something called “two’s complement arithmetic,” which defines how most computers these days work with positive and negative integers. I’m going to illustrate with 4-bit numbers but the concepts are the same for any size integer value, including the 32-bit and 64-bit numbers we work with regularly.
With 4 bits we can express 16 possible integer values. Unsigned, that’s values 0 through 15. Signed, using the two’s complement convention, the values we can represent are -8 through +7. The table below shows that in detail.
Binary Binary
Value Unsigned Signed Value Unsigned Signed
0000 0 0 1000 8 -8
0001 1 +1 1001 9 -7
0010 2 +2 1010 10 -6
0011 3 +3 1011 11 -5
0100 4 +4 1100 12 -4
0101 5 +5 1101 13 -3
0110 6 +6 1110 14 -2
0111 7 +7 1111 15 -1
When viewed as signed numbers, the high bit (the bit in the leftmost position) is called the sign bit. When it’s set to 1, the number is considered negative.
Now, the problem. We’re given two signed integers, call them key1
and key2
, each of which is four bits in length. We want to combine them into a single 8-bit value so that a natural sort (i.e. sorting the single 8-bit quantity) will result in the records being ordered with key1
descending and key2
ascending.
The first thought is to just pack key1
into the high four bits, and key2
into the low four bits, and treat it as an unsigned quantity. For example, imagine you have these records:
Binary Unsigned
key1 = 4, key2 = 1 0100 0001 65
key1 = 4, key2 = 7 0100 0111 71
key1 = 4, key2 = -8 0100 1000 72
key1 = 4, key2 = -1 0100 1111 79
key1 = -5, key2 = 2 1011 0010 178
Sorting as unsigned quantities, the ordering is wonky. Both key1
and key2
go from 0 to 7, then from -8 to -1.
We need to do some bit twiddling. If we flip the sign bit on key2
and then sort, the table becomes:
Binary Unsigned
key1 = 4, key2 = -8 0100 0000 64
key1 = 4, key2 = -1 0100 0111 71
key1 = 4, key2 = 1 0100 1001 73
key1 = 4, key2 = 7 0100 1111 79
key1 = -5, key2 = 2 1011 1010 90
key2
is now in ascending order! That is, for records that have key1 = 4
, the key2
values are in ascending order.
So how do we make key1
sort in descending order? More bit twiddling!
Consider what happens if you flip all of the bits in key1
:
Decimal Binary Inverted Result Decimal Binary Inverted Result
-8 1000 0111 7 0 0000 1111 -1
-7 1001 0110 6 1 0001 1110 -2
-6 1010 0101 5 2 0010 1101 -3
-5 1011 0100 4 3 0011 1100 -4
-4 1100 0011 3 4 0100 1011 -5
-3 1101 0010 2 5 0101 1010 -6
-2 1110 0001 1 6 0110 1001 -7
-1 1111 0000 0 7 0111 1000 -8
If we invert all of the bits in key1
, then a natural sort orders the keys in descending order.
It’s not magic at all. Given an understanding of two’s complement arithmetic and a little study, there’s nothing to it. I won’t claim to be the first person to come up with this, but I did develop it independently.
The explanation above is for two 4-bit quantities combined into a single 8-bit quantity, but it works just as well with two 32-bit quantities combined into a 64-bit quantity.
Putting it all together, we come up with this C# function. It should be easy enough to port to any other language that allows bit twiddling:
public static long getKey(int key1, int key2)
{
long key = (uint)~key1; // invert all the bits of key1
key <<= 32;
// flip the sign bit of key2
uint flipped = (uint)key2 ^ 0x80000000;
key |= flipped;
return key;
}
And of course you can write the reverse function to retrieve the two original keys from the combined key.