In reading about C# performance optimization, I often encounter the conventional wisdom that jagged arrays are faster than rectangular arrays. That is, given these two arrays,
byte[,] rectArray = new byte[NumRows, NumCols];
byte[][] jaggedArray = new byte[NumRows][];
The assertion is that accessing jaggedArray will be faster.
I’ve rarely seen any kind of benchmark to back up that assertion, and many of the benchmarks I have seen are fundamentally flawed. I’m exploring a project that will make heavy use of arrays, so I thought I’d look into the relative performance of both array types.
Before I do that, though, I should mention that often you don’t have a choice of what type of array to use. If your job is to process a two-dimensional array of bytes that somebody passes to your program, you’re most likely better off processing it in the format that it was given to you. Converting the data to your preferred “faster” format and converting it back to the native format is going to take a lot of time: probably more time than you’ll save with your “faster” processing code.
Even if you’re writing new code, the best array representation is almost always the one that best fits your model. A rectangular array (i.e. byte[,]) usually leads to simpler code, but jagged arrays (i.e. byte[][]) can be better in many situations. Jagged arrays are almost always better when you need an “array of arrays,” that is, when the second dimension is variable.
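For example, a jagged array can hold rows of different lengths, something a rectangular array can’t represent without padding every row to the length of the longest one. A quick illustrative sketch (not part of the benchmark code):
// Illustrative only: each row is its own array and can have any length.
int[][] triangle = new int[3][];
triangle[0] = new int[] { 1 };
triangle[1] = new int[] { 1, 2 };
triangle[2] = new int[] { 1, 2, 3 };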
That said, let’s benchmark some simple operations with the two different types of arrays. The platform I’m using is:
- Windows Server 2008 R2
- Visual Studio 2012, Update 1
- Intel Xeon X5350 @ 2.67 GHz
- Program compiled in Release mode with “Any CPU”, targeting .NET 4.5
- Run in the 64-bit runtime with debugger detached
I have a simple test harness that runs the test method once to eliminate JIT effects (the time taken for the JIT compiler to compile the method), and then starts a Stopwatch and executes the test method the number of times requested. The reported time is the average of all those runs. Here’s the test harness:
private void Test(Action a, string text, int numIterations)
{
    Console.Write("{0} ... ", text);
    // do it once to eliminate JIT effects
    a();
    // Now time the iterations.
    var sw = Stopwatch.StartNew();
    for (var i = 0; i < numIterations; ++i)
    {
        a();
    }
    sw.Stop();
    Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds / numIterations);
}
In these tests, I’m using arrays that have 2,736 rows and 10,944 columns. This corresponds to the number of pixels in a 10 megapixel image, with each pixel taking up 3 bytes. The timings I report below are the average of running each method 1,000 times.
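The setup itself is straightforward; a representative version looks something like this. The constant names (ArrayRows, ArrayCols, NumIterations) match the test invocations below, and the fill pattern is an arbitrary choice of mine that has no effect on the timings.
private const int ArrayRows = 2736;
private const int ArrayCols = 10944;
private const int NumIterations = 1000;

private byte[,] rectArray;
private byte[][] jaggedArray;

private void InitializeArrays()
{
    rectArray = new byte[ArrayRows, ArrayCols];
    jaggedArray = new byte[ArrayRows][];
    for (var i = 0; i < ArrayRows; ++i)
    {
        // each row of the jagged array is a separate allocation
        jaggedArray[i] = new byte[ArrayCols];
    }
    // fill both arrays with the same (arbitrary) values
    for (var i = 0; i < ArrayRows; ++i)
    {
        for (var j = 0; j < ArrayCols; ++j)
        {
            var value = (byte)((i + j) & 0xFF);
            rectArray[i, j] = value;
            jaggedArray[i][j] = value;
        }
    }
}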
The first test is just a simple traversal of the array that sums all the values. I first wrote it like this:
private int SimpleSum;
private void SimpleSumRect(byte[,] array, int rows, int cols)
{
    SimpleSum = 0;
    for (var i = 0; i < rows; ++i)
    {
        for (var j = 0; j < cols; ++j)
        {
            SimpleSum += array[i, j];
        }
    }
}
private void SimpleSumJagged(byte[][] array, int rows, int cols)
{
    SimpleSum = 0;
    for (var i = 0; i < rows; ++i)
    {
        for (var j = 0; j < cols; ++j)
        {
            SimpleSum += array[i][j];
        }
    }
}
After initializing the arrays, I run the tests like this:
Test(() => SimpleSumRect(rectArray, ArrayRows, ArrayCols), "Simple Sum rect", NumIterations);
Test(() => SimpleSumJagged(jaggedArray, ArrayRows, ArrayCols), "Simple Sum jagged", NumIterations);
The result was rather interesting:
Simple Sum rect ... 70 ms
Simple Sum jagged ... 70 ms
No difference? I wasn’t really surprised by that, and almost dismissed the conventional wisdom as so much myth. But I was also curious because I have great respect for some of the people making the “jagged is faster” claim. So I modified the test slightly. Rather than summing to a variable at class scope, I made it a local variable, like this:
private int SimpleSumRect(byte[,] array, int rows, int cols)
{
    int SimpleSum = 0;
    for (var i = 0; i < rows; ++i)
    {
        for (var j = 0; j < cols; ++j)
        {
            SimpleSum += array[i, j];
        }
    }
    return SimpleSum;
}
That result was surprising:
Simple Sum rect ... 39 ms
Simple Sum jagged ... 24 ms
I had no idea that accessing a class-scope variable was so much more expensive than accessing a local variable. I’d have to look at the generated code (both IL and the JIT-generated machine code) to say for sure what’s going on here, but apparently accessing that class-scope variable prevents the compiler from making some optimizations. Making the sum a local variable (that the compiler probably keeps in a register) shows the real difference in the arrays’ performance. I thought the jagged array would be a little faster, but I certainly didn’t expect that much of a difference.
It’s pretty clear here that traversing a jagged array sequentially is hugely faster than traversing an equivalent rectangular array. The primary reason, it turns out, is array bounds checking. When accessing array[i, j], the runtime has to check the bounds of both indexes. You would think that it would have to do the same thing for the jagged array, but the compiler optimizes the code to something akin to this:
private int SimpleSumJagged2(byte[][] array, int rows, int cols)
{
    var SimpleSum = 0;
    for (var i = 0; i < rows; ++i)
    {
        var theRow = array[i];
        for (var j = 0; j < cols; ++j)
        {
            SimpleSum += theRow[j];
        }
    }
    return SimpleSum;
}
So the inner loop only has to do one bounds check per iteration.
In addition, a simple array (i.e. byte[]) is treated differently. It is assumed to be 0-based, so that indexing is a matter of arrayBase + (index * size). The jagged array is a simple array of simple arrays, making indexing faster. See this Stack Overflow answer for a little more detail.
Rectangular arrays, on the other hand, can have non-zero bases. You can’t declare one in C#, but the code the compiler generates goes through the runtime’s Array class implementation, which does support non-zero bases. Indexing into one of these arrays requires a little more math: arrayBase + ((i - dim1Base) * (size * cols)) + ((j - dim2Base) * size), where dim1Base and dim2Base are the base indexes for each dimension. The compiler can optimize some of that, but it can’t eliminate all of the extra work.
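To make the difference concrete, here’s an illustrative sketch of the two access patterns written out by hand. This is not what the JIT actually emits, just the equivalent arithmetic.
// Rectangular (zero-based) access: one allocation, one multiply-and-add.
// offset = (i * cols) + j, scaled by the element size (1 for byte)
private static byte RectElement(byte[] flat, int cols, int i, int j)
{
    return flat[(i * cols) + j];
}

// Jagged access: two simple lookups, each just base + (index * size),
// but fetching the row reference is an extra memory load.
private static byte JaggedElement(byte[][] array, int i, int j)
{
    byte[] theRow = array[i];
    return theRow[j];
}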
It’s important not to dismiss the first test as irrelevant. Whereas the second test shows that jagged arrays can be faster than rectangular arrays, the first test shows that other factors can easily defeat any potential gain realized by using the “more efficient” data structure. In this case, simply accessing a class-scope variable in the inner loop made the compiler optimizations meaningless.
I ran another test that modified each byte in the array. The idea here was to eliminate as many external factors as possible. Here are the test methods.
private void SimpleTraversalRect(byte[,] array, int rows, int cols)
{
    for (var i = 0; i < rows; ++i)
    {
        for (var j = 0; j < cols; ++j)
        {
            array[i, j] = (byte)(255 - array[i, j]);
        }
    }
}
private void SimpleTraversalJagged(byte[][] array, int rows, int cols)
{
    for (var i = 0; i < rows; ++i)
    {
        for (var j = 0; j < cols; ++j)
        {
            array[i][j] = (byte)(255 - array[i][j]);
        }
    }
}
The timings are pretty much what you expect: 40 ms for the rectangular array and 24 ms for the jagged array.
Then I thought I’d do a little optimization myself by venturing into the world of unsafe code and pointers. The result looks more like C than C#, but it does show that you can optimize access to a rectangular array.
private void OptimizedTraversalRect(byte[,] array, int rows, int cols)
{
    long count = (((long)rows) * cols) / 3;
    unsafe
    {
        fixed (byte* pArray = array)
        {
            byte* p = pArray;
            while (count-- > 0)
            {
                // unrolled: three elements per iteration
                *p = (byte)(255 - *p); ++p;
                *p = (byte)(255 - *p); ++p;
                *p = (byte)(255 - *p); ++p;
            }
        }
    }
}
private void OptimizedTraversalJagged(byte[][] array, int rows, int cols)
{
    for (var i = 0; i < rows; ++i)
    {
        unsafe
        {
            int count = cols / 3;
            fixed (byte* pArray = array[i])
            {
                byte* p = pArray;
                while (count-- > 0)
                {
                    // unrolled: three elements per iteration
                    *p = (byte)(255 - *p); ++p;
                    *p = (byte)(255 - *p); ++p;
                    *p = (byte)(255 - *p); ++p;
                }
            }
        }
    }
}
The timings here are exactly what I expected: 24 ms for the rectangular array and 24 ms for the jagged array. Actually, the rectangular array version is slightly faster, on average, because it doesn’t have to do as much work. Interestingly, if I don’t unroll the inner loop (i.e. only do one operation per iteration), both versions take longer than the original, un-optimized rectangular array version. The compiler is smarter than you think. I was able to outperform the compiler here only because I know that the array’s width is a multiple of three, and it was safe to unroll the loop.
Note that there’s a reason that code is considered unsafe: using the pointer, I circumvent any array bounds checking. Between that and eliminating the indexing calculations, I made traversing the rectangular array a simple sequential walk through memory.
So jagged arrays are faster, right? Well … maybe. What happens if you access them (semi-) randomly?
private const int Increment = 1009;  // a prime, so the walk scatters across the array

private void RandomRect(byte[,] array, int rows, int cols)
{
    int count = rows * cols;
    int row = 0;
    int col = 0;
    while (count-- > 0)
    {
        row = (row + Increment) % rows;
        col = (col + Increment) % cols;
        array[row, col] = (byte)(255 - array[row, col]);
    }
}
private void RandomJagged(byte[][] array, int rows, int cols)
{
    int count = rows * cols;
    int row = 0;
    int col = 0;
    while (count-- > 0)
    {
        row = (row + Increment) % rows;
        col = (col + Increment) % cols;
        array[row][col] = (byte)(255 - array[row][col]);
    }
}
The timings here are telling: 310 ms for the rectangular array and 412 ms for the jagged array. The reason for the huge difference has to do with how the two array types are represented in memory. The rectangular array is one big block of memory. The jagged array is one block of memory for an array of row references, and a separate block of memory for each row. Accessing an arbitrary element in a jagged array, then, requires two separate lookups in memory that can be scattered all over. The result is approximately twice as many cache misses for the jagged array when compared to the rectangular array.
I understand, of course, that my “random” traversal is hardly random. It does, however, provide scattered access to the array elements in a manner that simulates what truly random access would do. I actually wrote a version that used a random number generator, but it took much longer to run and gave almost the same behavior as this simpler version.
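For the curious, that Random-based version looked roughly like this (an illustrative sketch, not the exact code I timed). Seeding both runs identically keeps the comparison between the two array types fair.
// Illustrative sketch only; the timed tests use the Increment-based version above.
private void RandomRectRng(byte[,] array, int rows, int cols, int seed)
{
    var rng = new Random(seed);   // use the same seed for both array types
    int count = rows * cols;
    while (count-- > 0)
    {
        int row = rng.Next(rows);
        int col = rng.Next(cols);
        array[row, col] = (byte)(255 - array[row, col]);
    }
}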
So the answer to the question of which type of array is faster is, as you’ve probably figured out by now, “It depends.”
If you can guarantee that external factors (like accessing a class-scope variable) won’t negate the compiler’s optimizations, then using a jagged array can be 40% faster than using an equivalent rectangular array. But if you’re accessing the array randomly, a jagged array can be 33% slower than a rectangular array. The question of which is faster is meaningless without context.
The machine I ran these tests on is pretty old. I’ve also run the tests on a Core i7 processor and obtained similar results. The absolute times were faster (that machine runs at 3.4 GHz), but the percentage differences in speed between the two array types were similar.
Non-speed considerations
“Efficiency” means more than just raw speed. When discussing performance, we often find that speed and memory efficiency are inversely related. That is, doing something faster requires using more memory. That’s not always true, but it often is. If you’re concerned about efficiency you need to know not just how fast things are, but also how much memory they take and how that memory is allocated.
In addition to having different performance characteristics, jagged arrays and rectangular arrays are stored differently in memory. A rectangular array consists of a single allocation of size rows * cols * element-size, plus about 50 bytes of overhead for array metadata. A jagged array requires somewhat more memory: it consists of an allocation of size rows * sizeof(IntPtr) (plus metadata), and there is a separate allocation for each row, of size cols * element-size (again plus metadata). So consider the array I showed above, which is 2,736 x 10,944. The difference in this large array is about 150 kilobytes, or less than one percent of this 30 megabyte array. Imagine, though, if the rows were only 50 bytes long. The jagged array would consume twice the memory of the rectangular array, because the per-row overhead of roughly 50 bytes still applies.
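Here’s the back-of-the-envelope arithmetic for a 64-bit process. The roughly 50 bytes of per-array overhead is an approximation, not an exact figure.
// Rough estimate only: assumes 8-byte references (64-bit process) and
// ~50 bytes of per-array overhead, which is an approximation.
const long rows = 2736;
const long cols = 10944;
const long overheadPerArray = 50;

long rectangularBytes = rows * cols * sizeof(byte) + overheadPerArray;
long jaggedBytes = rows * IntPtr.Size + overheadPerArray            // the outer array of row references
                 + rows * (cols * sizeof(byte) + overheadPerArray); // one allocation per row

// difference = rows * (8 + 50) = 2,736 * 58 ≈ 159,000 bytes, or about 150 KB
Console.WriteLine("difference: {0:N0} bytes", jaggedBytes - rectangularBytes);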
There are advantages to the jagged array’s distributed representation, however. Prior to .NET 4.5, a rectangular array couldn’t occupy more than two gigabytes (slightly less, in reality). So an array of bytes with a million rows (actually, 2^20 rows) and 2,048 columns couldn’t be created. That is, byte[2^20, 2048] would produce an OutOfMemoryException, regardless of how much memory you have on your machine.
But you could easily create a byte[][] with 2^20 rows and 2,048 columns, because each row would be a paltry two kilobytes in size.
.NET 4.5 introduced the gcAllowVeryLargeObjects configuration file element that lets you allocate objects larger than 2 gigabytes, which alleviates some of the problems, but there are still restrictions as pointed out in the documentation. And it’s often impossible to get a huge chunk of contiguous memory, which the rectangular array needs. So there are still benefits to the jagged array.
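For reference, enabling gcAllowVeryLargeObjects is just a matter of adding one element to the application’s configuration file:
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>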
So, which to use? Whatever best fits your model. I prefer rectangular arrays in most cases because I find them easier to work with (try initializing a jagged array) and easier to reason about. I also know that I can optimize my array handling code if I need to, using unsafe code and pointers. But every situation is different, and I’ll happily use a jagged array if the situation calls for it.
Only you can decide which array representation is appropriate for your application. You have to weigh the costs and benefits and pick the one that best meets your needs. Don’t immediately fall for the simplistic “jagged is faster” hype that’s all too often spouted as a matter of faith without understanding the wider implications.