Fun with coin flipping

Whatever else it is, randomness is peculiar. More peculiar still is our perception of what is or is not random. Consider, for example, the case of flipping a coin five times. Suppose you do that and get the sequence T,H,H,T,H (T is tails, H is heads). That seems perfectly reasonable, right? Random? Imagine instead that you got H,T,H,T,H. You’d probably think the alternating heads and tails is pretty weird, but you still got almost the same number of tails as heads, which is pretty much the definition of what people think of as “random.”

If you got H,H,H,H,H, you’d probably think that the coin was rigged. It could be, but it could also be a perfectly fair coin.

Given five flips of a fair coin, there are 2^5 (32) possible sequences that can occur. You’re equally likely to get all heads as you are to get something like the first sequence I showed, or all tails, or any other sequence. The all heads result just seems “not random” because it’s so distinctive. But there is a one in 32 chance of that or any other result.

Pick up your coin again and flip it five times, writing down each result. Was the sequence T,T,H,T,H? For approximately one out of 32 people who do this exercise, it will be.

Think of it this way. Write down a number from 1 to 32, go to random.org and have it generate a number in that range. Would you be very surprised if it picked your number on the first try? More or less surprised than if you flipped a coin five times and got all tails?

Instead of getting random.org to generate a number for you, you could ask somebody to guess what it is. But that wouldn’t be a fair test, because people don’t pick numbers randomly. Asked to select a number in some range, people are more likely to pick an odd number than an even number, more likely to pick primes, and the chosen number is usually in the bottom third or top third of the range. See, for example, Is 17 the “most random” number? and Probability, algorithmic complexity, and subjective randomness. Or just do a Google search for “psychology of randomness.”

Because you probably picked a number using the same criteria, other people have a better chance of guessing your number than does a computer that gives each number equal weight. Of course, now that you know what most people do, you’ll probably pick a number that others are less likely to guess.

What most people think of as “randomness” is really complexity. The more complex a sequence looks, the more “random” we perceive it to be. THHTH appears more complex than HHHHH, and therefore more “random” in our eyes. It’s what researchers call subjective randomness.

Not only are people terrible at perceiving randomness, they’re also terrible at generating it. Asked to flip a coin 100 times and write down the results, many college students will “cheat” and forgo flipping the coin. They’ll just write down what they think is a random sequence. It’s usually easy to catch them, because to them a run of four or five tails seems “not random,” so their invented sequences don’t contain any long runs. But getting five heads or five tails in a row is very common in 100 flips of a fair coin.

Let me demonstrate with a smaller set.

There are 32 possible sequences of five flips, which I’ve reproduced below. I’ve substituted ‘0’ for ‘T’ and ‘1’ for ‘H’ because 0 and 1 are more visually distinct than T and H, and because I’m accustomed to using 0 and 1 for binary results.

00000 01000 10000 11000
00001 01001 10001 11001
00010 01010 10010 11010
00011 01011 10011 11011
00100 01100 10100 11100
00101 01101 10101 11101
00110 01110 10110 11110
00111 01111 10111 11111

The sequence 00 occurs in 19 of those 32 results. So the likelihood of getting two tails in a row in a sequence of five flips is 19/32, or about 59%. The odds of not getting two tails in a row, then, are 13/32, or about 41%. Getting two tails in a row wouldn’t be strange. Neither would getting two heads. What would be strange is not getting either two heads or two tails in a row. There are only two sequences that don’t have a matching pair (01010 and 10101), so in 30 out of 32 cases (94% of the time), there will be either two heads in a row or two tails in a row.
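
If you’d rather not count by eye, a quick brute-force check confirms it (a sketch; Convert.ToString(i, 2) produces the binary representation of each of the 32 values):

    int count = Enumerable.Range(0, 32)
        .Count(i => Convert.ToString(i, 2).PadLeft(5, '0').Contains("00"));
    Console.WriteLine(count);   // prints 19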

It’s not uncommon, in fact, to see a run of three tails. In the results above, there are eight sequences with 000, so 25% of the time you’ll see a run of three tails. In just five flips of the coin!

You can actually compute the probability of not getting a run of k tails in n tosses using something called a Fibonacci k-step number. I used that formula to verify the results I derived above: for example, the number of five-flip sequences with no run of two tails is the 7th Fibonacci number, 13, which matches the 13/32 figure.

There are 2^100 (a really big number) possible sequences of 100 flips. I can’t list them all, but I can use the formula to determine the likelihood of not getting a run of five tails in a sequence of 100 flips. I just had to compute the 102nd item in the 5-step sequence. Here’s the method I used.

    private BigInteger KStep(int k, int n)
    {
        // Calculate the nth value in the Fibonacci k-step sequence.
        // (BigInteger requires a reference to the System.Numerics assembly.)
        BigInteger[] all = new BigInteger[n];
        all[0] = 1;
        all[1] = 1;
        for (int i = 2; i < n; ++i)
        {
            BigInteger rslt = all[i - 1];
            for (int j = i - 2; j >= 0 && j >= i - k; --j )
            {
                rslt = rslt + all[j];
            }
            all[i] = rslt;
        }

        return all[n - 1];
    }

That’s not the most efficient implementation possible, but it did the job. The result I got for KStep(5, 102) is 240,714,680,556,315,819,945,145,376,976, or about 2.41 x 10^29. I have to divide that by 2^100 (1.27 x 10^30), the result of which is 0.1899. So given 100 flips, 19% of the time I won’t see a run of five tails.
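
If you’d rather let the computer do that last bit of arithmetic, the numbers fit comfortably in a double once converted (a quick sketch using the KStep method above):

    BigInteger noRun = KStep(5, 102);
    double probability = (double)noRun / Math.Pow(2, 100);
    Console.WriteLine("{0:P2}", probability);   // ~18.99%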

Because I don’t have a trillion lifetimes to generate and examine all the possible 100-flip sequences, I wrote a little function that will generate a million different n-flip sequences, each time checking to see if it got a run of k tails. It keeps track of how many times there isn’t a run, and reports the results. Called with Flipper(5, 100), it reports a value pretty close to 19% every time.

    private const int NTries = 1000000;
    private void Flipper(int k, int n)
    {
        var rnd = new Random();
        string testString = new string('T', k);
        int runCount = 0;
        // how often do we NOT get k tails in n flips?
        for (int i = 0; i < NTries; ++i)
        {
            var sb = new StringBuilder(n);
            for (int flip = 0; flip < n; ++flip)
            {
                var x = rnd.Next(2) == 0;
                sb.Append(x ? 'H' : 'T');
            }
            if (!sb.ToString().Contains(testString))
            {
                ++runCount;
            }
        }
        Console.WriteLine("{0:N0} out of {1:N0}: {2:P2}",
            runCount, NTries, (double)runCount/NTries);
    }

If you play with the program a bit, you’ll find that runs of six tails happen more than half the time, runs of seven happen about a third of the time, and you’re twice as likely to get a run of 10 than not get a run of four. File that one away in the “Math that just don’t feel right” drawer.
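
By “play with the program,” I mean calls along these lines; the comments just restate the observations above (remember that Flipper reports how often a run does not occur):

    Flipper(4, 100);    // no run of 4 tails: rare
    Flipper(6, 100);    // no run of 6: a bit under half the time
    Flipper(7, 100);    // no run of 7: roughly two-thirds of the time
    Flipper(10, 100);   // compare against Flipper(4, 100) above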

A couple of LINQ quickies

I’ve recently had occasion to write LINQ versions of two functions I’d written some time back. It proved instructive.

Random numbers that sum to N

Doing some testing, I needed an array of 100 random floating point numbers that sum to 1.0. That is, given an array a, (a[0] + a[1] + a[2] + ... + a[99]) = 1. The easiest way I know of to do that is to generate an array of 100 random values, sum them, and then divide each number by the sum.

    const int ArraySize = 100;
    Random rnd = new Random();
    double[] a = new double[ArraySize];
    double sum = 0;
    // generate the values
    for (int i = 0; i < a.Length; ++i)
    {
        a[i] = rnd.NextDouble();
        sum += a[i];
    }

    // now divide each item by the sum
    for (int i = 0; i < a.Length; ++i)
    {
        a[i] /= sum;
    }

The math behind this is pretty simple. What you end up with is (a[0]/sum + a[1]/sum + a[2]/sum ... + a[99]/sum), which works out to sum/sum, which is 1.

Somebody asked me for a LINQ version.

    var temp = (from d in Enumerable.Range(0, ArraySize)
                select rnd.NextDouble()).ToArray();
    var sum = temp.Sum();
    var a = temp.Select(d => d/sum).ToArray();

An obvious extension to this is generating an array of N values that sum to some arbitrary value. The first part of the algorithm is the same: generate N numbers, and sum them. Then, rather than dividing each number by the sum, divide by (sum/desired_value). Or, you can multiply by (desired_value/sum). So the final LINQ expression becomes:

    var a = temp.Select(d => d*(desiredValue/sum)).ToArray();

One note about the division in the inner loop there: don’t count on the compiler (either the C# compiler or the JIT compiler) to move it out of the loop for you; the lambda body executes once per element. It’s easy enough to hoist the factor yourself.
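
A minimal sketch of that:

    var scale = desiredValue/sum;
    var a = temp.Select(d => d*scale).ToArray();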

Finding palindromes

At a previous job, one part of the interview process for entry-level programmers was having them solve some simple programming puzzles on the white board. I called it The White Board Inquisition. A common exercise was to write a function that determines if an input string is a palindrome. We’d keep it simple and have them look for an exact match. So “amanaplanacanalpanama” would be a palindrome, but “A man, a plan, a canal, Panama!” would not be. Stripping out the non-alphabetic characters and doing case-insensitive matching is a simple refinement, but we wanted to see how the candidate approached the primary problem.

The simplest way, of course, is to reverse the string and do a comparison, but we wouldn’t allow them to call a reverse method. If they wanted to reverse the string, they’d have to do it themselves. Many did. Another common approach is to push the characters onto a stack and then pop them, comparing the popped values with the string characters, in order.
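
The stack version might look something like this (a sketch, doing the exact-match comparison described above):

    private bool IsPalindromeStack(string input)
    {
        var stack = new Stack<char>(input.Length);
        foreach (var c in input)
        {
            stack.Push(c);
        }
        // Popping returns the characters in reverse order; compare them
        // with the original string, front to back.
        foreach (var c in input)
        {
            if (c != stack.Pop())
            {
                return false;
            }
        }
        return true;
    }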

Few candidates tripped to the in-place method of comparing the characters starting from the outside and moving in, like this.

    private bool IsPalindrome(string input)
    {
        int i = 0;
        int j = input.Length - 1;
        while (i < j)
        {
            if (input[i++] != input[j--])
                return false;
        }
        return true;
    }

Somebody recently asked me for a LINQ version of a palindrome finder. The obvious solution is to employ the Reverse extension method and then compare the sequences.
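
Something like this works (a sketch; Reverse here is LINQ’s Enumerable.Reverse, which enumerates the string’s characters back to front):

    private bool IsPalindromeReverse(string input)
    {
        // SequenceEqual compares the two character sequences pairwise.
        return input.SequenceEqual(input.Reverse());
    }

But it’s easy enough to duplicate the above algorithm: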

    private bool IsPalindrome(string input)
    {
        return
            Enumerable.Range(0, input.Length / 2)
                        .Select(i => input[i] == input[input.Length - i - 1])
                        .All(b => b);
    }

A common misconception is that the LINQ code will do those three operations in sequence: generate a list of indexes, then generate a list of results, and then compare the results. That most definitely is not how the generated code works. If you enable runtime library source stepping, you’ll see that the sequence of operations follows the looping construct very closely. The LINQ solution is slower, of course, because it’s working with iterators and such, but it isn’t stupid.

The more I work with LINQ, the more impressed I am by what I can do with it. It’s not always the fastest executing solution, but for most of what I do that doesn’t matter. I find that the expressiveness leads to much shorter, more easily understood, and more reliable code than writing large looping constructs. I imagine that interviewers these days expect candidates to demonstrate a thorough understanding of LINQ. I know that I certainly would.

Null parameters in extension methods

There is the question of how to handle null parameters in extension methods. Consider this extension method that returns the number of digits in a string.

    public static class MyExtensions
    {
        public static int CountDigits(this string input)
        {
            var count = 0;
            foreach (var c in input)
            {
                if (char.IsDigit(c))
                {
                    ++count;
                }
            }
            return count;
        }
    }

Understand, this is just an example. I realize that I can replace all that code with a single function call, and in fact I wouldn’t actually write such a simple extension method to replace that one line of code in my program. Bear with me.

After running the code below, result will be equal to 10.

    string foo = "(555)123-4567";
    var result = foo.CountDigits();  // returns 10

So what should happen if I call that method when foo == null? Let’s see what the Framework does when I call a string method on a null reference.

    string foo = null;
    string lowerFoo = foo.ToLower();

Not surprisingly, that throws NullReferenceException. And, not surprisingly, the CountDigits method also throws NullReferenceException when foo == null. The difference, though, is important. If you look at the exception detail you’ll find that the call to foo.ToLower reports the exception as being thrown on that line. The stack trace for the call to CountDigits looks like this.

System.NullReferenceException was unhandled
  HResult=-2147467261
  Message=Object reference not set to an instance of an object.
  Source=sotesto
  StackTrace:
       at Test.MyExtensions.CountDigits(String input) in c:\Dev\Jim\Testo2012\sotesto\Program.cs:line 37
       at Test.Program.DoStuff() in c:\Dev\Jim\Testo2012\sotesto\Program.cs:line 28

The exception is thrown on the line containing the foreach statement.

If you understand how extension methods work, this shouldn’t be surprising. After all, the code foo.CountDigits(); is really just shorthand for MyExtensions.CountDigits(foo);.

The exception is saying that the error exists in the extension method when in reality the error exists at the call site. This is an important distinction. Imagine that you’re calling an extension method that exists in a library for which you have no source. If that extension method throws NullReferenceException, you will likely assume that the error is in the extension method. And nothing in the stack trace will tell you otherwise. There is nothing to indicate that it threw that exception because you passed a null parameter.

That is incorrect behavior.

Don’t believe me? Let’s see what happens if I rewrite that extension method.

    public static class MyExtensions
    {
        public static int CountDigits(this string input)
        {
            return input.Count(char.IsDigit);
        }
    }

Yes, that duplicates the functionality of the previous version. But if you call it with a null parameter, the result is quite different.

System.ArgumentNullException was unhandled
  HResult=-2147467261
  Message=Value cannot be null.
Parameter name: source
  Source=System.Core
  ParamName=source
  StackTrace:
       at System.Linq.Enumerable.Count[TSource](IEnumerable`1 source, Func`2 predicate)
       at Test.MyExtensions.CountDigits(String input) in c:\Dev\Jim\Testo2012\sotesto\Program.cs:line 36
       at Test.Program.DoStuff() in c:\Dev\Jim\Testo2012\sotesto\Program.cs:line 28

Here we see that the Enumerable.Count extension method threw ArgumentNullException, and actually told me why: the source parameter cannot be null. But, again, it looks like the error is due to the CountDigits extension method passing a null argument.

The answer to the question of what to do with null parameters to extension methods is simple: check them and throw ArgumentNullException.

    public static int CountDigits(this string input)
    {
        if (input == null)
        {
            throw new ArgumentNullException("input", "Value cannot be null.");
        }
        return input.Count(char.IsDigit);
    }

Now you know exactly why the error occurred: the code that called CountDigits passed a null parameter.

The above is how the Framework methods handle null parameters. It’s considered a best practice. More importantly, this is the only way to know for sure where the real error lies.

Confusing traffic signal

The two intersections near my office have traffic lights that are unlike any others I’ve seen, and I fear that they are confusing to drivers and therefore dangerous.

On the north and south sides of the intersection there are three lanes: a left turn lane, a through lane, and a right turn lane. On the east and west sides, there are four lanes (two through lanes). All four sides of the intersection have the normal red/yellow/green through lights and a left turn light.

A traditional traffic light is green, turns yellow for a short period, and then turns red. They have been this way for decades–certainly for the 40+ years I’ve been aware of the way that traffic lights work. Left turn lights work much the same way. The left turn arrow might work on a different schedule than the through lights, but it traditionally follows the green to yellow to red pattern. Or, it might have only green and off, meaning that you have to yield to oncoming traffic when the light is off, but the oncoming traffic has a red light when the left turn light is green.

The left turn lights at this intersection are different. And confusing. They blink yellow to mean, “You may turn after yielding to oncoming traffic.” After the blinking yellow period, they turn green. So drivers have to know that solid yellow means “red is soon to come,” but blinking yellow means, “green is soon to come.”

It doesn’t help, of course, that years of experience have taught drivers that in practice a yellow left turn signal means, “Hurry up through the intersection before the light turns red.”

Every day I see drivers not fully understanding what that blinking yellow means. Some, when they see the blinking yellow, dash through the intersection heedless of oncoming traffic. I can only conclude that they don’t understand that the rules have changed and a light might start blinking yellow. Other drivers see the blinking yellow and hesitate when it’s clear that there is no oncoming traffic and they’re fully within their rights to proceed.

I realize that blinking yellow means “proceed with caution.” But traditionally that’s done with all of the lanes at an intersection (going the same way, of course). But drivers aren’t accustomed to seeing blinking yellow on one light, and solid red or green on the other lights. And they’re certainly not prepared for a blinking yellow light to suddenly go green.

I’ll grant that I’ve only seen two crashes (after the fact–I wasn’t present for the actual impact) at these intersections in the three months I’ve been here. But I’ve seen too many close calls to believe that this unorthodox use of the signal lights is a good idea. I can only hope that those in charge of the traffic lights will see this dangerous situation for what it is and reprogram the lights back to something more reasonable.

Regular expressions, optimization, and practicality

Suppose you wanted a function that would replace multiple instances of a character with a single instance. So the string “+++This+is+++++a++test++” would become “+This+is+a+test+”. The easiest way to do it in C# is to employ regular expressions.

    string testo = "+++This+is+++++a++test++";
    string result = Regex.Replace(testo, @"\++", "+");

The leading backslash is there because '+' is a special character in regular expressions. The backslash “escapes” the character so that it will be treated as a literal.

By the way, you can’t do this reliably with String.Replace. If you wrote:

    string result = testo.Replace("++", "+");

You would end up with “++This+is+++a+test+”. String.Replace makes a replacement and moves on without rescanning its own output, so it misses cases where a replacement plus the characters that follow it form the search string again.
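
If you’re stuck with String.Replace, one workaround (a sketch) is to repeat the replacement until the string stops changing, at the cost of rescanning the whole string on every pass:

    string result = testo;
    string previous;
    do
    {
        previous = result;
        result = result.Replace("++", "+");
    } while (result != previous);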

Curious, I thought I’d write my own function to remove duplicate characters and see how it fares against the regular expression. The result is this RemoveDupes method.

    public static string RemoveDupes(string text, char dupeChar)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var result = new StringBuilder(text.Length);
        bool dupe = false;
        for (int x = 0; x < text.Length; ++x)
        {
            char c = text[x];
            if (c == dupeChar)
            {
                if (!dupe)
                {
                    result.Append(c);
                    dupe = true;
                }
            }
            else
            {
                result.Append(c);
                dupe = false;
            }
        }
        return result.ToString();
    }

That’s the most straightforward way I could think of to write it, so I wouldn’t be surprised if there are some optimizations that would make it faster.

I created the regular expression like this:

    var re = new Regex(@"\++", RegexOptions.Compiled);

And in the timed loop, called it with:

    result = re.Replace(testString, "+");

That should make the regular expression as fast as it can be.

Performance testing was a little bit surprising. The simple RemoveDupes method was 5 to 8 times as fast as the regular expression version, depending on the length of the strings and the percentage of repeated '+' characters in the string.

That the custom code was faster didn’t particularly surprise me; I’ve proven many times that a custom parser can outperform regular expressions. Two or three times as fast is what I expected. Up to eight times as fast came as a surprise.

One million iterations with a string 100 bytes long took the regular expression version 4,767 milliseconds, or 0.004768 ms per call. The same string run through the RemoveDupes method took 793 ms, or 0.000793 ms per call. The custom method is six times as fast, but the absolute difference for those million iterations is only four seconds.

Granted, the absolute differences get larger as the number of iterations or the length of the string increases. The timings (and the differences) increase linearly. If it takes RemoveDupes some N seconds to complete, it will take the regular expression approximately 6*N seconds.

But does it really matter? Most likely not. In most cases, sanitizing the input data takes up a very small percentage of the total CPU time. Optimizing it just doesn’t yield meaningful performance gains. Unless removing duplicated characters from strings is the most time consuming part of the program, this huge boost in performance would mean nothing.

Would you really notice a forty second difference in the running time of a program that takes an hour?

The other practical matter has to do with time to implement. The regular expression code already works. It took some (small) amount of time to write and test my RemoveDupes method, and the result is a very narrowly-targeted solution: it can remove duplicated characters from a string. Even the seemingly simple modification of letting it de-dupe 2-character strings would be a much more involved job, especially when you take into account the weird edge cases inherent in Unicode. Writing and debugging that would be difficult.
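
For comparison’s sake, here’s a sketch of how the regular expression might handle an arbitrary repeated substring (the dupe value here is hypothetical; Regex.Escape takes care of any special characters):

    string input = "+-+-+-done+-";   // hypothetical test input
    string dupe = "+-";              // the substring to de-dupe
    string pattern = "(?:" + Regex.Escape(dupe) + "){2,}";
    string result = Regex.Replace(input, pattern, dupe);
    // result: "+-done+-"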

I think I’ll stick with regular expressions for this kind of job, even though they’re slower than the custom parser. Unless I really need the speed.

Are jagged arrays faster than rectangular arrays?

In reading about C# performance optimization, I often encounter the conventional wisdom that jagged arrays are faster than rectangular arrays. That is, given these two arrays,

    byte[,] rectArray = new byte[NumRows, NumCols];
    byte[][] jaggedArray = new byte[NumRows][];

The assertion is that accessing jaggedArray will be faster.

I’ve rarely seen any kind of benchmark to back up that assertion, and of the benchmarks I have seen many are fundamentally flawed. I’m exploring a project that will make heavy use of arrays, so I thought I’d look into the relative performance of both array types.

Before I do that, though, I should mention that often you don’t have a choice of what type of array to use. If your job is to process a two-dimensional array of bytes that somebody passes to your program, you’re most likely better off processing it in the format that it was given to you. Converting the data to your preferred “faster” format and converting it back to the native format is going to take a lot of time: probably more time than you’ll save with your “faster” processing code.

Even if you’re writing new code, the best array representation is almost always the one that best fits your model. A rectangular array (i.e. byte[,]) usually leads to simpler code, but jagged arrays (i.e. byte[][]) can be better in many situations. Jagged arrays are almost always better when you need an “array of arrays.” That is, when the second dimension is variable.

That said, let’s benchmark some simple operations with the two different types of arrays. The platform I’m using is:

  • Windows Server 2008 R2
  • Visual Studio 2012, Update 1
  • Intel Xeon X5350 @ 2.67 GHz
  • Program compiled in Release mode with “Any CPU”, targeting .NET 4.5
  • Run in the 64-bit runtime with debugger detached

I have a simple test harness that runs the test method once to eliminate JIT effects (the time taken for the JIT compiler to compile the method), and then starts a Stopwatch and executes the test method the number of times requested. The reported time is the average of all those runs. Here’s the test harness:

    private void Test(Action a, string text, int numIterations)
    {
        Console.Write("{0} ... ", text);
        // do it once to eliminate JIT effects
        a();

        // Now time the iterations.
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < numIterations; ++i)
        {
            a();
        }
        sw.Stop();
        Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds/numIterations);
    }

In these tests, I’m using arrays that have 2,736 rows and 10,944 columns. This corresponds to the number of pixels in a 10 megapixel image, with each pixel taking up 3 bytes. The timings I report below are the average of running each method 1,000 times.

The first test is just a simple traversal of the array that sums all the values. I first wrote it like this:

    private int SimpleSum;

    private void SimpleSumRect(byte[,] array, int rows, int cols)
    {
        SimpleSum = 0;
        for (var i = 0; i < rows; ++i)
        {
            for (var j = 0; j < cols; ++j)
            {
                SimpleSum += array[i, j];
            }
        }
    }

    private void SimpleSumJagged(byte[][] array, int rows, int cols)
    {
        SimpleSum = 0;
        for (var i = 0; i < rows; ++i)
        {
            for (var j = 0; j < cols; ++j)
            {
                SimpleSum += array[i][j];
            }
        }
    }
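
Initializing the arrays looks something like this (a sketch; ArrayRows and ArrayCols match the test calls below, and note that each row of the jagged array must be allocated separately):

    byte[,] rectArray = new byte[ArrayRows, ArrayCols];

    byte[][] jaggedArray = new byte[ArrayRows][];
    for (var i = 0; i < ArrayRows; ++i)
    {
        jaggedArray[i] = new byte[ArrayCols];
    }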

After initializing the arrays, I run the tests like this:

    Test(() => SimpleSumRect(rectArray, ArrayRows, ArrayCols), "Simple Sum rect", NumIterations);
    Test(() => SimpleSumJagged(jaggedArray, ArrayRows, ArrayCols), "Simple Sum jagged", NumIterations);

The result was rather interesting:

    Simple Sum rect ... 70 ms
    Simple Sum jagged ... 70 ms

No difference? I wasn’t really surprised by that, and almost dismissed the conventional wisdom as so much myth. But I was also curious because I have great respect for some of the people making the “jagged is faster” claim. So I modified the test slightly. Rather than summing to a variable at class scope, I made it a local variable, like this:

    private int SimpleSumRect(byte[,] array, int rows, int cols)
    {
        int SimpleSum = 0;
        for (var i = 0; i < rows; ++i)
        {
            for (var j = 0; j < cols; ++j)
            {
                SimpleSum += array[i, j];
            }
        }
        return SimpleSum;
    }

That result was surprising:

    Simple Sum rect ... 39 ms
    Simple Sum jagged ... 24 ms

I had no idea that accessing a class-scope variable was so much more expensive than accessing a local variable. I’d have to look at the generated code (both IL and the JIT-generated machine code) to say for sure what’s going on here, but apparently accessing that class-scope variable prevents the compiler from making some optimizations. Making the sum a local variable (that the compiler probably keeps in a register) shows the real difference in the arrays’ performance. I thought the jagged array would be a little faster, but I certainly didn’t expect that much of a difference.

It’s pretty clear here that traversing a jagged array sequentially is hugely faster than traversing an equivalent rectangular array. The primary reason, it turns out, is array bounds checking. When accessing array[i, j], the runtime has to check the bounds of both indexes. You would think that it would have to do the same thing for the jagged array, but the compiler optimizes the code to something akin to this:

    private int SimpleSumJagged2(byte[][] array, int rows, int cols)
    {
        var SimpleSum = 0;
        for (var i = 0; i < rows; ++i)
        {
            var theRow = array[i];
            for (var j = 0; j < cols; ++j)
            {
                SimpleSum += theRow[j];
            }
        }
        return SimpleSum;
    }

So the inner loop only has to do one bounds check per iteration.

In addition, a simple array (i.e. byte[]) is treated differently. It is assumed to be 0-based, so that indexing is a matter of arrayBase + (index * size). The jagged array is a simple array of simple arrays, making indexing faster. See this Stack Overflow answer for a little more detail.

Rectangular arrays, on the other hand, can have non-zero bases. You can’t declare one in C#, but the C# compiler generates code that goes through the runtime’s general Array implementation, which supports them. Indexing into one of these arrays requires a little more math: arrayBase + ((i - dim1Base) * (size * cols)) + ((j - dim2Base) * size) (where dim1Base and dim2Base are the base indexes for each dimension). The compiler can optimize some of that, but it should be obvious that it can’t eliminate all of the extra work.

It’s important not to dismiss the first test as irrelevant. Whereas the second test shows that jagged arrays can be faster than rectangular arrays, the first test shows that other factors can easily defeat any potential gain realized by using the “more efficient” data structure. In this case, simply accessing a class-scope variable in the inner loop made the compiler optimizations meaningless.

I ran another test that modified each byte in the array. The idea here was to eliminate as much as possible all external factors. Here are the test methods.

    private void SimpleTraversalRect(byte[,] array, int rows, int cols)
    {
        for (var i = 0; i < rows; ++i)
        {
            for (var j = 0; j < cols; ++j)
            {
                array[i, j] = (byte)(255 - array[i, j]);
            }
        }
    }

    private void SimpleTraversalJagged(byte[][] array, int rows, int cols)
    {
        for (var i = 0; i < rows; ++i)
        {
            for (var j = 0; j < cols; ++j)
            {
                array[i][j] = (byte)(255 - array[i][j]);
            }
        }
    }

The timings are pretty much what you expect: 40 ms for the rectangular array and 24 ms for the jagged array.

Then I thought I’d do a little optimization myself by venturing into the world of unsafe code and pointers. The result looks more like C than C#, but it does show that you can optimize access to a rectangular array.

    private void OptimizedTraversalRect(byte[,] array, int rows, int cols)
    {
        long count = (((long)rows) * cols)/3;
        unsafe
        {
            fixed (byte* pArray = array)
            {
                byte* p = pArray;
                while (count-- > 0)
                {
                    // Note: not *p++ = (byte)(255 - *p). C# evaluates the
                    // left operand (including the increment) first, so that
                    // form would read the byte after the one being written.
                    p[0] = (byte)(255 - p[0]);
                    p[1] = (byte)(255 - p[1]);
                    p[2] = (byte)(255 - p[2]);
                    p += 3;
                }
            }
        }
    }

    private void OptimizedTraversalJagged(byte[][] array, int rows, int cols)
    {
        for (var i = 0; i < rows; ++i)
        {
            unsafe
            {
                int count = cols/3;
                fixed (byte* pArray = array[i])
                {
                    byte* p = pArray;
                    while (count-- > 0)
                    {
                        // Same indexing form as above, for the same reason.
                        p[0] = (byte)(255 - p[0]);
                        p[1] = (byte)(255 - p[1]);
                        p[2] = (byte)(255 - p[2]);
                        p += 3;
                    }
                }
            }
        }
    }

The timings here are exactly what I expected: 24 ms for the rectangular array and 24 ms for the jagged array. Actually, the rectangular array version is slightly faster, on average, because it doesn’t have to do as much work. Interestingly, if I don’t unroll the inner loop (i.e. only do one operation per iteration), both versions take longer than the original, un-optimized rectangular array version. The compiler is smarter than you think. I was able to outperform the compiler here only because I know that the array’s width is a multiple of three, and it was safe to unroll the loop.

Note that there’s a reason that code is considered unsafe: using the pointer, I circumvent any array bounds checking. Between that and eliminating the indexing calculations, I made traversing the rectangular array a simple sequential walk through memory.

So jagged arrays are faster, right? Well … maybe. What happens if you access them (semi-) randomly?

    private const int Increment = 1009;
    private void RandomRect(byte[,] array, int rows, int cols)
    {
        int count = rows*cols;
        int row = 0;
        int col = 0;
        while (count-- > 0)
        {
            row = (row + Increment) % rows;
            col = (col + Increment) % cols;
            array[row, col] = (byte)(255 - array[row, col]);
        }
    }

    private void RandomJagged(byte[][] array, int rows, int cols)
    {
        int count = rows * cols;
        int row = 0;
        int col = 0;
        while (count-- > 0)
        {
            row = (row + Increment) % rows;
            col = (col + Increment) % cols;
            array[row][col] = (byte)(255 - array[row][col]);
        }
    }

The timings here are telling: 310 ms for the rectangular array and 412 ms for the jagged array. The reason for the huge difference has to do with how the two array types are represented in memory. The rectangular array is one big block of memory. The jagged array is one block of memory for an array of row references, and a separate block of memory for each row. Accessing an arbitrary element in a jagged array, then, requires two separate lookups in memory that can be scattered all over. The result is approximately twice as many cache misses for the jagged array when compared to the rectangular array.

I understand, of course, that my “random” traversal is hardly random. It does, however, provide scattered access to the array elements in a manner that simulates what truly random access would do. I actually wrote a version that used a random number generator, but it took much longer to run and gave almost the same behavior as this simpler version.

So the answer to the question of which type of array is faster is, as you’ve probably figured out by now, “It depends.”

If you can guarantee that external factors (like accessing a class-scope variable) won’t negate the compiler’s optimizations, then using a jagged array can be 40% faster than using an equivalent rectangular array. But if you’re accessing the array randomly, a jagged array can be 33% slower than a rectangular array. The question of which is faster is meaningless without context.

The machine I ran these tests on is pretty old. I’ve also run the tests on a Core i7 processor and obtained similar results. The absolute times were faster (the machine is running at 3.4 GHz), but the percentage difference in the speed of the two array types is similar.

Non-speed considerations

“Efficiency” means more than just raw speed. When discussing performance, we often find that speed and memory efficiency are inversely related. That is, doing something faster requires using more memory. That’s not always true, but it often is. If you’re concerned about efficiency you need to know not just how fast things are, but also how much memory they take and how that memory is allocated.

In addition to having different performance characteristics, jagged arrays and rectangular arrays are stored differently in memory. A rectangular array consists of a single allocation of size rows * cols * element-size, plus about 50 bytes of overhead for array metadata. A jagged array requires somewhat more memory. It consists of an allocation of size rows * sizeof(IntPtr) (plus metadata), and there is a separate allocation for each row, of size cols * element-size (each with its own metadata overhead). So consider the array I showed above, which is 2,736 x 10,944. The difference in this large array is about 150 kilobytes (2,736 rows, each costing an 8-byte reference plus roughly 50 bytes of per-array overhead), or less than one percent of this 30 megabyte array. Imagine, though, if the rows were only 50 bytes long. The jagged array would consume roughly twice the memory of the rectangular array because the per-row overhead of about 50 bytes still applies.

There are advantages to that distributed array representation, however. Prior to .NET 4.5, a rectangular array couldn’t occupy more than two gigabytes (slightly less, in reality). So an array of bytes with a million rows (actually, 2^20 rows) and 2,048 columns couldn’t be created. That is, byte[2^20, 2048] would produce OutOfMemoryException, regardless of how much memory you have on your machine.

But you could easily create a byte[][] with 2^20 rows and 2,048 columns, because each row would be a paltry two kilobytes in size.

.NET 4.5 introduced the gcAllowVeryLargeObjects configuration file element that lets you allocate objects larger than 2 gigabytes, which alleviates some of the problems, but there are still restrictions as pointed out in the documentation. And it’s often impossible to get a huge chunk of contiguous memory, which the rectangular array needs. So there are still benefits to the jagged array.
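
For reference, the element goes in the application’s configuration file, under the runtime section:

    <configuration>
      <runtime>
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>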

So, which to use? Whatever best fits your model. I prefer rectangular arrays in most cases because I find them easier to work with (try initializing a jagged array) and easier to reason about. I also know that I can optimize my array handling code if I need to, using unsafe code and pointers. But every situation is different, and I’ll happily use a jagged array if the situation calls for it.

Only you can decide which array representation is appropriate for your application. You have to weigh the costs and benefits and pick the one that best meets your needs. Don’t immediately fall for the simplistic “jagged is faster” hype that’s all too often spouted as a matter of faith without understanding the wider implications.

Polling for cancellation

Last time, I showed how compiler optimization can cause problems with polling for cancellation, and discussed a few alternatives to solving the problem. At the end of the article I mentioned that my simple program is the uncommon case. Very often (perhaps most often?), your program’s background processing is done by a method in a completely different class, and it’s common for there to be multiple background tasks running. It’s likely that you want a single method call or variable setting to signal cancellation for all of the background tasks.

Before I get to that, I want to address a few comments I received on Facebook in response to the last post.

Ron asks why this isn’t considered a compiler bug. On the face of it, it does look like the compiler is being overly aggressive. However, the compiler doesn’t know that the loop is executing in a separate thread. The compiler doesn’t understand that ThreadPool.QueueUserWorkItem starts a new background thread and that it should therefore widen its scope when examining the cancelRequested variable. Expecting the compiler to understand that is unreasonable, and forcing the compiler to assume that a local variable is accessible by multiple threads would eliminate a large number of perfectly valid optimizations. The compiler is behaving correctly here.

Readers Eric and Pete advocate moving the variable to class scope and adding the volatile qualifier. I agree that that’s the easiest solution and that it would work. I’d probably even use it if I were writing a simple program like the one in the last post. But I consider it a hack. Whereas Pete is right that an anonymous function closing over local variables muddies the concept of local variables, promoting the variable to class scope completely destroys any concept of information hiding. More importantly, there are the problems with volatile that I pointed out in the article. Those problems are irrelevant in this simple case, but I wouldn’t want to deal with them in a production program.

More importantly, as you’ll see below, volatile isn’t a good general solution.

Let’s move on from that toy demonstration program to a more likely real world design.

In the program below, the code that’s executed in the background thread is in a separate class that has no knowledge of or access to the code that calls it. It can’t access a public cancelRequested flag, so the flag is passed by reference.

    public class Program
    {
        private static void Main(string[] args)
        {
            var p = new Program();
            p.DoStuff();
            Console.WriteLine("Press Enter to end program.");
            Console.ReadLine();
        }

        private void DoStuff()
        {
            bool cancelRequested = false;

            // Start tasks
            ThreadPool.QueueUserWorkItem((s) =>
                {
                    var task1 = new Task1Class();
                    task1.DoTask1(ref cancelRequested);
                });

            Console.WriteLine("Press Enter to stop thread.");
            Console.ReadLine();

            // set cancel flag
            cancelRequested = true;
            Console.WriteLine("Cancel requested!");
        }
    }

    public class Task1Class
    {
        public void DoTask1(ref bool cancelRequested)
        {
            Console.WriteLine("Task1 started.");
            while (!cancelRequested)
            {
                // do stuff
            }
            Console.WriteLine("Task1 exit.");
        }
    }

If you run that in the debugger, it will work as expected. But run it without the debugger attached and the thread never exits! Once again, the compiler has optimized things.

Moving the cancelRequested flag to class scope doesn’t help, nor does moving it to class scope and marking it as volatile. In fact, adding volatile results in this compiler warning:

“a reference to a volatile field will not be treated as volatile.”

As with the example in my previous post, doing certain things in the loop will prevent the compiler from applying the optimization that reveals this programming bug. But do you really want to count on knowing what those rules are? For every platform your code will be running on, in all versions past, present, and future? I didn’t think so.

Passing the cancel flag by reference is a latent bug. Don’t do it.

I have to admit here that I gave some bad advice to my friend who first asked me about this. I was convinced that the compiler wouldn’t perform that optimization on a reference parameter. I was wrong, as this little example shows. I still don’t fully understand the optimization rules, but I’ve come to the conclusion that the compiler assumes single-threaded code in most situations. That makes sense, actually. The compiler would have to avoid a whole host of optimizations if it couldn’t assume single-threaded code.

If you’re writing multi-threaded code, you need to use synchronization primitives in order to control things. Prior to .NET 4.0, a common technique was to use a ManualResetEvent, like this:

    private void DoStuff()
    {
        ManualResetEvent cancelRequested = new ManualResetEvent(false); // <---- 

        // Start tasks
        ThreadPool.QueueUserWorkItem((s) =>
            {
                var task1 = new Task1Class();
                task1.DoTask1(cancelRequested);                         // <----
            });

        Console.WriteLine("Press Enter to stop thread.");
        Console.ReadLine();

        // set cancel flag
        cancelRequested.Set();
        Console.WriteLine("Cancel requested!");
    }

    public class Task1Class
    {
        public void DoTask1(ManualResetEvent cancelRequested)           // <----
        {
            Console.WriteLine("Task1 started.");
            while (!cancelRequested.WaitOne(0))                         // <----
            {
                // do stuff
            }
            Console.WriteLine("Task1 exit.");
        }
    }

The lines with “// <----” comments are the only ones that change.

You can still use that technique in .NET 4.0 or later. You can get slightly better performance using ManualResetEventSlim. I see only two problems with using the events this way.

First, the expression !cancelRequested.WaitOne(0) isn’t exactly clear. It’s not as bad as the Interlocked.CompareExchange example from my last post, but it’s definitely not as clear as !cancelRequested.

More importantly, any thread can call cancelRequested.Set to cancel the entire operation. Granted, background tasks shouldn’t do that, but they could. It would be better if we could pass a read-only token of some sort.

.NET 4.0 formalized the concept of Cancellation, implemented with the CancellationTokenSource and CancellationToken classes. Using these classes, you can implement cancellation in a standard, safe, and reliable manner.

The basic idea is that the main program creates a CancellationTokenSource instance, and passes the Token member (a CancellationToken) to the methods that want to check for cancellation. I’ll let you read up on the details.

Here’s an example.

    private void DoStuff()
    {
        CancellationTokenSource cancel = new CancellationTokenSource();

        // Start tasks
        ThreadPool.QueueUserWorkItem((s) =>
            {
                var task1 = new Task1Class();
                task1.DoTask1(cancel.Token);
            });

        Console.WriteLine("Press Enter to stop thread.");
        Console.ReadLine();

        // set cancel flag
        cancel.Cancel();
        Console.WriteLine("Cancel requested!");
    }

    public class Task1Class
    {
        public void DoTask1(CancellationToken cancelToken)
        {
            Console.WriteLine("Task1 started.");
            while (!cancelToken.IsCancellationRequested)
            {
                // do stuff
            }
            Console.WriteLine("Task1 exit.");
        }
    }

That looks a lot cleaner to me. I can tell just by looking at the code exactly what is going on. Furthermore, there is no way that one of the tasks can inadvertently cancel the entire process because there’s no way for them to set the cancel flag. They can only query it.

Another advantage of this method (and of the one that uses events) over the reference flag (which doesn’t work, but set that aside) is that you can pass the CancellationToken to a class’s constructor so that the class can cache it. That way, you don’t have to pass the cancellation token to every internal method that might need it. For example:

    private void DoStuff()
    {
        CancellationTokenSource cancel = new CancellationTokenSource();

        // Start tasks
        ThreadPool.QueueUserWorkItem((s) =>
            {
                var task1 = new Task1Class(cancel.Token);
                task1.DoTask1();
            });

        Console.WriteLine("Press Enter to stop thread.");
        Console.ReadLine();

        // set cancel flag
        cancel.Cancel();
        Console.WriteLine("Cancel requested!");
    }

    public class Task1Class
    {
        private readonly CancellationToken _cancelToken;

        public Task1Class(CancellationToken cancelToken)
        {
            _cancelToken = cancelToken;
        }

        public void DoTask1()
        {
            Console.WriteLine("Task1 started.");
            DoSubtask1();
            if (!_cancelToken.IsCancellationRequested)
            {
                DoSubtask2();
            }
            Console.WriteLine("Task1 exit.");
        }

        private void DoSubtask1()
        {
            while (!_cancelToken.IsCancellationRequested)
            {
                // do stuff
            }
        }

        private void DoSubtask2()
        {
            while (!_cancelToken.IsCancellationRequested)
            {
                // do stuff
            }
        }
    }

You can’t store a reference in C# (you can store an instance of a reference type, but that’s a different thing), so there wouldn’t be any way to do this using a simple Boolean flag. You can’t write, for example:

    private ref bool cancelRequested;    // this doesn't compile!

The most important thing I learned from this investigation is:

If you’re writing multi-threaded code, use synchronization primitives for all inter-thread communication.

Furthermore, I’ve shown that volatile is not a synchronization primitive.

If you’re writing for .NET 4.0 or later, then take advantage of Cancellation. Otherwise, use a ManualResetEvent or ManualResetEventSlim to approximate that functionality.