Another little puzzle solved

Given an array, a, of unsorted distinct integers, arrange them so that a[0] < a[1] > a[2] < a[3] > a[4], etc.

So, given the array [3, 1, 7, 5, 9], one possible solution is [1, 7, 3, 9, 5].

One way to do this would be to sort the array and then take the last part, starting from the end of the array, and interleave it with the first. So, for example:

    [3, 1, 7, 5, 9]  // original array
    [1, 3, 5, 7, 9]  // sorted
    [1, 9, 3, 7, 5]  // rearranged into another possible solution

That’s kind of expensive, though. It takes O(n log n) to sort. It would take O(n) to build the rearranged list into a new array, or O(n²) to rearrange in place naively. Perhaps you could do an in-place rearrangement in O(n log n).

There’s a faster solution that takes O(n) to rearrange the numbers. Believe it or not, it’s as simple as scanning the array, checking each pair of numbers to see if they meet the condition (a[i] < a[i+1] for the even pairs, and a[i] > a[i+1] for the odd pairs). If a pair doesn’t meet the condition, swap them. Expressed in code, it looks like this:

    for i = 0 to n-2
        if i is even
            // ensure a[i] < a[i+1]
            if (a[i] > a[i+1])
                swap(a[i], a[i+1])
            end if
        else
            // ensure a[i] > a[i+1]
            if (a[i] < a[i+1])
                swap(a[i], a[i+1])
            end if
        end if
    end for

Really, that’s all there is to the algorithm.
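
Here’s a minimal C# sketch of that loop (the method name and signature are mine, not part of the original puzzle):

    static void Wiggle(int[] a)
    {
        for (int i = 0; i < a.Length - 1; i++)
        {
            // even pairs should ascend, odd pairs should descend
            bool shouldAscend = (i % 2 == 0);
            if ((a[i] < a[i + 1]) != shouldAscend)
            {
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
            }
        }
    }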

Let me explain why it works.

Given three distinct numbers (a, b, c), there are four possible relationships:

  1. a < b < c
    The first condition is met. We know that a < c, so if we swap b and c, we end up with a < c > b.
  2. a < b > c
    Both conditions are already met.
  3. a > b < c
    Here we have to swap a and b to give b < a ? c. But we already know that b < c, so if a < c, swapping a and c still gives us b < c > a.
  4. a > b > c
    We know that a > c, so swapping a and b to meet the first condition maintains the second condition, giving b < a > c.

Okay, so it works with three numbers. Now what happens when you add a fourth? You have a < b > c ? d. If c < d, then of course everything’s fine. If c > d, then we have case 4 above, and we already know that swapping c and d will not invalidate the first condition.

Similarly, given a < b > c < d ? e, if d > e then we move on. Otherwise we have case 1 above, which we know can be resolved by swapping, without affecting the first condition.
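
Tracing the algorithm over the original example array shows it arriving at the solution given at the top:

    [3, 1, 7, 5, 9]  // original array
    i=0 (even): 3 > 1, swap -> [1, 3, 7, 5, 9]
    i=1 (odd):  3 < 7, swap -> [1, 7, 3, 5, 9]
    i=2 (even): 3 < 5, ok   -> [1, 7, 3, 5, 9]
    i=3 (odd):  5 < 9, swap -> [1, 7, 3, 9, 5]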

I just love the elegance of this solution. It’s so … neat.

Simple multithreading, Part 2

This is the second in a short series about using multithreading to speed up traditional input-process-output (IPO) processes. Part 1 described the problem and showed solutions in pseudocode. This part presents C# source code that compares the approaches.

A C# program that reads a text file, processes each line, and outputs the result is a pretty simple thing to write. The only “heavy lifting,” if any, is in the processing part. The input and output are standard. I write a lot of simple utility programs that process text files, and I’ve developed a skeleton program that takes care of all the boilerplate for me. The heart of that template is a method that looks a lot like this:

    void ProcessFile_SingleThread(string inputFilename, string outputFilename,
        Func<string, string> processLine)
    {
        using (StreamWriter outputFile = new StreamWriter(outputFilename))
        {
            foreach (string line in File.ReadLines(inputFilename))
            {
                string rslt = processLine(line);
                outputFile.WriteLine(rslt);
            }
        }
    }

Or, if you really want to be concise:

    File.WriteAllLines(outputFilename,
        File.ReadAllLines(inputFilename).Select(processLine));

Given that, I just have to write a method that takes a string parameter, processes it, and returns a string result. For example, if I just want to reverse the string, I’d write my ReverseLine method:

    private string ReverseLine(string s)
    {
        return new string(s.Reverse().ToArray());
    }

And then call it like this:

    ProcessFile_SingleThread("foo.txt", "oof.txt", ReverseLine);

If it really is something as simple as reversing a line, I’d probably use a lambda:

    ProcessFile_SingleThread("foo.txt", "oof.txt",
        (s) => new string(s.Reverse().ToArray()));

However it’s called, that method opens the output file, reads the input file line-by-line, processes the line, and writes it to the output. That’s really all there is to it.

The method that uses asynchronous reads and writes isn’t much more complicated:

    void ProcessFile_UsingAsync(string inputFilename, string outputFilename,
        Func<string, string> processLine)
    {
        using (var outputFile = new StreamWriter(outputFilename))
        {
            Task writeTask = null;
            using (var inputFile = new StreamReader(inputFilename))
            {
                string line;
                // start reading the first line
                var readTask = inputFile.ReadLineAsync();
                // querying readTask.Result blocks until the task is complete
                while ((line = readTask.Result) != null)
                {
                    // start next read
                    readTask = inputFile.ReadLineAsync();

                    // while that's happening, process the line
                    var rslt = processLine(line);

                    // wait for previous write (if any) to complete
                    if (writeTask != null)
                    {
                        writeTask.Wait();
                    }

                    // start writing the line
                    writeTask = outputFile.WriteLineAsync(rslt);
                }
            }
            // Wait for final write to complete
            if (writeTask != null)
            {
                writeTask.Wait();
            }
        }
    }

I know it looks more involved, but a lot of those lines are just white space. It does take a little more work to start the asynchronous reads and writes, and to wait for them to complete when necessary, but it’s really not that complex.

Unfortunately, that asynchronous version runs slower than the simple single-threaded version. The problem is that asynchronous transactions take some non-zero time to set up, apparently more time than it takes to read/write a single line from/to a text file. If disks weren’t so fast or my processing were much slower, the asynchronous version would be faster because the sub-tasks would overlap.

It turns out that I don’t very often use ReadLineAsync and WriteLineAsync. I do, however, use the asynchronous methods for reading from a FileStream. The older BeginRead / EndRead and corresponding Write asynchronous calls save a lot of time when reading large blocks of binary data. They’re a bit difficult to use at first, but they’re very effective. .NET 4.5 introduced ReadAsync and WriteAsync, which are dead simple to use and perform very well. Converting the method above to process binary data using those methods is a straightforward job.
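
To give you an idea of what that conversion might look like, here’s a sketch (my code, not from my utility template; the 64 KB block size and the processBlock delegate, which processes a buffer in place and returns the number of bytes to write, are assumptions for illustration):

    async Task ProcessFile_BinaryAsync(string inputFilename, string outputFilename,
        Func<byte[], int, int> processBlock)
    {
        const int BlockSize = 64 * 1024;
        using (var input = new FileStream(inputFilename, FileMode.Open, FileAccess.Read,
            FileShare.Read, BlockSize, useAsync: true))
        using (var output = new FileStream(outputFilename, FileMode.Create, FileAccess.Write,
            FileShare.None, BlockSize, useAsync: true))
        {
            var readBuffer = new byte[BlockSize];
            var workBuffer = new byte[BlockSize];
            Task writeTask = null;

            int bytesRead = await input.ReadAsync(readBuffer, 0, BlockSize);
            while (bytesRead > 0)
            {
                // wait for the previous write so its buffer is free to read into
                if (writeTask != null)
                {
                    await writeTask;
                }

                // swap buffers: process what was just read,
                // and read the next block into the other buffer
                var temp = workBuffer;
                workBuffer = readBuffer;
                readBuffer = temp;

                // start the next read, then process and write while it's in flight
                var readTask = input.ReadAsync(readBuffer, 0, BlockSize);
                int bytesToWrite = processBlock(workBuffer, bytesRead);
                writeTask = output.WriteAsync(workBuffer, 0, bytesToWrite);
                bytesRead = await readTask;
            }
            // wait for the final write to complete
            if (writeTask != null)
            {
                await writeTask;
            }
        }
    }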

The last technique I discussed in Part 1 was having three independent tasks: a reader, a processor, and a writer, all of which communicate through queues. The idea is to fire up the reader and have it start putting lines into a queue. The processing thread reads a line from the queue, processes it, and writes the result to a second queue: the output queue. The writer task reads the output queue and writes data to disk. As you can see here, the .NET BlockingCollection class (introduced in .NET 4.0) makes this fairly easy to do.

    const int QueueSize = 10000;
    private BlockingCollection<string> _inputQueue = new BlockingCollection<string>(QueueSize);
    private BlockingCollection<string> _outputQueue = new BlockingCollection<string>(QueueSize);

    private void ReadInput(string inputFilename)
    {
        foreach (var line in File.ReadLines(inputFilename))
        {
            _inputQueue.Add(line);
        }

        _inputQueue.CompleteAdding();
    }

    private void WriteOutput(string outputFilename)
    {
        using (var outputFile = new StreamWriter(outputFilename))
        {
            foreach (var line in _outputQueue.GetConsumingEnumerable())
            {
                outputFile.WriteLine(line);
            }
        }
    }

    void ProcessFile_Multithread(string inputFilename, string outputFilename,
        Func<string, string> processLine)
    {
        var reader = Task.Factory.StartNew(() => ReadInput(inputFilename), TaskCreationOptions.LongRunning);
        var writer = Task.Factory.StartNew(() => WriteOutput(outputFilename), TaskCreationOptions.LongRunning);

        foreach (var line in _inputQueue.GetConsumingEnumerable())
        {
            var newline = processLine(line);
            _outputQueue.Add(newline);
        }

        _outputQueue.CompleteAdding();
        reader.Wait();
        writer.Wait();
    }
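
Calling it looks just like calling the single-threaded version:

    ProcessFile_Multithread("foo.txt", "oof.txt", ReverseLine);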

Again, it looks more complex than it really is, and I’ve split things up to make it clearer what’s happening. Let’s take a closer look at each part of the program.

The ReadInput method uses the same File.ReadLines method that the single-threaded program uses. But instead of processing the line immediately, it puts the line on the input queue. The queue is sized to hold up to 10,000 lines. If the queue fills up, then the call to Add will block until there is space available. Once all of the lines have been read and added to the queue, the method calls CompleteAdding to notify the queue that no more data will be added. If you subsequently call Add on the queue, it will throw an exception.

The writer thread opens the output file and reads the output queue, writing each line to the file as it’s received. This code makes use of the GetConsumingEnumerable method, which returns an enumerator that makes working with the queue a little nicer. Without it, a consuming loop (the main processing loop, for example) would look something like this:

    string line;
    while (_inputQueue.TryTake(out line, -1))
    {
        var newline = processLine(line);
        _outputQueue.Add(newline);
    }

The second parameter to TryTake is a timeout in milliseconds; the value -1 signifies an infinite wait. I find it much clearer to use GetConsumingEnumerable here.

Reading that code, you might wonder how we tell the difference between a queue that’s temporarily empty, and a queue that’s empty because there’s no more data. That’s where CompleteAdding comes in. When somebody calls CompleteAdding on a queue, the IsAddingCompleted property is set to True. If that flag is not set and the queue is empty, then the assumption is that it’s only temporarily empty. But if IsAddingCompleted is True and the queue has no items in it, we know that it will not be getting any more data. All of the data has been consumed.

The main processing loop combines reading the input queue and writing the output queue. And when it’s done consuming the input and filling the output, it calls CompleteAdding on the output queue. Then it waits for both tasks to complete (the reader will already be done).

By the way, you can write all of that in a single method using anonymous methods, like this:

    void ProcessFile_Multithread(string inputFilename, string outputFilename,
        Func<string, string> processLine)
    {
        const int QueueSize = 10000;
        BlockingCollection<string> _inputQueue = new BlockingCollection<string>(QueueSize);
        BlockingCollection<string> _outputQueue = new BlockingCollection<string>(QueueSize);

        var reader = Task.Factory.StartNew(() =>
            {
                foreach (var line in File.ReadLines(inputFilename))
                {
                    _inputQueue.Add(line);
                }
                _inputQueue.CompleteAdding();
            }, TaskCreationOptions.LongRunning);

        var writer = Task.Factory.StartNew(() =>
            {
                using (var outputFile = new StreamWriter(outputFilename))
                {
                    foreach (var line in _outputQueue.GetConsumingEnumerable())
                    {
                        outputFile.WriteLine(line);
                    }
                }
            }, TaskCreationOptions.LongRunning);

        foreach (var line in _inputQueue.GetConsumingEnumerable())
        {
            var newline = processLine(line);
            _outputQueue.Add(newline);
        }

        _outputQueue.CompleteAdding();
        reader.Wait();
        writer.Wait();
    }

Which you use is largely a matter of preference. I like that I can limit the scope of the queues this way, although I could have done that by passing them as parameters to the ReadInput and WriteOutput methods. The syntax takes some getting used to, and if the code in the tasks is at all involved the method gets to be pretty long. It’s also tempting to write code that depends too much on variables that are defined in the method body. It’s a balancing act and, as I said, largely a matter of taste. Or a matter of coding standards (that is, somebody else’s taste).

The time it takes the single-threaded version to run is easy to compute: input-time + processing-time + output-time. The tasks have to be done sequentially with no overlap, so the whole is exactly equal to the sum of the parts.

The version that does asynchronous I/O runs in time Max(input-time, processing-time, output-time), plus a small amount of time for the first read and the last write. But input-time and output-time are longer because each line read or written requires some time to issue the asynchronous request. So input-time, which in the single-threaded version is just (number-of-lines * time-to-read-one-line), becomes (number-of-lines * (request-time + time-to-read-one-line)). Output-time is similarly affected.

The last version also takes time Max(input-time, processing-time, output-time), but the difference is that input-time is only (startup-time + (number-of-lines * time-to-read-one-line)). The startup-time is almost exactly the same as with the single-threaded version. Same goes for output-time. There is a small amount of per-line overhead in adding and removing things on the queues, but they’re very quick (copy a string reference). As a result, there is a lot more opportunity for the input and output to overlap with each other and with processing.

The single-threaded version of the program is going to be much faster than either of the other two when you’re working with small files or when the per-line processing time is very short. With larger files, the third version becomes more attractive. If the per-line processing is much longer than the per-line input or output time, then you have to choose between the last two versions of the program. The version that uses asynchronous I/O can be attractive in this case because its asynchronous requests occupy thread pool threads for a very short time. If the program is doing other asynchronous processing as well as processing the file, this frees those pool threads. In contrast, the last version keeps two pool threads occupied (but not necessarily busy) for the entire duration of the program. That means two fewer pool threads for other parts of the application.

I use the last version for most of my programs, simply because it performs almost as fast as the single-threaded version for small files, and hugely faster on larger files because the input and output largely overlap. For the smaller files, the slightly increased time just doesn’t matter. But it matters a lot when I’m processing a 30 gigabyte file.

As I think I’ve shown, it’s pretty easy these days to upgrade an old single-threaded IPO program so that it uses multiple threads to increase performance. More importantly, we’ve done it without having to delve into the mysteries of parallel array processing or mucking about with synchronization primitives. There’s nary a lock, semaphore, event, mutex, or any such in the code. Just some simple Tasks, and a concurrent collection. Simple multithreading is easy these days, and by mastering a few simple techniques you can safely increase your program’s throughput.

In most of the programs I’ve done this with, I’ve reduced the time from (input-time + process-time + output-time) to, essentially, output-time, which is often very close to a 50% performance increase. The only time that’s failed is when the per-line (or per-record) processing time is huge, making the program execute in process-time. Still, these techniques essentially eliminate input and output time in those cases. When it takes 10 minutes to read (and 15 minutes to write) a 30 gigabyte file, that’s nothing to sneeze at. But if the processing takes several hours, saving the reading and writing time doesn’t seem like a big win.

It seems like we should be able to do something to reduce that processing time. That’s the subject of the next part.

Know when to stop optimizing

I know I promised that I’d have the second part of my multithreading series, but I haven’t finished writing it up. In the meantime, I ran across this problem today.

Albert Einstein is quoted as saying, “Everything should be made as simple as possible, but not simpler.” That’s probably a paraphrase of what he actually said, but the sentiment is the same, and it applies as well to computer programming and many other fields as it does to his own field (theoretical physics). There is beauty and power in simplicity. Today’s lesson is a case in point.

A programmer who interviewed with a mobile development shop was asked how he’d approach the problem of incrementally searching a phone’s list of contacts. For example, the contacts list might contain:

    Earl
    Elaine
    Jerry
    Joe
    Mary
    Melody
    Penelope
    Paul

The desired functionality is that when the user types the first character, the phone should display all of the names that contain that character. So if the user typed “e”, the resulting list would be:

    Earl
    Elaine
    Jerry
    Joe
    Melody
    Penelope

And when the user types the next character, the list would be reduced to those names that contained the new substring. So if the user’s next input was “l”, we’d want to display the contacts that have “el” in their names:

    Elaine
    Melody
    Penelope

The candidate was asked to provide an “efficient” solution to the problem, but apparently the definition of efficiency wasn’t supplied.

I don’t know what the interviewer was expecting, but the interviewee said that he got bogged down trying to develop a solution that makes use of tries. I suspect he was trying to come up with a generalized suffix tree on the fly.

When the candidate asked me about this after his interview, I proposed a much more pragmatic solution. I figured that, given the number of contacts typically found in a phone (a thousand would be a lot), a sequential search that uses the results from the previous search would be plenty fast enough, and dead simple to implement. It would look something like this:

    results = contacts  // initialize results list with all contacts
    search_string = ""
    while (user types letter)
    {
        // append input letter to search string
        search_string = search_string + letter

        // get new results by filtering the previous result set
        results = filter(results, search_string)

        display(results)
    }

    filter(old_results, search_string)
    {
        new_results = new list
        for each contact in old_results
            if contact name contains search_string
                add contact to new_results
        return new_results
    }

The first search has to go through all the contacts, but even searching 1,000 contact names on a mobile phone would be pretty darned quick. Furthermore, the number of contacts to search is likely in the dozens or fewer¹ after the first two or three characters are typed. There’s just no point in optimizing a search of such small lists. In addition, this solution takes very little memory and no pre-processing. It only requires the sequential list of contacts whereas the suffix tree approach takes extra memory proportional to the number and lengths of contact names. Finally, you’d have to update or rebuild that suffix tree every time the user adds, changes, or deletes a contact.
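
In C#, the filter step comes down to just a few lines. Here’s a sketch (the contacts list, the Display method, and the OnCharacterTyped handler are assumed for illustration):

    // results starts out as the full contacts list
    List<string> results = new List<string>(contacts);
    string searchString = "";

    void OnCharacterTyped(char c)
    {
        searchString += c;
        // filter the previous result set, not the full contacts list
        results = results
            .Where(name => name.IndexOf(searchString, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
        Display(results);
    }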

One seemingly simple optimization you could make is to store the position of the found substring along with each contact in the result set. Then, when the next character is typed, you would only have to check to see if the next character in the contact name matched the new character typed. But adding that optimization would complicate things a bit. You’d have to define a structure to hold the contact and substring position, the first search of the entire contacts list would have to be handled separately, and you’d have to explicitly check string lengths before doing the next character check. All to save a few microseconds. It wouldn’t be worthwhile.

I’m all for fancy data structures and efficient algorithms, but they have their place. It’s a certainty that there are faster ways to search a list of names for substrings, but those solutions are more complex, require more memory, take longer to write and test, and are more difficult to modify. All those things are reasonable if you’re searching a very large list, but with the typically small mobile phone contacts list, the few microseconds saved aren’t worth the expense of those faster solutions. When the simple algorithm is fast enough, use it and move on. There are more interesting problems to solve elsewhere.

Next time: part two of simple multithreading. Really.


1 – An analysis of Project Gutenberg text shows that the most common bigram in the English language, “th”, occurs about 3.8% of the time. I realize that proper names won’t exhibit exactly the same bigram frequencies as normal English text, but I suspect it’d be close. Based on that, after only two characters typed my algorithm will be working with, at most, about 4% of the total names list. So instead of 1,000 names, I’d be searching 40 names. With four characters typed, I’d be working with fewer than 10 names.

Simple multithreading, Part 1

I’ve had many programmers tell me that they find the idea of multithreaded programming intriguing, but their problems just don’t lend themselves to being worked on by multiple threads. A little digging usually reveals that the person I’m talking to has exactly the wrong idea about what multithreading is all about.

Apparently, programmers who are unfamiliar with the use of multiple threads of execution believe that multithreading is about two things: massively parallel processing, and synchronization primitives like locks, semaphores, and interlocked operations. Whereas it’s true that parallel processing is possible and synchronization primitives are certainly useful, those things are irrelevant as far as most business applications are concerned. Correctly working with synchronization primitives is hard, especially in a complex program, and not many business application problems need the kind of massively parallel processing that seems to be all the rage in the literature. But business applications can make good use of multithreading.

I’ve developed applications for many types of businesses, and in every one of them there are applications that remind me of the COBOL reporting and transaction processing applications that I wrote in my first programming job. Those programs all take this basic form:

    open output file
    open input file
    while not end of input
        read input record
        process record
        write output
    end while
    close input file
    close output file

In many businesses, almost all of the applications take this form. It’s no wonder so many programmers think that multithreading has nothing to offer them. There is very little literature on how to use multiple threads for such programs. That’s unfortunate, because these input-process-output (IPO) programs lend themselves very well to multithreaded processing.

I find it useful to think of the traditional model above as a factory in which a worker grabs the raw materials from the delivery area, carries them to a workstation where he assembles them, and then carries the finished product to the shipping department before going back for the next batch of raw materials. The worker does everything himself; he’s a single thread of execution and he wastes a lot of potentially productive time walking back and forth between the delivery area, his workstation, and the shipping department.

Now imagine the same factory, but with the worker standing in one place. On his left is a conveyor belt that delivers raw materials. He pulls them off the belt, assembles the product, turns to his right and puts the finished product on a conveyor belt that carries it to the shipping department. Then he turns to the left and picks up the next batch of raw materials that’s already there waiting for him. The worker no longer wastes time carrying things around. In addition, he doesn’t have to wait for raw materials to be delivered, and finished products are whisked away as soon as he’s finished with them. The whole system is more productive because the individual parts can operate independently of each other.

It’s possible to make an IPO program work in a similar manner. One way is with asynchronous I/O operations. In the traditional model shown above, reading an input record consists of issuing a read request and waiting until the operating system can get the record. Sometimes that’s very quick because the data is already in memory and just has to be copied, but other times it can require many milliseconds to get data from the hard drive. Writing a record can be even more time consuming. To execute an asynchronous read, the program issues a request saying, in effect, “go get the next record, put it in this space, and I’ll come back for it later.” The program then goes on to do other things while the operating system is fulfilling the read request.

The IPO model modified for asynchronous I/O looks like this:

    open input file
    issue asynchronous request for record
    while not done
        wait for read to complete
        if end of file
            wait for pending write to complete
            break
        end if
        copy record to working area
        issue asynchronous request for next record
        process record
        wait for pending write to complete
        issue asynchronous write request
    end while

The first read, of course, is synchronous. The program can’t start processing until the first record is read. But subsequent reads take place while the program is processing the record. The same thing goes for the writes. They, too, happen while the program is processing records. Reading and writing are done concurrently, and both overlap with the processing time. The amount of time this saves can be substantial. The “wait for pending read” and “wait for pending write” are usually not waits at all: the operations completed while other processing was going on.

Although asynchronous I/O can give you a nice performance boost, it’s kind of difficult to use in .NET. Older versions of .NET supported the BeginRead / EndRead and BeginWrite / EndWrite asynchronous operations for binary streams. Those are effective, but hard to use. .NET 4.5 introduced ReadAsync and WriteAsync, which are much easier to use. Effective as they are, they do not provide a performance benefit when working with small records. If you have large records, or if your program is doing other processing that takes significant time, the asynchronous methods are well worth looking into.

Prior to .NET 4.5, there was no asynchronous support for reading and writing text files. .NET 4.5 introduced TextReader.ReadLineAsync and TextWriter.WriteLineAsync, but using them doesn’t provide any performance benefit when the processing doesn’t take much time. At least my tests didn’t show an improvement. I suspect that the task switching overhead swamps the actual input and output time and that if I did more processing of the line I’d see some improvement.

Fortunately, there’s another way to leverage multiple threads: a way that more closely simulates the more efficient factory that I described, and one that does provide significant performance improvement. How? By using three long-running tasks. One reads records from the input file and places them into a queue. One reads records from the input queue, processes them, and writes them to an output queue. The third reads the output queue and writes records to the output file. It’s the same concept as using asynchronous I/O, but there’s very little task switching overhead. Rather than start a pool thread for every line read and written, we start long-running threads that serve in-memory queues. It makes a huge difference.

Using queues, the program’s operation looks like this:

    Reader thread
        while not end of input
            read record
            add record to input queue
        end while
    end Reader thread

    Writer thread
        while not end of output
            get record from output queue
            write record to file
        end while
    end Writer thread

    Main thread
        initialize queues
        start Reader thread
        start Writer thread
        while not end of input queue
            get record from input queue
            process record
            put record onto output queue
        end while
        wait for Writer thread to complete
    end Main thread

Again, the idea here is to overlap the three sub-tasks. While the reader is gathering records, the worker is processing and the writer is outputting records. Those threads can’t run at full speed all the time, of course, because the input and output queues have finite capacity. But the threads can service the queues fast enough that most of the time the worker doesn’t have to wait for input or output. The key is to have an efficient queue that supports concurrent readers and writers.

I went a little longer than I had planned here, so I’m going to stop for now. Next time we’ll look at some C# code, talk about the concurrent queue data structure, and run some tests to compare performance.

Custom carved knife handle

A fellow carver sent me two knives that he made. The deal is that if I carve one handle and send it back to him, I can keep the other knife. He’s made the deal with many members of the Woodcarving Illustrated Message Board. He ends up with a collection of knives with custom handles, and we each end up with a very nice carving knife. Works for me.

The knives came in back in November while I was in the throes of finishing up my Hundred Birds Project, so I put them on the top of my “to do” list. I took a long break (almost four months) after finishing the birds, and when I started carving again I promptly cut myself, necessitating another stint away from carving.

I really didn’t know what to carve on the knife handle. I considered doing a hillbilly, but some of the other carvers had done caricatures and I wanted mine to stand out. I also considered a ball in cage or some other whimsy, but those would be kind of fragile. I get the idea that these knives will be mostly for show, but I do know that the guy would like to carve with them from time to time. I didn’t want to make a handle that would break with normal use.

So I figured I’d do one of my dogs. But those are only two inches tall and the knife handles are about five inches. After tossing around ideas I finally settled on the dog sitting on a treat, which I called a Yummy Bone.

The knife handle is basswood, which is pretty bland all by itself and is usually painted. My painting is not very good, which is one reason I prefer to work in found wood that doesn’t have to be painted. One wouldn’t paint mesquite, sycamore, walnut, maple, etc. The wood is just too beautiful to cover it in paint!

But Yummy Dog needed paint. So I broke out the acrylics and did the best job I could. The bone is raw sienna. The dog got a couple coats of very light black wash, and then I mixed grey and black, trying to do a charcoal color. The result is … less than what I hoped but better than I expected. I still have a lot to learn about painting.

I let the carving dry for a day and then applied three coats of Deft Satin polyurethane. The shiny part of the blade was stuck in the cork to hold the carving up. I’ll let the new owner buff the poly off the other part of the blade.

The picture below shows the completed carving along with the other, un-carved handle.

The raw handle is five inches long and about one and a quarter inches square. I’m not sure what I’m going to carve on it. I’ve considered another dog, sitting on a fire hydrant. But I’m not sure if that’d be comfortable to carve with. The dog-and-bone handle is surprisingly comfortable.

Ranchero sauce

I went over to my friend Mike’s place one day and he was cooking up some ranchero sauce. Think “thick and chunky salsa,” or look it up. It’s good stuff. Mike was kind enough to tell me how he makes it, and I’ve made four or five batches since then. I’ve modified his recipe, of course, and am still playing with it.

Before you get started, understand that this is not a quick thing to make. It’s going to take you three or four hours of cooking, plus prep and cleanup time.

Equipment you’ll need:

  • A 12-quart stock pot. Don’t use one of those enameled stock pots. You’ll be scraping the bottom and you’re likely to start scraping off the enamel. I use my aluminum brew pot.
  • A potato masher. (Yes, really)
  • A steel spatula for scraping the bottom of the stock pot.
  • An oven mitt or other glove to cover your hand while you’re scraping. It gets hot in that pot. Optionally, get a long-handled spatula, but I’ve had better luck with a short-handled one.
  • Canning jars. I would suggest pint-sized, but you can go larger or smaller if you like. The ingredients below will make between one and two gallons of the stuff (see below for details).
  • Recommended: An outdoor propane cooker, sometimes called a Cajun cooker. You can cook this stuff indoors if you want, but be aware that your house will smell strongly of peppers for days afterwards if you do. In addition, your typical electric stove won’t generate as much heat as the propane cooker or gas stove. This will take a lot longer on an electric stove.
  • Optional: A smoker if you want to smoke the veggies before cooking them.
  • A bag of corn chips, for sampling the sauce as it’s cooking.

The first couple of times I made the ranchero sauce, I didn’t smoke the veggies and it turned out just fine. But one day I was smoking a brisket and thought I’d throw the peppers on the smoker for a bit before putting them into the sauce. If you like smoked salsa, it’s definitely worth the trouble.

The ingredients list below is necessarily vague. The number of tomatoes you use will depend on their size. The number of jalapeños will also depend on size and how spicy you want the sauce. I like things with a bit of a bite, so I don’t scrimp on the peppers.

It’s also hard to say how much sauce the ingredients will make. A lot depends on how long you let it boil down. If you like your ranchero sauce kind of runny, this will make about two gallons. Debra and I like ours a lot less runny. The batch I made today made about five quarts.

Ingredients:

  • 12 to 18 large tomatoes. Get the biggest tomatoes you can find at the supermarket. I used 18 today because they were smaller than the gigantic tomatoes I’ve found there before.
  • A dozen or so jalapeño peppers, again depending on size and how hot you want the stuff. I’ve been known to line the entire bottom of my stock pot with them, but Debra doesn’t like the sauce quite that hot. You can use other peppers if you want. I made one batch with a mixture of Ancho chilis, jalapeños, and serranos that turned out just great.
  • Two bunches of cilantro. More or less, depending on your taste.
  • Two big yellow onions. Really big. Use sweet onions (vidalia or 1015) or red onions if you like. I prefer the stronger yellow onions.
  • Two heads of garlic. No, not just two cloves. Again, to taste. Debra and I both love the taste of garlic, so we go a little strong on it. Even three heads of garlic isn’t too much for a batch of ranchero sauce.
  • Salt to taste. I’d suggest starting with two tablespoons. If you’re sensitive to salt, start with one.
  • A small amount (one or two tablespoons) of vegetable oil.

Clean the cilantro and chop it. Include the stems. How fine you chop it is up to you.

Peel the garlic. I know, peeling garlic isn’t fun. Well, peeling garlic the traditional way isn’t fun. But there’s a much faster way. You can peel a whole head of garlic in about 10 seconds. Just watch this video.

If you only want to peel a few cloves, throw the cloves in a glass jar, attach the lid, and shake it a few times. Peeled garlic. No muss, no fuss. I showed that to a friend of mine who’s been working in restaurants for over 20 years, and he was just amazed. I think people would use fresh garlic more often if they knew that peeling it didn’t have to be such a pain in the neck.

Oh, before you do anything with the tomatoes, be sure to peel the labels off of them. For some silly reason, every tomato gets a sticker on it. Drives me crazy, especially when the adhesive is too sticky. One of these days I’ll just give up on the supermarket tomatoes and buy them from the local farmer’s market.

Smoking the veggies

If you’re not going to smoke the veggies, you can skip this step.

First you have to decide just how much smoke flavor you want. That will determine how many of the veggies you smoke. The first time I did it, I just smoked the peppers. Now, I smoke everything.

I cut the tomatoes in half before I smoke them, and I quarter the onions. The peppers and garlic cloves (after they’re peeled) go on whole. I’ve experimented with quartering the tomatoes, but that’s kind of messy. I’ve also sliced the peppers in half the long way before putting them on the grill.

The peppers, onions, and tomatoes go directly onto the grill. For the garlic cloves, I use a disposable grill screen or a small cooking basket.  If you quarter the tomatoes rather than halve them, you might consider putting them on a screen, too.

Close the lid and smoke for an hour or hour and a half. You want to smoke at a pretty low temperature, certainly not hotter than 250 degrees, and preferably under 200. The idea here is to smoke the veggies, not cook them. I start the fire and throw on just one or two pieces of mesquite, keeping the temperature well under 200 degrees for an hour or so.

Cooking

Put a tablespoon of vegetable oil in the stock pot and coat the bottom. Then, put the peppers into the pot. Put the tomatoes on top of the peppers. If you didn’t smoke the tomatoes, just put them into the pot whole. No need to cut them up.

Put the lid on the stock pot and put the pot on the burner. Turn the heat up a little bit, but not too high. The idea here is to build up heat inside the pot, but you don’t want to char the peppers too much and make them stick to the bottom of the pot.

Let this cook for a good 30 minutes, at least, and possibly up to an hour. This will soften the peppers and the tomato skins will start to split. Don’t be in a hurry here. Let the thing cook.

While the peppers and tomatoes are cooking, chop up the onions and garlic. Dice the onions, but don’t make the pieces too small; they’re going to cook for a while, and you don’t want them to cook totally away. Chunk size is a matter of taste, so I’ll leave it to your discretion. The garlic, too, is a matter of taste. Some people like big monster chunks of garlic. I prefer smaller pieces.

When the tomatoes have all split open and they’re getting really soft, take the potato masher and mush them all up. Really smash things, pressing the potato masher to the bottom of the pot and turning it to help split up the peppers. It’s going to be hot in that pot, so wear an oven mitt or other protection on your hand. And be careful when mashing things. If you press too hard near the edge you’re likely to dump the pot over. Balance things with your other hand on one of the pot handles.

After the first mashing, you’ll still have some big clumps of tomato and large pieces of pepper. Put the lid back on the pot and let it cook some more. Adjust the heat to give it a low boil. Cook for another 30 minutes, mashing occasionally.

You’ll also want to scrape the bottom of the pot with your steel spatula. That will loosen the charred bits that stick to the bottom. You can pull the charred bits out of the sauce if you want, but it’s fine to leave them in.

After many mashings, when you’ve broken up all the big tomato clumps and the pepper pieces are the size you want them, add the onions, garlic, and cilantro. Stir the additions in and put the top back on for 15 minutes or so. Then, remove the top and adjust the heat so that the sauce is at a low boil. Add the salt at this time.

Now you’re just boiling away the excess water. If you don’t like charred bits in your sauce, you’ll need to stir frequently, especially as more water boils off. If you don’t mind the charred bits, scrape the bottom of the pot periodically–every 10 minutes or so. Otherwise you’ll get large chunks of charred stuff. Small pieces are okay. Big charred chunks are not.

This is where you need to break out the corn chips. Spoon some of the sauce into a bowl and let it cool for a bit. Check the consistency. Pull out a corn chip, dip some of the sauce, and taste it. If you think it needs more salt, add some. Repeat until the sauce reaches the consistency that you desire.

Take the pot inside and begin filling the canning jars while the sauce is still hot, and seal the tops. I don’t do the boil bath with the jars, so I let them cool overnight and then put them in the refrigerator. Once sealed, the jars will keep in the ‘fridge for many months. But if you open a jar be sure to finish it off in a week or two. Otherwise it will start to mold. That’s why I don’t recommend jars larger than a pint. One can finish off a pint of the ranchero sauce in a week, two at the outside. But a quart lasts a long time unless you’re a real salsa fanatic.

If you cook up a batch, let me know how it turns out. I’d also be interested in any changes you make to the ingredients or the process.

Happy cooking.

Beware operator precedence

In That’s just wrong!, I described some … unexpected results that occur due to the string concatenation optimization that the C# compiler performs. It seems a bit wonky to me, but perhaps people who do a lot of string concatenation find it useful.

Today I ran across a new one, thanks to Eric Lippert’s blog post, A string Concatenation puzzle.

Based on my previous blog post, you know that this code:

    string s = "";
    s = s + 5 + 5;

Will result in the string s containing the value "55". Or, after the compiler’s optimizations: string.Concat(s, 5, 5)

So what would you expect as the result of this code?

    string s = "";
    s += 5 + 5;

How about "10"? Ouch. Operator precedence.

You see, writing s += expression is the same as writing s = s + (expression), so what you end up with is s + (5 + 5). After the compiler is done with its optimization magic, you have s = string.Concat(s, 10).

That seems wonky because in numerical calculations (a + b) + c is the same as a + (b + c). That’s true for strings, as well, but not true when you combine numerical operations and string concatenation. Remember, when you see s + (5 + 5), the compiler generates code to add the two numbers in parentheses and passes that value to String.Concat.

This kind of thing isn’t limited to strings, by the way. You can get yourself in trouble with numeric calculations, too. Consider:

    int a = 2;
    int b = 3;
    b = b + a << 1;  // result = 10

    // or, replace the above with
    b += a << 1;     // result = 7

Operator precedence again. Operator << has lower precedence than operator +. So b = b + a << 1 is evaluated as (b + a) << 1. The other is evaluated as b + (a << 1). Oddly enough, I saw this one trip somebody up the other day.
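
When you want a particular grouping, parentheses make the intent explicit and cost nothing:

    int a = 2;
    int b = 3;

    b = (b + a) << 1;  // explicitly (3 + 2) << 1 = 10

    b = 3;             // reset
    b = b + (a << 1);  // explicitly 3 + (2 << 1) = 7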

I think the key here is to understand that a += expression is equivalent to a = a + (expression). The entire expression is evaluated before the addition. Failure to understand that can lead to some rather mystifying results.