Fun with the laser engraver

Monday afternoon I took the Safety and Basic Use class for the Trotec laser engraver at TechShop. The class consists mostly of “lecture”: going over safety considerations, the machine’s controls, and how to use the software (CorelDraw or Adobe Illustrator) to prepare files for sending to the engraver. It was mostly a demonstration with very little hands-on time.

So last night I went up to TechShop and reserved the laser engraver for two hours. The class instructor recommended spending some time doing simple things like cutting out basic shapes or engraving text on scrap material, but I figured I’d do a more ambitious project: engrave a picture onto something.

Having never used Illustrator or CorelDraw before, I spent a couple of hours fiddling with them prior to my scheduled time on the laser. I prepared the picture I was going to engrave and also spent some time just poking around in CorelDraw trying to get familiar with all it can do. I think it’s going to be a steep learning curve.

I also collected a few pieces of cardboard and a small piece of hardboard (Masonite) from the scrap pile. I figured I’d practice on scrap before doing this on an expensive piece of material.

The idea was to reproduce this picture on the laser engraver.

charlie_640

I still think that’s the best picture I’ve ever taken of anything. The subject is highly photogenic and I got lucky with the composition.

I first converted the picture to grayscale and ran it through a “pencil sketch” filter to create this:

charlie_Pencil640

I also played with the brightness and contrast a bit.

Then I put a piece of cardboard into the laser engraver and printed the picture. It took several passes, including fiddling with the speed and power settings between passes. The result was somewhat curious. This is what it looks like when viewed from directly above.

Charlie_Cardboard1

That’s what I saw while watching it printing in the engraver. You can imagine my disappointment. But then I saw it from an angle, which was quite a surprise:

Charlie_Cardboard2

I don’t really understand the physics. I know it has something to do with how the light is reflecting off the material, but I don’t know the specifics of it. It’s kind of a neat effect, though.

Aside from that curious effect, though, the picture still isn’t great. But my time was running out and I was annoyed with the cardboard anyway. I suspect that I had the engraver going too fast. I’ll reduce the settings from the recommended value the next time I try to print on cardboard.

Having successfully engraved the cardboard without destroying anything or starting a fire, I put the Masonite in there, input the recommended settings for that material, and printed again. The result was even lighter than on the cardboard. So I reduced the speed from 50 to 20 and re-printed. That produced a very nice picture:

charlie_Masonite

It’s pixelated on purpose; I had set the thing to do 250 DPI. I might try it again sometime at a much higher resolution. But I’m really happy with the way this one turned out.

I’d call it a successful experiment. I learned a bit about how to use the software, and I got a neat picture of Charlie engraved on hardboard. Might have to hang that one on the wall.

Now, for my next project . . .

Tales from the trenches: Fat Bald Famous Person

The story is true. Names have been changed or omitted to protect the guilty.

When I was in the games business I worked on a game that was named after a famous person. Something like Famous Person’s Fantastic Game. I don’t remember the exact circumstance, but one day we on the development team were discussing the user interface with the game publisher’s representative. The topic of the avatar came up and the publisher’s representative said, in effect, “just don’t make Famous Person look bald or fat.”

So of course the first thing our UI guy did the next day was suck the avatar image into Photoshop, remove the hair, and add some extra pounds. We used that version of the program for internal testing for a couple of weeks. We all got a good laugh out of it.

As part of our contract, we had to submit the latest version of the program to the publisher periodically (every three or four weeks, as I recall) for evaluation. The publisher would typically give the build a once-over and then send a copy off to Famous Person’s representatives for … well, for something. Probably to make sure that we got the branding right, and that we didn’t show Famous Person in a bad light. On Friday afternoon, several weeks after our meeting with the publisher’s representative, we dutifully submitted a new version of the program which we assumed the recipient would review the following week.

About an hour after we sent the new build, we got a call from the publisher’s representative. He was in California, two time zones earlier. He had downloaded and installed the new version, and when he started the program the first thing he saw was Fat Bald Famous Person. As you can imagine, he was not at all happy with that. We immediately swapped out the avatar image, made a new build, and sent it off. The publisher’s representative wasn’t happy, but at least he stopped screaming.

We were lucky. Had the publisher’s representative just passed the new build off to Famous Person without first looking at it, the whole deal could have been blown. Famous Person could have canceled his contract with the publisher, who would then have been well within their rights not only to cancel our development contract but also to sue us for causing the loss of the contract with Famous Person. Even if Famous Person didn’t see it, the publisher could have canceled our contract, taken the code, and had somebody else finish the project. Fortunately, our project manager and the publisher’s representative had a good relationship, and we just got a stern lecture about doing stupid stuff like that.

Since then I’ve been very careful not to add “funny” messages or images to programs, even in the earliest stages of development. It’s tempting early on to use placeholder text for error messages, for example. Something like, “Uh oh. Something bad happened.” That’s a whole lot easier than trying to come up with a meaningful error message while my brain’s geared towards getting the overall algorithm right. The problem with such an approach is that I have to go back later and add the real error message. Anybody who’s familiar with product development knows that such tasks often fall through the cracks in the last-minute rush to ship. This is especially true when the text to be changed is an error message that might never be seen during testing.

Come to think of it, my primary task on that project included doing some significant user interface work. The user interface included many buttons, each of which required an image. When I told the project manager that I’m not a graphic artist he said, “Just put something there. We’ll have one of the artists create the real buttons.” If you’ve been in the games business, you’re familiar with programmer art, and you probably can imagine what my buttons looked like. Apparently they were “good enough,” though, because the game shipped with my ugly button images. I was appalled. Since then, if somebody tells me to “just put something there,” I make sure that the art I add is so ugly that nobody would even think of shipping the program without first changing it.

Do it right the first time, because it’s quite likely that you won’t have the time, or won’t remember, to go back and do it right later.

Evenly distributing items in a list

Imagine that you have many different types of liquids, and you want to mix them in some given ratios. As a simple example, say you have liquid types A, B, and C, and you want to mix them in the ratio of 30 A, 20 B, and 10 C. If you wanted to mix those in a canning jar, you’d probably pour 30 units of A into the jar, then 20 units of B, and then add 10 units of C. If you did that, you could very well find that the mixture is stratified: that the bottom of the jar contains a layer of A, followed by an area where A and B are mixed, a thinner layer of B, an area where B and C are mixed, and then a thin layer of C. If you want things to mix, you have to stir, swirl, shake, or otherwise mix the items.

If you have no way to mix the items after they’ve been added to the container, then you have to change your addition process so that you minimize stratification. One way to do that is to use smaller additions. Rather than pouring all 30 units of A into the jar, followed by all the B and all the C, do it in one-unit increments. So, for example, you’d make one-unit additions in the order [A,B,C,A,B,A] 10 times.

Figuring that out for a mixture of three liquids is pretty easy. Let’s describe that mixture as A(3),B(2),C(1). Now imagine you want to mix seven liquids with these proportions: A(50),B(25),C(12),D(6),E(3),F(2),G(2).

Yeah. It gets messy quickly. At first I thought I could create an array of 100 buckets and then, starting with the first item, put an A into every other one. Then put a B into every fourth one. But I ran into a problem real quick because “every fourth” had already been filled with an A. So then I figured I could just put the “every fourth” into positions 5, 9, 13, 17, 21, etc. But then I came to C and had to put a C into every 8-1/3 items . . .

I stopped at that point because I couldn’t come up with a reasonable set of rules for the general case of mixing an arbitrary number of liquids. And without a clear set of rules, I wasn’t even going to attempt writing code. I went to bed the other night frustrated with my inability to make progress.

I don’t even try to understand how my brain works. At some point just before or after I fell asleep, I envisioned three different streams of liquid. One was pouring out of a nozzle that was delivering three gallons per minute, one delivering two gallons per minute, and the last delivering one gallon per minute. The three streams of liquid were merging!

It was an “Aha!” moment. I actually sat straight up in bed. I know how to merge streams. I just need a way to make those proportions look like streams.

I’ve shown a tabular representation of the three-liquid mixture below. The value in the last column, Frequency, means “one every N”. So the value 6 means “one out of every six.”

Liquid Count Frequency
A 3 2
B 2 3
C 1 6

So A would be in positions 2, 4, and 6. B would be in positions 3 and 6. And C would go into position 6. Forget for the moment that position 6 is occupied by three different liquids. Those are the positions we want the liquids to take. Positions 1 and 5 aren’t occupied, but they’ll obviously have to be filled.

If you remember my merging discussions (see article linked above), we built a priority queue (a min-heap) with one item from each stream. Let’s do that with our three items, using a Priority field for ordering. That field is initially set to the Frequency value. So the heap initially contains: [(A,2),(B,3),(C,6)].

Now, remove the first item from the heap and output it. Then add the frequency value to the priority and add the item back to the heap. So after the first item (A) is output, the heap contains: [(B,3),(A,4),(C,6)].

Again, remove the first item, output it, update its priority, and add it back to the heap. The result is [(A,4),(C,6),(B,6)].

If you continue in that fashion, the first six items output are A,B,A,C,B,A.

Given those rules, it was just a few minutes’ work to develop a method that, given an array of counts, returns a sequence that will ensure a reasonable mixture. The OrderItems method below is the result. It uses my DHeap class, which you can download from this blog entry.

    private IEnumerable<int> OrderItems(int[] counts)
    {
        var totalItems = counts.Sum();

        // Create a heap of work items
        var workItems = counts
            .Select((count, i) => new HeapItem(i, count, totalItems));
        var workHeap = new MinDHeap<HeapItem>(2, workItems);

        while (workHeap.Count > 0)
        {
            var item = workHeap.RemoveRoot();
            yield return item.ItemIndex;
            if (--item.Count == 0)
            {
                // all done with this item
                continue;
            }
            // adjust the priority and put it back on the heap
            item.Priority += item.Frequency;
            workHeap.Insert(item);
        }
    }

    private class HeapItem : IComparable<HeapItem>
    {
        public int ItemIndex { get; private set; }
        public int Count { get; set; }
        public double Frequency { get; private set; }
        public double Priority { get; set; }

        public HeapItem(int itemIndex, int count, int totalItems)
        {
            ItemIndex = itemIndex;
            Count = count;
            Frequency = (double)totalItems / Count;
            Priority = Frequency;
        }

        public int CompareTo(HeapItem other)
        {
            if (other == null) return 1;
            var rslt = Priority.CompareTo(other.Priority);
            return rslt;
        }
    }

The counts parameter is an array of integers that define the count of each item type to be delivered. In the case of our simple example, the array would contain [3,2,1]. The values in the returned sequence are indexes into that array. The returned sequence for this example would be [0,1,0,2,1,0].

You’ll need to do some translation, then: first to create the counts array, and then to get the actual items from the indexes returned. Here’s an example.

    var mixLiquids = new char[] {'A', 'B', 'C'};
    var mixCounts = new int[] {3, 2, 1};

    foreach (var index in OrderItems(mixCounts))
    {
        Console.Write(mixLiquids[index] + ",");
    }
    Console.WriteLine();

The output from that will be A,B,A,C,B,A.

The OrderItems method produces a good mixture of items in that it spreads out the liquid additions to minimize stratification, but it might not produce a uniform mixture when some liquids are used much less than others. In my second example, where A, B, and C together make up 87% of the total, liquids D, E, F, and G might not get mixed thoroughly. If I run the code above with my second example, the first additions of F and G don’t occur until the middle of the sequence: around index 45. The result would be that those liquids might not intermix with the first half.

It might be better to push the low-frequency additions towards the front so that they’re mixed well with all the other additions. To do that, set the initial Priority of all items to 0, and make sure that the comparison function (HeapItem.CompareTo) favors the low frequency item if the priorities are equal. In the HeapItem constructor, change the Priority initialization to:

    Priority = 0;

And replace the CompareTo method with this:

    public int CompareTo(HeapItem other)
    {
        if (other == null) return 1;
        var rslt = Priority.CompareTo(other.Priority);
        // If priorities are the same, then we want the lowest frequency item.
        return rslt != 0
            ? rslt
            : other.Frequency.CompareTo(Frequency);
    }

With those modifications, the lower-frequency additions are done up front, giving us [C,B,A,A,B,A] for the first example. In the second example, E and F would be added first, and also at index 50 or so.

Or, you might want to put those liquid additions around indexes 25 and 75. To do that, change the Priority initialization to Priority = Frequency/2;.

Although I solved this problem specifically for ordering the additions of liquids in a mixture, the OrderItems method is more generally useful. If you have a bunch of different items that you want to spread out evenly, this will do it for you. It’s a simple solution, and the algorithmic complexity is the same as with merging streams: O(n log2 k), where n is the total number of items and k is the number of different item types.
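
For instance, here’s a small sketch that uses OrderItems to spread three song genres evenly through a playlist. The genres and counts are invented for illustration; the method itself is the one shown above.

    var genres = new[] { "Rock", "Jazz", "Blues" };
    var genreCounts = new[] { 5, 3, 2 };

    // OrderItems returns indexes into genreCounts; map them back to names.
    var playlistOrder = OrderItems(genreCounts).Select(i => genres[i]);

    Console.WriteLine(string.Join(",", playlistOrder));

The Jazz and Blues tracks end up distributed among the Rock tracks rather than clumped together at one end.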

Synthetic biology gone wrong

March 4, 2137

Fire Breathing Hamster Destroys Lab

A five alarm fire at the headquarters of synthetic biology startup Custom Creature Creations caused one hundred million dollars’ damage and destroyed the entire facility. Five firefighters were treated at a local hospital for smoke inhalation after helping to extinguish the blaze. No other injuries were reported.

According to Dr. Stanley McEldridge, president and founder of CCC, the company’s technicians were attempting to create a small fire breathing dragon to demonstrate the company’s capabilities at an upcoming trade show. All appeared to be going well. But when technicians arrived this morning they found a hamster in the synthesis tank where they had expected a dragon. They immediately assumed the worst: that a competitor had broken into the facility overnight, stolen the dragon, and replaced it with a hamster.

After notifying security of the break-in, they removed the hamster from the synthesis tank and placed it in a cage located in another part of the building. About an hour later, one of the lab technicians opened the cage to remove the hamster, and received the shock of his life. The hamster, startled by the technician’s hand reaching in to grab him, backed up and, according to the technician, hissed.

“When the hamster hissed, fire came from its mouth and singed my hand.”

Then the hamster’s whiskers caught on fire, followed quickly by the poor creature’s fur. Screaming and still belching fire, the hamster jumped out of his cage and knocked over a large container of ethanol nearby. The flaming hamster ignited the ethanol, which spread across the room.

Investigators are still trying to determine how the fire spread from there, although they do point out that pure oxygen was piped to the room and that if one or more of the valves was leaking, it could have turned what should have been a minor fire into a conflagration.

The real question is how the lab managed to create a fire breathing hamster. Dr. McEldridge and his staff at Custom Creature Creations would not respond to questions, saying that they are reviewing their procedures and will issue a report when their investigation is complete.

Dr. Alfred Swain, a leading opponent of synthetic biology technology, speculates that the cause was faulty sanitation procedures.

“You have to be very careful with this stuff. Any bit of contamination can lead to unexpected and potentially disastrous results. If one of the technicians working on the project was handling his family’s pet hamster before coming to work, and failed to follow the protocols before entering the clean room, you could very well end up with hamster DNA contaminating the experiment. This is just one example of the things that can happen when we try to create new life forms. I have warned about this kind of thing in the past, and I will continue to lobby for the abolition of all synthetic biology experiments.”

Splitting a log

tree1

Back in the summer of 2010, this oak tree developed some kind of disease and we had to have it taken down. It probably would have lived a few more years, but it was starting to rot at the base and it was close enough to the house that if a good stiff wind came along it would end up crashing into the house and ruining the new roof and siding. It’s kind of too bad we had to remove it; the tree provided a lot of shade on the south side of the house.

As an aside, I had a heck of a time finding the pictures of this event. For some reason I thought that we took the tree down in 2009, and I thought I’d blogged about it. But I couldn’t find it in the blog, and I couldn’t find the pictures where I thought they should be. I finally decided to check the year 2010. Hey, at least I got the month right.

I call this the Facebook problem. With Facebook, I’m much more likely to post a picture or two and a few paragraphs. Writing a blog entry is more work and doesn’t have the instant gratification of people pressing “Like” or leaving a quick comment. It’s way too easy to make a quick Facebook post and move on. I had to search sequentially through the history to find the old post. Then I discovered how to search my Activity Log . . .

trrunk

Anyway, back to the tree. What was left after they topped it is shown on the left: a 12-foot-tall trunk about two feet in diameter and a fork at the top. That ended up in three pieces, the largest being the bottom seven feet. I spent a couple of days cutting up the larger limbs and putting them in the firewood pile, and grinding up the smaller stuff for mulch. The larger limbs and the two smaller (if you call two and a half feet tall and two feet thick “small”) trunk pieces got stacked around a nearby mesquite tree so I could split them after they dried.

Debra and I, with the help of the lawn tractor, rolled the large trunk out of the way under some other trees. The idea was to let it dry for a few years and then carve it into something. I didn’t know what, but I wanted to try my hand at chainsaw carving.

But the log started to crack quite a bit and I didn’t really know how to prevent or even slow the cracking. So I left the trunk there under the other trees, figuring I’d cut it up for firewood (or BBQ wood) one of these days.

I did make an end table from the top trunk piece. That’s another example of the Facebook problem. I’ll have to post about that here another time.

A couple of weeks ago I got the crazy idea of trying to get usable carving or possibly building wood from that trunk. It’d be kind of cool to mill lumber from that tree and build a table or a small hutch or something. And seeing as how my little electric chainsaw would have some serious trouble getting through that trunk, I decided I’d try to split the log and see if I could get any usable lumber out of it. And, because I’m curious, I thought I’d see if I could split it without using power tools.

I started by driving my steel splitting wedge into the end of the log with a little four pound sledge. That worked well enough: a split formed at the top of the log, and there were satisfying crackling sounds coming from the log as the fibers split. But then my wedge was stuck.

stuckWedge_sm

I tried making wedges out of oak branches and some scrap 2×4 lumber, but they disintegrated in short order when I tried to drive them into the crack.

A friend who was building a deck a few years ago gave me a bunch of cutoffs: 2×4 and 4×4 pieces that were six to eight inches long. The wood was Ipê: a very hard wood from South America that is commonly used for building decks. I carved a few birds from it, but the rest has been sitting in my shop waiting for me to come up with uses for it. It’s an okay carving wood. It makes excellent splitting wedges, though. A few cuts on the bandsaw and I was back in business.

ipeWedges

Then it was a matter of driving a wedge into the log, moving a few inches, driving another wedge, etc. I had enough wedges that by the time I ran out the log had split enough that I could re-use those from the back of the line. I did have to make another trip to the bandsaw for more, though: even the Ipê isn’t indestructible. Between me whacking it with the hammer and the oak resisting splitting, those wedges were only good for two or three uses. I suspect they would have lasted longer if I’d been using a wooden hammer. I might try that if I ever split a log like this again.

When I got to the end of the log it was split most of the way through all along its length, but I didn’t have long enough wedges to complete the job. Debra hurt her finger (nearly crushed it) helping me roll the log over and it was almost dark anyway, so I reluctantly put up my tools (except for the steel wedge that was still stuck in the other end) and called it a night.

almost

And that’s how I left it for a week. This evening I cut eight more wedges and used a steel bar as a lever to roll the log over. It didn’t take but about 15 minutes to finish the job of splitting the log into two pieces.

split

That’s a very strange perspective. Those two pieces really are the same length. The foreground piece is not as wide as the one in back (the log didn’t split exactly evenly down the middle), but they’re absolutely the same length. The picture makes it look like the foreground piece is longer.

You can also see the remains of the Ipê wedges there on the foreground piece. The rest of them are in splinters.

Both of those pieces have large cracks along which I’ll split off pieces, again by hand. I should end up with about eight 7-foot pieces of wood of varying thickness. I’m hoping that I can get at least one piece that’s four inches square. I know that I will get several pieces that will allow me to cut 2×4 boards, and possibly even some 2×6 pieces. And of course I’ll get lots of stuff that’s one inch or less in thickness.

Once I get the log split into roughly the sized pieces I want, I’ll take them to TechShop and spend some time with the jointer and planer to make lumber. Unless, of course, the log is too far gone. Then I’ll just cut it up and use it for the smoker.

I learned quite a bit in splitting this log. If I had to do another, I could probably do it in half the time. It was pretty interesting going through the learning process, and I have a new appreciation for how people did things before they had the benefit of sawmills that produce nice straight lumber as if by magic. Making your own boards is work.

Making boards

Debra surprised me at Christmas with a one-year membership to TechShop, and a gift certificate for five classes. I’ve been wanting to get involved with TechShop for a couple of years, but there were always other priorities.

Since I got into wood carving, I’ve been slowly making my way into woodworking as well, with the oval table being the most recent of my experiments. I’ve long wanted to make cutting boards and similar things, but haven’t really had the tools necessary to do a good job. TechShop, though, has a full wood shop with table saw, large band saw, router table, jointer, planer, thickness sander, etc. I just had to take a couple of classes on Safety and Basic Use (SBU).

Today I took a couple of chunks of mesquite–cutoffs from a tree I had to take down last spring–to the shop and rough cut them into lumber. The logs were about eight inches thick, which is two inches larger than what will fit in my band saw. The first thing I did was cut four slabs off one end. I’m planning to turn these into little cheese boards, hopefully keeping the bark edge.

cheeseBoards_sm

Three of those are 3/4 inch thick. The other is 1/2 inch thick. The dark splotches are actually from moisture. I was surprised at how wet that log was, even after spending the last eight or nine months in my garage. I know that it takes time for wood to dry, but this wood was wet on the inside. Way more moisture than I had expected after that time.

After cutting those slabs, the remaining log is about 14 inches long. The other log, shown here before cutting, was right at 18 inches.

log2

I didn’t take any progress pictures showing how I set up to cut boards from the logs. Briefly:

For cutting the cheese boards, I screwed a scrap piece of 2×6 lumber to the log so that there was a flat and stable base for it to rest on. I took a thin slice to square up the end, and then set the band saw fence to 3/4 inch and cut the three cheese boards. I had planned to cut four that thick, but I goofed when I screwed the 2×6 onto the log; I didn’t leave enough log hanging out. So I had to settle for 1/2 inch on the last one. I could have just sawed through the 2×6 or taken the time to adjust the base. I decided to see if 1/2 inch will be thick enough.

For cutting the boards, I set the scrap 2×6 firmly on the table beside the log, and carefully screwed them together. Doing that provides a steady base so that the log can’t roll to the side when I’m pushing it through the saw. I made one cut of about 3/4 inch to get a good flat side on the log. I then took it over to the jointer and made that perfectly flat.

The picture linked below is one I took a few years back, showing how the board attached to the log works.

Then back to the band saw with the flat base on the table, I took 3/4 inch off one of the sides, took the log back to the jointer, and squared that side up so that I had two perfectly flat sides that were at an angle of exactly 90 degrees with each other.

Back to the band saw, I set the fence one inch away from the blade and with one flat side on the table and the other flat side on the fence, I cut the boards.

I’ve cut lumber on my band saw at home without using a jointer to square up the sides. It works okay, but the boards don’t come out near as close to square as they did today.

So now I have a bunch of rough cut mesquite boards, all one inch thick and with varying widths and lengths. I’ve stacked them in my garage, separated by some scrap wood so that air can circulate, and will let them dry for six or eight months. I figure next fall I’ll be able to make some cutting boards. Although I might get impatient and cut up some of the other wood I have here that’s already dry. Unfortunately, I don’t think I have enough dry mesquite to make a cutting board. I have plenty of other types, though.

The cheese boards won’t take nearly as long to dry. I’ve put them aside, as well, but I expect them to be dry enough for working in less than two months. Possibly even sooner than that. Wood loses its moisture very quickly through the ends, so those 3/4 inch pieces should dry fast. I’ve also considered experimenting with using the oven as a kiln to speed the drying process. I might sacrifice one of the slabs and one of the boards to an experiment . . .

lumber1

stacked

I made a few thinner cuts, as experiments. One of the pieces is a little less than 1/16 inch thick. I’m sure that with a little practice I could reliably cut 1/16 inch veneer, and quite possibly 1/32 inch. That opens up some interesting possibilities.

lumber2

All told, I had a great time playing in the wood shop today. Now I just have to be patient until the wood dries so I can use it.

Building an oval table

After having so much fun working with the folks at Sam Bass Community Theatre, I volunteered to help out with their next show: a production of James Lapine’s Table Settings. Rather than acting this time, I’ll be running the lights and sound, and I’m also helping out with set construction.

The primary set piece is a table, and the director wanted something specific: a 4 foot by 8 foot oval table covered with a tablecloth and strong enough that a 200 pound man can stand on it. Feeling adventurous, I volunteered to build the table.

Understand, I’d never really built anything before. Oh, sure, I’ve assembled Ikea furniture, knocked together a few rickety work benches and some barely functional garage shelves, and even trimmed a door or three, but that’s a far cry from creating a large table starting with a plan and a bunch of lumber. But what the heck: you learn by doing, right?

It was cold (35 degrees) this weekend and there’s no heat in my garage, so I elected to construct the table in our master bedroom, which is currently under renovation. That is, it’s torn apart and we haven’t started putting it back together. That’s my next project. I picked up the required materials at Home Depot on Friday evening, and Debra helped me carry everything through the house to the bedroom. The only thing I really needed help with was a 4×8 sheet of 3/4 inch plywood. The rest of the lumber was a bunch of 2×4’s and one wooden dowel.

table1
I chose to get plain old plywood rather than cabinet grade. No use spending the extra money when it’s going to be covered with a tablecloth. And the tablecloth (somebody else is making that) will reach all the way to the ground, so I didn’t have to spend any effort making the legs look good.

There’s nothing particularly difficult about cutting an oval. I remembered learning how to draw one in geometry class nearly 40 years ago, but I didn’t remember the specifics. YouTube to the rescue. There are about a zillion videos showing how to draw an ellipse using nothing more than a few pins or nails, some string, and a pencil. Here’s one that I found to be particularly clear and easy to follow.
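
For what it’s worth, the layout math is simple. Here’s a quick sketch of the arithmetic, assuming the standard pins-at-the-foci construction with the string tied from pin to pin (some videos use a loop instead, which changes the string length):

    // For a 96" x 48" ellipse: half the major axis (a), half the minor axis (b),
    // and the center-to-focus distance (c).
    double a = 48.0;
    double b = 24.0;
    double c = Math.Sqrt(a * a - b * b);   // about 41.6"

    Console.WriteLine($"Set the pins {2 * c:F1}\" apart, centered on the long axis.");
    Console.WriteLine($"Tie the string so it spans {2 * a:F0}\" between the pins.");
    // Roughly 83.1" between pins and a 96" string for this sheet.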

It took a couple of tries to get it right because the knot in my string kept slipping. But I managed to get a reasonably accurate ellipse on the plywood. Then it was time to break out the jigsaw.

table2

The bite there on the left corner was a test cut. I’ve used a jigsaw maybe twice in my life before this project, and I wanted to make sure I could follow a line. You can see that I goofed on entry to the ellipse line (overshot it). I knew that I wouldn’t cut it perfectly, and I had already planned to take a belt sander to the edge once I was done with the rough cutout. I just had to do a little more smoothing than I’d originally planned.

Making a smooth cut with the jigsaw requires a fine blade, and patience. Take it slow. Don’t force the saw through the wood. Rather, just guide the saw. Let the blade do the cutting. Also, don’t try to go all the way around in a single cut. Take off smaller segments. Otherwise you risk having the plywood break off and ruin your pretty shape.

Even taking a few breaks for pictures and to stretch out my back (leaning over to guide that saw is uncomfortable), it took less than 30 minutes to complete the cutout.

table3

The completed cutout is 91 inches long and 46 inches wide. Not bad starting from 96×48, although I can’t give a good reason why I didn’t get 94 inches. Oh, well. It’s close enough.

With the top cut out, it was time for the hard part: constructing the base. I chose to modify the base for this simple table because … well, it’s simple. The base is functional, sturdy, and looks easy enough to build with simple tools.

The only modification I made was to the dimensions. My base is 49 inches long and 32 inches wide. That leaves almost two feet of table hanging off each end, but it’s still plenty sturdy. I wouldn’t recommend trying to sit or stand on one of the ends, though. I was a little worried that the center span would be too large and would sag under the weight of 200 pounds standing on it, but The Sagulator says that it’s acceptable.

I won’t detail construction of the base. I followed the directions in the linked article and everything worked out just fine. It just took a long time because I was checking everything multiple times to be sure I wasn’t making a mistake. When I got it all put together, I was a little surprised that the base was level with no wobbles. I guess all that double- and triple-checking paid off.

table4

Attaching the top turned out to be a chore. For some reason I couldn’t get the screws to hold in one corner of the plywood. I futzed with the thing for a while and finally got it to work. I still don’t know what the problem was. I suspect that there was a soft spot in the plywood that kept the screws from biting. Moving the screws a few inches solved the problem. And, as you can see, the table passed the fat guy standing test. I’m smart enough not to try the fat guy bouncing up and down test.

test

The last thing I did was sand the top to remove any splinters and the manufacturer’s printing (including that silly notice telling me that plywood contains chemicals that the State of California has determined to cause cancer; is there any product in existence that doesn’t have one of those warnings on it?), and run a router around the edge. I’ve always disliked how a tablecloth looks hanging over a hard edge. A nice rounded edge makes the cloth drape a lot nicer. Here you can see the difference between the straight edge and the rounded edge.

router

The completed table should work well for the play, and if they don’t want to keep it afterwards I’ll probably take the base back and attach a rectangular top to use as a workbench. Not sure what I’ll do with the elliptical top.

finished

This was a fun project. Better, I was able to complete it with tools I already had. As the author of the Simple Table article points out, this project can be completed with a minimum of tools. The only tools I added were the jigsaw, belt sander, and router, and those were for constructing the top. I did use my compound miter saw to cut the legs because my electric circular saw grew legs a few years ago and the battery powered saw couldn’t make it through more than two cuts before crapping out on me. I even had to cut one of the rabbets with a chisel because the battery died and I didn’t want to wait for it to recharge.

If you ever thought of making your own work table, you should give that Simple Table a try. It’s not hard to build, and it’s not like you’d be out a huge investment if you screw it up. 2×4’s are three or four dollars each. For me, it was a great first project and now I’m looking forward to building other things.

Short term thinking

The price of gas has dropped about $1.50 per gallon in the past couple of months. The other day I paid $1.85 per gallon for regular unleaded. Inflation adjusted, that would be like paying $1.35 in the year 2000. Not an historic low (I paid 95 cents per gallon back in November 2001), but it’s down almost 50% since June.

With that reduction in gas prices, people are already thinking about how to spend their savings. Car dealerships are reporting a large jump in sales recently, and buyers are citing the low price of gas as one reason for their purchases. And it’s not the economy cars that are selling. No, people are buying big ol’ gas guzzlers, conveniently ignoring that the price of gas is volatile and quite likely to climb back to $4.00 per gallon as quickly as it dropped. It might be a year or more before the price goes up, but it will go up. I will have no sympathy for those who, two years from now, complain that they can’t afford the payments on their SUVs or to buy gas to drive the silly things.

Not that I expect people to do anything other than what they’re doing. It seems most people will spend just a little more than they can afford on a car, regardless of what they really need in a car. Why spend only $15,000 on basic transportation when you can spend $30,000 on a cool new whiz bang monster SUV with all the bells and whistles? After all, the finance company wouldn’t let me borrow more than I can afford. Right?

Politicians, too, aren’t afraid to say and do stupid things in response to this latest drop in the price of gas. Democrats and Republicans alike are making noises about instituting a “carbon tax” on gasoline. To the tune of 40 cents per gallon! The argument is that gasoline is underpriced, with the price not reflecting the full cost of the product: that is, the damage done to the environment by burning the fuel. One is supposed to believe, I guess, that if such a tax were instituted, the revenue would go towards some method of combating climate change.

The truth is somewhat different. Republicans are looking at an increased gas tax as–wait for it–a means of reducing income taxes. This is one of the best examples of doublethink that I’ve seen in a long time. Conservatives who historically oppose any new tax or any reduction in tax deductions are seriously saying that taxing consumption rather than income is a solid “conservative” principle that they’ve been advocating forever.

Now I’m not saying that Democrats would necessarily use that additional revenue to combat climate change. No, they’d be more likely to put forth bills that fund all manner of additional social programs, few of which have any chance of doing anything but making people think Congress is Doing Something About The Problem, and most of which are no different from programs that have failed in the past.

It’s all a bunch of short-term thinking. Knowing how Congress works, they would project revenue based on consumption of gas at the current price, without taking into account that consumption decreases as price increases. Adding 40 cents per gallon will immediately reduce consumption, and the inevitable price increase in the next few years will reduce it even more. Any proposed legislation to squander the ill-gotten gains would be dependent on the projected tax revenue, and when that revenue decreases those programs would be under-funded.

What Congress should do is … nothing: let us consumers enjoy this temporary respite from the high price of gas. Let suppliers sort things out, and when demand increases or the Saudis decide they need more money, the price will start going up again. But Congress is a money junkie with all the self-control of a drug addict. The primary difference being that we prosecute drug addicts but condone and even encourage Congress’s addiction, even though they do way more harm than good.

Environmental groups should concentrate on encouraging more sustainable energy supplies, and ignore the temporary increase in fossil fuel usage. The sooner we burn all of the readily available fossil fuels, the sooner their alternative energy sources will be in demand. If the environmentalists’ projections are right in regards to climate change, a few years’ increased consumption isn’t going to make much of a difference anyway. They might as well spend their limited resources (time and money) on developing alternatives rather than on fighting a losing battle against consumption.

Jim the actor

Back in October I received an email from the Sam Bass Community Theatre, inviting people to audition for a part in their holiday production of The Best Christmas Pageant Ever. I had no particular desire to act, but I went to volunteer as a stage hand or some such. They gratefully accepted my offer, and a few weeks later called to tell me they were ready for me to start.

I had spoken to the stage manager briefly and was waiting for instructions when the director approached me and asked if I’d consider taking over the role of an actor who had to leave unexpectedly. Having read the script, I knew that the role she was offering had only one scene with five lines of dialogue. She didn’t even ask me to audition; just gave me a script and pushed me backstage.

Seeing as how I only had that one scene, I also played stage hand: doing some minor set construction, arranging the set before each performance and during intermission, and riding herd on 20 children aged from about eight to sixteen. I thought it would be a struggle not to strangle a couple of those kids before the play had finished its run, but once they got into costume, they were very well behaved. Even young children know when it’s time to quit being childish and get to work.

Debra volunteered, too, but her schedule was already pretty full. She did manage to paint a custom backdrop based on the director’s sketch. That turned out great.

backdrop

Even though it’d been nearly 50 years since the last time I was in a play (I was Dr. Dolittle in a production for my kindergarten class back in … 1967), stage fright wasn’t a problem. I’ve done enough public speaking that getting up in front of an audience of 50 people isn’t particularly daunting. Memorizing five lines of dialogue and delivering it reasonably well isn’t terribly difficult. By the time the first performance rolled around, I’d done the scene often enough that I felt very comfortable with it. Even so, I did manage to fumble a line during one or two performances. Fortunately, the woman with whom I did the scene is much more experienced than I and was able to make it look like nothing was amiss.

It was a big time commitment: about three hours a night for four or five nights per week, for about eight weeks. But I can’t remember the last time I had so much fun. I especially can’t remember the last time I had that much fun without having to spend any money. And I learned a bit about theater, more about myself, and met a lot of good people who I hope to work with again. I’ve already volunteered to help as a stage hand with their next production, and I’m planning to audition for a part in one of their upcoming productions in the spring.

It was altogether a thoroughly enjoyable experience. If you’ve ever wanted to give acting a try, find when your local community theater is having auditions and go audition! You don’t need any experience. They like new faces and inexperienced actors. Many of the people who produce and direct at community theater are teachers at heart. If you go in with a sincere desire to learn, they will welcome you and help you learn to act and to appreciate the art of theatre.

I can’t say enough good things about the director, stage manager, the other actors, and everybody else who I met there. Every one of them was friendly, welcoming, and helpful. The theatre is the people, and the people I met at Sam Bass Community Theatre are among the best ever.

Merge code posted

I’ve uploaded the source code for my Merge LINQ extensions. This also includes the source for my DHeap class and for the TopBy extension method.

The code is supplied in a .zip file and includes a Visual Studio 2013 project file. If you have an earlier version of Visual Studio, you probably won’t be able to read the project file, but it’s easy enough to recreate one from the source files provided.

The code is all MIT License, meaning you can do with it what you like. I request that you let me know if it’s useful to you, but there’s no requirement.

Download: EnumerableExtensions.zip

Testing merge performance

Getting back to my discussion of merging multiple lists.

Testing relative performance of the different merge algorithms turned out to be more involved than I had originally thought it would be. I found the results somewhat surprising.

The short version is that in a normal procedural implementation, the pairwise merge is faster than the heap based merge, except in rare cases of many lists with very large differences in the lists’ lengths. But when it comes to LINQ implementation, the heap based merge is faster due to complications in the way that multiple criteria are handled.

I found the first result very surprising because the heap-based merge is provably optimum from a theoretical standpoint. But it turns out that managing the heap is expensive compared to just iterating lists. Even more surprising is that the difference in speed increases as the number of lists grows: when merging five lists of the same size, the pairwise merge is perhaps 10% faster than the heap-based merge; when merging 10 lists, it’s 15 to 20 percent faster.

At first I thought the difference was because my heap-based merge implementation made use of my generic DHeap class, which adds some overhead. So I rewrote the merge to use a custom heap that’s optimized for this application. That made the heap merge faster, but not enough faster to overtake the pairwise merge.

The relative speed of the two implementations is somewhat sensitive to the keys being compared. As key comparisons become more expensive the heap based merge begins to outperform the pairwise merge. When merging strings, for example, the heap based merge is closer to the same speed as the pairwise merge. With very expensive comparisons, the heap based merge is clearly the faster of the two.

The reason for this is simple. The runtime complexity of the algorithms is based on the number of comparisons. The heap based merge will never do more than n log k comparisons, so when comparison speed becomes the limiting factor the algorithm that does the fewest comparisons will be the fastest.

After testing performance of the algorithms as standard method calls, I began creating LINQ-callable methods called MergeBy and PairwiseMergeBy. The heap based merge conversion was straightforward and doesn’t look fundamentally different from the MergeWithBy extension method that I presented a while back.

The pairwise merge works by creating a merging network that does multiple merges in parallel. That is, given eight lists to merge, labeled ListA through ListH, it creates the following merge operations:

Merge A and B to create AB
Merge C and D to create CD
Merge E and F to create EF
Merge G and H to create GH
Merge AB and CD to create ABCD
Merge EF and GH to create EFGH
Merge ABCD and EFGH to create ABCDEFGH

It creates an inverted tree of merges that looks like this:


    A     B   C     D   E     F   G     H
    |     |   |     |   |     |   |     |
    +--+--+   +--+--+   +--+--+   +--+--+
       |         |         |         |
       +----+----+         +----+----+
            |                   |
            +---------+---------+
                      |
                    Output

So I have seven different merges running concurrently. The first problem is that this requires twice as much memory as the heap based merge. The heap based merge requires memory proportional to the number of lists (N) times the number of keys (K). That is, N*K memory.

The pairwise merge has N-1 merges running in parallel, each of which requires (2 * K). So the pairwise merge memory requirement is (2 * (N-1) * K) for the keys, and (N – 1) for the iterators. That’s not a huge deal, because the number of lists and the number of keys are both typically small. If we were merging millions of lists this might be a concern. Still, the additional memory is a drawback.

The bigger problem turned out to be structuring the code so that it worked and was understandable. If you’ve been following along, you know that implementing the IOrderedEnumerable interface is complicated enough. Adding this complexity on top of it made things really convoluted. And by the time I got it all working, the pairwise merge just wasn’t consistently faster enough to justify the complexity. So I junked the idea. The final code uses the heap based merge.

The resulting MergeBy method is not an extension method. Rather, it’s a static method in the LinqExtensions namespace.

    public static IOrderedEnumerable<TSource> MergeBy<TSource, TKey>(
        IEnumerable<IEnumerable<TSource>> lists,
        Func<TSource, TKey> keyExtractor,
        IComparer<TKey> comparer = null);

There is also a MergeByDescending method, and you can use the ThenBy and ThenByDescending methods to add multiple merge keys.
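
Here’s a hypothetical usage sketch based on the signature above. The Employee type, the variable names, and the assumption that the method lives on a static LinqExtensions class are all mine, so treat this as an illustration rather than documentation:

    // Merge three lists that are each already sorted by LastName, then FirstName.
    var lists = new List<IEnumerable<Employee>> { eastCoast, westCoast, remote };

    var merged = LinqExtensions.MergeBy(lists, e => e.LastName)
                               .ThenBy(e => e.FirstName);

    foreach (var e in merged)
    {
        Console.WriteLine($"{e.LastName}, {e.FirstName}");
    }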

I’ll publish the final code in the next posting. That will contain all of the merging code I’ve developed in this series.

Digital computers in an analog world

One of the machines our software controls builds oligonucleotides by pushing various chemicals through a fine glass mesh. To do that, the machine dispenses the chemical into a common line and then pumps it to the destination. Logically, the machine looks like this:


                                  |--------|                     |-----------------|
      +-----------+------+--------+- PUMP -+----+---------+------*---- Column ---- |----+
      |           |      |        |--------|    |         |      |-----------------|    |
    Neutral       A      B                      |         |                             |
                                                |         |                             |
                          +-------+-------+-----+         +-----------------------------+---- Waste
                          |       |       |
                        Argon     S       O

Flow is from left to right. The chemicals at the top are delivered by turning the pump. If we want to push 100 microliters of A through the column, we first open the Neutral valve and run the pump until the entire line through the column is filled with the Neutral solution. Then, the software closes the Neutral valve and switches the output to target waste rather than the column. Then the A valve is opened and the pump is turned long enough to inject 100 µl into the line. Then, the output valve is switched to target the column, valve A is closed, Neutral is opened, and the pump turns to push the chemical through the column.

It’s a little more complicated than what I describe, of course, but that’s the gist of it. Pump delivery is relatively easy because we know the length (more importantly, the volume) of each tube segment, and we know very precisely how much liquid is moved with each revolution of the pump. If we want 100 µl and the pump moves 20 µl per revolution, then we run the pump for five revolutions. The pump is operated by a stepper motor, so we can vary the pump speed by controlling the rate at which we issue step commands to the motor. All we have to do is tell the pump controller, “do X revolutions in Y seconds.” The controller takes care of it and reports back when it’s finished. If we want to deliver 20 µl in three seconds, we tell the pump to do one revolution in 3,000 milliseconds. The motor controller figures out how many microsteps that is, and sends the motor pulses as required.
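
The arithmetic is simple enough; here’s a minimal sketch of it (the 20 µl per revolution figure comes from the example above; the method name and return shape are mine, not the real control software):

    // Sketch only: compute how many revolutions to request and how long each
    // revolution should take, assuming 20 µl moved per revolution.
    static (double Revolutions, int MillisecondsPerRevolution) PlanPumpDelivery(
        double volumeInMicroliters,
        double deliverySeconds,
        double microlitersPerRevolution = 20.0)
    {
        double revolutions = volumeInMicroliters / microlitersPerRevolution;
        int msPerRevolution = (int)Math.Round(deliverySeconds * 1000.0 / revolutions);
        return (revolutions, msPerRevolution);
    }

    // PlanPumpDelivery(20, 3) -> (1 revolution, 3,000 ms), matching the example above.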

For reasons I won’t go into, some chemicals are better delivered by pushing with an inert gas (Argon, in this case). Each of the chemicals on the bottom is sourced from a bottle that’s under positive pressure. If the S valve is opened, liquid starts flowing, and continues to flow until the valve is closed. We know that at our standard operating pressure, the liquid will flow at a rate of about one microliter every seven milliseconds. So if we want to inject 20 µl of S, we open the S valve for 140 milliseconds. We can then close the S valve and open the Argon valve long enough to push the chemical through the column.

The problem comes when we want to vary the speed of Argon delivery. With the pump, all we have to do is slow the pump speed. But we can’t vary the bottle pressure so we have to toggle the Argon valve to simulate pumping. And that’s where things get kind of tricky.

Imagine, for example, that you want to deliver 20 µl of S smoothly over three seconds, rather than in the standard 140 ms. If we could switch the valve on and off quickly enough, that wouldn’t be too difficult. We already know that we can pump 1/7 µl in a millisecond, so if we could toggle the valve every millisecond we could have 140 pulses of 1 ms each, with a 20 ms wait between each pulse. That would deliver in 2,940 ms.

Unfortunately, those valves don’t turn on and off instantly. It takes about 25 milliseconds for a valve to open after it’s energized. And it takes another 25 milliseconds for the valve to close completely after we cut power to it. So we can’t switch the valve faster than once every 50 milliseconds. And during at least part of that time the valve is partially open. So some liquid flows before the 25 ms is up. We can measure that, of course, and we could calibrate each valve if we wanted to. In practice, that calibration isn’t all that helpful. We can do almost as well by taking an average of all the valves in the system.

If you’re wondering how long 50 milliseconds is, a typical eye blink is between 300 and 400 milliseconds. So, whereas you can blink your eyes maybe three times per second, I can toggle these valves on and off 20 times per second.

The other problem is that the computer controlling the valve doesn’t have infinite resolution. Wait precision is typically about five milliseconds. If I want a seven millisecond wait, I’ll typically get something between 7 and 12 ms.

All those uncertainties mean that if I want to open the valve for what I think is 7 ms to dispense 1 µl, it takes 57 ms and I could end up with 2 µl flowing from the valve. This is what we end up calibrating. The tube volume from the farthest Argon-propelled valve to the waste is, say, 4,000 µl. We’ll clear the line (blow it out with Argon), and then start toggling the S valve in one-µl pulses until liquid flows from the waste. If it takes 2,500 pulses, then we know that one pulse dispenses an average of 1.6 µl. If it takes 5,000 pulses, we know that one pulse is, on average, 0.8 µl. That’s the number we’ll use in our calculations.
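
The calibration arithmetic itself is trivial; here’s a sketch using the numbers from the paragraph above (the names are mine, not the machine’s actual API):

    // Sketch of the calibration step: the line volume from the farthest
    // Argon-propelled valve to waste is known, so counting nominal 1 µl pulses
    // until liquid appears gives the average volume actually delivered per pulse.
    static double MicrolitersPerPulse(double lineVolumeInMicroliters, int pulsesUntilLiquidReachesWaste)
    {
        return lineVolumeInMicroliters / pulsesUntilLiquidReachesWaste;
    }

    // MicrolitersPerPulse(4000, 2500) -> 1.6 µl per pulse
    // MicrolitersPerPulse(4000, 5000) -> 0.8 µl per pulse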

Interfacing with the real world presents a lot of challenges, but it’s incredibly rewarding. I worked on this machine’s software for several months before I got to see one of them in operation; all my work had been against a simulator. It was very rewarding when I finally got to watch one of these machines work in response to user input, knowing that my software had converted the user’s commands into commands that make the machine switch valves and pump liquid through the system. Controlling hardware like that is the most rewarding type of programming I’ve ever done. I did it years ago, and I’m re-discovering just how much fun it can be.

A closer look at pairwise merging

When I started writing the performance test code, I got sidetracked with yet another way to merge multiple sorted lists. While I was analyzing it I realized that the pairwise merge might have a problem, as well. It’s worth talking about here because knowing of the problem’s existence will help me create better test data when I finally get around to performance testing.

To review, assuming four lists named A, B, C, and D, the pairwise merge works like this:

    Merge A and B to produce list AB
    Merge C and D to produce list CD
    Merge AB and CD to produce list ABCD

You can achieve the same result without a queue:

    Merge A and B to produce AB
    Merge AB and C to produce ABC
    Merge ABC and D to produce ABCD

Although both approaches require the same number of merges, not all merges are equal.

We can estimate the speed of these algorithms by figuring out how many items they process. We already know that merging two lists that have among them n items requires processing n items. So if we sum the number of items for each two-way merge, we have the total number of items processed for the entire operation.

If we assume that each of the lists has 10 items, then the first example requires:

20 items to merge A and B. The resulting list AB has 20 items.
20 items to merge C and D. The resulting list CD has 20 items.
40 items to merge AB and CD.
80 total items processed.

The second example requires:

20 items to merge A and B. The resulting list AB has 20 items.
30 items to merge AB and C. The resulting list ABC has 30 items.
40 items to merge ABC and D.
90 total items processed.

In this case, the queue-based algorithm is faster.

But not in all cases. Consider, for example, what happens when the lists’ lengths are:

    A = 1
    B = 1
    C = 1
    D = 100

The queue merge will look at two items to make AB, 101 items to make CD, and 103 items to make ABCD. That’s a total of 206 items processed. The other algorithm will look at two items to make AB, three items to make ABC, and 103 items to make ABCD, for a total of 108 items processed. If you switch the order, though, the queue merge will still process 206 items but the other version will have to look at 101 + 102 + 103, or 306 items.
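You can check those totals with a couple of toy functions that work on nothing but the list lengths. Both are mine, purely for illustration: one simulates the queue-based merge order, the other simulates merging each list into one ever-growing result.

    static long QueueMergeCost(IEnumerable<int> listLengths)
    {
        // Simulate the queue-based pairwise merge: each two-way merge touches
        // every item in both of its inputs exactly once.
        var queue = new Queue<long>(listLengths.Select(n => (long)n));
        long totalItems = 0;
        while (queue.Count > 1)
        {
            long merged = queue.Dequeue() + queue.Dequeue();
            totalItems += merged;
            queue.Enqueue(merged);
        }
        return totalItems;
    }

    static long SequentialMergeCost(IEnumerable<int> listLengths)
    {
        // Simulate merging each list, in order, into one ever-growing result.
        long totalItems = 0;
        long running = -1;
        foreach (long length in listLengths.Select(n => (long)n))
        {
            if (running < 0) { running = length; continue; }  // first list: nothing to merge yet
            running += length;
            totalItems += running;
        }
        return totalItems;
    }

    // QueueMergeCost(new[] { 1, 1, 1, 100 })      returns 206
    // SequentialMergeCost(new[] { 1, 1, 1, 100 }) returns 108
    // SequentialMergeCost(new[] { 100, 1, 1, 1 }) returns 306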

The order of merges matters. A lot. Change that 100 to 1,000,000, and you start to get the idea.

If we know the lengths of the lists to be merged, we could potentially order the pairwise merges to take advantage of that information. The idea would be to order the merges so that we create lists that are relatively the same length. It wouldn’t have to be a perfect solution in order to realize a huge performance increase. A simple priority queue ordered by increasing list length would allow us to merge the shorter lists first, which would give a reasonably good solution to the problem.
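Here’s roughly what that would look like if the lengths were available up front, say because the inputs are arrays or List<T>. Treat it as a sketch: it assumes the lazy two-way Merge from the earlier post is in scope, and it uses PriorityQueue<TElement, TPriority>, which is only available in .NET 6 and later; my DHeap or a SortedSet would do the same job on older frameworks.

    static IEnumerable<T> MergeShortestFirst<T>(IEnumerable<IReadOnlyCollection<T>> lists)
    {
        // Priority is the number of items in the (possibly intermediate) sequence,
        // so the two shortest sequences get merged first.
        var queue = new PriorityQueue<IEnumerable<T>, int>();
        foreach (var list in lists)
            queue.Enqueue(list, list.Count);

        if (queue.Count == 0)
            return Enumerable.Empty<T>();

        while (queue.Count > 1)
        {
            queue.TryDequeue(out var l1, out var n1);
            queue.TryDequeue(out var l2, out var n2);

            // Merge is the lazy two-way merge from the earlier post. We know the
            // intermediate result will contain n1 + n2 items without enumerating it.
            queue.Enqueue(Merge(l1, l2), n1 + n2);
        }
        return queue.Dequeue();
    }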

The only drawback–and it’s a deal killer–is that we don’t know how long the lists are. All of the merge code I’ve been working with assumes that the lists are of indeterminate length.

The result is that worst cases can happen. If we merge pairwise without regard to the list lengths, at some point we’ll be presented with a pathological case that makes the merge perform very poorly. In the worst of those cases (repeatedly merging one small list into an ever-growing result), the running time isn’t O(n log2 k), but rather O(n*k). Ouch.

In contrast, the heap-based merge always takes O(n log2 k) time. The reason is that every item is inserted into the heap once, and every item is removed from the heap once, regardless of the individual lists’ length.

Asymptotic analysis, then, tells us that the heap-based merge is better because it’s always O(n log2 k), whereas the pairwise merge can be O(n*k). Actual running time is a different story. If maintaining the heap is very expensive in comparison to maintaining the enumerators, then the pairwise merge can still outperform the heap-based merge.

The pairwise merge looks like it should be faster in the average case, and even in pathological cases with relatively few items. I suspect that it will be slower than the heap-based merge with many large, unevenly-sized lists.

Don’t take my word for it, though. I’ve been wrong often enough to distrust my gut feelings when it comes to performance. The only way to know for sure which is faster is to measure it.
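When I do measure, the harness will look something like the sketch below, not the real test code. It assumes the heap-based Merge and the queue-based MergeVersionTwo from the posts below are in scope, along with the usual using directives (System, System.Collections.Generic, System.Diagnostics, System.Linq); everything else is made up for illustration.

    static void TimeMerge(string name,
        Func<IEnumerable<IEnumerable<int>>, IEnumerable<int>> merge,
        List<int[]> lists)
    {
        var sw = Stopwatch.StartNew();
        // Force full enumeration; both merges are lazy and do no work until enumerated.
        long count = merge(lists).LongCount();
        sw.Stop();
        Console.WriteLine("{0}: {1:N0} items in {2:N0} ms", name, count, sw.ElapsedMilliseconds);
    }

    static void RunTests(int listCount, int itemsPerList)
    {
        var rand = new Random(42);
        var lists = Enumerable.Range(0, listCount)
            .Select(i => Enumerable.Range(0, itemsPerList)
                .Select(j => rand.Next())
                .OrderBy(x => x)
                .ToArray())
            .ToList();

        TimeMerge("heap-based", Merge, lists);
        TimeMerge("pairwise", MergeVersionTwo, lists);
    }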

A different way to merge multiple lists

In my previous post, I said that this time I would show full-featured MergeBy and MergeByDescending methods. Before I do that, I want to show an alternate method for merging multiple sorted lists.

If you recall, I said that the generalized k-way merge algorithm is:

    Initialize list indexes
    while any list has items
        Determine which list has the smallest current item
        Output the item from that list
        Increment that list's current index

As I illustrated last time, that does indeed work. But there is another way to do it that is, asymptotically at least, just as fast. That is, it operates in O(n log2 k) time.

Imagine you have four sorted lists, named A, B, C, and D. Another way to merge them is this way:

    Merge A and B to create a list called AB.
    Merge C and D to create a list called CD.
    Merge AB and CD to create the final list.

To estimate the time complexity, let’s assume that each of the four lists contains the same number of items. So if the total number of items is n, then each list’s length is n/4.

Merging A and B, then, takes time proportional to n/4 + n/4, or n/2. Merging C and D also requires n/2 time. The final merge of AB and CD takes n/2 + n/2. So the total amount of time it takes to merge the four lists is 2n.

Time complexity of the k-way merge is, as I said, O(n log2 k). With four lists, that’s n * log2(4). log2(4) = 2, so the time to merge four lists with a total of n items is n * 2, or 2n.

If there is an odd number of lists, then one of the original lists gets merged with one of the larger, already merged, lists. For example, with five lists the task becomes:

    Merge A and B to create a list called AB.
    Merge C and D to create a list called CD.
    Merge AB and E to create a list called ABE.
    Merge ABE and CD to create the final list.

To do that with an arbitrary number of lists, we create a queue that initially contains the original lists. We then remove the first two lists from the queue, merge them, and add the result back to the queue. Then take the next two, merge them, and add the result to the queue. We continue doing that until there is only one list left on the queue. That is:

    Initialize queue with all of the lists.
    while queue has more than one item
        l1 = queue.Dequeue()
        l2 = queue.Dequeue()
        rslt = Merge(l1, l2)
        queue.Enqueue(rslt)
    merged = queue.Dequeue()

At first, it seems like that method will require a lot of extra space to hold the temporary lists. Remember, the merge algorithms I’ve shown so far don’t require much in the way of extra storage: just a little memory to hold the smallest value in each of the lists. That is, O(k) extra space. Perhaps not surprisingly, you can do the same thing with LINQ: thanks to deferred execution, none of the intermediate lists is ever materialized, so the pairwise version also needs only O(k) extra space. Here’s the code.

    public static IEnumerable<T> MergeVersionTwo<T>(IEnumerable<IEnumerable<T>> lists)
    {
        // Populate a queue with all the lists.
        var theQueue = new Queue<IEnumerable<T>>(lists);
        if (theQueue.Count == 0) yield break;

        // Do pair-wise merge repeatedly until there is only one list left. 
        while (theQueue.Count > 1)
        {
            var l1 = theQueue.Dequeue();
            var l2 = theQueue.Dequeue();
            var rslt = Merge(l1, l2); // uses the two-way merge
            theQueue.Enqueue(rslt);
        }
        var merged = theQueue.Dequeue();
        foreach (var x in merged)
        {
            yield return x;
        }
    }

That code uses the two-way merge that I presented a while back.

When building the queue, we’re just setting things up. Everything is an IEnumerable<T>, meaning that no actual work takes place. LINQ’s deferred execution postpones the merging until we start to enumerate the result. And when we do enumerate the result, only one item is produced at a time. All of the intermediate merges proceed in lockstep, each advancing one step at a time as items are pulled from the final result.

Single-stepping through a merge is very instructive. Here’s some code that uses it:

    var l1 = new int[] {1, 3, 9, 12, 15};
    var l2 = new int[] {2, 7, 14, 16, 19};
    var l3 = new int[] {6, 8, 13, 17, 20};
    var rslt = MergeVersionTwo(new List<int[]> {l1, l2, l3});

    foreach (var i in rslt)
        Console.WriteLine(i);

If you wrap that in a method and single-step it, you’ll see that no real work is done until you get to the foreach. Even then, all that mucking about with the queue doesn’t enumerate any of the lists until you get to the foreach at the end of the MergeVersionTwo method. If you start single-stepping (in Visual Studio, press F11, or the “Step into” debugger command), you’ll see it start working on the two-way merges. Watching how all that works will help you to understand just how powerful deferred execution can be.

It’s important to note that asymptotic analysis just tells us how the run time varies with the size of the input. It doesn’t say that both of the k-way merge algorithms will take the same amount of time. n log2 k just says that for a particular algorithm the amount of time varies with the size of the input and the logarithm of the number of lists. We’ll have to do some performance tests to determine which is actually faster.

That testing should prove interesting. Before we go there, though, we need to look more closely at the pairwise merge.

 

Merging multiple lists with LINQ

To merge multiple sorted lists efficiently, we need a better way to obtain the smallest value from among the lists’ current items. That is, given three lists:

    A: 1, 8, 13, 19, 40
    B: 0, 4, 12, 41
    C: 3, 22, 30, 37, 43

The first item to output would be the 0 from list B. Then we’d output 1 from list A, 3 from list C, etc. In my previous post I showed one way to find the smallest value. That involved looking at the current item from each list to get the minimum. It works, but it’s very expensive. If you have k sorted lists, then it requires O(k) operations to determine the smallest item.

I spent a lot of time last year talking about heaps. The heap, as you recall, is a data structure that efficiently keeps track of the smallest (or largest, depending on how you structure it) item. In particular, you can remove the smallest item from the heap in O(log2 k) time, where k is the number of items in the heap. Also, insertion into the heap is an O(log2 k) operation.

If there are n items spread out over k lists, then the naive method takes O(n * k) time, and the method with the heap requires O(n log2 k) time. Using a heap can save you a lot of time, even when k is small.

That sounds simple enough, but there’s one catch: you can’t just store values in the heap. With the value, you have to store a reference to the list itself. Otherwise you don’t know which list index to increment. Still, it’s not terribly difficult. With C#, we could use a Tuple<T1, T2>, or a simple object. Tuple is convenient, but I prefer creating a class for this kind of thing:

    class HeapItem
    {
        int[] a;  // the array
        int ix;   // current index
    }

Given that, and assuming we have a heap data structure, merging multiple lists becomes much easier. Following the basic algorithm from last time, the almost-C# code looks like this:

    // Initialize the heap
    var heap = new MinHeap();
    foreach (var array in arrays)
        if (array.Length > 0)
            heap.Insert(new HeapItem { a = array, ix = 0 });
    
    while (!heap.IsEmpty)
    {
        HeapItem item = heap.RemoveSmallest();
        result.Add(item.a[item.ix]);  // output item
        item.ix++;                    // move to next item in this array
        // and put it back into the heap if there are still items
        if (item.ix < item.a.Length)
            heap.Insert(item);
    }

I glossed over the heap implementation here. The code assumes that you have a heap implementation that will work with those HeapItem instances. I do in fact have such a thing, and we’ll use it in a bit.

Note that there is no need to explicitly check for the end of all lists. All of the lists’ indexes are initialized at 0. When a list is removed from the heap, its current item is output and then its index is incremented. If the new current index is beyond the end of the list, then the list isn’t added back to the heap. If we drop lists from the heap once they’ve been exhausted, then we know that there are still items as long as the heap is not empty. That simplifies things quite a bit.

The problem with the above is that it assumes we’re working with arrays of integers. What we really want is a generic Merge method that works with any type that implements IEnumerable<T>. That is:

    IEnumerable<T> Merge<T>(IEnumerable<IEnumerable<T>> lists);

The basic idea isn’t much different from what I presented in Merging sequences with LINQ. But rather than having only two enumerators and doing a simple comparison, I create a heap of enumerators. It looks like this:

    public static IEnumerable<T> Merge<T>(IEnumerable<IEnumerable<T>> lists)
    {
        // This builds a heap of the first items from each list
        var initialItems =
            lists.Select(l => l.GetEnumerator())
                .Where(i => i.MoveNext());

        var heap = new MinDHeap<IEnumerator<T>>(
            2, initialItems, new EnumeratorComparer<T>(Comparer<T>.Default));

        while (!heap.IsEmpty)
        {
            var i = heap.RemoveRoot();
            yield return i.Current;
            if (i.MoveNext())
            {
                heap.Insert(i);
            }
        }
    }

That code relies on an EnumeratorComparer, which is defined as:

    private class EnumeratorComparer<T> : IComparer<IEnumerator<T>>
    {
        private readonly IComparer<T> _comp;

        public EnumeratorComparer(IComparer<T> comparer)
        {
            _comp = comparer ?? Comparer<T>.Default;
        }

        public int Compare(IEnumerator<T> x, IEnumerator<T> y)
        {
            var rslt = _comp.Compare(x.Current, y.Current);
            return rslt;
        }
    }

It also depends on my DHeap, the code for which you can download from that page.

One of the cool things about using enumerators here is that the concept of the current item is built in. The logic of incrementing and checking for the end of the list is all there in the MoveNext method and the Current property. So the code to implement the merge is incredibly simple.

I should mention the initialization code, which consists of this LINQ expression:

   var initialItems =
        lists.Select(l => l.GetEnumerator())
             .Where(i => i.MoveNext());

That code is just getting the enumerators for each of the lists, positioning to the first item in the list, and adding it to the list of enumerators (which is subsequently inserted into the heap) if the list has items. It’s roughly equivalent to:

    foreach (var list in lists)
    {
        var i = list.GetEnumerator();
        if (i.MoveNext())
            initialItems.Add(i);
    }

That’s the simple method that merges multiple lists using the default comparison. That’s useful in itself, but it doesn’t provide all of the features that I included in the MergeWithBy extension. Next time, we’ll look at extending this code to provide full-featured MergeBy and MergeByDescending methods.

Actually, I went off on a tangent and presented a different way to merge multiple lists. I’ll get to the full-featured methods after exploring that.

Merging multiple sorted lists

The process of merging multiple (i.e. more than two) sorted sequences is a simple extension of merging just two sorted sequences. The two-way merge algorithm I posted last month included an optimization that I shouldn’t have introduced quite so soon. That optimization complicates the algorithm a bit. The simplest algorithm for merging two sorted lists looks like this:

    while ((not end of List A) or (not end of List B))
        if (end of List A)
            output List B current item
            advance List B index
        else if (end of List B)
            output List A current item
            advance List A index
        else if (List A current item <= List B current item)
            output List A current item
            advance List A index
        else
            output List B current item
            advance List B index

That does the same thing as what I posted before, but does it all in a single loop. My previous version is slightly more efficient because once it reaches the end of one list it just outputs the remainder of the other. Once List A is exhausted, there’s no need to keep checking it.
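For reference, here’s a rough C# sketch (mine, not the code from that earlier post) of the more efficient shape: one loop that compares while both lists still have items, then two tail loops that just copy whatever is left. It’s the same structure the DoSelect method uses in the MergeWithBy post.

    static IEnumerable<int> MergeTwo(IEnumerable<int> listA, IEnumerable<int> listB)
    {
        using (var a = listA.GetEnumerator())
        using (var b = listB.GetEnumerator())
        {
            bool aHasItems = a.MoveNext();
            bool bHasItems = b.MoveNext();

            // Compare only while both lists still have items.
            while (aHasItems && bHasItems)
            {
                if (a.Current <= b.Current)
                {
                    yield return a.Current;
                    aHasItems = a.MoveNext();
                }
                else
                {
                    yield return b.Current;
                    bHasItems = b.MoveNext();
                }
            }

            // One list is exhausted; drain the other without any more comparisons.
            while (aHasItems) { yield return a.Current; aHasItems = a.MoveNext(); }
            while (bHasItems) { yield return b.Current; bHasItems = b.MoveNext(); }
        }
    }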

You could extend that algorithm for three, four, or more lists, but the logic gets complicated in a hurry. What we really need is a generalized k-way merge algorithm that can merge items from any number of sorted lists. That algorithm, simply stated, is:

    Initialize list indexes
    while any list has items
        Determine which list has the smallest current item
        Output the item from that list
        Increment that list's current index

One way to implement that would be with an array of lists, and a corresponding array of current index pointers. The first cut of such a thing might look something like this:

    // merges multiple integer arrays
    private static int[] MergeInts(List<int[]> arrays)
    {
        var result = new List<int>();
        // Initialize indexes
        var indexes = new int[arrays.Count];
        for (int i = 0; i < arrays.Count; ++i)
        {
            indexes[i] = 0;
        }

        // Function returns true if we've reached the end of all the arrays
        var endOfAllArrays = new Func<bool>(() =>
        {
            for (int i = 0; i < indexes.Length; ++i)
            {
                if (indexes[i] < arrays[i].Length)
                    return false;
            }
            return true;
        });

        while (!endOfAllArrays())
        {
            // Find the array with the smallest current item
            int smallestItem = int.MaxValue;
            int smallestArray = 0;
            for (int i = 0; i < indexes.Length; ++i)
            {
                if (indexes[i] < arrays[i].Length)
                {
                    if (arrays[i][indexes[i]] <= smallestItem)
                    {
                        smallestArray = i;
                        smallestItem = arrays[i][indexes[i]];
                    }
                }
            }

            // output that item
            result.Add(smallestItem);

            // advance the index on that array
            ++indexes[smallestArray];
        }
        return result.ToArray();
    }

That code implements the basic algorithm exactly. It’s not especially pretty, but it gets the job done. It’s also horribly inefficient, because every time through the loop it looks at all of the current items to determine which is the smallest. The complexity of this particular implementation is O(n*k), where n is the total number of items in all lists, and k is the number of lists. If you’re merging 10 lists that combined contain n items, it’s going to take twice as long as merging the same items spread out over just 5 lists.

What we need is a faster way to find the list that has the smallest current item. I mentioned in my first article (linked above) that the k-way merge can be done in O(n log2 k) time. That makes a big difference, even for small values of k. The base 2 logarithm of 4, for example, is 2, meaning that if we can improve the selection algorithm, we can merge four lists in half the time.

What data structure do we know that can return the smallest item in O(log2 k) time? The heap, maybe? It’s a surprisingly useful data structure.

Next time we’ll look at replacing the array of arrays in the implementation above with a heap. That will reduce the time required to select the smallest item, and it will also eliminate the time required to determine if we’re at the end of the list.

Removing duplicates

Imagine you have two lists of names in alphabetical order. The first list is of people who drove from Point A to Point B on Monday, and the second list is of people who drove that route on Tuesday. You want to build a list of people who drove that route on either day, and you don’t want any duplicates. So if Joe, Mary, and Sam drove on Monday and Joe, Mary, and Ralph drove on Tuesday, then your list would contain [Joe, Mary, Ralph, Sam].

My MergeWithBy LINQ extension can easily merge the two lists to give [Joe, Joe, Mary, Mary, Ralph, Sam], but it doesn’t remove duplicates. I could make it remove duplicates easily enough by adding another parameter to the method, but it seems more useful to have a separate method that can remove duplicates from any sorted sequence.

LINQ’s existing Distinct method can do it, but it has at least two drawbacks:

  • Because it works on unsorted sequences as well, it has to hold all of the items in memory. This limits the size of sequences you can operate on.
  • It requires that I define an equality comparer if I want to do custom comparisons.

The first is a problem because I often work with very large lists that won’t fit into memory. The second is an annoyance because if I want to sort, merge, and then uniquify, I end up with two different meanings of equality: one defined by the IComparer I pass to OrderBy and MergeWithBy, and one defined by the IEqualityComparer implementation that I pass to Distinct. In addition, the syntax is different. That is, to merge and then uniquify:

    var merged = List1.MergeWithBy(List2, k => k.Age).ThenBy(k => k.LastName);
    var unique = merged.Distinct(new ItemComparerByAgeAndName());

I’d much rather have the second line be:

    var unique = merged.UniqueBy(k => k.Age).ThenBy(k => k.LastName);

Removing duplicates from an ordered sequence is easy enough. For example, this code de-dupes an ordered list, using the passed comparer.

    IEnumerable<TSource> Unique<TSource>(IEnumerable<TSource> theList, IComparer<TSource> comparer)
    {
        var it = theList.GetEnumerator();
        var hasItems = it.MoveNext();
        if (!hasItems)
            yield break;
        
        yield return it.Current;
        var prev = it.Current;
        hasItems = it.MoveNext();
        while (hasItems)
        {
            if (comparer.Compare(prev, it.Current) != 0)
            {
                yield return it.Current;
                prev = it.Current;
            }
            hasItems = it.MoveNext();
        }
    }

Note that I used an IComparer<T> rather than an IEqualityComparer<T> here, which is different from what Distinct uses. I did that because the next step is to implement a LINQ extension, called UniqueBy, that works in the same way that OrderBy and my MergeWithBy methods work, including support for ThenBy and ThenByDescending.

I’m not going to spend any time describing how the UniqueBy and related methods work. The inner workings are sufficiently similar to the previously covered MergeWithBy that going over it again would be needless repetition. See the article I linked at the top of this post if you want more information.

UniqueBy will be faster than Distinct, and will use a lot less memory. Like my MergeWithBy, it only needs enough extra memory to store three items at any one time. But it only works on ordered sequences. If you want to uniquify an unordered list, you need to call Distinct, or you need to sort it first before calling UniqueBy. But Distinct will be faster if you have an unsorted list that fits in memory.

As I showed above, using the new method is really easy: it’s just like using OrderBy or my TopBy method.

One question that comes up is whether there is any benefit to calling UniqueBy before merging sequences. That is, which of these two is better?

    var foo = L1.UniqueBy(k => k).Merge(L2.UniqueBy(k => k), k => k).UniqueBy(k => k);

OR:

    var foo = L1.Merge(L2, k => k).UniqueBy(k => k);

It should be clear that the final UniqueBy is required in the first example. Otherwise you wouldn’t remove duplicates from the merged list. You’d only remove duplicates from each of the input lists. But if “Joe” existed in both lists then “Joe” would be output twice.

You’re better off with the second example because it’s simpler, produces the same result as the first, will use slightly less memory, and will be ever so slightly faster than the first example unless each of the input lists has a high percentage of duplicate items. Even in pathological cases, though, the difference in execution speed will likely be so small as to be irrelevant. So you should stick with the simpler code.

Although merging two sorted lists into a single sorted list and potentially removing duplicates is the most common type of merge I’ve used or seen used, it’s certainly not the only merge topic. Merging is a technique that’s been used in transaction processing systems for at least 50 years. I’ll likely come back to it in the future.

Following is complete code for the UniqueBy extension.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

namespace EnumerableExtensions
{
    public static class UniqueExtension
    {
        public static IOrderedEnumerable<TSource> UniqueBy<TSource, TKey>(
            this IEnumerable<TSource> list1,
            Func<TSource, TKey> keySelector)
        {
            return UniqueBy(list1, keySelector, null);
        }

        public static IOrderedEnumerable<TSource> UniqueBy<TSource, TKey>(
            this IEnumerable<TSource> list1,
            Func<TSource, TKey> keySelector,
            IComparer<TKey> comparer)
        {
            return new UniqueOrderedEnumerable<TSource, TKey>(list1, keySelector, comparer, false);
        }

        public static IOrderedEnumerable<TSource> UniqueByDescending<TSource, TKey>(
            this IEnumerable<TSource> list1,
            Func<TSource, TKey> keySelector)
        {
            return UniqueByDescending(list1, keySelector, null);
        }

        public static IOrderedEnumerable<TSource> UniqueByDescending<TSource, TKey>(
            this IEnumerable<TSource> list1,
            Func<TSource, TKey> keySelector,
            IComparer<TKey> comparer)
        {
            return new UniqueOrderedEnumerable<TSource, TKey>(list1, keySelector, comparer, true);
        }

        internal abstract class UniqueOrderedEnumerable<TSource> : IOrderedEnumerable<TSource>
        {
            private readonly IEnumerable<TSource> _list1;
            internal UniqueOrderedEnumerable<TSource> Parent;

            protected UniqueOrderedEnumerable(
                IEnumerable<TSource> list1)
            {
                _list1 = list1;
            }

            public IOrderedEnumerable<TSource> CreateOrderedEnumerable<TKey>(
                Func<TSource, TKey> keySelector,
                IComparer<TKey> comparer,
                bool @descending)
            {
                var oe = new UniqueOrderedEnumerable<TSource, TKey>(
                    _list1, keySelector, comparer, descending) { Parent = this };
                return oe;
            }

            public IEnumerator<TSource> GetEnumerator()
            {
                var criteria = GetCriteria().ToArray();

                var selector = new UniqueSelector<TSource>(_list1, criteria);
                return selector.DoSelect().GetEnumerator();
            }

            // Walks the ordering criteria to build an array that the HeapSelector can use.
            private IEnumerable<UniqueOrderedEnumerable<TSource>> GetCriteria()
            {
                var keys = new Stack<UniqueOrderedEnumerable<TSource>>();

                var current = this;
                while (current != null)
                {
                    keys.Push(current);
                    current = current.Parent;
                }
                return keys;
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }

            // The individual ordering criteria instances (UniqueOrderedEnumerable<TElement, TKey>)
            // implement these abstract methods to provide key extraction, comparison, and swapping.
            public abstract void ExtractKey(TSource item, int ix);
            public abstract int CompareKeys(int x, int y);
        }

        internal class UniqueOrderedEnumerable<TSource, TKey> : UniqueOrderedEnumerable<TSource>
        {
            private readonly Func<TSource, TKey> _keySelector;
            private readonly IComparer<TKey> _comparer;
            private readonly bool _descending;
            private readonly TKey[] _keys;

            internal UniqueOrderedEnumerable(
                IEnumerable<TSource> list1,
                Func<TSource, TKey> keySelector,
                IComparer<TKey> comparer,
                bool descending)
                : base(list1)
            {
                _keySelector = keySelector;
                _comparer = comparer ?? Comparer<TKey>.Default;
                _descending = descending;

                _keys = new TKey[2];
            }

            public override int CompareKeys(int x, int y)
            {
                return _descending
                    ? _comparer.Compare(_keys[y], _keys[x])
                    : _comparer.Compare(_keys[x], _keys[y]);
            }

            public override void ExtractKey(TSource item, int ix)
            {
                _keys[ix] = _keySelector(item);
            }
        }

        internal class UniqueSelector<TSource>
        {
            private readonly IEnumerable<TSource> _list1;
            private readonly UniqueOrderedEnumerable<TSource>[] _criteria;

            public UniqueSelector(
                IEnumerable<TSource> list1,
                UniqueOrderedEnumerable<TSource>[] criteria)
            {
                _list1 = list1;
                _criteria = criteria;
            }

            public IEnumerable<TSource> DoSelect()
            {
                // Initialize the iterator
                var i1 = _list1.GetEnumerator();

                // ix controls the key position that's loaded
                var ix = 0;

                var next = new Func<bool>(() =>
                {
                    if (!i1.MoveNext()) return false;
                    ExtractKeys(i1.Current, ix);
                    return true;
                });

                var i1HasItems = next();
                if (!i1HasItems) yield break;

                // output the first item
                yield return i1.Current;
                ix = 1;
                i1HasItems = next();
                while (i1HasItems)
                {
                    // Output the item if it's not equal to the previous item
                    if (Compare(0, 1) != 0)
                    {
                        yield return i1.Current;

                        // Change the index position for the next loaded key
                        ix = (ix == 0) ? 1 : 0;
                    }
                    i1HasItems = next();
                }
            }

            private int Compare(int x, int y)
            {
                var rslt = 0;
                for (var i = 0; rslt == 0 && i < _criteria.Length; ++i)
                {
                    rslt = _criteria[i].CompareKeys(x, y);
                }
                return rslt;
            }

            private void ExtractKeys(TSource item, int ix)
            {
                foreach (var t in _criteria)
                {
                    t.ExtractKey(item, ix);
                }
            }
        }
    }
}

Thoughts after the election

If you believe the post-election commentary, the American people have “repudiated the policies of this administration and embraced conservative ideals.” Republicans are saying that gaining control of the Senate and increasing their control of the House is “a mandate” from the American people. In a sense it is, but almost certainly not in the way that they apparently believe and, more importantly, want you to believe.

For the president’s opposing party to take control of Congress during midterm elections is nothing new. Republicans did it in 1994, and Democrats did it in 2006. Historically, it’s unusual for the opposing party not to gain seats during the midterm. The media pundits and party propaganda twits will come up with all kinds of complicated reasons, often based on the belief that the winning party lied or cheated, but I think the real reason is much simpler: voters are expressing their discontent by throwing the bums out.

Throwing the bums out is a good thing as far as I’m concerned. Unfortunately, they just elect a new crop of bums and it’s business as usual.

The American electorate has an incredibly short memory and an even more limited imagination. They’re smart enough to see that things aren’t working well, and know that we need a change in Congress. But their imagination is limited to one of two flavors: Democrat or Republican. They’ll happily throw out the Democrats in favor of the Republicans, despite having been burned by the Republicans eight or ten years ago. And at some time in the near future they’ll throw the Republicans out in favor of the Democrats, forgetting that things weren’t any better the last time Democrats controlled Congress.

For some unfathomable reason, Americans lack the imagination to throw out Democrats and Republicans. That would send the right message. As it stands, all we’re doing is swapping one crowd of blood sucking parasites for another.

I liken it to being given the choice of having your face slapped repeatedly or getting punched in the gut repeatedly. We get tired of slaps after a while and switch to the gut punchers, but then our stomachs start to hurt and we go for the face slapping again. But what we really want is for people to stop hitting us. And yet we don’t have the imagination or the will to do anything about it.

In part, that’s because we long ago allowed Congress to make rules that enforce a very strict two-party system. The party that has the most seats in the House or Senate gets to make rules and control what legislation is presented. That in itself encourages an adversarial relationship, which is especially bad when the President’s party controls one house and the opposing party controls the other. In that case, the opposing party is forced to do whatever it can to block the president’s every move. To do otherwise would alienate their base and anybody else who might be discontented enough to vote for them during the next election cycle.

When the president’s party controls both houses of Congress, we’re in real trouble. Especially when they hold a super majority that can completely block every move of the opposing party. In that case, there is no opposition to whatever grandiose scheme the president’s party can dream up. We usually regret such laws that are enacted without careful consideration and lively debate. Giving a single party full control of two of our three branches of government is dangerous. So far we’ve been fortunate that the parties aren’t quite as tightly controlled as they could be. It’s a good thing that sometimes a party member will vote against the wishes of the party leaders.

We’re best off when one party controls the Executive branch and the other party controls the Legislative. In that case, the two parties are forced to work together. When one party controls both houses of Congress, the people who elected them expect them to Get Things Done. Sure, some of the party faithful think Congress should adopt an adversarial posture towards the president, but that leads to idiocy like the Clinton impeachment trial. Most of the people will want Congress to work with the president and enact meaningful legislation. Or, one would hope, repeal stupid legislation that was imposed on us when one party had a little too much power.

If Republicans can work with the president over the next two years (one year, really, because the second year will be dedicated to the mud slinging and political posturing we call campaigning), Republicans have a chance of gaining the White House again, and perhaps can keep from losing too much ground in the Congressional elections. If, however, they adopt an adversarial role and refuse to work with the president, the 2016 elections will be a repeat of 2008.

What I’d really like to see, though, is a meaningful third party or, preferably, a serious independent (i.e. no party affiliation) movement. Our system of party politics, especially the artificial two party system, is a serious impediment. It just encourages tribalism and perpetuates the dysfunctional governance we’ve experienced for the last 25 years or more.

Timers don’t keep track of time

In computer programming, the word Timer is used to describe a hardware or software component that raises an event (“ticks”) at a specified interval. So, for example, if you want a notification once per second, you would instantiate a timer and set its period to one second. The timer API often has you specify the period in milliseconds, so you’d say 1,000 milliseconds.

In a C# program, for example, you can create a timer that ticks every second, like this:


    private System.Threading.Timer _timer;
    private int _tickCount;
    public void Test()
    {
        _timer = new System.Threading.Timer(MyTimerHandler, null, 1000, 1000);
        Console.ReadLine();
    }

    private void MyTimerHandler(object state)
    {
        ++_tickCount;
        Console.WriteLine("{0:N0} tick", _tickCount);
    }

It’s important to note that ticks don’t come at exact 1-second intervals. The timer is reasonably accurate, but it’s not perfect. How imperfect is it? Let’s find out. Below I’ve modified the little program to start a Stopwatch and output the actual elapsed time with each tick.

    private System.Threading.Timer _timer;
    private Stopwatch _stopwatch;
    private int _tickCount;
    public void Test()
    {
        _stopwatch = Stopwatch.StartNew();
        _timer = new System.Threading.Timer(MyTimerHandler, null, 1000, 1000);
        Console.ReadLine();
    }

    private void MyTimerHandler(object state)
    {
        ++_tickCount;
        Console.WriteLine("{0:N0} - {1:N0}", _tickCount, _stopwatch.ElapsedMilliseconds);
    }

On my system, that program shows the timer to be off by about 200 milliseconds every three minutes. That’s on a system that isn’t doing anything else of note: no video playing, no downloads, video streaming, or heavy duty background jobs. 200 milliseconds every three minutes doesn’t sound like much, but that’s one second every fifteen minutes, or 96 seconds every day. My granddad’s wind-up Timex kept better time than that.

Granted, the problem could be with the Stopwatch class. But it isn’t. I created a simple program that starts a Stopwatch and then outputs the elapsed time whenever I press the Enter key. I let that program run for days, and every time I hit Enter, it agreed very closely with what my wristwatch said. There was some variation, of course, because no two clocks ever agree perfectly, but the differences between Stopwatch and my wristwatch were a lot smaller than the differences between Stopwatch and the Timer. I have since run that experiment on multiple computers and multiple clocks, and every time the Stopwatch has been quite reliable.
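The checker program is trivial. A sketch of it looks something like this:

    using System;
    using System.Diagnostics;

    class StopwatchCheck
    {
        static void Main()
        {
            var stopwatch = Stopwatch.StartNew();
            Console.WriteLine("Press Enter to show elapsed time. Ctrl+C to quit.");
            while (true)
            {
                Console.ReadLine();
                // Compare this against a wristwatch or wall clock.
                Console.WriteLine("Elapsed: {0}", stopwatch.Elapsed);
            }
        }
    }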

Things get worse. On a heavily loaded system with lots of processing going on, timer ticks can be delayed by an indeterminate time. I’ve seen ticks delayed for five seconds or more, and then I get a flood of ticks. I’ll see, for example, a tick at 5,000 milliseconds, then one at 9,000 milliseconds followed very quickly by ticks at 9,010, 9,015, and 9,030 milliseconds. What happened is that the system was busy and couldn’t schedule the timer thread at 6,000 milliseconds, so the system buffered the ticks. When it finally got around to dispatching, three other ticks had come in and it fired all four of them as quickly as it could.

This can be a huge problem on heavily loaded systems because it’s possible that multiple timer ticks can be processing concurrently.
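One common way to keep overlapping ticks from stepping on each other, shown here only as a sketch and not as anything from the code above, is to have the handler bail out if a previous tick is still being processed, using Interlocked from System.Threading:

    private int _handlerRunning; // 0 = idle, 1 = a tick handler is in progress

    private void MyTimerHandler(object state)
    {
        // Atomically claim the handler; if another tick is already being
        // processed, skip this one instead of running concurrently.
        if (Interlocked.CompareExchange(ref _handlerRunning, 1, 0) != 0)
            return;
        try
        {
            // ... the periodic work goes here ...
        }
        finally
        {
            Interlocked.Exchange(ref _handlerRunning, 0);
        }
    }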

The only thing a timer is good for is to give you notification on a periodic basis–approximately at the frequency you requested. A timer pointedly is not for keeping time.

Given that, you can probably explain what’s wrong with this code, which is a very simplified example of something I saw recently. The original code did some processing and periodically checked the _elapsedSeconds variable to see how much time had elapsed. I simplified it here to illustrate the folly of that approach.

    private Timer _timer;
    private int _elapsedSeconds;
    private ManualResetEvent _event;
    private void Test()
    {
        _timer = new Timer(MyTimerHandler, null, 1000, 1000);
        _event = new ManualResetEvent(false);
        _event.WaitOne();
        Console.WriteLine("One minute!");
        _timer.Dispose();
    }
    
    private void MyTimerHandler(object state)
    {
        ++_elapsedSeconds;
        if (_elapsedSeconds == 60)
        {
            _event.Set();
        }
    }

Here, the programmer is using the timer like the second hand of a clock: each tick represents one second. There are at least two problems here. First, we’ve already seen that the timer won’t tick at exactly one-second intervals. Second and more importantly, it’s possible on a busy system for multiple timer events to occur concurrently. Imagine what happens in this case:

  1. _elapsedSeconds is equal to 59.
  2. A tick occurs and Thread A is dispatched to handle it.
  3. Thread A increments the _elapsedSeconds value to 60.
  4. Another tick occurs and Thread B is dispatched to handle it.
  5. At the same time, Thread A is swapped out in favor of some higher priority thread.
  6. Thread B increments the _elapsedSeconds value to 61.

At this point, the _elapsedSeconds value has been incremented beyond 60, so the conditional will never be true and the event will never be set. (Well, it might: 4 billion seconds from now, when _elapsedSeconds rolls over. That’s only about 125 years.)

You could change the conditional to _elapsedSeconds >= 60, but that’s missing the point. We’ve already shown that the timer isn’t going to be particularly accurate. If you were trying to time a whole day, you could be off by a minute and a half.

The problem of concurrent execution of timer handlers is another topic entirely that I probably should cover in a separate post. It’s important you understand that it can happen, though, and you have to keep that in mind.

In programming, timers don’t keep time. Don’t try to make them. They can’t count up reliably, and they can’t count down reliably. If you want to keep track of elapsed time, start a Stopwatch and then check its Elapsed or ElapsedMilliseconds properties when necessary.
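Applied to the example above, that means letting a Stopwatch do the timekeeping and demoting the timer to a periodic nudge. Something like this sketch, which is my rewrite rather than production code:

    private System.Threading.Timer _timer;
    private Stopwatch _stopwatch;
    private ManualResetEvent _event;

    private void Test()
    {
        _stopwatch = Stopwatch.StartNew();
        _event = new ManualResetEvent(false);
        _timer = new System.Threading.Timer(MyTimerHandler, null, 1000, 1000);
        _event.WaitOne();
        Console.WriteLine("One minute!");
        _timer.Dispose();
    }

    private void MyTimerHandler(object state)
    {
        // It doesn't matter how many ticks have fired, how late they were, or
        // whether two of them ran concurrently; the Stopwatch knows how much
        // time has actually passed.
        if (_stopwatch.Elapsed >= TimeSpan.FromMinutes(1))
            _event.Set();
    }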

On a related note, do not use the clock to measure time.

The MergeWithBy LINQ extension

Last time, I showed how to write the standard two-way merge using LINQ. With that code, you can easily merge two sorted sequences from any type of container that implements the IEnumerable<T> interface. That in itself is very useful, and with the addition of a parameter that defines the comparison function, it likely would suffice for most merge operations. However, it’s less than satisfactory for the same reasons I pointed out in my introduction to the TopBy method earlier this year: sometimes creating an IComparer is a giant pain in the neck. To make a LINQ-compatible merge capability, I really need to implement IOrderedEnumerable.

I’ve struggled with what to call the method. Obvious candidates are Merge, MergeBy, MergeWith, and MergeWithBy, but each of those has its drawbacks. Consider, for example, the possibilities:

    var merged = List1.Merge(List2, k => k.Age).ThenBy(k => k.LastName);
    var merged = List1.MergeBy(List2, k => k.Age).ThenBy(k => k.LastName);
    var merged = List1.MergeWith(List2, k => k.Age).ThenBy(k => k.LastName);
    var merged = List1.MergeWithBy(List2, k => k.Age).ThenBy(k => k.LastName);

I’m not 100% happy with any one of those options but after struggling with it for a while I decided on MergeWithBy which, although a little clunky to the ear (MergeWithByDescending in particular), is the most descriptive. I’m merging one list with another by the specified ordering criteria.

With the naming out of the way, the easy part is defining the LINQ extension methods. These exactly mirror the standard OrderBy methods as well as my TopBy methods.

    public static class MergeExtension
    {
        public static IOrderedEnumerable<TSource> MergeWithBy<TSource, TKey>(
            this IEnumerable<TSource> list1,
            IEnumerable<TSource> list2,
            Func<TSource, TKey> keySelector)
        {
            return MergeWithBy(list1, list2, keySelector, null);
        }

        public static IOrderedEnumerable<TSource> MergeWithBy<TSource, TKey>(
            this IEnumerable<TSource> list1,
            IEnumerable<TSource> list2,
            Func<TSource, TKey> keySelector,
            IComparer<TKey> comparer)
        {
            return new MergeOrderedEnumerable<TSource, TKey>(list1, list2, keySelector, comparer, false);
        }

        public static IOrderedEnumerable<TSource> MergeWithByDescending<TSource, TKey>(
            this IEnumerable<TSource> list1,
            IEnumerable<TSource> list2,
            Func<TSource, TKey> keySelector)
        {
            return MergeWithByDescending(list1, list2, keySelector, null);
        }

        public static IOrderedEnumerable<TSource> MergeWithByDescending<TSource, TKey>(
            this IEnumerable<TSource> list1,
            IEnumerable<TSource> list2,
            Func<TSource, TKey> keySelector,
            IComparer<TKey> comparer)
        {
            return new MergeOrderedEnumerable<TSource, TKey>(list1, list2, keySelector, comparer, true);
        }
    }

The harder part is implementing the IOrderedEnumerable<T> interface that actually does the work. The idea behind it, as I described in my two-part series on IOrderedEnumerable, is to create, for each of the ordering criteria, a class instance that can compare the relevant key. There is a base class, MergeOrderedEnumerable<TSource>, and a derived class for each key type: MergeOrderedEnumerable<TSource, TKey>.

I covered the details of how those two classes work in the article linked above, and in the second part. For the most part, those two classes are just bookkeeping: setting things up so that a third class, the selector, can do the actual work. The full source of the merge extension class is shown at the end of this post.

The guts of the merge, the part that implements the Merge method I showed last time, is the MergeSelector.DoSelect method, shown here:

    public IEnumerable<TSource> DoSelect()
    {
        // Initialize the iterators
        var iterators = new IEnumerator<TSource>[2];

        var next = new Func<int, bool>(ix =>
        {
            if (!iterators[ix].MoveNext()) return false;
            ExtractKeys(iterators[ix].Current, ix);
            return true;
        });

        iterators[0] = _list1.GetEnumerator();
        iterators[1] = _list2.GetEnumerator();
        var i1HasItems = next(0);
        var i2HasItems = next(1);
        while (i1HasItems && i2HasItems)
        {
            if (Compare(0, 1) <= 0)
            {
                yield return iterators[0].Current;
                i1HasItems = next(0);
            }
            else
            {
                yield return iterators[1].Current;
                i2HasItems = next(1);
            }
        }

        while (i1HasItems)
        {
            yield return iterators[0].Current;
            i1HasItems = next(0);
        }

        while (i2HasItems)
        {
            yield return iterators[1].Current;
            i2HasItems = next(1);
        }
    }

This code is nearly a direct translation of the Merge I showed the last time. The primary difference in the structure of the code is that I put the iterators in an array so that I could reduce the amount of duplicated code. The next function advances to the next item in whichever list is specified. Because I have to extract keys that are maintained by the individual MergeOrderedEnumerable<TSource, TKey> instances, without that function I’d have to write code like this four times (twice for i1, and twice for i2):

    yield return i1.Current;
    i1HasItems = i1.MoveNext();
    if (i1HasItems)
    {
        ExtractKeys(i1.Current, 0);
    }

I suppose I could have combined the code:

    if ((i1HasItems = i1.MoveNext())) ExtractKeys(i1.Current, 0);

Call it a matter of style.

That’s really all there is to it. The MergeOrderedEnumerable classes look unnecessarily complex at first glance, but after you study them for a few minutes, you understand why all that code is there. It’s particularly instructive to set up a short merge of a few items, and then single-step the code. Not only will you see how all this stuff works together, you’ll gain a better understanding of how LINQ works in general.
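For example, something as small as this, using int arrays like the ones in the earlier posts, gives you plenty to step through:

    var list1 = new[] { 1, 3, 9, 12, 15 };
    var list2 = new[] { 2, 7, 14, 16, 19 };

    var merged = list1.MergeWithBy(list2, k => k);
    foreach (var i in merged)
        Console.WriteLine(i);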

That’s almost all there is to merging two sequences. The only thing remaining is the matter of uniqueness, which I’ll talk about next time in Removing duplicates.

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;

    namespace EnumerableExtensions
    {
        public static class MergeExtension
        {
            public static IOrderedEnumerable<TSource> MergeWithBy<TSource, TKey>(
                this IEnumerable<TSource> list1,
                IEnumerable<TSource> list2,
                Func<TSource, TKey> keySelector)
            {
                return MergeWithBy(list1, list2, keySelector, null);
            }

            public static IOrderedEnumerable<TSource> MergeWithBy<TSource, TKey>(
                this IEnumerable<TSource> list1,
                IEnumerable<TSource> list2,
                Func<TSource, TKey> keySelector,
                IComparer<TKey> comparer)
            {
                return new MergeOrderedEnumerable<TSource, TKey>(list1, list2, keySelector, comparer, false);
            }

            public static IOrderedEnumerable<TSource> MergeWithByDescending<TSource, TKey>(
                this IEnumerable<TSource> list1,
                IEnumerable<TSource> list2,
                Func<TSource, TKey> keySelector)
            {
                return MergeWithByDescending(list1, list2, keySelector, null);
            }

            public static IOrderedEnumerable<TSource> MergeWithByDescending<TSource, TKey>(
                this IEnumerable<TSource> list1,
                IEnumerable<TSource> list2,
                Func<TSource, TKey> keySelector,
                IComparer<TKey> comparer)
            {
                return new MergeOrderedEnumerable<TSource, TKey>(list1, list2, keySelector, comparer, true);
            }
        }

        internal abstract class MergeOrderedEnumerable<TSource> : IOrderedEnumerable<TSource>
        {
            private readonly IEnumerable<TSource> _list1;
            private readonly IEnumerable<TSource> _list2;
            internal MergeOrderedEnumerable<TSource> Parent;

            protected MergeOrderedEnumerable(
                IEnumerable<TSource> list1,
                IEnumerable<TSource> list2)
            {
                _list1 = list1;
                _list2 = list2;
            }

            public IOrderedEnumerable<TSource> CreateOrderedEnumerable<TKey>(
                Func<TSource, TKey> keySelector,
                IComparer<TKey> comparer,
                bool @descending)
            {
                var oe = new MergeOrderedEnumerable<TSource, TKey>(
                    _list1, _list2, keySelector, comparer, descending) {Parent = this};
                return oe;
            }

            public IEnumerator<TSource> GetEnumerator()
            {
                var criteria = GetCriteria().ToArray();

                var selector = new MergeSelector<TSource>(_list1, _list2, criteria);
                return selector.DoSelect().GetEnumerator();
            }

            // Walks the ordering criteria to build an array that the HeapSelector can use.
            private IEnumerable<MergeOrderedEnumerable<TSource>> GetCriteria()
            {
                var keys = new Stack<MergeOrderedEnumerable<TSource>>();

                var current = this;
                while (current != null)
                {
                    keys.Push(current);
                    current = current.Parent;
                }
                return keys;
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }

            // The individual ordering criteria instances (MergeOrderedEnumerable<TElement, TKey>)
            // implement these abstract methods to provide key extraction, comparison, and swapping.
            public abstract void ExtractKey(TSource item, int ix);
            public abstract int CompareKeys(int x, int y);
        }

        internal class MergeOrderedEnumerable<TSource, TKey> : MergeOrderedEnumerable<TSource>
        {
            private readonly Func<TSource, TKey> _keySelector;
            private readonly IComparer<TKey> _comparer;
            private readonly bool _descending;
            private readonly TKey[] _keys;

            internal MergeOrderedEnumerable(
                IEnumerable<TSource> list1,
                IEnumerable<TSource> list2,
                Func<TSource, TKey> keySelector,
                IComparer<TKey> comparer,
                bool descending)
                : base(list1, list2)
            {
                _keySelector = keySelector;
                _comparer = comparer ?? Comparer<TKey>.Default;
                _descending = descending;

                _keys = new TKey[2];
            }

            public override int CompareKeys(int x, int y)
            {
                return _descending
                    ? _comparer.Compare(_keys[y], _keys[x])
                    : _comparer.Compare(_keys[x], _keys[y]);
            }

            public override void ExtractKey(TSource item, int ix)
            {
                _keys[ix] = _keySelector(item);
            }
        }

        internal class MergeSelector<TSource>
        {
            private readonly IEnumerable<TSource> _list1;
            private readonly IEnumerable<TSource> _list2;
            private readonly MergeOrderedEnumerable<TSource>[] _criteria;

            public MergeSelector(
                IEnumerable<TSource> list1,
                IEnumerable<TSource> list2,
                MergeOrderedEnumerable<TSource>[] criteria)
            {
                _list1 = list1;
                _list2 = list2;
                _criteria = criteria;
            }

            public IEnumerable<TSource> DoSelect()
            {
                // Initialize the iterators
                var iterators = new IEnumerator<TSource>[2];

                var next = new Func<int, bool>(ix =>
                {
                    if (!iterators[ix].MoveNext()) return false;
                    ExtractKeys(iterators[ix].Current, ix);
                    return true;
                });

                iterators[0] = _list1.GetEnumerator();
                iterators[1] = _list2.GetEnumerator();
                var i1HasItems = next(0);
                var i2HasItems = next(1);
                while (i1HasItems && i2HasItems)
                {
                    if (Compare(0, 1) <= 0)
                    {
                        yield return iterators[0].Current;
                        // I could do a loop here using ExtractCompare to compare against
                        // item 2. That would reduce the key extraction as long as the
                        // new item from list 1 is smaller than the new item from list 2.
                        // Then extract all of the keys once l1 goes high.
                        // Lots of code that might not be particularly useful.
                        i1HasItems = next(0);
                    }
                    else
                    {
                        yield return iterators[1].Current;
                        i2HasItems = next(1);
                    }
                }

                while (i1HasItems)
                {
                    yield return iterators[0].Current;
                    // TODO: Could add an "extract" parameter to the next function
                    // If "extract" is false, it doesn't extract the keys.
                    // That would speed up the tail copying,
                    // but might not be worth the trouble.
                    i1HasItems = next(0);
                }

                while (i2HasItems)
                {
                    yield return iterators[1].Current;
                    i2HasItems = next(1);
                }
            }

            private int Compare(int x, int y)
            {
                var rslt = 0;
                for (var i = 0; rslt == 0 && i < _criteria.Length; ++i)
                {
                    rslt = _criteria[i].CompareKeys(x, y);
                }
                return rslt;
            }

            private void ExtractKeys(TSource item, int ix)
            {
                foreach (var t in _criteria)
                {
                    t.ExtractKey(item, ix);
                }
            }
        }
    }