Evenly distributing items in a list

Imagine that you have many different types of liquids that you want to mix in some given ratios. As a simple example, say you have liquid types A, B, and C, and you want to mix them in the ratio of 30 A, 20 B, and 10 C. If you wanted to mix those in a canning jar, you’d probably pour 30 units of A into the jar, then 20 units of B, and then add 10 units of C. If you did that, you could very well find that the mixture is stratified: that the bottom of the jar contains a layer of A, followed by an area where A and B are mixed, a thinner layer of B, an area where B and C are mixed, and then a thin layer of C. If you want things to mix, you have to stir, swirl, or shake the jar.

If you have no way to mix the items after they’ve been added to the container, then you have to change your addition process so that you minimize stratification. One way to do that is to use smaller additions. Rather than pouring all 30 units of A into the jar, followed by all the B and all the C, do it in one-unit increments. So, for example, you’d make one-unit additions in the order [A,B,C,A,B,A] 10 times.

Figuring that out for a mixture of three liquids is pretty easy. Let’s describe that mixture as A(3),B(2),C(1). Now imagine you want to mix seven liquids with these proportions: A(50),B(25),C(12),D(6),E(3),F(2),G(2).

Yeah. It gets messy quick. At first I thought I could create an array of 100 buckets and then, starting with the first item, put an A into every other one. Then but a B into every fourth one. But I ran into a problem real quick because “every fourth” has already been filled with an A. So then I figured I could just put the “every fourth” into positions 5, 9, 13, 17, 21, etc. But then I came to C and I have to put a C into every 8-1/3 item . . .

I stopped at that point because I couldn’t come up with a reasonable set of rules for the general case of mixing three liquids. And without a clear set of rules, I wasn’t even going to attempt writing code. I went to bed the other night frustrated with my inability to make progress.

I don’t even try to understand how my brain works. At some point just before or after I fell asleep, I envisioned three different streams of liquid. One was pouring out of a nozzle that was delivering three gallons per minute, one delivering two gallons per minute, and the last delivering one gallon per minute. The three streams of liquid were merging!

It was an “Aha!” moment. I actually sat straight up in bed. I know how to merge streams. I just need a way to make those proportions look like streams.

I’ve shown a tabular representation of the three-liquid mixture below. The value in the last column, Frequency, means “one every N”. So the value 6 means “one out of every six.”

LiquidCountFrequency
A32
B23
C16

So A would be in positions 2, 4, and 6. B would be in positions 3 and 6. And C would go into position 6. Forget for the moment that position 6 is occupied by three different liquids. Those are the positions we want the liquids to take. Positions 1 and 5 aren’t occupied, but they’ll obviously have to be filled.

If you remember my merging discussions (see article linked above), we built a priority queue (a min-heap) with one item from each stream. Let’s do that with our three items, using a Priority field for ordering. That field is initially set to the Frequency value. So the heap initially contains: [(A,2),(B,3),(C,6)].

Now, remove the first item from the heap and output it. Then add the frequency value to the priority and add the item back to the heap. So after the first item (A) is output, the heap contains: [(B,3),(A,4),(C,6)].

Again, remove the first item, output it, update its priority, and add it back to the heap. The result is [(A,4),(C,6),(B,6)].

If you continue in that fashion, the first six items output are A,B,A,C,B,A.

Given those rules, it was just a few minutes’ work to develop a method that, given an array of counts, returns a sequence that will ensure a reasonable mixture. The OrderItems method below is the result. It uses my DHeap class, which you can download from this blog entry.

    private IEnumerable<int> OrderItems(int[] counts)
    {
        var totalItems = counts.Sum();

        // Create a heap of work items
        var workItems = counts
            .Select((count, i) => new HeapItem(i, count, totalItems));
        var workHeap = new MinDHeap<HeapItem>(2, workItems);

        while (workHeap.Count > 0)
        {
            var item = workHeap.RemoveRoot();
            yield return item.ItemIndex;
            if (--item.Count == 0)
            {
                // all done with this item
                continue;
            }
            // adjust the priority and put it back on the heap
            item.Priority += item.Frequency;
            workHeap.Insert(item);
        }
    }

    private class HeapItem : IComparable<HeapItem>
    {
        public int ItemIndex { get; private set; }
        public int Count { get; set; }
        public double Frequency { get; private set; }
        public double Priority { get; set; }

        public HeapItem(int itemIndex, int count, int totalItems)
        {
            ItemIndex = itemIndex;
            Count = count;
            Frequency = (double)totalItems / Count;
            Priority = Frequency;
        }

        public int CompareTo(HeapItem other)
        {
            if (other == null) return 1;
            var rslt = Priority.CompareTo(other.Priority);
            return rslt;
        }
    }

The counts parameter is an array of integers that define the count of each item type to be delivered. In the case of our simple example, the array would contain [3,2,1]. The values in the returned sequence are indexes into that array. The returned sequence for this example would be [0,1,0,2,1,0].

You’ll need to do some translation, then: first to create the counts array, and then to get the actual items from the indexes returned. Here’s an example.

    var mixLiquids = new char[] {'A', 'B', 'C'};
    var mixCounts = new int[] {3, 2, 1};

    foreach (var index in OrderItems(mixCounts))
    {
        Console.Write(mixLiquids[index] + ",");
    }
    Console.WriteLine();

The output from that will be A,B,A,C,B,A.

The OrderItems method produces a good mixture of items in that it spreads out the liquid additions to minimize stratification, but it might not produce a uniform mixture when some liquids are used much less than others. In my second example, where A, B, and C together make up 87% of the total, liquids D, E, F, and G might not get mixed thoroughly. If I run the code above with my second example, the first additions of F and G don’t occur until the middle of the sequence: around index 45. The result would be that those liquids might not intermix with the first half.

It might be better to push the low-frequency additions towards the front so that they’re mixed well with all the other additions. To do that, set the initial Priority of all items to 0, and make sure that the comparison function (HeapItem.CompareTo) favors the low frequency item if the priorities are equal. In the HeapItem constructor, change the Priority initialization to:

    Priority = 0;

And replace the CompareTo method with this:

public int CompareTo(HeapItem other)
    {
        if (other == null) return 1;
        var rslt = Priority.CompareTo(other.Priority);
        // If priorities are the same, then we want the lowest frequency item.
        return rslt != 0
            ? rslt
            : other.Frequency.CompareTo(Frequency);
    }

With those modifications, the lower-frequency additions are done up front, giving us [C,B,A,A,B,A] for the first example. In the second example, E and F would be added first, and also at index 50 or so.

Or, you might want to put those liquid additions around indexes 25 and 75. To do that, change the Priority initialization to Priority = Frequency/2;.

Whereas I solved this problem specifically for ordering the additions of liquids in a mixture, the OrderItems method is more generally useful. If you have a bunch of different items that you want to spread out evenly, this will do it for you. It’s a simple solution, and the algorithmic complexity is the same as with merging streams: O(n log2 k), where n is the total number of items and k is the number of different item types.

Synthetic biology gone wrong

March 4, 2137

Fire Breathing Hamster Destroys Lab

A five alarm fire at the headquarters of synthetic biology startup Custom Creature Creations caused one hundred million dollars’ damage and destroyed the entire facility. Five fire fighters were treated at a local hospital for smoke inhalation after helping to extinguish the blaze. No other injuries were reported.

According to Dr. Stanley McEldridge, president and founder of CCC, the company’s technicians were attempting to create a small fire breathing dragon to demonstrate the company’s capabilities at an upcoming trade show. All appeared to be going well. But when technicians arrived this morning they found a hamster in the synthesis tank where they had expected a dragon. They immediately assumed the worst: that a competitor had broken into the facility overnight, stolen the dragon, and replaced it with a hamster.

After notifying security of the break-in, they removed the hamster from the synthesis tank and placed it in a cage located in another part of the building. About an hour later, one of the lab technicians opened the cage to remove the hamster, and received the shock of his life. The hamster, startled by the technician’s hand reaching in to grab him, backed up and, according to the technician, hissed.

“When the hamster hissed, fire came from its mouth and singed my hand.”

Then the hamster’s whiskers caught on fire, followed quickly by the poor creature’s fur. Squealing and still belching fire, the hamster jumped out of his cage and knocked over a large container of ethanol nearby. The flaming hamster ignited the ethanol, which spread across the room.

Investigators are still trying to determine how the fire spread from there, although they do point out that pure oxygen was piped to the room and that if one or more of the valves was leaking, it could have turned what should have been a minor fire into a conflagration.

The real question is how the lab managed to create a fire breathing hamster. Dr. McEldridge and his staff at Custom Creature Creations would not respond to questions, saying that they are reviewing their procedures and will issue a report when their investigation is complete.

Dr. Alfred Swain, a leading opponent of synthetic biology technology, speculates that the cause was faulty sanitation procedures.

“You have to be very careful with this stuff. Any bit of contamination can lead to unexpected and potentially disastrous results. If one of the technicians working on the project was handling his family’s pet hamster before coming to work, and failed to follow the protocols before entering the clean room, you could very well end up with hamster DNA contaminating the experiment. This is just one example of the things that can happen when we try to create new life forms. I have warned about this kind of thing in the past, and I will continue to lobby for the abolition of all synthetic biology experiments.”