Evenly distributing items in a list

Imagine that you have many different types of liquids that you want to mix in some given ratios. As a simple example, say you have liquid types A, B, and C, and you want to mix them in the ratio of 30 A, 20 B, and 10 C. If you wanted to mix those in a canning jar, you’d probably pour 30 units of A into the jar, then 20 units of B, and then add 10 units of C. If you did that, you could very well find that the mixture is stratified: that the bottom of the jar contains a layer of A, followed by an area where A and B are mixed, a thinner layer of B, an area where B and C are mixed, and then a thin layer of C. If you want things to mix, you have to stir, swirl, or shake the jar.

If you have no way to mix the items after they’ve been added to the container, then you have to change your addition process so that you minimize stratification. One way to do that is to use smaller additions. Rather than pouring all 30 units of A into the jar, followed by all the B and all the C, do it in one-unit increments. So, for example, you’d make one-unit additions in the order [A,B,C,A,B,A] 10 times.

Figuring that out for a mixture of three liquids is pretty easy. Let’s describe that mixture as A(3),B(2),C(1). Now imagine you want to mix seven liquids with these proportions: A(50),B(25),C(12),D(6),E(3),F(2),G(2).

Yeah. It gets messy quick. At first I thought I could create an array of 100 buckets and then, starting with the first item, put an A into every other one. Then but a B into every fourth one. But I ran into a problem real quick because “every fourth” has already been filled with an A. So then I figured I could just put the “every fourth” into positions 5, 9, 13, 17, 21, etc. But then I came to C and I have to put a C into every 8-1/3 item . . .

I stopped at that point because I couldn’t come up with a reasonable set of rules for the general case of mixing three liquids. And without a clear set of rules, I wasn’t even going to attempt writing code. I went to bed the other night frustrated with my inability to make progress.

I don’t even try to understand how my brain works. At some point just before or after I fell asleep, I envisioned three different streams of liquid. One was pouring out of a nozzle that was delivering three gallons per minute, one delivering two gallons per minute, and the last delivering one gallon per minute. The three streams of liquid were merging!

It was an “Aha!” moment. I actually sat straight up in bed. I know how to merge streams. I just need a way to make those proportions look like streams.

I’ve shown a tabular representation of the three-liquid mixture below. The value in the last column, Frequency, means “one every N”. So the value 6 means “one out of every six.”

LiquidCountFrequency
A32
B23
C16

So A would be in positions 2, 4, and 6. B would be in positions 3 and 6. And C would go into position 6. Forget for the moment that position 6 is occupied by three different liquids. Those are the positions we want the liquids to take. Positions 1 and 5 aren’t occupied, but they’ll obviously have to be filled.

If you remember my merging discussions (see article linked above), we built a priority queue (a min-heap) with one item from each stream. Let’s do that with our three items, using a Priority field for ordering. That field is initially set to the Frequency value. So the heap initially contains: [(A,2),(B,3),(C,6)].

Now, remove the first item from the heap and output it. Then add the frequency value to the priority and add the item back to the heap. So after the first item (A) is output, the heap contains: [(B,3),(A,4),(C,6)].

Again, remove the first item, output it, update its priority, and add it back to the heap. The result is [(A,4),(C,6),(B,6)].

If you continue in that fashion, the first six items output are A,B,A,C,B,A.

Given those rules, it was just a few minutes’ work to develop a method that, given an array of counts, returns a sequence that will ensure a reasonable mixture. The OrderItems method below is the result. It uses my DHeap class, which you can download from this blog entry.

    private IEnumerable<int> OrderItems(int[] counts)
    {
        var totalItems = counts.Sum();

        // Create a heap of work items
        var workItems = counts
            .Select((count, i) => new HeapItem(i, count, totalItems));
        var workHeap = new MinDHeap<HeapItem>(2, workItems);

        while (workHeap.Count > 0)
        {
            var item = workHeap.RemoveRoot();
            yield return item.ItemIndex;
            if (--item.Count == 0)
            {
                // all done with this item
                continue;
            }
            // adjust the priority and put it back on the heap
            item.Priority += item.Frequency;
            workHeap.Insert(item);
        }
    }

    private class HeapItem : IComparable<HeapItem>
    {
        public int ItemIndex { get; private set; }
        public int Count { get; set; }
        public double Frequency { get; private set; }
        public double Priority { get; set; }

        public HeapItem(int itemIndex, int count, int totalItems)
        {
            ItemIndex = itemIndex;
            Count = count;
            Frequency = (double)totalItems / Count;
            Priority = Frequency;
        }

        public int CompareTo(HeapItem other)
        {
            if (other == null) return 1;
            var rslt = Priority.CompareTo(other.Priority);
            return rslt;
        }
    }

The counts parameter is an array of integers that define the count of each item type to be delivered. In the case of our simple example, the array would contain [3,2,1]. The values in the returned sequence are indexes into that array. The returned sequence for this example would be [0,1,0,2,1,0].

You’ll need to do some translation, then: first to create the counts array, and then to get the actual items from the indexes returned. Here’s an example.

    var mixLiquids = new char[] {'A', 'B', 'C'};
    var mixCounts = new int[] {3, 2, 1};

    foreach (var index in OrderItems(mixCounts))
    {
        Console.Write(mixLiquids[index] + ",");
    }
    Console.WriteLine();

The output from that will be A,B,A,C,B,A.

The OrderItems method produces a good mixture of items in that it spreads out the liquid additions to minimize stratification, but it might not produce a uniform mixture when some liquids are used much less than others. In my second example, where A, B, and C together make up 87% of the total, liquids D, E, F, and G might not get mixed thoroughly. If I run the code above with my second example, the first additions of F and G don’t occur until the middle of the sequence: around index 45. The result would be that those liquids might not intermix with the first half.

It might be better to push the low-frequency additions towards the front so that they’re mixed well with all the other additions. To do that, set the initial Priority of all items to 0, and make sure that the comparison function (HeapItem.CompareTo) favors the low frequency item if the priorities are equal. In the HeapItem constructor, change the Priority initialization to:

    Priority = 0;

And replace the CompareTo method with this:

public int CompareTo(HeapItem other)
    {
        if (other == null) return 1;
        var rslt = Priority.CompareTo(other.Priority);
        // If priorities are the same, then we want the lowest frequency item.
        return rslt != 0
            ? rslt
            : other.Frequency.CompareTo(Frequency);
    }

With those modifications, the lower-frequency additions are done up front, giving us [C,B,A,A,B,A] for the first example. In the second example, E and F would be added first, and also at index 50 or so.

Or, you might want to put those liquid additions around indexes 25 and 75. To do that, change the Priority initialization to Priority = Frequency/2;.

Whereas I solved this problem specifically for ordering the additions of liquids in a mixture, the OrderItems method is more generally useful. If you have a bunch of different items that you want to spread out evenly, this will do it for you. It’s a simple solution, and the algorithmic complexity is the same as with merging streams: O(n log2 k), where n is the total number of items and k is the number of different item types.