Priority queues

Your job today is to sit behind a counter and manage a group of objects (two-inch cubes, each with a unique number) that people will be working with. Specifically, you have to:

  1. Collect cubes that people come by and drop off.
  2. When somebody asks for a cube, you have to give him the lowest-numbered cube you currently have in your possession. For brevity, we’ll call that the “smallest” cube.

That’s all you have to do.

If you only have to keep track of a few items, you can throw cubes haphazardly on your counter for storage, and quickly pick out the smallest one by inspection whenever somebody asks for it. But if you get more than seven or eight cubes, finding the smallest one starts to take a little time. If you had 100 cubes lying on your desk, unordered, you’d have to search very carefully to find the smallest one.

If you’re told that you have to keep track of 10 or 20 cubes, you could line them up on the counter in order, with the smallest on the left and the largest on the right. If somebody comes by asking for the smallest cube, you just pick it off the left side. If somebody drops one off, you make space for it in the list and put it in the proper spot.

Making space for a new cube in the list involves moving all the cubes from insertion point forward, down one space. So, given this initial list:

    03  05  11  23  37  41

If you need to insert cube number 7, first you have to make space by moving everything down one:

    03  05  XX  11  23  37  41

Then you can put cube 7 where it belongs: in the empty spot shown above by the XX.

That’s not a problem when you have a few dozen items. It gets tiresome pretty quickly, though, if you have 100 or 1,000 cubes to track. Imagine that the list above is just the first part of a list of 1,000 entries. If you insert a new cube with the number 4, you would have to move 998 cubes just to make room for the one you’re inserting. Ouch.

If you have to keep track of more than a few dozen items, the technique of keeping them in order like that becomes time consuming because every time somebody drops off a cube you have to move a bunch of others in order to make space for the new one. Getting the smallest item is trivial, of course, but if you had to insert an item near the front of the list, you’d end up moving almost 1,000 items to make room for it. You’d also need a really long counter. You’d spend a whole lot of time making room for new cubes or shifting cubes to the front as items are removed. The people wanting their smallest cube will have to wait while you’re shuffling things around.

The two tasks you’ve been assigned are exactly the tasks that the priority queue abstract data type performs.

Lots of common algorithms use priority queues, including selection algorithmsgraph searching, and Huffman coding, to name just a few. You can also sort with a priority queue

The priority queue, as I said, is an abstract data type that describes what the thing does. It says nothing about how it’s done, and in fact there are many ways to implement a priority queue. I described two of them above: an unsorted list, and a sorted list. Those are simple to implement, but they’re terribly inefficient.

For example, adding an item to an unordered list is very fast (just add it to the end of the list). But to get the smallest item you have to:

  1. Search the entire list to find the smallest item.
  2. Move the item from the end of the list to replace the item that you remove.

The code below shows how you would implement the Insert and Remove methods for a priority queue of integers. It assumes that the items are stored in a List<int>.

    void Insert(int item)
    {
        _items.Add(item);
    }

    int RemoveFirst()
    {
        int minIndex = 0;
        for (int i = 1; i < _items.Count; ++i)
        {
            if (_items[i] < _items[minIndex])
            {
                minIndex = i;
            }
        }
        int result = _items[minIndex];
        // now remove the item by moving the last
        // item to where the minimum item was.
        _items[minIndex] = _items[_items.Count-1];
        _items.RemoveAt(_items.Count-1];
    }

Inserting is very fast; it takes constant time. Removing an item requires that you search the entire list. So the longer the list gets, the longer it takes to remove an item.

You can do slightly better with a sorted list. If you maintain the list in descending order, then removing an item takes constant time; you just remove the item from the end of the list. Inserting an item, though, requires that you:

  1. Perform a binary search to find where the item is supposed to go.
  2. Move everything down one place to make room for the new item.

The code looks something like this:

    void Insert(int item)
    {
        int insertionSpot = 
            _items.BinarySearch(item, new ReverseIntComparer());
        if (insertionSpot < 0)
        {
            insertionSpot = ~insertionSpot;
        }
        // insert it at the proper position
        _items.Insert(item);
    }

    class ReverseIntComparer: IComparar<int>
    {
        public int Compare(int x, int y)
        {
            return y.CompareTo(x);
        }
    }

    int RemoveFirst()
    {
        int result = _items[_items.Count-1];
        _items.RemoveAt(_items.Count-1];
    }

Finding the insertion spot is pretty fast; it takes a maximum of 10 probes for BinarySearch to find an item in a sorted list of 1,000. The call to _items.Insert, though, will move everything down by one position to make room for the new item. Assuming that items are inserted in random order, Insert will have to move an average of Count/2 items. If items are inserted in ascending order, then every item is added to the head of the list and Insert will always move Count items.

So with an unsorted list Insert is fast and Remove is slow. With a sorted list, Remove is fast and Insert is slow. In general, a sorted list will perform faster than an unsorted list, but not by a whole lot.

One other common operation on priority queues makes the sorted list preferable. Peek lets you look at the smallest item without removing it. That would be fast with the sorted list, but slow with the unsorted list because you have to search for the smallest item.

It would be nice if we could make both of those operations fast, and there are many ways to do that. A popular and well-performing priority queue implementation is the heap data structure, which will be the subject of my next several posts.