An assumption of competence

My second programming job was with a small commercial bank in Fresno, CA, where I helped maintain the COBOL account processing software.  I was still pretty inexperienced, having only been working in the industry for about 18 months.  My previous job involved maintaining software, also in COBOL, for small banks in western Colorado.

One of the first things my new boss asked me to look at was a program that computed loan totals by category, a two-digit code assigned to each loan.  Federal regulations required that we report the total number and dollar amount of all loans, by category, as well as the number and dollar amount of loans that were 30, 60, and 90 days or more past due.  The problem was that the report program was taking way too long to run.  The bank had recently acquired a bunch of new loans, and the program’s run time had increased sharply, several times more than the increase in the number of loans would explain.

Understand, this was a very simple program.  All it had to do was go through the loans sequentially and compute totals in four columns (total, 30 days past, 60 days past, 90+ days past) for each of the 100 categories.  The data structures are very simple.  I don’t remember enough COBOL to write it intelligently, so I’m showing them translated to C#:

struct CountAndTotal
{
    public int Category;
    public int Count;
    public double Total;
}

// Arrays for Total, 30, 60, and 90+ days past due
CountAndTotal[] TotalAllLoans = new CountAndTotal[100];
CountAndTotal[] Total30Past = new CountAndTotal[100];
CountAndTotal[] Total60Past = new CountAndTotal[100];
CountAndTotal[] Total90Past = new CountAndTotal[100];

I’ll admit that I was a little mystified by the Category field in the CountAndTotal structure, but figured it was an artifact from early debugging.

Those definitions and the description of the problem above lead to a simple loop:  for each loan, determine its category and payment status, and add the totals to the proper items in the arrays. The program almost writes itself:

while (!LoanFile.Eof)
{
    LoanRec loan = LoanFile.ReadNext();
    AddToTotals(loan, TotalAllLoans);
    if (loan.PastDue(90))
        AddToTotals(loan, Total90Past);
    else if (loan.PastDue(60))
        AddToTotals(loan, Total60Past);
    else if (loan.PastDue(30))
        AddToTotals(loan, Total30Past);
}

What surprised me when I looked at the code was the implementation of the AddToTotals function. You would expect it to be a simple index into the array from the loan’s category code. After all, the category was guaranteed to be in the range 0-99. That just begs for this implementation:

void AddToTotals(LoanRec loan, CountAndTotal[] Totals)
{
    ++Totals[loan.Category].Count;
    Totals[loan.Category].Total += loan.Balance;
}

What I found was quite surprising. Rather than indexing directly into the array by category code, the program did a sequential search of the array to see whether that category was already there. If it was, the count and total were updated. Otherwise the program made a new entry in the next empty slot in the array. That explained the mysterious Category field and the absurdly long run time. The code is much more complicated:

void AddToTotals(LoanRec loan, CountAndTotal[] Totals)
{
    int i = 0;
    while (i < 100)
    {
        if (Totals[i].Category == loan.Category)
        {
            ++Totals[i].Count;
            Totals[i].Total += loan.Balance;
            break;
        }
        else if (Totals[i].Category == -1)
        {
            // Unused position.  The category wasn't found in the array.
            Totals[i].Category = loan.Category;
            Totals[i].Count = 1;
            Totals[i].Total = loan.Balance;
            break;
        }
        ++i;
    }
}
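
One caveat about the C# translation (a detail of the sketch, not necessarily of the original COBOL):  the -1 check only works if the unused slots are marked ahead of time.  C# zero-initializes the array elements, and 0 is itself a valid category code, so each of the four arrays would need an initialization pass before the main loop, something like:

for (int i = 0; i < 100; i++)
    TotalAllLoans[i].Category = -1;   // mark the slot as unused; likewise for the other three arrays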

The difference in run times is enormous!  The first implementation accesses the array element directly, using the loan.Category field as the index.  The second has to search the array sequentially, an operation that looks at 50 items on average for every loan.  The totaling loop in the first version (the way I thought it should be written) is at least 50 times faster than in the second.  In addition, the second version required a subsequent sort to put the categories in the proper order before printing the results.
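
To put a rough number on that difference, here’s a small benchmark sketch of my own, not from the original program; the class name, portfolio size, and loan data below are made up for illustration, but the two inner loops mirror the implementations above:

using System;
using System.Diagnostics;

class LookupBenchmark
{
    struct CountAndTotal
    {
        public int Category;
        public int Count;
        public double Total;
    }

    static void Main()
    {
        const int loanCount = 1_000_000;            // hypothetical portfolio size
        var rng = new Random(42);
        var categories = new int[loanCount];
        var balances = new double[loanCount];
        for (int i = 0; i < loanCount; i++)
        {
            categories[i] = rng.Next(100);          // category codes 0-99
            balances[i] = rng.NextDouble() * 50000; // made-up balances
        }

        // First strategy: index directly by category code.
        var direct = new CountAndTotal[100];
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < loanCount; i++)
        {
            ++direct[categories[i]].Count;
            direct[categories[i]].Total += balances[i];
        }
        sw.Stop();
        Console.WriteLine($"Direct index:      {sw.ElapsedMilliseconds} ms");

        // Second strategy: scan for the category, or for the first unused slot.
        var searched = new CountAndTotal[100];
        for (int i = 0; i < 100; i++)
            searched[i].Category = -1;              // mark every slot as unused
        sw.Restart();
        for (int i = 0; i < loanCount; i++)
        {
            for (int j = 0; j < 100; j++)
            {
                if (searched[j].Category == categories[i])
                {
                    ++searched[j].Count;
                    searched[j].Total += balances[i];
                    break;
                }
                if (searched[j].Category == -1)
                {
                    searched[j].Category = categories[i];
                    searched[j].Count = 1;
                    searched[j].Total = balances[i];
                    break;
                }
            }
        }
        sw.Stop();
        Console.WriteLine($"Sequential search: {sw.ElapsedMilliseconds} ms");
    }
}

The exact numbers will depend on the machine and the data, but the second strategy does on the order of 50 comparisons per loan where the first does a single indexed access, so the timing gap should be of roughly that order.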

Being new at the job, I went to my boss, explained what I’d found, and said, “What am I missing?”  His response:  “Why do you think you’re missing something?”

He went on to explain that my analysis was correct, and that the industry (at least back then) was full of programmers who had no business sitting at a terminal.  It was something of a revelation to me, because I had assumed that the people who wrote this stuff really knew what they were doing.  It also taught me to question everything when faced with a problem.  It’s always a good idea to assume competence when you start debugging somebody else’s (or your own) code, but when things stop making sense it’s time to re-evaluate that assumption.