Code review on a 30 year old program

I mentioned yesterday that I’d run across a program that I’d written in 1982. I spent a little time looking it over and thought I’d post my impressions of the code. You can download the program itself and follow along if you like: RENUMBER.BAS [06/24/2024 – The link is broken. I need to track that file down and upload it again.]

I’ll admit up front that being objective might prove difficult. After all, I wrote the program 30 years ago. I might be too easy on myself in some places and too hard on myself in others. I do have to keep in mind that some of what we would consider poor programming practice today was de rigueur at the time, and much was a result of the tool. BASIC wasn’t a particularly good language for writing structured code. Today’s BASIC (even Visual Basic 6) is a much different beast than the line-oriented BASIC of 1982.

At less than 400 lines, the program isn’t particularly large or, as it turns out, terribly complicated. My recollection, before I saw this code again the other day was of a much larger and more complex program. I do recall that it was something of a challenge at the time, although I never doubted that I could write it. In a modern block-structured language, the program would easily be 1,200 lines of code. That old BASIC could be terse, and there were definite benefits to placing multiple statements on a single line.

General operation

The program operates in two passes.

Pass 1

For each input line
    - Strip high bit from each character // required because the editor used
                                         // would sometimes set the high bit.
    - Parse line number (if it exists) and save
    - Write fixed output line to a temp file
    - Compute new line numbers

Line numbers are stored in a two-dimensional array called LINE.NUMBERS. After the first pass is done, the first column contains the old line numbers and the second column contains the new line numbers. Given that structure, it’s easy to look up an old line number (using binary search) and get the new line number.

Pass 2

For each input line (from the temp file)
    - Parse the old line number, look it up in the line numbers array, and
      replace with the new line number.
    - Parse the line looking for GOTO, GOSUB,
      and other constructs that make line number references.
    - As each such construct is found, get the old line number, and replace
      it with the new line number.
    - Write the modified line to the output file.

That’s the general operation. The option to move a block of code rather than renumber the entire program adds some wrinkles, but the basic idea remains the same.

First impressions

After I got past the visual assault of all-capital BASIC and such dense code, I tried to focus on the overall structure of the program. I thought I did a creditable job of modularizing the code at a high level, given the constraints of the tool. Each major subroutine is preceded by a short comment describing what it does, and those subroutines are reasonably short and focused.

Although I did place multiple statements per line in some places, most of that was due to the lack of a multi-line IF...ELSE construct. You'll often see

    IF <condition> THEN <stmt1>:<stmt2>
       ELSE <stmt3>:<stmt4>

I created that visual line break by inserting a LF character in the editor. That little bit of trickery made it look like a multi-line construct, although the entire thing was considered a single logical line by the interpreter. Doing that made the code more readable.

For reasons that I don’t quite recall, I didn’t like using the DEFINT A-Z statement to make all the variables implicitly integers. I think I got burned by relying on implicit definitions at some point and went on an explicit kick. The variable FOUND%, for example, is explicitly made an integer by the % suffix. Similarly, strings are identified by the $ character. Considering that the program doesn’t deal with floating point numbers at all, there’s no good reason not to have used DEFINT. It probably would have made the code more readable. I’ll dock myself on style for this one, especially because I didn’t apply it consistently.

As shown here, the program won’t handle more than 1,200 numbered lines. That’s not as bad as it sounds, though. The compiled BASIC didn’t require a number on every line–just those that were the targets of GOTOGOSUB, etc. I don’t recall how large Dean’s source code (the program I wrote this to handle) was, but I do remember that 1,200 was far more line numbers than we actually needed.

An observation: it took me a few minutes to remember what PRINT #2 is supposed to do. It writes text to file number 2. The number is assigned in the OPEN statement. My first thought on seeing that was, “Some comments would have been nice.” But in truth, the problem was that I’m unfamiliar with the language. My 21-year-old self would rightfully object if I told him to add a comment like:

    ' WRITE LINES TO OUTPUT FILE
    PRINT#2, A$

Doing so would be akin to commenting a MyStream.WriteLine(line) today.

I was too enamoured with short variable names. Whereas it’s true that code ran faster with shorter variable names, this wasn’t a particularly performance sensitive application. My usage here was inconsistent. There are plenty of variables like LINE.NUMBERS and LASTLINE, but also too many like MT%.

On a side note, I have to say that this is the only language I recall ever using that allowed the period to be part of a variable name. Other languages use it as a scope separator. LINE.NUMBERS, for example, would refer to the NUMBERS field in the LINE structure. But MBASIC didn’t support the underscore or any other such separator character.

I also made some poor choices in variable names. FIND for the line number to find is a particularly good example, as is FOUND for the result. Other examples are the variable MOVE, and using CHAR% and CHAR$ in the same context, and plenty more.

It’s impossible not to use global variables in a BASIC program, since all variables are global and there are no parameters to subroutines. Communication consists of setting global variables before calling, and retrieving the results from other global variables after the call returns. Still, it’s considered good practice to keep variables as local as possible. I don’t see any egregious misuses of the global data model, but I probably could have made things more explicit by marking data transfer variables somehow.

I’d give myself good marks for the overall structure of the program and trying to maintain locality of reference. About average for variable naming. A big black mark for failing to use DEFINT and for being inconsistent with the type suffixes.

A few nasties

Take a look at this construct:

610 IF MOVE%=0 THEN
        WHILE NOT(EOF(1))
        ELSE WHILE NOT(EOF(1) AND EOF(3))

When I saw that the other day, I remembered writing it. More to the point, I remember thinking that it was incredibly clever. And it gets worse. The loop is terminated starting at line 980:

    980 IF MOVE%=1 THEN WEND:GOTO 1000
    990 WEND

That BASIC allowed such a wonky construct is kind of surprising. Imagine being able to write this in a C-like language:

    if (move == 0)
        while (!eof(1)) {
    else
        while (!(eof(1) && eof(3)) {

    // do stuff here

    if (move == 1)
        }; // ends the loop for one case
    else
        }; // ends the loop for the other case

I could have written that much more clearly and avoided the brain bending wonkiness. BASIC didn’t short circuit Boolean evaluation, but there were other alternatives.

Major demerits for that bit of cleverness, youngster. You knew better.

The first time I saw “REFRENCES”, I thought I’d made a typo. But there are four uses with that spelling, and no uses of the correct spelling, “REFERENCES.” Apparently, I didn’t know how to spell.

The verdict

Overall, I think I did a good job on that program. The most important part was that it worked. As I recall, there were a few minor bugs and one showstopper after the program was in production, but after that the program got regular use for several years. I don’t recall what any of the bugs were, and I don’t know if this version of the program includes the fixes.

It’s tempting to rewrite the program in C#, just to see how it would look. But not quite tempting enough to actually do it.