Code review on a 30 year old program

I mentioned yesterday that I’d run across a program that I’d written in 1982. I spent a little time looking it over and thought I’d post my impressions of the code. You can download the program itself and follow along if you like: RENUMBER.BAS.

I’ll admit up front that being objective might prove difficult. After all, I wrote the program 30 years ago. I might be too easy on myself in some places and too hard on myself in others. I do have to keep in mind that some of what we would consider poor programming practice today was de rigueur at the time, and much was a result of the tool. BASIC wasn’t a particularly good language for writing structured code. Today’s BASIC (even Visual Basic 6) is a much different beast than the line-oriented BASIC of 1982.

At less than 400 lines, the program isn’t particularly large or, as it turns out, terribly complicated. My recollection, before I saw this code again the other day was of a much larger and more complex program. I do recall that it was something of a challenge at the time, although I never doubted that I could write it. In a modern block-structured language, the program would easily be 1,200 lines of code. That old BASIC could be terse, and there were definite benefits to placing multiple statements on a single line.

General operation

The program operates in two passes.

Pass 1

For each input line
    - Strip high bit from each character // required because the editor used
                                         // would sometimes set the high bit.
    - Parse line number (if it exists) and save
    - Write fixed output line to a temp file
    - Compute new line numbers

Line numbers are stored in a two-dimensional array called LINE.NUMBERS. After the first pass is done, the first column contains the old line numbers and the second column contains the new line numbers. Given that structure, it’s easy to look up an old line number (using binary search) and get the new line number.

Pass 2

For each input line (from the temp file)
    - Parse the old line number, look it up in the line numbers array, and
      replace with the new line number.
    - Parse the line looking for GOTO, GOSUB,
      and other constructs that make line number references.
    - As each such construct is found, get the old line number, and replace
      it with the new line number.
    - Write the modified line to the output file.

That’s the general operation. The option to move a block of code rather than renumber the entire program adds some wrinkles, but the basic idea remains the same.

First impressions

After I got past the visual assault of all-capital BASIC and such dense code, I tried to focus on the overall structure of the program. I thought I did a creditable job of modularizing the code at a high level, given the constraints of the tool. Each major subroutine is preceded by a short comment describing what it does, and those subroutines are reasonably short and focused.

Although I did place multiple statements per line in some places, most of that was due to the lack of a multi-line IF...ELSE construct. You'll often see

    IF <condition> THEN <stmt1>:<stmt2>
       ELSE <stmt3>:<stmt4>

I created that visual line break by inserting a LF character in the editor. That little bit of trickery made it look like a multi-line construct, although the entire thing was considered a single logical line by the interpreter. Doing that made the code more readable.

For reasons that I don’t quite recall, I didn’t like using the DEFINT A-Z statement to make all the variables implicitly integers. I think I got burned by relying on implicit definitions at some point and went on an explicit kick. The variable FOUND%, for example, is explicitly made an integer by the % suffix. Similarly, strings are identified by the $ character. Considering that the program doesn’t deal with floating point numbers at all, there’s no good reason not to have used DEFINT. It probably would have made the code more readable. I’ll dock myself on style for this one, especially because I didn’t apply it consistently.

As shown here, the program won’t handle more than 1,200 numbered lines. That’s not as bad as it sounds, though. The compiled BASIC didn’t require a number on every line–just those that were the targets of GOTOGOSUB, etc. I don’t recall how large Dean’s source code (the program I wrote this to handle) was, but I do remember that 1,200 was far more line numbers than we actually needed.

An observation: it took me a few minutes to remember what PRINT #2 is supposed to do. It writes text to file number 2. The number is assigned in the OPEN statement. My first thought on seeing that was, “Some comments would have been nice.” But in truth, the problem was that I’m unfamiliar with the language. My 21-year-old self would rightfully object if I told him to add a comment like:

    ' WRITE LINES TO OUTPUT FILE
    PRINT#2, A$

Doing so would be akin to commenting a MyStream.WriteLine(line) today.

I was too enamoured with short variable names. Whereas it’s true that code ran faster with shorter variable names, this wasn’t a particularly performance sensitive application. My usage here was inconsistent. There are plenty of variables like LINE.NUMBERS and LASTLINE, but also too many like MT%.

On a side note, I have to say that BASIC is the only language I recall ever using that allowed the period to be part of a variable name. Other languages use it as a scope separator. LINE.NUMBERS, for example, would refer to the NUMBERS field in the LINE structure. But MBASIC didn’t support the underscore or any other such separator character.

I also made some poor choices in variable names. FIND for the line number to find is a particularly good example, as is FOUND for the result. Other examples are the variable MOVE, and using CHAR% and CHAR$ in the same context, and plenty more.

It’s impossible not to use global variables in a BASIC program, since all variables are global and there are no parameters to subroutines. Communication consists of setting global variables before calling, and retrieving the results from other global variables after the call returns. Still, it’s considered good practice to keep variables as local as possible. I don’t see any egregious misuses of the global data model, but I probably could have made things more explicit by marking data transfer variables somehow.

I’d give myself good marks for the overall structure of the program and trying to maintain locality of reference. About average for variable naming. A big black mark for failing to use DEFINT and for being inconsistent with the type suffixes.

A few nasties

Take a look at this construct:

610 IF MOVE%=0 THEN
        WHILE NOT(EOF(1))
        ELSE WHILE NOT(EOF(1) AND EOF(3))

When I saw that the other day, I remembered writing it. More to the point, I remember thinking that it was incredibly clever. And it gets worse. The loop is terminated starting at line 980:

    980 IF MOVE%=1 THEN WEND:GOTO 1000
    990 WEND

That BASIC allowed such a wonky construct is kind of surprising. Imagine being able to write this in a C-like language:

    if (move == 0)
        while (!eof(1)) {
    else
        while (!(eof(1) && eof(3)) {

    // do stuff here

    if (move == 1)
        }; // ends the loop for one case
    else
        }; // ends the loop for the other case

I could have written that much more clearly and avoided the brain bending wonkiness. BASIC didn’t short circuit Boolean evaluation, but there were other alternatives.

Major demerits for that bit of cleverness, youngster. You knew better.

The first time I saw “REFRENCES”, I thought I’d made a typo. But there are four uses with that spelling, and no uses of the correct spelling, “REFERENCES.” Apparently, I didn’t know how to spell.

The verdict

Overall, I think I did a good job on that program. The most important part was that it worked. As I recall, there were a few minor bugs and one showstopper after the program was in production, but after that the program got regular use for several years. I don’t recall what any of the bugs were, and I don’t know if this version of the program includes the fixes.

It’s tempting to rewrite the program in C#, just to see how it would look. But not quite tempting enough to actually do it.

My first programming contract

Today a coworker and I were discussing the first programs we ever sold. He and I have been in the business about the same amount of time (right at 30 years), so it was a nostalgic although unfortunately brief discussion.

After dinner this evening, I got to wondering if I still had that program lying around somewhere. Thanks to the magic of multi-terabyte drives and Windows Explorer’s search facility, it was a matter of just a few minutes to locate a copy of that program. Man, I’m a pack rat.

10 'RENUMBER.BAS
 PROGRAM RENUMBERS BASIC PROGRAMS.
 BASIC PROGRAM MUST BE SAVED IN ASCII FORMAT OR EDITED WITH A
 WORD PROCESSOR SUCH AS WORDSTAR.

This was the fall of 1982. I’d left college after two years and was doing odd jobs to get spending money, but mostly I was broke. My friend Dean had a fairly large compiler BASIC program that he had to renumber because … well, that’s what happened to BASIC programs. You thought you put enough space between line numbers and sections, but inevitably you’d get squeezed.

That wasn’t a problem with interpreted BASIC, because there was a RENUM command. No such thing in the compiler BASIC, though, and you couldn’t load the compiler BASIC program into an interpreter because the code didn’t have a line number on every line.

One feature of Microsoft’s compiler BASIC is that not every line needed to be numbered. Only lines that were the targets of GOTO, GOSUB, and other such statements required line numbers.

Dean mentioned this one night and, eager for a challenge (or not smart enough to keep my mouth shut), I said that I’d write him a program to renumber his BASIC program. After all, I’d been writing BASIC programs for three whole years. I’d even taken a few college courses on programming. Of course I could whip this thing out in no time.

I honestly don’t remember how long it took me to write the program. Probably on the order of a week of evenings spent at Dean’s shop standing at one of his computer terminals (a Televideo 950 hooked to a machine running MP/M, supporting four users with a 4 MHz Z-80 processor and a whopping 256 kilobytes of RAM).

The funny thing is that I took on the project as a challenge: something fun to do while spending time with my friend. Dean might have mentioned that he’d pay me for it, and he did when I finished it, but that wasn’t my primary motivation.

I always thought it was funny that I used an interpreted BASIC program to renumber his compiler BASIC source file. In the days before source level debuggers for compiled code, the BASIC interpreter was a much friendlier environment for prototyping things. I probably would have written the program in Pascal if I’d had a compiler for it at the time. The only other language I knew at the time was Z80 assembly, and I wasn’t looking for that much of a challenge.

The program is surprisingly small–less than 400 lines of code. Looking it over brought back some memories. Surprisingly, the code is pretty good. A little difficult to follow, as even small BASIC programs are. But that’s due more to the nature of the language than my meager programming skill at the time. There also are some reasonably mature insights in the code.

More than anything, looking at that code makes me appreciate languages that have real subroutines that require parameters and return values. Calling line numbers (GOSUB) and communicating with globals is terribly difficult to follow. And the language lets you do (sometimes forces you to do) wonky things.

I think I’ll study this code a little more and do something of a code review. Might even post the program here. That should be fun. More next time.

You never know who’s listening

The barracks at military school had three red lights on each floor (deck). These were fire lights, on a different circuit (one would assume battery backed up) than the rest of the lights in the building. They were supposed to be on from dusk until dawn, and it was the Duty NCO’s job to turn them on at sunset and off at first light (or when he got up at 6:00 with the rest of us).

We, of course, called these the whorehouse lights. I can honestly say that for the first year or two I was at the school, I had no idea what else to call them. I also didn’t  really understand why they were called whorehouse lights. It wasn’t until I was 15 or 16 that I made the connection with the red lights.

The Duty NCO (usually a younger cadet–a freshman or sophomore) very often forget this important duty, and it wasn’t uncommon to hear the company commander or company first sergeant bellow down the hall, “Duty NCO, turn on the whorehouse lights!” After all, we were all guys there. Our language in the barracks was . . . coloful. Okay, profane. Immaturely so. But I digress.

So parents’ weekend rolls around and there are parents wandering through the barracks all day. But that usually doesn’t last beyond mid afternoon. By then all the parents and siblings are back at their hotels getting ready for the Marine Corps Birthday Ball to be held later that evening, and all the cadets are lounging around or cutting up before rushing to get into their dress blues just in time to make the bus for the ride into town where the Ball will be held. (The Ball is held in the basketball gym these days. Back then, our gym wasn’t large enough to hold the entire Corps of Cadets, much less their parents and dates.)

You probably know where this is going already.

I was the company commander my senior year. I walked out of my room shortly after sundown and noticed that the fire lights were off. So I did what I’d done countless times over the last year (the previous year I was the second-ranking junior cadet in the company): in my best command voice I shouted, “Duty NCO, turn on the whorehouse lights!”

Then I turned around and saw a younger cadet coming out of his room, trailed by his father, his mother, and two pre-teen sisters.

I fully expected to hear about that incident from my Drill Instructor, but I never did. Apparently the parents were either too shocked or too amused by the look on my face to further embarrass me by bringing it up with the D.I.

Samsung Galaxy Tab

I’m sitting on the couch, typing this on my new Samsung Galaxy Tab 2–a 10 inch tablet computer. I have had this machine in my hot little hands for approximately 4 hours.

My first impressions are mostly favorable. Initial setup was no problem. I’m still trying to understand the logic (?) behind the home screen, though. Pressing the “home” button doesn’t always take me to my home screen. There are six different screens on which I can place icons and widgets, and pressing “home” will take me to the proper one most of the time. Infrequently, with no apparent reason, it will take me to one of the others–sometimes one of the blank pages. Odd.

It would have been nice to get a manual wih this thing. I found several on Samsung’s site, but haven’t yet figured out which is the right one.

My only real complaint at the moment is that the tablet keeps losing its WiFi connection. I thought it was because I’m in the other room, with a couple of walls between me and the router, but it was losing the connection when I was in the same dang room, earlier. I’ll find out tomorrow if it’s my connection, or if there’s something screwy with this tablet.

I also got a faux leather case wih a Bluetooth keyboad. Typing on this litle keyboard is better than trying to type on the tablet’s screen, but it’s going to take some practice. There is only one shift key, some of the keys (like the apostrophe/quote) are in odd places, the shift has a maddening tendency to remain engaged for a little longer than necessary, resulting in things like “HEllo.” And the keys are so dang close together! That said, I think I can learn to write on this thing.

Which is one of the reasons I bought it. I’ve long wanted the ability to blog and to write things when I’m away. A laptop works okay, but it’s kind of clunky to cart around. This tablet, with the keyboad and a good writing application, just might do the trick. And there are any number of reading applications available. I do a lot of online reading as it is. With this thing, I can download PDFs and other files, and read them at my leisure, wherever I am. Or so I think.

By the way, I’m writing this in the free WordPress application for Android. Seems to work well. Let’s see if it posts.

The trick is knowing where to put it

The story is told of a homeowner who notices a squeaky step in his house and tries any number of remedies before finally giving up and calling somebody in. The handyman arrives, looks at the staircase, tries the squeaky step, stands back, scratches his beard, pulls out a nail, and hammers it in. Problem solved. Time: 5 minutes.

A week later, the homeowner receives a bill for $200. Outraged, he calls the handyman: “$200 for hammering in a single nail? I can’t pay this without an itemized invoice that justifies the outrageous price.”

A few days later, an itemized invoice arrives:

Nail: $0.50
Labor: $10.00
Knowing where to put the nail: $189.50

I’ve been working on implementing the security system for our new services framework, using some relatively new Microsoft technology (the Windows Identity Foundation support in .NET 4.5). One particular component is the Java Web Token (JWT) support that’s currently in beta (“Developer Preview”).

I won’t go into the particulars of the problem here. If you’re interested, see my Stackoverflow question on the topic. It’s interesting to note, though, that I spent a total of about four days on research, trying to understand why things weren’t working as expected, and figuring out the best place to insert my own code that makes things work. The result is about two dozen lines of code.

Once again, the solution is easy once you’ve identified the real cause of the problem. I suppose it’s a good thing that I’m not being paid per line of code . . .

Not that I couldn’t have written a whole lot more code. I could have rewritten the entire JWT handler if I wanted to, in approximately the same amount of time. I would have been less confident in the result, though. I’ve seen all too many projects in which the programmers substituted typing for thought. It’s a much better thing to spend time understanding the problem and applying very localized changes.

The robot story

Many years ago, some friends and I decided to build a robot. Dean was the primary hardware guy, and I was in charge of writing most of the software. The robot was a pretty ambitious undertaking that we never finished because most of the development team (four of us) moved away or had to spend our time on other more pressing things.

Early on in the development, Dean was wirewrapping the main logic board (Z-80 based) and I was writing the firmware on a CP/M system. By the time he had the board wirewrapped, I had the first cut of the base code finished. We burned the firmware into an EEPROM, did what bench testing we could, and then plugged the board into robot. After checking all the connections, including the RS-232 cable that we would use to control the robot during this test, we all stepped back and Dean turned on the power.

The robot’s motor came from an electric lawn mower, and the power source was a very high capacity lead-acid battery. We’d done some testing of the drive train earlier and learned that the thing was capable of some rather high speeds. Probably overkill for a little robot, but we figured we’d never tell it to go that fast. Little did we know . . .

Dean turned on the power and the robot took off across the shop like a rocket. 10 feet away from the bench, the RS-232 connector disengaged and the robot continued the rest of the way across the room and smacked into a wall. Where it stayed, its wheels spinning madly while the four of us rolled on the floor, laughing.

After picking ourselves up off the floor and turning off the robot, Dean turned to me and said, “Doesn’t your software turn off the motor at boot like I told you?”

Me: “I thought so!”

Sure enough, we looked at the code, which read:

start:
    ; turn off the drive motor
    xor a
    out (DriveMotor),a

Dean’s instructions to me at the time were, “You control the motor’s speed by outputting a value from 0 to 255 to the motor control port.” So I figured that 0 would be the lowest possible speed (i.e. stop), and 255 would be “shoot across the room like a rocket.”

Of course, Dean looked at that code and saw the problem immediately. If he explained to me why he’d designed the logic so that 255 was stop and 0 was “go like a bat out of hell,” I don’t remember the specifics. All I know is that I thought that doing things backwards made little sense.

People often look at me funny when I ask them today for clarification. A music player API, for example, might have a Volume property that’s documented as taking a number from 0 to 100. When I ask the author if 0 is mute or max, he rolls his eyes and says, “mute, of course.” Usually. But over the years I’ve run into a few backwards APIs similar to the robot’s drive motor, most often when dealing with custom hardware. I’ve learned the hard way that it pays to be sure, even if it does result in some funny looks from time to time.

Programs are not cats

I see questions like this fairly regularly:

I need to write a program that checks a database every five minutes, and processes any new records. What I have right now is this:

    void Main()
    {
        while (true)
        {
            CheckDatabaseForUpdates();
            Thread.Sleep(300000); // 5 minutes, in milliseconds
        }
    }

Is this the best way, or should I be using threads?

I’ve said before that Thread.Sleep is a red flag indicating that something is not quite right. And when a call to Sleep is in a loop, it’s a near certainty that there is a real problem. Most Sleep loops can be re-coded to use a timer, which then frees the main thread to do other things.

In this example, the “other thing” the main thread could be doing is waiting for a prompt from the user to cancel the program. As it stands, the only way to close the program would be to abnormally terminate the process (Control+C, for example, or through Task Manager). That could be a very bad thing, because the program might be in the middle of updating the database when you terminate it. That’s a virtually guaranteed path to data corruption.

I’m not going to show how to refactor this little program to use a timer because doing so is fixing a solution that is fundamentally broken at a higher level. How can this program be broken, you ask?

The program is broken because it’s doing a poor job of implementing a feature that already exists in the operating system. The program is occupying memory doing nothing most of the time. It naps, wakes up periodically, looks around, maybe does some work, and then goes back to sleep. It’s like a cat that gets up to stretch every five minutes and maybe bats a string around for a bit before going back to sleep.

Operating systems already have a way to implement catnap programs. In the Linux world, the cron daemon does it. Windows has Task Scheduler and the schtasks command line utility. These tools let you schedule tasks to execute periodically. Using them has several advantages over rolling your own delay loop.

  • You can change or terminate the schedule without having to recompile the program. Yes, you could make the program read the delay time from a configuration file, but that just makes the fundamentally broken program a little more complicated.
  • Task schedules are remembered when the computer is restarted, freeing you from having to start it manually or figure out how to put it in the startup menu.
  • If a catnap program crashes, it must be restarted manually. A scheduled task, on the other hand, will log the error, and run the next time.
  • The program doesn’t consume system resources when it’s not doing useful work. It’s like a cat that’s not even there when it’s sleeping. (Which would make for a cat that you don’t see very often, and only for very brief periods.)

Programs aren’t cats. If a program isn’t doing active work or maintaining some in-memory state that’s required in order to respond to requests, then it shouldn’t be running. If all your program does is poll a resource on a regular schedule, then you’ve written a catnap that is better implemented with the OS-provided task scheduler.