Bad Boolean interfaces

Today I got a bug report saying that an error occurred while trying to save a record. That there was an error in this code I inherited isn’t unusual. It was odd, though, to encounter an error in saving a record. That particular bit of code executes hundreds of times per day and I’d never seen a bug report. A few minutes looking at the logs led me to this snippet:

    var outlets = outletRepository.GetOutlets();
    var outletName = outlets.First(outlet => outlet.Id == myRecord.OutletId).Name;

For those of you who don’t speak C#, all that’s doing is getting the list of Outlet records from the database and locating the one that’s referenced by myRecord. We do it this way rather than just looking up the single outlet because we use the returned list of outlets later in the method.

At first I couldn’t figure out why that failed. After all, our database has referential integrity, so myRecord can’t possibly (as far as I know) reference an outlet that doesn’t exist.

It turns out that GetOutlets is defined as:

    public List<Outlet> GetOutlets(bool activeOnly = true)

So a call to GetOutlets() by default returns just the active outlets. Because the outlet referenced by this particular record was marked as inactive, that outlet was not returned by GetOutlets, and as a result the call to Enumerable.First threw an exception.

The most glaring error here is the use of First, which I contend should be used only if you know that what you’re looking for is somewhere in the list. Otherwise you probably should call FirstOrDefault and then check for a null return value.

A more insidious error is the design of the GetOutlets method. In the first place, using a Boolean value to determine whether the method returns all outlets or just the active outlets is a bad idea. true or false are rather ambiguous here. You can’t tell by looking at the value passed what attribute is being tested, or if the flag is used for something else entirely. Maybe true means “sort the records.” Programmers are somehow supposed to “know” what it means. In any case, if the parameter is required the programmer knows that he has to pass true or false in order for it to work. And one would assume that an unfamiliar programmer would check the parameter name and pass the appropriate value. But when you give the parameter a default value, you’ve effectively removed any roadblock to using the method incorrectly. Certainly the statement outlets = outletRepository.GetOutlets(); doesn’t convey any information about the parameter–or that it even exists.

Don’t use Boolean values like this. If you want two forms of the GetOutlets method, then write them.

    public List<outlet> GetAllOutlets();
    public List<outlet> GetOnlyActiveOutlets();

Another option, less preferred but possibly acceptable, would be to define an enumeration that explicitly states what you’re looking for:

    enum OutletFilterType

    public List <outlet> GetOutlets(OutletFilterType filter);

And to call it:

    outlets = GetOutlets(OutletFilterType.All);
    outlets = GetOutlets(OutletFilterType.OnlyActive);

But do not give that parameter a default value.

If you really must take shortcuts in your private implementation–a practice I strongly discourage, by the way, because private code has a distressing tendency to become public–then fine. Make a mess in your own house if you wish, but keep your public interfaces clean and unambiguous.

Every time I think I’m getting a handle on this code, I uncover another little code bomb. The code is littered with this GetXXX(true/false) pattern, and the use of First to get things from untrusted lists. I suppose you could call it job security, but I’m getting tired of it.

Delete the UPSERT

Many programmers who write code to work with databases like the concept of an “upsert” command: passing a record to the database and letting it figure out whether the record should be added or updated. If the record already exists in the database then it’s updated. Otherwise a new record is created. It’s very convenient.

It’s also a terrible idea. At least, that’s my experience. Many years ago I argued strongly for an “upsert” command, and on one project I created my data access layer with such a thing. We weren’t using stored procedures on that project, so the “check and update if there” logic was written in code. It worked. It also caused no end of problems, and I swore that I’d never use the upchuck (err … upsert) model again. And I didn’t, until I inherited this project.

The project I’m working on uses the upsert model throughout. The undocumented but well-understood convention is that none of our database tables have a record id of 0. So if I pass a record with id 0 to an upsert stored procedure, the record won’t be found in the table and a new record is inserted. This is convenient for the guy who writes the database access code because he only has to write one method to handle both insert and update. When you’re writing CRUD code for tables that have dozens of fields, that seems like a real savings: you write a whole lot less code.

But it comes at a cost that many don’t recognize: clients who call the upsert method have to know that record id 0 has a special meaning. That’s fine for the original development team, but it’s not at all obvious to a programmer who comes on to the project some time after the program is in production.

It gets worse. In our project we have a Categories table that contains category names and other information. Each Reward that we create belongs to exactly one category, the identifier of which is stored in the Rewards record. There are UI screens that allow system administrators to add new categories. The project is pretty mature now, so we don’t often add new categories.

Unfortunately, whoever wrote those UI screens wasn’t too hip on things. To create the drop-down list of categories, one would typically get all of the categories from the database and create a combo box (drop down) that lets the user select a category from the list. Sometimes the dropdown will have an initial value something like “Select a category,” with a default value. Usually, that label is added in code after getting the list of categories from the database. But not in this case. No, the genius who designed these screens decided that he’d create a category with an id of 0, and text “Select a category.” And he decided that his “no record” id would be -1.

So for Categories and a few other configuration tables, you pass -1 for the record id if you want to insert a new record. Passing 0 just overwrites the existing “Select a category” record.

So now the rule is “Passing a record id of 0 inserts a new record. Except for tables Foo, Bar, and Idiot, for which you must pass a record id of -1.” (Actually, any negative number will do, but I digress.)

Yeah. That’s like totally obvious. Not!

So when we did some refactoring in that area, one of my guys, relatively new to the project, made the reasonable assumption that 0 meant “new record” for this table just like it does for every other table. Imagine our surprise when QA told us that adding a new category just overwrote the record with ID 0.

I’ll grant that the programmer who introduced this bug is at fault for not noticing the difference. I put more blame the programmer who thought it’d be a good idea to special case these few tables for making the mistake not only possible but almost guaranteed to happen. But ultimately the blame goes to whoever decided that the Upsert model was a good idea. If we had separate Insert and Update methods and stored procedures, then there would be no question what was happening. With a separate Insert method, you’d never pass a record id, which would make it impossible to inadvertently do an Update when you meant to do an Insert.

One argument programmers make for the use of Upsert is that, with it, the create new record and edit existing record functions can be identical: initialize the record (either by reading an existing one from the database, or creating a new one with default values), and call the Edit method. When editing is done, just call Upsert and things “just work.” I’ve written code like that. I’ve also been bitten by it because that common Edit method always ends up needing a flag to indicate whether you’re creating new or editing existing, and the method is peppered with if (IsEdit) conditionals. In the end, the “common” Edit method is hugely complex and very fragile. It’s been my experience that the temptation to combine create and edit functionality is very strong, but giving in to it is the first step on a long and painful road.

Upsert is evil. It might be useful on the database end, but it has no use in a data access layer or anywhere else a mere mortal programmer  might be. The programmer should know whether his code is editing an existing or creating a new record, and his code should call the Update or Insert method, as appropriate.




A sample text widget

Etiam pulvinar consectetur dolor sed malesuada. Ut convallis euismod dolor nec pretium. Nunc ut tristique massa.

Nam sodales mi vitae dolor ullamcorper et vulputate enim accumsan. Morbi orci magna, tincidunt vitae molestie nec, molestie at mi. Nulla nulla lorem, suscipit in posuere in, interdum non magna.