Many programmers who write code to work with databases like the concept of an “upsert” command: passing a record to the database and letting it figure out whether the record should be added or updated. If the record already exists in the database then it’s updated. Otherwise a new record is created. It’s very convenient.
It’s also a terrible idea. At least, that’s my experience. Many years ago I argued strongly for an “upsert” command, and on one project I created my data access layer with such a thing. We weren’t using stored procedures on that project, so the “check and update if there” logic was written in code. It worked. It also caused no end of problems, and I swore that I’d never use the upchuck (err … upsert) model again. And I didn’t, until I inherited this project.
The project I’m working on uses the upsert model throughout. The undocumented but well-understood convention is that none of our database tables have a record id of 0. So if I pass a record with id 0 to an upsert stored procedure, the record won’t be found in the table and a new record is inserted. This is convenient for the guy who writes the database access code because he only has to write one method to handle both insert and update. When you’re writing CRUD code for tables that have dozens of fields, that seems like a real savings: you write a whole lot less code.
But it comes at a cost that many don’t recognize: clients who call the upsert method have to know that record id 0 has a special meaning. That’s fine for the original development team, but it’s not at all obvious to a programmer who comes on to the project some time after the program is in production.
It gets worse. In our project we have a Categories table that contains category names and other information. Each Reward that we create belongs to exactly one category, the identifier of which is stored in the Rewards record. There are UI screens that allow system administrators to add new categories. The project is pretty mature now, so we don’t often add new categories.
Unfortunately, whoever wrote those UI screens wasn’t too hip to things. To create the drop-down list of categories, one would typically get all of the categories from the database and create a combo box (drop down) that lets the user select a category from the list. Sometimes the dropdown will have an initial value something like “Select a category,” with a default value. Usually, that label is added in code after getting the list of categories from the database. But not in this case. No, the genius who designed these screens decided that he’d create a category with an id of 0, and text “Select a category.” And he decided that his “no record” id would be -1.
So for Categories and a few other configuration tables, you pass -1 for the record id if you want to insert a new record. Passing 0 just overwrites the existing “Select a category” record.
So now the rule is “Passing a record id of 0 inserts a new record. Except for tables Foo, Bar, and Idiot, for which you must pass a record id of -1.” (Actually, any negative number will do, but I digress.)
Yeah. That’s like totally obvious. Not!
So when we did some refactoring in that area, one of my guys, relatively new to the project, made the reasonable assumption that 0 meant “new record” for this table just like it does for every other table. Imagine our surprise when QA told us that adding a new category just overwrote the record with ID 0.
I’ll grant that the programmer who introduced this bug is at fault for not noticing the difference. I put more blame the programmer who thought it’d be a good idea to special case these few tables for making the mistake not only possible but almost guaranteed to happen. But ultimately the blame goes to whoever decided that the Upsert model was a good idea. If we had separate Insert and Update methods and stored procedures, then there would be no question what was happening. With a separate Insert method, you’d never pass a record id, which would make it impossible to inadvertently do an Update when you meant to do an Insert.
One argument programmers make for the use of Upsert is that, with it, the create new record and edit existing record functions can be identical: initialize the record (either by reading an existing one from the database, or creating a new one with default values), and call the Edit method. When editing is done, just call Upsert and things “just work.” I’ve written code like that. I’ve also been bitten by it because that common Edit method always ends up needing a flag to indicate whether you’re creating new or editing existing, and the method is peppered with if (IsEdit) conditionals. In the end, the “common” Edit method is hugely complex and very fragile. It’s been my experience that the temptation to combine create and edit functionality is very strong, but giving in to it is the first step on a long and painful road.
Upsert is evil. It might be useful on the database end, but it has no use in a data access layer or anywhere else a mere mortal programmer might be. The programmer should know whether his code is editing an existing record or creating a new record, and his code should call the Update or Insert method, as appropriate.