Serialize enums as numbers rather than as strings

Imagine that your application works with different document types. You have a server that stores documents in a database, and several applications that query the database, and briefly cache query results in shared memory.

One of the fields that’s returned from a search is the document type, which you’ve encoded in a C# enumeration:

    public enum DocumentType
    {
        WordSmasher = 1,
        NumberCruncher = 2,
        MessageMangler = 3
    }

And in your document record:

    public class DocumentSummary
    {
        public string Name { get; set; }
        public DocumentType DocType { get; set; }
        // other stuff
    }

Now, say you serialize that to JSON, converting the enum values to strings. You end up with:

    {
        "Name":"MyDocument.msh",
        "DocType":"WordSmasher"
    }

All of the applications include a shared library that defines common data structures and such, like DocumentSummary and DocumentType.

This all works great until you add support for a new type of document: the company’s new PowerPoint competitor, called SleepyAudience. You update your enumeration:

    public enum DocumentType
    {
        WordSmasher = 1,
        NumberCruncher = 2,
        MessageMangler = 3,
        SleepyAudience = 4
    }

Then you compile and test your server code, build one of the other applications to test things with, and release it to the world. Only to find that the other applications, which haven’t yet been rebuilt, die a horrible death when they see the type “SleepyAudience”.

The application crashes because the deserialization code doesn’t know what to do when it finds the string “SleepyAudience” in the DocType. As far as that code is concerned, the DocType field can be an integer value, or it can have one of the first three strings. “SleepyAudience” is not the name of an enum value, as far as the deserializer is concerned.

The easiest solution to this problem is to encode the enumeration value as a number, rather than as a string. Had the DocType been encoded as a number, the deserializer would have read and populated a DocumentSummary instance. It’s then up to the application code to decide what to do with a document type that it is not familiar with. Perhaps the application would filter that document from the list presented to the user. Or it would display “Unknown document type.” Whatever the case, it certainly shouldn’t crash.

You lose a little bit of readability by encoding the enum value as a number, but only if you’re in the practice of eyeballing JSON data. And, yes, it’s often easier for people who are writing code that interacts with your API to understand what “WordSmasher” means. But we programmers are used to associating numbers with names. It’s no particular burden for a programmer to look up the value “1” in the API documentation to determine that it identifies the WordSmasher application.

One solution to this problem, of course, is to make sure that all applications that depend on the enumeration are rebuilt and released whenever the enumeration changes. That would solve the problem, but it’s not practical when you have dozens of different applications online all the time. You can’t just stop the world for a few hours (or even a few minutes) to apply updates.

You could also argue that the applications themselves are at fault, and I would agree with that assessment. After all, an application shouldn’t crash horribly because it got some bad data. The application should ignore that one data item (in this case, a single document record from a list of documents), log an error message, and continue. But standard serialization libraries aren’t that robust. They either understand the entirety of what’s sent, or they roll over and die.

The Robustness principle says “Be conservative in what you send, be liberal in what you accept.” When applied to deserialization it means that if you don’t understand something, deal with it gracefully; do the best you can. Fail gracefully. Interpret the value or ignore it and supply a default. If you can’t do that, then discard the record. If you can’t discard the record, discard the input. At worst return an empty data set. But under no circumstances should the program go down in flames because it didn’t understand some input data.

Regardless of who’s at fault here, the point is that encoding enumeration values as strings has very little, if any, real benefit, and makes it harder to write robust deserialization code. Especially when working with deserialization libraries (as opposed to rolling your own deserialization code), you should do whatever you reasonably can to ensure that the deserializer can interpret your data. Let the rest of your application decide how to proceed, after the data has been read.

In my opinion, the application I describe above has a more fundamental flaw in that the document types it supports are hard coded in an enumeration. But that’s a discussion for another day.