Things get so much more complicated when your development team grows larger than one person. Not only coordinating the efforts of two or more people, but also agreeing on data structures and naming conventions, designing interfaces so that you insulate developers as much as possible from others’ work, and a whole host of other problems. A source code control system relieves a large part of the worry, although you should be using source code control as soon as your development team grows larger than zero people. I won’t belabor that point here, as it’s the subject for a rather long post in itself.
One of the many things programmers like to argue about is code formatting. This topic is right up there with the discussion of the best language and best text editor. And as is the case with those topics in which each programmer has his own opinion that is unquestionably right, each programmer has his own code formatting style that is The One True Way. Anybody who does things differently is, at best, suspect.
But most development shops impose code formatting standards along with naming conventions. Why? In order to reduce confusion when somebody other than the code’s original author works on it. Without a code formatting standard, confusion arises in several ways.
The format of the code tells you things about it. For example, indentation implies a compound statement. In many languages (the C-derived languages), an opening brace begins a compound statement, which is normally indented. As creatures of habit, when we see indentation we expect to see an opening brace. And vice-versa. If the brace is not where we expect it, or the indentation is different than we’re accustomed to, then our brains have to adjust. This adjustment takes longer than you might expect. For example, consider these three blocks of code, each of which does the same thing, but they are formatted slightly differently:
if (someValue == 0) { DoSomething(); DoSomethingElse(); } if (someValue == 0) { DoSomething(); DoSomethingElse(); } if (someValue == 0) { DoSomething(); DoSomethingElse(); }
Each of those styles has its adherents and detractors, and objectively each is as valid as the other. But two of them make my brain hurt.
Most programmers these days use editors that perform some level of automatic formatting when you enter new code. If my editor is configured to create things in the One True Way, the result of editing a file that was created with one of the heretical formatting techniques is a complete mess.
Absent imposed coding standards, there are four possible solutions to the problem, none of which is satisfactory:
- Just ignore the problem. Believe me when I say that you don’t want to do this. Few things are as confusing as working on a single file that has inconsistent formatting.
- Disable the automatic formatting features of the editor. This is possible, but makes you much less productive. The automatic formatting features save a lot of time, especially when you’re refactoring code, as it prevents you from having to manually indent or unindent code to make sure that the braces match up.
- Reconfigure your editor to match the coding style of whichever file you’re currently working on. This is problematic because programmers often (probably most often) are working on multiple files at the same time. No editor that I know of will allow you to specify the configuration on a per-file basis. (On a file type basis, yes. But not per file.)
- Instruct the editor to reformat code to your style whenever you open a file. Although this sounds like a good idea to begin with, it’s disastrous when source control is involved which, as I pointed out above, should be for every project. Source control typically saves the deltas–changes between two versions of a file–rather than saving the entire file every time it’s modified. This is a huge space savings, and also allows you to view the history to see the specific changes between two versions. If you reformat the entire file, then the source control system will assume that (almost) every line changed. The result is that your source control database will be orders of magnitude larger than it has to be, and the significant changes between versions will be lost in the reformatting noise.
There is one other option that is not currently available, but should be. If the source control system (or a filter in front of the source control system) could normalize a file–convert it to a standard form–then the fourth option above would be possible. You could get a file from source control in whatever the normalized form is, open it in your editor and reformat to your heart’s content. When you saved the file and checked it in, the source control system would reformat it to the common form and store only the deltas. This allows each programmer to view and edit code in whatever format he is most comfortable with, but also allows the source control system to be efficient and effective.
I realize that I’m waving my hand over some implementation detail, particularly the many filters that would be required to format different kinds of files. But nothing here sounds terribly difficult with today’s tools. Has this been done? If not, why?
Update, 2012/04/18
I gave this idea quite a bit of thought after writing the blog entry, and finally decided that it’s not a good idea. There are some interesting issues involved in creating the tools that would be needed, but I don’t think those are insurmountable. The bigger problem is one of debugging. Imagine this scenario.
- You get a bug report from a tester. Part of that bug report says that the program crashed on line 263.
- You get the source code that the test version was built from.
- Your development environment re-formats the code to your desired style.
- Line 263 in your source file is not the same as line 263 in the source repository.
Sure, you could use an external tool to view the code in its canonical format, thereby ensuring that line 263 is where it’s supposed to be. But why? And what are you going to do when there are many different files and line numbers involved?
I came to the conclusion that all of the code in a project should use one and only one indentation style. If the developers can’t agree on that, then it should be imposed from above. The truth is that it takes about two days to become accustomed to a new indentation standard. Save your creativity for things that really matter.