When faced with inexplicable program behavior, inexperienced programmers often blame the operating system, the runtime library, the compiler, or some other external force for the error. Even when they discover that the bug is in their code, these programmers often will try to blame it on something else. A sure sign of maturity in a programmer is how he approaches bugs. If, when faced with a bug, he concentrates his initial efforts on looking for the problem in his own code, you know that he’s learned.
I learned long ago. In a little more than 30 years of programming computers (about 25 years doing it full time), I can think of only a handful of cases in which a bug in one of my programs was caused by anything other than my own error. I know that there are bugs in operating systems and runtime libraries, but it’s not often that they cause problems for me.
So imagine my surprise when, within the space of one week, I identified two previously unknown (or unreported, as far as I can tell) genuine bugs in the Microsoft .NET Framework runtime libraries. I wrote about the first one the other day. Microsoft has acknowledged that it is a bug and is currently reviewing it.
I originally thought that the new bug was in the Uri.TryCreate method, because it throws an exception in some circumstances even though the documentation says that it won’t throw an exception but rather will return False if it’s unable to create a valid Uri from the input parameters. And although throwing an exception in this case is (maybe) a bug, the cause of the bug is something else: the Uri constructor allows you to construct invalid Uri instances.
In my particular case, my Web crawler crashed because Uri.TryCreate threw an exception. That was very unexpected, and whatever parameters caused it are lost to me. But it’s pretty rare. I pass somewhere upwards of 250 million urls through that function every day. Unable to reconstruct the exact parameters that caused the problem, I used what I learned from poking around in the disassembled code to come up with a string that illustrates the problem.
The Uri constructor creates a Uri instance from a passed string. Uri is pretty cool in that it supports many types of resources, not just HTTP. One such type is a mailto: URI of the form mailto:jim@mischel.com.
But the Uri constructor will succeed when it should fail. For example:
Uri mailUri = new Uri("mailto:jim@mischel.comtest@mischel.com");
// trying to access mailUri.AbsoluteUri at this point will throw UriFormatException
// with the message "The host name cannot be parsed"
Passing that invalid Uri as the baseUri parameter to Uri.TryCreate throws the exception that I encountered in the crawler.
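Until the bug is fixed, a crawler could guard against it with a defensive wrapper along the following lines. This is only a minimal sketch, not a fix. The names UriGuard, IsUsable, and TryCreateSafe are hypothetical, and the code assumes the failure always surfaces as a UriFormatException, as it does in the example above.

```csharp
using System;

// Hypothetical helper class; not part of the .NET Framework.
public static class UriGuard
{
    // Returns true only if the Uri is absolute and its AbsoluteUri
    // property can be read without throwing UriFormatException.
    public static bool IsUsable(Uri uri)
    {
        if (uri == null || !uri.IsAbsoluteUri)
        {
            return false; // a base Uri must be absolute to be useful here
        }
        try
        {
            // Reading AbsoluteUri is the access that throws for the
            // malformed mailto: example shown earlier.
            string unused = uri.AbsoluteUri;
            return true;
        }
        catch (UriFormatException)
        {
            return false;
        }
    }

    // Wraps Uri.TryCreate(Uri, string, out Uri) so that a bad base Uri
    // produces a false return value rather than an exception.
    public static bool TryCreateSafe(Uri baseUri, string relativeUri, out Uri result)
    {
        result = null;
        if (!IsUsable(baseUri))
        {
            return false;
        }
        try
        {
            return Uri.TryCreate(baseUri, relativeUri, out result);
        }
        catch (UriFormatException)
        {
            // Assumption: this is the only exception type the bug produces.
            result = null;
            return false;
        }
    }
}
```

Calling UriGuard.TryCreateSafe(baseUri, relativeUri, out result) in place of Uri.TryCreate keeps the documented contract (return false, never throw) at the cost of one extra property read per base Uri.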
What I find most curious about all this is that the Uri class appears to have two different methods for parsing strings. There appears to be one parser for constructing Uri instances from strings (as in the constructor), and a separate parser used internally for various things. I know from experience that parsing URIs is a horrendously difficult problem and hard to do correctly. Why anybody would want to write, test, and maintain two different versions of such difficult code is beyond me.
Bug reported. Awaiting response.