You must have this whether or not you need it

We converted all of our projects to Visual Studio 2010 a couple of weeks ago, as a first step towards upgrading the production machines to .NET 4.0.  I was very pleased at how smoothly the conversion from Visual Studio 2008 went.  There were a few glitches, but in just a day we had all 100+ projects converted and nothing major went wrong.

Since then, I’ve been testing the code with .NET 4.0, making sure that programs still talk to each other, that we can still read our custom data format, and that the programs actually work as expected.  Call me paranoid, but when the business depends on the software, I try to make absolutely sure that everything’s working before I throw the switch on a major platform change.

All the tests have been successful.  I’ve not found anything that breaks under .NET 4.0.  I hadn’t expected anything, but I had to test anyway.  So this morning my job was to merge my project changes into our source repository and ensure that the automated build works.

The merge was no problem.  The automated build failed.  It failed because I didn’t check in some files.  Which files?  I’m so glad you asked.

A convention in .NET is that you can store application-specific configuration information in what’s called an application configuration file that you ship along with the executable.  The file, if it exists, is usually named <application name>.config, so if your executable program is hello.exe, then the configuration file would be named hello.exe.config.  As a convention, it works quite well.

The major point here is that the application configuration file is optional.  If you don’t want to store any information there, then you don’t need the file.
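To see just how optional it is, here’s a small sketch of reading a setting through ConfigurationManager (the key name "Greeting" is my own invention for illustration; it requires a reference to System.Configuration.dll). If hello.exe.config doesn’t exist, or doesn’t contain the key, the lookup simply returns null and the program carries on:

```csharp
using System;
using System.Configuration; // reference System.Configuration.dll

class Hello
{
    static void Main()
    {
        // Looks for hello.exe.config next to the executable.  If the file
        // (or the key) is absent, AppSettings returns null -- no error.
        string greeting = ConfigurationManager.AppSettings["Greeting"];
        Console.WriteLine(greeting ?? "Hello (no configuration file; using the built-in default)");
    }
}
```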

In Visual Studio, the tool lets you create a file called App.config to hold the configuration information.  When you build the program, Visual Studio copies App.config to the build output directory and renames it.  Continuing the above example, App.config would become hello.exe.config.

So far so good.  All that still works as expected in Visual Studio 2010.  Until, that is, you change the program’s target framework in the build options from .NET 3.5 to .NET 4.0.  Then, Visual Studio becomes helpful.  It automatically creates an App.config file for you, places it in your source directory, and updates your project settings to reflect the new file.  So helpful.

And so broken!

It’s broken because when I subsequently checked in the modified project (which I thought I had only changed to target .NET 4.0), the project also contained a new reference, unknown to me, to the App.config file, which I did not check in.  So when my automated build script checked out the code and tried to build the solution, every executable project failed with a missing-file error.

That Visual Studio added a file and didn’t tell me is merely annoying.  What really chaps my hide, though, is that the file it added is extraneous.  It contains no required information.  Here is the entire contents of the file that Visual Studio so helpfully added:

<?xml version="1.0"?>
<configuration>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0"/>
  </startup>
</configuration>

My options at this point are: 1) Go through the compiler errors and check in every App.config file that was automatically created; 2) Edit the projects to remove the file references.

I’d like to do #2, but if I do I’ll just run into this same problem again the next time I upgrade Visual Studio.
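For anyone considering option #2: the reference Visual Studio adds lives in the project (.csproj) file, in an item group that looks something like this (your project may list other files in the same group):

```xml
<ItemGroup>
  <None Include="App.config" />
</ItemGroup>
```

Deleting that None item (and the App.config file itself) removes the dependency, at least until the next upgrade puts it back.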

The final result is that it took me an extra hour to get my build to work, and now every executable program that I deploy has an extraneous file tagging along as useless baggage.  It’s extra work to eliminate those files in the deployment process, and in doing so I might eliminate one that I really need.  In the past, only programs that really needed the configuration files had them, and I knew that the existence of the file meant that there were settings I could modify.  Now that every program has a configuration file, I’ll be forever checking to see if there are settings that change the application’s behavior.

So, to sum up.  Visual Studio adds an unnecessary file, modifies the project, doesn’t notify me of the added file, fails to build if the file isn’t checked in, forces me to deploy that unnecessary file or somehow modify my build process to delete it, and the resulting file will lead to uncertainty on the part of the user.  Wow.  Who would have thought that a single simple feature could cause so much unnecessary work and annoyance?

Bikes, shoes, and armadillos

Armadillos are an occasional feature on the local roadways.  Dead armadillos, unfortunately.  In 15 years of living in the Austin area I’ve seen three live armadillos.  The rest have been dead–splattered on the highway like a ‘possum on a half shell.  But I’ve seen more than usual on my bike rides the last few weeks.  I’m thinking that the rain we got from Hermine (16 inches at our place) drove the creatures out of their normal hiding places and they’ve been on the roads in unusual numbers.

But I can’t explain the number of shoes I’ve seen on the road lately.  I’ve occasionally seen a shoe on the side of the road in the past, but last week I saw at least one shoe on every bike ride.  I’m not talking ratty old beat up shoes, either, but relatively new and expensive looking running shoes.  The majority were on the 1.25 mile stretch between the office and the data center.  I ride that nearly every day to retrieve the backup drive.  Most days, I saw a different shoe lying in the road.  My only explanation is that the road is used by high school students, and somebody (or several somebodies) thought it’d be funny to throw his friend’s shoe out the window.

The other road on which I encountered shoes is a major surface street, and I suppose also used by high school students on their way to and from school.  Whatever the case, if you figure $100 for a good pair of shoes, last week I saw $400 or $500 worth of shoes on the ground.  Always just one shoe at a time, though, and I don’t think any two together would have made a matched pair.

Yeah, one gets an entirely new perspective of the road when riding on the shoulder at 15 or 20 MPH.

I picked up an old rusty metal file from the shoulder of the road the other day.  I don’t particularly need a rusty file, but I’ve been told that they contain very good steel.  A number of wood carvers I know make custom knives from discarded files.  I thought I’d give it a try.

I’ve signed up to ride the Outlaw Trail 100 on October 9.  Debra and I did this ride in 2005–her first century.  I’ve continued my training since the Hotter ‘N Hell ride, and expect to do a bit better here in Round Rock.

Unwise conventional wisdom, Part 1: Locks are slow

In two different discussions recently I had somebody tell me, “locks are slow.” One person’s comment was “Locks should be avoided whenever possible. They’re slow.” This is a bit of conventional wisdom that’s been around for decades and seems to be getting more prevalent now that more programmers are finding themselves working with multithreaded programs.

And, like all too much conventional wisdom, it’s just flat wrong.

A lock is a cooperative synchronization primitive that is used in computer programs to provide mutually exclusive access to a shared resource. Yeah, that’s a mouthful. Let me put it into simpler terms.

When I was in Boy Scouts, we’d sit around the campfire at night and tell stories. The person who held the “speaking stick” was the only one allowed to talk. My holding the stick didn’t actually prevent anybody else from talking. Rather, we all agreed on the convention: he who holds the stick talks. Everybody else listens and waits his turn. The stick was a cooperative mutual exclusion device.

Programming locks work the same way. All threads of execution agree that they will abide by the rules and wait their turn to get the stick before accessing whatever resource is being protected. This is very important because many things in the computer do not react well if two different processes try to access them at the same time. Let me give you another real-world example.

Imagine that you and a co-worker both need to update an Excel spreadsheet that’s stored in a shared location. You take a copy of the file and start making your changes. Your co-worker does the same thing. You complete your changes and copy the new file over to the shared location. Ten minutes later, your co-worker copies his changed version of the document, overwriting the changes that you just made. The same kind of thing can happen inside a computer program, but it happens billions of times faster.
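That lost-update failure is easy to demonstrate in code. Here’s an illustrative sketch of my own (not from the original samples): two threads each increment a pair of counters a million times, one counter protected by a lock and one not. The locked counter always ends at exactly 2,000,000; the unprotected one usually comes up short, because an increment is a read-modify-write and concurrent updates overwrite each other, just like the spreadsheet:

```csharp
using System;
using System.Threading;

class LostUpdateDemo
{
    public static int unprotectedCount = 0;
    public static int protectedCount = 0;
    static readonly object lockobj = new object();

    static void Work()
    {
        for (int i = 0; i < 1000000; ++i)
        {
            ++unprotectedCount;          // read-modify-write; updates can be lost

            lock (lockobj)               // only the "stick holder" may update
            {
                ++protectedCount;
            }
        }
    }

    static void Main()
    {
        Thread t1 = new Thread(Work);
        Thread t2 = new Thread(Work);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();

        // protectedCount is always 2,000,000; unprotectedCount usually isn't.
        Console.WriteLine("unprotected: {0}  protected: {1}",
            unprotectedCount, protectedCount);
    }
}
```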

And so we use locks and other synchronization primitives to prevent such things from happening. Locks are common because they’re very simple to understand, easy to use, and quite effective. There are potential problems with locks, but any tool can be misused.

So let’s get back to the conventional wisdom. Are locks really slow? Rather than guess, let’s see how long it takes to acquire a lock. The first bit of code executes a loop, incrementing a variable one million times. The second code snippet does the same thing, but acquires a lock each time before incrementing the variable and then releases the lock afterwards. The code samples are in C#.

// Without a lock
int trash = 0;
for (int i = 0; i < 1000000; ++i)
{
    ++trash;
}

// Using a lock
int trash = 0;
object lockobj = new object();
for (int i = 0; i < 1000000; ++i)
{
    lock (lockobj)
    {
        ++trash;
    }
}
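For anyone who wants to reproduce the comparison, a minimal harness along these lines (my own sketch, using Stopwatch from System.Diagnostics) will do; the absolute numbers will of course vary by machine and by compiler optimization settings:

```csharp
using System;
using System.Diagnostics;

class LockTiming
{
    const int Iterations = 1000000;
    public static long PlainMs, LockedMs;

    public static void Main()
    {
        object lockobj = new object();
        int trash = 0;

        Stopwatch plain = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; ++i)
        {
            ++trash;
        }
        plain.Stop();
        PlainMs = plain.ElapsedMilliseconds;

        Stopwatch locked = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; ++i)
        {
            lock (lockobj)
            {
                ++trash;
            }
        }
        locked.Stop();
        LockedMs = locked.ElapsedMilliseconds;

        // Printing trash keeps the optimizer from eliminating the loops.
        // (LockedMs - PlainMs) / Iterations approximates the cost of one
        // uncontended acquire/release pair.
        Console.WriteLine("trash={0}  without lock: {1} ms  with lock: {2} ms",
            trash, PlainMs, LockedMs);
    }
}
```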

Timing those two bits of code reveals that the second takes almost one-tenth of a second longer than the first.  So it takes approximately 0.10 seconds to acquire and release a lock one million times.  That’s about 10 million locks per second, or 0.10 microseconds (100 nanoseconds) per lock.  (It’s actually closer to 80 nanoseconds, but 100 is a nice round number.)  I know, I can hear the performance-sensitive screaming already, “Oh my bleeding eyeballs! 100 nanoseconds! That’s like 400 clock cycles! That’s an eternity to my super fast machine!”

And he’s right. To a computer running at 4 GHz, 100 nanoseconds is a pretty long time to spend doing nothing. But locks don’t execute in isolation. They’re there to protect a resource, and accessing or updating that resource takes time–usually a whole lot longer than it takes to acquire the lock. For example, let’s say we have this method that takes, on average, about one microsecond (that’s 1,000 nanoseconds) to execute when there is no lock.

void UpdateMyDataStructure()
{
    // do cool stuff here that takes a microsecond
}

Then we add a lock so only one thread at a time can be doing that cool thing.

static object lockobj = new object();
void UpdateMyDataStructure()
{
    lock (lockobj)
    {
        // do cool stuff here that takes a microsecond
    }
}

It still takes 100 nanoseconds to acquire the lock when it’s not contended (that is, when no other thread already has the lock), but doing so only adds 10 percent to the total runtime of the method. I know, I know, more bleeding eyeballs. Programmers lose sleep over 10 percent losses in performance. Dedicated optimizers will go to great lengths to save even five percent, and here I’m talking about 10 percent like it’s nothing. Let’s talk about that a bit because there are several issues to consider.

There’s no doubt that the lock is adding 10 percent to the method’s total runtime. But does it really matter? If your program calls that method once per second, then in a million seconds (about 12 days) acquiring the lock will have cost an extra second in run time. The 10 percent performance penalty in that one method is irrelevant compared to the total runtime of the program.

We also can’t forget that there are multiple threads calling that same method. Sometimes one thread will already be holding the lock when another thread tries to acquire it. In that case, the thread coming in will have to wait up to one microsecond before it can acquire the lock, meaning that executing the method can take a whopping 2,100 nanoseconds! (1,000 nanoseconds for the first thread to complete, 100 nanoseconds to clear the lock, and another 1,000 nanoseconds to do its own cool stuff.) By now my friend’s eyeballs have exploded.

And things only get worse as you add more threads and call the method more often. But again, does it matter? At an average of 1,100 nanoseconds per call, that method can be called more than 900,000 times per second. Without the lock, you can call it a million times per second. It seems to me that if your program spends essentially all of its time in that one method, you have a much bigger problem than the lock. Your entire program is dependent on the performance of that one method, and that’s true whether or not you have multiple threads accessing it.

And don’t forget the most important point: the lock or something like it is required. You’re protecting a resource that you’ve determined will not support simultaneous access by multiple threads. If you remove the lock, the program breaks.

The conventional wisdom that locks are slow is based on two things. From the optimizer’s point of view, a lock is slow because it adds to the amount of time required to execute a particular bit of code, and doesn’t provide any immediately recognizable benefit. It’s just extra machine cycles. The optimizer doesn’t care that the time required for the lock is a minuscule portion of the total program runtime.

The other reason locks are considered slow is because an application can become “gated” by a lock. That is, all of the threads in the application spend an inordinate amount of time doing nothing while waiting to acquire a lock on a critical resource. Therefore, concludes the programmer who’s profiling the application, the lock must be slow. It doesn’t occur to him that the lock isn’t the problem. Any other means of synchronizing access to the critical resource would exhibit similar problems. The problem is designing the program to require mutually exclusive access to a shared resource in a performance-critical part of the code.

That may seem like a fine distinction to some, but there is an important difference. It’s one thing to complain if I were to install an 800 pound steel door on your linen closet because I felt like it, and something else entirely to complain if I did it because you told me to. If you design something that has to use a lock, then don’t get upset at the lock when it turns out to be inappropriate for the task at hand.

There are many alternatives to locks. There have been some important advances recently in lock-free concurrent data structures like linked lists, queues, stacks, and dictionaries. The concurrent versions aren’t as fast as their exclusive access counterparts, but they’re faster to access than if you were to protect the non-concurrent version with some sort of lock. The other primary alternative is to redesign the program to remove the requirement for exclusive access. How that’s done is highly application dependent, but often requires a combination of buffering and favoring infrequent long delays over frequent short delays.
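In .NET 4.0 (the very framework this upgrade is about), those concurrent structures live in the new System.Collections.Concurrent namespace, and simple counters can skip the lock entirely with Interlocked. A small sketch of both, assuming the usual producer/consumer shape:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class Alternatives
{
    public static int counter = 0;

    public static void Main()
    {
        // Interlocked performs an atomic read-modify-write -- no lock needed
        // for a simple shared counter.
        Interlocked.Increment(ref counter);

        // ConcurrentQueue is safe for simultaneous producers and consumers
        // without any external locking.
        var queue = new ConcurrentQueue<string>();
        queue.Enqueue("work item");

        string item;
        if (queue.TryDequeue(out item))
        {
            Console.WriteLine(item); // process the dequeued item
        }
    }
}
```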

To recap: locks are not slow. Used correctly, a lock provides safe, effective, and efficient mutually exclusive access to a shared resource. When it appears that a lock is slow, it’s because you have gated your application on access to that shared resource. If that happens, the problem is not with the lock, but with the design of the application or of the shared resource.

Spam problem found, but solution questionable

A few months ago I noticed a marked increase in the amount of spam that I was receiving.  At the time it was a minor inconvenience and I just dealt with the problem the old-fashioned way:  I deleted the offending messages.  But a week or two ago Debra started noticing a large increase.  And then we were gone over the weekend and when I came back I had to trash over 200 messages.  Time to do something about the problem.

I get my email through my ISP, who has SpamAssassin installed.  I checked my settings again, just to be sure I had it configured correctly, and then sent a message to my ISP’s support through their exceedingly user-unfriendly help desk software.  After a short exchange of messages I got their answer:  1) lower the spam threshold in my SpamAssassin configuration to 3; 2) train SpamAssassin.

Fine.  Except.

1) According to the SpamAssassin configuration information, a setting of 5.0 is “fairly aggressive”, and should be sufficient for one or just a handful of users.  The instructions caution that a larger installation would want to increase that threshold.  They don’t say what lower numbers would do, but since several of the obviously spam messages I’ve examined score just over 2.0, I hesitate to reduce the setting to 3; legitimate messages could easily score that high as well, and I’d start getting false positives.

2) Their method of training SpamAssassin involves me installing a Perl script (written by a user who has no official connection to the ISP and that is not officially supported), forwarding good messages to a private email address (that I control), and having the Perl script examine those messages so that it can update the tables that SpamAssassin uses.

That’s ridiculous!

First, there’s no explanation why my spam count went from almost zero to 50 or more per day almost overnight.  Second, they expect me to have the knowledge, time, and inclination to install and run that script.  Oh, and if I want to make sure that Debra’s mail is filtered correctly, I should have her forward her good emails to that private email address, too.  “I promise I won’t look at them.”  I wouldn’t, and it’s unlikely that there’s anything she’d want to hide from me anyway, but I can imagine that others who share my predicament would have users who are reluctant to forward their emails.

Honestly, it’s among the most ridiculous things I’ve ever heard.  Why don’t they have a reasonable Web client that has a “mark as spam” button?  Why, after 10 years of dealing with spam, is there no informal standard for notifying your ISP that a message you received in your email client is spam?  Why should I even have to think about spam anymore?  Shouldn’t the ISP’s frontline filters catch the obvious garbage that’s been clogging my mailbox?

I think I need a new ISP.  Or at least a better way to get my mail:  something that will filter the spam for me after downloading from my ISP.  But it has to be Web based.  I like using a Web mail client, because I regularly check my email from multiple locations.  Any suggestions on Web-based email services that can do this for me?
