Wishing for an address book standard

Another item in Jim’s list of things that should be easy but aren’t: converting address books. Come to think of it, you shouldn’t have to convert your address book at all; you should be able to use the existing one with a new program. But you can’t do that, and converting from one vendor’s proprietary format to another’s is difficult or impossible.

Why are software developers so dead set against agreeing on a standard address book format and interface that they all can use? Is writing address handling code really so exciting? Why the heck isn’t there a simple, standard, text-based (or XML, if you want to use the latest and greatest buzzword technology) address book format and a standard library to which all programs can interface?

Something like that is available on Windows, if you limit yourself to Outlook (or Outlook Express), Microsoft Office applications, and third-party applications that use the incredibly complex MAPI and CDO interfaces. Forget being able to modify the darned thing with a text editor, though. And forget being able to reliably import addresses from other programs into the Microsoft Office applications.

Microsoft isn’t the only uncooperative vendor here, by the way. In fact, one could argue that Microsoft is the only vendor that is even partially cooperative. At least they define an interface (MAPI/CDO) that all Windows programs can use to access the address book. It’s difficult to use, but it’s possible. Not so with other vendors’ address books. In the last 18 months or so, I’ve used Microsoft Outlook and Outlook Express, PocoMail, KMail and Evolution on Linux, and Mozilla Thunderbird, plus my PalmOS-based telephone. The only two that share data reliably are Outlook and the phone. All the rest use their own proprietary formats or interfaces for storing addresses. Most will import addresses from Outlook or Outlook Express, but won’t go the other way, nor will they import from other programs’ formats. Writing code to convert between formats is difficult and time consuming.

You think maybe I’m an extreme case and that I wouldn’t have this problem if I’d just stick with one email program? That’s almost true if I’d stick with Microsoft applications (IE and Office). As long as Word or Excel will do whatever it is I want to do with addresses, no problem. But if I buy a third party program to perform some specific task, I’d better hope that the vendor included MAPI/CDO support for the address book. Considering how buggy and difficult to use MAPI/CDO has proven to be, I’m not terribly surprised that many vendors have opted not to include that support.

Please, somebody create a simple XML-based address book format along with a standard access library. Give it away. Let other vendors include it in their code. Evangelize it. Convince Microsoft and the Open Source crowd to embrace it. That one step would make life so much easier for all users, and would go a very long way on the road to convincing users that computers don’t have to be complex and frustrating.
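To show how little is really being asked for, here is a minimal sketch of what such a format and library could look like, in Python. The element names (`addressbook`, `contact`, `name`, `email`, `phone`) and the two functions are hypothetical, invented for illustration and not any existing standard. The point is only that the file stays readable in a text editor and the access code stays tiny.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    emails: list = field(default_factory=list)
    phones: list = field(default_factory=list)


def parse_address_book(xml_text):
    """Read every <contact> element into a Contact object."""
    root = ET.fromstring(xml_text)
    return [
        Contact(
            name=elem.findtext("name", default=""),
            emails=[e.text for e in elem.findall("email")],
            phones=[p.text for p in elem.findall("phone")],
        )
        for elem in root.findall("contact")
    ]


def serialize_address_book(contacts):
    """Write contacts back out as plain XML that a text editor can handle."""
    root = ET.Element("addressbook", version="1")
    for contact in contacts:
        elem = ET.SubElement(root, "contact")
        ET.SubElement(elem, "name").text = contact.name
        for address in contact.emails:
            ET.SubElement(elem, "email").text = address
        for number in contact.phones:
            ET.SubElement(elem, "phone").text = number
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample file contents.
SAMPLE = """<addressbook version="1">
  <contact>
    <name>Jane Example</name>
    <email>jane@example.com</email>
    <phone>555-0100</phone>
  </contact>
</addressbook>"""

contacts = parse_address_book(SAMPLE)
print(contacts[0].name, contacts[0].emails)
```

Any program that can read and write this file, in any language, could share the same address book; a real standard would of course need many more fields and a versioning story.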

Spam filtering / Mozilla Thunderbird

The system administrator at work recently installed or subscribed to (I’m not entirely sure how it works) spam filtering software supplied by VeriSign.  Now, instead of a hundred or more junk email messages clogging my mailbox every day, suspect messages are quarantined and I’m given the opportunity to use a Web browser to sift through the messages.  I’m happy to see my Inbox shrink, and it’s good to know that internal messages (messages that originate from inside the company firewall) won’t be marked as spam, but I’m disappointed that I have to go to yet another Web site and use yet another crappy browser interface to sift through the suspect messages.  You’d think that VeriSign could afford a better interface; maybe even borrow a few ideas from POPFile.

Speaking of POPFile, I’ve stopped using it.  At least, I’m not using the standalone version.  I’ve installed Mozilla Thunderbird on my primary Windows machine and have moved my email back from the Linux box.  Thunderbird is a much nicer email client than Evolution, and it has a built-in adaptive spam filter that appears to work rather well.  Somebody said that this is a version of POPFile, something which I haven’t taken the time to verify.  Whatever the case, there’s a lot to like in Thunderbird.  I’ll give it a mini review here after I get a little more comfortable with it.

Weird email messages

Jeff Duntemann mentioned in his Web diary post for May 30 that we’d been puzzling over some strange email messages we’ve both been receiving. These messages have none of the standard header fields beyond the tracing information (i.e. the “Received:” lines): no Subject, no To field, no From field, and no message body. I’ve been seeing these on and off for a few months now, but they seem to be getting more prevalent. For a while I thought they came from malfunctioning SMTP or POP servers, because I figured the specifications wouldn’t allow such a message to be passed. I was wrong. SMTP servers are quite happy to pass on empty messages. As it turns out, SMTP servers don’t need any of the message header information in order to accept and deliver a message. Everything they need to get a simple message across is supplied by the MAIL FROM and RCPT TO commands to the server. RFC 2822 does require every message to carry at least a Date and a From header, but that’s a rule about message content, and SMTP servers don’t enforce it. I was able to send myself one of these blank messages, although I won’t say how it’s done. Read RFC 2821 and figure it out for yourself if you’re so inclined.

One other thing to note is that these messages may actually have contained some text in the message body, but the resulting message is badly formed (it doesn’t conform to RFC 2822), and mail clients barf when trying to parse it. I connected to my POP server with telnet to examine one of my test messages and found that the body text is placed in the wrong position: immediately after the headers, without the intervening blank line that the specification requires. I have to wonder whether this is a feature of the spec or a bug in some server implementations.
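To see what the missing blank line does to a parser, here is a small sketch using Python’s standard `email` package. The two messages are made up for illustration; they differ only in the blank line that RFC 2822 puts between the header section and the body. Python’s parser happens to be lenient: it guesses that the first line that doesn’t look like a header starts the body, and records a `MissingHeaderBodySeparatorDefect`. Other mail clients may not be so forgiving, and a body line that happens to look like a header (say, one starting with `Note:`) would be read as a header instead.

```python
import email

# Hypothetical test messages. The only difference is the blank line
# that RFC 2822 requires between the header section and the body.
WELL_FORMED = (
    "Received: from sender.example by mx.example; Fri, 30 May 2003 10:00:00 -0700\n"
    "\n"
    "Body text\n"
)
MALFORMED = (
    "Received: from sender.example by mx.example; Fri, 30 May 2003 10:00:00 -0700\n"
    "Body text\n"
)


def describe(raw):
    """Parse a raw message and report its headers, body, and any defects."""
    msg = email.message_from_string(raw)
    return {
        "headers": list(msg.keys()),
        "body": msg.get_payload(),
        "defects": [type(d).__name__ for d in msg.defects],
    }


for label, raw in (("well-formed", WELL_FORMED), ("malformed", MALFORMED)):
    print(label, describe(raw))
```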

The more interesting question is the source of these messages. Who would send badly formed email messages? I’d suspect a denial-of-service attack, except that a dozen or so messages per week hardly represents an “attack” in my book. My best guess is a malfunctioning home-grown mail program, most likely a spam utility, but without more information it’s hard to say. I do find it odd, though, that a spam utility would relay through the spammer’s SMTP server rather than connecting directly to the target SMTP server. I can show that this happens by examining the tracing information in the message.
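Reading that tracing information mostly comes down to reading the Received lines from the bottom up: each server that handles a message adds its own line at the top, so the last one is the oldest hop. Here is a minimal Python sketch, assuming the hypothetical host names below and a simple “from X ... by Y” pattern. Real Received lines vary widely in format, and anything below the first hop you trust can be forged.

```python
import email
import re

# Hypothetical message: a client hands mail to its ISP's server,
# which then relays it to the recipient's mail exchanger.
RAW = (
    "Received: from mail.isp.example (mail.isp.example [192.0.2.10])\n"
    "\tby mx.recipient.example; Tue, 2 Dec 2003 08:15:00 -0700\n"
    "Received: from client.dialup.example (client.dialup.example [198.51.100.7])\n"
    "\tby mail.isp.example; Tue, 2 Dec 2003 08:14:58 -0700\n"
    "\n"
)

# Capture the host after "from" and the host after "by"; DOTALL lets the
# match span the folded continuation line.
HOP_RE = re.compile(r"from\s+(\S+).*?\bby\s+([^\s;]+)", re.DOTALL)


def trace_hops(raw):
    """Return (from_host, by_host) pairs, oldest hop first."""
    msg = email.message_from_string(raw)
    hops = []
    # Each server prepends its Received line, so the bottom one is the oldest.
    for header in reversed(msg.get_all("Received", [])):
        match = HOP_RE.search(header)
        if match:
            hops.append(match.groups())
    return hops


for sender, receiver in trace_hops(RAW):
    print(f"{sender} -> {receiver}")
```

A message that went straight from a spam tool to the target server would show a single hop; the relayed messages show the extra hop through an intermediate SMTP server.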

The grab bag

A grab bag full of stuff on the email front:

  • The U.S. House of Representatives yesterday afternoon agreed to the Senate’s changes to the Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (CAN-SPAM). The President has said that he will sign it into law. The full text of the bill is available in PDF form. The too-cutesy title itself should give away the bill’s purpose: a feel-good measure to tell people that Congress is “doing something about the problem.” The bill instructs the Federal Trade Commission to create regulations to control spam, and gives it considerable leeway in doing so. I don’t see this law making any significant dent in the load of spam I filter every day. The Coalition Against Unsolicited Commercial Email (CAUCE) is unhappy because the bill in effect gives spammers license to hit each mailbox once with impunity.
  • spamhole creates a fake “open relay”—the kind of server that spammers just love to connect their mass email programs to.  spamhole servers don’t forward messages, but rather just swallow them.  The idea is simple:  “By creating as many false ‘open relays’ on the Internet as possible, we hope to make the detection and use of a real open relay as much of a chore as we can.”  They configure the server to allow a certain number of messages to go through unmolested just to trick the spammer into using the relay.  After the threshold is reached, messages go into the bit bucket.  It’s kind of a cool idea, but nothing that spammers couldn’t get around with an afternoon’s coding.  Just make every 100 or so messages a test message and stop when a message doesn’t go through.  spamhole might slow the spammers down a bit, but I can’t see it making any more of a dent than the CAN-SPAM act.
  • Take a look at Remail from the Collaborative User Experience (CUE) team at IBM Research.  They’ve spent 10 years studying how email is used, identifying ways to improve email clients, and developing a prototype to try out their ideas.  Are they new ideas or just refinements of old ideas?  Makes no difference as far as I’m concerned, as long as they can make the absurd amount of time I spend in my Inbox a little less tedious.
  • Jeff Duntemann reports in his December 6 web diary entry that the cause of his email problems looks to be an overflow bug in PocoMail.  Things started going wiggy when his mailbase accumulated between 32,000 and 33,000 messages.  You programmers out there probably remember the magic number 32,767:  the upper limit of a 16-bit signed integer.  Apparently somebody on the Poco development team figured that 32,767 messages was more than enough for anybody to have in a single mailbase.  I think that was a pretty silly assumption.  I know that I’d have at least that many if I had converted my Outlook files when I converted to Poco.  Seeing this error makes me a little nervous about what other surprises might be lurking nearby.
  • Spammers are becoming more technically adept. Rather than searching for open relays and putting up with fakes like spamhole, they’re learning to compromise legitimate servers or turn unwitting client computers into stealth spam servers. Slashdot just posted a story about recent incidents. Pretty frightening stuff.
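The 32,767 limit in the PocoMail item above is easy to demonstrate. I don’t know how PocoMail actually stores its message count; this Python sketch simply uses `ctypes.c_int16`, which does no overflow checking and wraps silently the way a 16-bit counter typically does in C, to show what happens one past the limit.

```python
import ctypes


def store_in_int16(count):
    """Mimic storing a message count in a 16-bit signed integer."""
    return ctypes.c_int16(count).value


for count in (32_766, 32_767, 32_768, 32_769):
    print(count, "->", store_in_int16(count))
```

Message number 32,768 comes back as -32,768, and any code that uses that value as a count or an index is in trouble from there on.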

Yahoo’s email authentication plans

On my morning scan of Techdirt I picked up a story about Yahoo working on an email authentication plan that would let senders prove they are who they say they are. This comes only two and a half years after I suggested it here (May 15, 2001).

The beauty of Yahoo’s plan is that it will continue to work with existing message traffic, allowing even unauthenticated email to pass. That might seem like folly at first glance, but it means the system can be rolled out gradually instead of requiring everybody to switch over at once. The article is short on detail, but I suspect that Yahoo will have a way to flag a message as authenticated or not, thereby giving email clients a way to filter unauthenticated messages. Yahoo will make the source of their “DomainKeys” software available to open-source email software and systems, which means that a large percentage of clients will have the ability to create and filter these messages. I wonder if they’ll also make it available to developers of proprietary systems. I certainly hope so. Otherwise we’ll end up with competing standards that will make the problem even worse.

This is exactly what we’ve needed: a major player in the email space to take the lead and implement something. If it works out well for Yahoo, then the other major email providers will have ample incentive to follow suit. Some will argue that reverse DNS lookup is already available, and since very few servers use it today there’s no reason to expect that they’ll use this new system, which is essentially the same thing: a DNS “public key” lookup. The article doesn’t provide any detail, but I suspect that Yahoo’s people looked into SMTP authentication and found it lacking. The system the article describes sounds stronger than what’s already available. I sure hope it works out.
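One detail worth getting straight about a scheme like this: the key published in DNS is the sender’s public key. The sending domain signs outgoing mail with a private key it never reveals, and the receiving server looks up the public key to check the signature. The Python sketch below shows only that division of labor. It uses textbook RSA with tiny, unpadded toy keys (hopelessly insecure), a dictionary standing in for DNS, and a hypothetical record name and selector; it is not Yahoo’s actual DomainKeys format, which the article doesn’t describe.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Far too small and unpadded
# to be secure; it only makes the sign/verify arithmetic visible.
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message):
    """Hash the message and reduce it into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % N


def sign(message):
    """Sending domain: sign with the private key it keeps to itself."""
    return pow(digest(message), D, N)


# Stand-in for DNS: the sending domain publishes only its PUBLIC key.
# The record name is hypothetical.
FAKE_DNS = {"mail._domainkey.sender.example": (E, N)}


def verify(message, signature, sender_domain, selector="mail"):
    """Receiving server: fetch the public key from 'DNS' and check."""
    record = FAKE_DNS.get(f"{selector}._domainkey.{sender_domain}")
    if record is None:
        return False  # no published key: treat as unauthenticated
    e, n = record
    return pow(signature, e, n) == digest(message)


sig = sign("Meeting moved to 3pm")
print(verify("Meeting moved to 3pm", sig, "sender.example"))
```

A tampered message or a forged signature fails the check, and a domain that publishes no key simply can’t be authenticated, which fits the flag-it-and-let-clients-filter behavior described above.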

POPFile spam filter

I honestly thought somebody would have fixed the email spam problem by now. I resisted installing a filter for years: first because my spam problem wasn’t all that bad, then because filters just shift the problem, and finally out of sheer pig-headedness. I hate sub-optimal solutions. Even with a filter, I still have to review every message’s sender and subject line. I’d still like a real solution, but the amount of spam I get is nearly unbearable. On Jeff Duntemann’s recommendation, I downloaded and installed POPFile, a trainable Bayesian filter. It’s been 2 weeks now since I first installed the thing, and after about 1500 messages (oddly, my spam count has gone down markedly since mid-October) I’m still teaching it the difference between spam and good mail. I’ve added “magnets” that automatically classify important personal and work-related messages, but I’ve resisted adding magnets for everybody in my contact list in the hope of training the silly thing to tell the difference between jokes from friends and ads for questionable drugs.
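For the curious, the core of a trainable Bayesian filter fits in a few dozen lines. This Python sketch is a generic naive Bayes classifier with add-one smoothing, not POPFile’s actual code. The class name, the two “spam” and “ham” buckets, and the magnet handling (a sender rule checked before any statistics, roughly the role magnets play as described above) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into crude word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class BayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.magnets = {}  # sender address -> forced label

    def add_magnet(self, sender, label):
        self.magnets[sender.lower()] = label

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, sender, text):
        # Magnets win outright; no statistics involved.
        if sender.lower() in self.magnets:
            return self.magnets[sender.lower()]
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus log likelihoods, with add-one smoothing.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                if word in vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


f = BayesFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap meds online now", "spam")
f.train("lunch meeting tomorrow at noon", "ham")
f.train("project meeting notes attached", "ham")
print(f.classify("stranger@example.com", "buy cheap pills"))
```

With only four training messages the filter is trivially easy to fool, which is exactly why it takes hundreds or thousands of reviewed messages before a real one becomes trustworthy.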

Beyond automatically throwing almost everything into the “Junk” box, the filter isn’t yet saving me much time. Just as I feared, I still have to review the filter’s output to ensure that it hasn’t misclassified an important message as spam. I’m hoping that it gets smarter as it gets more experience, but at the moment it’s just as much work with the filter as without. I’m going to give it until the end of the year. If it’s still misclassifying 5% or more of my mail after that, then I’ll have to re-evaluate the wisdom of using this type of filter.

I’m very disappointed that nothing has yet been done at the protocol level to address the spam problem. Maybe that’s still coming? I won’t hold my breath. From where I sit, it looks like another 5 to 10 years (if ever!) before the protocols can be changed to prevent most types of spam.

Sendmail author on spam

Eric Allman, creator of Sendmail, has weighed in on the spam problem with a well-written article about the current state of spam, possible methods of preventing it, and the problems inherent in those techniques. He doesn’t paint a pretty picture. With spam volume doubling something like every 8 weeks, a spam filter that lets through only 1.5 percent of spam will, in one year, be letting through at least as much spam as it blocks today. So if your filter is blocking 197 out of 200 messages today, in a year it will be letting at least 197 messages through! It’s a big problem. As Allman says: It’s an arms race and nobody wins but the arms dealers.
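Allman’s arithmetic is easy to check. Assuming a 52-week year and a clean 8-week doubling period (both simplifications), spam volume grows by a factor of 2^6.5, or about 90, in a year. A quick Python calculation for the 197-out-of-200 example:

```python
DOUBLING_WEEKS = 8      # rough doubling period for spam volume
WEEKS_PER_YEAR = 52
LEAK_RATE = 3 / 200     # the filter lets through 1.5% of spam

growth = 2 ** (WEEKS_PER_YEAR / DOUBLING_WEEKS)     # 2**6.5, about 90.5
spam_today = 200
leaked_today = spam_today * LEAK_RATE               # 3 messages
leaked_next_year = spam_today * growth * LEAK_RATE  # about 272 messages

print(f"growth factor: {growth:.1f}")
print(f"leaked today: {leaked_today:.0f}, in a year: {leaked_next_year:.0f}")
```

Roughly 272 messages leaking through in a year is comfortably more than the 197 being blocked today.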

Allman is the latest of a growing number of informed industry leaders who agree that protocol changes are required. They’re not going to happen quickly, though. He mentions a time frame of 10 years! It looks like filtering is the only possible short-term solution. Gah!

Spam slam

I checked my email after dinner last night. A few hours later I checked it again and had 70 new messages. PocoMail flagged every one of them as junk mail. All but one were variants of the message claiming to contain the latest security patch from Microsoft; the oddball was a pitch to sell me $100,000 homes for $10,000. As of this evening, 24 hours after I got the first of those messages, I had received over 250 of them. That’s in addition to my apparently meager spam load of about 80 messages per day. Oddly enough, Debra hasn’t received even one of these worm emails.

This has gotten totally out of hand.  I’m surprised that, after the last 3 or 4 years of worms and viruses, there are still so many people who will unquestioningly run a program that they get via email from an unknown person.  Are these the same people who will give their credit card numbers over the phone?

This kind of attack would be much less apt to succeed if the email protocols required end-to-end authentication. If there were no way to get an anonymous message into the system, then it would be very difficult for somebody to start this without getting caught. In addition, people opening their messages could easily check to see if the message really was from Microsoft Support before running the attachment. Laws that prescribe penalties for perpetrators are wholly ineffective at stopping such attacks, because it’s very difficult or impossible to prove who sent the message.

Irresponsible spam black hole

I sent email to one of my clients today and promptly got a message back from their email server telling me that my message was rejected because our IP address is being blocked, and to check the relays.osirusoft.com website (purposely not linked) for details.

It turns out that Joe Jared, operator of relays.osirusoft.com and a very aggressive (to put it mildly) anti-spammer, decided yesterday to discontinue his service. Rather than do something reasonable like just shut down the server or remove all entries from his blocking list, he decided to list every IP address as a spam source so that people would get the message that he’s no longer offering the service. This affected a huge number of mail servers because overzealous system administrators had been relying on that list as their primary or only weapon against spam. Never mind that there are some well-documented cases of the operators of this and similar lists behaving quite irresponsibly.
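How can one list operator break so much mail at once? DNS-based blocking lists conventionally work by reversing the octets of the connecting server’s IPv4 address, appending the list’s zone name, and doing an ordinary DNS lookup. Any answer (by convention an address like 127.0.0.2) means “listed”; no answer means “clean.” The Python sketch below is a simplified model: the zone name is hypothetical, and the resolver is a plain function standing in for real DNS. Once a list answers every query, every address on the Internet is “listed.”

```python
def dnsbl_query_name(ip, zone):
    """Build the DNS name a mail server looks up for a given IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve):
    """Any address record in the answer means 'listed'; none means 'clean'."""
    return bool(resolve(dnsbl_query_name(ip, zone)))


ZONE = "dnsbl.example.org"  # hypothetical list name

# A normal list answers only for the addresses it actually lists...
LISTED = {dnsbl_query_name("203.0.113.5", ZONE): ["127.0.0.2"]}


def normal_resolver(name):
    return LISTED.get(name, [])


# ...whereas a list that answers every query blacklists everybody.
def wildcard_resolver(name):
    return ["127.0.0.2"]


print(is_listed("192.0.2.25", ZONE, normal_resolver))    # False
print(is_listed("192.0.2.25", ZONE, wildcard_resolver))  # True
```

A mail server that rejects mail on any positive answer, with no other checks, has no way to tell the difference between the two resolvers.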

Yet another argument against using filtering to fight the spam problem.  If you think a client-side filter is subject to false positives, imagine what happens when a legitimate ISP is blocked by one of these filters because one person has a personal score to settle.

Rewrite the protocol

I’ve been saying for a couple of years now that the only way we’ll be able to get a handle on the spam problem is by making a fundamental change to the email protocols. And for most of those two years, I’ve been inundated by messages from people saying that it’s not possible, that it’d cause too much upheaval, that any system could be broken anyway, and that I must be some kind of communist because I don’t believe that people sending email have some right to anonymity. Not one of those messages, nor any of the garbage I’ve seen regurgitated on Slashdot and other public forums, has given any evidence of why this isn’t possible. It’s all a bunch of reactionary scribbling by people who engage their fingers on the keyboard before they engage their brains.

Finally, about three years too late, knowledgeable people are beginning to poke their heads up out of the sand, step out from behind their proposed legislation, and say what many of us have known for years: it’s time to rewrite the protocol.