More on timestamp formats

Continuing yesterday’s entry about syndication format dates…

As it turns out, the Atom syndication format draft specification uses a W3C Date and Time Formats (W3C-DTF) value, which includes a UTC offset, and Atom implementations apparently store the offset along with the date.  RSS 2.0 uses the RFC822 format.  RSS 1.0 (RDF Site Summary) doesn’t define a date field as part of the core specification.  RSS 1.0 implementations that do implement a date field seem to be split on which of the two formats to use.  The information is there.  We just need to get implementors to use it.
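To show that both formats really do carry the offset, here is a small Python sketch (Python rather than C#, purely for illustration). The helper parse_w3cdtf is hypothetical and, as an assumption, handles only full date-plus-time values, not the reduced-precision forms W3C-DTF also allows. The RSS 2.0 side uses the standard library's email.utils.parsedate_to_datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_w3cdtf(value: str) -> datetime:
    """Hypothetical helper: parse a W3C-DTF date-time such as 2004-12-16T15:50:03-06:00.

    Assumption: only full date-plus-time values are handled. W3C-DTF also
    permits reduced precision (just "2004" or "2004-12"), which this skips.
    """
    # fromisoformat() before Python 3.11 rejects the "Z" (UTC) designator,
    # so spell it as an explicit zero offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        raise ValueError("W3C-DTF date-times must include a time zone designator")
    return dt


atom_date = parse_w3cdtf("2004-12-16T15:50:03-06:00")                 # Atom style
rss_date = parsedate_to_datetime("Thu, 16 Dec 2004 15:50:03 -0600")   # RSS 2.0 style

# Both results keep the -06:00 offset and denote the same instant.
assert atom_date == rss_date
print(atom_date.astimezone(timezone.utc))
```

Because both values are offset-aware, nothing is lost: the instant and the author's local offset are both still available.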

I’ve recently downloaded two open source packages for working with syndication formats in C#:  RSS.NET and ATOM.NET.  Interestingly, both fail to parse the date and time stamps.  Why?  Because they rely on the .NET Framework’s DateTime.Parse method, which doesn’t fully understand either format.  I’ve submitted a fix to the maintainer of RSS.NET, and will be submitting a fix to the ATOM.NET project as soon as I finish it.

I was surprised to encounter these bugs, especially in ATOM.NET.  DateTime.Parse claims to handle RFC822 format dates, but it only supports a subset of that format.  I can understand how a “quick test” of the code against a few RSS feeds would let that bug slip by.  But ATOM.NET uses the W3C date time value, and DateTime.Parse doesn’t even come close to parsing that.  The author either didn’t test that part of the code or decided to release it with the bug undocumented.  Shoddy work, that.
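To make “only a subset” concrete, here is a Python illustration (not the fix submitted to RSS.NET, which is C#). RFC 822, as updated by RFC 1123, lets a feed write the same instant several ways: with or without the day of week, with a two- or four-digit year, and with a numeric offset or a named zone such as CST or GMT. The strict_parse function is a hypothetical stand-in for a parser that understands just one spelling; the standard library's email.utils.parsedate_to_datetime accepts all five.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Five spellings of one instant, all of the kind RFC 822/1123 feeds contain.
VARIANTS = [
    "Thu, 16 Dec 2004 15:50:03 -0600",  # numeric offset
    "Thu, 16 Dec 2004 15:50:03 CST",    # named US zone (CST is -0600)
    "16 Dec 2004 15:50:03 -0600",       # day of week omitted
    "Thu, 16 Dec 04 15:50:03 CST",      # two-digit year
    "Thu, 16 Dec 2004 21:50:03 GMT",    # same instant written in GMT
]


def strict_parse(value):
    """Hypothetical fixed-format parser: accepts exactly one spelling, else None."""
    try:
        return datetime.strptime(value, "%a, %d %b %Y %H:%M:%S %z")
    except ValueError:
        return None


tolerant = [parsedate_to_datetime(v) for v in VARIANTS]
strict = [strict_parse(v) for v in VARIANTS]

# Every tolerant result is the same absolute instant (21:50:03 UTC).
assert all(t == tolerant[0] for t in tolerant)
print(sum(s is not None for s in strict), "of", len(VARIANTS), "accepted by strict_parse")
```

A parser tested only against feeds that happen to use the first spelling will look correct until it meets one of the others.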

RSS timestamps are ambiguous

I’ve slowly been working on the RSS article categorization program that I outlined back in August. In the process, I’ve come to the conclusion that a standard timestamp as expressed in most programming languages (which is just a date and time) isn’t enough information to store about an article because it doesn’t provide any context. First, some background.

When you publish an RSS article you can supply the publication date and time. The timestamp is expressed using the RFC822 date/time format, which takes this form:

Thu, 16 Dec 2004 15:50:03 -0600

The “-0600” is the time offset from Coordinated Universal Time (UTC), the modern successor to Greenwich Mean Time (GMT). The offset is the difference between the local time and UTC, expressed in hours and minutes. Thus, -0600 means that the time here is six hours and zero minutes behind UTC, so the corresponding UTC time is 21:50:03, or ten minutes before 10 in the evening. Every RSS program that I’ve seen strips the offset information after possibly converting the date/time. That leaves you with at least four possible interpretations of the timestamp that the RSS aggregator displays. The timestamp can represent:

  1. The local time where the article was published;
  2. The time at your location;
  3. The UTC time;
  4. The local time at the location where the article was read and processed (assuming a Web service is processing the articles).
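Here is a Python sketch of how one instant turns into four different wall-clock readings. UTC is the local time minus the offset: 15:50:03 minus (-6:00) gives 21:50:03. The reader and server zones below are hypothetical fixed offsets chosen for illustration; dropping tzinfo at the end mimics an aggregator that throws the offset away.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

published = parsedate_to_datetime("Thu, 16 Dec 2004 15:50:03 -0600")

# Hypothetical zones for the reader and for a Web service that processes the feed.
reader_tz = timezone(timedelta(hours=1))    # e.g. a reader in central Europe
server_tz = timezone(timedelta(hours=-8))   # e.g. a service on the US west coast

interpretations = {
    "publisher local": published,                           # 1. where it was published
    "reader local": published.astimezone(reader_tz),        # 2. your location
    "UTC": published.astimezone(timezone.utc),              # 3. UTC
    "server local": published.astimezone(server_tz),        # 4. where it was processed
}

for label, dt in interpretations.items():
    # Dropping tzinfo mimics an aggregator that discards the offset:
    # four different-looking times for the same instant.
    print(f"{label:16} {dt.replace(tzinfo=None)}")
```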

To make matters worse, RSS aggregators often won’t tell you which of these interpretations is being used.

If we’re going to use simple timestamps that don’t contain timezone information, then all times should be reported as UTC. This will become increasingly important as more people use RSS and similar tools to communicate. If everybody would standardize on UTC and explicitly state that times are expressed in UTC, there would be no confusion as to when things were posted.
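A minimal Python sketch of “report as UTC and say so,” assuming the timestamps arrive with an offset. The helper to_utc is hypothetical; it refuses naive values rather than guessing their zone. It also shows why UTC is the right key for ordering: an entry with a later local wall-clock time can be the earlier one.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_utc(dt: datetime) -> datetime:
    """Hypothetical helper: normalize an offset-aware datetime to UTC."""
    if dt.tzinfo is None:
        raise ValueError("naive timestamp: its time zone is unknown")
    return dt.astimezone(timezone.utc)


utc = to_utc(parsedate_to_datetime("Thu, 16 Dec 2004 15:50:03 -0600"))

# Two ways to label a time explicitly as UTC.
print(format_datetime(utc, usegmt=True))         # RFC 822 style, "GMT" zone
print(utc.isoformat().replace("+00:00", "Z"))    # W3C-DTF style, "Z" designator

entries = [
    parsedate_to_datetime("Thu, 16 Dec 2004 15:50:03 -0600"),  # 21:50:03 UTC
    parsedate_to_datetime("Thu, 16 Dec 2004 22:10:00 +0100"),  # 21:10:00 UTC
]
# Sorting by UTC puts the +0100 entry first despite its later local time.
ordered = sorted(entries, key=to_utc)
```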

Even if we standardize on UTC, a simple timestamp doesn’t provide enough contextual information. Sometimes you want to know all three relevant times: what time it was for the author when he published the entry (assuming that it’s close to the time he wrote it), what time it was for you, and the UTC time. Did the author write his entry in the middle of the day? At the end of a late night hacking session? What were you doing when he wrote the article? The UTC time, of course, is the absolute time used to order the article.

RSS aggregators could supply all three of these times quite easily by storing the offset information (the “-0600” value in the example above) along with the date and time that they normally store. If the timestamp is stored in UTC, then adding the offset will return the local publish time, and converting the UTC time to your time zone (something you can do in any modern programming language) will return the time at your location. That technique also lets you convert to your local time wherever you happen to be at the moment, something that becomes increasingly important as RSS aggregators begin to appear on mobile devices.
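A minimal Python sketch of that storage scheme: keep the instant in UTC plus the publisher's offset, and derive the other two times on demand. The PublishTime class and its methods are hypothetical, not taken from any aggregator or library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from email.utils import parsedate_to_datetime
from typing import Optional


@dataclass(frozen=True)
class PublishTime:
    """Hypothetical record: the absolute instant plus the publisher's offset."""
    utc: datetime        # always in UTC; used for ordering
    offset: timedelta    # publisher's UTC offset from the feed, e.g. -6 hours

    @classmethod
    def from_rfc822(cls, value: str) -> "PublishTime":
        dt = parsedate_to_datetime(value)
        if dt.tzinfo is None:
            # RFC 2822 uses "-0000" to mean "local zone unknown"; a real
            # aggregator would need a policy for that case.
            raise ValueError("timestamp has no usable UTC offset")
        return cls(utc=dt.astimezone(timezone.utc), offset=dt.utcoffset())

    def publisher_local(self) -> datetime:
        # UTC plus the stored offset gives the author's wall-clock time.
        return self.utc.astimezone(timezone(self.offset))

    def reader_local(self, reader_tz: Optional[tzinfo] = None) -> datetime:
        # With no argument, astimezone() uses the machine's current local zone,
        # so a device that travels shows the time where it is right now.
        return self.utc.astimezone(reader_tz)


p = PublishTime.from_rfc822("Thu, 16 Dec 2004 15:50:03 -0600")
print("UTC:      ", p.utc)
print("Publisher:", p.publisher_local())
print("Here:     ", p.reader_local())
```

Storing a single extra field is all it takes; the three displayed times are computed, never stored separately.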

To all publishers of RSS information: please include the offset with the timestamp when you publish your articles.

To all authors of RSS aggregators: please store times in UTC, store the offset, and give me the ability to see all three of the relevant times.

RSS Feed

Behind the times as usual, today I downloaded SharpReader and stepped into the world of Really Simple Syndication.  I’m tired of going to the news.  Now I’ll let it come to me.  The only excuse I can give for not doing that sooner is laziness.

I also created an RSS feed for my Web site.  It was surprisingly easy to do.  There’s an article on the CityDesk Knowledge Base that shows how to create the required XML to publish an RSS feed.  It’s just a couple minutes’ work.  I’ll add an RSS icon link to the Web page template over the weekend.  The link is http://www.mischel.com/rss.xml.
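The CityDesk article isn't reproduced here, but a minimal RSS 2.0 document is small enough to sketch in Python. The build_rss function, titles, and example.com links are hypothetical. The point is the pubDate: writing it from an offset-aware datetime keeps the offset in the feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Hypothetical helper: build a minimal RSS 2.0 document as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link, published in items:
        if published.tzinfo is None:
            raise ValueError("pubDate needs an aware datetime so the offset can be written")
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        # format_datetime writes an RFC 822-style date that keeps the offset.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")


central = timezone(timedelta(hours=-6))
feed = build_rss(
    "Random Notes", "http://www.example.com/", "Hypothetical sample feed",
    [("More on timestamp formats", "http://www.example.com/notes/1",
      datetime(2004, 12, 16, 15, 50, 3, tzinfo=central))],
)
print(feed)
```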


Really Simple Syndication

RSS, or Really Simple Syndication, is a format for syndicating news sites and news-like sites such as Slashdot and personal web logs.  By using an RSS reader and subscribing to different RSS feeds, you can have the contents of your favorite web logs or news sites presented to you whenever new content is added.  It’s like a clipping service for web logs.  Except it’s more.  Much more.  And perhaps the nicest thing about it is that there’s an open standard (seven of them, actually; things are still quite fluid in this space), and anybody can write a program to subscribe to whatever feeds they like.  I really need to look into this more closely, and perhaps set up an RSS feed for my Random Notes.