Movie madness

Outside of YouTube, MP4 is probably the most popular video file format available online.  MP4 videos exist inside a container format that’s also widely referred to as “the MP4 container format.”  Honestly, I’m not sure what the correct name for the format is, but it’s described by a standard identified as ISO/IEC 14496-12: “Information technology – Coding of audio-visual objects – Part 12: ISO base media file format.”  Yeah, that’s a mouthful.  Let’s just call it “Part 12.”

The document is freely available online as a PDF, although it can be difficult to find.  I just went searching for it again and couldn’t find the full version.  If I remember where I found it, I’ll post a link here.

Adobe Flash Player 9 Update 3 (9, 0, 115, 0) and higher can play some MP4 files.  The subset of MP4s that Flash can play is described in Video File Format Specification Version 9.  That document gives you an idea of the Part 12 file format, although you probably want the full specification if you’re implementing a reader.

The file format is quite flexible–perhaps overly so–but reasonably easy to parse once you grok the basic structure.  I coded up a quick MP4 reader in a day, and within two days had my web crawler extracting metadata from files I located online.  But then I ran into a problem:  my movie player would sometimes hang when trying to play a file.  The really weird part was that the movie would play fine if I downloaded it first.  It was only when trying to play from online that I experienced the problem.

It didn’t take long to find the problem.  Or at least part of the problem.

Data in the Part 12 file format is organized in “boxes.”  Those boxes contain all manner of information:  a file header, overall movie information, information about the individual tracks, synchronization data for different tracks, etc.  Part 12 describes the overall structure of the file and the contents of the “moov” box that contains basic movie metadata (number and types of tracks, duration, codecs required to play the tracks, etc.).

Another box, called “mdat,” contains the actual movie data:  the video and audio information that will be played.
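
To make the box structure concrete, here is a minimal Python sketch of walking the top-level boxes in a file.  It is not the reader described above, just an illustration.  It assumes the standard Part 12 box header: a 32-bit big-endian size, a four-character type, a 64-bit size that follows when the 32-bit size is 1, and a size of 0 meaning “this box runs to the end of the file.”  The names iter_boxes, make_box, and metadata_first are made up for this example.

```python
import io
import struct


def iter_boxes(stream):
    """Yield (type, offset, size) for each box at the current level of a seekable stream."""
    while True:
        offset = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            return  # end of data, or a truncated header
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            big = stream.read(8)
            if len(big) < 8:
                return
            size = struct.unpack(">Q", big)[0]
            header_len = 16
        elif size == 0:
            # The box extends to the end of the file.
            stream.seek(0, io.SEEK_END)
            size = stream.tell() - offset
        if size < header_len:
            return  # malformed box; stop rather than loop forever
        yield box_type.decode("latin-1"), offset, size
        stream.seek(offset + size)  # skip the body and move to the next box


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (demo helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def metadata_first(stream):
    """Return True if 'moov' comes before 'mdat' among the top-level boxes."""
    order = [box_type for box_type, _, _ in iter_boxes(stream)]
    if "moov" not in order:
        return False
    return "mdat" not in order or order.index("moov") < order.index("mdat")


# A tiny fake file laid out the "backwards" way: movie data before metadata.
sample = (make_box(b"ftyp", b"isom")
          + make_box(b"mdat", b"\x00" * 32)
          + make_box(b"moov", b"\x00" * 16))

for box_type, offset, size in iter_boxes(io.BytesIO(sample)):
    print(box_type, offset, size)
print("metadata first:", metadata_first(io.BytesIO(sample)))
```

Boxes nest, so a real reader would call something like this again on the contents of “moov.”  This sketch only looks at the top level.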

In order to start playing a movie, a player must have the metadata.  The player can’t play the first frame until it knows how to decode that frame.  The movie data, on the other hand, can be delivered relatively slowly:  at whatever the playback speed is.  In other words, playing a movie consists of these steps:

Read the metadata.
Determine if the movie is playable with this player.
repeat
    Read movie data (audio, visual, etc.) frame
    Render frame
until end of movie

So it makes sense to organize data in the movie file to facilitate that.  Right?  In fact, the Part 12 document makes two very pertinent recommendations:

2)  It is strongly recommended that all header boxes be placed first in their container: these boxes are the Movie Header, Track Header, Media Header, and the specific media headers inside the Media Information Box (e.g. the Video Media Header).

8)  It is recommended that the progressive download information box be placed as early as possible in files, for maximum utility.

The emphasized “recommended” is in the original document.

There are good reasons for these recommendations, as I discovered in the first problem file I looked at.  In that particular file, the “mdat” box, which contains the frame data, is placed at the front of the file:  immediately after the file header.  “mdat” is 89 megabytes long.  It’s followed by the “moov” box, which is a little less than two megabytes.  A movie player has to download 89 megabytes of stuff before it can get to the metadata that tells the player how to play the movie.  89 megabytes might not sound like much, but at 10 megabits per second (which would be a very fast residential connection here in the U.S.), it’s a minute and a half.  Nobody’s going to wait a minute and a half for their video to download.

I suspect that whoever made these movies has no idea that they’re effectively unplayable over the Internet, and might not even care.  I care, because I’m not going to download the entire movie just to see if I’m really interested in watching it.

What surprises me is that video player software doesn’t recognize this and skip over the movie data to get to the metadata.  The HTTP/1.1 specification makes it very easy to get a partial file:  the client sends a Range header naming the bytes it wants.  The movie player should see that the “mdat” box comes before “moov”, and make another request to get “moov”.  It could then go back to “mdat” after digesting the metadata.
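
Here is a rough Python sketch of that idea:  read only the box headers, hop over each box body, and stop at “moov.”  It is an illustration under several assumptions, not how any particular player works.  The function names are invented for the example, the file size is assumed to be known already (say, from a Content-Length header), and the server is assumed to honor Range requests by answering with status 206.

```python
import struct
import urllib.request


def http_range_fetcher(url):
    """Return fetch(start, length), which issues one HTTP Range request per call."""
    def fetch(start, length):
        request = urllib.request.Request(
            url, headers={"Range": f"bytes={start}-{start + length - 1}"})
        with urllib.request.urlopen(request) as response:
            if response.status != 206:
                raise RuntimeError("server ignored the Range header")
            return response.read()
    return fetch


def bytes_fetcher(data):
    """Same interface as above, backed by bytes in memory (for trying things out)."""
    return lambda start, length: data[start:start + length]


def locate_box(fetch, wanted, file_size):
    """Return (offset, size) of the first top-level box of type `wanted`, or None."""
    offset = 0
    while offset + 8 <= file_size:
        header = fetch(offset, min(16, file_size - offset))
        if len(header) < 8:
            return None
        size, box_type = struct.unpack(">I4s", header[:8])
        min_size = 8
        if size == 1:
            # 64-bit size stored right after the type field
            if len(header) < 16:
                return None
            size = struct.unpack(">Q", header[8:16])[0]
            min_size = 16
        elif size == 0:
            size = file_size - offset  # box runs to the end of the file
        if size < min_size:
            return None  # malformed box
        if box_type == wanted:
            return offset, size
        offset += size  # jump over the body without downloading it
    return None


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (demo helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# In-memory stand-in for a "backwards" file: mdat first, moov last.
sample = (make_box(b"ftyp", b"isom")
          + make_box(b"mdat", b"\x00" * 1000)
          + make_box(b"moov", b"\x00" * 16))
print(locate_box(bytes_fetcher(sample), b"moov", len(sample)))
```

Against a real server you would pass http_range_fetcher(url) instead of bytes_fetcher(sample).  Finding “moov” in the sample takes three 16-byte reads; the body of “mdat” is never fetched.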

I wonder how hard it would be to create a tool that fixes those backwards movies . . .

Rules of bicycle commuting

I’ve done a fair bit of bicycle commuting over the last 10 or 12 years, and have developed a few rules that by now are strongly ingrained. I don’t think about them often: they’re just part of how I ride. Still, it’s beneficial to write them down.

A few things about these rules:

  • This list is not all-inclusive. There are almost certainly things that I forgot.
  • Although I’ve called these “Rules of Bicycle Commuting,” most of my rules apply to road bicycling in general.
  • A rule’s position in the list does not necessarily reflect its importance.

Rules of Bicycle Commuting

  1. You forgot something. Seriously. I’ve commuted to work hundreds of times, and I often forget things. I’ve forgotten my water bottle, my office key, my wallet, and I can’t remember what all else. How I managed to ride a mile before I realized that I forgot my helmet is beyond me. But I did it. If you’re prone to forgetting things, make yourself a checklist of the important stuff, and use it. Works for me.
  2. Wear a helmet. It’s the law in some places, but it’s really a matter of common sense. I’ve laid the bike down countless times (mostly when mountain biking), and it’s never been because a car hit or almost hit me. Yes, the helmet can be hot and can give you “helmet hair.” Deal with it. Your head is irreplaceable.
  3. Remember the Law of Tonnage. The Law of Gross Tonnage is a nautical convention: the smaller vessel must yield the right of way to the larger vessel. It’s common sense, based entirely on physics: the smaller vessel is more maneuverable, whereas the larger vessel might not be capable of getting out of the way. My modification, applied to bicycles, is also common sense: Never argue with somebody whose vehicle out-masses your own, because you will lose.
  4. It doesn’t matter who’s right. Related to the previous rule: When in doubt, yield the right of way to cars. You may be “right” in assuming that you have the right of way, but that’s small consolation when you’re road pizza.
  5. Wave with all five fingers. Most drivers are courteous and will give you space. Others will honk, yell, scream, curse, or otherwise try to make your life difficult. You may be tempted to give them the one-finger salute. Resist the temptation. Ignore them, or smile and wave with all five fingers. Doing otherwise risks putting you in a position where the Law of Tonnage can be used against you. Result, again: road pizza.
  6. You’re invisible. Drivers often don’t see bicyclists. Drivers should be observant, but it’s in your best interest to make yourself as visible as possible. Never assume that a driver sees you. When you’re lying on the road with a broken leg or worse, the matter of whose fault it was is pretty minor.
  7. Lights. If you’re going to ride at night or even close to dawn or dusk, get a tail light that blinks and a head light that you can mount to your handlebars. Those are so others can see you. Side reflectors are also good. If you want to see reliably, get a helmet-mounted light so you can direct the beam around corners and also at drivers so you know that they see you.
  8. Make your intentions known. Most people use their turn signals when driving a car, and of course your brake lights work automatically. Your bike doesn’t have those luxuries, so you have to make do with hand signals. It’s best if you point to where you’re going. Forget that silliness of left arm up to signal a right turn: use your right arm and point to where you’re going. And if you’re stopping, don’t just hold your hand down with palm open: pump your hand like you’re pushing back.
  9. How to Not Get Hit by Cars. Read, memorize, apply. ‘Nuff said.
  10. Watch for cars entering the road. I’ve never been hit by a car. I have come close quite a few times, though, and most of those cars were entering the road from the right. You’re riding on the shoulder, and people entering the road often look down the road for other cars and ignore the shoulder. They’ll turn right in front of you or right into you. Make sure you’re seen, and be especially watchful at driveways and cross streets.
  11. Shortcuts are dangerous. Be careful when you take advantage of cutting through a parking lot. You’re not the only one who treats an empty parking lot as a no-rule zone. Keep a sharp lookout for cars that are cutting through the parking lot, too.
  12. Choose your route carefully. The fastest route by car may not be the fastest route by bicycle. Whereas it makes sense to take major roads with higher speed limits when you’re in a car, that 25 MPH residential street is shorter, and faster, on your bike. Besides, it keeps you off the busy streets and reduces your chance of getting hit. When in doubt, take the safer route, even if it’s a bit longer.
  13. Carry a spare tube and know how to change it. City streets are hell on bicycle tires. There are nails, screws, glass, shredded soda cans, construction staples, and all manner of other things that can puncture your tire and cause a flat. A spare tube will save you most of the time. If you don’t know how to fix a flat, go to your local bicycle shop and ask for a demonstration. Then go home and practice until you can do it reliably. With practice, you can do it in under five minutes. You might also consider carrying a piece of Tyvek (cut up an old FedEx envelope) or some similar material to put inside the tire in case the tire gets more than just a little puncture.
  14. Get puncture-resistant tires. I used to get a flat at least once a week. One memorable day I got three flats on the way to the office and another on the way home. Then I discovered puncture-resistant tires that have a Kevlar strip in them. Now I get flats … almost never. I went an entire year (more than 3,000 miles) without a flat. I ride on Specialized Armadillo tires, but other manufacturers have similar products. Puncture-resistant tires are heavier than racing tires, but in my opinion the tradeoff is worthwhile. I seriously dislike having to fix a flat.  Oh, and don’t waste your money on thicker “puncture-resistant” tubes.  My experience is that they provide no benefit.
  15. Learn to do simple repairs. A bicycle is an incredibly simple and reliable machine. Still, things break from time to time. Most good bike shops offer basic maintenance classes where you can learn to fix a flat, perform simple adjustments on your brake and shifter cables, spot-true a wheel, and a few other things. Learning how to do those things can save you from an uncomfortably long walk.

Bike happenings

For various reasons, I haven’t been very active on the bicycle the last three years.  When I finished my birthday ride back in 2006, my bike computer read 15,795 miles.  That’s cumulative mileage since I got the computer in 1999, I think.  I’ve not done a whole lot of riding since then.  When I dusted off the bike on Saturday for my ride to the office, the computer read 16,788.  Figure a thousand miles in three and a half years.

Yeah, I’ve been lazy.

Ever since I got a road bike, I’ve wanted to ride the Hotter’N Hell Hundred, which is something of a rite of passage for Texas cyclists.  But every year I’ve made tentative plans to do the ride, something has come up.  My friend Frank Colunga, who did the third Gunny Ski ride with us, has been doing that ride the last few years, and this year he laid down a challenge.  Craig (my other friend from the Gunny Ski rides) and I have accepted the challenge and will be heading to Wichita Falls at the end of August.  That’s the plan, barring any unforeseen circumstances.  This year I’m going to register for the ride in advance so that I have more incentive to go.

Today, the bike computer says 16,839.  My goal is to have it read 20,000 by the end of the year.  3,200 miles seems like a lot of riding in nine months, but it really isn’t.  For example, it’s right at nine miles from home to the office.  In a couple of weeks I’ll be in good enough shape to do that ride every day.  If that 18-mile round trip commute five days per week was all I did, I’d have 3,200 miles in about 36 weeks.  There are 39 weeks left in the year.

So the goal looks like it’s too easy?  Not necessarily.  The hard part is sticking to it.  If I stick to the training plan I’ve outlined, I’ll be at 19,000 miles before I even start the Hotter’N Hell ride.  I’ll re-evaluate my end-of-year goal when I get home from Wichita Falls.

Command line XML processing

Today I got a big XML file full of yummy audio and video links that my Web crawler will just love to slurp up.  Not thinking, I wrote a quick grep command to extract some of the links and send them to the crawler.  Later it dawned on me that some of those links are broken because the XML is entity encoded.  That is, this link:

http://www.example.com/videos/?id=23&format=hd

will be encoded so that “&” becomes “&amp;”.  Any character that is “special” in XML will end up being entity encoded like that.  Oops.

A quick search for “xml grep” led me to XMLStarlet:  a command line XML toolkit that lets you examine, query, fold, spindle, and mutilate XML files from the command line. I don’t know nearly as much as I should about XPath, XSLT, and XML in general, but after a few minutes of looking at examples and struggling with the syntax, I managed to pull those URLs out of the XML file and send them off to the crawler.
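
For comparison, here is a short Python sketch of the same idea:  let an XML parser pull the element text, so the entity decoding happens for free.  This is not the XMLStarlet command I used.  The element names (feed, item, link) and the sample document are invented for the example, since I haven’t shown the real file’s layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's element names may differ.
SAMPLE = """<feed>
  <item><link>http://www.example.com/videos/?id=23&amp;format=hd</link></item>
  <item><link>http://www.example.com/videos/?id=24&amp;format=sd</link></item>
</feed>"""


def extract_links(xml_text, tag="link"):
    """Return the decoded text of every <tag> element in the document."""
    root = ET.fromstring(xml_text)
    # The parser turns &amp; back into & while building the tree.
    return [element.text.strip() for element in root.iter(tag) if element.text]


for url in extract_links(SAMPLE):
    print(url)
```

A plain grep over the same text would hand back the links with “&amp;” still in them, which is exactly the mistake described above.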

Granted, I spent a heck of a lot more time on this than I would have just writing a quick C# program to extract that one element from the file in question. But my C# program would have worked for this situation only. I already have other plans for XMLStarlet.

Highly recommended. If you ever find yourself having to manipulate XML files outside of your application, you need this tool.

GNU tools for Windows

I got annoyed with Windows today.  I had this HTML file that contained a bunch of links to RSS files I wanted to download and examine.  The task before me was to extract the URLs, remove duplicates, and then download.  It’s basic text processing that you can solve trivially with a bare-bones Linux distribution.  It’s a single command line (wrapped for readability):

grep -o "http://www\.example\.com/feeds/rss/[^.]\+\.rss" feedIndex.html |
  sort -u | xargs wget

What makes that possible is the GNU tools–a standard set of tools that mimic and extend the standard tools that have been available for Unix-based systems for decades.
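
If you have no GNU tools at hand, the same extract, de-duplicate, and download steps can be sketched in a few lines of Python.  This is an alternative to the pipeline, not what I actually ran.  The function names are made up for the example, and download_all simply saves each feed under the last part of its URL.

```python
import re
import urllib.request

# Same pattern as the grep command, written as a Python regular expression.
FEED_URL = re.compile(r"http://www\.example\.com/feeds/rss/[^.]+\.rss")


def unique_feed_urls(html_text):
    """Return the feed URLs found in html_text, sorted, with duplicates removed."""
    return sorted(set(FEED_URL.findall(html_text)))


def download_all(urls):
    """Fetch each URL and save it under the last path component (like wget)."""
    for url in urls:
        urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])


sample_html = (
    '<a href="http://www.example.com/feeds/rss/sports.rss">Sports</a>'
    '<a href="http://www.example.com/feeds/rss/news.rss">News</a>'
    '<a href="http://www.example.com/feeds/rss/news.rss">News again</a>'
)

for url in unique_feed_urls(sample_html):
    print(url)
```

With the real file, you would read feedIndex.html into a string, pass it to unique_feed_urls, and hand the result to download_all.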

Although the Windows command line supports piping, it doesn’t include a comprehensive set of tools that were designed to work together the way the GNU tools (actually, the original Unix tools) were designed.  The Windows toolset is primitive, and not up to solving this simple task.  I used GNU grep for Windows to extract the URLs and save them to a file, TextPad to sort and then manually remove duplicates, and finally GNU wget for Windows to download the files.

This isn’t the first time I’ve had to resort to a hodgepodge of tools to solve a problem that I could solve without trouble if I had the GNU tools.  But in the past, downloading and installing all the GNU tools for Windows was a giant pain in the neck with version conflicts and such.  That’s not a problem any longer.  Today I discovered the getgnuwin32 project, which automates the process of downloading, installing, and maintaining a full set of GNU tools for Windows. 

The few tools I’ve used so far work exactly as expected.  Time (and some effort:  it’s been a while since I used the GNU tools) will tell if this is as useful as I hope it is.

Update (later the same day):
There is one slight problem:  some of the GNU tools have name conflicts with the Windows tools.  sort is a good example.  If I ran the command line above on a Windows machine, it might invoke the brain-damaged Windows SORT utility, which is so bad that whoever wrote it should die from embarrassment, rather than GNU sort.  Which one runs depends on where in your path you put the GnuWin32\bin directory.  Either way you go, name conflicts are going to give you some headaches.

I’m thinking that, since most programmers don’t even know that the Windows command line exists, I’ll put GnuWin32\bin ahead of the Windows directory that contains the standard tools.  Or maybe I should just delete or rename SORT and any other tools that have conflicting names.  It’s not like I ever run batch files that I get from other people.

Bad laptop USB ports

My USB mouse on the laptop stopped working the other night.  I replaced the mouse with a known-working one and even it didn’t work on the laptop.  I wasn’t prepared to debug the thing at the time, so I continued my work using the built-in mouse stick.  But the computer was slow and Task Manager showed that I was using 100% CPU.

A little debugging with Process Explorer revealed the culprit:  the USB driver.  Odd, that.  Still going with the bad mouse theory, I figured that the mouse had somehow caused the driver to freak out.  So I rebooted the computer.  That solved the problem.  For a bit.  Then the mouse stopped working.

Now here’s the odd part.  As long as I leave the mouse plugged in to the USB port, everything else is fine.  None of the USB ports work, but there’s no excess CPU usage.  But as soon as I unplug the mouse, CPU usage goes to 100%.

I’ve looked around online and have tried most of the solutions others with this problem have tried:  uninstalling the devices and letting them re-install on reboot, updating the driver, etc.  All to no avail.  The mouse works fine for a while when I first restart, but it stops working after some unpredictable amount of time.

The one solution I haven’t tried yet is re-installing the operating system (Windows XP Pro).  I hesitate to go to that effort if it’s not required, but I have no idea what else to try.  The USB ports on this Dell 630 notebook are built into the motherboard, so there’s no chance of just replacing them.  And by the time I pay for a new motherboard and the labor to install it, I’m out about the same amount it would cost me to just buy a refurbished replacement computer.

I can limp along without USB ports for a while, but it’s not a good long-term solution.

I’d sure like to hear from anybody who’s had a similar problem, and learn how you solved it (if you did).  I suppose I’ll try the full disk wipe and OS restore, as much as I hate to do it.  I can’t think of how that’d solve the problem, but I also can’t think of any other possible solution.