Outside of YouTube, MP4 is probably the most popular video file format available online. MP4 videos exist inside a container format that’s also widely referred to as “the MP4 container format.” Honestly, I’m not sure what the correct name for the format is, but it’s described by a standard identified as ISO/IEC 14496-12: “Information technology – Coding of audio-visual objects – Part 12: ISO base media file format.” Yeah, that’s a mouthful. Let’s just call it “Part 12.”
The document is freely available online as a PDF, although it can be difficult to find. I just went searching for it again and couldn’t find the full version. If I remember where I found it, I’ll post a link here.
Adobe Flash Player 9 Update 3 (9, 0, 115, 0) and higher can play some MP4 files. The subset of MP4s that Flash can play is described in Video File Format Specification Version 9. That document gives you an idea of the Part 12 file format, although you probably want the full specification if you’re implementing a reader.
The file format is quite flexible–perhaps overly so–but reasonably easy to parse once you grok the basic structure. I coded up a quick MP4 reader in a day, and within two days had my web crawler extracting metadata from files I located online. But then I ran into a problem: my movie player would sometimes hang when trying to play a file. The really weird part was that the movie would play fine if I downloaded it first. It was only when trying to play from online that I experienced the problem.
It didn’t take long to find the problem. Or at least part of the problem.
Data in the Part 12 file format is organized in “boxes.” Those boxes contain all manner of information: a file header, overall movie information, information about the individual tracks, synchronization data for different tracks, etc. Part 12 describes the overall structure of the file and the contents of the “moov” box that contains basic movie metadata (number and types of tracks, duration, codecs required to play the tracks, etc.).
Another box, called “mdat,” contains the actual movie data: the video and audio information that will be played.
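To make the box structure concrete, here’s a minimal sketch in Python that walks the top-level boxes of a file. Each box starts with a 32-bit big-endian size followed by a four-character type; a size of 1 means a 64-bit size follows the type, and a size of 0 means the box runs to the end of the file. The function name iter_top_level_boxes and the sample bytes are made up for illustration. This is not the reader mentioned above, and it doesn’t descend into nested boxes.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_top_level_boxes(f: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each top-level box in a seekable stream."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()          # needed to resolve boxes whose size field is 0
    offset = 0
    while offset + 8 <= file_size:
        f.seek(offset)
        size, box_type = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            size = struct.unpack(">Q", f.read(8))[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = file_size - offset
        if size < header:
            break                 # corrupt size; stop rather than loop forever
        yield box_type.decode("latin-1"), offset, size
        offset += size


# A tiny hand-built "file" with the boxes in the problematic order.
sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00" * 4
    + struct.pack(">I4s", 12, b"mdat") + b"\xff" * 4
    + struct.pack(">I4s", 8, b"moov")
)
for box_type, offset, size in iter_top_level_boxes(io.BytesIO(sample)):
    print(box_type, offset, size)
```

Because only the eight (or sixteen) header bytes of each box are read, listing the boxes in a large file takes very little I/O.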
In order to start playing a movie, a player must have the metadata. The player can’t play the first frame until it knows how to decode that frame. The movie data, on the other hand, can be delivered relatively slowly: at whatever the playback speed is. In other words, playing a movie consists of these steps:
Read the metadata.
Determine if the movie is playable with this player.
repeat
    Read movie data (audio, visual, etc.) frame
    Render frame
until end of movie
So it makes sense to organize data in the movie file to facilitate that. Right? In fact, the Part 12 document makes two very pertinent recommendations:
2) It is strongly recommended that all header boxes be placed first in their container: these boxes are the Movie Header, Track Header, Media Header, and the specific media headers inside the Media Information Box (e.g. the Video Media Header).
8) It is recommended that the progressive download information box be placed as early as possible in files, for maximum utility.
In the original document, the word “recommended” is emphasized in both items.
There are good reasons for these recommendations, as I discovered in the first problem file I looked at. In that particular file, the “mdat” box, which contains the frame data, is placed at the front of the file: immediately after the file header. “mdat” is 89 megabytes long. It’s followed by the “moov” box, which is a little less than two megabytes. A movie player has to download 89 megabytes of stuff before it can get to the metadata that tells the player how to play the movie. 89 megabytes might not sound like much, but at 10 megabits per second (which would be a very fast residential connection here in the U.S.), it’s at least 71 seconds, and realistically closer to a minute and a half. Nobody’s going to wait that long for their video to start.
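The arithmetic is easy to check. The sketch below computes the best-case transfer time, ignoring protocol overhead and congestion, so real-world numbers will be worse. The function name download_seconds is just for illustration, and the 2 megabyte figure is an upper bound on the size of the “moov” box in that file.

```python
def download_seconds(size_megabytes: float, link_megabits_per_second: float) -> float:
    """Best-case transfer time, ignoring protocol overhead and congestion."""
    size_megabits = size_megabytes * 8      # 8 bits per byte
    return size_megabits / link_megabits_per_second


print(download_seconds(89, 10))   # the "mdat" box: 71.2 seconds
print(download_seconds(2, 10))    # the "moov" box: 1.6 seconds
```

Fetching the metadata first would take under two seconds on the same connection.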
I suspect that whoever made these movies has no idea that they’re effectively unplayable over the Internet, and might not even care. I care, because I’m not going to download the entire movie just to see if I’m really interested in watching it.
What surprises me is that video player software doesn’t recognize this and skip over the movie data to get to the metadata. HTTP/1.1 makes it very easy to get part of a file: a request with a Range header asks the server for just the bytes you want. The movie player should see that the “mdat” box comes before “moov”, and make another request to get “moov”. It could then go back to “mdat” after digesting the metadata.
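Here’s a rough sketch of how a player might do that, assuming the server honors Range requests and that the total file size is already known (from a Content-Length header, say). The names http_fetcher and find_box are hypothetical. The search takes any function that returns a byte range, so the demonstration at the bottom uses an in-memory stand-in instead of a real server.

```python
import struct
import urllib.request
from typing import Callable, Optional, Tuple

Fetch = Callable[[int, int], bytes]   # (offset, length) -> bytes


def http_fetcher(url: str) -> Fetch:
    """Return a function that fetches byte ranges of url with HTTP Range requests."""
    def fetch(offset: int, length: int) -> bytes:
        request = urllib.request.Request(
            url, headers={"Range": f"bytes={offset}-{offset + length - 1}"})
        with urllib.request.urlopen(request) as response:
            if response.status != 206:    # 206 Partial Content
                raise RuntimeError("server did not honor the Range request")
            return response.read()
    return fetch


def find_box(fetch: Fetch, wanted: str, total_size: int) -> Optional[Tuple[int, int]]:
    """Walk top-level box headers only; return (offset, size) of the wanted box."""
    wanted_type = wanted.encode("latin-1")
    offset = 0
    while offset + 8 <= total_size:
        header = fetch(offset, 16)        # enough for a 64-bit size, if present
        size, box_type = struct.unpack(">I4s", header[:8])
        if size == 1 and len(header) >= 16:
            size = struct.unpack(">Q", header[8:16])[0]
        elif size == 0:
            size = total_size - offset    # box runs to the end of the file
        if size < 8:
            return None                   # corrupt size; give up
        if box_type == wanted_type:
            return offset, size
        offset += size                    # skip the payload without downloading it
    return None


# In-memory stand-in for a remote file: ftyp, a 1000-byte mdat payload, then moov.
movie = (
    struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00" * 4
    + struct.pack(">I4s", 8 + 1000, b"mdat") + b"\x00" * 1000
    + struct.pack(">I4s", 8, b"moov")
)
print(find_box(lambda off, n: movie[off:off + n], "moov", len(movie)))  # (1024, 8)
```

Against a real server the call would look like find_box(http_fetcher(url), "moov", total_size). It costs one small request per top-level box, followed by one request for the “moov” box itself.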
I wonder how hard it would be to create a tool that fixes those backwards movies . . .
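Not very hard for the simple case, as it turns out, and tools that do this already exist (qt-faststart, for one). The catch is that the chunk offset tables inside “moov” (the “stco” and “co64” boxes) hold absolute file positions, so moving “moov” in front of “mdat” means adding the size of “moov” to every offset that points at the data being pushed back. Below is a minimal sketch under some loud assumptions: the whole file fits in memory, “moov” isn’t compressed, the file isn’t fragmented, and the function names are my own invention. Treat it as an illustration of the idea, not a production tool.

```python
import struct
from typing import Iterator, Tuple, Union

Buffer = Union[bytes, bytearray]
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}


def boxes(buf: Buffer, start: int, end: int) -> Iterator[Tuple[bytes, int, int, int]]:
    """Yield (type, offset, header_length, size) for each box in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:                     # 64-bit size follows the type
            size = struct.unpack_from(">Q", buf, pos + 8)[0]
            header = 16
        elif size == 0:                   # box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("corrupt box size")
        yield kind, pos, header, size
        pos += size


def shift_chunk_offsets(moov: bytearray, start: int, end: int,
                        lo: int, hi: int, delta: int) -> None:
    """Add delta to every stco/co64 entry that points into [lo, hi)."""
    for kind, pos, header, size in boxes(moov, start, end):
        body = pos + header
        if kind in CONTAINERS:
            shift_chunk_offsets(moov, body, pos + size, lo, hi, delta)
        elif kind in (b"stco", b"co64"):
            fmt, width = (">I", 4) if kind == b"stco" else (">Q", 8)
            # Body layout: 4 bytes version/flags, 4 bytes entry count, then entries.
            count = struct.unpack_from(">I", moov, body + 4)[0]
            for i in range(count):
                at = body + 8 + i * width
                old = struct.unpack_from(fmt, moov, at)[0]
                if lo <= old < hi:
                    # Raises struct.error if a 32-bit stco entry would overflow.
                    struct.pack_into(fmt, moov, at, old + delta)


def faststart(data: bytes) -> bytes:
    """Return data with moov moved in front of the first mdat box."""
    top = list(boxes(data, 0, len(data)))
    kinds = [kind for kind, _, _, _ in top]
    if b"moov" not in kinds or b"mdat" not in kinds:
        raise ValueError("expected both moov and mdat at the top level")
    _, moov_pos, _, moov_size = top[kinds.index(b"moov")]
    mdat_pos = top[kinds.index(b"mdat")][1]
    if moov_pos < mdat_pos:
        return data                       # metadata already comes first
    moov = bytearray(data[moov_pos:moov_pos + moov_size])
    if struct.unpack_from(">I", moov, 0)[0] == 0:
        # A size field of 0 is only valid on the last box, which moov no longer is.
        struct.pack_into(">I", moov, 0, moov_size)
    # Only data between the insertion point and the old moov position moves.
    shift_chunk_offsets(moov, 0, moov_size, mdat_pos, moov_pos, moov_size)
    return (data[:mdat_pos] + bytes(moov)
            + data[mdat_pos:moov_pos] + data[moov_pos + moov_size:])
```

Usage would be something like reading the file’s bytes, passing them to faststart, and writing the result to a new file. A real tool would stream the data instead of holding it all in memory, and would have to cope with the cases this sketch ignores.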