Podly.TV is alive!

It’s been a long road.  Back in January of 2007, David Stafford and I came up with the idea of writing a media search engine.  We thought it’d take maybe a year.

We actually had something working in a little more than a year, but it wasn’t very interesting.  It worked, but nobody would have paid much attention if we had released it to the world.

By then we had grown to four people.  We stuck with it, improved our crawler and indexing technology, and released two or three other incarnations to very limited audiences.  Those attempts, too, were less than successful, but they gave us a lot of valuable information about what people like (and don’t like), and how users want to view online video.

It’s been a long road, and sometimes a bit discouraging, but we’re finally able to present an early version of our new product, Podly.TV.

Take it for a test drive.  Browse the channels.  If you don’t see anything you like, do a search and create your own channels.  It’s totally free.  Anybody can sign up and create a personal channel list.  We’re adding new videos constantly, and new video sources on a regular basis.

Why Google will win

I don’t know who’s calling the shots over there at Google, but they’re absolutely brilliant.

Google’s technology is impressive, no doubt. They’ve come a long way in the 12 years or so since two college kids named Sergey Brin and Larry Page came up with a way to greatly improve the quality of Web search results. They met quite a bit of resistance when they went looking for funding to build a company. Everybody thought that Yahoo owned search, and nobody thought you could make money with search. “You’re going to spend millions of dollars to build a phone book of the Web? How will you make money?”

Around the same time, there was a small group at Microsoft who wanted to build search. Microsoft’s corporate leaders shut that down pretty quickly, for much the same reason: “there’s no money in search.” In addition, and perhaps more importantly, Microsoft hadn’t really embraced the Internet. Sure, Internet Explorer was in ascendance, mostly due to Netscape’s incompetence, and other parts of the company were making noise about using the Web, but at heart Microsoft remained a shrink-wrap software company. Their business was selling Windows and Office. They embraced the Internet to the extent required to sell those products.

Microsoft eventually embraced search, first grudgingly–“It’s something we have to provide”–and finally, after realizing that there was money to be made, by committing serious resources. But by then it was too late.

It was too late because Google had figured out how to make money with search: first by displaying advertisements on search pages, then with AdWords, AdSense, and other cooperative advertising programs. Google was transformed from a search engine with some incredibly impressive technology into an advertising company that understands how to make billions of dollars a few pennies at a time.

And Google is an advertising company. Make no mistake. Google is in the business of placing ads on your screen, and doing so in a manner that makes you more likely to click on the ad. That means making them as relevant as possible and walking that fine line between visibility and unobtrusiveness. It also means getting their ads everywhere, and everything that Google does furthers that goal, directly or indirectly.

Google’s technology has two jobs: to deliver ads and to increase their audience. I know very little about how they deliver ads–that’s their proprietary process and, one might argue, the heart of their business. But they’re transparent about how they increase their audience. They provide arguably the best results of any general search engine available. With YouTube, they dominate Web video. They have a whole bunch of other free services and software: translation tools, the Google Chrome Web browser, Google Maps, Google Earth, Google Books, Patent Search, Blogger, Mail, SketchUp, Images, and many more that make it easier to use the Internet or provide online replacements for traditionally client-bound tools. By making it easier to use the Internet, they get more people on the Internet.

Google also produces and makes available an incredible amount of program source code that developers can use or include in their products for free. Just check out Google Code sometime. It’s full of proven working code that Google paid their employees to develop, and is now giving away for free. It’s not that they’re altruistic. They know that by making it easier for developers to create quality Web sites, they grow their audience.

Two recent (well, one not so recent) developments show Google’s commitment. First, the Chrome Web browser. This is Google’s free browser, which is arguably the best on the market today. One might ask why Google would go to the expense and effort of creating a new browser and then make large parts of its source code available (see the Chromium project). I can’t say for sure, but here’s what I think.

I think that Google wants to do things with the Web that other browsers (Internet Explorer, Firefox, Opera, Safari, etc.) don’t currently support. Although it’s often possible for Google to convince the people in control of those browsers to support new features, Google is left waiting for support. If they control the browser, then Google can start pushing new technologies on their own schedule.

Whatever the reason behind it, Google Chrome is building market share. It used to be that Microsoft’s Internet Explorer had 70 to 75% of the browser market, followed by Firefox in the 20 to 25% range, and everybody else was down in the noise. The most recent numbers I have put IE below 60% for the first time, Firefox still hanging in there around 20 to 25%, and the rest being shared by Opera, Safari, and Chrome. Except Chrome is taking market share, most of which is coming from Internet Explorer.

The more recent development is Google’s support of the WebM project, a high-quality, open, and free video format. I cannot overemphasize the importance of this development. WebM combines a container format with free video and audio codecs so that anybody can create and distribute video royalty-free without having to worry about patents or other intellectual property concerns. Google spent something like $100 million to obtain the rights to the VP8 video codec in order to make this possible. Then they turned around and made it freely available to anybody. Why? Because ubiquitous free video gives Google a huge increase in surface area–a larger audience–that they can exploit for the purpose of delivering ads.

From the outside, Google’s business plan really does look as simple as, “Make the Web easy to use so that we can deliver more ads to more people.”

In the process, Google is steamrolling over a number of entrenched companies who thought they had it made. Consider Adobe, whose Flash player is currently The Standard for online video. Back in 2007, Flash 8 had something like 95% (perhaps higher) penetration. That is, 95% of computers connected to the Internet had Flash installed. Why? Because of YouTube. When Adobe released Flash version 9, it achieved more than 90% penetration in just a few months, again in large part (perhaps primarily) because YouTube went to Flash 9 for their video. Adobe owned Web video.

But Adobe dropped the ball. For reasons I’ll never understand, Adobe still clings to the idea that Flash is for creating rich Web apps. The ability to do rich client things in a Web page is cool, and there was a time when Flash was the best way to do it. But browsers and computers are more capable now. I know from experience that it’s now much easier to build rich applications with JavaScript than it ever was with Flash. And all you need is a modern browser. There’s no need to download and install a Flash control to do it.

After Google’s WebM announcement last week, Adobe issued a press release saying that they’ll support WebM “in a future version.” YouTube will continue to use Flash for low-quality videos. Starting soon, though, higher quality video will be delivered with WebM. You have to be blind not to see what’s coming: the eventual removal of Flash support on YouTube. But it’s already over for Adobe Flash. They will only see decreasing market share. And Adobe has nobody to blame but themselves. They ran into much the same thing clinging to their old .FLV format when the rest of the world was moving to .MP4. The reason? They make money by selling very expensive software packages that create video files. Much like their PDF tools, they give away the reader and charge a lot of money for software that creates the files that their free players read.

With WebM, all that goes away. There are already FFmpeg patches for WebM, and there will likely be some very good free tools before long.

Microsoft, too, is getting steamrolled by Google. After Google’s WebM announcement, Microsoft said that they’re very excited about the new technology and that Internet Explorer 9 will fully support it as long as the user has installed the proper codec. If you’re not familiar with the world of codecs, don’t feel bad. Understanding codecs is not something a user should have to do. Finding and installing the proper codec can be incredibly frustrating and fraught with danger. If you go looking for a codec for Media Player, for example, you’ll find yourself confused and in very real danger of inadvertently downloading and installing some malware.

For Microsoft to say, “as long as the user has installed the proper codec” is like GM saying that the new car they sell you will be fully functional as long as you find and install a compatible engine.

And don’t expect Microsoft’s Media Player to support WebM any time soon. According to Microsoft’s own article, “Information about the Multimedia file types that Windows Media Player supports,” they don’t even support MP4. Granted, that article was written two years ago, but it covers Media Player 11, which is the most current version. That article says, “You can play back .mp4 media files in Windows Media Player when you install DirectShow-compatible MPEG-4 decoder packs. DirectShow-compatible MPEG-4 decoder packs include the Ligos LSX-MPEG Player and the EnvivioTV.” In other words, you have to install a codec made by a third party in order to play a video format that the rest of the world embraced five years ago.

The announcement of WebM is also pushing innovation in another area: the server. The day after the WebM announcement, somebody was streaming WebM from the Cherokee Web server. One day! This has some very interesting ramifications. An open media format combined with an open Web server (like Apache) means that a free media server is not far behind. There goes Adobe’s Flash Media Server business. And quite possibly Microsoft’s Home Media Server, especially if somebody releases an easy Linux configuration that includes this hypothetical (but soon to be realized) media server, backup and data recovery, and document management.

It’s interesting to note that Google hasn’t had to “target” any of these companies in order to take them out. In fact, Google probably isn’t even interested in “taking them out.” Google is just doing what it needs to do in order to grow the business. If it means investing hundreds of millions of dollars so that more people will come online to watch video, then so be it. If Google makes a few pennies every time somebody watches a video online, that hundred million bucks will be returned in short order.

The really funny part here is that both Microsoft and Adobe had to have seen it coming. It’s not like Google made a surprise announcement last week: there was a big splash when they acquired the VP8 technology a few months back, and Google has been telegraphing this move since at least 2006, when they paid $1.65 billion for YouTube. That kind of investment says, “We want to own Web video because we think we can make money at it.” No, Microsoft and Adobe saw this coming and knew that they were powerless to stop it. But rather than embrace VP8 and try to find a way to work with it, they clung to their own product plans hoping that some imaginary Maginot Line would block Google’s advance. Adobe, Microsoft, and other companies whose businesses are built on artificial scarcity (selling bits) are living in the past and will continue to see their market share stolen by companies like Google that can provide better products for free.

You’re going to see this same thing play out all over again in the world of television. Google recently announced a deal with Intel and Sony that will put Google TV on Sony television sets. Today, something like 25% of all new televisions sold are Internet ready. Google is ready to go there, and not just because it increases the surface area for their Web advertisements, but also because it gives Google a platform from which to launch an assault on the television advertising market ($70 billion annually in the U.S. alone).

Google’s competition for that market is a handful of old media companies and Madison Avenue advertising firms, both of which have grown fat and complacent. Sure, they’ve been hit by Internet advertising over the years, but it’s been more of a slow leak in a dike rather than a tsunami that overwhelms the entire system. Those companies probably aren’t smart enough to see it coming yet, but when they do see Google riding the wave, they’ll probably all hunker down behind the dike and hope for the best. And then complain bitterly (read: try to win through litigation) when they discover that they lost the war while they were sitting there with their thumbs up their butts trying to decide if they should do anything.

Remember, you heard it here first.

I’m not trying to paint Google in a bad light at all. On the contrary, I have nothing but admiration for them. They’re going about their business. If the entrenched companies can’t keep up, it’s not Google’s fault. While the old media companies are refining the horse-drawn carriage, Google is hard at work on the V8 engine. In the process, Google is making all manner of things available to Internet users and developers, and actually encouraging us to build products that leverage the free services that the company offers. Given the choice between begging for access from the old media companies or accepting the bounty freely offered by Google, I’ll throw in with Google.

Command line tools strike again

Every morning at 3:00, one of our servers grabs the latest code from our source repository and runs the build script.  As you would expect, the build usually completes without error and everything’s fine.  From time to time, though, one of us will forget to check in a file or dependent project, and the build will fail.  At that point, it’s nice to have a way to tell everybody that the build failed, and why.

The build is an MSBUILD script that compiles all of our projects and dependencies, and then copies the results to a staging directory from which we can run unit tests or build distribution packages for our internal customers (see below).  To this point, everything can be done with a minimal batch file script, the MSBUILD program supplied with the .NET development tools, and of course the Subversion command line client.  We have one other tool called sendEmail that notifies me of the build status.

I’d like to notify everybody when the build fails, but doing so requires that I tell them why it failed.  And the generated build log is very large:  about 120 kilobytes, most of which is irrelevant.  The important information is typically the last 10 or so lines of the file, and that’s what I’d like to send to people when the build fails.  Those lines say, in effect, “The build failed for these reasons.”  A programmer who receives that message can quickly determine if it’s his responsibility, and take steps to fix the problem.

The only trouble I have is that there is no simple way with Windows-supplied tools to extract those pertinent lines from the file.  At least, I can’t think of a way.  But GNU awk (gawk) can do it trivially.

When the build fails, the last thing that MSBUILD outputs is a line that says, “Build FAILED”, followed by some lines that describe the error or errors.  So all I need is a program that will go through the file, locate the “Build FAILED” line, and then output that line and all following lines to the end of the file.  It’s been 20 years since I did any awk programming, but this script was simple:

gawk "{ if (/^Build FAILED/) { doit=1 } if (doit) print $0 }" < buildlog.txt

Done and done.
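For anyone who would rather not install gawk, the same extraction can be sketched in a few lines of Python.  This is only an illustration, not part of our actual build: the function names, the subject line, and the fallback to the last 10 lines when no “Build FAILED” line is found are all assumptions of mine, and the addresses are whatever you pass in.

```python
from email.message import EmailMessage

# The line MSBUILD prints when the build fails (as described above).
FAIL_MARKER = "Build FAILED"


def failure_summary(log_text, fallback_lines=10):
    """Return the first "Build FAILED" line and everything after it.

    Hypothetical helper: if no line starts with the marker, fall back
    to the last few lines of the log instead.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(FAIL_MARKER):
            return "\n".join(lines[i:])
    return "\n".join(lines[-fallback_lines:])


def build_notification(log_text, sender, recipients):
    """Wrap the failure summary in an email message.

    The subject line and addresses are placeholders; actually sending
    the message (with sendEmail, smtplib, or anything else) is left out.
    """
    msg = EmailMessage()
    msg["Subject"] = "Nightly build failed"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(failure_summary(log_text))
    return msg
```

Reading buildlog.txt into a string and passing it to failure_summary gives the same text that the gawk one-liner prints.  The fallback is a design choice for the case where the log format changes and the marker never appears: a tail of the log is still more useful than an empty message.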

The only problem I have now is deciding whether I want to install the full GNU Tools for Windows package on my server, or if I should just grab Gawk for Windows.  The full package is probably the right way to go because I suspect I’ll be needing some other tools in the future.

Either way, I’m annoyed that Windows doesn’t include these simple text processing tools.  I can perhaps understand why they don’t exist in desktop versions, but we do these types of things on servers all the time, and the standard server install should include a more robust toolset.

Above I mentioned “internal customers.”  In reality, we are our own customers.  There are only five of us here, and one of us doesn’t use the tools that the build creates.  In light of that, it’d be easy to take a more cavalier attitude towards our build process.  I’ve found, though, that things run smoother if, as a programmer, I think of the users of my software (in this case, the crawler subsystem and the tools that process the collected data) in much the same way as I would an external customer.  Even though the primary user of the crawler is me.  I wear a number of different hats around here (as does everybody else–we’re a startup, after all), and it’s useful to think of Jim the SysAdmin as a separate person from Jim the Programmer.  That way, when we can afford to hire a system administrator to take those duties from me, the systems will already be in place for him to step right in.

Like source code version control, a formal build process is one of those things that you don’t need to implement until the size of your project team exceeds zero people.