I got annoyed with Windows today. I had this HTML file that contained a bunch of links to RSS files I wanted to download and examine. The task before me was to extract the URLs, remove duplicates, and then download. It's basic text processing that you can solve trivially with a bare-bones Linux distribution. It's a single command line:
grep -o "http://www\.example\.com/feeds/rss/[^.]\+\.rss" feedIndex.html | sort -u | xargs wget
What makes that possible is the GNU tools: a standard set of utilities that mimic and extend the tools that have been available on Unix-based systems for decades.
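To see what the extract-and-dedupe stages of the one-line pipeline above do on their own, here is a minimal dry-run sketch. It assumes a POSIX shell with GNU or BSD grep. It prints the unique URLs instead of handing them to wget. The helper name extract_feed_urls and the sample page contents are hypothetical. The sketch also uses grep -E, so the + needs no backslash.

```shell
#!/bin/sh
# Hypothetical helper: print the unique feed URLs found in an HTML file.
# Assumes GNU (or BSD) grep, which supports -o and -E.
extract_feed_urls() {
    # -o prints only the matched text, one match per line;
    # -E enables extended regexes, so "+" needs no backslash.
    # sort -u sorts the matches and drops duplicates in one step.
    grep -oE 'http://www\.example\.com/feeds/rss/[^.]+\.rss' "$1" | sort -u
}

# Build a small sample page (hypothetical content) with a duplicate link.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="http://www.example.com/feeds/rss/news.rss">News</a>
<a href="http://www.example.com/feeds/rss/sports.rss">Sports</a>
<a href="http://www.example.com/feeds/rss/news.rss">News again</a>
EOF

# Dry run: list what would be downloaded instead of calling wget.
extract_feed_urls "$sample"
rm -f "$sample"
```

Once the list looks right, piping it to xargs wget performs the actual downloads, as the original command does.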
Although the Windows command line supports piping, it doesn't include a comprehensive set of tools that were designed to work together the way the GNU tools (actually, the original Unix tools) were designed. The Windows toolset is primitive and not up to this simple task. I used GNU grep for Windows to extract the URLs and save them to a file, TextPad to sort them and then manually remove duplicates, and finally GNU wget for Windows to download the files.
This isn’t the first time I’ve had to resort to a hodgepodge of tools to solve a problem that I could solve without trouble if I had the GNU tools. But in the past, downloading and installing all the GNU tools for Windows was a giant pain in the neck with version conflicts and such. That’s not a problem any longer. Today I discovered the getgnuwin32 project, which automates the process of downloading, installing, and maintaining a full set of GNU tools for Windows.
The few tools I’ve used so far work exactly as expected. Time (and some effort: it’s been a while since I used the GNU tools) will tell if this is as useful as I hope it is.
Update (later the same day):
There is one slight problem: some of the GNU tools have name conflicts with the Windows tools. sort is a good example. If I tried the above command line on a Windows machine, it might invoke the brain-damaged Windows SORT utility instead, which is so bad that whoever wrote it should die from embarrassment. Which one runs depends on where in your PATH you put the GnuWin32\bin directory. Either way you go, name conflicts are going to give you some headaches.
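Here is a small sketch of why PATH order decides which sort runs. It assumes a POSIX shell (for example, one from Cygwin or MSYS) rather than cmd.exe. Two hypothetical directories each hold a fake sort, and whichever directory comes first on PATH wins. cmd.exe also takes the first match it finds while searching PATH, although it checks the current directory before PATH. The directory variables and the run_sort_with_path helper are illustrative only.

```shell
#!/bin/sh
# Sketch: the first directory on PATH that contains "sort" wins.
# gnu_dir and win_dir are hypothetical stand-ins for GnuWin32\bin
# and the Windows system directory.
gnu_dir=$(mktemp -d)
win_dir=$(mktemp -d)

# Each fake "sort" ignores its input and just reports which one ran.
printf '#!/bin/sh\necho "GNU sort"\n' > "$gnu_dir/sort"
printf '#!/bin/sh\necho "Windows SORT"\n' > "$win_dir/sort"
chmod +x "$gnu_dir/sort" "$win_dir/sort"

# Hypothetical helper: run "sort" with the given PATH, in a subshell so
# the caller's PATH is left untouched.
run_sort_with_path() {
    ( PATH=$1; sort )
}

run_sort_with_path "$gnu_dir:$win_dir"   # prints: GNU sort
run_sort_with_path "$win_dir:$gnu_dir"   # prints: Windows SORT

rm -rf "$gnu_dir" "$win_dir"
```

On a real system, command -v sort in a POSIX shell (or where sort in cmd.exe) shows which file will actually run.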
I’m thinking that, since most programmers don’t even know that the Windows command line exists, I’ll put GnuWin32\bin ahead of the Windows directory that contains the standard tools. Or maybe I should just delete or rename SORT and any other tools that have conflicting names. It’s not like I ever run batch files that I get from other people.