<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jim's Random Notes &#187; Computers</title>
	<atom:link href="http://blog.mischel.com/category/computers/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mischel.com</link>
	<description></description>
	<lastBuildDate>Wed, 21 Jul 2010 21:16:34 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Why Google will win</title>
		<link>http://blog.mischel.com/2010/05/24/why-google-will-win/</link>
		<comments>http://blog.mischel.com/2010/05/24/why-google-will-win/#comments</comments>
		<pubDate>Mon, 24 May 2010 18:21:23 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=853</guid>
		<description><![CDATA[I don&#8217;t know who&#8217;s calling the shots over there at Google, but they&#8217;re absolutely brilliant.
Google&#8217;s technology is impressive, no doubt.  They&#8217;ve come a long way in the 12 years or so since two college kids named Sergey Brin and Larry Page came up with a way to greatly improve the quality of Web search [...]]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t know who&#8217;s calling the shots over there at Google, but they&#8217;re absolutely brilliant.</p>
<p>Google&#8217;s technology is impressive, no doubt.  They&#8217;ve come a long way in the 12 years or so since two college kids named Sergey Brin and Larry Page came up with a way to greatly improve the quality of Web search results.  They met quite a bit of resistance when they went looking for funding to build a company.  Everybody thought that Yahoo owned search, and nobody thought you could make money with search.  &#8220;You&#8217;re going to spend millions of dollars to build a phone book of the Web?  How will you make money?&#8221;</p>
<p>Around the same time, there was a small group at Microsoft who wanted to build search.  Microsoft&#8217;s corporate leaders shut that down pretty quickly, for much the same reason:  &#8220;there&#8217;s no money in search.&#8221;  In addition, and perhaps more importantly, Microsoft hadn&#8217;t really embraced the Internet.  Sure, Internet Explorer was in ascendance, mostly due to Netscape&#8217;s incompetence, and other parts of the company were making noise about using the Web, but at heart Microsoft remained a shrink-wrap software company.  Their business was selling Windows and Office.  They embraced the Internet to the extent required to sell those products.</p>
<p>Microsoft eventually embraced search, first grudgingly&#8211;&#8220;It&#8217;s something we have to provide&#8221;&#8211;and finally, after realizing that there was money to be made, by committing serious resources.  But by then it was too late.</p>
<p>It was too late because Google had figured out how to make money with search: first by displaying advertisements on search pages, then with AdWords, AdSense, and other cooperative advertising programs.  Google was transformed from a search engine with some incredibly impressive technology into an advertising company that understands how to make billions of dollars a few pennies at a time.</p>
<p>And Google <em>is</em> an advertising company.  Make no mistake.  Google is in the business of placing ads on your screen, and doing so in a manner that makes you more likely to click on the ad.  That means making them as relevant as possible and walking that fine line between visibility and unobtrusiveness.  It also means getting their ads <em>everywhere</em>, and everything that Google does furthers that goal, directly or indirectly.</p>
<p>Google&#8217;s technology has two jobs:  to deliver ads and to increase their audience.  I know very little about how they deliver ads&#8211;that&#8217;s their proprietary process and, one might argue, the heart of their business.  But they&#8217;re transparent about how they increase their audience.  They provide arguably the best results of any general search engine available.  With YouTube, they dominate Web video.  They have a whole bunch of other free services and software&#8211;translation tools, Google Chrome Web browser, Google Maps, Google Earth, Google Books, Patent Search, Blogger, Mail, SketchUp, Images, and many more&#8211;that make it easier to use the Internet or provide online replacements for traditionally client-bound tools.  By making it easier to use the Internet, they get more people on the Internet.</p>
<p>Google also produces and makes available an incredible amount of program source code that developers can use or include in their products for free.  Just check out <a href="http://code.google.com/intl/en/">Google Code</a> sometime.  It&#8217;s full of proven working code that Google paid their employees to develop, and that Google is now giving away for free.  It&#8217;s not that they&#8217;re altruistic.  They know that by making it easier for developers to create quality Web sites, they grow their audience.</p>
<p>Two recent (well, one not so recent) developments show Google&#8217;s commitment.  First, the <a href="http://www.google.com/chrome">Chrome Web browser</a>.  This is Google&#8217;s free browser, which is arguably the best on the market today.  One might ask why Google would go to the expense and effort of creating a new browser and then make large parts of its source code available (see the <a href="http://code.google.com/intl/en/chromium/">Chromium</a> project)?  I can&#8217;t say for sure, but here&#8217;s what I think.</p>
<p>I think that Google wants to do things with the Web that other browsers (Internet Explorer, Firefox, Opera, Safari, etc.) don&#8217;t currently support.  Although it&#8217;s often possible for Google to convince the people in control of those browsers to support new features, Google is left waiting for support.  If they control the browser, then Google can start pushing new technologies on their own schedule.</p>
<p>Whatever the reason behind it, Google Chrome is building market share.  It used to be that Microsoft&#8217;s Internet Explorer had 70 to 75% of the browser market, followed by Firefox in the 20 to 25% range, and everybody else was down in the noise.  The most recent numbers I have put IE below 60% for the first time, Firefox still hanging in there around 20 to 25%, and the rest being shared by Opera, Safari, and Chrome.  Except Chrome is taking market share, most of which is coming from Internet Explorer.</p>
<p>The more recent development is Google&#8217;s support of the <a href="http://www.webmproject.org/">WebM project</a>, a high-quality, open, and <em>free</em> video format.  I cannot overemphasize the importance of this development.  WebM combines a <a href="http://blog.mischel.com/2010/04/19/movie-madness/">container format</a> with free video and audio codecs so that anybody can create and distribute video royalty-free without having to worry about patents or other intellectual property concerns.  Google spent something like $100 million to obtain the rights to the VP8 video codec in order to make this possible.  Then they turned around and made it freely available to anybody.  Why?  Because ubiquitous free video gives Google a huge increase in surface area&#8211;a larger audience&#8211;that they can exploit for the purpose of delivering ads.</p>
<p>From the outside, Google&#8217;s business plan really does look as simple as, &#8220;Make the Web easy to use so that we can deliver more ads to more people.&#8221;</p>
<p>In the process, Google is steamrolling over a number of entrenched companies who thought they had it made.  Consider Adobe, whose Flash player is currently The Standard for online video.  Back in 2007, Flash 8 had something like 95% (perhaps higher) penetration.  That is, 95% of computers connected to the Internet had Flash installed.  Why?  Because of YouTube.  When Adobe released Flash version 9, it achieved more than 90% penetration in just a few months, again in large part (perhaps primarily) because YouTube went to Flash 9 for their video.  Adobe <em>owned</em> Web video.</p>
<p>But Adobe dropped the ball.  For reasons I&#8217;ll never understand, Adobe still clings to the idea that Flash is for creating rich Web apps.  The ability to do rich client things in a Web page is cool, and there was a time when Flash was the best way to do it.  But browsers and computers are more capable now.  I know from experience that it&#8217;s now much easier to build rich applications with JavaScript than it ever was with Flash.  And all you need is a modern browser.  There&#8217;s no need to download and install a Flash control to do it.</p>
<p>After Google&#8217;s WebM announcement last week, Adobe made a press release saying that they&#8217;ll support WebM &#8220;in a future version.&#8221;  YouTube will continue to use Flash for low-quality videos.  Starting soon, though, higher quality video will be delivered with WebM.  You have to be blind not to see what&#8217;s coming:  the eventual removal of Flash support on YouTube.  But it&#8217;s already over for Adobe Flash.  They will only see decreasing market share.  And Adobe has nobody to blame but themselves.  They ran into much the same thing clinging to their old .FLV format when the rest of the world was moving to .MP4.  The reason?  They make money by selling very expensive software packages that create video files.  Much like their PDF tools, they give away the reader and charge a lot of money for software that creates the files that their free players read.</p>
<p>With WebM, all that goes away.  There are already FFmpeg patches for WebM, and there will likely be some very good free tools before long.</p>
<p>Microsoft, too, is getting steamrolled by Google.  After Google&#8217;s WebM announcement, Microsoft said that they&#8217;re very excited about the new technology and that Internet Explorer 9 will fully support it as long as the user has installed the proper codec.  If you&#8217;re not familiar with the world of codecs, don&#8217;t feel bad.  Understanding codecs is not something a user should have to do.  Finding and installing the proper codec can be incredibly frustrating and fraught with danger.  If you go looking for a codec for Media Player, for example, you&#8217;ll find yourself confused and in very real danger of inadvertently downloading and installing some malware.</p>
<p>For Microsoft to say, &#8220;as long as the user has installed the proper codec&#8221; is like GM saying that the new car they sell you will be fully functional as long as you find and install a compatible engine.</p>
<p>And don&#8217;t expect Microsoft&#8217;s Media Player to support WebM any time soon. According to Microsoft&#8217;s own <a href="http://support.microsoft.com/kb/316992">Information about the Multimedia file types that Windows Media Player supports</a>, they don&#8217;t even support MP4.  Granted, that article was written two years ago, but it covers Media Player 11, which is the most current version.  That article says, &#8220;You can play back .mp4 media files in Windows Media Player when you install DirectShow-compatible MPEG-4 decoder packs. DirectShow-compatible MPEG-4 decoder packs include the Ligos LSX-MPEG Player and the EnvivioTV.&#8221;  In other words, you have to install a codec made by a third party in order to play a video format that the rest of the world embraced five years ago.</p>
<p>The announcement of WebM is also pushing innovation in another area:  the server.  The day after the WebM announcement, somebody was <a href="http://www.alobbs.com/1386/Streaming_WebM_VP8_One_Day_Later.html">streaming WebM from the Cherokee Web server</a>.  <em>One day!</em>  This has some very interesting ramifications.  An open media format combined with an open Web server (like <a href="http://httpd.apache.org/">Apache</a>) means that a free media server is not far behind.  There goes Adobe&#8217;s <a href="http://www.adobe.com/products/flashmediaserver/">Flash Media Server</a> business.  And quite possibly Microsoft&#8217;s <a href="http://www.microsoft.com/windows/products/winfamily/windowshomeserver/default.mspx">Home Media Server</a>, especially if somebody releases an easy Linux configuration that includes this hypothetical (but soon to be realized) media server, backup and data recovery, and document management.</p>
<p>It&#8217;s interesting to note that Google hasn&#8217;t had to &#8220;target&#8221; any of these companies in order to take them out.  In fact, Google probably isn&#8217;t even interested in &#8220;taking them out.&#8221;  Google is just doing what it needs to do in order to grow the business.  If it means investing hundreds of millions of dollars so that more people will come online to watch video, then so be it.  If Google makes a few pennies every time somebody watches a video online, that hundred million bucks will be returned in short order.</p>
<p>The really funny part here is that both Microsoft and Adobe had to see it coming. It&#8217;s not like Google made a surprise announcement last week:  there was a big splash when they acquired the VP8 technology a few months back, and Google has been telegraphing this move since at least 2006, when they paid $1.65 billion for YouTube.  That kind of investment says, &#8220;We want to own Web video because we think we can make money at it.&#8221;  No, Microsoft and Adobe saw this coming and knew that they were powerless to stop it.  But rather than embrace VP8 and try to find a way to work with it, they clung to their own product plans hoping that some imaginary <a href="http://en.wikipedia.org/wiki/Maginot_Line">Maginot Line</a> would block Google&#8217;s advance.  Adobe, Microsoft, and other companies whose businesses are built on artificial scarcity (selling bits) are living in the past and will continue to see their market share stolen by companies like Google that can provide <em>better</em> products for free.</p>
<p>You&#8217;re going to see this same thing play out all over again in the world of television.  Google recently announced a deal with Intel and Sony that will put <a href="http://www.google.com/tv/">Google TV</a> on Sony television sets.  Today, something like 25% of all new televisions sold are Internet ready.  Google is ready to go there, and not just because it increases the surface area for their Web advertisements, but also because it gives Google a platform from which to launch an assault on the television advertising market ($70 billion annually in the U.S. alone).</p>
<p>Google&#8217;s competition for that market is a handful of old media companies and Madison Avenue advertising firms, both of which have grown fat and complacent.  Sure, they&#8217;ve been hit by Internet advertising over the years, but it&#8217;s been more of a slow leak in a dike rather than a tsunami that overwhelms the entire system.  Those companies probably aren&#8217;t smart enough to see it coming yet, but when they do see Google riding the wave, they&#8217;ll probably all hunker down behind the dike and hope for the best.  And then complain bitterly (read: try to win through litigation) when they discover that they lost the war while they were sitting there with their thumbs up their butts trying to decide if they should do anything.</p>
<p>Remember, you heard it here first.</p>
<p>I&#8217;m not trying to paint Google in a bad light at all.  On the contrary, I have nothing but admiration for them.  They&#8217;re going about their business.  If the entrenched companies can&#8217;t keep up, it&#8217;s not Google&#8217;s fault.  While the old media companies are refining the horse-drawn carriage, Google is hard at work on the V8 engine.  In the process, Google is making all manner of things available to Internet users and developers, and actually <em>encouraging</em> us to build products that leverage the free services that the company offers.  Given the choice between begging for access from the old media companies or accepting the bounty freely offered by Google, I&#8217;ll throw in with Google.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/05/24/why-google-will-win/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Command line tools strike again</title>
		<link>http://blog.mischel.com/2010/05/16/command-line-tools-strike-again/</link>
		<comments>http://blog.mischel.com/2010/05/16/command-line-tools-strike-again/#comments</comments>
		<pubDate>Sun, 16 May 2010 17:39:49 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=849</guid>
		<description><![CDATA[Every morning at 3:00, one of our servers grabs the latest code from our source repository and runs the build script.  As you would expect, the build usually completes without error and everything&#8217;s fine.  From time to time, though, one of us will forget to check in a file or dependent project, and the build [...]]]></description>
			<content:encoded><![CDATA[<p>Every morning at 3:00, one of our servers grabs the latest code from our source repository and runs the build script.  As you would expect, the build usually completes without error and everything&#8217;s fine.  From time to time, though, one of us will forget to check in a file or dependent project, and the build will fail.  At that point, it&#8217;s nice to have a way to tell everybody that the build failed, and why.</p>
<p>The build is an MSBUILD script that compiles all of our projects and dependencies, and then copies the results to a staging directory from which we can run unit tests or build distribution packages for our internal customers (see below).  To this point, everything can be done with a minimal batch file script, the MSBUILD program supplied with the .NET development tools, and of course the <a href="http://subversion.apache.org/">subversion</a> command line client.  We have one other tool called sendEmail that notifies me of the build status.</p>
<p>I&#8217;d like to notify <em>everybody</em> when the build fails, but doing so requires that I tell them <em>why</em> it failed.  And the generated build log is very large:  about 120 kilobytes, most of which is irrelevant.  The important information is typically the last 10 or so lines of the file, and that&#8217;s what I&#8217;d like to send to people when the build fails.  Those lines say, in effect, &#8220;The build failed for these reasons.&#8221;  A programmer who receives that message can quickly determine if it&#8217;s his responsibility, and take steps to fix the problem.</p>
<p>The only trouble I have is that there is no simple way with Windows-supplied tools to extract those pertinent lines from the file.  At least, I can&#8217;t think of a way.  But GNU awk (gawk) can do it trivially.</p>
<p>When the build fails, the last thing that MSBUILD outputs is a line that says, &#8220;Build FAILED&#8221;, followed by some lines that describe the error or errors.  So all I need is a program that will go through the file, locate the &#8220;Build FAILED&#8221; line, and then output that line and all following lines to the end of the file.  It&#8217;s been 20 years since I did any awk programming, but this script was simple:</p>
<pre>gawk "{ if (/^Build FAILED/) { doit=1 } if (doit) print $0 }" &lt; buildlog.txt</pre>
<p>Done and done.</p>
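<p>For a machine without gawk, the same filter can be sketched in Python.  This is only an illustration of the logic, not part of the build described here; the function name and the sample log lines are made up.</p>
<pre>def failure_tail(lines):
    """Return the 'Build FAILED' line and every line that follows it."""
    found = False
    tail = []
    for line in lines:
        # Same test as the awk pattern /^Build FAILED/
        if line.startswith("Build FAILED"):
            found = True
        if found:
            tail.append(line)
    return tail


# Hypothetical log contents, just to exercise the function.
sample_log = [
    "Compiling project A",
    "Build FAILED.",
    "error: missing file Widget.cs",
    "1 Error(s)",
]
print("\n".join(failure_tail(sample_log)))</pre>
<p>A log with no &#8220;Build FAILED&#8221; line produces an empty list, which matches the awk version printing nothing.</p>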
<p>The only problem I have now is deciding whether I want to install the full <a href="http://blog.mischel.com/2010/04/02/gnu-tools-for-windows/">GNU Tools for Windows</a> package on my server, or if I should just grab <a href="http://gnuwin32.sourceforge.net/packages/gawk.htm">Gawk for Windows</a>.  The full package is probably the right way to go because I suspect I&#8217;ll be needing some other tools in the future.</p>
<p>Either way, I&#8217;m annoyed that Windows doesn&#8217;t include these simple text processing tools.  I can perhaps understand why they don&#8217;t exist in desktop versions, but we do these types of things on servers all the time, and the standard server install should include a more robust toolset.</p>
<p>Above I mentioned &#8220;internal customers.&#8221;  In reality, we are our own customers.  There are only five of us here, and one of us doesn&#8217;t use the tools that the build creates.  In light of that, it&#8217;d be easy to take a more cavalier attitude towards our build process.  I&#8217;ve found, though, that things run smoother if, as a programmer, I think of the users of my software (in this case, the crawler subsystem and the tools that process the collected data) in much the same way as I would an external customer.  Even though the primary user of the crawler is me.  I wear a number of different hats around here (as does everybody else&#8211;we&#8217;re a startup, after all), and it&#8217;s useful to think of Jim the SysAdmin as a separate person from Jim the Programmer.  That way, when we can afford to hire a system administrator to take those duties from me, the systems will already be in place for him to step right in.</p>
<p>Like source code version control, a formal build process is one of those things that you don&#8217;t need to implement until the size of your project team exceeds zero people.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/05/16/command-line-tools-strike-again/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Movie madness</title>
		<link>http://blog.mischel.com/2010/04/19/movie-madness/</link>
		<comments>http://blog.mischel.com/2010/04/19/movie-madness/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 16:04:30 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=842</guid>
		<description><![CDATA[Outside of YouTube, MP4 is probably the most popular video file format available online.  MP4 videos exist inside a container format that&#8217;s also widely referred to as &#8220;the MP4 container format.&#8221;  Honestly, I&#8217;m not sure what the correct name for the format is, but it&#8217;s described by a standard identified as ISO/IEC 14496-12: &#8220;Information technology [...]]]></description>
			<content:encoded><![CDATA[<p>Outside of YouTube, MP4 is probably the most popular video file format available online.  MP4 videos exist inside a container format that&#8217;s also widely referred to as &#8220;the MP4 container format.&#8221;  Honestly, I&#8217;m not sure what the <em>correct</em> name for the format is, but it&#8217;s described by a standard identified as ISO/IEC 14496-12: &#8220;Information technology &#8211; Coding of audio-visual objects &#8211; Part 12: ISO base media file format.&#8221;  Yeah, that&#8217;s a mouthful.  Let&#8217;s just call it &#8220;Part 12.&#8221;</p>
<p>The document is freely available online as a PDF, although it can be difficult to find.  I just went searching for it again and couldn&#8217;t find the full version.  If I remember where I found it, I&#8217;ll post a link here.</p>
<p style="padding-left: 30px;">Adobe Flash Player 9 Update 3 (9, 0, 115, 0) and higher can play some MP4 files.  The subset of MP4s that Flash can play is described in <a href="http://www.adobe.com/devnet/flv/pdf/video_file_format_spec_v9.pdf">Video File Format Specification Version 9</a>.  That document gives you an idea of the Part 12 file format, although you probably want the full spec if you&#8217;re implementing a reader.</p>
<p>The file format is quite flexible&#8211;perhaps overly so&#8211;but reasonably easy to parse once you grok the basic structure.  I coded up a quick MP4 reader in a day, and within two days had my web crawler extracting metadata from files I located online.  But then I ran into a problem:  my movie player would sometimes hang when trying to play a file.  The really weird part was that the movie would play fine if I downloaded it first.  It was only when trying to play from online that I experienced the problem.</p>
<p>It didn&#8217;t take long to find the problem.  Or at least <em>part</em> of the problem.</p>
<p>Data in the Part 12 file format is organized in &#8220;boxes.&#8221;  Those boxes contain all manner of information:  a file header, overall movie information, information about the individual tracks, synchronization data for different tracks, etc.  Part 12 describes the overall structure of the file and the contents of the &#8220;moov&#8221; box that contains basic movie metadata (number and types of tracks, duration, codecs required to play the tracks, etc.).</p>
<p>Another box, called &#8220;mdat,&#8221; contains the actual movie data:  the video and audio information that will be played.</p>
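<p>A minimal Python sketch of walking the top-level boxes follows.  It relies on the Part 12 box header layout (a 32-bit big-endian size followed by a four-character type, with a size of 1 meaning a 64-bit size follows and a size of 0 meaning the box runs to the end of the file).  The function names and the tiny sample file are invented for illustration; this is not the reader described in this post.</p>
<pre>def top_level_boxes(data):
    """Yield (type, offset, size) for each top-level box in a bytes object."""
    offset = 0
    end = len(data)
    while end - offset >= 8:
        # Every box starts with a 32-bit big-endian size and a 4-character type.
        size = int.from_bytes(data[offset:offset + 4], "big")
        box_type = data[offset + 4:offset + 8].decode("latin-1")
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if 16 > end - offset:
                break
            size = int.from_bytes(data[offset + 8:offset + 16], "big")
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = end - offset
        if header > size:
            break  # malformed box; stop rather than loop forever
        yield box_type, offset, size
        offset += size


def make_box(box_type, payload):
    """Build a minimal box for the demonstration below."""
    return (8 + len(payload)).to_bytes(4, "big") + box_type + payload


# A made-up file with the movie data ahead of the metadata.
sample = (make_box(b"ftyp", b"isom")
          + make_box(b"mdat", bytes(32))
          + make_box(b"moov", bytes(16)))
for box_type, offset, size in top_level_boxes(sample):
    print(box_type, offset, size)</pre>
<p>Because each header carries the box size, a reader can hop from box to box without looking at the payloads at all.</p>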
<p>In order to start playing a movie, a player must have the metadata.  The player can&#8217;t play the first frame until it knows how to decode that frame.  The movie data, on the other hand, can be delivered relatively slowly:  at whatever the playback speed is.  In other words, playing a movie consists of these steps:</p>
<pre>Read the metadata.
Determine if the movie is playable with this player.
repeat
    Read movie data (audio, visual, etc.) frame
    Render frame
until end of movie</pre>
<p>So it makes sense to organize data in the movie file to facilitate that.  Right?  In fact, the Part 12 document makes two very pertinent recommendations:</p>
<blockquote><p>2)  It is strongly <strong>recommended</strong> that all header boxes be placed first in their container: these boxes are the Movie Header, Track Header, Media Header, and the specific media headers inside the Media Information Box (e.g. the Video Media Header).</p>
<p>8)  It is <strong>recommended</strong> that the progressive download information box be placed as early as possible in files, for maximum utility.</p></blockquote>
<p>The emphasized &#8220;<strong>recommended</strong>&#8221; is in the original document.</p>
<p>There are good reasons for these recommendations, as I discovered in the first problem file I looked at.  In that particular file, the &#8220;mdat&#8221; box, which contains the frame data, is placed at the front of the file:  immediately after the file header.  &#8220;mdat&#8221; is 89 megabytes long.  It&#8217;s followed by the &#8220;moov&#8221; box that&#8217;s a little less than two megabytes.  A movie player has to download 89 megabytes of stuff before it can get to the metadata that tells the player how to play the movie.  89 megabytes might not sound like much, but at 10 megabits per second (which would be a very fast residential connection here in the U.S.), that&#8217;s 712 megabits, or more than 70 seconds.  Nobody&#8217;s going to wait over a minute for their video to start playing.</p>
<p>I suspect that whoever made these movies has no idea that they&#8217;re effectively unplayable over the Internet, and might not even care.  <em>I</em> care, because I&#8217;m not going to download the entire movie just to see if I&#8217;m really interested in watching it.</p>
<p>What surprises me is that video player software doesn&#8217;t recognize this and skip over the movie data to get to the metadata.  The HTTP 1.1 specification makes it very easy to get a partial file.  The movie player should see that the &#8220;mdat&#8221; box comes before &#8220;moov&#8221;, and make another request to get &#8220;moov&#8221;.  It could then go back to &#8220;mdat&#8221; after digesting the metadata.</p>
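<p>The partial request is just an HTTP <code>Range</code> header.  Here is a small Python sketch that builds such a request without sending it; the URL, the helper name, and the box offsets are hypothetical, and a real player would take the offsets from the box headers it had already read.  A server that supports ranges answers with 206 Partial Content.</p>
<pre>import urllib.request


def range_request(url, first_byte, length=None):
    """Build (but do not send) a GET request for part of a file."""
    # An empty last-byte position means "through the end of the file".
    last_byte = "" if length is None else str(first_byte + length - 1)
    request = urllib.request.Request(url)
    request.add_header("Range", "bytes={}-{}".format(first_byte, last_byte))
    return request


# Hypothetical layout: a 12-byte file header, then an 89,000,000-byte mdat box.
mdat_offset, mdat_size = 12, 89000000
moov_request = range_request("http://www.example.com/movie.mp4",
                             mdat_offset + mdat_size)
print(moov_request.get_header("Range"))</pre>
<p>Passing the resulting object to <code>urllib.request.urlopen</code> would fetch only the bytes that follow &#8220;mdat&#8221;, which is where &#8220;moov&#8221; lives in these files.</p>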
<p>I wonder how hard it would be to create a tool that fixes those backwards movies . . .</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/04/19/movie-madness/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GNU tools for Windows</title>
		<link>http://blog.mischel.com/2010/04/02/gnu-tools-for-windows/</link>
		<comments>http://blog.mischel.com/2010/04/02/gnu-tools-for-windows/#comments</comments>
		<pubDate>Sat, 03 Apr 2010 00:09:18 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=830</guid>
		<description><![CDATA[I got annoyed with Windows today.  I had this HTML file that contained a bunch of links to RSS files I wanted to download and examine.  The task before me was to extract the URLs, remove duplicates, and then download.  It&#8217;s basic text processing that you can solve trivially with a bare-bones Linux distribution.  It&#8217;s a [...]]]></description>
			<content:encoded><![CDATA[<p>I got annoyed with Windows today.  I had this HTML file that contained a bunch of links to RSS files I wanted to download and examine.  The task before me was to extract the URLs, remove duplicates, and then download.  It&#8217;s basic text processing that you can solve trivially with a bare-bones Linux distribution.  It&#8217;s a single command line (wrapped for readability):</p>
<pre>grep -o "http://www\.example\.com/feeds/rss/[^.]\+\.rss" feedIndex.html \
  | sort -u | xargs wget</pre>
<p>What makes that possible is the <a href="http://www.gnu.org/">GNU</a> tools&#8211;a standard set of tools that mimic and extend the standard tools that have been available for Unix-based systems for decades.</p>
<p>Although the Windows command line supports piping, it doesn&#8217;t include a comprehensive set of tools that were designed to work together the way the GNU tools were designed.  The Windows toolset is primitive, and not up to solving this simple task.  I used <a href="http://gnuwin32.sourceforge.net/packages/grep.htm">GNU grep for Windows</a> to extract the URLs and save them to a file, <a href="http://www.textpad.com/">TextPad</a> to sort and then manually remove duplicates, and finally <a href="http://gnuwin32.sourceforge.net/packages/wget.htm">GNU wget for Windows</a> to download the files.</p>
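<p>Where the GNU tools aren&#8217;t available, the extract-and-deduplicate steps can also be sketched in a few lines of Python.  This is an illustrative alternative, not what was actually run; the function name and sample text are made up, and the download step is left out.</p>
<pre>import re

# Same pattern as the grep command, written as a Python regular expression.
FEED_URL = re.compile(r"http://www\.example\.com/feeds/rss/[^.]+\.rss")


def unique_feed_urls(text):
    """Return the sorted, de-duplicated feed URLs found in text."""
    return sorted(set(FEED_URL.findall(text)))


# Hypothetical page contents with one duplicate link.
sample_text = (
    "http://www.example.com/feeds/rss/sports.rss "
    "http://www.example.com/feeds/rss/news.rss "
    "http://www.example.com/feeds/rss/sports.rss"
)
for url in unique_feed_urls(sample_text):
    print(url)</pre>
<p>Each URL in the result could then be handed to a downloader, which is the job <strong>xargs wget</strong> does in the one-liner.</p>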
<p>This isn&#8217;t the first time I&#8217;ve had to resort to a hodgepodge of tools to solve a problem that I could solve without trouble if I had the GNU tools.  But in the past, downloading and installing all the <a href="http://gnuwin32.sourceforge.net/">GNU tools for Windows</a> was a giant pain in the neck with version conflicts and such.  That&#8217;s not a problem any longer.  Today I discovered the <a href="http://sourceforge.net/projects/getgnuwin32/">getgnuwin32</a> project, which automates the process of downloading, installing, and maintaining a full set of GNU tools for Windows. </p>
<p>The few tools I&#8217;ve used so far work exactly as expected.  Time (and some effort:  it&#8217;s been a while since I used the GNU tools) will tell if this is as useful as I hope it is.</p>
<p>Update (later the same day):<br />
There is one slight problem:  some of the GNU tools have name conflicts with the Windows tools.  <strong>sort</strong> is a good example.  If I tried the above command line on a Windows machine, it might invoke the brain-damaged Windows SORT utility, which is so bad that whoever wrote it should die from embarrassment.  Which <strong>sort</strong> runs depends on where in your path you put the GnuWin32\bin directory.  Either way you go, name conflicts are going to give you some headaches.</p>
<p>I&#8217;m thinking that, since most programmers don&#8217;t even know that the Windows command line exists, I&#8217;ll put GnuWin32\bin ahead of the Windows directory that contains the standard tools.  Or maybe I should just delete or rename SORT and any other tools that have conflicting names.  It&#8217;s not like I ever run batch files that I get from other people.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/04/02/gnu-tools-for-windows/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Bad laptop USB ports</title>
		<link>http://blog.mischel.com/2010/04/02/bad-laptop-usb-ports/</link>
		<comments>http://blog.mischel.com/2010/04/02/bad-laptop-usb-ports/#comments</comments>
		<pubDate>Fri, 02 Apr 2010 16:27:30 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=828</guid>
		<description><![CDATA[My USB mouse on the laptop stopped working the other night.  I replaced the mouse with a known-working one and even it didn&#8217;t work on the laptop.  I wasn&#8217;t prepared to debug the thing at the time, so I continued my work using the built-in mouse stick.  But the computer was slow and Task Manager [...]]]></description>
			<content:encoded><![CDATA[<p>My USB mouse on the laptop stopped working the other night.  I replaced the mouse with a known-working one and even <em>it</em> didn&#8217;t work on the laptop.  I wasn&#8217;t prepared to debug the thing at the time, so I continued my work using the built-in mouse stick.  But the computer was slow and Task Manager showed that I was using 100% CPU.</p>
<p>A little debugging with <a href="http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx">Process Explorer</a> revealed the culprit:  the USB driver.  Odd, that.  Still going with the bad mouse theory, I figured that the mouse had somehow caused the driver to freak out.  So I rebooted the computer.  That solved the problem.  For a bit.  Then the mouse stopped working.</p>
<p>Now here&#8217;s the odd part.  As long as I leave the mouse plugged in to the USB port, everything else is fine.  None of the USB ports work, but there&#8217;s no excess CPU usage.  But as soon as I unplug the mouse, CPU usage goes to 100%.</p>
<p>I&#8217;ve looked around online and have tried most of the solutions others with this problem have tried:  uninstalling the devices and letting them re-install on reboot, updating the driver, etc.  All to no avail.  The mouse works fine for a while when I first restart, but it stops working after some unpredictable amount of time.</p>
<p>The one solution I <em>haven&#8217;t</em> tried yet is re-installing the operating system (Windows XP Pro).  I hesitate to go to that effort if it&#8217;s not required, but I have no idea what else to try.  The USB ports on this Dell 630 notebook are built into the motherboard, so there&#8217;s no chance of just replacing them.  And by the time I pay for a new motherboard and the labor to install it, I&#8217;m out about the same amount it would cost me to just buy a refurbished replacement computer.</p>
<p>I can limp along without USB ports for a while, but it&#8217;s not a good long-term solution.</p>
<p>I&#8217;d sure like to hear from anybody who&#8217;s had a similar problem, and learn how you solved it (if you did).  I suppose I&#8217;ll try the full disk wipe and OS restore, as much as I hate to do it.  I can&#8217;t think of how that&#8217;d solve the problem, but I also can&#8217;t think of any other possible solution.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/04/02/bad-laptop-usb-ports/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Slow file deletion on Thecus N7700</title>
		<link>http://blog.mischel.com/2010/03/22/slow-file-deletion-on-thecus-n7700/</link>
		<comments>http://blog.mischel.com/2010/03/22/slow-file-deletion-on-thecus-n7700/#comments</comments>
		<pubDate>Mon, 22 Mar 2010 23:56:18 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=814</guid>
		<description><![CDATA[I work with big files.  Really big.  Daily, I back up a file that is larger than 200 gigabytes.  We have a Thecus N7700 network attached storage (NAS) box that holds about seven terabytes.  Every day I copy the latest stuff there and delete some of the older files.  It all works fine except for [...]]]></description>
			<content:encoded><![CDATA[<p>I work with big files.  Really big.  Daily, I back up a file that is larger than 200 gigabytes.  We have a <a href="http://www.thecus.com/products_over.php?cid=11&amp;pid=82">Thecus N7700</a> network attached storage (NAS) box that holds about seven terabytes.  Every day I copy the latest stuff there and delete some of the older files.  It all works fine except for just one little problem:  deleting a 200 gigabyte file takes a long time and interrupts other processing.</p>
<p>How long?  More than a minute.  Seriously.  And during that time, any other process that is trying to access files on the NAS gets really slow.  Sometimes, deleting the file causes the NAS to become unresponsive for so long that other processes&#8217; IO requests time out and the programs crash.  That is not a happy state of affairs when I&#8217;m running a job that takes 36 hours.</p>
<p>It appears that the slowdown is due to indirect block pointer updates in the ext3 file system, as described in <a href="http://old.nabble.com/Unlink-performance-td20184420.html">this post</a>.</p>
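<p>To get a rough feel for the scale, here is a back-of-the-envelope sketch in Python.  It assumes 4 KB blocks and 4-byte block pointers, which are common ext3 defaults but are only an assumption about how the Thecus formats its volumes; <tt>indirect_block_estimate</tt> is a made-up helper name.  It counts only the lowest-level indirect blocks and ignores the handful of double- and triple-indirect blocks above them.</p>

```python
BLOCK_SIZE = 4096       # assumed ext3 block size in bytes
POINTER_SIZE = 4        # assumed size of one block pointer in bytes
DIRECT_POINTERS = 12    # block pointers held directly in an ext3 inode

def ceil_div(a, b):
    """Integer division that rounds up."""
    return -(-a // b)

def indirect_block_estimate(file_size_bytes):
    """Return (data_blocks, lowest_level_indirect_blocks) for a file."""
    pointers_per_block = BLOCK_SIZE // POINTER_SIZE
    data_blocks = ceil_div(file_size_bytes, BLOCK_SIZE)
    # Blocks beyond the first 12 are reached through indirect blocks.
    mapped_indirectly = max(0, data_blocks - DIRECT_POINTERS)
    return data_blocks, ceil_div(mapped_indirectly, pointers_per_block)

data_blocks, indirect_blocks = indirect_block_estimate(200 * 2**30)
print(data_blocks, indirect_blocks)
```

<p>Under those assumptions a 200 gigabyte file occupies about 52 million data blocks tracked through roughly 51,000 indirect blocks, and the file system has to read those indirect blocks in order to free the file.</p>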
<p>Is this a fundamental shortcoming of the ext3 file system?  If it is, what are my options?  The Thecus supports a file system called ZFS, but from what I&#8217;ve read about it online, I don&#8217;t want to go down that path.  I wonder if a firmware upgrade would solve my problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/03/22/slow-file-deletion-on-thecus-n7700/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>New removable drives</title>
		<link>http://blog.mischel.com/2010/03/09/new-removable-drives/</link>
		<comments>http://blog.mischel.com/2010/03/09/new-removable-drives/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 20:04:09 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=802</guid>
		<description><![CDATA[Update on my removable drive troubles.
I tried drilling holes in the case (after opening it and removing the drive, of course) on one of those Seagate FreeAgent drives.  Getting the thing apart was quite a chore, and I had a fun time making a mess drilling holes in the case.  The unit tested fine afterwards [...]]]></description>
			<content:encoded><![CDATA[<p>Update on my <a href="http://blog.mischel.com/2010/03/03/more-removable-drive-troubles/">removable drive troubles</a>.</p>
<p>I tried drilling holes in the case (after opening it and removing the drive, of course) on one of those Seagate FreeAgent drives.  Getting the thing apart was quite a chore, and I had a fun time making a mess drilling holes in the case.  The unit tested fine afterwards when copying small files to it, but it went unresponsive after about three gigabytes of the large file.  It&#8217;s difficult to say what went wrong.  I suspect that the USB-to-SATA electronics, which were marginal to begin with, finally gave up the ghost.  At some point I&#8217;ll pull out the 1 TB Seagate drive that&#8217;s in there and see if I can use it as a normal SATA drive.</p>
<p>Yesterday I picked up two <a href="http://www.newegg.com/Product/Product.aspx?Item=N82E16817371008">Antec MX-1</a> external drive enclosures and fitted them with 500 GB drives.  I got them installed last night, and initial results are positive.  I&#8217;ve heard that there have been some fan failures with the Antec enclosures, but a search didn&#8217;t reveal an inordinate number.  For the price (about $55 each, with tax), I might pick up a third just to keep on hand in case a fan <em>does</em> fail.</p>
<p>Each enclosure comes with USB and eSATA cables.  I was all ready to go eSATA until I discovered that my server doesn&#8217;t appear to have a spare SATA port inside.  I suppose I could go eSATA at the office and USB at the data center.  I might still do that, although it&#8217;ll have to wait until I can take down that office server.  It serves other important duties here, so I can&#8217;t just shut it down without affecting a lot of other things.</p>
<p>In any case, I think (hope) that my removable drive troubles are over, at least for a while.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/03/09/new-removable-drives/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More removable drive troubles</title>
		<link>http://blog.mischel.com/2010/03/03/more-removable-drive-troubles/</link>
		<comments>http://blog.mischel.com/2010/03/03/more-removable-drive-troubles/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 17:38:34 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=785</guid>
		<description><![CDATA[I&#8217;ve mentioned before that we use USB external drives for transportation of data from our colocation facility to the office.  After struggling to find reliable devices, we finally settled on the Seagate FreeAgent 1TB drives.  They&#8217;ve served us quite well for over a year now.  But recently it&#8217;s been taking a very long time to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve mentioned before that we use USB external drives for transportation of data from our colocation facility to the office.  After struggling to find reliable devices, we finally settled on the Seagate FreeAgent 1TB drives.  They&#8217;ve served us quite well for over a year now.  But recently it&#8217;s been taking a very long time to copy our data.</p>
<p>It used to take about three and a half hours to copy data (a couple hundred gigabytes) from the server to the removable drive.  Recently it&#8217;s been taking on the order of 10 to 12 hours.  At first I thought it was another idiotic problem with caching, similar to the problem I had <a href="http://blog.mischel.com/2008/10/14/copying-large-files-on-windows/">copying large files between servers</a>, except this copy would eventually complete.  The odd thing was that when I started the copy it would proceed at the expected rate and at some point slow to a crawl.</p>
<p>So I wrote my own program that reads a gigabyte at a time from the local drive and then writes it to the USB device, timing each write operation.  Running locally (at the office), the program reported a steady 24 MB/sec write speed, and copied the entire file at that rate.  Run at the data center copying the same file, the program reported the same 24 MB/sec for the first 20 gigabytes or so.  Then it slowed to about 4 MB/sec.</p>
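<p>The idea behind that test program can be sketched in a few lines of Python.  This is not the program described above, just a minimal illustration; the function names <tt>timed_copy</tt> and <tt>megabytes_per_second</tt> are hypothetical, and the call to <tt>os.fsync</tt> is an assumption about how to keep the operating system&#8217;s write cache from hiding the real device speed.</p>

```python
import os
import time

def timed_copy(src_path, dst_path, chunk_size=1024**3):
    """Copy src_path to dst_path one chunk at a time.

    Returns a list of (bytes_written, seconds) pairs, one per write.
    """
    timings = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            start = time.perf_counter()
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # push the data to the device before stopping the clock
            timings.append((len(chunk), time.perf_counter() - start))
    return timings

def megabytes_per_second(byte_count, seconds):
    """Convert one timing pair to MB/sec (1 MB = 1,000,000 bytes)."""
    return byte_count / seconds / 1_000_000 if seconds else float("inf")
```

<p>Printing <tt>megabytes_per_second</tt> for each pair returned by <tt>timed_copy</tt> gives a per-chunk write rate, which is how a drop from one steady speed to another would show up.</p>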
<p>That smacks of a thermal problem.  Either the drive electronics or the server&#8217;s USB port was overheating.  I quickly eliminated the server&#8217;s USB port as the problem by hooking up a different USB device and checking to see that the server could pass more than 50 gigabytes of data without trouble.</p>
<p>So the problem is with the FreeAgent drive.  If you spend a little time searching online, you&#8217;ll see that other people have experienced overheating problems with the FreeAgent drives.  And looking at the design, I can see why:  the only ventilation is at the bottom of the device where the electronics are.</p>
<p><a href="http://blog.mischel.com/wp-content/uploads/2010/03/drive.jpg"><img class="aligncenter size-full wp-image-786" title="drive" src="http://blog.mischel.com/wp-content/uploads/2010/03/drive.jpg" alt="drive" width="614" height="480" /></a></p>
<p>The picture on the left, above, shows the drive as we typically would place it in the rack at the data center.  It sits on top of one of our servers.  The spot where it&#8217;s sitting is directly above one of the disk drives.  That spot was cool to the touch when I tested it yesterday.  Note, however, that you can&#8217;t see any ventilation holes.  Those are on the other side of the enclosure, as shown by the red arrow in the picture to the right.</p>
<p>Since air enters the cabinet from where I was standing taking this picture, and flows towards the back, mounting the drive as shown on the left doesn&#8217;t allow for very good airflow.  So yesterday I placed the drive in the cabinet as shown on the right.  Then I ran my test program.  I was able to write about 90 gigabytes before the drive slowed down.  I&#8217;m convinced now that it&#8217;s a thermal problem.</p>
<p>I don&#8217;t quite know where to go from here, though.  I think the first thing I&#8217;ll try is lifting the drive higher off the surface it&#8217;s sitting on.  That should allow for better airflow, and perhaps will be enough to keep the electronics cool.  (The problem, according to what I&#8217;ve found online, appears to be the USB to SATA conversion electronics at the base of the drive enclosure.)  If changing the drive location doesn&#8217;t solve the problem, I&#8217;ll have to find a different model of removable drive that has better ventilation or better heat tolerance.  Perhaps it&#8217;s time to visit Fry&#8217;s and see about buying an enclosure that&#8217;s designed for use in the warm environment of a server rack.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/03/03/more-removable-drive-troubles/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Infected!</title>
		<link>http://blog.mischel.com/2009/12/27/infected/</link>
		<comments>http://blog.mischel.com/2009/12/27/infected/#comments</comments>
		<pubDate>Sun, 27 Dec 2009 23:08:05 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=734</guid>
		<description><![CDATA[Updated.  See below.
I don&#8217;t know how, but I somehow managed to get the Malware Defense &#8220;anti-spyware&#8221; program on my system at home.  Fortunately for me, it doesn&#8217;t do anything malicious like delete files or install botnet software.  It just continually pops up virus warnings and gives me opportunities to install.  For a price, of course.  If [...]]]></description>
			<content:encoded><![CDATA[<p><span style="color: #993300;">Updated.  See below.</span></p>
<p>I don&#8217;t know how, but I somehow managed to get the Malware Defense &#8220;anti-spyware&#8221; program on my system at home.  Fortunately for me, it doesn&#8217;t do anything malicious like delete files or install botnet software.  It just continually pops up virus warnings and gives me opportunities to install.  For a price, of course.  If you pay, they go away.</p>
<p>The <a href="http://www.2-spyware.com/remove-malware-defense.html">removal instructions</a> I came across weren&#8217;t complete:  I followed those steps, rebooted the system, and the thing came right back.  I finally tracked down and eliminated the richtx64.exe trojan, which I think is what was re-running Malware Defense.</p>
<p>I&#8217;ve been running my computer for years without any kind of active anti-virus or such, and this is the first time I&#8217;ve <em>ever</em> been infected.  Now I&#8217;m not sure what to do.  I certainly won&#8217;t go back to Norton after the troubles I&#8217;ve had with them, and I don&#8217;t hear good reports about McAfee&#8217;s offering, either.  Is there a good anti-virus, anti-malware package that works, is inexpensive, and doesn&#8217;t take inordinate amounts of CPU time?</p>
<p>Update 12/28:</p>
<p>It took a while, but with some research and downloading and running a few cleanup utilities, it looks like I was successful in disinfecting the computer.  The thing kept getting re-infected whenever I&#8217;d reboot, and it would prevent me from installing or running common anti-malware utilities.  I found a program called rkill that kills common malware processes, and then I could install and run cleanup software.  This morning, a complete scan with <a href="http://www.malwarebytes.org/mbam.php">Malwarebytes&#8217; Anti-Malware</a> reported zero problems.  I then installed <a href="http://www.microsoft.com/Security_Essentials/">Microsoft Security Essentials</a> from a file that I downloaded from a different (uninfected) computer.  It reports no problems.</p>
<p>Darrin Chandler brings up an interesting point in the comments:  it&#8217;s all a matter of weighing the risks.  I&#8217;ve gone years without any kind of malware problems.  Even when I had anti-malware applications installed, they <em>never</em> reported that they&#8217;d blocked anything.  And those programs are very quick to notify whenever they see anything even vaguely suspicious.  So, as Darrin points out, my risk of being infected is pretty small.  However, the <em>cost</em> of being infected is fairly high.  It cost me most of a day to get rid of it.  And I was fortunate that it doesn&#8217;t seem to have deleted any files.  I have no idea if it copied anything from me.  I&#8217;m not too worried since I don&#8217;t keep financial information on this machine.</p>
<p>I&#8217;m hoping that Microsoft Security Essentials works well and doesn&#8217;t cause problems by being too chatty or sucking down too many resources.  We&#8217;ll see how it goes.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2009/12/27/infected/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Two useful, one marginal</title>
		<link>http://blog.mischel.com/2009/10/28/two-useful-one-marginal/</link>
		<comments>http://blog.mischel.com/2009/10/28/two-useful-one-marginal/#comments</comments>
		<pubDate>Wed, 28 Oct 2009 23:50:44 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=661</guid>
		<description><![CDATA[I recently had the need to delve into the world of JSON (JavaScript Object Notation) to read some data from a particular Web site. For my purposes, the simple JSON reader provided by .NET worked just fine. The way it works is interesting: you call JsonReaderWriterFactory.CreateJsonReader, and it returns an XmlReader instance. That&#8217;s right, [...]]]></description>
			<content:encoded><![CDATA[<p>I recently had the need to delve into the world of JSON (JavaScript Object Notation) to read some data from a particular Web site. For my purposes, the simple JSON reader provided by .NET worked just fine. The way it works is interesting: you call <tt>JsonReaderWriterFactory.CreateJsonReader</tt>, and it returns an <tt>XmlReader</tt> instance. That&#8217;s right, it converts the JSON to XML behind the scenes. Apparently there are some limitations in how it handles nested structures, but I didn&#8217;t encounter them. That&#8217;s useful thing #1.</p>
<p>I discovered useful thing #2 when my <tt>XmlReader</tt> threw an exception trying to parse the JSON I fed it. I originally thought that the problem was with the JSON-to-XML conversion. But then I fed the JSON to <a href="http://www.jsonlint.com/">JSONLint</a>.  It turns out that the string <tt>"It\'s an error"</tt> contains an error.  Escaping the apostrophe is an error in JSON.  There are only a handful of characters that can be legally escaped.  It&#8217;s nice to know that the site was in error and not my JSON-to-XML converter.  Either way, I still have to gracefully handle the error.</p>
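<p>The same strictness is easy to reproduce outside .NET.  Here is a minimal sketch using Python&#8217;s standard <tt>json</tt> module (not the .NET reader discussed above); <tt>is_valid_json</tt> is a hypothetical helper that shows one way to handle the error gracefully instead of letting the exception escape.</p>

```python
import json

# The raw string keeps the backslash, so the JSON text is: "It\'s an error"
bad = r'"It\'s an error"'
# The same text without the backslash is perfectly legal JSON.
good = "\"It's not an error\""

def is_valid_json(text):
    """Return True if text parses as JSON, False if the parser rejects it."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(bad), is_valid_json(good))
```

<p>The apostrophe needs no escaping inside a JSON string, so a parser that follows the specification rejects the backslash version.</p>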
<p>I had hoped to use the Windows command <a href="http://technet.microsoft.com/en-us/library/bb490907.aspx">FINDSTR</a> as a substitute for grep.  No such luck.  FINDSTR has two problems that make it marginally useful at best.  First, there&#8217;s no switch that corresponds to grep&#8217;s <tt>--only-matching</tt> (<tt>-o</tt>) option.  If you specify <tt>--only-matching</tt>, then grep outputs only the text that matches the query expression rather than outputting the entire line that contains the match.  FINDSTR lacks that option, making it useless for many of the things I do.</p>
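<p>For anyone stuck without grep, the effect of <tt>--only-matching</tt> can be approximated in a few lines of Python.  This is a sketch, not a drop-in replacement:  <tt>only_matching</tt> is a hypothetical name, and Python&#8217;s regular expression syntax differs from grep&#8217;s in places.</p>

```python
import re

def only_matching(pattern, lines):
    """Yield each non-empty match of pattern, line by line, like grep -o."""
    regex = re.compile(pattern)
    for line in lines:
        for match in regex.finditer(line):
            if match.group(0):  # skip empty matches
                yield match.group(0)

sample = [
    "see http://a.example/x.xml and http://b.example/y.xml",
    "nothing to extract here",
]
for url in only_matching(r"http://\S+\.xml", sample):
    print(url)
```

<p>Each match comes out on its own line, with the rest of the input line discarded, which is the behavior FINDSTR can&#8217;t provide.</p>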
<p>The other problem is very odd.  Both grep and FINDSTR are line-oriented tools.  But FINDSTR&#8217;s definition of a line is inconsistent when working with files whose lines end with just a line feed.  For example, if I&#8217;m looking for all lines that contain the text &#8220;.xml&#8221;, I&#8217;d write this:</p>
<pre>FINDSTR /R "\.xml" file.txt</pre>
<p>The /R switch tells FINDSTR to treat the search string as a regular expression.  I could have done a literal search in this instance, but I want to illustrate the error.  FINDSTR correctly finds and outputs all of the lines that contain the string &#8220;.xml&#8221;.</p>
<p>What I really want, though, is just those lines that <em>end</em> with &#8220;.xml&#8221;. So the command would be:</p>
<pre>FINDSTR /R "\.xml$" file.txt</pre>
<p>FINDSTR doesn&#8217;t find any lines that end in &#8220;.xml&#8221; unless I convert the file so that it has CR/LF line endings. grep correctly handles both line ending conventions. Since I can&#8217;t guarantee the format of the files I work with (I often work with files that I download with wget), FINDSTR is practically useless if I&#8217;m doing regular expression searches.</p>
<p>My advice: download <a href="http://gnuwin32.sourceforge.net/packages/grep.htm">GNU Grep for Windows</a>.</p>
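<p>For comparison, an end-of-line search that doesn&#8217;t care about the line ending convention is short to sketch in Python.  The helper name <tt>lines_ending_with</tt> is hypothetical; it relies on <tt>str.splitlines</tt>, which treats LF and CR/LF (among a few other separators) as line boundaries.</p>

```python
def lines_ending_with(text, suffix):
    """Return the lines of text that end with suffix, whatever the line endings."""
    return [line for line in text.splitlines() if line.endswith(suffix)]

lf_text = "a.xml\nb.txt\nc.xml\n"
crlf_text = lf_text.replace("\n", "\r\n")

print(lines_ending_with(lf_text, ".xml"))
print(lines_ending_with(crlf_text, ".xml"))
```

<p>Both calls return the same two lines, which is the result a line-oriented search tool should give regardless of where the file came from.</p>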
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2009/10/28/two-useful-one-marginal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
