<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jim's Random Notes &#187; Programming</title>
	<atom:link href="http://blog.mischel.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mischel.com</link>
	<description></description>
	<lastBuildDate>Wed, 21 Jul 2010 21:16:34 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Command line tools strike again</title>
		<link>http://blog.mischel.com/2010/05/16/command-line-tools-strike-again/</link>
		<comments>http://blog.mischel.com/2010/05/16/command-line-tools-strike-again/#comments</comments>
		<pubDate>Sun, 16 May 2010 17:39:49 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=849</guid>
		<description><![CDATA[Every morning at 3:00, one of our servers grabs the latest code from our source repository and runs the build script.  As you would expect, the build usually completes without error and everything&#8217;s fine.  From time to time, though, one of us will forget to check in a file or dependent project, and the build [...]]]></description>
			<content:encoded><![CDATA[<p>Every morning at 3:00, one of our servers grabs the latest code from our source repository and runs the build script.  As you would expect, the build usually completes without error and everything&#8217;s fine.  From time to time, though, one of us will forget to check in a file or dependent project, and the build will fail.  At that point, it&#8217;s nice to have a way to tell everybody that the build failed, and why.</p>
<p>The build is an MSBUILD script that compiles all of our projects and dependencies, and then copies the results to a staging directory from which we can run unit tests or build distribution packages for our internal customers (see below).  To this point, everything can be done with a minimal batch file script, the MSBUILD program supplied with the .NET development tools, and of course the <a href="http://subversion.apache.org/">subversion</a> command line client.  We have one other tool called sendEmail that notifies me of the build status.</p>
<p>I&#8217;d like to notify <em>everybody</em> when the build fails, but doing so requires that I tell them <em>why</em> it failed.  And the generated build log is very large:  about 120 kilobytes, most of which is irrelevant.  The important information is typically the last 10 or so lines of the file, and that&#8217;s what I&#8217;d like to send to people when the build fails.  Those lines say, in effect, &#8220;The build failed for these reasons.&#8221;  A programmer who receives that message can quickly determine if it&#8217;s his responsibility, and take steps to fix the problem.</p>
<p>The only trouble I have is that there is no simple way with Windows-supplied tools to extract those pertinent lines from the file.  At least, I can&#8217;t think of a way.  But GNU awk (gawk) can do it trivially.</p>
<p>When the build fails, the last thing that MSBUILD outputs is a line that says, &#8220;Build FAILED&#8221;, followed by some lines that describe the error or errors.  So all I need is a program that will go through the file, locate the &#8220;Build FAILED&#8221; line, and then output that line and all following lines to the end of the file.  It&#8217;s been 20 years since I did any awk programming, but this script was simple:</p>
<pre>gawk "{ if (/^Build FAILED/) { doit=1 } if (doit) print $0 }" &lt; buildlog.txt</pre>
<p>Done and done.</p>
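<p>For comparison, here is a minimal Python sketch of the same filter.  It is not part of the build script; the function name <tt>failure_tail</tt> and the sample log lines are made up for illustration, and it assumes, as described above, that the interesting part of the log starts at the first line beginning with &#8220;Build FAILED&#8221;.</p>

```python
def failure_tail(lines, marker="Build FAILED"):
    """Return the first line that starts with marker, plus every line after it."""
    tail = []
    found = False
    for line in lines:
        if line.startswith(marker):
            found = True  # same role as doit=1 in the gawk one-liner
        if found:
            tail.append(line)
    return tail


# Illustrative log lines only; a real MSBUILD log is much longer.
sample_log = [
    "Compiling project Crawler",
    "Build FAILED.",
    "error CS0246: type or namespace not found",
]
print("\n".join(failure_tail(sample_log)))
```

<p>An empty result means the marker never appeared, which could be used to decide whether a failure notice needs to go out at all.</p>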
<p>The only problem I have now is deciding whether I want to install the full <a href="http://blog.mischel.com/2010/04/02/gnu-tools-for-windows/">GNU Tools for Windows</a> package on my server, or if I should just grab <a href="http://gnuwin32.sourceforge.net/packages/gawk.htm">Gawk for Windows</a>.  The full package is probably the right way to go because I suspect I&#8217;ll be needing some other tools in the future.</p>
<p>Either way, I&#8217;m annoyed that Windows doesn&#8217;t include these simple text processing tools.  I can perhaps understand why they don&#8217;t exist in desktop versions, but we do these types of things on servers all the time, and the standard server install should include a more robust toolset.</p>
<p>Above I mentioned &#8220;internal customers.&#8221;  In reality, we are our own customers.  There are only five of us here, and one of us doesn&#8217;t use the tools that the build creates.  In light of that, it&#8217;d be easy to take a more cavalier attitude towards our build process.  I&#8217;ve found, though, that things run smoother if, as a programmer, I think of the users of my software (in this case, the crawler subsystem and the tools that process the collected data) in much the same way as I would an external customer.  Even though the primary user of the crawler is me.  I wear a number of different hats around here (as does everybody else&#8211;we&#8217;re a startup, after all), and it&#8217;s useful to think of Jim the SysAdmin as a separate person from Jim the Programmer.  That way, when we can afford to hire a system administrator to take those duties from me, the systems will already be in place for him to step right in.</p>
<p>Like source code version control, a formal build process is one of those things that you don&#8217;t need to implement until the size of your project team exceeds zero people.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/05/16/command-line-tools-strike-again/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Command line XML processing</title>
		<link>http://blog.mischel.com/2010/04/07/command-line-xml-processing/</link>
		<comments>http://blog.mischel.com/2010/04/07/command-line-xml-processing/#comments</comments>
		<pubDate>Wed, 07 Apr 2010 23:21:02 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=835</guid>
		<description><![CDATA[Today I got a big XML file full of yummy audio and video links that my Web crawler will just love to slurp up.  Not thinking, I wrote a quick grep command to extract some of the links and send them to the crawler.  Later it dawned on me that some of those links are [...]]]></description>
			<content:encoded><![CDATA[<p>Today I got a big XML file full of yummy audio and video links that my Web crawler will just love to slurp up.  Not thinking, I wrote a quick grep command to extract some of the links and send them to the crawler.  Later it dawned on me that some of those links are broken because the XML is entity encoded.  That is, this link:</p>
<pre>http://www.example.com/videos/?id=23&amp;format=hd</pre>
<p>Will be encoded so that &#8220;&amp;&#8221; becomes &#8220;&amp;amp;&#8221;.  Any character that is &#8220;special&#8221; in XML will end up being entity encoded like that.  Oops.</p>
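<p>A small Python sketch shows the difference between reading the raw text and letting an XML parser decode it.  The element names <tt>videos</tt> and <tt>link</tt> are assumptions made for the example, not the layout of the file I received.</p>

```python
import xml.etree.ElementTree as ET


def build_sample():
    """Serialize a tiny document; the writer entity-encodes the ampersand."""
    root = ET.Element("videos")
    link = ET.SubElement(root, "link")
    link.text = "http://www.example.com/videos/?id=23&format=hd"
    return ET.tostring(root, encoding="unicode")


def extract_links(xml_text):
    """Parse the XML and return the decoded text of every link element."""
    return [link.text for link in ET.fromstring(xml_text).iter("link")]


xml_text = build_sample()
print(xml_text)                 # raw text: what a plain grep would see
print(extract_links(xml_text))  # decoded URLs: what the crawler needs
```

<p>The raw text carries the ampersand as an entity, while the parser hands back the URL as originally written.</p>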
<p>A quick search for &#8220;xml grep&#8221; led me to <a href="http://xmlstar.sourceforge.net/">XMLStarlet</a>:  a command line XML toolkit that lets you examine, query, fold, spindle, and mutilate XML files from the command line.  I don&#8217;t know nearly as much as I should about XPath, XSLT, and XML in general, but after a few minutes of looking at examples and struggling with the syntax, I managed to pull those URLs out of the XML file and send them off to the crawler.</p>
<p>Granted, I spent a heck of a lot more time on this than I would have just writing a quick C# program to extract that one element from the file in question.  But my C# program would have worked for this situation only.  I already have other plans for XMLStarlet.</p>
<p>Highly recommended.  If you ever find yourself having to manipulate XML files outside of your application, you need this tool.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/04/07/command-line-xml-processing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GNU tools for Windows</title>
		<link>http://blog.mischel.com/2010/04/02/gnu-tools-for-windows/</link>
		<comments>http://blog.mischel.com/2010/04/02/gnu-tools-for-windows/#comments</comments>
		<pubDate>Sat, 03 Apr 2010 00:09:18 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=830</guid>
		<description><![CDATA[I got annoyed with Windows today.  I had this HTML file that contained a bunch of links to RSS files I wanted to download and examine.  The task before me was to extract the URLs, remove duplicates, and then download.  It&#8217;s basic text processing that you can solve trivially with a bare-bones Linux distribution.  It&#8217;s a [...]]]></description>
			<content:encoded><![CDATA[<p>I got annoyed with Windows today.  I had this HTML file that contained a bunch of links to RSS files I wanted to download and examine.  The task before me was to extract the URLs, remove duplicates, and then download.  It&#8217;s basic text processing that you can solve trivially with a bare-bones Linux distribution.  It&#8217;s a single command line (wrapped for readability):</p>
<pre>grep -o "http://www\.example\.com/feeds/rss/[^.]\+\.rss" feedIndex.html
  | sort -u | xargs wget</pre>
<p>What makes that possible is the <a href="http://www.gnu.org/">GNU</a> tools&#8211;a standard set of tools that mimic and extend the standard tools that have been available for Unix-based systems for decades.</p>
<p>Although the Windows command line supports piping, it doesn&#8217;t include a comprehensive set of tools that were designed to work together the way the GNU tools were designed.  The Windows toolset is primitive, and not up to solving this simple task.  I used <a href="http://gnuwin32.sourceforge.net/packages/grep.htm">GNU grep for Windows</a> to extract the URLs and save them to a file, <a href="http://www.textpad.com/">TextPad</a> to sort and then manually remove duplicates, and finally <a href="http://gnuwin32.sourceforge.net/packages/wget.htm">GNU wget for Windows</a> to download the files.</p>
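<p>The extract-and-deduplicate steps can also be sketched in a few lines of Python.  This is an illustration rather than what I actually ran; the function name <tt>unique_feed_urls</tt> and the sample HTML are hypothetical, and the download step is left out so the example runs without a network connection.</p>

```python
import re

# Same idea as the grep pattern: a fixed prefix, a name without dots, then ".rss".
FEED_PATTERN = re.compile(r"http://www\.example\.com/feeds/rss/[^.]+\.rss")


def unique_feed_urls(html):
    """Return the distinct feed URLs found in html, sorted (like sort -u)."""
    return sorted(set(FEED_PATTERN.findall(html)))


# Made-up page content with one duplicate link.
sample_html = (
    'a href="http://www.example.com/feeds/rss/news.rss" '
    'a href="http://www.example.com/feeds/rss/art.rss" '
    'a href="http://www.example.com/feeds/rss/news.rss"'
)
for url in unique_feed_urls(sample_html):
    print(url)
```

<p>Each resulting URL could then be handed to a downloader, for example <tt>urllib.request.urlretrieve</tt> from the standard library.</p>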
<p>This isn&#8217;t the first time I&#8217;ve had to resort to a hodgepodge of tools to solve a problem that I could solve without trouble if I had the GNU tools.  But in the past, downloading and installing all the <a href="http://gnuwin32.sourceforge.net/">GNU tools for Windows</a> was a giant pain in the neck with version conflicts and such.  That&#8217;s not a problem any longer.  Today I discovered the <a href="http://sourceforge.net/projects/getgnuwin32/">getgnuwin32</a> project, which automates the process of downloading, installing, and maintaining a full set of GNU tools for Windows. </p>
<p>The few tools I&#8217;ve used so far work exactly as expected.  Time (and some effort:  it&#8217;s been a while since I used the GNU tools) will tell if this is as useful as I hope it is.</p>
<p>Update (later the same day):<br />
There is one slight problem:  some of the GNU tools have name conflicts with the Windows tools.  <strong>sort</strong> is a good example.  If I tried the above command line on a Windows machine, it would try to invoke the brain damaged Windows SORT utility, which is so bad that whoever wrote it should die from embarrassment.  It depends on where in your path you put the GnuWin32\bin directory.  Either way you go, name conflicts are going to give you some headaches.</p>
<p>I&#8217;m thinking that, since most programmers don&#8217;t even know that the Windows command line exists, I&#8217;ll put GnuWin32\bin ahead of the Windows directory that contains the standard tools.  Or maybe I should just delete or rename SORT and any other tools that have conflicting names.  It&#8217;s not like I ever run batch files that I get from other people.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/04/02/gnu-tools-for-windows/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>It&#8217;s harder than it looks</title>
		<link>http://blog.mischel.com/2010/02/19/its-harder-than-it-looks/</link>
		<comments>http://blog.mischel.com/2010/02/19/its-harder-than-it-looks/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 23:41:15 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=783</guid>
		<description><![CDATA[Imagine that you have a web site that, among other things, allows your users to search for media (audio and video) using a simple query language.  So, if you want to find Britney Spears videos, you&#8217;d just type britney spears in the search box and click the Search button.  Simple, right?

Disclaimer:
The examples below mention particular [...]]]></description>
			<content:encoded><![CDATA[<p>Imagine that you have a web site that, among other things, allows your users to search for media (audio and video) using a simple query language.  So, if you want to find Britney Spears videos, you&#8217;d just type <em>britney spears</em> in the search box and click the Search button.  Simple, right?</p>
<blockquote>
<h3>Disclaimer:</h3>
<p>The examples below mention particular artists whose content appears legitimately on YouTube and other media sites, and can be legally obtained with the blessing of the copyright holders.</p>
<p>Although it&#8217;s possible that content from these artists can also be obtained illegally from other sites, I do not advocate that practice.  I do not support the use of <em>any</em> Internet search technology to obtain music, video, or other electronic media illegally. </p>
<p>Companies that operate search engines do not knowingly index such illegal content.  Reputable companies remove links to illegal content as required by the Digital Millennium Copyright Act (DMCA), when the existence of that content is made known in accordance with the DMCA&#8217;s notification procedures.</p></blockquote>
<p>Except it turns out that <em>britney</em> and <em>spears</em> are pretty common spam terms in metadata (the keywords and description fields of YouTube videos, for example).  People will upload all manner of stuff to YouTube and put bogus terms in the description in an attempt to get people to watch the video.  To reduce the number of irrelevant or inappropriate results returned (it&#8217;s probably impossible to eliminate irrelevant content), you decide to index the metadata by field and allow the user to say which fields are searched.  So, if they want just those videos that have &#8220;Britney&#8221; and &#8220;Spears&#8221; in the title field, they would type <em>britney spears IN Title</em>.  That doesn&#8217;t eliminate all of the spam, but it reduces it quite a bit.</p>
<p>It turns out that you have to make the <em>IN</em> case sensitive.  Otherwise you&#8217;d never be able to search for the word &#8220;in&#8221; in any metadata.  The same is true for any word that you use in your query language.  For example, if we wanted all the videos that contain &#8220;Britney&#8221; <em>or</em> &#8220;Spears&#8221;, we&#8217;d write <em>britney OR spears IN Title</em>.</p>
<p>Still, not too hard, right?  But what if you want to search the Title field and the Description field?  At first you&#8217;d think you could write:  <em>britney spears IN Title OR Description</em>.  You could make that work until you take into account the possibility of more complex query expressions.  For example, let&#8217;s say you wanted a list of all videos that claim to be a Led Zeppelin song, or some version of Stairway to Heaven.  One possible query would be:</p>
<p><em>led zeppelin IN Artist OR Description OR stairway heaven IN Title</em></p>
<p>Whereas that query might look reasonable to a non-programmer, writing a computer program to properly handle the general case of queries like that is non-trivial.  The query can be parsed in several different ways.  Three of which are:</p>
<p>(led zeppelin IN Artist OR Description) OR (stairway heaven IN Title)<br />
(led zeppelin IN Artist) OR ((Description OR stairway) heaven IN Title)<br />
(led zeppelin IN Artist) OR (Description OR (stairway heaven) IN Title)</p>
<p>All three of those interpretations are perfectly valid.  Applying rules of operator precedence can disambiguate some of the cases, but if you go through the exercise you&#8217;ll find out that IN has to have lower precedence than OR, and if you do that, then you end up with:</p>
<p>(led zeppelin IN Artist OR (Description OR stairway heaven)) IN Title</p>
<p>You end up having to either decorate the field names (e.g. &#8220;@Artist&#8221;) or group them with brackets or parentheses (e.g. IN [Artist OR Description]).</p>
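<p>To show how much the brackets simplify things, here is a minimal Python sketch of a parser for the bracketed form.  The grammar is my assumption for the example, not a description of any production query language: a query is one or more clauses joined by <em>OR</em>, and each clause is a run of search terms, the keyword <em>IN</em>, and either a single field name or a bracketed list of field names.</p>

```python
import re


def tokenize(query):
    """Split on whitespace, keeping '[' and ']' as separate tokens."""
    return re.findall(r"\[|\]|[^\s\[\]]+", query)


def parse_fields(tokens, pos):
    """Parse 'Title' or '[Artist OR Description]' starting at tokens[pos]."""
    if pos >= len(tokens):
        raise ValueError("expected a field name after IN")
    if tokens[pos] != "[":
        return [tokens[pos]], pos + 1
    fields = []
    pos += 1
    while pos < len(tokens) and tokens[pos] != "]":
        if tokens[pos] != "OR":  # OR only separates names inside the brackets
            fields.append(tokens[pos])
        pos += 1
    if pos == len(tokens):
        raise ValueError("missing closing bracket")
    return fields, pos + 1


def parse_query(query):
    """Return a list of clauses, each with its search terms and target fields."""
    tokens = tokenize(query)
    clauses = []
    pos = 0
    while pos < len(tokens):
        terms = []
        while pos < len(tokens) and tokens[pos] != "IN":
            terms.append(tokens[pos])  # an OR between terms stays in the list
            pos += 1
        if pos == len(tokens):
            raise ValueError("expected IN after the search terms")
        fields, pos = parse_fields(tokens, pos + 1)
        clauses.append({"terms": terms, "fields": fields})
        if pos < len(tokens):
            if tokens[pos] != "OR":
                raise ValueError("expected OR between clauses")
            pos += 1
    return clauses


print(parse_query(
    "led zeppelin IN [Artist OR Description] OR stairway heaven IN Title"))
```

<p>Because a field list now has an explicit end, the <em>OR</em> that follows it can only be a clause separator, and the ambiguity shown above goes away.</p>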
<p>All of this is doable, and not especially heavy lifting as far as parsing is concerned.  But then you have to explain it to a non-technical user and make it easy for the non-technical user to use.  Otherwise, only programmers will want to (or even be able to) use it.</p>
<p>I&#8217;ve heard many a programmer (myself included, come to think of it) complain about a search facility that doesn&#8217;t allow complex queries.  We look at it from a programmer&#8217;s perspective and think it&#8217;d be trivial to implement a comprehensive query facility.  And in most cases we&#8217;re probably right.  You could develop a query system that anybody with a couple of years of programming experience could use without trouble and get <em>exact</em> results.  And when you flipped the switch to turn it on, you&#8217;d hear crickets.  Most users don&#8217;t understand Boolean algebra or the difference in precedence between AND and OR.  Trust me, people will go somewhere else to get their information rather than have to <em>think</em> of how to ask for it.</p>
<p>What users really want is a DWIM mode:  Do What I Mean.  They want to type word soup into the search and get back exactly what they were looking for, with no false hits (e.g. asking for <em>beatles</em>, the music group, and getting back something about dung beetles because somebody misspelled &#8220;beetle&#8221;).</p>
<p>But DWIM doesn&#8217;t exist.  Not today, and not for a long time (perhaps ever) in the future.  As a result, we have to restrict what the user can type and very carefully specify how things will be interpreted.  We have to make it easy for the most common cases, but able to do moderately complex and powerful things.  That balance is difficult to achieve, and no matter what you come up with, somebody will complain.  You can only hope that the number of users you delight will vastly outweigh those whom you annoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2010/02/19/its-harder-than-it-looks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sniffing network traffic</title>
		<link>http://blog.mischel.com/2009/10/21/sniffing-network-traffic/</link>
		<comments>http://blog.mischel.com/2009/10/21/sniffing-network-traffic/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 01:00:38 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=647</guid>
		<description><![CDATA[My latest crawler modifications require me to scrape Web pages that host videos so that I can obtain metadata (title, description, date posted, etc.) that we place in our index.  Unfortunately, there&#8217;s no standard way for sites to present such information.  ESPN and Vimeo have HTML &#60;meta&#62; tags that provide some info, but I have [...]]]></description>
			<content:encoded><![CDATA[<p>My latest crawler modifications require me to scrape Web pages that host videos so that I can obtain metadata (title, description, date posted, etc.) that we place in our index.  Unfortunately, there&#8217;s no standard way for sites to present such information.  ESPN and Vimeo have HTML &lt;meta&gt; tags that provide some info, but I have to go parsing through the body of the document to find the date.  (And yes, I&#8217;m aware that Vimeo has an API that will make this a moot point.  I&#8217;ll be investigating that soon.)</p>
<p>Other sites are much worse in that they provide <em>no</em> metadata in the HTML.  For example, one site&#8217;s video page is very code-heavy.  Requiring that the page be reloaded every time you request a new video would require a lot of network traffic.  Their design instead uses JavaScript to request a particular video&#8217;s metadata from a server.  Loading a new video involves downloading just a few kilobytes of data.</p>
<p>I spent some time this afternoon searching through a video page&#8217;s HTML and the associated JavaScript, looking for the magic incantation that would get me the data I&#8217;m looking for.  The amount of code involved is staggering, and I quickly went cross-eyed trying to decipher it before I hit on the idea of hooking up a sniffer to see if I could identify the HTTP request that gets the data.</p>
<p>It took me all of five minutes to download and install <a href="http://www.cleanersoft.com/sniffer/free_http_sniffer.htm">Free Http Sniffer</a>, request a video from the site in question, and locate the magic line in the 230 or so requests that the page makes when it loads.  Problem solved.  Now all I have to do is write code that&#8217;ll transform a video page URL into a request for the metadata, and I&#8217;m set.</p>
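<p>The shape of that transformation can be sketched in Python.  Everything specific here is hypothetical: the endpoint URL, the <tt>id</tt> query parameter, and the function name <tt>metadata_url</tt> are placeholders, since the real request is whatever the sniffer reveals for a given site.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; substitute the request found with the sniffer.
METADATA_ENDPOINT = "http://www.example.com/api/video-metadata"


def metadata_url(page_url):
    """Map a video page URL such as .../videos/?id=23 to a metadata request."""
    query = parse_qs(urlsplit(page_url).query)
    video_ids = query.get("id")  # assumes the page URL carries an id parameter
    if not video_ids:
        raise ValueError("no video id in " + page_url)
    return METADATA_ENDPOINT + "?" + urlencode({"id": video_ids[0]})


print(metadata_url("http://www.example.com/videos/?id=23&format=hd"))
```

<p>Sites that put the video identifier in the path rather than the query string would need a different extraction step, but the overall idea is the same.</p>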
<p>I have no idea why I didn&#8217;t think of the sniffer earlier.  I&#8217;d used one before for a similar purpose.  I suspect I&#8217;ll be making heavy use of it in the near future as I expand the number of sites that we crawl for media.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2009/10/21/sniffing-network-traffic/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A small change?</title>
		<link>http://blog.mischel.com/2009/09/19/a-small-change/</link>
		<comments>http://blog.mischel.com/2009/09/19/a-small-change/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 14:51:01 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=611</guid>
		<description><![CDATA[I&#8217;ve been programming computers for a long time.  Getting paid to write computer programs, even, which I thought was pretty darned funny when I first started.  People were paying me to do something that I loved.  But I digress.
After 30 years, you&#8217;d think that I would have learned that there&#8217;s no such thing as a [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been programming computers for a long time.  Getting <em>paid</em> to write computer programs, even, which I thought was pretty darned funny when I first started.  People were paying me to do something that I loved.  But I digress.</p>
<p>After 30 years, you&#8217;d think that I would have learned that there&#8217;s no such thing as a small change that you can push into production code without having to test.  You might get away with it from time to time, but eventually that arrogance is going to cost you.</p>
<p>But, hey, it&#8217;s a <em>simple</em> change!  What could go wrong?</p>
<p>When you hear yourself say that, <em>think</em> about what you&#8217;re saying.  And then spend the few minutes it will take to test your assumption.  If nothing else, you&#8217;ll save yourself the embarrassment of explaining to your business partner that you made the kind of mistake that you&#8217;d reprimand an employee for.</p>
<p>Fortunately, all it cost me was a little embarrassment, a few hours&#8217; lost sleep, and an additional hour of down time for the crawler.  I got off easy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2009/09/19/a-small-change/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Looking for a Ghost Replacement</title>
		<link>http://blog.mischel.com/2009/08/27/looking-for-a-ghost-replacement/</link>
		<comments>http://blog.mischel.com/2009/08/27/looking-for-a-ghost-replacement/#comments</comments>
		<pubDate>Thu, 27 Aug 2009 19:44:40 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=602</guid>
		<description><![CDATA[When describing the problems I was having configuring our new servers, I mentioned that I was going to try using Clonezilla to speed the process.  The idea was to get Windows installed and all the other software configured on one machine, and then just clone the drive.  Seemed like a good thing to do.
So I [...]]]></description>
			<content:encoded><![CDATA[<p>When describing the problems I was having configuring our new servers, I mentioned that I was going to try using <a href="http://clonezilla.org/">Clonezilla</a> to speed the process.  The idea was to get Windows installed and all the other software configured on one machine, and then just clone the drive.  Seemed like a good thing to do.</p>
<p>So I fired up Clonezilla, fought through the user interface to tell it what I wanted backed up and where, and then pressed the any key (really!  There was a prompt that said, &#8220;Press the any key&#8221;) to start the copy.  Clonezilla promptly told me that my network card wasn&#8217;t supported.  It would have been nice if it had checked that when I first started the program.</p>
<p>Slightly discouraged but not yet willing to give up, I decided to try <a href="http://ping.windowsdream.com/">PING</a>.  Another cryptic user interface, but I won&#8217;t complain too much considering the price.  This time my network card was supported and after a couple of hours it had created a copy of my partition.  So I fired up the next machine, ran PING, and told it to copy the partition image to the disk.  That went well, too.  Except that after I was done, the machine wouldn&#8217;t boot.  The BIOS didn&#8217;t see a bootable image on the disk.</p>
<p>At that point I gave up.  I&#8217;d already spent almost a full day futzing with the things.  In that time I could have installed and configured all of the machines.  (Or so I thought.)  In any case, my experiments with free drive cloning software left me disappointed.</p>
<p>There&#8217;s a good <a href="http://packratstudios.com/index.php/2008/03/11/symantec-ghost-who-a-list-of-open-source-alternatives/">overview of Ghost alternatives</a> over at <a href="http://packratstudios.com/">pack rat studios</a>, but I haven&#8217;t had the opportunity to try any of the others mentioned.  Clonezilla didn&#8217;t support my hardware, and PING failed for reasons unknown.  Anybody know of a package that actually works?</p>
<p>By the way, telling a potential user, &#8220;if your network card isn&#8217;t supported, download it and compile it into the Clonezilla package&#8221; is not likely to be met with smiles and thanks.  Users, even technically competent users like me who are capable of downloading and building, are more likely to say, &#8220;no thanks,&#8221; and move on to something else.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2009/08/27/looking-for-a-ghost-replacement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Another .NET Framework bug?</title>
		<link>http://blog.mischel.com/2009/07/21/another-net-framework-bug/</link>
		<comments>http://blog.mischel.com/2009/07/21/another-net-framework-bug/#comments</comments>
		<pubDate>Tue, 21 Jul 2009 06:00:50 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=550</guid>
		<description><![CDATA[When faced with inexplicable program behavior, inexperienced programmers often blame the operating system, the runtime library, the compiler, or some other external force for the error.  Even when they discover that the bug is in their code, these programmers often will try to blame it on something else.  A sure sign of maturity in a [...]]]></description>
			<content:encoded><![CDATA[<p>When faced with inexplicable program behavior, inexperienced programmers often blame the operating system, the runtime library, the compiler, or some other external force for the error.  Even when they discover that the bug is in their code, these programmers often will try to blame it on something else.  A sure sign of maturity in a programmer is how he approaches bugs.  If, when faced with a bug, he concentrates his initial efforts on looking for the problem in his own code, you know that he&#8217;s learned.</p>
<p>I learned long ago.  In a little more than 30 years of programming computers (about 25 years doing it full time), I can think of only a handful of cases in which a bug in one of my programs was caused by anything other than my own error.  I know that there are bugs in operating systems and runtime libraries, but it&#8217;s not often that they cause problems for me.</p>
<p>So imagine my surprise when, within the space of one week, I&#8217;ve identified two previously unknown (or unreported, as far as I can tell) genuine bugs in the Microsoft .NET Framework runtime libraries.  I <a href="http://blog.mischel.com/2009/07/18/highly-unlikely-is-not-the-same-as-impossible/">wrote about the first one</a> the other day.  Microsoft has acknowledged that it is a bug and is currently reviewing it.</p>
<p>I originally thought that the new bug was in the <tt>Uri.TryCreate</tt> method, because it throws an exception in some circumstances even though the documentation says that it won&#8217;t throw an exception but rather will return <tt>False</tt> if it&#8217;s unable to create a valid <tt>Uri</tt> from the input parameters. And although throwing an exception in this case <em>is</em> (maybe) a bug, the cause of the bug is something else: the <tt>Uri</tt> constructor allows you to construct invalid <tt>Uri</tt> instances.</p>
<p>In my particular case, my Web crawler crashed because <tt>Uri.TryCreate</tt> threw an exception. That was very unexpected, and whatever parameters caused it are lost to me. But it&#8217;s pretty rare. I pass somewhere upwards of 250 million urls through that function every day. Unable to reconstruct the exact parameters that caused the problem, I used what I learned from poking around in the disassembled code to come up with a string that illustrates the problem.</p>
<p>The <tt>Uri</tt> constructor creates a <tt>Uri</tt> instance from a passed string. <tt>Uri</tt> is pretty cool in that it supports many types of resources, not just HTTP.  One such type is a mailto: URI of the form <tt>mailto:jim@mischel.com</tt>.</p>
<p>But the <tt>Uri</tt> constructor will succeed when it should fail. For example:</p>
<pre>Uri mailUri = new Uri("mailto:jim@mischel.comtest@mischel.com");
// trying to access mailUri.AbsoluteUri at this point will throw UriFormatException
// with the message "The host name cannot be parsed"</pre>
<p>Passing that invalid <tt>Uri</tt> as the <tt>baseUri</tt> parameter to <tt>Uri.TryCreate</tt> throws the exception that I encountered in the crawler.</p>
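<p>To make it concrete why that string is malformed, here is a deliberately simplified Python check.  It is not how the .NET <tt>Uri</tt> class validates anything, and it ignores legal but unusual forms such as multiple recipients or quoted local parts; the function name <tt>is_plausible_mailto</tt> is made up for the example.  It only tests that a mailto: address has exactly one &#8220;@&#8221; with something on both sides.</p>

```python
from urllib.parse import urlsplit


def is_plausible_mailto(uri):
    """Rough check: mailto scheme and exactly one '@' splitting user and host."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        return False
    local, at, host = parts.path.partition("@")
    # Simplification: a second '@' in the host part marks the address as bad.
    return at == "@" and bool(local) and bool(host) and "@" not in host


print(is_plausible_mailto("mailto:jim@mischel.com"))
print(is_plausible_mailto("mailto:jim@mischel.comtest@mischel.com"))
```

<p>The first call accepts the ordinary address and the second rejects the doubled one, which is the kind of input a constructor would ideally refuse up front.</p>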
<p>What I find most curious about all this is that the <tt>Uri</tt> class appears to have two different methods for parsing strings.  There appears to be one parser for constructing <tt>Uri</tt> instances from strings (as in the constructor), and a separate parser used internally for various things.   I know from experience that parsing URIs is a horrendously difficult problem and hard to do correctly.   Why anybody would want to write, test, and maintain two different versions of such difficult code is beyond me.</p>
<p><a href="https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=475897">Bug reported</a>.  Awaiting response.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2009/07/21/another-net-framework-bug/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Highly Unlikely&#8221; is not the same as &#8220;Impossible&#8221;</title>
		<link>http://blog.mischel.com/2009/07/18/highly-unlikely-is-not-the-same-as-impossible/</link>
		<comments>http://blog.mischel.com/2009/07/18/highly-unlikely-is-not-the-same-as-impossible/#comments</comments>
		<pubDate>Sat, 18 Jul 2009 19:21:44 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=545</guid>
		<description><![CDATA[One of my programs crashed the other day in a very unexpected place:  inside the runtime library.  The exception stack trace is pretty clear on where the error occurred:
System.OverflowException: Negating the minimum value of a twos complement number is invalid.
   at System.Math.AbsHelper(Int32 value)
   at System.Random..ctor(Int32 Seed)
   at System.Threading.Collections.ConcurrentQueue`1.TryDequeueCore(T&#38; result)
   at System.Threading.Collections.ConcurrentQueue`1.TryDequeue(T&#38; result)
   at MyProgram.ThreadProc() in [...]]]></description>
			<content:encoded><![CDATA[<p>One of my programs crashed the other day in a very unexpected place:  inside the runtime library.  The exception stack trace is pretty clear on where the error occurred:</p>
<pre>System.OverflowException: Negating the minimum value of a twos complement number is invalid.
   at System.Math.AbsHelper(Int32 value)
   at System.Random..ctor(Int32 Seed)
   at System.Threading.Collections.ConcurrentQueue`1.TryDequeueCore(T&amp; result)
   at System.Threading.Collections.ConcurrentQueue`1.TryDequeue(T&amp; result)
   at MyProgram.ThreadProc() in c:\MyProgram\Main.cs:line 118
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()</pre>
<p>The documentation for the <tt>Random</tt> constructor says that this exception will be thrown if the <tt>Seed</tt> parameter is equal to <tt>Int32.MinValue</tt>.  Curious, I thought I&#8217;d disassemble the <tt>ConcurrentQueue</tt> class to see what&#8217;s going on.  No problem there, as it&#8217;s calling the default (parameterless) <tt>Random</tt> constructor.  So I took a look at that.</p>
<pre>.method public hidebysig specialname rtspecialname
        instance void  .ctor() cil managed
{
  // Code size       12 (0xc)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  call       int32 System.Environment::get_TickCount()
  IL_0006:  call       instance void System.Random::.ctor(int32)
  IL_000b:  ret
} // end of method Random::.ctor</pre>
<p>That code gets the current tick count and then passes that value as a seed to the constructor.  So, what&#8217;s wrong with this?  Take a look at the documentation for <tt>System.Environment.TickCount</tt>:</p>
<blockquote><p>The value of this property is derived from the system timer and is stored as a 32-bit signed integer. Consequently, if the system runs continuously, <tt>TickCount</tt> will increment from zero to <tt>Int32.MaxValue</tt> for approximately 24.9 days, then jump to <tt>Int32.MinValue</tt>, which is a negative number, then increment back to zero during the next 24.9 days.</p></blockquote>
<p>What all this means is that if your program calls the <tt>Random</tt> constructor during that one millisecond (after the system has been up for <tt>Int32.MaxValue</tt> milliseconds), the value returned by <tt>System.Environment.TickCount</tt> is going to be equal to <tt>Int32.MinValue</tt>, and passing that value as the seed will result in an exception.</p>
<p>I&#8217;ll be the first to admit that encountering this bug is highly unlikely.  Your computer has to be up and running for almost 25 days, and there&#8217;s a one-millisecond window when it&#8217;s vulnerable.  But the fact that my program crashed is proof that it is possible for this error to cause a problem.</p>
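<p>For reference, <tt>Int32.MaxValue</tt> milliseconds is 2,147,483,647 / 86,400,000, or roughly 24.86 days, which matches the documentation&#8217;s figure.  The sketch below shows the failing operation from the stack trace and one possible workaround: supply your own seed and remap the single value that cannot be negated.  <tt>SafeRandom</tt>, <tt>SafeSeed</tt>, and <tt>AbsOverflows</tt> are hypothetical names chosen for illustration, and this is not necessarily how Microsoft resolved the issue.</p>
<pre>using System;

// Hypothetical helper, not part of the .NET Framework.
static class SafeRandom
{
    // Map the one seed that Math.Abs cannot negate to a seed it can handle.
    public static int SafeSeed(int tickCount)
    {
        return tickCount == int.MinValue ? int.MaxValue : tickCount;
    }

    // Create a Random without relying on the parameterless constructor.
    public static Random Create()
    {
        return new Random(SafeSeed(Environment.TickCount));
    }

    // Returns true if Math.Abs throws OverflowException for the given value.
    public static bool AbsOverflows(int value)
    {
        try
        {
            Math.Abs(value);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}</pre>
<p>Calling <tt>SafeRandom.Create()</tt> in your own code avoids the problem there, but it does nothing for library code such as <tt>ConcurrentQueue</tt> that calls the parameterless constructor internally.</p>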
<p>This is a huge oversight on the part of Microsoft&#8217;s runtime library team.  I&#8217;ve <a href="https://connect.microsoft.com/VisualStudio">reported the bug</a> and hope they manage to get it fixed before they release .NET 4.0.</p>
<p><span style="color: #993300;">Update 2009/07/21:  Microsoft&#8217;s Base Class Library team has said that the issue has been resolved for the next major release.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2009/07/18/highly-unlikely-is-not-the-same-as-impossible/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Stack Overflow</title>
		<link>http://blog.mischel.com/2009/03/25/stack-overflow/</link>
		<comments>http://blog.mischel.com/2009/03/25/stack-overflow/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 23:46:51 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=412</guid>
		<description><![CDATA[For most of the &#8217;90s, I was a part of TeamB—a group of volunteers who helped answer questions on Borland&#8217;s Compuserve forums.  I met a bunch of really great people doing that, got some free Compuserve time, a few trips to the Bay Area, and lots of Borland products.  But mostly, I learned a heck [...]]]></description>
			<content:encoded><![CDATA[<p>For most of the &#8217;90s, I was a part of <a href="http://www.teamb.com/">TeamB</a>—a group of volunteers who helped answer questions on Borland&#8217;s Compuserve forums.  I met a bunch of really great people doing that, got some free Compuserve time, a few trips to the Bay Area, and lots of Borland products.  But mostly, I learned a heck of a lot by helping to answer users&#8217; questions.</p>
<p>When Borland, Microsoft, and other development tool companies moved their online technical support to the Internet, their support was mostly done through newsgroups, and I found the signal-to-noise ratio there almost unbearable.  Except for the moderated newsgroups, which were few and far between, asking a question was like talking to a wall.  Worse, even, because a wall won&#8217;t give you wrong answers or call you stupid for doing something different.  Even with the advent of forums rather than newsgroups, online technical help was virtually non-existent for a number of years and I just stopped trying.</p>
<p>Enter <a href="http://stackoverflow.com/about">Stack Overflow</a>, a free programming Q&amp;A site where you can ask questions, share your expertise, or just browse for nuggets of programming wisdom.  Stack Overflow <em>works</em>.  In many ways it works better than the old Borland Compuserve forums that I enjoyed so much.</p>
<p><em>Why</em> it works is simple: they&#8217;ve found a way to reward people for supplying good answers and, to a lesser extent, asking good questions.  It all has to do with reputation:  ego.  You gain reputation points for supplying good answers and asking good questions.  &#8220;Good&#8221; is determined by simple up- or down-votes from site users.  As you gain reputation points, you gain the ability to help moderate the site: re-tag questions, vote to close, edit questions, etc.  And your current reputation is prominently displayed beside your name.  There are also awards (&#8220;Badges&#8221;) given for a number of different things.</p>
<p>If you don&#8217;t care about reputation, that&#8217;s fine.  You can use the site anonymously and still ask, answer, and comment on questions.  But Stack Overflow works because a whole lot of people there <em>do</em> care about their reputations.  Giving more experienced users the ability to help moderate keeps the flaming and other invective to a minimum, and the constant peer review ensures that (in general) the higher-rated answers really are the best.</p>
<p>My only real complaint with Stack Overflow (and it&#8217;s not huge) is that the format doesn&#8217;t encourage an ongoing threaded discussion as was available on the Compuserve forums.  That&#8217;s not a problem in most cases, but there are times when arriving at a satisfactory answer requires much back-and-forth, and it&#8217;d be nice to see questions and answers displayed in threaded newsgroup fashion.  The ability to see answers ordered by date helps a lot, as does the comments feature, and I suspect that adding a threaded view would be of only limited additional help.</p>
<p>Despite a few nitpicks, I&#8217;m seriously impressed with Stack Overflow.  If you have a programming question on any topic, you should search for the answer there.  And if you don&#8217;t find it, <em>ask</em>.  You&#8217;ll probably be surprised at the speed and the quality of the answers you get.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2009/03/25/stack-overflow/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
