<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jim&#039;s Random Notes</title>
	<atom:link href="http://blog.mischel.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mischel.com</link>
	<description></description>
	<lastBuildDate>Wed, 08 Feb 2012 03:49:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>FizzBuzz as a litmus test</title>
		<link>http://blog.mischel.com/2012/02/07/fizzbuzz-as-a-litmus-test/</link>
		<comments>http://blog.mischel.com/2012/02/07/fizzbuzz-as-a-litmus-test/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 03:49:34 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1531</guid>
		<description><![CDATA[<p>In The White Board Inquisition, I mentioned the FizzBuzz program as a minimum standard for identifying programmers. It&#8217;s a simple test that any programmer should be able to write in just a few minutes.</p> <p>Write a program that outputs the numbers from 1 to N on the console, with these exceptions. If the number is divisible <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2012/02/07/fizzbuzz-as-a-litmus-test/">FizzBuzz as a litmus test</a></span>]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://blog.mischel.com/2011/11/05/the-white-board-inquisition/">The White Board Inquisition</a>, I mentioned the <a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html">FizzBuzz program</a> as a minimum standard for identifying programmers. It&#8217;s a simple test that any programmer should be able to write in just a few minutes.</p>
<blockquote><p>Write a program that outputs the numbers from 1 to N on the console, with these exceptions. If the number is divisible by three, then instead of outputting the number, output the string &#8220;Fizz.&#8221; If the number is divisible by five, then output &#8220;Buzz.&#8221; And if the number is divisible by three <em>and</em> five, output &#8220;FizzBuzz.&#8221; So the first part of your output should be:</p>
<p><code>1,2,Fizz,4,Buzz,Fizz,7,8,Fizz,Buzz,11,Fizz,13,14,FizzBuzz,16</code></p>
<p>Writing the numbers one per line is fine. I wrote them as comma separated values here in the interest of saving space.</p></blockquote>
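<p>For reference, here&#8217;s one minimal solution. It&#8217;s a sketch in Python (the problem doesn&#8217;t call for any particular language), and <code>fizzbuzz</code> is just an illustrative function name:</p>

```python
def fizzbuzz(n):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    result = []
    for i in range(1, n + 1):
        if i % 15 == 0:            # divisible by both 3 and 5: must be checked first
            result.append("FizzBuzz")
        elif i % 3 == 0:
            result.append("Fizz")
        elif i % 5 == 0:
            result.append("Buzz")
        else:
            result.append(str(i))
    return result


if __name__ == "__main__":
    # One value per line, which the problem statement allows.
    for item in fizzbuzz(100):
        print(item)
```

<p>The only subtle point is the order of the tests: the divisible-by-both case has to come first, or 15 would come out as &#8220;Fizz.&#8221;</p>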
<p>If you can&#8217;t write that program, <em>you aren&#8217;t a programmer</em>. If you can&#8217;t write that program, you&#8217;re no good to me as a programmer and your degrees, work experience, and ability to spout buzzwords aren&#8217;t going to impress me. If you fail this simple test, I <em>will</em> terminate the interview early.</p>
<p>I am completely flabbergasted at the number of people who claim to be &#8220;senior programmers&#8221; who can&#8217;t even write that simple program.</p>
<p>Some people have told me that I expect too much. &#8220;After all,&#8221; they say, &#8220;we&#8217;re not trying to find <em>real</em> programmers, just people who can hook up some Web pages.&#8221; That&#8217;s crazy. &#8220;Hooking up Web pages,&#8221; at least in the applications I&#8217;ve seen, requires real programming. Increasingly so, in fact, what with the profusion of JavaScript. FizzBuzz tests basic programming knowledge: the use of loops, conditional statements, and simple math. I <em>guarantee</em> that those things are required in even the simplest of interactive Web applications.</p>
<p>Understand, FizzBuzz is just the first test. And don&#8217;t expect me to give <em>exactly</em> that problem. I might change it around a bit to see if you really understand it or if you just memorized some code for the interview. You&#8217;d be surprised at how many people can write a flawless FizzBuzz, but are totally mystified when asked to modify the program so that it counts down from 100 to 1. It&#8217;s frightening.</p>
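<p>The count-down change is trivial if the loop bounds aren&#8217;t hard-coded. Here&#8217;s a Python sketch, with hypothetical <code>fizz_word</code> and <code>fizzbuzz_range</code> helpers, that counts up or down depending only on the arguments:</p>

```python
def fizz_word(i):
    """Return the FizzBuzz output for a single number."""
    word = ""
    if i % 3 == 0:
        word += "Fizz"
    if i % 5 == 0:
        word += "Buzz"
    return word or str(i)   # empty word means neither divisor matched


def fizzbuzz_range(start, stop, step):
    """FizzBuzz over range(start, stop, step); a negative step counts down."""
    return [fizz_word(i) for i in range(start, stop, step)]


if __name__ == "__main__":
    # Count down from 100 to 1. The stop value is exclusive, hence 0.
    for item in fizzbuzz_range(100, 0, -1):
        print(item)
```

<p>Building the word from two independent tests also sidesteps the ordering trap, because 15 picks up both &#8220;Fizz&#8221; and &#8220;Buzz.&#8221;</p>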
<p>Years ago, I laughed at the idea of the <a href="http://en.wikipedia.org/wiki/Technological_singularity">singularity</a> occurring within my lifetime. Reaching that milestone will require systems that are orders of magnitude more complex than anything we&#8217;ve been able to build reliably. I did, however, agree that it could happen sometime in the future. Now, I&#8217;m not so sure. Even if we get enough smart people to <em>design</em> such a system, there&#8217;s no way a bunch of mush-for-brains &#8220;programmers&#8221; will ever be able to implement it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2012/02/07/fizzbuzz-as-a-litmus-test/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PowerShell parameter parsing</title>
		<link>http://blog.mischel.com/2012/02/03/powershell-parameter-parsing/</link>
		<comments>http://blog.mischel.com/2012/02/03/powershell-parameter-parsing/#comments</comments>
		<pubDate>Fri, 03 Feb 2012 05:24:56 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1528</guid>
		<description><![CDATA[<p>PowerShell has some nice built-in command line parameter parsing. I&#8217;ve only been wishing for something like this for &#8230; well, forever.</p> <p>Imagine you have a script that accepts four parameters:</p> <p>-EnvironmentName (or -e), which is mandatory -Destination (or -d), which is mandatory -UserName (or -u), which is optional -Password (or -p), which is optional</p> <p>Usage <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2012/02/03/powershell-parameter-parsing/">PowerShell parameter parsing</a></span>]]></description>
			<content:encoded><![CDATA[<p>PowerShell has some nice built-in command line parameter parsing. I&#8217;ve only been wishing for something like this for &#8230; well, forever.</p>
<p>Imagine you have a script that accepts four parameters:</p>
<p>-EnvironmentName (or -e), which is mandatory<br />
-Destination (or -d), which is mandatory<br />
-UserName (or -u), which is optional<br />
-Password (or -p), which is optional</p>
<p>Usage would be:</p>
<pre>myscript -e testing -d c:\test -u username -p password</pre>
<p>Writing code to parse and verify those parameters is just busy work. But because we haven&#8217;t had a good alternative (at least, not on the Windows platform), we&#8217;ve been doing it for years, writing new argument parsing code for every program. Sure, there&#8217;ve been some attempts at building a generalized argument parser and validator for particular languages or platforms (like .NET), but not one of those has really caught on.</p>
<p>Until now. PowerShell makes quick work of parameter parsing and validation. You can describe the arguments for the script mentioned above with just a few lines of code.</p>
<pre># ParamTest.ps1 - Show some parameter features
# Param statement must be first non-comment, non-blank line in the script
Param(
    [parameter(Mandatory=$true)]
    [alias("e")]
    $EnvironmentName,
    [parameter(Mandatory=$true)]
    [alias("d")]
    $Destination,
    [alias("u")]
    $UserName,
    [alias("p")]
    $Password)

Write-Host "EnvironmentName = $EnvironmentName"
Write-Host "Destination = $Destination"
Write-Host "UserName = $UserName"
Write-Host "Password = $Password"</pre>
<p>At a PowerShell prompt, run that script with this command:</p>
<pre>.\ParamTest -EnvironmentName MyEnvironment -Destination c:\logs\MyEnvironment</pre>
<p>The script will output the parameters as you entered them. The <code>UserName</code> and <code>Password</code> arguments are optional, so the output for them will be blank. If you want, you can include default values for those optional arguments.</p>
<p>I like that you can specify aliases for the parameters. So <code>-e MyEnvironment</code> is the same as <code>-EnvironmentName MyEnvironment</code>.</p>
<p>Note also that <code>-d dest -e env</code> will do the rational thing. That is, the order that you specify arguments doesn&#8217;t matter. Well, it <em>does</em> matter if you don&#8217;t name the parameters on the command line. That is, <code>.\ParamTest MyEnvironment c:\logs\MyEnvironment</code> will assign the value &#8220;MyEnvironment&#8221; to <code>$EnvironmentName</code>, and &#8220;c:\logs\MyEnvironment&#8221; to <code>$Destination</code>.</p>
<p>Unfortunately, there seems to be a bug in the positional parameters stuff. According to the documentation, if you have a <code>parameter</code> attribute on a parameter, then the default is that the parameter can&#8217;t be positional. If you use a <code>parameter</code> attribute, you&#8217;re <em>supposed to</em> include a <code>Position</code> argument if you want it to support positional processing. That is, in the above code, you should have:</p>
<pre>Param(
    [parameter(Mandatory=$true, Position=1)]
    [alias("e")]
    $EnvironmentName,
    [parameter(Mandatory=$true, Position=2)]
    [alias("d")]
    $Destination,
    [alias("u")] $UserName,
    [alias("p")] $Password)</pre>
<p>Conversely, if you don&#8217;t want any positional parameters, you should be able to write:</p>
<pre>Param(
    [parameter(Mandatory=$true)]
    [alias("e")]
    $EnvironmentName,
    [parameter(Mandatory=$true)]
    [alias("d")]
    $Destination,
    [parameter()]
    [alias("u")]
    $UserName,
    [parameter()]
    [alias("p")]
    $Password)</pre>
<p>That doesn&#8217;t seem to work. The code above will still support positional parameters. I haven&#8217;t yet seen a good way to completely eliminate positional processing.</p>
<p>You can try <code>parameter(Position=-1)</code>, but then you&#8217;ll get an exception if you try to run <code>get-help</code> on your script. I&#8217;ve also seen a hack of using <code>Position=0</code> on all of the parameters, but that results in some unhelpful error messages if you forget to name your command line parameters.</p>
<p>Even with the oddities having to do with positional parameters, the <code>Param</code> statement is a feature I&#8217;d welcome in any programming language.</p>
<p>What I&#8217;ve shown above barely scratches the surface of what you can do with <code>Param</code>. You can include a help message with each parameter, create <a href="http://msdn.microsoft.com/en-us/library/dd878348(v=vs.85).aspx">parameter sets</a>, and specify some basic argument validation, all with some simple syntax in the <code>Param</code> statement. If you&#8217;re writing scripts or cmdlets, you should study the <code>Param</code> statement.</p>
<p>If, like me, you&#8217;re relatively new to PowerShell, it can be difficult to find information about this stuff. A good place to start is the MSDN <a href="http://msdn.microsoft.com/en-us/library/dd835506(v=vs.85).aspx">Windows PowerShell</a> topic. I&#8217;ve been unable to find a PowerShell <em>reference</em> on MSDN. For reference material, I start at the <a href="http://technet.microsoft.com/en-us/library/bb978526.aspx">TechNet PowerShell page</a>. For information about Param, see <a href="http://technet.microsoft.com/en-us/library/dd347712.aspx">about_Functions</a> and <a href="http://technet.microsoft.com/en-us/library/dd315326.aspx">about_Functions_Advanced</a>, or type <code>help about_Functions</code> (or <code>help about_Functions_Advanced</code>) at a PowerShell command line.</p>
<p>The documentation I&#8217;ve seen lacks good examples, but a little searching and experimenting can yield good results.</p>
<p>With PowerShell, there&#8217;s simply no reason to write another batch file. And if you find yourself making large modifications to an existing batch file, you should think very seriously about just rewriting it to use PowerShell. It really is worth your time to learn and use it. I think you&#8217;ll find, as I have, that many of those little C# programs you&#8217;ve been writing to do various things can be replaced with simple PowerShell scripts.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2012/02/03/powershell-parameter-parsing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eventual consistency and client applications</title>
		<link>http://blog.mischel.com/2012/01/31/eventual-consistency-and-client-applications/</link>
		<comments>http://blog.mischel.com/2012/01/31/eventual-consistency-and-client-applications/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 05:12:29 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1524</guid>
		<description><![CDATA[<p>This is another one of those &#8220;I can&#8217;t believe I have to address this&#8221; posts.</p> <p>Eventual consistency is sometimes used as an optimization in middle-tier and back-end processing to help balance the load on busy servers and provide a scalable architecture. In client-centered applications like large Web sites, the idea is to respond <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2012/01/31/eventual-consistency-and-client-applications/">Eventual consistency and client applications</a></span>]]></description>
			<content:encoded><![CDATA[<p>This is another one of those &#8220;I can&#8217;t believe I have to address this&#8221; posts.</p>
<p><a href="http://en.wikipedia.org/wiki/Eventual_consistency">Eventual consistency</a> is sometimes used as an optimization in middle-tier and back-end processing to help balance the load on busy servers and provide a scalable architecture. In client-centered applications like large Web sites, the idea is to respond to user requests very quickly, with the understanding that for some period of time the data on the server will be inconsistent.</p>
<p>There are other uses of eventual consistency, but this post is about using eventual consistency as part of a client-centric application.</p>
<p>On the surface, implementing an eventual consistency model looks pretty easy. All the server has to do in response to a client request to modify data is queue the request and tell the client, &#8220;Okay, it&#8217;s done.&#8221; The client doesn&#8217;t really care <em>when</em> the update actually happens. The server can take its own good time to update the database or whatever else needs to be done.</p>
<p>If only it were so simple.</p>
<p>Imagine a simple shopping cart like those that you see everywhere on the Web. Using a traditional (transaction-based) model, adding an item to the shopping cart sends a request to the server. The server makes the database modifications required to show that product XYZ is in user ABC&#8217;s shopping cart. The server doesn&#8217;t send a response to the user application until all of the updates are done. After the server returns its response, the user can click the &#8220;View Cart&#8221; button and see everything that&#8217;s in his shopping cart.</p>
<p>With a simple eventual consistency model, the server&#8217;s action is somewhat different. The server queues a message that says, &#8220;Add product XYZ to the shopping cart for user ABC.&#8221; At some point in the future, a database server will see the &#8220;Add product to cart&#8221; message and process it. While that message is making its way through the queue machinery, the server returns a response to the client application that says, in effect, &#8220;The item was added to the cart,&#8221; even though there&#8217;s no guarantee that the item was actually added. Now, imagine this scenario:</p>
<ol>
<li>User clicks &#8220;Add to cart.&#8221;</li>
<li>Server queues message &#8220;Add product XYZ to the cart for user ABC.&#8221;</li>
<li>Server returns success message to client.</li>
<li>User clicks &#8220;View cart.&#8221;</li>
<li>Server receives request, &#8220;Return contents of user&#8217;s cart.&#8221;</li>
<li>Server returns cart contents.</li>
<li>Database server receives and processes &#8220;Add product&#8221; message.</li>
</ol>
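<p>The race in those seven steps is easy to reproduce. What follows is a toy, single-process Python sketch rather than a real distributed system: a <code>deque</code> stands in for the message queue, a dictionary stands in for the database, and <code>NaiveCartService</code> and its methods are made-up names for illustration.</p>

```python
from collections import deque


class NaiveCartService:
    """Toy model of a naive eventually consistent shopping cart.

    add_to_cart() only queues a message and reports success; the
    "database" isn't updated until process_one() runs.
    """

    def __init__(self):
        self.queue = deque()   # pending "add product" messages
        self.db = {}           # user -> list of products

    def add_to_cart(self, user, product):
        self.queue.append((user, product))   # step 2: queue the message
        return "ok"                          # step 3: report success anyway

    def view_cart(self, user):
        return list(self.db.get(user, []))   # steps 5-6: read the database

    def process_one(self):
        user, product = self.queue.popleft() # step 7: database catches up
        self.db.setdefault(user, []).append(product)


svc = NaiveCartService()
svc.add_to_cart("ABC", "XYZ")     # the user is told the item was added
print(svc.view_cart("ABC"))       # [] : the item is missing
svc.process_one()
print(svc.view_cart("ABC"))       # ['XYZ'] : now it shows up
```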
<p>Because the database server didn&#8217;t receive and process the &#8220;Add product&#8221; message before the user requested the cart contents, the user is going to see his cart without that product in it. The system showed the user an inconsistent view of the data, which breaks the cardinal rule for client applications: never astonish the user.</p>
<p>Any attempt at explaining this behavior to users is doomed to fail. &#8220;Oh, you clicked on your cart too fast. Just wait a minute and then refresh the page,&#8221; is not a proper response. That&#8217;s just going to confuse the user even more. Computers are supposed to make things <em>easier</em> for users. Imagine ordering pizza over the phone:</p>
<p>You: &#8220;I&#8217;d like a medium pepperoni with onions and green peppers.&#8221;<br />
Pizza guy: &#8220;Anything else?&#8221;<br />
You: &#8220;And a two liter soda.&#8221;<br />
Pizza guy: &#8220;Anything else?&#8221;<br />
You: &#8220;No. That&#8217;s all.&#8221;<br />
Pizza guy: &#8220;So that&#8217;s a medium pepperoni with onions and green peppers. That&#8217;ll be $11.94.&#8221;<br />
You: &#8220;And my two liter soda.&#8221;<br />
Pizza guy: &#8220;Oh, right. And your two liter soda. Your total is $14.98.&#8221;</p>
<p>If you want your customers to think your application is the digital equivalent of the stoned-out pizza guy, go ahead and implement a naive eventual consistency model. If you want people to take you seriously and actually <em>use</em> your site, take some time to read and understand what Dr. Werner Vogels, CTO and Vice President of Amazon.com, has to say about eventual consistency in his article <a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html">Eventually Consistent &#8211; Revisited</a>.</p>
<p>Pay particular attention to the section titled <strong>Consistency&#8211;Client and Server</strong>, where he talks about variations and conflict resolution. Especially notice that at no time does he mention the possibility of a process seeing <em>old</em> data. That is, if a process updates a data item and subsequently reads that data item back, it will never receive an old value. Using our shopping cart example, the client <em>knows</em> that it added an item to the cart. The server returned an acknowledgement. If the client subsequently reads the cart, <em>that item better be there</em>! If the previously added item is not in the cart, your program is in error, and no amount of explaining to the user is going to change that.</p>
<p>The potential for the user getting an inconsistent view of the data has to be immediately obvious to any programmer who&#8217;s competent enough to write the application. That being the case, I have to conclude that the programmer somehow thinks it&#8217;s okay to confuse the user. What really astonishes me is that the people in charge&#8211;the product designers and managers&#8211;accept it when programmers say, &#8220;That&#8217;s just the way things are when you write a scalable system.&#8221; One need only point to Amazon.com for a counterexample.</p>
<p>An eventual consistency model that presents an inconsistent view to the client is just plain broken. I have yet to see a reasonable defense for confusing the user.</p>
<p>It&#8217;s relatively easy to provide session consistency (see Dr. Vogels&#8217; blog post) so that the client&#8217;s view remains consistent, and there are simple and effective conflict resolution strategies you can use on the back end to ensure that your eventually consistent data model doesn&#8217;t remain perpetually inconsistent.</p>
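<p>One way to get read-your-writes behavior for the cart is for the session to remember the writes it has acknowledged and merge any that the database hasn&#8217;t applied yet into what it reads back. This Python sketch makes simplifying assumptions: only &#8220;add&#8221; operations, a single in-memory session object, and made-up <code>CartStore</code> and <code>Session</code> names. Write IDs, rather than product names, tell the session which of its writes have landed, so adding the same product twice still works.</p>

```python
import itertools
from collections import deque


class CartStore:
    """Toy back end: adds are queued and applied to the database later."""

    def __init__(self):
        self.queue = deque()            # stands in for the message queue
        self.db = {}                    # user -> {write_id: product}
        self._ids = itertools.count(1)  # unique ID for every write

    def enqueue_add(self, user, product):
        write_id = next(self._ids)
        self.queue.append((user, write_id, product))
        return write_id

    def apply_pending(self):
        # The database server eventually catches up.
        while self.queue:
            user, write_id, product = self.queue.popleft()
            self.db.setdefault(user, {})[write_id] = product

    def read_cart(self, user):
        return dict(self.db.get(user, {}))


class Session:
    """One user's session, giving read-your-writes consistency."""

    def __init__(self, store, user):
        self.store = store
        self.user = user
        self.pending = {}   # write_id -> product: acknowledged, maybe not applied

    def add_to_cart(self, product):
        write_id = self.store.enqueue_add(self.user, product)
        self.pending[write_id] = product
        return "ok"

    def view_cart(self):
        applied = self.store.read_cart(self.user)
        # Stop tracking writes the database has already applied.
        self.pending = {w: p for w, p in self.pending.items() if w not in applied}
        # Show everything this session was told succeeded, oldest write first.
        merged = {**applied, **self.pending}
        return [merged[w] for w in sorted(merged)]
```

<p>In a real deployment the pending writes would have to live where every front-end server handling that user can see them, or the user&#8217;s requests would have to stick to one server; the sketch ignores that.</p>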
<p>Eventual consistency is just one way to extend the scalability of a large distributed Web site. One can go very far (much further than the buzzword artists would have you believe) using a traditional transaction-based system. It&#8217;s always a good idea to start with a transactional system because it&#8217;s easier to implement and prove correct, and it will serve the needs of all but the largest of sites. Eventual consistency is an <em>optimization</em> that&#8217;s designed to extend the capabilities of a larger, working system. It&#8217;s quite likely that, if it comes time to extend your system with an eventual consistency model, you can add it as a layer between the client-facing front end and the server-based back end. Large parts of your existing system won&#8217;t change. That won&#8217;t be the <em>final</em> word on scalability, but it will let you add capacity that will handle the traffic while you implement the final solution.</p>
<p>Starting with an eventual consistency model is premature optimization, with all of the customary pitfalls. It&#8217;s likely that a premature implementation will optimize the wrong thing and not scale the way you intended, because what you <em>think</em> will be the hot spots (the areas that need to be optimized) when you start rarely turn out to be the hot spots by the time you get around to having performance problems. Writing an eventual consistency model when you&#8217;re struggling to get a handful of users for your product is wasted effort. Worse, it&#8217;s wasted effort up front that costs even more down the road when you realize that you have to throw it out and start over.</p>
<p>My advice for those who are considering an eventual consistency model is the same as what I give to those who think their program needs to be as fast as possible. Make your program work. <em>Then</em> make it scale. It takes a bit of effort to make a working system scale, true. But if you don&#8217;t have a working system to start with, you won&#8217;t have enough customers to make scaling worthwhile.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2012/01/31/eventual-consistency-and-client-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A billion dollars that nobody wants</title>
		<link>http://blog.mischel.com/2012/01/25/a-billion-dollars-that-nobody-wants/</link>
		<comments>http://blog.mischel.com/2012/01/25/a-billion-dollars-that-nobody-wants/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 18:48:32 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Idiocy]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1519</guid>
		<description><![CDATA[<p>If you&#8217;re looking for examples of Congressional idiocy, it&#8217;s hard to beat the story of $1 Billion That Nobody Wants. In short, there are about 1.5 billion one-dollar coins piled in bags in Federal Reserve vaults. Why? Because nobody wants them. Why is the U.S. Mint still making them? Because Congress said so.</p> <p>Congress has <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2012/01/25/a-billion-dollars-that-nobody-wants/">A billion dollars that nobody wants</a></span>]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re looking for examples of Congressional idiocy, it&#8217;s hard to beat the story of <a href="http://www.npr.org/2011/06/28/137394348/-1-billion-that-nobody-wants">$1 Billion That Nobody Wants</a>. In short, there are about 1.5 billion one-dollar coins piled in bags in Federal Reserve vaults. Why? Because nobody wants them. Why is the U.S. Mint still making them? Because Congress said so.</p>
<p>Congress has been trying to shove dollar coins down our throats since the introduction of the Susan B. Anthony dollar in 1979. That turned out to be one of the most unpopular coins in U.S. history, and production stopped after 1981. An increase in dollar coin usage (primarily from vending machines) resulted in another 50 million or so coins being minted in 1999.</p>
<p>In 2000, Congress mandated the Sacagawea dollar. About 1.3 billion of them were minted in that year. Not surprisingly, the coin was highly unpopular with most people, and the number of coins minted per year dropped off sharply.</p>
<p>Still undeterred, Congress passed the Presidential $1 Coin Program in late 2005. This program, modeled after the State Quarter program, began in 2007 and will continue until 2016. It directs the U.S. Mint to produce dollar coins with engravings of the presidents on one side. This despite warnings from the Congressional Budget Office that there would be low demand, and the Government Accountability Office warning that unused coin stockpiles and storage costs would increase.</p>
<p>Shockingly, demand for the coins is almost non-existent. Collectors want them. Nobody else cares.</p>
<p>It gets even sillier. Proponents of the Sacagawea dollar were reluctant to sign on to the Presidential coin program until some genius added a provision saying that the Sacagawea dollar must account for at least one of every three dollar coins minted in any year.</p>
<p>A couple of quotes from the NPR article struck me as especially funny.</p>
<blockquote><p>Members of Congress reasoned that a coin series that changed frequently and had educational appeal would make dollar coins more popular. The idea came from the successful program that put each of the 50 states on the backs of quarters.</p></blockquote>
<p>This is a <em>perfect</em> example of Congressional reasoning. They failed to grasp the most important point. The State Quarter program didn&#8217;t magically make people like quarters. People <em>already</em> used quarters. A lot. On the other hand, 30 years of experience show us that people in general just don&#8217;t like the dollar coin. One has to think that, after a few major redesigns and a few minor redesigns, the <em>design</em> isn&#8217;t the problem. The American public doesn&#8217;t want a dollar coin! Stop wasting time and money trying to force one on us.</p>
<p>Here&#8217;s the other quote that I found especially amusing. Or frightening, I suppose.</p>
<blockquote><p>Leslie Paige, who represents watchdog group Citizens Against Government Waste, says the government should withdraw the dollar bill from the market and force Americans to use the coins.</p>
<p>&#8220;I think Americans will definitely embrace the dollar coin if they&#8217;re just given the opportunity,&#8221; she says.</p></blockquote>
<p>There is a difference, Ms. Paige, between <em>giving me an opportunity</em> and <em>forcing me</em> to use the coin. Please consult your dictionary. And by the way, the most optimistic projections of cost savings by switching from the dollar bill to the dollar coin are about $5 billion over 30 years. That works out to about $167 million per year, or less than 5% of what it costs to run Congress for a single year. Just cut the staff of every Senator and Representative by one person, and we&#8217;d make up the difference.</p>
<p>But has Congress passed a bill to stop the insanity? Of course not. That would make too much sense. Instead, the Obama Administration has announced that minting of the coins for circulation will be suspended. They&#8217;ll still make some for collectors, but that&#8217;s about it.</p>
<p>I&#8217;ll grant that the amount of money we&#8217;re talking about is small. But the reasoning behind the dollar coin idiocy is exactly the same as the reasoning behind everything Congress does. They invent problems and then invent solutions that wouldn&#8217;t solve the problems, even if the problems really existed. And yet we continue to <em>choose</em> to pay these people and give them power over us.</p>
<p>We really need to wake up.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2012/01/25/a-billion-dollars-that-nobody-wants/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>There is one source of Truth</title>
		<link>http://blog.mischel.com/2012/01/18/there-is-one-source-of-truth/</link>
		<comments>http://blog.mischel.com/2012/01/18/there-is-one-source-of-truth/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 03:45:15 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1516</guid>
		<description><![CDATA[<p>In religion, politics, and other endeavors, Truth is an elusive goal. Depending on your beliefs, Truth might be found in the Bible, the Torah, the Koran, the Democratic Party platform, or the lessons you learned while traipsing through the woods. Truth, in most endeavors, is highly subjective.</p> <p>Truth is subjective in programming, too. If you have <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2012/01/18/there-is-one-source-of-truth/">There is one source of Truth</a></span>]]></description>
			<content:encoded><![CDATA[<p>In religion, politics, and other endeavors, Truth is an elusive goal. Depending on your beliefs, Truth might be found in the Bible, the Torah, the Koran, the Democratic Party platform, or the lessons you learned while traipsing through the woods. Truth, in most endeavors, is highly subjective.</p>
<p>Truth is subjective in programming, too. If you have any doubt, just ask a dozen different programmers to tell you what is the best programming language, the best indentation style, whether domain driven design is a good idea, or whether inversion of control is just a fancy way to say, &#8220;do things in the most complicated way possible.&#8221; There are plenty of &#8220;truths&#8221; in programming.</p>
<p>But in a computer program, there can be only one source of Truth. That is, there can be many representations of the data that your program relies on, but only one representation can be considered the Authority. If you create different views of the data or cache some data in order to speed access, you are <em>making a copy</em> that at some point will differ from the Authority. It is no longer Truth.</p>
<p>Once you do this, you have to make a decision. Your choices are:</p>
<ol>
<li>Periodically invalidate the cache so that it gets refreshed from the Authority. This ensures that your cache will reflect the Authority with a maximum latency of some given period of time. The cache represents Truth as it existed the last time the cache was refreshed.
<p>This technique works well if your program can tolerate data that is slightly out of date. We use this technique in the crawler to cache robots.txt files. If we always required the most up-to-date robots.txt, the crawler would have to issue two Web requests for every page it downloaded (one for robots.txt, and then one for the page). Instead, our crawler caches a site&#8217;s robots.txt file for a maximum of 24 hours. Truth, in this case, is &#8220;as it existed the last time I downloaded the robots.txt,&#8221; which will never be more than 24 hours out of date.</p></li>
<li>Update the cache whenever the Authority changes. This sounds like a good idea, but there are drawbacks.
<p>First, the Authority has to be built with caching in mind, and must supply an API that clients can plug into. The clients have to accept the Authority&#8217;s caching API, which might be overly restrictive.</p>
<p>This can also put an unacceptable performance burden on Authority updates, especially if more than one client is updating its cache. If the Authority has to call each client&#8217;s update method, then update speed is limited by the speed of all the subscribed clients. If, instead, the Authority posts updates to a message queue, then there won&#8217;t be a perceptible delay in Authority updates, but there will be a non-zero and potentially large latency in the cache updates.</p>
<p>There are many ways of reacting to an update message posted by the Authority. The simplest is to invalidate any cache of the affected data. That can be quite effective, but you have to be careful that your caching layer knows exactly what data it&#8217;s holding on to. That turns out to be a rather difficult task, at times.</p>
<p>This update strategy is usually used when you want to maintain an up-to-date view of the Authority data, but with a different organization. It works best when updates are infrequent. If you&#8217;re doing frequent updates to the view, you probably want to rethink the Authority and have it maintain a view that&#8217;s more amenable to however you&#8217;re querying it.</p></li>
<li>Understand that your alternate view is a snapshot of Truth as it existed at some point in time, and it is never updated. This works well if you&#8217;re reporting on a snapshot, but it&#8217;s not a good general caching solution.</li>
</ol>
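<p>A minimal C# sketch can make the first two options concrete. It assumes a simple in-memory dictionary cache. <code>FetchFromAuthority</code>, <code>Get</code>, and <code>Invalidate</code> are hypothetical names and are not the crawler&#8217;s actual code. Entries expire after a maximum age (24 hours, as with the crawler&#8217;s robots.txt cache), and <code>Invalidate</code> shows where a handler for the Authority&#8217;s update messages would drop an entry. The current time is passed in as a parameter so that the expiration logic can be exercised without waiting a day.</p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical cache: each entry remembers when it was fetched from the Authority.
var cache = new Dictionary&lt;string, (string Value, DateTime FetchedAt)&gt;();
int authorityFetches = 0;

// Stand-in for the Authority (for the crawler, a Web request for robots.txt).
string FetchFromAuthority(string key)
{
    authorityFetches++;
    return $"robots.txt for {key} (fetch #{authorityFetches})";
}

// Option 1: serve the cached copy only while it is younger than maxAge;
// otherwise go back to the Authority and refresh the entry.
string Get(string key, DateTime now, TimeSpan maxAge)
{
    if (cache.TryGetValue(key, out var entry) &amp;&amp; now - entry.FetchedAt &lt; maxAge)
        return entry.Value;

    string value = FetchFromAuthority(key);
    cache[key] = (value, now);
    return value;
}

// Option 2: call this when the Authority posts an update message for a key.
void Invalidate(string key) =&gt; cache.Remove(key);

var ttl = TimeSpan.FromHours(24);
var t0 = new DateTime(2012, 1, 18, 0, 0, 0, DateTimeKind.Utc);

Console.WriteLine(Get("example.com", t0, ttl));              // not cached: fetch #1
Console.WriteLine(Get("example.com", t0.AddHours(23), ttl)); // 23 hours old: still fetch #1
Console.WriteLine(Get("example.com", t0.AddHours(25), ttl)); // expired: fetch #2
Invalidate("example.com");
Console.WriteLine(Get("example.com", t0.AddHours(26), ttl)); // invalidated: fetch #3</pre>
<p>Neither option turns the cache into Truth. Between expirations or update messages, <code>Get</code> returns whatever it fetched last.</p>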
<p>There are hybrid solutions that combine options 1 and 2, but in general that&#8217;s pretty rare. It seems like the height of folly to implement option 3 if you&#8217;re working with live data, but it&#8217;s distressingly easy to fall into that trap inadvertently. For example, you might build a denormalized view of some data in your database because querying the normalized view is prohibitively expensive. You initially use that denormalized view for reporting purposes, but then you foolishly decide that you can use it for other things, too. Pretty soon, large parts of your system are depending on the denormalized view, and changes to the Authority aren&#8217;t reflected, or aren&#8217;t reflected quickly enough. At that point, your system is broken because your user interface isn&#8217;t reflecting Truth.</p>
<p>My experience with relational databases has been that if you denormalize the data, you cannot rely on it reflecting any further changes. You can try to write your code so that it maintains the denormalized view whenever updates are made to the normalized data, but those efforts will almost certainly fail. This is especially true over time, when the original developer moves off the project and somebody new who doesn&#8217;t understand all of the denormalized structures is assigned to the project. The result is &#8230; well, it&#8217;s not pretty. I&#8217;ve <em>never</em> seen a case in which trying to maintain two separate views of a database worked well over the long term. <em>Don&#8217;t try it!</em></p>
<p>Where relational databases are concerned, your best bet is to design your database so that you can update and query it efficiently. If it&#8217;s still too slow after you&#8217;re sure that your design is as good as it can be, then you throw hardware at the problem: more memory, a faster processor, faster drives, or a distributed database.</p>
<p>Note that I&#8217;m not necessarily advocating a fully normalized database design. There are very good and compelling reasons to design your database to be partially denormalized. What I&#8217;m arguing against is maintaining a denormalized view in addition to a fully normalized view. I know that it&#8217;s possible with triggers and other such database machinery. It can even be done well if you fully understand the ramifications of what you&#8217;re doing and if you are meticulous in adding and maintaining your triggers. I&#8217;ve found, though, that most development teams are incapable of that level of attention to detail.</p>
<p><a href="http://www.quotationspage.com/quotes/Segal's_Law">Segal&#8217;s Law</a> states, &#8220;A man with a watch knows what time it is. A man with two watches is never sure.&#8221; The same holds true when you have more than one source of Truth in your system. You have to understand that, unless you&#8217;re querying the Authority, the data you get back will be, at best, slightly out of date. At worst, it will be so wildly out of date that it&#8217;s <em>just plain wrong</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2012/01/18/there-is-one-source-of-truth/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why you shouldn&#8217;t use the .NET sort</title>
		<link>http://blog.mischel.com/2012/01/12/why-you-shouldnt-use-the-net-sort/</link>
		<comments>http://blog.mischel.com/2012/01/12/why-you-shouldnt-use-the-net-sort/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 05:47:14 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1512</guid>
		<description><![CDATA[<p>My friend and business partner David Stafford recently posted a blog entry, .Net&#8217;s Sort Is Not Secure. Don&#8217;t Use It. Here&#8217;s a Better One, in which he shows that the .NET sort implementation (used by Array.Sort and List.Sort, and possibly others) can easily be made to exhibit pathological behavior.</p> <p>How bad is it? You can construct <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2012/01/12/why-you-shouldnt-use-the-net-sort/">Why you shouldn&#8217;t use the .NET sort</a></span>]]></description>
			<content:encoded><![CDATA[<p>My friend and business partner David Stafford recently posted a blog entry, <a href="http://zimbry.blogspot.com/2012/01/nets-sort-is-not-secure-dont-use-it.html">.Net&#8217;s Sort Is Not Secure. Don&#8217;t Use It. Here&#8217;s a Better One</a>, in which he shows that the .NET sort implementation (used by <code>Array.Sort</code> and <code>List.Sort</code>, and possibly others) can easily be made to exhibit pathological behavior.</p>
<p>How bad is it? You can construct an array of one million items that will take the .NET sort implementation more than 80 minutes to sort. The <em>average</em> case is something like half a second.</p>
<p>David&#8217;s contention is that this is a security vulnerability. Others might disagree with him, but that&#8217;s their choice. David is right. It&#8217;s not a great stretch to imagine a Web site that, as part of its work, accepts a data file from a user and sorts that data. A malicious user, knowing that the server is using .NET, could construct a data file that causes the sort to exhibit this pathological behavior, causing the site to become unresponsive. This is nothing short of a denial of service attack, made possible by the poor sorting implementation. As David shows in his post, it&#8217;s not terribly difficult to construct a worst-case array.</p>
<p>That fits very comfortably within the definition of a <a href="http://en.wikipedia.org/wiki/Vulnerability_(computing)">security vulnerability</a>.</p>
<p>David makes two other assertions: that the sort is inflexible, and that the sort is slower than it should be, even in the absence of a malicious adversary.</p>
<p>The sort is somewhat flexible in that it lets you supply a comparison delegate. It does not, however, let you supply a swap delegate. That&#8217;s okay in many cases. However, if you&#8217;re sorting large structures (value types), or if you want to do an indirect sort (often referred to as a tag sort), a swap delegate is a very useful thing to have. The LINQ to Objects sorting algorithm, for example, uses a tag sort internally. You can verify that by examining the source, which is available in the <a href="http://referencesource.microsoft.com/netframework.aspx">.NET Reference Source</a>. Letting you pass a swap delegate would make the thing much more flexible.</p>
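<p>Here is a minimal C# sketch of a tag sort. Tuples stand in for large value types and the data is made up. The sort rearranges an array of <code>int</code> indexes instead of the records themselves, using the <code>Array.Sort&lt;T&gt;(T[], Comparison&lt;T&gt;)</code> overload. The tags are still sorted by the built-in sort, so this shows the tag-sort technique. It is not a way around the worst case described above.</p>
<pre>using System;

// Value-type records; tuples stand in for large structs in this sketch.
var people = new (string Name, double Weight)[]
{
    ("carol", 72.5), ("alice", 61.0), ("dave", 88.2), ("bob", 61.0)
};

// The tags: one int index per record.
int[] tags = new int[people.Length];
for (int i = 0; i &lt; tags.Length; i++) tags[i] = i;

// Sort the tags by comparing the records they point to (weight, then name).
// Only the ints are swapped; the records themselves never move.
Array.Sort(tags, (a, b) =&gt;
{
    int byWeight = people[a].Weight.CompareTo(people[b].Weight);
    return byWeight != 0 ? byWeight : string.CompareOrdinal(people[a].Name, people[b].Name);
});

// Visit the records in sorted order through the tags.
foreach (int t in tags)
    Console.WriteLine($"{people[t].Name} {people[t].Weight}");</pre>
<p>Because only the four-byte tags are swapped, the cost of each swap no longer depends on how large the records are.</p>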
<p>David&#8217;s tests show that the .NET sort implementation <em>is</em> slower than it could be. In my opinion, it&#8217;s slower than it <em>should</em> be. David&#8217;s implementation is faster than the .NET sort in the general case, and doesn&#8217;t exhibit pathological behavior in the worst case. The worst case is so terrible, in fact, and so easy to provoke, that the .NET sort should be rewritten.</p>
<p>And yet, the .NET team has <a href="https://connect.microsoft.com/VisualStudio/feedback/details/716864/nets-sort-is-not-secure-and-is-vulnerable-to-an-attacker-who-can-use-it-to-create-a-dos-attack">refused to address this issue</a>. At best, that&#8217;s irresponsible. One can only hope that enough users log in and vote to have the issue addressed, forcing the .NET team to reconsider their decision.</p>
<p>David also noted that <a href="http://zimbry.blogspot.com/2012/01/linq-sorting-is-also-vulnerable.html">LINQ sorting is also vulnerable</a>. What he didn&#8217;t point out is that LINQ to Objects uses a completely different algorithm than does <code>Array.Sort</code>. The LINQ to Objects sort is a standard naive Quicksort implementation. As you can see from <a href="http://dl.dropbox.com/u/40731642/sorting_performance_tests/sort_performance_tests_linq.html">his timings</a>, the LINQ sort is 50% <em>slower</em> than the already tortoise-like <code>Array.Sort</code> in the face of an adversary.</p>
<p>Understand, the .NET sort will be faster than David&#8217;s Introsort in the general case if you&#8217;re sorting a primitive type (<code>int</code>, <code>double</code>, etc.) <em>and</em> you&#8217;re not supplying a comparison delegate. The .NET sort is faster in that case because it has special-case code to sort primitive types. If David took the time to make special-case additions to his sorting algorithm, it would outperform the .NET sort in those cases, as well.</p>
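<p>The idea behind any Introsort is simple enough to sketch. The C# below is a simplified illustration for <code>int</code> arrays. It is not David&#8217;s implementation and not the .NET one. It runs a median-of-three Quicksort, but gives itself a budget of about twice log<sub>2</sub>(n) partitioning levels and switches to heapsort for any range that exhausts the budget. The 16-element insertion-sort cutoff and all of the names are arbitrary choices made for the sketch.</p>
<pre>using System;

// Introsort sketch for int[]: Quicksort with a recursion-depth budget.
void IntroSort(int[] a)
{
    // Budget of 2 * floor(log2(n)) partitioning levels.
    int depthLimit = 0;
    for (int n = a.Length; n &gt; 1; n &gt;&gt;= 1) depthLimit += 2;
    Sort(a, 0, a.Length - 1, depthLimit);
}

void Sort(int[] a, int lo, int hi, int depth)
{
    while (hi - lo &gt; 16)
    {
        // Budget exhausted: this range is behaving badly, so heapsort it.
        if (depth == 0) { HeapSort(a, lo, hi); return; }
        depth--;
        int p = Partition(a, lo, hi);
        // Recurse into the smaller side and loop on the larger one.
        if (p - lo &lt; hi - p) { Sort(a, lo, p - 1, depth); lo = p + 1; }
        else { Sort(a, p + 1, hi, depth); hi = p - 1; }
    }
    InsertionSort(a, lo, hi); // small ranges (17 elements or fewer)
}

int Partition(int[] a, int lo, int hi)
{
    // Median-of-three: leave the median of a[lo], a[mid], a[hi] in a[hi].
    int mid = lo + (hi - lo) / 2;
    if (a[mid] &lt; a[lo]) Swap(a, mid, lo);
    if (a[hi] &lt; a[lo]) Swap(a, hi, lo);
    if (a[mid] &lt; a[hi]) Swap(a, mid, hi);
    int pivot = a[hi], store = lo;
    for (int i = lo; i &lt; hi; i++)
        if (a[i] &lt; pivot) Swap(a, i, store++);
    Swap(a, store, hi); // pivot lands in its final position
    return store;
}

void HeapSort(int[] a, int lo, int hi)
{
    int n = hi - lo + 1;
    for (int i = n / 2 - 1; i &gt;= 0; i--) SiftDown(a, lo, i, n);
    for (int end = n - 1; end &gt; 0; end--)
    {
        Swap(a, lo, lo + end);   // move the current maximum to the end of the range
        SiftDown(a, lo, 0, end); // restore the max-heap on what remains
    }
}

void SiftDown(int[] a, int lo, int i, int n)
{
    while (2 * i + 1 &lt; n)
    {
        int child = 2 * i + 1;
        if (child + 1 &lt; n &amp;&amp; a[lo + child + 1] &gt; a[lo + child]) child++;
        if (a[lo + i] &gt;= a[lo + child]) return;
        Swap(a, lo + i, lo + child);
        i = child;
    }
}

void InsertionSort(int[] a, int lo, int hi)
{
    for (int i = lo + 1; i &lt;= hi; i++)
    {
        int x = a[i], j = i - 1;
        while (j &gt;= lo &amp;&amp; a[j] &gt; x) { a[j + 1] = a[j]; j--; }
        a[j + 1] = x;
    }
}

void Swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

bool IsSorted(int[] a)
{
    for (int i = 1; i &lt; a.Length; i++) if (a[i - 1] &gt; a[i]) return false;
    return true;
}

// An all-equal array sends this partition scheme down its worst case,
// so the depth budget runs out and the heapsort fallback takes over.
var rng = new Random(2012);
int[] random = new int[100_000];
for (int i = 0; i &lt; random.Length; i++) random[i] = rng.Next();
int[] allEqual = new int[100_000];
IntroSort(random);
IntroSort(allEqual);
Console.WriteLine(IsSorted(random) &amp;&amp; IsSorted(allEqual)); // True</pre>
<p>The depth budget is what removes the pathological case. Each partitioning level does O(n) work and there are at most about 2 log<sub>2</sub>(n) levels. Heapsort is O(n log n) on any input. The total therefore stays O(n log n) no matter how the array was constructed.</p>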
<p>Of course, even the special-case sorts in the .NET runtime are vulnerable in the face of an array constructed to provoke the worst case.</p>
<p>So take David&#8217;s advice: don&#8217;t rely on the .NET sort. <a href="http://dl.dropbox.com/u/40731642/sorting_performance_tests/sorting_experiments.zip">Download his code and use it.</a></p>
<p>I&#8217;m considering putting together something similar to replace the LINQ to Objects sort. The general idea is to create a class called <code>SafeOrderedEnumerable</code> that implements <a href="http://msdn.microsoft.com/en-us/library/bb534852.aspx">IOrderedEnumerable</a>, and uses David&#8217;s Introsort in the <a href="http://msdn.microsoft.com/en-us/library/bb548559.aspx">CreateOrderedEnumerable</a> method. To invoke it, I&#8217;ll create extension methods <code>SafeOrderBy</code> and <code>SafeOrderByDescending</code> so that you can write, for example:</p>
<pre>var sorted = myList.SafeOrderBy(x =&gt; x);</pre>
<p>That should put LINQ to Objects sorting in the same ballpark as sorting an array. Not the same, of course, but close, and it will avoid the potential pathological cases.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2012/01/12/why-you-shouldnt-use-the-net-sort/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In Praise of Technical Debt</title>
		<link>http://blog.mischel.com/2012/01/07/in-praise-of-technical-debt/</link>
		<comments>http://blog.mischel.com/2012/01/07/in-praise-of-technical-debt/#comments</comments>
		<pubDate>Sat, 07 Jan 2012 23:42:41 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1507</guid>
		<description><![CDATA[<p>The term &#8220;technical debt&#8221;, as commonly used, refers to the eventual consequences of poor software design or development practices. The Wikipedia article and most other references consider technical debt to be a Very Bad Thing. The literature is filled with examples of development projects whose combined technical debt eventually killed or seriously hampered the company.</p> <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2012/01/07/in-praise-of-technical-debt/">In Praise of Technical Debt</a></span>]]></description>
			<content:encoded><![CDATA[<p>The term &#8220;<a href="http://en.wikipedia.org/wiki/Technical_debt">technical debt</a>&#8221;, as commonly used, refers to the eventual consequences of poor software design or development practices. The Wikipedia article and most other references consider technical debt to be a Very Bad Thing. The literature is filled with examples of development projects whose combined technical debt eventually killed or seriously hampered the company.</p>
<p>There is a huge amount of literature about techniques that claim to reduce or eliminate technical debt, and there are countless design patterns and development practices to go along with all that talk. Those patterns and practices are usually ways of paying up front in order to avoid having to pay more later. And to the extent that they actually meet that standard, using them to avoid technical debt is a Good Thing.</p>
<p>Unfortunately, those techniques intended to avoid technical debt often cause more problems than they solve, and end up costing more than it would have cost to incur the debt.</p>
<p>It seems every software development pundit preaches a &#8220;no technical debt&#8221; sermon with the fervor that some misguided financial advisors preach a debt-free lifestyle. And, unsurprisingly, they all have their books, articles, software packages, and training programs that will teach you how to avoid technical debt.</p>
<p>Yes, there&#8217;s a lot of hype and snake oil in the software development methodologies business.</p>
<p>Just as there are sound financial reasons to incur financial debt, there are sound business and technical reasons to incur technical debt. In fact, I think technical debt is more often a Good Thing. It takes a very strong argument to convince me <em>not</em> to incur many kinds of technical debt.</p>
<p>The comparison of technical debt to financial debt isn&#8217;t perfect. When you take out a loan at a bank or other lending institution, you agree unconditionally to repay the money you borrowed, with interest. I know of only two ways you can avoid paying: bankruptcy and death. In bankruptcy, you might have to pay back some of the debt, and if you die you might leave somebody else with the obligation. And, of course, there are the ongoing interest payments. Financial debt is never free.</p>
<p>Technical debt, on the other hand, often doesn&#8217;t need to be repaid, and often has no interest payments. It&#8217;s a &#8220;free loan.&#8221; Not always, but in my experience more often than not. And when it <em>does</em> have to be repaid, the cost is usually quite reasonable.</p>
<p>A good example of technical debt causing a problem is in the development of a Web site. Imagine that you have an idea for a new site. You slap something together in a few days or weeks, post it on your site, and it&#8217;s immediately a hit. Within weeks you&#8217;re getting more traffic than your poor server can handle. It&#8217;s pretty easy to expand your site to use more Web servers, but then your back end database server melts down under the load. That, too, is easily expanded, but eventually you reach a point where some critical component of your system is the bottleneck and there&#8217;s no easy way to scale it. Adding a faster server with more memory and a faster hard disk just postpones the problem.</p>
<p>You incurred the technical debt when you slapped together a simple Web site without taking into account the possibility of massive scaling, and now it&#8217;s time to repay that debt. And it&#8217;s <em>painful</em>. You also have to do it pretty quickly or all your customers will move over to your competitor who spent six months developing a copycat site. You might find yourself, as you crawl into bed at 9 A.M. after another sleepless night trying to retrofit your program, lamenting the decision to ignore the scaling problem in favor of getting something working. And you vow never to do that again.</p>
<p>That&#8217;s the wrong attitude to have.</p>
<p>To hear the &#8220;no technical debt&#8221; preachers tell it, the world is full of failures who would have succeeded had they not taken on the technical debt. And they&#8217;ll show you the successes who refused to incur technical debt, opting instead to spend the extra time required to &#8220;do it right&#8221; up front. What they won&#8217;t tell you about, because they either don&#8217;t know about them or choose not to mention them, are the many successes who just &#8220;slap things together&#8221; and deal with the consequences successfully, and the many projects that fail despite &#8220;doing it right.&#8221; Most sites fail, not because they didn&#8217;t develop their software correctly, but because their product idea just didn&#8217;t fly. It doesn&#8217;t matter how well your code base is constructed if your business idea just doesn&#8217;t work.</p>
<p>The &#8220;no technical debt&#8221; preachers are wrong, plain and simple. They&#8217;ll have you believe that <em>any</em> technical debt will crater your project. Even those who say that you should repay technical debt as soon as possible are wrong. As with financial debt, the secret to managing technical debt is to examine each case and make an informed decision. You have to balance the likelihood of having to repay the debt against the cost of repaying it. It&#8217;s a simple (in concept, sometimes not in implementation) risk/reward calculation. What is the risk of incurring the debt, and what is the potential reward?</p>
<p>In the case of our hypothetical Web startup, the risk is that your server melts down before you can modify the code to be more scalable. But the likelihood of that risk is pretty darned small. First, you have to build something that people actually care about. The truth is that most Web startups turn on their servers and hear crickets. A few people will come to check it out, yawn, and move on to the next cool new thing. If that happens, you&#8217;ll be really happy that you didn&#8217;t waste a bunch of time writing your code to support massive scaling.</p>
<p>Even if your site starts getting traffic, it&#8217;s not like you&#8217;ll get a million dedicated users the first month. You&#8217;ll see traffic growing, and you&#8217;ll have time to refactor or rewrite your code to meet the increased demand. It might be painful&#8211;rewrites often are&#8211;but it&#8217;s unlikely that traffic will increase so quickly that you can&#8217;t keep up with it.</p>
<p>Some developers attempt to design for scalability to start, and in doing so end up making <em>everything</em> scalable. They spend a lot of time building a scalability framework so that every component can be scaled easily. Every component is split into smaller pieces, and each of those pieces is built to be scaled. That sounds like a good idea, but there are huge drawbacks to doing things that way.</p>
<p>The first problem is that not all components need to support massive scaling. In most software systems, there are one or two, or at most a small handful of, components that are bottlenecks. Designing those to be scalable is a Good Thing. Time spent making anything else scalable is a waste of resources. Even the time spent on those things you <em>think</em> will be bottlenecks is often a waste, because it&#8217;s incredibly difficult to tell where the bottlenecks will be before you have customers banging on your site. In all too many cases, designing for scalability is like attempting to optimize a program as you&#8217;re writing it&#8211;before you do a performance analysis to determine where the bottlenecks are.</p>
<p>Designing for scalability makes the code more complicated. Techniques like <a href="http://en.wikipedia.org/wiki/Dependency_injection">dependency injection</a> and <a href="http://en.wikipedia.org/wiki/Inversion_of_control">inversion of control</a> are very effective ways to create more flexible systems, but they make the code more complicated by inserting levels of indirection that often are difficult to follow. Taken to their extremes, these and similar techniques create a bloated code base that is less maintainable and harder to change than the &#8220;old style&#8221; code they replace. This is especially true when a development team loses sight of the objective (make something that works) in favor of the process. When you see a system that has a nearly one-to-one mapping between interface and implementation, you know that the people who designed and wrote it lost sight of the forest because they were too busy examining the trees.</p>
<p>Those who preach these techniques for reducing or eliminating technical debt assume that you know what you want to build and how you want to build it, and that you have the time and resources to become fully buzzword compliant. In the case of a startup business, <em>none</em> of those three things is true. Startups typically have a few guys, an idea, and lots of enthusiasm. They&#8217;re going to try things, quickly building a prototype, making it available for others to look at, and then discarding it to try something else when that idea fails. Eventually, if they&#8217;re lucky, they&#8217;ll hit on something that seems to resonate, and they&#8217;ll start concentrating on that idea. That&#8217;s the nature of a startup. Those guys are living on ramen noodles and little sleep, hoping that one of their ideas strikes a nerve before their savings run out. They don&#8217;t have time to waste worrying about things like technical debt.</p>
<p>Fred Brooks, in his 1975 book <a href="http://en.wikipedia.org/wiki/The_Mythical_Man-Month">The Mythical Man-Month</a>, famously said:</p>
<blockquote><p>The management question, therefore, is not <em>whether</em> to build a pilot system and throw it away. You <em>will</em> do that. […] Hence <em>plan to throw one away; you will, anyhow.</em></p></blockquote>
<p>That&#8217;s as true today as it was back then. Time spent making your first prototype scalable is wasted. You&#8217;re going to throw it away. If you&#8217;re lucky, you&#8217;ll reuse some of your underlying technology. But most of what you write in your first attempt will be gone by the time you finish the project.</p>
<blockquote><p>As an aside, there are those who say that <em>The Mythical Man-Month</em> is outdated and that many of its lessons are irrelevant today because we have faster, more powerful, and less expensive computers, better tools, and smarter programmers. I&#8217;ve seen studies showing that programmer productivity is five to ten times what it was 35 years ago. Whereas programmers <em>can</em> do more in less time today, we&#8217;re also trying to build systems that are two or more orders of magnitude larger and more complex than the systems being built back then. Brooks&#8217; cautions are <em>more</em> pertinent today than they were in 1975 because teams are larger and the problems we&#8217;re trying to solve are more difficult.</p></blockquote>
<p>Avoiding technical debt is like paying for flood insurance and building a dike around your house when you live on the top of a mountain in the desert. Sure, it&#8217;s <em>possible</em> that your house will flood, but it&#8217;s highly unlikely. And if the water does get that high, you have much more pressing problems that make the insurance policy and the dike irrelevant. You&#8217;ve wasted time, money, and other resources to handle an event that almost certainly won&#8217;t ever occur, and if it does occur, your solution won&#8217;t matter one bit.</p>
<p>So go ahead and incur that technical debt, but do so intelligently, with full knowledge of what it will cost you <em>if</em> it comes due. But know also that in many, perhaps most, cases, you&#8217;ll never have to pay it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2012/01/07/in-praise-of-technical-debt/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Half of forever is still forever</title>
		<link>http://blog.mischel.com/2012/01/03/half-of-forever-is-still-forever/</link>
		<comments>http://blog.mischel.com/2012/01/03/half-of-forever-is-still-forever/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 05:24:25 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1502</guid>
		<description><![CDATA[<p>You&#8217;re probably wondering if this is really necessary. Believe me, I&#8217;m a bit surprised by it myself. But every day I see evidence that supposedly competent programmers don&#8217;t understand this fundamental point.</p> <p>What am I talking about?</p> <p>Let&#8217;s say you have two lists. One is a list of accounts and the other is a list <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2012/01/03/half-of-forever-is-still-forever/">Half of forever is still forever</a></span>]]></description>
			<content:encoded><![CDATA[<p>You&#8217;re probably wondering if this is really necessary. Believe me, I&#8217;m a bit surprised by it myself. But <em>every day</em> I see evidence that supposedly competent programmers don&#8217;t understand this fundamental point.</p>
<p>What am I talking about?</p>
<p>Let&#8217;s say you have two lists. One is a list of accounts and the other is a list of transactions that you need to apply to those accounts. Each type of record has an <code>AccountNumber</code> property. In essence, you have this:</p>
<pre>class Account
{
    public int AccountNumber { get; private set; }
    // other properties and methods
}

class Transaction
{
    public int AccountNumber { get; private set; }
    // other properties and methods
}

List&lt;Account&gt; Accounts;
List&lt;Transaction&gt; Transactions;</pre>
<p>The items in the <code>Accounts</code> and <code>Transactions</code> lists aren&#8217;t in any particular order. Your task is to create output that groups the accounts and transactions, so it will look like this:</p>
<pre>Account #1
    Transaction
    Transaction
Account #2
    Transaction
    Transaction
    Transaction
...etc</pre>
<p>The naive way to do this is to loop over the accounts and, for each one, search all of the transactions for those whose <code>AccountNumber</code> matches the account&#8217;s number. Something like:</p>
<pre>foreach (var account in Accounts)
{
    Console.WriteLine("Account #{0}", account.AccountNumber);
    foreach (var trans in Transactions)
    {
        if (trans.AccountNumber == account.AccountNumber)
        {
            // output transaction
        }
    }
}</pre>
<p>If the number of accounts and transactions is even moderately large, this is going to take a very long time. If we say that <code>m</code> is equal to the number of accounts and <code>n</code> is the number of transactions, then this will take time proportional to <code>m * n</code>. Imagine you have 10,000 accounts and 5,000 transactions. Your code will look at every transaction 10,000 times, meaning you end up doing 50 million comparisons.</p>
<p>The faster way to do this is to sort the accounts and the transactions, and then do a <a href="http://en.wikipedia.org/wiki/Merge_algorithm">standard merge</a>, which is among the oldest concepts in computing. The merge itself takes time proportional to <code>m + n</code>, but sorting is a little more expensive. Sorting will take <code>m log m</code> time for the accounts and <code>n log n</code> for the transactions. So the total time is <code>(m log m) + (n log n) + m + n</code>. Let&#8217;s do the numbers:</p>
<table border="0">
<thead>
<tr>
<th>Step</th>
<th>Cost</th>
<th>Calculation (log base 2, rounded up)</th>
<th>Operations</th>
</tr>
</thead>
<tbody>
<tr>
<td>Sort accounts</td>
<td>m log m</td>
<td>10,000 * 14</td>
<td>140,000</td>
</tr>
<tr>
<td>Sort transactions</td>
<td>n log n</td>
<td>5,000 * 13</td>
<td>65,000</td>
</tr>
<tr>
<td>Merge</td>
<td>m + n</td>
<td>10,000 + 5,000</td>
<td>15,000</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td></td>
<td></td>
<td><strong>220,000</strong></td>
</tr>
</tbody>
</table>
<p>Now I&#8217;ll be the first to admit that these numbers aren&#8217;t perfectly comparable. That is, sorting the transactions list is more expensive than iterating over it, so the merge won&#8217;t be 200 times faster than the naive method. But it&#8217;ll probably be 100 times as fast. At least. And that&#8217;s with pretty small lists. If you&#8217;re working with 100,000 accounts and a million transactions, you&#8217;re talking maybe 22 million operations for the merge and 100 <em>billion</em> operations for the naive method. The merge will complete in a few minutes (if that), and the naive method will take essentially forever.</p>
<p>In practice you could probably merge those two large lists faster than the naive method would do the smaller lists.</p>
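<p>The sort-and-merge approach might look something like this in C#. It is a minimal sketch. Tuples stand in for the <code>Account</code> and <code>Transaction</code> classes, the sample data is made up, and <code>GroupByMerge</code> is a hypothetical name. It sorts both lists in place and then walks them together with a single index into the transactions list. Each step either advances that index or moves on to the next account, which is why the merge itself takes time proportional to <code>m + n</code>.</p>
<pre>using System;
using System.Collections.Generic;

// Tuples stand in for the Account and Transaction classes above.
var accounts = new List&lt;(int AccountNumber, string Owner)&gt;
{
    (3, "Carol"), (1, "Alice"), (2, "Bob")
};
var transactions = new List&lt;(int AccountNumber, decimal Amount)&gt;
{
    (2, 10m), (1, -5m), (3, 7m), (1, 20m), (2, 3m)
};

List&lt;string&gt; GroupByMerge(
    List&lt;(int AccountNumber, string Owner)&gt; accts,
    List&lt;(int AccountNumber, decimal Amount)&gt; trans)
{
    // Sort both lists in place by account number: m log m + n log n.
    accts.Sort((a, b) =&gt; a.AccountNumber.CompareTo(b.AccountNumber));
    trans.Sort((a, b) =&gt; a.AccountNumber.CompareTo(b.AccountNumber));

    // Merge: one pass over each sorted list, so m + n steps.
    var output = new List&lt;string&gt;();
    int t = 0;
    foreach (var account in accts)
    {
        output.Add($"Account #{account.AccountNumber}");
        // Skip transactions whose account number has no matching account.
        while (t &lt; trans.Count &amp;&amp; trans[t].AccountNumber &lt; account.AccountNumber)
            t++;
        // Emit all of this account's transactions; after sorting they are adjacent.
        while (t &lt; trans.Count &amp;&amp; trans[t].AccountNumber == account.AccountNumber)
        {
            output.Add($"    Transaction {trans[t].Amount}");
            t++;
        }
    }
    return output;
}

foreach (var line in GroupByMerge(accounts, transactions))
    Console.WriteLine(line);</pre>
<p>One design note: <code>List&lt;T&gt;.Sort</code> is not a stable sort, so transactions for the same account may come out in a different order than they went in. If that order matters (by date, say), add the relevant field as a secondary key in the transaction comparison.</p>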
<p>All of the above is elementary computer science. Really, they teach this stuff in 100-level computer science courses. And yet every day I see people asking &#8220;how do I make this faster?&#8221; That&#8217;s bad enough. What&#8217;s worse&#8211;and what makes me fear for the future&#8211;is how many people answer with, &#8220;use multithreading!&#8221; (Or &#8220;Use the Task Parallel Library!&#8221;) It&#8217;s maddening.</p>
<p>If you have a quad-core machine and you get perfect parallelism, then your program will execute in one-fourth of the time it takes to execute on a single thread. But one-fourth of 50 million is still 12.5 million. Even if you applied parallel processing to our simple case above, the naive method would <em>still</em> be more than 50 times slower than the single-threaded merge.</p>
<p>No amount of parallelism will save a bad algorithm, just as no amount of assembly language optimization will make a bubble sort execute faster than Quicksort in the general case.</p>
<p>Remember, a better algorithm gives you orders of magnitude (or even more) increases in performance. Parallel processing gives you, at best, <em>linear</em> increases. Spend your time on improving your algorithm. <em>Then</em> worry about how you might bring multiple threads to bear in order to make it faster.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2012/01/03/half-of-forever-is-still-forever/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Elephant on a trampoline</title>
		<link>http://blog.mischel.com/2011/12/31/elephant-on-a-trampoline/</link>
		<comments>http://blog.mischel.com/2011/12/31/elephant-on-a-trampoline/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 05:48:02 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Memories]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1494</guid>
		<description><![CDATA[<p>I laughed out loud when I saw this picture.</p> <p></p> <p>I was nine years old when Dad bought a trampoline. He had us check out some books from the library so we could learn the proper way to jump. I don&#8217;t know how much attention everybody else paid to those books, but I kind of <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2011/12/31/elephant-on-a-trampoline/">Elephant on a trampoline</a></span>]]></description>
			<content:encoded><![CDATA[<p>I laughed out loud when I saw this picture.</p>
<p><a href="http://blog.mischel.com/wp-content/uploads/2011/12/elephanttrampoline.gif"><img class="aligncenter size-full wp-image-1495" title="elephanttrampoline" src="http://blog.mischel.com/wp-content/uploads/2011/12/elephanttrampoline.gif" alt="" width="486" height="391" /></a></p>
<p>I was nine years old when Dad bought a trampoline. He had us check out some books from the library so we could learn the proper way to jump. I don&#8217;t know how much attention everybody else paid to those books, but I kind of glanced through them to get ideas for weird and wacky ways I could court death.</p>
<p>All five of us kids got pretty good on the trampoline. My older brother and sister had better form than I did, but I was the wild man. I&#8217;d try pretty much any trick I heard about, saw a picture of, or saw somebody else do. I even made up a few myself, although I learned later that they weren&#8217;t exactly original.</p>
<p>I do think I came up with original ways to perform unexpected dismounts. On one occasion when I was trying a new trick, I did a back flip right off the trampoline and came down directly in front of my brother, who slowed my descent enough that I landed on my feet. My recollection is that it looked almost like we planned the whole thing. Perhaps his memory of the event is better than mine. I was a bit preoccupied with my life flashing in front of my eyes as I sailed through the air to my doom.</p>
<p>I spent a lot of time on that trampoline from the time Dad bought it until I was 16 or 17. One summer I made a point to spend an hour every day on the thing.</p>
<p>Fast forward 20 years or so, to when Debra bought me a trampoline for my birthday and I set it up out here in the back yard. It was fun for a while&#8211;a week or so&#8211;but then the new wore off. It wasn&#8217;t nearly as much fun as when I was younger and always had friends who would come over and play on the thing with me, and for whom I could show off my latest trick. Plus, I wasn&#8217;t in nearly as good physical shape at 35 as I was when I was 16. Jumping on a trampoline is <em>work</em>.</p>
<p>We sold the trampoline a few years later, and a few years after that we were at a friend&#8217;s house. He had a trampoline for the kids, so I took it on myself to teach them a few tricks. No crazy flips or anything&#8211;Mom didn&#8217;t want me teaching her kids <em>that</em>, although I did have to see if I could still do that back and a half.</p>
<p>I was showing them how to get some real air (feet at shoulder width, pushing off gradually at the right time, using arms to control balance, etc.). I was getting some pretty good height. Then I noticed that the kids were looking under the trampoline, pointing, and giggling. And I was transported back 30 years to when we first got the trampoline and Dad was showing us how to use it. When he jumped, the mat nearly hit the ground! We all giggled at that.</p>
<p>Anyway, seeing the kids pointing and laughing made me start laughing with the memory. So I bounced, shifted to a sitting position, hit the trampoline with my butt &#8230; <em>and hit the ground</em>! Yes, the combination of my weight and the height I was getting stretched the springs far enough that I hit the ground.</p>
<p>I checked after that. The trampoline&#8217;s weight limit was 200 pounds. And I only weighed 180 at the time. I guess they didn&#8217;t think a 180-pound man could get enough altitude to push the mat that far.</p>
<p>And that&#8217;s why I laughed out loud when I saw the elephant on the trampoline.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2011/12/31/elephant-on-a-trampoline/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More about cache contention</title>
		<link>http://blog.mischel.com/2011/12/29/more-about-cache-contention/</link>
		<comments>http://blog.mischel.com/2011/12/29/more-about-cache-contention/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 21:57:24 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.mischel.com/?p=1491</guid>
		<description><![CDATA[<p>When I started working on yesterday&#8217;s blog entry about cache contention, I built an example program that used an array to illustrate the problem. That is, rather than having a struct that contains four counters, I just allocated an array. It made the code somewhat simpler, as you can see here.</p> const long maxCount = <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.mischel.com/2011/12/29/more-about-cache-contention/">More about cache contention</a></span>]]></description>
			<content:encoded><![CDATA[<p>When I started working on <a href="http://blog.mischel.com/2011/12/28/more-threads-makes-the-program-slower/">yesterday&#8217;s blog entry about cache contention</a>, I built an example program that used an array to illustrate the problem. That is, rather than having a <code>struct</code> that contains four counters, I just allocated an array. It made the code somewhat simpler, as you can see here.</p>
<pre>using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    const long maxCount = 500000000;
    const int numThreads = 2;
    const int Multiplier = 1;

    static void Main()
    {
        DoIt();
    }

    static void DoIt()
    {
        long[] c = new long[Multiplier * numThreads];
        var threads = new Thread[numThreads];

        // Create the threads. Each one counts its own element of c down to zero.
        for (int i = 0; i &lt; numThreads; ++i)
        {
            threads[i] = new Thread((s) =&gt;
                {
                    int x = (int)s;
                    while (c[x] &gt; 0)
                    {
                        --c[x];
                    }
                });
        }

        // Start the threads, passing each one the index of its counter.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i &lt; numThreads; ++i)
        {
            int z = Multiplier * i;
            c[z] = maxCount;
            threads[i].Start(z);
        }

        // Wait for 500 ms and then access the counters.
        // This just proves that the threads are actually updating the counters.
        Thread.Sleep(500);
        for (int i = 0; i &lt; numThreads; ++i)
        {
            Console.WriteLine(c[Multiplier * i]);
        }

        // Wait for threads to stop
        for (int i = 0; i &lt; numThreads; ++i)
        {
            threads[i].Join();
        }
        sw.Stop();
        Console.WriteLine();
        Console.WriteLine("Elapsed time = {0:N0} ms", sw.ElapsedMilliseconds);
    }
}</pre>
<p>The purpose of the <code>Multiplier</code> in that program will become evident soon.</p>
<p>Run with a single thread, that code executes in about 1,700 ms on my work machine&#8211;same as the version that uses a <code>struct</code>. But run with two threads, the code takes a whopping <em>25 seconds</em>! At first I thought that this was evidence of cache contention, so I changed the <code>Multiplier</code> to <code>8</code>, which spreads out the counters so that they&#8217;re guaranteed to be on different cache lines. That is, the first thread will access <code>c[0]</code>, and the second thread will access <code>c[8]</code>.</p>
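<p>To make the cache line arithmetic concrete, here is a small brute-force check. It is only a sketch, and it assumes 64-byte cache lines (typical of current x86/x64 processors, but not universal) and that the elements of a <code>long[]</code> start on 8-byte boundaries, as they do in the 64-bit .NET runtime. The class and method names are made up for illustration. It tries every 8-byte-aligned starting position within a cache line and reports whether two elements can ever land on the same line.</p>
<pre>using System;

// Hypothetical helper for illustration. Assumes 64-byte cache lines and
// long[] elements that start on 8-byte boundaries.
public static class CacheLineMath
{
    const int ElementSize = sizeof(long); // 8 bytes
    const int LineSize = 64;              // assumed cache line size

    // True if elements indexA and indexB can never share a cache line,
    // whatever 8-byte-aligned address the array contents start at.
    public static bool AlwaysSeparateLines(int indexA, int indexB)
    {
        for (int start = 0; start &lt; LineSize; start += ElementSize)
        {
            // Start and element size are multiples of 8, so no element
            // straddles a line boundary; its first byte identifies its line.
            int lineA = (start + indexA * ElementSize) / LineSize;
            int lineB = (start + indexB * ElementSize) / LineSize;
            if (lineA == lineB)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AlwaysSeparateLines(0, 1)); // Multiplier = 1: prints False
        Console.WriteLine(AlwaysSeparateLines(0, 8)); // Multiplier = 8: prints True
    }
}</pre>
<p>Under these assumptions, adjacent counters share a line in seven of the eight possible alignments, while counters eight elements (64 bytes) apart never do.</p>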
<p>That change did indeed improve the run time. The two thread case went from 25 seconds to about 12 seconds. Cache contention was <em>part</em> of the problem, but certainly not all of it. Remember, the two-thread version of my first test yesterday ran in about 2,200 ms on my system.</p>
<p>I ruled out array indexing overhead, figuring that if it was a problem it would show up in the single-threaded case. After ruling out everything else, I was left with two possibilities: either there&#8217;s some explicit mutual exclusion going on in the runtime, or there&#8217;s some other cache contention that I didn&#8217;t take into account.</p>
<p>It turns out that there <em>is</em> more cache contention. You just can&#8217;t see it directly because it has to do with the way that arrays are allocated.</p>
<p>When the runtime allocates an array, it allocates enough space for the array contents <em>plus</em> a little extra memory to hold metadata: the number of dimensions, the size of each dimension, etc. This is all contained in a single memory allocation. The layout of an array in memory looks like this:</p>
<pre>---------------------------------
|  metadata  | array contents   |
---------------------------------
             ^
          array[0]</pre>
<p>The array metadata is smaller than 64 bytes (the size of a cache line on typical x86 processors), so the chances that the first array element shares a cache line with part or all of the metadata are very high.</p>
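<p>A similar brute-force check can show how far into the array an element has to sit before it can never share a cache line with the metadata. This is a sketch under the same assumptions (64-byte lines, 8-byte-aligned <code>long</code> elements) plus the layout in the diagram above, with the metadata ending immediately before <code>array[0]</code>. The names are hypothetical. It conservatively tests the last metadata byte, because that is the byte closest to the array contents.</p>
<pre>using System;

// Hypothetical helper for illustration. Assumes 64-byte cache lines,
// 8-byte-aligned long elements, and metadata that ends immediately
// before element 0 (as in the layout diagram).
public static class MetadataClearance
{
    const int ElementSize = sizeof(long);
    const int LineSize = 64;

    // True if element `index` can never share a cache line with the last
    // metadata byte, whatever aligned address the array contents start at.
    public static bool AlwaysClearOfMetadata(int index)
    {
        // Offsets are shifted up by one line so (start - 1) never goes negative.
        for (int start = LineSize; start &lt; 2 * LineSize; start += ElementSize)
        {
            int metadataLine = (start - 1) / LineSize;
            int elementLine = (start + index * ElementSize) / LineSize;
            if (metadataLine == elementLine)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AlwaysClearOfMetadata(0)); // first element: prints False
        Console.WriteLine(AlwaysClearOfMetadata(8)); // eight elements in: prints True
    }
}</pre>
<p>With these assumptions, element 0 shares a line with the end of the metadata in seven of the eight possible alignments, while an element eight slots in is always clear of it.</p>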
<p>That&#8217;s half of the problem. The other half of the problem is that whenever code needs to access an element in the array, it has to read the metadata in order to do bounds checking and to compute the offset into the memory block. So whenever you write <code>x = a[i]</code> or <code>a[i] = x</code>, the code accesses that metadata.</p>
<p>If the first array element is on the same cache line as the parts of the metadata used for indexing, then every time you modify that first element, any other thread&#8217;s access to the metadata is going to incur a wait for the cache to be flushed. <em>Modifying the first array element invalidates the cached metadata.</em></p>
<p>The reason it&#8217;s worse with arrays than with yesterday&#8217;s program is that every time the code executes <code>--c[x]</code>, it actually makes two array accesses: one to read the current value and one to write the modified value. Each of those accesses has to read the metadata, so there can be multiple stalls per iteration. That&#8217;s not the case when accessing fields in a structure, as we did yesterday.</p>
<p>The solution is to put some padding at the front of the array, as well as between the items we&#8217;re incrementing. As it stands right now, with four threads and a <code>Multiplier</code> of 8, the indexes being used are 0, 8, 16, and 24. Shifting them right by eight elements gives us 8, 16, 24, and 32. That&#8217;s an easy change to make, as you can see here.</p>
<pre>long[] c = new long[Multiplier * (numThreads+1)];
var threads = new Thread[numThreads];

// Create the threads
for (int i = 0; i &lt; numThreads; ++i)
{
    threads[i] = new Thread((s) =&gt;
        {
            int x = (int)s;
            while (c[x] &gt; 0)
            {
                --c[x];
            }
        });
}

// start threads
var sw = Stopwatch.StartNew();
for (int i = 0; i &lt; numThreads; ++i)
{
    int z = Multiplier * (i + 1);
    c[z] = maxCount;
    threads[i].Start(z);
}
// Wait for 500 ms and then access the counters.
// This just proves that the threads are actually updating the counters.
Thread.Sleep(500);
for (int i = 0; i &lt; numThreads; ++i)
{
    Console.WriteLine(c[Multiplier * (i + 1)]);
}</pre>
<p>I&#8217;ve shown only the parts of the program that have to change. Instead of computing the index using <code>Multiplier * i</code>, I used <code>Multiplier * (i + 1)</code>, and I allocated the counters array with <code>Multiplier * (numThreads + 1)</code> elements so that the shifted indexes stay in bounds.</p>
<p>If you compile and run that program, you&#8217;ll see that the results for one, two, and three threads are almost identical. The result with four threads will be slightly higher, again because system services take a small amount of processor time away from the test program.</p>
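<p>Computing spaced-out indexes by hand works, but a different technique avoids the index arithmetic entirely: give each counter its own padded slot using <code>StructLayout</code> and <code>FieldOffset</code>. The following is a sketch of that alternative, not the approach used above. It assumes 64-byte cache lines, and <code>PaddedCounter</code> and <code>PaddedCounterDemo</code> are hypothetical names.</p>
<pre>using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

// Each slot is 128 bytes. The value sits at offset 64, so 64 bytes of padding
// separate slot 0's value from the array metadata, and 120 bytes separate
// each value from the next slot's value. Assumes 64-byte cache lines.
[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct PaddedCounter
{
    [FieldOffset(64)] public long Value;
}

public static class PaddedCounterDemo
{
    public static void Main()
    {
        const int numThreads = 2;
        const long maxCount = 100000000;

        var counters = new PaddedCounter[numThreads];
        var threads = new Thread[numThreads];
        for (int i = 0; i &lt; numThreads; ++i)
        {
            int x = i; // each thread gets its own copy of the index
            counters[x].Value = maxCount;
            threads[i] = new Thread(() =&gt;
            {
                // Array elements are variables, so this updates the struct in place.
                while (counters[x].Value &gt; 0)
                {
                    --counters[x].Value;
                }
            });
        }

        var sw = Stopwatch.StartNew();
        foreach (var t in threads) t.Start();
        foreach (var t in threads) t.Join();
        sw.Stop();

        Console.WriteLine("Slot size = {0} bytes", Marshal.SizeOf(typeof(PaddedCounter)));
        Console.WriteLine("Elapsed time = {0:N0} ms", sw.ElapsedMilliseconds);
    }
}</pre>
<p>The cost is memory: 128 bytes per 8-byte counter. That is negligible for a handful of per-thread counters, but it adds up quickly for large arrays.</p>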
<p>What I&#8217;ve been calling cache contention is more often referred to in the literature as <a href="http://en.wikipedia.org/wiki/False_sharing">false sharing</a>. I&#8217;d like to thank Nicholas Butler for answering <a href="http://stackoverflow.com/questions/8672628/why-is-concurrent-modification-of-arrays-so-slow">my Stack Overflow question</a> about this, and pointing me to his article, <a href="http://nickbutler.net/Article/FalseSharing">Concurrency Hazards: False Sharing</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mischel.com/2011/12/29/more-about-cache-contention/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

