Jim’s Random Notes

Musings on technology and life

October 6th, 2008

Hardware Problems

I’ve mentioned before that we use removable drives to transfer data between the data center and the office. Some of those files are very large—50 gigabytes or larger. The other day we discovered an error in one of the files that we had here at the office. The original copy at the data center was okay. Somewhere between when it was created at the data center and when we read it here, an error crept in. There is plenty of room for corruption. The file is copied to the removable, then copied from the removable, transferred across the network, and stored on the repository machine.

The quick solution to that problem is to copy with verify. That takes a little longer, but it should at least let us know if a bit gets flipped.
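For what it's worth, here is a minimal sketch of what "copy with verify" could look like if scripted in Python. The function names are my own for illustration, and the choice of SHA-256 is an assumption; any decent checksum would do. It copies the file, then re-reads both copies and compares digests.

```python
import hashlib
import shutil


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MB at a time so a 50 GB file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_verify(src, dst):
    """Copy src to dst, then re-read both files and compare their digests."""
    shutil.copyfile(src, dst)
    if file_digest(src) != file_digest(dst):
        raise IOError(f"verification failed: {dst} does not match {src}")
    return dst
```

One caveat: the operating system may serve the re-read of the destination from its cache rather than from the disk, so a check like this is probably most trustworthy when it's run again after the drive has been unplugged and reattached.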

Saturday we ran into another error when copying a file from the removable drive to its destination on the network:

F: is the removable drive. The machine it was connected to disappeared from the network. I’m still trying to decipher that error message. I can’t decide if we got a disk error or if there was a network error. Did the disk error cause the network error? Or perhaps Windows considers a USB storage device to be a network drive. We removed the drive from that machine, connected it directly to the repository machine, and the copy went just fine. The file checked out okay, leaving me to think that the first machine is flaky.

About a year ago we purchased a Netgear ProSafe 16 port Gigabit Switch (model GS116 v1). It’s been a reliable performer, although it does get a little warm to the touch. Still, we ran it pretty hard and it never had a glitch. We bought another about 6 months ago. Last month, the first one flaked out and started running at 100 Mbps. Not good when you’re trying to copy multi-gigabyte files. This morning, the other one gave up the ghost completely and wouldn’t pass traffic at all.

I suspect that excess heat caused both switch failures. The units were operating in a normal office environment where the ambient temperature is between 75 and 80 degrees Fahrenheit. There was no special cooling and we ran the units pretty hard what with the Web crawler and all. As I said, the switches did get warm to the touch. In a normal office configuration where the switch doesn’t get a lot of traffic, it probably will hold up fine. But I would not recommend this switch for high duty cycles unless you have special cooling for it.

September 22nd, 2008

Removable Drives

Although we’ve moved the crawlers and a large part of our workflow to a co-location facility, we still do some of our processing here at the office.  So on a daily basis I hop on my bike and ride down the road (a little over a mile) to pick up the daily data dump.  We do have a VPN to the data center, but the 100 gigabytes in a daily dump is more than we could download.

We have two USB hard drives that we use for this shuttle service.  It didn’t take us long to learn that it’s best to have two of the same type of drive.  That way all you need to carry back and forth is the drive itself.  You can leave a USB cable and power supply at each location.

In the short time we’ve been doing this, we’ve tested a number of different drives.  My favorite is the 500 GB Maxtor Personal Storage 3200.  It has a very convenient form factor (just a brick), the power supply is not a wall wart, and it has a standard USB 2.0 connector.  We have used several of these around the office, and I have one at home.  It’s been a reliable drive and a good performer.  Unfortunately, you can’t get them anymore.

We picked up a couple of Iomega eGo 1TB Desktop Hard Drives on sale at Fry’s for about $160 each.  Hard to argue with the price, and the form factor is nice.  However, the drive made a lot of clicking noises, got very hot (uncomfortable to touch), and one of them failed outright.  The other started giving errors and we ended up taking them both back.  Considering my experience with the eGo, and also the experience I’ve had with the 160 GB unit I bought several years ago (slow, noisy, and hot), I would not recommend Iomega removable hard drives.

About two weeks ago we picked up two Seagate FreeAgent 1 TB drives on sale at Fry’s for around $150 each.  I’m not real wild about the form factor and the wall wart power supply, nor do I much care for the mini-USB connector.  But the drives are quiet, fast, and reliable.  They do get significantly hotter than the 500 GB Maxtors that I prefer, but not near as hot as the Iomega drives.  We also have two of the 750 GB FreeAgent drives in the office for backups and haven’t had any problems with them.

So far, I haven’t encountered the mysterious drive removal problem with the FreeAgent drives.  I’m beginning to think that permissions might have something to do with it.  Let me explain why.

I edited the security properties on the FreeAgent drives (right-click on the drive, select Properties, and then click on the Security tab) to give Everyone full control.  This allows me to attach the drive to any of our machines at the office or at the data center, and be able to add, copy, rename, delete, or otherwise manipulate files without trouble.  Since I did that, I haven’t had any trouble removing the drive.  I’ve yet to try this on the other drives I was having trouble with.

August 25th, 2008

Remove Hardware Mystery

Updated

Two different times now, when trying to “safely remove” a USB hard drive, I’ve had this error message pop up when I’m not actually accessing the drive.

I closed all open applications, even logged out to make sure that no user applications were open.  When I logged back in and tried to remove the device, I got the same error message.

The Windows Server 2008 Resource Monitor tells me that the System process (process ID 4) has two files open on the device:  F:\$LogFile (NTFS Volume Log) and F:\$Mft (NTFS Master File Table).  Why it’s holding those files open when I told it to remove the device is beyond me.  And I have absolutely no idea how to tell Windows to let go of the device.

I just realized that both times this happened, I was logged in to the machine via Remote Desktop.  That shouldn’t be an issue, but it’s probably worth looking into.  I know that I’ve removed drives via Remote Desktop before.

If anybody has a clue why this is happening, I’d sure like to hear about it.

Update Wednesday, August 27

It happened again yesterday, so I downloaded Process Explorer to see if I could get a little more information.  Searching for handles to “F:\” produces these search results:

That tells me what handles are open, but it sure doesn’t give me much in the way of useful information.  It seems like the remove hardware action should tell those services to let go of the handles.

There is a way within Process Explorer to force-close those handles, but attempting to do so results in a dire warning about possibly causing a crash or instability, and I wasn’t prepared at the time to crash my server.  Closing the Remote Desktop window and logging in as Administrator at the machine’s console didn’t allow me to remove the drive.  So I just pulled the plug.  No ill effect.

After disconnecting the drive yesterday, I took it to the data center, copied some stuff to it, and brought it back here.  I connected the drive, ran the program that copies data off the drive, and attempted to disconnect it again.  That time it worked.  As far as I recall, I performed the same steps as I always do.  Why it worked this last time when it hasn’t worked previously is beyond me.

August 20th, 2008

Should VPN be this hard?

Last week we moved the crawlers from our office to a real data center where we can get more, and more reliable, bandwidth.  Getting everything installed and working wasn’t too much trouble, although the next time I have to do something like that I’m going to do a lot more pre-installation work here at the office before taking the machines to the data center.  Installing and configuring 10 machines while standing in the cold, noisy data center isn’t my idea of a good time.

Having machines at the data center means that we need some way to log in and check on them.  Not a problem, as the Cisco security appliance we bought supports VPN.  And configuring the Cisco IPSec VPN was quite simple.  I was pretty happy when, with just an hour of looking at the documentation and fiddling with the configuration, I was able to log in to the VPN from my laptop.  I packed up my stuff and headed back here to get everybody set up to use the VPN.

And then I found out that Cisco’s IPSec VPN client won’t run on 64-bit versions of Windows.  Nor does Cisco have any plans to upgrade it.  Since I’m not willing to create a 32-bit virtual machine just for running the VPN client, that leaves me with the option of configuring the router for some other type of VPN.  And there things get difficult.  The documentation that came with the router doesn’t discuss any type of VPN configuration other than IPSec, and the online documentation I’ve seen makes the assumption that I understand everything there is to know about VPN.  It gets confusing in a real hurry.

There are VPN standards.  There are so many, in fact, that no mere mortal can begin to understand them.  It might as well be a free for all with all those competing protocols.  Just the acronyms are enough to push a questionably sane person such as myself over the edge into babbling lunacy.  I’ve yet to find a document that explains, in terms a reasonably bright person who hasn’t passed Cisco’s certification can understand, how to configure the VPN.  I can’t even find a good discussion of the benefits and drawbacks of the different VPN technologies:  IPSec, L2TP, or SSL.

I also need to configure VPN on our pfSense box here at the office.  That looks almost as daunting as the Cisco’s configuration and the documentation is, if you can imagine, even worse.

I realize that much of my frustration stems from my lack of expertise in this area.  I’m a programmer, not a network admin.  But I have to think that VPN just doesn’t need to be this hard.

I can find lots of “how VPN works” types of discussions online, but they’re presented at a very high level.  There also is plenty of detailed documentation about VPN configurations for very specific situations.  But I’ve found nothing in the middle.  Something like “Simple VPN configuration for people who don’t live and breathe this stuff.”

Pointers to good discussions of the different types of VPN, and good tutorials about configuring VPN on the Cisco ASA or pfSense would be greatly appreciated…

August 4th, 2008

Multicore Crisis?

There’s been some talk recently of the next “programming crisis”: multicore computing. I’ll agree that we should be concerned, but I don’t think we’re anywhere near the crisis point. Before I address that specifically, I think it’s instructive to review the background: why multicore processors exist, how they affect existing software, and the issues involved in writing code to make use of multiple cores.

Moore’s law has been quoted and misquoted so often that it’s almost a cliché. Gordon Moore’s original statement was simply an observation on the rate at which transistor counts were increasing on integrated circuits, along with his expectation that the trend would continue for at least 10 years. That was 1965. The trend has continued, and there’s no indication that it will slow.

Some people think Moore’s Law has become something of a self-fulfilling prophecy: because we believe that it’s possible, somehow we strive to make it so. One wonders what would have happened if Moore had said that he expected the rate of growth to increase. Would transistor densities have increased even faster than the exponential rate we’ve seen?

Self-fulfilling prophecy or not, it’s almost certain that the trend in increasing transistor densities will continue (it has through 2007) and that as a result we’ll get ever more powerful CPUs as well as faster, higher-capacity RAM. Absolute processor speed as measured by clock rate will continue to increase, but not at the astounding rates that we saw up to 2005 or so. Quantum effects and current leakage have put a little damper on the rate of growth there. Better materials will solve the problem–are solving the problem–but absent a fundamental breakthrough by the chemists working on the problem, clock speeds won’t be doubling every 18 months like they had been in the recent past. The Clock Speed Timeline graph makes this quite evident.

Today’s trend is towards multiple cores on a single processor, running at a somewhat slower clock rate. The machine I’m writing this on, for example, has a quad-core Intel Xeon processor running at 2 GHz. The clock speed is somewhat slower than you can get in a high-end Pentium, but the multiple cores provide more total computing power. Quad core processors today are quite common. Intel demonstrated an 80-core chip in February of 2007, and promised to deliver it within five years. I fully expect to have a 256-core processor in my desktop computer ten years from now.

The trend towards multiple cores and very slowly increasing clock rates has some interesting ramifications for software developers. In the past, we have depended on more RAM and faster processors to give us some very nice performance boosts. All indications are that the amount of available RAM and the size of on-chip caches will continue to grow, but we can’t count on processor speed doubling every 18 months or so. Unless we learn to write programs that use multiple cores, we will soon reach a very real performance ceiling.

Not all applications can benefit from multiple cores, but you’d be surprised at how many can. And even in those cases when a single program can’t make use of multiple cores, users still benefit from having a multicore processor because the machine is better at multi-tasking. Imagine running four virtual machines on one computer, for example. If the computer has a single processor core, all four virtual machines and all of the operating system services share that one core. On a quad-core processor, the work load is spread out over all four cores. The result is more processor cycles per virtual machine, meaning that all four virtual machines should run faster.

Software systems that consist of multiple mostly-independent processes can make good use of multicore processors without any modification. Consider a system consisting of two services that are constantly running. On a single-core computer, only one can actually be working at a time. You could almost double performance simply by upgrading to a dual-core processor. Such software systems are quite common, and they require no code changes in order to benefit immediately from the new multicore processor designs.

Contrary to popular belief, writing code that is explicitly multi-threaded–designed to take advantage of multiple cores–isn’t necessarily a huge step up in complexity. Such code can be much more complex than single-threaded code, but it doesn’t have to be. Some programs are more multi-threaded than others. I’ve found it useful to think of programs in terms of the following four levels of complexity:

  1. No explicit multi-threading.
  2. Infrequent, mostly independent asynchronous tasks.
  3. Loosely coupled cooperating tasks.
  4. Tightly coupled cooperating tasks.

Obviously, it’s impossible to draw exact boundaries between the levels, and many programs will use features found in two or more of the levels. In general, I would classify a program by the highest level of multi-threading features that it uses.

Level 1 requires little in the way of explanation. This is the most common type of application in use today. In a batch mode program, execution proceeds sequentially from start to finish. In a GUI program, user interface events and processing execute on the same thread. This type of application has served us well over the years.

Most Windows programmers have some experience with the next level of complexity. A GUI application that performs background processing and periodically updates the display is an example of this type of program. Typically, the program starts the background process, which from time to time raises events that the GUI thread handles by updating the display. Data and process synchronization between tasks is limited to the event handlers that respond to asynchronous events. Modern development environments make it very easy to create such programs. These programs can benefit from multiple processor cores because the background thread can operate independently of the GUI thread, making the GUI thread much more responsive.

I have found the third level of complexity–loosely coupled cooperating tasks–to be a very useful and relatively simple way to make use of multiple cores. The idea is to construct a program that operates in an assembly line fashion. For example, consider a program that gathers input, does some complex processing of the input data, and then generates some output. Many such programs are processor bound. If you structure the program such that it maintains an input queue, a pool of independent worker threads, and an output queue, then there is little danger of running into the problems that often plague more complex programs. You have to supply synchronization (mutual exclusion locks, or similar) on the input and output queues, but the worker threads operate independently. Using this technique on a quad-core processor, it’s possible to get an almost 4x increase in throughput over a single-core processor, with very little danger of running into resource contention issues.
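Here is a rough sketch of that assembly-line structure in Python, just to show how little synchronization is involved. Everything in it is illustrative: `process` is a hypothetical stand-in for the real work, and the sentinel-based shutdown is one common convention, not the only one. Python's `queue.Queue` does its own locking, so the workers never touch a lock directly.

```python
import queue
import threading

SENTINEL = None  # marks "no more work" on the input queue


def process(item):
    """Hypothetical stand-in for the expensive per-item processing."""
    return item * item


def worker(input_q, output_q):
    """Pull items until the sentinel arrives; workers share no other state."""
    while True:
        item = input_q.get()
        if item is SENTINEL:
            break
        output_q.put((item, process(item)))


def run_pipeline(items, num_workers=4):
    """Feed items through a pool of workers and collect the results."""
    input_q = queue.Queue()   # thread-safe: locking is handled internally
    output_q = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(input_q, output_q))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        input_q.put(item)
    for _ in threads:
        input_q.put(SENTINEL)  # one sentinel per worker so each one exits
    for t in threads:
        t.join()
    results = {}
    while not output_q.empty():  # safe here because all workers have exited
        key, value = output_q.get()
        results[key] = value
    return results
```

A caution about this particular sketch: in standard CPython, threads don't run processor-bound Python code in parallel, so to actually see the multicore speedup you would swap the threads for processes (or use a runtime without that restriction). The shape of the program stays the same.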

Written correctly, programs that have multiple tightly-coupled cooperating tasks make the best possible use of processor resources. However, explicitly coding thread synchronization is perhaps the most difficult type of programming imaginable. Forgetting to lock a resource before accessing it can lead to unexplained crashes or data corruption. Holding a lock for too long can create a performance bottleneck. Locks that are too granular increase complexity and also the chance for deadlock situations. Locks that are not granular enough will stall worker threads. Race conditions are endemic. Assuming you get such a program working, even a small change will often cause new, unanticipated problems. Writing this kind of code is hard. You’re much better off re-thinking your approach to the problem and casting it as a Level 3 problem. Whatever price you pay in performance will be returned many fold in increased reliability and reduced development time.

If you’re writing a Level 3 or Level 4 program, you should very seriously consider using an existing multi-tasking library if at all possible. Doing so will require that you think about your problem differently, but you leverage a lot of known-working code that is almost certainly more robust in all ways than what you’re likely to write yourself in the time allotted. Two good examples of such libraries are the Parallel Extensions to .NET 3.5 and the Java Parallel Processing Framework. Such libraries exist for many other programming environments. Although still in their infancy, these libraries promise to greatly simplify the move to multicore. If you’re contemplating development of a program that makes good use of multiple cores, you definitely should learn about any parallel computing libraries that support your platform.

So, back to the crisis. Bob Warfield over at SmoothSpan Blog has had and continues to have quite a lot to say about it, and many others share his sentiments. I, on the other hand, don’t think we’re anywhere near the crisis point. Nor do I think we’re likely to get there. Whereas it’s true that most current software isn’t multicore ready, software developers have understood for several years now that they need to begin writing applications that take advantage of multiple processor cores. It’s likely that some shops have taken an ad hoc approach to the problem, and they’re probably suffering with the issues I pointed out above. It’s also likely that many (and I would hope, most) development shops have done the prudent thing and adopted a parallel computing library that takes care of the difficult areas, leaving the programmers to worry about their specific applications. Doing so is no different than adopting an operating system, development environment, GUI library, report generator, or any other third party component–something that development shops have long experience with.

In short, the multicore “crisis” that the doomsayers are warning us about is almost a non-issue. It’s going to require a small amount of programmer retraining and there will undoubtedly be a temporary plateau in the rate at which our processing of data increases, but in a very short time we’ll again have mainstream applications that push all this fancy hardware to its limits.

July 29th, 2008

The Ultimate Development Machine?

In Understanding the Hardware, Jeff Atwood describes his “best bang for the buck developer x86 box,” at a cost of about $1,100.  The system he describes is quite a nice development machine, although it’s probably overkill for a lot of developers.  Seriously.  How many developers do you know who really need a 10,000 RPM drive and a screaming video card?

Surprisingly, he doesn’t mention what case he’s going to put all that fancy hardware in.  I’d really like to know.  I’ve mentioned before that I like the Antec Sonata cases because they’re very quiet.  But with their fans, they almost certainly create more noise than whatever Jeff’s using for passive cooling.

My development machine these days is quite a bit different from what he describes, but I realize that I have somewhat different needs.  I’ll give you a quick rundown.

Start with a Dell Precision 490 case, with power supply and motherboard.  These can be had for under $200 on eBay, or from Dell surplus suppliers.  They’re starting to become a bit scarce on the surplus market now, because most have gone off lease and Dell doesn’t make that model anymore.  One drawback to this system is that it creates a bit more noise than the Antec case, but I’ve found that I can accept a certain amount of noise.  And it’s hard to beat the price.

Add a quad-core Xeon E5335 processor running at 2 GHz.  Granted, 2 GHz isn’t exactly blindingly fast, but it’s quite well suited to the work that I do.  Unlike the code most developers write, the code I’m working on does benefit from multiple cores.  The motherboard in this 490 has two processor sockets, so I could potentially run two of those quad-core Xeons.  And I can make good use of all eight cores.  The Xeon is pretty pricey if you buy it new.  You might consider picking one up on eBay.  We’ve purchased dozens of these processors on eBay and haven’t had a problem with any of them.

I would have been shocked a year ago if somebody told me that I’d have a need for more than 8 gigabytes of RAM.  But the stuff I’m doing is memory hungry in the extreme.  This is another reason we go for the Dell 490 motherboard:  it was one of very few that supported 16 gigabytes a year ago, and I use every bit of it.  At about $80 for four gigabytes, memory is still a bit expensive.  But the stuff we’re working on really does need all the memory it can get.

I also use a lot of disk space.  Hard disk speed is important, but capacity is way more important to me.  I’ve loaded the box with two 7,200 RPM 750-gigabyte drives.  Terabyte drives are available, but at a huge premium.  The 750 GB drives go for about $120, or 16 cents per gigabyte.  A terabyte drive will run about $220, or 22 cents per gigabyte.  If I need more storage, I’ll find a way to shoehorn a third drive into this Dell box.

I’m not writing computer games, and I’ve turned off all the fancy Windows Aero features that do nothing but annoy me and chew up system resources.  My video card is a low-end ATI Sapphire 1650 for which we paid less than $50.  It drives my 24″ LCD at 1920 x 1200 resolution just fine.  I have no need for really high end video performance.

When you add everything up and throw in the DVD burner, we can put together one of these machines for under $1,500, which isn’t very much more than Jeff’s system once he adds the case and DVD.

I realize that I’m somewhat out of the ordinary, working with programs that require multiple cores as well as enormous amounts of memory and disk space.  I suspect that my ultimate development machine would be complete overkill for most developers.  But I find it interesting to compare what other developers need against what I’m using.

Do you have an ultimate developer machine?  Drop me a note.

An aside:
Jeff also uses the word commodification, as in, “This industry was built on the commodification of hardware. If you can snap together a Lego kit, you can build a computer.”  I had to read that twice before I realized that he wasn’t talking about turning hardware into toilets.  Commodification?  Please stop.

July 24th, 2008

Is that code really from Sun?

I updated my Java runtime the other day, and now every time I open a new tab in Internet Explorer, I get this message box:

It looks like somebody at Sun forgot to sign their update agent.  At least, I think this control came from Sun.  But there’s no way to be sure, is there?  Do I blindly assume that this really is from Sun and that they made a mistake in generating the build, or do I do the prudent thing and permanently disallow it?

In a security conscious world, there’s no excuse for a major player like Sun to have released something with this error.  One wonders, if an obvious bug like this makes it through their quality control, what other less obvious nasties are lurking in the code.

To heck with it.  If Sun wants to push their software on me, they’ll have to get it right.  I’m going to disallow the update agent.  If I ever need to update my Java runtime, I guess I’ll just have to do it manually.

July 16th, 2008

Exceeding the Limits

We generate a lot of data here, some of which we want to keep around. Yesterday I noticed that I was running out of space on one of my 750 GB archive drives and figured it was time to start compressing some of the data. The data in question is reasonably compressible. A quick test with Windows’ .zip file creator indicated that I’d get a 30% or better reduction in size.

The data is generated on a continuous basis by a program that is always running.  The program rotates its log once per hour, and the hourly log files can be anywhere from 75 to 200 megabytes in size.  Figuring I’d reduce the number of files while also compressing the data, I wrote a script that uses INFO-ZIP’s Zip utility to create one .zip file for each day’s data.

And then I hit a wall.  It seems that the largest archive that Zip can create is 2 gigabytes.  As their FAQ entry about Limits says:

While the only theoretical limit on the size of an archive is given by (65,536 files x 4 GB each), realistically UnZip’s random-access operation and (partial) dependence on the stored compressed-size values limits the total size to something in the neighborhood of 2 to 4 GB. This restriction may be relaxed in a future release.

With 24 files ranging in size from 75 to 200 megabytes, it’s inevitable that some days will generate more than 3 gigabytes of data.  At about 30% compression, that’s not going to fit into the 2 GB file.

My immediate solution will be to compress the files individually.  It’s less than ideal, but at least it’ll give me some breathing room while I look for a new archive utility.
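A minimal sketch of that stopgap, assuming Python's standard gzip module rather than the Zip utility I was using. The function name and the `*.log` pattern are made up for illustration. Each hourly file gets its own compressed copy, and the data is streamed so file size isn't an issue.

```python
import gzip
import shutil
from pathlib import Path


def compress_individually(log_dir, pattern="*.log"):
    """Gzip every matching file in log_dir, producing one .gz per file."""
    compressed = []
    for src in sorted(Path(log_dir).glob(pattern)):
        dst = src.with_name(src.name + ".gz")
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # streams the data in chunks
        compressed.append(dst)
    return compressed
```

Note that this leaves the original files in place. Deleting them only after checking that each compressed copy reads back correctly seems like the prudent order of operations.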

I’m surprised that in today’s world of cheap terabyte-sized hard drives, the most popular compression tools have the same limitations they had 20 years ago.  Every modern operating system has supported files larger than 4 gigabytes for at least 10 years.  It’s time our tools let us use that functionality.

I’m in the market for a good command-line compression/archiver utility that has true 64-bit file support.  Any suggestions?

July 7th, 2008

Computer Notes

  • One thing I haven’t figured out yet with my new Dell 490 system running Windows Server 2008 is how to burn a CD. I have a LITE-ON DVD RW in it–the same drive that was in my old Windows XP system–but for some reason Windows Server reports it as a DVD-ROM. This one has me stumped, but I don’t have the time to really track it down. Although I am getting tired of going to some other machine for burning CDs.
  • Several years ago I bought a Shuttle SK41G computer that served me quite well, first as a Linux test machine, then as a development platform, and finally as a small DNS server. About a year ago it lost its mind. I thought the problem was the battery for the CMOS RAM, but after replacing the battery the machine still loses the time whenever I shut it off. I hate to throw out a perfectly good (if somewhat aging) computer, but have a hard time justifying the time I’d spend puzzling this out.
  • I’ve been considering buying a clone of my Dell Latitude D610 laptop. Dell doesn’t sell that machine anymore, but they’re plentiful on eBay at about $400 for a fully loaded machine, shipping included. That’s about 20% of the new price from three years ago. It’s a very serviceable machine, with a 2 GHz processor, 2 GB of RAM, hard drives from 30 to 160 GB, and a nice display that’ll do 1400×1050 pixels. The only possible drawback is that it’s a single core 32-bit processor.
  • Dual core laptops are pretty reasonable. You can pick up a new Dell Inspiron 1525 on eBay for $500 or $600. For $700, you can get one fully loaded with lots of RAM and a big hard drive. I wonder about battery life, though. Can I get five hours out of it with the optional battery in the expansion bay? And honestly: do I really need multiple cores in a laptop?

June 11th, 2008

Can’t Configure Windows DNS Resolver Cache

In experimenting with the program I described yesterday, I got to fiddling with the DNS resolver cache, called dnscache. Briefly, dnscache saves the results from recent DNS queries so that it doesn’t have to keep querying the DNS server. Considering that a DNS query can take 100 milliseconds or more to resolve, this can save considerable time. For example, for your browser to load this Web page, it has to make many different requests to my server: one for the base page, one for the stylesheet, one for each image, etc. It wouldn’t be uncommon to require a dozen separate requests to get all the resources that make up the page. If each resource required a separate DNS request, it would take more than a second just for DNS!

I got to wondering just how large the DNS cache is. A little bit of searching brings up any number of pages claiming that you can “speed up your connection” by tweaking the DNS resolver cache parameters. Specifically, they talk about changing registry keys for the cache hash table size, maximum time to live, etc. There’s even a Microsoft TechNet article describing these parameters for Windows Server 2003 (and, by extension, Windows XP). It’s interesting to note that the information on most of the pages claiming to speed things up conflicts rather badly with the information in the TechNet article.

After reading the tweaks and the TechNet article, I figured I’d give it a shot. I fired up the Registry Editor, made the changes, and … is it working? How can I tell? I tried browsing a few Web sites, but I couldn’t see any difference.

A little more searching and I found the command ipconfig /displaydns. This writes the contents of the DNS resolver cache to the console. A little work with the FIND utility, and I was able to count the number of entries in the cache. 34 on my Windows XP box. Interesting, considering that I set the CacheHashTableSize registry entry to over 7,000. I fiddled and tweaked, restarted the DNS Client service, flushed the cache, rebooted my computer, faced Redmond and cursed, and generally tried everything I could think of. No matter what settings I used, I always ended up with between 30 and 40 entries in my DNS cache.
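The counting step could also be scripted. The sketch below is not the FIND command I actually used; it's a Python equivalent under a couple of assumptions: that the output comes from an English-language Windows, where each cached record is introduced by a "Record Name" line, and that counting distinct names is a fair proxy for "entries." The function names are my own.

```python
import subprocess


def count_record_names(displaydns_text):
    """Count the distinct names on 'Record Name' lines of displaydns output."""
    names = set()
    for line in displaydns_text.splitlines():
        if line.strip().startswith("Record Name"):
            # The name follows the first colon on the line.
            _, _, value = line.partition(":")
            names.add(value.strip().lower())
    return len(names)


def dns_cache_entry_count():
    """Run ipconfig /displaydns (Windows only) and count the cached names."""
    result = subprocess.run(
        ["ipconfig", "/displaydns"], capture_output=True, text=True, check=True
    )
    return count_record_names(result.stdout)
```

Because one name can have several cached records, counting raw "Record Name" lines instead of distinct names would give a somewhat larger number. Either way the count is only as good as what ipconfig chooses to display.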

On my Windows Server 2008 machine at the office, I always got between 270 and 300 entries, no matter what I tried.

So that leaves me with the following possibilities:

  1. It’s not possible to change the size of the DNS resolver cache in Windows XP or Windows Server 2008.
  2. It is possible, but the documentation is wrong.
  3. The documentation is correct as far as it goes, but it’s incomplete.
  4. The documentation is correct and complete, but I’m too dumb to make sense of it.
  5. The documented registry entries actually changed the size of the cache, but ipconfig isn’t showing me all the entries that are in the cache.

At this point, all possibilities seem almost equally likely. I could do some indirect testing based on the amount of time it takes to resolve a series of DNS requests, but even that would be inconclusive. There are no documented API calls that allow me to examine the DNS cache or its size. (And the undocumented ones aren’t described well enough to be worth checking out.) My only means of seeing what’s in the cache is the ipconfig tool.
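For what it's worth, the indirect timing test might look something like the sketch below. It is a hedged illustration only: the helper names and the 20 millisecond threshold are assumptions, and as noted above the results would be suggestive at best, because a fast answer could come from an upstream DNS server's cache rather than the local resolver cache.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.gethostbyname):
    """Resolve each name once and return {name: seconds taken}."""
    timings = {}
    for name in hostnames:
        start = time.perf_counter()
        try:
            resolve(name)
        except OSError:
            pass  # a failed lookup still took time, so record it anyway
        timings[name] = time.perf_counter() - start
    return timings


def count_slow(timings, threshold_seconds=0.02):
    """Count lookups slower than the threshold (assumed to be cache misses)."""
    return sum(1 for seconds in timings.values() if seconds > threshold_seconds)
```

The idea would be to resolve a long list of distinct names, then resolve the same list again and see how many names from the start of the list are slow the second time around.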

So I ask: does anybody know how to change the size of the Windows DNS resolver cache and prove that those changes actually work? Do I have to restart the DNS Client service? Reboot the machine? Set some super magic registry entry?

Any information greatly appreciated.

May 20th, 2008

Goodbye Windows Vista

I upgraded to Windows Vista (from Windows XP) back in November, when I moved from a dual-core to a quad-core machine. I was less than pleased with Vista, for a number of reasons, but primarily because I found the Aero user interface enhancements more annoying than useful. That’s all pretty eye candy, but the few benefits it brought were not worth the 2 gigabyte footprint or the continual distraction. I turned off what I could in a few minutes of tinkering, but didn’t spend a lot of time trying to turn everything off.

And that dang machine was flaky! The system would become unresponsive for no apparent reason. Windows Explorer would lock up and even Task Manager wouldn’t come up. It got progressively worse until I was hitting the reset button a couple of times per day. Yahoo Messenger, for some reason, often seemed to cause the lockup. If Messenger lost its connection, it would try to re-connect, and sometimes that would cause the entire user interface to lock up. I still don’t understand how an application can bring down the whole operating system, but there you have it.

At one point I was getting a number of blue screen crashes (a few per week), so I down-clocked the machine (it had been slightly over-clocked) thinking that was the problem. Then I thought memory was the problem, so I spent a couple of nights running the Windows Memory Diagnostic (available on the Administrative Tools menu). That didn’t reveal any errors, either. I had pretty much decided that the problem was with the video driver (GeForce 8500 GT), but never tested it because at that point yet another new machine arrived: a Dell Precision 490 (used) with 16 gigabytes of RAM and a quad-core Xeon running at 2 gigahertz.

We’ve been running Windows Server 2008 on the servers here, and have been very happy with its performance and stability. Given the choice between Vista and Server 2008 on the desktop, there was no contest. I ran Server 2003 on a laptop development machine for two years and was extremely pleased with it–much more so than with Windows XP–and I expect to be much happier with Server 2008 than with Vista. One really nice thing is that the user interface is clean and lacking all those annoying Aero enhancements.

There are a few things you probably want to change in the default configuration of Server 2008 if you’re going to use it as a desktop development system. I’ve found several sites that talk about this, the best being Vijayshinva Karnure’s Windows Server 2008 as a SUPER workstation OS. It’s only been a few days, but so far I’m really liking the switch.

I’ve also said goodbye to Firefox in favor of (gasp) Internet Explorer. I’m not especially fond of IE, but Firefox has been unreliable for over a year–ever since I installed it on Windows XP 64. The 32-bit version of Firefox tends to crash, hang, or do unexpected things when running on a 64-bit version of Windows. And since they aren’t planning a 64-bit Windows version any time soon, I’ll move on. I understand that there are third-party x64 builds, but those don’t have full plug-in support nor do they appear to have the same quality standards as the official Firefox builds.

If I get ambitious, I might give Opera a try. For the near term, IE will do. At the moment I’m more interested in getting my new Server 2008 machine fully configured with all the development tools and such. Changing machines takes so much longer than you think it will.

April 9th, 2008

HashSet Limitations

Version 3.5 of the .NET runtime class library introduced the HashSet generic collection type. HashSet represents a set of values that you can quickly query to determine if a value exists in the set, or enumerate to list all of the items in the set. You can also perform standard set operations: union, intersection, determine subset or superset, etc. HashSet is a very handy thing to have. Simulating the same functionality in prior versions of .NET was very difficult.

I’ve made heavy use of HashSet in my code since it was introduced, and I’ve been very happy with its performance. Until today. Today I ran into a limitation that makes HashSet (and the generic Dictionary collection type, as well) useless for moderately large data sets. It’s a memory limitation, and how many items you can store in the HashSet depends on how large your key is.

I’ve mentioned before that the .NET runtime has a 2 gigabyte limit on the size of a single object. Even in the 64-bit version, you can’t make a single allocation that’s larger than 2 gigabytes. I’ve bumped into that limitation a few times in the past, but have been able to work around it by restructuring some things. I thought I was safe with the HashSet, though. Even with an 8-byte key, I figured I should be able to store on the order of 250 million items. I found out today that the number is quite a bit lower: a little less than 50 million. 47,995,853 to be exact. After I figured out what was causing my problem, I verified it with this program:

static void Main(string[] args)
{
    HashSet<long> bighash = new HashSet<long>();
    for (long i = 0; i < 50000000; ++i)
    {
        if ((i % 100000) == 0)
        {
            Console.Write("\r{0:N0}", i);
        }
        bighash.Add(i);
    }
    Console.WriteLine();
    Console.Write("Press Enter");
    Console.ReadLine();
}

The program throws OutOfMemoryException when it tries to add the 47,995,853rd (or perhaps the 47,995,854th) item, because it’s increasing the capacity of an internal data structure and that data structure exceeds 2 gigabytes.

If I reduce the size of the key to 4 bytes (a .NET long is 8 bytes), then I can add just a little less than 100 million items before hitting the limit. Let’s think about that a little bit.

50 million keys of 8 bytes each should take up about 400 megabytes. 100 million keys of 4 bytes each should also take up about 400 megabytes. I realize that there’s some overhead in a hash table to deal with collisions, but five times (2 gigabytes of structure for 400 megabytes of keys) is excessive! I can’t imagine a hash table implementation that has an overhead of five times the total key size. And yet, that’s what we have in .NET.
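The arithmetic behind that "five times" figure can be checked with a short Python sketch. It makes a simplifying assumption: that the failing internal structure is right at the 2 gigabyte limit when the exception is thrown. In practice the failure happens while the structure is being enlarged, so treat the result as a rough estimate. The function names are invented for the example.

```python
LIMIT_BYTES = 2 * 1024**3  # the 2 gigabyte single-object limit described above


def raw_key_bytes(item_count, key_size_bytes):
    """Bytes needed to store just the keys, with no hash table overhead."""
    return item_count * key_size_bytes


def overhead_factor(item_count, key_size_bytes, limit_bytes=LIMIT_BYTES):
    """How many times larger the limit is than the raw key data."""
    return limit_bytes / raw_key_bytes(item_count, key_size_bytes)


# 8-byte keys failed at 47,995,853 items; 4-byte keys at just under 100 million.
print(round(overhead_factor(47_995_853, 8), 2))
print(round(overhead_factor(100_000_000, 4), 2))
```

Both cases come out between five and six, which is consistent with the estimate in the text.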

It’s bad enough in today’s world, where a machine with 16 gigabytes of RAM can be had for under $2,000, that we have to deal with the 2-gigabyte-per-object limitation in .NET. But to have the runtime library’s implementation of a critical data structure squander memory in this way is too much.

Any workaround is very painful. We’ll have to write our own hash table implementation that allocates unmanaged memory and mucks around with pointers in unsafe code. We’re old C programmers, so that’s not beyond our capabilities. But it sure makes me wonder why I selected .NET for this project. In the process, we’re going to lose a lot of the functionality of Dictionary and of HashSet.

I can’t be the only one running up against these kinds of limitations. 10 years ago, a data set of 100 million items may have been considered large. Today 100 million is, at best, moderately large. There are plenty of applications that work with billions of items and today’s computers have the capacity to store them all in RAM. We damned well should be able to index them in RAM using modern tools.

I hope the .NET team is working on a solution to the 2-gigabyte limit, and I’d strongly suggest that they take a very close look at their hash table implementation.

March 15th, 2008

More Windows Vista bits

Windows Vista (and Windows Server 2008) have formalized the idea of a “public” directory–a directory on your computer where you can share files with other users. In previous versions, you had to create a folder yourself (often called “Public”) and share it. Vista has a special folder called “Public”, and subfolders named “Public Documents,” “Public Downloads,” “Public Music,” etc. If you enable public folder sharing, then files in those directories are accessible by anybody who can locate your computer on the network.

It can be a bit confusing, though. Here’s a screen shot of Windows Explorer showing the Public folder on my machine:

Looking at that, you’d expect the UNC path to my “Public Downloads” directory to be “\\JIMM\Public\Public Downloads”. But if you try it, you’ll quickly find that the path does not exist. Where, then, is it?

If you click in the address bar of Windows Explorer (image below), you’ll see that the local path to my “Public Downloads” directory is “C:\Users\Public\Downloads”.

Since the main Public directory is \\JIMM\Public (although you won’t see C:\Users\Public if you select the Public directory and then click in the address bar), then it follows that the Downloads directory would be \\JIMM\Public\Downloads. And that’s what it is.

It’s kind of confusing that the real directory name is something different from what’s shown in Windows Explorer. But I’m happy that the names don’t have embedded spaces. Filenames with embedded spaces make working with command line tools difficult.

February 20th, 2008

The default is default

Some days I just don’t understand what people are thinking when they write documentation. Yesterday I installed a simple caching DNS server. That was easy enough with Ubuntu, and the thing is up and running. But my experience with these things tells me that I’d better look into its configuration. We do a lot of DNS resolutions, and I want to make sure that the server doesn’t run out of memory or something.

So I thought I’d start by determining how much memory BIND (the DNS server software) is configured to use. Simple enough, right? As it turns out, no.

According to the BIND Manual (search for “datasize” on that page):

The maximum amount of data memory the server may use. The default is default. This is a hard limit on server memory usage. If the server attempts to allocate memory in excess of this limit, the allocation will fail, which may in turn leave the server unable to perform DNS service. Therefore, this option is rarely useful as a way of limiting the amount of memory used by the server, but it can be used to raise an operating system data size limit that is too small by default. If you wish to limit the amount of memory used by the server, use the max-cache-size and recursive-clients options instead.

That’s nice to know (and I’ll come back to it in a minute), but what is default? I’ve searched up and down through the manual, in the Pro DNS and BIND book, and uncounted Web sites. I even downloaded the BIND source code and spent some quality time with grep. Not a clue. I still have no idea what default is.

“Okay,” I thought, “so I don’t really need to know what the default is. How do I set the cache size?” On the face of it, that turns out to be pretty easy. After all, there’s a max-cache-size option that I can set. The BIND manual says:

The maximum amount of memory to use for the server’s cache, in bytes. When the amount of data in the cache reaches this limit, the server will cause records to expire prematurely so that the limit is not exceeded. In a server with multiple views, the limit applies separately to the cache of each view. The default is unlimited, meaning that records are purged from the cache only when their TTLs expire.

That’s all well and good. I’ll go set my datasize and max-cache-size so that the thing won’t run out of memory. But thinking about the descriptions of those two variables, I got to wondering if the default configuration of BIND is a crash waiting to happen. Consider:

  • The default datasize is some unspecified default value. If the server attempts to allocate memory in excess of this limit, the allocation will fail, which may in turn leave the server unable to perform DNS service.
  • The default max-cache-size value is unlimited.

The only logical conclusion from those two statements is that the default configuration will lead to a crash if the cache grows large enough. Doesn’t seem too reliable to me. [Note: It occurs to me that if the cache handles the failed allocation reasonably--by purging older records--then the thing won't crash. I have no idea if that's how it actually behaves.]

In any case, if you’re using BIND for a DNS cache, you might want to change your datasize and max-cache-size values, just to be on the safe side.

[Additional info added 2008-02-21]

According to Cricket Liu in DNS & BIND Cookbook:

Some administrators are tempted to use the datasize options substatement to limit the size of the data segment the named process uses. Unfortunately, when named reaches the datasize limit, it exits. And then, of course, you have no name server running at all — though I guess that minimizes its memory utilization.

If that’s true, then it’s probably a very good idea to change the default configuration if you’re running a large DNS cache.
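As a rough illustration, the relevant part of the options block in named.conf might look something like the fragment below. The sizes are arbitrary placeholders chosen for the example, not recommendations, and datasize is left commented out because of the behavior described in the quote above.

```
options {
    // Cap the cache so that records are expired early instead of
    // letting the cache grow without bound (256M is a placeholder).
    max-cache-size 256M;

    // Limit simultaneous recursive lookups; 1000 is BIND 9's
    // documented default.
    recursive-clients 1000;

    // Hard limit on the data segment. Left disabled here because
    // named may exit or stop serving when it hits this limit.
    // datasize 1G;
};
```

The idea is to keep max-cache-size comfortably below whatever data size limit the operating system imposes, so that the cache starts discarding records before an allocation can fail.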

February 19th, 2008

Take back the desktop!

I know, I’ve dipped into this well before. But this bears repeating.

At some point in the 30 years or so that I’ve been working with computers, we’ve lost sight of the most important fact: computers are supposed to be tools that serve us. All too often these days, I feel like I’m the one serving the computer. At other times, the computer reminds me of an over-eager employee who comes running to the office after completing every minor task, enthusiastically telling me how impressed I should be that he managed to find and actually work the photocopier.

You know what I’m talking about. When was the last time you spent an entire day not being annoyed by some pop-up message that Windows or some application program decided was important enough to interrupt whatever you’re working on? The last time I spent such a day was when I went on vacation and didn’t have access to a computer. If I’m working on the computer, I’m subjected to a never-ending barrage of pop-up messages and sounds that amount to little more than, “Hey! Look at me!”, and do nothing but interrupt my train of thought and annoy me.

You want examples? Oh, I have plenty:

  • The Firefox Web browser will automatically download updates and then pop up a message box asking if I want to restart.
  • When new Windows updates become available, Windows displays one of those notification balloons down near my task bar.
  • If I tell Windows Update to download and install updates, all too often when it’s done it pops up a message box asking if I want to reboot.
  • I minimize Windows Media Player to my task bar. Whenever it starts a new song, Media Player displays a little information box for a few seconds: “Look what I’m playing now!”
  • The default configuration of Yahoo Messenger will pop up a message window in the middle of the screen when somebody sends me a message.
  • If somebody else takes control of a machine that I have in Windows Remote Desktop, Remote Desktop pops up a message box telling me that my desktop session has ended.
  • Norton Antivirus (which I don’t use any more) would forever be displaying mostly meaningless notifications at the bottom of the screen.
  • Email clients can play sounds or flash the screen when you receive mail. In some programs, such actions are enabled by default.
  • If you’re running a program under Visual Studio and the program hits a breakpoint, Visual Studio will bring itself to the front, regardless of what you’re working on.

I know, some of you are wondering what I’m complaining about. Let me give you an example of why I get annoyed. If I happen to be typing (an activity that occupies a large part of my day) when one of those pop-ups grabs the keyboard focus, whatever I’m typing will end up in the new window. This is not good. I’ve actually re-booted the computer accidentally because I was typing while looking out the window when the “Reboot now?” confirmation box appeared.

Let me repeat that. I suffered a very annoying interruption and lost some important work because somebody decided that their program was more important than whatever I was working on at the time. That’s unforgivable.

Let’s be clear about this one: a program should never grab the keyboard focus from the window that I’ve selected. I can’t think of a single instance in which I want some random program to pop up in front of my text editor and start swallowing what I’m typing. There is no excuse for such rude behavior. Designers who create such things should be shot, right along with any programmers who have the poor sense to actually implement the designs.

I’m slightly more forgiving of the ostensibly innocuous notifications that pop up in balloons all over the place, but not much more. It’s nice that programs keep me informed of what they’re doing: security updates are available, new updates were downloaded, a friend messaged me, there are unused icons on my desktop, etc. But most of those things just aren’t important. With the exception of a text message from my friend, none are important right now. Those messages should be placed in a notification queue that I can check at a time of my own choosing. If it requires immediate attention (like my friend messaging me), it should display a message on the corner of the screen to get my attention, but under no circumstances should it grab my keyboard focus.
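The notification queue idea is simple enough to sketch in Python. This is a toy model under stated assumptions, not a description of any real windowing API: the class and method names are invented, and "urgent" stands in for whatever rule decides that a message deserves a corner-of-the-screen alert.

```python
from collections import deque


class NotificationCenter:
    """Holds routine messages until the user chooses to look at them."""

    def __init__(self):
        self._pending = deque()

    def post(self, source, message, urgent=False):
        """Queue a routine note; hand an urgent one back for passive display."""
        note = (source, message)
        if urgent:
            # The caller shows this in a corner of the screen. It never
            # takes keyboard focus away from the active window.
            return note
        self._pending.append(note)
        return None

    def drain(self):
        """Return everything queued so far, oldest first, and clear the queue."""
        notes = list(self._pending)
        self._pending.clear()
        return notes
```

The point of the design is that nothing ever interrupts by default: routine messages wait until drain is called, and even urgent ones are only displayed, never focused.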

Like many other people, I do work that requires intense concentration. Most people require a certain amount of time (five to 30 minutes, typically) to “get into the groove” where they’re concentrating deeply and able to be productive. Any interruption will snap them out of that groove, and it takes time to get back into it. So a “brief interruption” can cost 30 minutes in lost productivity. Is it any wonder I get annoyed by all the crap that Windows and other programs throw at me?

To the designers and programmers responsible for these atrocities: The desktop is my workspace, dang it. Popping your idiotic message on top of it and stealing my keyboard focus is akin to throwing a rotting fish in the middle of my desk. It disrupts my work, makes a mess of things, stinks the whole place up, and ticks me off.

And don’t tell me, “You can turn those notifications off if you want.” That’s exactly the wrong attitude. The default configuration should be to leave me in charge of my desktop. I should have the option of turning those notifications on if I want them. I shouldn’t be forced to go hunting through your overly complicated user interface options dialog box to figure out how to teach your program its place on my desktop.

If, like me, you’re tired of being interrupted by inconsequential messages and having your keyboard focus stolen by rude programs, I suggest you start filing bug reports against the offending applications. That includes Windows, Visual Studio, and any other program that takes the attitude that its status messages are more important than your work. Filing those bug reports is the only way we can get software developers to re-think their attitudes and build software that does its job without nattering at us.

January 23rd, 2008

OK or Cancel?

I decided to cancel the email message I was composing, and my mail program responded with this confirmation dialog box. I actually read it twice and then pressed the Cancel button as an experiment. Pressing Cancel cancels the Cancel operation. OK completes the Cancel operation.

Is it any wonder that people find computers confusing?

Is there anybody who finds OK and Cancel on this dialog less confusing or more informative than Yes and No options?

I’ll ask again: What idiot decided that Yes and No responses to a question should be replaced by OK and Cancel?

December 31st, 2007

Can’t Select Multiple Files in Windows Vista

Today I was trying to copy files in Windows Explorer and ran into a rather nasty little bug. Explorer wouldn’t let me select multiple files. I couldn’t Shift+Click to select a range or Ctrl+Click to select individual files. Keyboard shortcuts didn’t work, and the Edit | Select All menu option was disabled.

This appears to be a bug in Windows Explorer, although I’ve seen conflicting information. Microsoft’s knowledge base article about the problem says, “This problem occurs because certain applications add a key to the registry. The key prevents you from selecting multiple items in Windows Explorer.” Their recommended solution is to reset the view. That didn’t work for me.

The solution I found requires editing the registry. You have to start RegEdit and navigate to HKEY_CURRENT_USER\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell, and delete the BagMRU and Bags registry keys. Before you do that, you should close all Windows Explorer windows. If you’re uncomfortable fiddling with the registry, it’s probably a good idea to set a system restore point before you start.
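For anyone who would rather script the deletion than click through RegEdit, here is a cautious Python sketch that builds the equivalent reg.exe commands. It assumes the standard reg delete syntax and the key path given above. The function names are invented, and it defaults to a dry run that only prints what it would do. The same precautions apply: close Explorer windows and set a restore point first.

```python
import subprocess

# The key named above, using reg.exe's HKCU abbreviation for HKEY_CURRENT_USER.
SHELL_KEY = r"HKCU\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell"


def build_delete_commands(subkeys=("BagMRU", "Bags")):
    """Build one 'reg delete <key> /f' argument list per subkey."""
    return [["reg", "delete", SHELL_KEY + "\\" + name, "/f"] for name in subkeys]


def reset_folder_views(dry_run=True):
    """Print the commands, or actually run them when dry_run is False."""
    commands = build_delete_commands()
    for command in commands:
        if dry_run:
            print(command)
        else:
            subprocess.run(command, check=True)  # Windows only
    return commands
```

Calling reset_folder_views() with no arguments changes nothing; passing dry_run=False on a Windows machine performs the deletion.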

I don’t have the time (or the inclination, truth be told) to dig in and figure out what Bags and BagMRU do. It seems odd, though, that “certain applications” would be able to cause this behavior, unless they were malicious–deliberately trying to cause grief. Is this a bug in the Vista version of Windows Explorer?

December 5th, 2007

Some useful utilities

I’ve run across a few utilities lately that I thought others might also find useful.

ISO Recorder is a very handy way to create CDs or DVDs from ISO files. No frills: just a Windows shell extension that works. Right-click on a .ISO file, select the drive you want to burn it to, and go. I would complain about Windows lacking this feature natively, but I think I’d rather have the simple right-click-go interface of ISO Recorder than whatever overly complicated interface the Windows design team would come up with.

I’ve mentioned Info-ZIP before. They have the best command line Zip and Unzip tools that I know of. I went to the site the other day, looking to install the tools on my new Vista system, and found that they now have 64-bit versions. Now if only we could get past the 4 gigabyte limitation.

Speaking of compression, I’ve been seeing a lot of bzipped stuff for some reason lately. Unfortunately, bzip2 and bunzip2 aren’t included in the Subsystem for UNIX-based Applications tools and utilities. You can download Bzip2 and other utilities for Windows from the GnuWin32 project.

We’re using Subversion for version control here. That could be the subject for a post by itself. If you’re using Subversion on a Windows client, don’t even bother with installing the command line tools. Rather, download and install TortoiseSVN. It’s a Windows shell extension that lets you work with your version control system visually rather than futzing with the command line. It’s nicely polished, and the new version works well with Windows Vista.

Most open source or free software sites will give the MD5 hash of the files that are available for download. The md5sum utility is a standard component of most Linux distributions, but Windows doesn’t include such a thing out of the box. MD5sums is a handy thing to have. You can operate it from the command line, or drag files from Explorer onto the .exe. Nicely done.
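If no such utility is at hand, the same check is easy to sketch with Python's standard hashlib module. This is an illustrative alternative, not how MD5sums itself works; the function names are invented for the example. The file is read in chunks so that multi-gigabyte downloads don't have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a file's digest with the hash listed on the download page."""
    return md5_of_file(path) == published_hex.strip().lower()
```

The comparison ignores case and surrounding whitespace, since download pages are not consistent about how they print the hash.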

December 1st, 2007

Vista Subsystem for UNIX-based Applications

The Enterprise and Ultimate editions of Windows Vista include a component called Subsystem for UNIX-based Applications, or SUA. This subsystem is also available in Windows Server 2003 R2, and will be available in Windows Server 2008 (Longhorn). SUA by itself is just a Windows component that provides platform services for UNIX-based applications. You get UNIX tools and an SDK from a separate download.

SUA is the new version of Windows Services for UNIX (SFU), which is available as a separate download for Windows 2000, Windows Server 2003, and Windows XP (except Home edition). Also see the SFU blog. According to Wikipedia, SFU has a long history.

To install SUA in Windows Vista, go to Control Panel | Programs and Features, and click on the “Turn Windows features on or off” link. Be patient. It takes a minute or two for Windows to populate a list box of all the available features. Once you see the list, scroll down to “Subsystem for UNIX-based Applications,” and check that box. Press OK after you’ve selected any other features you want to turn on or off.

I don’t know why, but it takes a very long time for Windows to install or configure the Subsystem for UNIX-based Applications. That progress bar stays at zero for at least 10 minutes, and then grows in small leaps. Even after it looks “done,” it keeps thrashing the disk. I didn’t start a timer when I began the installation, but I know it took over 30 minutes.

It’s probably a good idea to restart your system after installing SUA. The install didn’t say that I should, but a Microsoft knowledge base article describes problems with programs not responding if you don’t restart after installation. You should also visit Windows Update to obtain any critical updates for SUA. People in the U.S. almost certainly need the update that deals with Daylight Saving Time.

As far as I can tell, just enabling SUA doesn’t actually give you anything useful. I guess it gives you the ability to run UNIX-based apps (that have been recompiled to work with SUA), but you have to download and install the Utilities and SDK for Subsystem for UNIX-based Applications in order to get the common UNIX utilities. Be sure to get the right file for your OS and processor. There are separate versions for Windows Vista and Server 2003 R2, also differentiated among x86, amd64, and IA64.

I have it all installed, and will be experimenting with it over the next few days. More once I’ve had time to figure things out.

November 27th, 2007

Weird Computer Problem

This one is right up there with the strangest problems I’ve ever seen.

We bought the parts for and built four new machines, each with a Gigabyte GA-P35-DS4 motherboard, an Intel Core 2 Quad (Q6600) processor, and 8 gigabytes of RAM. Three of them are working fine. One of them (mine, unfortunately) exhibits a rather odd flaw: Windows Task Manager reports that there are only two CPUs. On the others, it reports four CPUs. Other than that, the computer seems to run just fine.

I tried all the obvious things: re-seating the CPU, reducing the overclocking (back to 2.4 GHz processor from the 2.8 GHz we had it at), and finally swapping the processor with one of the other machines. No dice. The other machine reports four CPUs and mine still reports just two CPUs.

The problem is almost certainly with the motherboard, but I can’t prove that yet. Is it possible that I have four cores all running, and Windows is reporting the wrong thing? We ran CPU-Z, which also reports only two cores, but I don’t know if it’s getting information from the hardware or from Windows.

The really odd thing here is that I’ve never heard of a partial failure like this. That is, if the motherboard were faulty, wouldn’t the dang thing just not work at all?

In any case, I’m looking for a program that I can boot and have it tell me about the CPU: what type, how fast, and how many cores are actually running. Does such a thing exist? I downloaded the Ultimate Boot CD, but the tools on that disk don’t support the Core 2 Quad. Nor do they appear to identify how many cores are actually functioning.

If you know of a utility I can download and burn to a bootable CD so that I can test the CPU outside of Windows, I’d really like to hear about it. Leave a comment here on the blog, or drop me mail: jim at mischel.com.