Monthly Archives: May 2008

What’s that house worth?

It’s property appraisal time again. We got our Notice of Appraised Value in the mail last month, and were shocked to learn that our property value increased by 16% last year. That’s on top of the 33% increase from the … Continue reading

Posted in Odds 'n Ends | 1 Comment

I was hacked

I discovered last week that somebody had hacked my blog and added a bunch of link spam at the end of the footer script. For some unknown period of time, all of my blog pages contained hundreds of spam links–mostly … Continue reading

Posted in Oops | Comments Off

Clearing the book list

I’ve been meaning to review or at least mention the books I’ve been reading lately. I realized after I posted my negative review of Infinite Ascent that there are plenty of good books that I haven’t mentioned. So, here are … Continue reading

Posted in Book Reviews | Comments Off

Infinite Annoyance

Browsing the remainder table in Half Price Books a few weeks ago, I ran across David Berlinski’s Infinite Ascent: A short history of mathematics. The cover copy looked good, and a quick flip through a few pages was enough to … Continue reading

Posted in Book Reviews, Idiocy | 1 Comment

Reducing bandwidth used by crawlers

Some site operators block web crawlers because they’re concerned that the crawlers will use too much of the site’s allocated bandwidth. What they don’t realize is that most companies that operate large-scale crawlers are much more concerned with bandwidth usage … Continue reading

Posted in Web Crawling | Comments Off

More on .NET Collection Sizes

Last month in HashSet Limitations, I noted what I thought was an absurd limitation on the maximum number of items that you can store in a .NET HashSet or Dictionary collection. I did more research on all the major collection … Continue reading

Posted in Programming | Comments Off

Goodbye Windows Vista

I upgraded to Windows Vista (from Windows XP) back in November, when I moved from a dual-core to a quad-core machine. I was less than pleased with Vista, for a number of reasons, but primarily because I found the Aero … Continue reading

Posted in Computers | Comments Off

Where is everybody?

From Jeff Duntemann comes a link to an article on the Fermi Paradox, which puts forth the idea that it may be a good thing that we’ve been unable to find proof of extraterrestrial life. Put simply, Fermi’s Paradox is … Continue reading

Posted in Odds 'n Ends | 1 Comment

A variation on the homegrown DOS attack

Tuesday, in How to DOS yourself, I described how to erroneously configure an Apache server and cause what appears to be a denial of service attack. There’s another way to do it that is even more insidious. In Tuesday’s post … Continue reading

Posted in Web Crawling | Comments Off

How to DOS yourself

It’s surprising the things you’ll learn when you write a Web crawler. Today’s lesson: how to be both perpetrator and victim of your own denial of service attack. Not everybody likes crawlers accessing their sites. Most will modify their robots.txt … Continue reading

Posted in Web Crawling | 1 Comment

The New ReaderWriterLockSlim Class

Last year, in Improper Use of Exceptions, I mentioned that the ReaderWriterLock.AcquireReaderLock and ReaderWriterLock.AcquireWriterLock methods were improperly written because they throw exceptions when the lock is not available. I mentioned further that whoever designed the ReaderWriterLock should have studied the … Continue reading

Posted in Programming | Comments Off

Opt in or opt out?

I mentioned before that there is a small but very vocal group of webmasters who say that crawlers should stay off their sites unless specifically invited. It is their opinion that they shouldn’t have to include a robots.txt file in … Continue reading

Posted in Web Crawling | 2 Comments

Why every site should have a robots.txt file

People often ask if they need a robots.txt file on their sites. I’ve seen some Web site tutorials that say, in effect, “don’t post a robots.txt file unless you really need it.” I think that is bad advice. In my … Continue reading

Posted in Web Crawling | Comments Off

More On Robots Exclusion

As I mentioned yesterday, the Robots Exclusion Standard is a very simple protocol that lets webmasters tell well-behaved crawlers how to access their sites. But the “standard” isn’t as well defined as some would have you think, and there’s plenty … Continue reading

Posted in Web Crawling | Comments Off

Struggling with the Robots Exclusion Standard

The Internet community loves standards. We must. We have so many of them. Many of those “standards” are poorly defined or, even worse, ambiguous. Or, in the case of robots.txt, subject to a large number of extensions that have become … Continue reading

Posted in Web Crawling | Comments Off