It’s property appraisal time again. We got our Notice of Appraised Value in the mail last month, and were shocked to learn that our property value increased by 16% last year. That’s on top of the 33% increase from the year before. At least, that’s what the local appraisal district would have you believe. Last . . . → Read More: What’s that house worth?
I discovered last week that somebody had hacked my blog and added a bunch of link spam at the end of the footer script. For some unknown period of time, all of my blog pages contained hundreds of spam links–mostly for prescription drugs. But nobody saw them.
I don’t understand why it was done that . . . → Read More: I was hacked
I’ve been meaning to review or at least mention the books I’ve been reading lately. I realized after I posted my negative review of Infinite Ascent that there are plenty of good books that I haven’t mentioned. So, here are capsule reviews of five books I’ve read recently–all picked up at either the remainder table . . . → Read More: Clearing the book list
Browsing the remainder table in Half Price Books a few weeks ago, I ran across David Berlinski’s Infinite Ascent: A short history of mathematics. The cover copy looked good, and a quick flip through a few pages was enough to convince me that it was worth the three bucks. At 180 pages, you’d expect it . . . → Read More: Infinite Annoyance
Some site operators block web crawlers because they’re concerned that the crawlers will use too much of the site’s allocated bandwidth. What they don’t realize is that most companies that operate large-scale crawlers are much more concerned with bandwidth usage than the people running the sites that the crawlers visit. There are several reasons for . . . → Read More: Reducing bandwidth used by crawlers
Last month in HashSet Limitations, I noted what I thought was an absurd limitation on the maximum number of items that you can store in a .NET HashSet or Dictionary collection. I did more research on all the major collection types and wrote a series of articles on the topic for my .NET Reference Guide . . . → Read More: More on .NET Collection Sizes
I upgraded to Windows Vista (from Windows XP) back in November, when I moved from a dual-core to a quad-core machine. I was less than pleased with Vista, for a number of reasons, but primarily because I found the Aero user interface enhancements more annoying than useful. That’s all pretty eye candy, but the few . . . → Read More: Goodbye Windows Vista
From Jeff Duntemann comes a link to an article on the Fermi Paradox, which puts forth the idea that it may be a good thing that we’ve been unable to find proof of extraterrestrial life.
Put simply, Fermi’s Paradox is a question: If there is other life in the universe, where is everybody? Given . . . → Read More: Where is everybody?
Tuesday, in How to DOS yourself, I described how to erroneously configure an Apache server and cause what appears to be a denial of service attack. There’s another way to do it that is even more insidious.
In Tuesday’s post I showed how to configure error documents. There’s apparently another way to configure things so . . . → Read More: A variation on the homegrown DOS attack
It’s surprising the things you’ll learn when you write a Web crawler. Today’s lesson: how to be both perpetrator and victim of your own denial of service attack.
Not everybody likes crawlers accessing their sites. Most will modify their robots.txt files first, which will prevent polite bots from crawling. But blocking impolite bots requires that . . . → Read More: How to DOS yourself
Last year, in Improper Use of Exceptions, I mentioned that the ReaderWriterLock.AcquireReaderLock and ReaderWriterLock.AcquireWriterLock methods were improperly written because they throw exceptions when the lock is not available. I mentioned further that whoever designed the ReaderWriterLock should have studied the Monitor class for a more rational API.
Apparently I wasn’t the only one to think . . . → Read More: The New ReaderWriterLockSlim Class
I mentioned before that there is a small but very vocal group of webmasters who say that crawlers should stay off their sites unless specifically invited. It is their opinion that they shouldn’t have to include a robots.txt file in order to prevent bots from crawling their sites. Their reasons for holding this opinion vary, . . . → Read More: Opt in or opt out?
People often ask if they need a robots.txt file on their sites. I’ve seen some Web site tutorials that say, in effect, “don’t post a robots.txt file unless you really need it.” I think that is bad advice. In my opinion, every site needs a robots.txt file.
First a disclaimer. I’ve had my own Web . . . → Read More: Why every site should have a robots.txt file
As I mentioned yesterday, the Robots Exclusion Standard is a very simple protocol that lets webmasters tell well-behaved crawlers how to access their sites. But the “standard” isn’t as well defined as some would have you think, and there’s plenty of room for interpretation.
Consider this simple file:
User-agent: *
Disallow: /xfiles/
User-agent: YourBot
Disallow: . . . → Read More: More On Robots Exclusion
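To make the "room for interpretation" concrete, here is a minimal sketch of how one parser, Python's standard `urllib.robotparser`, resolves a file with both a wildcard group and a bot-specific group. The excerpt above is truncated, so the second `Disallow` rule (`/private/`) and the `OtherBot` name are assumptions made up for illustration, not the rules from the original post. Other crawlers may read the same file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The first group matches the excerpt above;
# the "/private/" rule for YourBot is an assumption, because the
# original file is cut off in the teaser.
ROBOTS_TXT = """\
User-agent: *
Disallow: /xfiles/

User-agent: YourBot
Disallow: /private/
"""


def can_fetch(agent: str, path: str) -> bool:
    """Ask Python's parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("YourBot", "OtherBot"):
        for path in ("/xfiles/secret.html", "/private/notes.html"):
            print(agent, path, can_fetch(agent, path))
```

In this implementation a bot that has its own group uses only that group, so `YourBot` is allowed into `/xfiles/` even though the wildcard group disallows it. A webmaster who expected the wildcard rules to apply to every bot would be surprised by that, which is the sort of ambiguity at issue here.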
The Internet community loves standards. We must. We have so many of them. Many of those “standards” are poorly defined or, even worse, ambiguous. Or, in the case of robots.txt, subject to a large number of extensions that have become something of a de facto standard because they’re supported by Google, Yahoo, and MSN Search. . . . → Read More: Struggling with the Robots Exclusion Standard