Monthly Archives: May 2008
What’s that house worth?
It’s property appraisal time again. We got our Notice of Appraised Value in the mail last month, and were shocked to learn that our property value increased by 16% last year. That’s on top of the 33% increase from the … Continue reading
Posted in Odds 'n Ends
1 Comment
I was hacked
I discovered last week that somebody had hacked my blog and added a bunch of link spam at the end of the footer script. For some unknown period of time, all of my blog pages contained hundreds of spam links–mostly … Continue reading
Posted in Oops
Comments Off
Clearing the book list
I’ve been meaning to review or at least mention the books I’ve been reading lately. I realized after I posted my negative review of Infinite Ascent that there are plenty of good books that I haven’t mentioned. So, here are … Continue reading
Posted in Book Reviews
Comments Off
Infinite Annoyance
Browsing the remainder table in Half Price Books a few weeks ago, I ran across David Berlinski’s Infinite Ascent: A short history of mathematics. The cover copy looked good, and a quick flip through a few pages was enough to … Continue reading
Posted in Book Reviews, Idiocy
1 Comment
Reducing bandwidth used by crawlers
Some site operators block web crawlers because they’re concerned that the crawlers will use too much of the site’s allocated bandwidth. What they don’t realize is that most companies that operate large-scale crawlers are much more concerned with bandwidth usage … Continue reading
Posted in Web Crawling
Comments Off
More on .NET Collection Sizes
Last month in HashSet Limitations, I noted what I thought was an absurd limitation on the maximum number of items that you can store in a .NET HashSet or Dictionary collection. I did more research on all the major collection … Continue reading
Posted in Programming
Comments Off
Goodbye Windows Vista
I upgraded to Windows Vista (from Windows XP) back in November, when I moved from a dual-core to a quad-core machine. I was less than pleased with Vista, for a number of reasons, but primarily because I found the Aero … Continue reading
Posted in Computers
Comments Off
Where is everybody?
From Jeff Duntemann comes a link to an article on the Fermi Paradox, which puts forth the idea that it may be a good thing that we’ve been unable to find proof of extraterrestrial life. Put simply, Fermi’s Paradox is … Continue reading
Posted in Odds 'n Ends
1 Comment
A variation on the homegrown DOS attack
Tuesday, in How to DOS yourself, I described how to erroneously configure an Apache server and cause what appears to be a denial of service attack. There’s another way to do it that is even more insidious. In Tuesday’s post … Continue reading
Posted in Web Crawling
Comments Off
How to DOS yourself
It’s surprising the things you’ll learn when you write a Web crawler. Today’s lesson: how to be both perpetrator and victim of your own denial of service attack. Not everybody likes crawlers accessing their sites. Most will modify their robots.txt … Continue reading
Posted in Web Crawling
1 Comment
The New ReaderWriterLockSlim Class
Last year, in Improper Use of Exceptions, I mentioned that the ReaderWriterLock.AcquireReaderLock and ReaderWriterLock.AcquireWriterLock methods were improperly written because they throw exceptions when the lock is not available. I mentioned further that whoever designed the ReaderWriterLock should have studied the … Continue reading
Posted in Programming
Comments Off
Opt in or opt out?
I mentioned before that there is a small but very vocal group of webmasters who say that crawlers should stay off their sites unless specifically invited. It is their opinion that they shouldn’t have to include a robots.txt file in … Continue reading
Posted in Web Crawling
2 Comments
Why every site should have a robots.txt file
People often ask if they need a robots.txt file on their sites. I’ve seen some Web site tutorials that say, in effect, “don’t post a robots.txt file unless you really need it.” I think that is bad advice. In my … Continue reading
Posted in Web Crawling
Comments Off
More On Robots Exclusion
As I mentioned yesterday, the Robots Exclusion Standard is a very simple protocol that lets webmasters tell well-behaved crawlers how to access their sites. But the “standard” isn’t as well defined as some would have you think, and there’s plenty … Continue reading
Posted in Web Crawling
Comments Off
Struggling with the Robots Exclusion Standard
The Internet community loves standards. We must. We have so many of them. Many of those “standards” are poorly defined or, even worse, ambiguous. Or, in the case of robots.txt, subject to a large number of extensions that have become … Continue reading
Posted in Web Crawling
Comments Off