The web crawler I’m working on, as I’ve mentioned before, is a distributed application. Currently it consists of a URL Server and multiple Crawlers. The basic idea is that the URL Server is a traffic director that tells each Crawler machine which Web sites to visit. Each Crawler machine hosts multiple Worker threads, each of . . . → Read More: You want it when?
Debra called while I was on my way to lunch this afternoon. Somebody purporting to be the Capital One fraud department had left a message on the home answering machine saying that it was imperative that I contact them. They left a toll-free callback number. That got me to thinking, though. How could I trust . . . → Read More: Credit Card Fraud
Further to yesterday’s entry about notebook flash drives, I forgot to mention the Addonics CF drive adapter. For $30, you get an adapter that plugs into your notebook’s IDE connector. Add a 16 GB compact flash card, and you have a solid state “hard drive” in your notebook. There’s also a dual CF adapter with . . . → Read More: Build your own notebook flash drive
A few items that have been gathering dust here while I bang away on the crawler.
Ever wonder what you could do with a terabyte of really fast storage? Check out the Tera-RamSan. I hope you have a big budget, though. The unit is undoubtedly expensive, and it requires 2,500 watts. That’s some heater! Speaking . . . → Read More: Odds ‘n Ends
After a little more than three weeks with the new computer, I’m mostly happy with it. It’s blindingly fast, both in processing and in disk access. And whisper quiet, really. I got into the office very early the other day (about 4:30 AM) when there was almost no background noise. I could barely hear the computer. The . . . → Read More: Computers Update
As I’ve pointed out before, writing a Web crawler is conceptually simple: read a page, extract the links, and then go visit those links. Lather, rinse, repeat. But it gets complicated in a hurry.
The first thing that comes to mind is the problem of duplicate URLs. If you don’t have a way to discard . . . → Read More: Bloom Filters in C#
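For readers who want a concrete picture of the duplicate-URL check before clicking through, here is a minimal Bloom filter sketch. The linked post works in C#; this version is in Python and is not the post's implementation. The class name, the SHA-256 double-hashing scheme, and the sizing numbers are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive all bit positions from one SHA-256 digest
        # using double hashing (h1 + i * h2), a common shortcut.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage: skip URLs the crawler has probably seen already.
seen = BloomFilter(num_bits=1 << 20, num_hashes=7)
seen.add("http://example.com/")
already_seen = "http://example.com/" in seen
```

The trade-off is that a URL reported as seen may occasionally be one the crawler never visited, so a small fraction of pages can be skipped in exchange for a fixed, small memory footprint.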
A problem I was working on recently got me to wishing that I could lop off the front of a file. Kind of like a “truncate at front,” if you will. Truncating a file at the back end is a common operation, something we do without even thinking much about it. But lopping off the front . . . → Read More: A new file system primitive?
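To make the asymmetry concrete: without such a primitive, the usual portable workaround is to copy everything after the cut point into a new file and swap it into place, so the cost grows with the amount of data kept rather than the amount removed. The sketch below shows that workaround in Python. It is not taken from the linked post, and the function name `truncate_front` is hypothetical.

```python
import os
import shutil
import tempfile


def truncate_front(path: str, count: int) -> None:
    """Drop the first `count` bytes of the file at `path` by rewriting the rest."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        src.seek(count)  # skip the bytes being removed
        # Write the surviving tail to a temp file in the same directory
        # so the final rename stays on one file system.
        with tempfile.NamedTemporaryFile("wb", dir=directory, delete=False) as dst:
            shutil.copyfileobj(src, dst)
            temp_name = dst.name
    os.replace(temp_name, path)  # swap the shortened copy into place
```

Truncating at the back, by contrast, is a single call (`os.truncate` in Python) that does not touch the remaining data.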