Hey, you deleted my files!

We got a rather strongly worded message the other day from a Webmaster who was threatening legal action because our crawler deleted a bunch of files from his site.  The news that our crawler is capable of deleting files was quite a surprise to us.  Like other crawlers, ours just downloads HTML files, extracts links, and then visits those links.  There is no “delete a file” logic in there.  But if the crawler stumbles upon a link whose action is to delete a file, then visiting that link will indeed delete the file.
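
To make that concrete, here is a minimal Python sketch of the "extract links" step. It is not our crawler's actual code: the names `LinkExtractor` and `extract_links` and the sample page are made up for illustration. The thing to notice is that nothing in it looks at what a URL does; a delete link comes out looking like any other link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved the same way a browser would.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link found in the page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page: one harmless link and one destructive one.
SAMPLE_PAGE = """
<a href="about.html">About</a>
<a href="?delete=filename.txt">Delete filename.txt</a>
"""

links = extract_links("http://www.example.com/files/", SAMPLE_PAGE)
```

Both URLs end up in the same list of places to visit next. The crawler has no way to tell them apart short of fetching them.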

Further investigation in this particular case revealed a file management page that includes, among other things, links that have the form:  www.example.com/files/?delete=filename.txt.  Surprisingly enough, clicking on that link deletes the file.  The file management page is not protected by a password, nor is there any kind of confirmation displayed before the file is permanently deleted.
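
For contrast, here is a rough sketch of the checks a guarded version of that page might make before deleting anything. This is my assumption about how such a page could be written, not the site's actual code: the function `handle_delete`, its parameters, and the status codes it returns are all hypothetical. It leans on the HTTP convention that GET requests are supposed to be safe, so anything that merely follows links can never trigger a deletion.

```python
def handle_delete(method, authenticated, confirmed):
    """Return an HTTP status code for a delete request (hypothetical handler)."""
    if method != "POST":
        # Following a plain link issues a GET, which must never delete anything.
        return 405  # Method Not Allowed
    if not authenticated:
        # The file management page sits behind a login.
        return 401  # Unauthorized
    if not confirmed:
        # The user has not yet answered "are you sure?".
        return 400  # Bad Request
    # Only at this point would the file actually be removed.
    return 200
```

Any one of those three checks would have been enough to stop a crawler, which sends GET requests, has no password, and never clicks a confirmation button.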

Examining the logs, we saw accesses from other search engine crawlers.  We also learned from the Webmaster that some time back, a kid had “hacked in” to the site and deleted a bunch of files.

I’m a little surprised that anybody would create such a page and not provide any protection.  I’m very surprised that a supposedly professional Web developer would do such a thing, and then not learn the lesson when a random surfer came in and deleted files.  And I’m shocked that, even after we explained this to the Webmaster, he insisted that we take this as an opportunity to learn from our “mistake” and “fix” the crawler so that it doesn’t happen again.

It’s unfortunate that our crawler visited those links, causing the files to be deleted.  But the mistake was on the part of the person who posted those destructive links.  The crawler was operating exactly as it should.  Exactly, in fact, as every major search engine crawler acts.  It’d be nice if we could imbue the crawler with enough intelligence to “understand” Web pages and know in advance what the effects of clicking a link will be.  But that kind of machine intelligence is far, far in the future.

If you post something on the Web, it will be found, unless you take active measures to protect it.  Posting a destructive link on an unprotected page and then blaming somebody else when the link is clicked by an “unauthorized” person is akin to running out into a busy street and then blaming your injuries on the driver of the bus that hits you.
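
One of the cheapest active measures is a robots.txt rule. The sketch below uses Python's standard `urllib.robotparser` to show how a crawler that honors robots.txt would treat such a rule. The rule itself, the `ExampleBot` user agent, and the `allowed` helper are assumptions for illustration. Keep in mind that robots.txt only keeps well-behaved crawlers out; it does nothing to stop a person, so it is no substitute for a password.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay out of /files/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /files/
"""


def allowed(url, user_agent="ExampleBot"):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


delete_ok = allowed("http://www.example.com/files/?delete=filename.txt")
index_ok = allowed("http://www.example.com/index.html")
```

With that rule in place, `delete_ok` is false and `index_ok` is true: a polite crawler skips the file management page and still indexes the rest of the site.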

3 comments to Hey, you deleted my files!

  • Steve

    “Posting a destructive link on an unprotected page and then blaming somebody else when the link is clicked by an “unauthorized” person is akin to running out into a busy street and then blaming your injuries on the driver of the bus that hits you.”

    The sad thing about that, though, is that most of the time the bus driver will be considered at fault.

  • Roy Harvey

    I think a better analogy than the bus would be wiring your doorbell button to a bomb and blaming the person who presses it for blowing up your house.

  • The exact thing to say to him and to anyone who wants to know what happened: “You posted links on your web site which were open for access by the public; which looked as if they led to further web pages; and were rigged to delete a file whenever a member of the public accessed them in the normal manner. You are the victim of your own booby-trap. In fact, you are guilty of creating what the law calls an ‘attractive nuisance,’ and it is fortunate that you are its only victim.”
