Robots.txt is not a security measure

Sep 10 2008

I am increasingly coming across people who think a robots.txt file can be used to prevent search engine crawlers from crawling sensitive data on their websites. Seriously.

This is just plain wrong. A robots.txt file is meant for excluding unwanted, redundant or useless data from crawling. An entry in the robots.txt file cannot keep your sensitive data from getting out, because the file is only a request that well-behaved crawlers choose to honor. Sensitive data should not be left open on your website in the first place.
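
To make the "only a request" point concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the example.com URLs and the "ExampleBot" user agent are all made up for illustration. Note that the check runs inside the crawler itself: nothing on the server enforces it, and a crawler that never calls it can fetch the page anyway.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def polite_crawler_may_fetch(robots_txt, url, agent="ExampleBot"):
    """Return what a well-behaved crawler would decide for this URL.

    This is a voluntary, client-side check. The web server never sees it
    and will still serve the URL to anyone who simply asks for it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/admin/users"):
        print(url, polite_crawler_may_fetch(ROBOTS_TXT, url))
```

A polite crawler skips /admin/users because it chose to ask. If that page really matters, it needs authentication on the server side, not a line in robots.txt.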

There are many malicious crawlers which crawl only the pages blocked by the robots.txt file on every website. I bet a lot of interesting stuff turns up in their search results.
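
It takes very little effort to read a robots.txt file the "wrong" way round. The sketch below is my own illustration, not code from any real crawler, and the sample file is made up. It simply collects every Disallow path, which is exactly the list a site owner hoped nobody would look at.

```python
# Hypothetical robots.txt content, invented for this example.
SAMPLE = """\
# keep crawlers out of private stuff
User-agent: *
Disallow: /admin/      # admin panel
Disallow: /backup/db.sql
Disallow:
Allow: /public/
"""


def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path listed in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # An empty Disallow means "nothing is blocked", so skip it.
        if field.strip().lower() == "disallow" and value:
            paths.append(value)
    return paths


if __name__ == "__main__":
    print(disallowed_paths(SAMPLE))
```

Since robots.txt is publicly readable by design, anything listed in it should be treated as advertised, not hidden.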
