How to defend against Yahoo! Slurp

Oct 09 2009

I was going through the logs of my web server for the last month and was shocked to see that a whopping 22.93% of the total bandwidth of a particular website of mine was used by the Yahoo crawler called Slurp (I should have known better, given the revealing name).

This is ridiculous, particularly given that Yahoo sends a negligible number of visitors to the website.

Yahoo's search engine market share is declining anyway; it currently stands at 6.84%. For most of my sites, Yahoo never sends more than 4% of the total traffic. So I have to pull the plug on Yahoo! Slurp’s free run for the time being.
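For anyone who wants to run the same check on their own logs, here is a rough Python sketch of how a bandwidth share like the one above could be computed. It assumes the server writes the common "combined" access log format (the post does not say which server or format was used), and the function name `bandwidth_share` and the sample lines are made up for illustration.

```python
import re

# Assumption: "combined" log format, where each line ends with
#   "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def bandwidth_share(lines, needle="Slurp"):
    """Return the percentage of response bytes served to agents containing `needle`."""
    total = 0
    matched = 0
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that do not look like combined-format entries
        size = 0 if m.group("bytes") == "-" else int(m.group("bytes"))
        total += size
        if needle.lower() in m.group("agent").lower():
            matched += size
    return 100.0 * matched / total if total else 0.0


# Hypothetical sample data, not taken from the real logs.
SAMPLE = [
    '1.2.3.4 - - [09/Oct/2009:10:00:00 +0000] "GET /a HTTP/1.1" 200 3000 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '5.6.7.8 - - [09/Oct/2009:10:00:01 +0000] "GET /b HTTP/1.1" 200 7000 "-" '
    '"Mozilla/5.0 (Windows; U)"',
]

if __name__ == "__main__":
    print(f"{bandwidth_share(SAMPLE):.2f}% of bytes went to Slurp")
```

To use it on a real file, pass an open file object instead of `SAMPLE`, for example `bandwidth_share(open("access.log"))`, where the file name is again only a placeholder.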

So how do I stop the Yahoo! crawler?

Create a file named robots.txt in the root folder of the website with the following lines of text in it:

User-Agent: Slurp
Disallow: /

User-Agent: *
Disallow:
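Before uploading the file, it may be worth checking that the rules do what you expect. The following is a small sketch using Python's standard `urllib.robotparser` module; the helper name `can_crawl` and the `example.com` URL are placeholders, and the result only tells you how this particular parser reads the rules, not how Slurp itself will behave.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt above.
ROBOTS_TXT = """\
User-Agent: Slurp
Disallow: /

User-Agent: *
Disallow:
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/page.html"  # placeholder URL
    print("Slurp allowed:", can_crawl("Slurp", url))
    print("Googlebot allowed:", can_crawl("Googlebot", url))
```

With these rules, the parser should report that Slurp is blocked from every path while other crawlers are still allowed.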

If you don’t want to block the Yahoo crawler completely, you can instead reduce the number of requests Slurp sends to your server. To do this, use the following lines in your robots.txt file:

User-agent: Slurp
Crawl-delay: 1

This “delay value” increases the time between successive requests from the Yahoo! crawler and so lowers Slurp's access rate to your server. The official FAQ has details about Yahoo! Slurp and several ways to reduce the number of requests it makes to your site. For me, though, supporting the crawler is not worth the cost.
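As a quick sanity check, the delay rule can be read back with Python's standard `urllib.robotparser` (its `crawl_delay` method is available from Python 3.6). This is only a sketch: the helper name `crawl_delay_for` is made up, and it shows what this parser extracts from the file, not how Yahoo interprets the value.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the Crawl-delay example above.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 1
"""


def crawl_delay_for(user_agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay that applies to `user_agent`, or None if there is none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("Slurp delay:", crawl_delay_for("Slurp"))
    print("Googlebot delay:", crawl_delay_for("Googlebot"))
```

Since the file names only Slurp, other crawlers get no delay from it and are not restricted in any way.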

3 responses so far

  • TheAnand says:

    ah yahoo! it sends me as much as 80% traffic on one site and 0% on the other…guess it depends on the type of site audience you have.

    In my case, a site aimed at women and kids gets a lot of traffic from Yahoo for some reason.

  • Niyaz PK says:

    Of course, all of this depends on how much traffic the search engine brings you.

    I think the search engines should consider this fact before crawling my server to death.

  • Joyce says:

    I once had a problem with a Russian search engine (Yandex.ru). It is the biggest Russian search engine, and they were crawling my site at a rate of 5-10 pages per minute. What disturbed me more was that they were not following my robots.txt, so adding them to robots.txt was not an option. I finally had to add iptables rules to block the crawler.
