How To Block the CUIL Spider Bot

Some people have landed on my previous rant about CUIL after searching Google for “block cuil bot”. I realize that article does not answer that question (unless you click through to madstatter’s robots.txt file), so here is how to block CUIL’s insane spiders.

CUIL’s spider is called “Twiceler”. Why? I have no idea. To block it from your site, add the following lines to the robots.txt file in your site’s root directory:

User-agent: twiceler
Disallow: /

That’s it!
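If you want to sanity-check the rule before uploading it, Python’s standard urllib.robotparser module can read it locally and tell you who it blocks. Here is a rough sketch. The “Twiceler” and “Googlebot” agent names and the example.com URL are just assumed sample values. Keep in mind that this only shows how a well-behaved crawler would interpret the file; robots.txt is voluntary, so it can’t force a spider to obey.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from this post, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: twiceler
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Sample agent names and URL, assumed for illustration only.
    for agent in ("Twiceler", "Googlebot"):
        print(agent, is_allowed(agent, "http://example.com/some/page.html"))
    # prints:
    # Twiceler False
    # Googlebot True
```

The parser matches the User-agent line case-insensitively, which is why the lowercase “twiceler” rule still catches “Twiceler”. Agents with no matching entry (and no “*” entry) are allowed everywhere.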

For more about robots.txt files, robotstxt.org has some great information.


Comments

4 Responses to “How To Block the CUIL Spider Bot”

  1. Alexander Higgins on August 5th, 2008 2:07 pm

    I have found that the bot does not respect robots.txt. Others have also had the same complaint. However, Cuil claims their bot does respect robots.txt, but only after 7 days.

  2. Cuil Bots Misbehaving | Domain Name News | Domain News | Expired Domains on September 2nd, 2008 4:52 am

    [...] the search engine that claimed they will be the new Google, is getting a lot of bad press [...]

  3. Dan on September 12th, 2008 11:59 am

    Similar issues here. Referral logs are a mess with badly predicted links, and I haven’t even looked at the 404 logs yet.

    Took out my poor VPS for a few days, and I had to restore from an old image (logs got too loaded? I dunno, but things are fine now).

    I’ve “adjusted” all of my robots.txt files, thanks for the crawler name.

  4. F. Andy Seidl on September 20th, 2008 10:35 pm

    As a webmaster, you definitely should use user-agent headers to manage server traffic. But understand that this is purely a pragmatic tactic and not a serious security measure.

    I wrote more about this here:

    Webmaster Tips: Blocking Selected User-Agents
    http://faseidl.com/public/item/213126
