How Can You Manage Access for Spiders, Crawlers and Other Bots on Your Website? Protecting From Content and Personal Data Parsing

Every day your website is visited by different bots. Most of their visits are invisible to you, even in Google Analytics, but that doesn’t mean the bots aren’t there. Your web server logs can show you visit statistics, or you can install the CleanTalk Security Plugin and see all bot requests, of every type, in your Security Firewall statistics. To learn more about CleanTalk Security for websites, please go here: https://cleantalk.org/help/security-features
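As a rough illustration of what the web server logs can tell you, here is a minimal sketch that counts bot-like User-Agents. It assumes the log uses the common "combined" access log format, where the User-Agent is the last quoted field. The `BOT_HINTS` substrings, the function name and the sample lines are made up for the example, and substring matching like this will miss bots that disguise themselves as browsers.

```python
import re
from collections import Counter

# Combined log format ends with: "request" status size "referer" "user-agent"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$')

# Assumption: substrings that often (not always) appear in crawler User-Agents.
BOT_HINTS = ("bot", "crawler", "spider")


def count_bot_agents(lines):
    """Return a Counter of User-Agent strings that look like bots."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in combined log format
        agent = match.group("agent")
        if any(hint in agent.lower() for hint in BOT_HINTS):
            counts[agent] += 1
    return counts


# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /contact HTTP/1.1" 200 256 "-" '
    '"Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
]

for agent, hits in count_bot_agents(sample).most_common():
    print(hits, agent)
```

To run it against a real log, replace `sample` with an open file object, since the function accepts any iterable of lines.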

Website owners and webmasters usually try to give web crawlers full access so that their websites get indexed. But some bots are not needed for your website to function, and you may not use the services of the companies that run them.

Bot visits can often outnumber visits from real users. Aggressive scanning and large numbers of questionable bot visits may be unwanted: they can load your web server, or you may simply not want to give away information about your website.

How to Protect Your Website From Content, Personal Data and Black SEO Parsing

Bad bots not only send spam; they also try to hack websites, parse and copy content, look for vulnerabilities, and collect any data they can from websites, including email addresses, user names, user contacts and so on. Setting up a “robots.txt” file is not always helpful and is often useless, because such bots can ignore its rules.
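To see why “robots.txt” only helps against well-behaved bots, note that the rules are checked by the bot itself, not enforced by your server. The sketch below uses Python's standard `urllib.robotparser` to show the check a compliant crawler performs before fetching a page. The rules and the `example.com` URLs are invented for illustration. A bad bot simply skips this step.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: ban one crawler entirely, hide /private/ from the rest.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks this question before every request.
ahrefs_allowed = parser.can_fetch("AhrefsBot", "https://example.com/")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/")
other_private = parser.can_fetch("SomeOtherBot", "https://example.com/private/page")

print(ahrefs_allowed, other_allowed, other_private)
```

Nothing in this exchange stops a crawler that never calls `can_fetch`, which is why server-side blocking is needed for bad bots.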

Personal data might be sold afterward and used in other types of fraud, or for spam and flooding via mailing lists, phone numbers, etc. Copied information could be published under a different name.

Competitors might parse your prices and product listings, and pictures of your goods might be used on other websites.

Bots can click ads, links, etc. on your website, or otherwise interfere with your ads. They could also try to lower your website’s search ranking by increasing its bounce rate: they arrive at the website from search results and leave after a very short visit.

To block bad bots, CleanTalk added a new option, Anti-Crawler: https://cleantalk.org/help/anti-flood-and-anti-crawler. Official bots such as Yandex, Mail.ru and Baidu will not be blocked by this option. With this option enabled, some automated bots will be blocked on their second visit.

There are also cases where an official service, MOZ.org for example, has neither its own IP address range nor DNS records for its bots. Such bots cannot be identified, so the Anti-Crawler option will block them. To avoid that, you can whitelist the User-Agents of those bots.
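For context on why DNS records matter here, a common general technique for confirming that a request really comes from an official crawler is a forward-confirmed reverse DNS check: look up the host name for the visiting IP, make sure it belongs to the crawler's domain, then resolve that host name back and confirm it returns the same IP. The sketch below illustrates that general idea only. It is not a description of how the Anti-Crawler option works internally, and the function name, the `.crawler.example` domain and the fake lookup tables are all hypothetical.

```python
import socket


def verify_bot_ip(ip, allowed_suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a visiting IP address.

    `reverse` and `forward` are injectable lookups so the logic can be
    tried offline; by default they use real DNS through the socket module.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not hostname.endswith(tuple(allowed_suffixes)):
            return False  # PTR record is outside the crawler's domain
        return ip in forward(hostname)  # must resolve back to the same IP
    except OSError:
        return False  # no DNS records: the bot cannot be identified


# Hypothetical lookup tables standing in for real DNS answers.
FAKE_PTR = {
    "192.0.2.10": "crawl-1.crawler.example",
    "192.0.2.66": "crawl-1.crawler.example",  # spoofed PTR record
}
FAKE_A = {"crawl-1.crawler.example": ["192.0.2.10"]}


def fake_reverse(ip):
    try:
        return FAKE_PTR[ip]
    except KeyError:
        raise socket.herror("no PTR record") from None


def fake_forward(host):
    return FAKE_A.get(host, [])


SUFFIXES = (".crawler.example",)
genuine = verify_bot_ip("192.0.2.10", SUFFIXES, fake_reverse, fake_forward)
spoofed = verify_bot_ip("192.0.2.66", SUFFIXES, fake_reverse, fake_forward)
unknown = verify_bot_ip("192.0.2.99", SUFFIXES, fake_reverse, fake_forward)
print(genuine, spoofed, unknown)
```

A bot whose operator publishes no such DNS records ends up in the `unknown` case, which is where a User-Agent whitelist becomes the practical fallback.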

For our clients who want to restrict access to their websites for such official bots or, vice versa, want to allow them, we added tools for configuring the Anti-Crawler option.

To control your Personal Lists for the Anti-Crawler option, go to the administrator panel of your website (WordPress Dashboard → Settings → Anti-Spam by CleanTalk → Advanced Settings) and enable the option “Block bots by User-Agents”. Our guide is here: https://cleantalk.org/help/filter-ua

Google bots and Bing bots are whitelisted by default, and this cannot be changed.

Here is the current list of the User-Agents of bots:


Mail.ru
SEOkicks-Robot
Sogou Spider
Applebot
Pingdom.com Bot
Akamai Crawler
Rogerbot – MOZ.org
FacebookBot
Twitterbot
HuaweiWebCatBot
Pinterest Bot
GTmetrix
CloudFlare crawler
Serpstatbot
Archive.org_bot
Alexabot
Yandex
Baidu
AhrefsBot
DuckDuckGo
Semrush
Seznam
Petalbot
Wikipedia Crawler
Rambler Bot
AspiegelBot – HuaweiWebCatBot

You can add any bot from the list above to your black or white list.
As we gather more statistics we will add more User-Agents of bots to the list.
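To make the idea of black and white lists by User-Agent concrete, here is a minimal sketch of how such a filter could decide what to do with a request. It is an illustration only, not the plugin's actual implementation: the function name, the return values, the rule that the whitelist wins over the blacklist, and the sample User-Agent strings are all assumptions.

```python
def classify_user_agent(user_agent, whitelist, blacklist):
    """Return "allow", "block" or "default" for a User-Agent string.

    Matching is a case-insensitive substring test. In this sketch the
    whitelist is checked first, so it wins if both lists match.
    """
    agent = user_agent.lower()
    if any(token.lower() in agent for token in whitelist):
        return "allow"
    if any(token.lower() in agent for token in blacklist):
        return "block"
    return "default"  # leave the decision to the other checks


# Hypothetical personal lists built from names in the list above.
WHITELIST = ["rogerbot"]
BLACKLIST = ["AhrefsBot", "Semrush"]

# Simplified, illustrative User-Agent strings.
moz = classify_user_agent("rogerbot/1.0", WHITELIST, BLACKLIST)
ahrefs = classify_user_agent(
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    WHITELIST,
    BLACKLIST,
)
browser = classify_user_agent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)", WHITELIST, BLACKLIST
)
print(moz, ahrefs, browser)
```

Keep in mind that a User-Agent is set by the client and is easy to fake, so a whitelist entry trusts whatever sends that string.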

Even if you don’t plan to use black/white lists for bots, we recommend enabling the option “Block bots by User-Agents” as well whenever the Anti-Crawler option is enabled in the CleanTalk plugin. Google bots and Bing bots will be whitelisted for your website in any case.

If you have any questions, you can contact our support team: https://cleantalk.org/my/support/open

Comments

2 responses

  1. Miro

    My website was recently “attacked” by bots mostly from the address trafficbot.live and bot-traffic.xyz. I do have a CleanTalk installed but they get to my website anyway. Didn’t do any harm as my server managed over 6000 visits in two days but still. You might want to add them to your list.

  2. CleanTalk Support Team

    Hello.
    We need to know more details.
    Please, contact us via https://cleantalk.org/my/support/open
    Be well,
