We changed the title of our WordPress plugin from “Anti-Spam by CleanTalk” to “Spam protection by CleanTalk”. Don’t worry: we simply want to test how people perceive the long and the short titles.
Category: CleanTalk
-
Non-visual methods to protect the site from spam. Part 3. Repeats
Continuation of the article Non-visual methods to protect the site from spam
Part 3: Repeats of substrings
As mentioned above, non-visual methods of protecting a site against spam rely on text analysis. One of the most common spam signals is the presence of repeated strings. As always, the examples are taken from actual CleanTalk data.
The search for such repeats must consume as few resources as possible. It is best run after the checks from the first and second parts of the article, which eliminate obvious spam and bring the text into a form suitable for analysis. Here I will give some statistics as well as sample code.
1. Sample code
We use a function that finds the longest repeated substrings with the naive algorithm described here http://algolist.manual.ru/search/lrs/naive.php
Example output is shown below.
s a l e f o r s a l e f o r s a l e
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21s 0 + . . . . . . . . + . . . . . . . . + . . .
a 1 . + . . . . . . . . + . . . . . . . . + . .
l 2 . . + . . . . . . . . + . . . . . . . . + .
e 3 . . . + . . . . . . . . + . . . . . . . . +
4 . . . . + . . . + . . . . + . . . + . . . .
f 5 . . . . . + . . . . . . . . + . . . . . . .
o 6 . . . . . . + . . . . . . . . + . . . . . .
r 7 . . . . . . . + . . . . . . . . + . . . . .
8 . . . . . . . . + . . . . + . . . + . . . .
s 9 . . . . . . . . . + . . . . . . . . + . . .
a 10 . . . . . . . . . . + . . . . . . . . + . .
l 11 . . . . . . . . . . . + . . . . . . . . + .
e 12 . . . . . . . . . . . . + . . . . . . . . +
13 . . . . . . . . . . . . . + . . . + . . . .
f 14 . . . . . . . . . . . . . . + . . . . . . .
o 15 . . . . . . . . . . . . . . . + . . . . . .
r 16 . . . . . . . . . . . . . . . . + . . . . .
17 . . . . . . . . . . . . . . . . . + . . . .
s 18 . . . . . . . . . . . . . . . . . . + . . .
a 19 . . . . . . . . . . . . . . . . . . . + . .
l 20 . . . . . . . . . . . . . . . . . . . . + .
e 21 . . . . . . . . . . . . . . . . . . . . . +$VAR1 = { 'sale' => 3, 'for sale' => 2 };
And here is the function in Perl, with minimal changes. For convenience, this is the full script, which also prints the matrix above.
#!/usr/bin/perl -w
use strict;
use utf8;
use Data::Dumper;

binmode(STDOUT, ':utf8');

my $min_longest_repeat_length = 4;
my $message = 'sale for sale for sale';
my %longest_repeates = ();

get_longest_repeates(\$message, \%longest_repeates);
print Dumper(\%longest_repeates);

sub get_longest_repeates {
    my $test_ref = shift;    # Reference to the text for analysis
    my $reps_ref = shift;    # Reference to the result hash

    my @symbols = split //, $$test_ref;
    my $m_len = scalar @symbols;
    my @matrix = ();         # A square matrix of symbol matches

    # Fill the matrix to the right of the main diagonal
    for (my $i = 0; $i < $m_len; $i++) {        # Rows
        $matrix[$i] = [];
        for (my $j = $i; $j < $m_len; $j++) {   # Columns, only to the right of the main diagonal
            $matrix[$i][$j] = 1 if $symbols[$i] eq $symbols[$j];
        }
    }

    # Analyze the diagonals to the right of the main diagonal and fill in the results
    my %repeats_tmp = ();    # Hash of repeats: substring => { position => 1 }
    my ($i, $j);

    # Scan the diagonals from right to left, i.e. from short to long repeats
    for ($i = $m_len - 1; $i > 0; $i--) {
        my $repeat = '';
        my $repeat_pos = undef;
        my $repeat_temp;

        for ($j = $i; $j < $m_len; $j++) {
            if (defined($matrix[$j-$i][$j]) && $matrix[$j-$i][$j] == 1) {
                $repeat_temp = $repeat;
                $repeat_temp =~ s/^ //;
                # If the repeat collected so far is already in the hash of repeats
                if (defined($repeats_tmp{$repeat_temp})) {
                    $repeat_pos = $j - length($repeat_temp);
                    $repeats_tmp{$repeat_temp}{$repeat_pos} = 1;
                    $repeat = $symbols[$j];
                } else {
                    $repeat .= $symbols[$j];
                }
            } else {
                # The run of matches on this diagonal has ended: store it if long enough
                if ($repeat ne '') {
                    $repeat =~ s/^ //;
                    $repeat_pos = $j - length($repeat);
                    if (length($repeat) >= $min_longest_repeat_length) {
                        if (defined($repeats_tmp{$repeat})) {
                            $repeats_tmp{$repeat}{$repeat_pos} = 1;
                        } else {
                            $repeats_tmp{$repeat} = {$repeat_pos => 1};
                        }
                    }
                    $repeat = '';
                }
            }
        }

        # A run that reaches the end of the diagonal is stored the same way
        if ($repeat ne '') {
            $repeat =~ s/^ //;
            $repeat_pos = $j - length($repeat);
            if (length($repeat) >= $min_longest_repeat_length) {
                if (defined($repeats_tmp{$repeat})) {
                    $repeats_tmp{$repeat}{$repeat_pos} = 1;
                } else {
                    $repeats_tmp{$repeat} = {$repeat_pos => 1};
                }
            }
            $repeat = '';
        }
    }

    # Number of occurrences = 1 + number of distinct repeat positions
    foreach (keys %repeats_tmp) {
        $$reps_ref{$_} = 1 + scalar keys %{$repeats_tmp{$_}};
    }

    # Output the matrix for diagnostics
    print "\n";
    print '     ';
    for (my $i = 0; $i < $m_len; $i++) {
        print '  ' . $symbols[$i];
    }
    print "\n";
    print '     ';
    for (my $i = 0; $i < $m_len; $i++) {
        printf '%3d', $i;
    }
    print "\n";
    print "\n";
    for (my $i = 0; $i < $m_len; $i++) {
        print $symbols[$i];
        printf '%3d ', $i;
        for (my $j = 0; $j < $m_len; $j++) {
            my $value = '.';
            $value = '+' if (defined $matrix[$i][$j] && $matrix[$i][$j] == 1);
            printf('  %1s', $value);
        }
        print "\n";
    }
    print "\n";
}
2. Statistics of repeats
We selected the threshold for the minimum repeat length (I deliberately do not disclose it) that gave the maximum efficiency in our tests. The distribution of messages by number of repeats is as follows:
The number of repeats    In spam, %    In not spam, %
2                        78.58         90.28
3                        11.93         4.86
4                        4.45          2.08
5                        2.30          1.39
6                        1.93          0
7                        0.22          0
8                        0.37          0
9                        0.07          0
3. Conclusion
I have shown an implementation of the naive algorithm for finding repeated substrings in a text. Both the number of repeats and the repeated strings themselves (e.g. stop-words) can be used in the analysis. I repeat that, in the fight against spam, combined tests are more effective.
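One possible way to read the table of repeats, sketched in Python rather than Perl: divide the spam share by the not-spam share for each number of repeats. The function name and the idea of using a plain ratio are my assumptions for illustration, not the scoring that CleanTalk actually applies; the percentages are copied from the table above.

```python
from typing import Optional

# Shares of messages (%) by number of repeats, copied from the table above.
SPAM_PCT = {2: 78.58, 3: 11.93, 4: 4.45, 5: 2.30, 6: 1.93, 7: 0.22, 8: 0.37, 9: 0.07}
NOT_SPAM_PCT = {2: 90.28, 3: 4.86, 4: 2.08, 5: 1.39, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}


def spam_ratio(repeats: int) -> Optional[float]:
    """Return spam share / not-spam share for a given number of repeats.

    Hypothetical helper: returns None when the table has no row for this
    count, and infinity when the count was seen only in spam.
    """
    if repeats not in SPAM_PCT:
        return None
    not_spam = NOT_SPAM_PCT[repeats]
    if not_spam == 0:
        return float("inf")
    return SPAM_PCT[repeats] / not_spam


if __name__ == "__main__":
    for count in sorted(SPAM_PCT):
        print(count, spam_ratio(count))
```

A ratio above 1 means the count is relatively more frequent in spam. Such a ratio is only one signal and, as noted above, works best in combination with other tests.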
Learn more about CleanTalk Anti-Spam.
-
Non-visual methods to protect the site from spam. Part 2. The true face of symbols
Continuation of the article Non-visual methods to protect the site from spam
Part 2: The true face of symbols
Non-visual methods of protecting a website from spam rely, in particular, on analysis of the submitted text. Spammers use many techniques to complicate that analysis. Here I show examples of one of them, namely symbol substitution. The examples are taken from actual CleanTalk data.
Symbol substitution is very simple, but as a result stop-word filters may fail to trigger, and Bayesian filters and language-detection filters may work worse. Therefore, before applying these filters, it makes sense to return the symbols to their true face.
Note at once that replacing symbols blindly, for example replacing national characters such as a Latin ‘a’ with a diacritical mark with the plain Latin ‘a’, is totally unacceptable without an analysis of the language and context. Likewise, replacing letters that look like zero with zero is possible only when you know exactly what you are looking for in the text (for example, telephone numbers).
However, character replacement is acceptable when the meaning of the text is preserved after the change. Replacement is also necessary to reduce certain sets of special symbols to a single one.
Here I will show the two most interesting kinds of symbol substitution we have encountered.
-
Replacing symbols with a normal typeface
Spammers do everything to make their text conspicuous, even at a cursory glance. Fortunately for them, Unicode provides a set of extended typefaces for Latin characters. Fortunately for us, this is easily corrected.
Below are the most common ways in which Latin characters are substituted with the same Latin letters taken from outside the basic Latin range.
Mapping such characters back to ordinary Latin takes a simple regular expression. After this change the subsequent filters work better and faster, because the input range is greatly narrowed.
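As an illustration only, here is a minimal Python sketch of folding styled Latin letters back to the basic range. It swaps the regular expression mentioned above for Unicode NFKC compatibility normalization from the standard `unicodedata` module, applied one character at a time; the function name is hypothetical. A character is replaced only when its normalized form consists of plain ASCII letters, so letters with diacritics and other alphabets are left untouched, in line with the warning earlier in the article.

```python
import unicodedata


def fold_latin_typefaces(text: str) -> str:
    """Fold fullwidth, mathematical, circled and similar Latin letters to ASCII.

    Hypothetical helper: uses NFKC normalization instead of a hand-written
    regular expression, and keeps every character whose normalized form is
    not made of plain ASCII letters.
    """
    result = []
    for char in text:
        folded = unicodedata.normalize("NFKC", char)
        if folded.isascii() and folded.isalpha():
            result.append(folded)
        else:
            result.append(char)
    return "".join(result)


if __name__ == "__main__":
    # Fullwidth "SALE" and mathematical bold "sale"
    print(fold_latin_typefaces("\uff33\uff21\uff2c\uff25"))
    print(fold_latin_typefaces("\U0001d42c\U0001d41a\U0001d425\U0001d41e"))
```

NFKC covers only characters that Unicode itself marks as compatibility variants; look-alike letters from other alphabets still require the language and context analysis described above.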
-
Replacing the dot
The dot is used far more widely than as a punctuation mark: it delimits fields and positions, separates the digits of the phone numbers found in spam, and so on.
So we need to reduce the variety of spam dots to a single one.
The most common substitutes for the dot that we have encountered are shown below.
Substitute, code    Substitute, view
U+3002              。
U+0701              ܁
U+0702              ܂
U+2024              ․
U+FE12              ︒
U+FE52              ﹒
U+FF61              ｡
The dots can be replaced with a simple tr/// expression:
tr/\N{U+3002}\N{U+0701}\N{U+0702}\N{U+2024}\N{U+FE12}\N{U+FE52}\N{U+FF61}/\N{U+002E}\N{U+002E}\N{U+002E}\N{U+002E}\N{U+002E}\N{U+002E}\N{U+002E}/
We have noticed that, once the dots are replaced, the subsequent filters operate much more effectively.
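To show why this matters for the filters that follow, here is a small Python sketch of the same replacement using `str.maketrans`, followed by a naive domain search. The regular expression and the sample text are hypothetical and serve only as an illustration; the list of substitutes is the one from the table above.

```python
import re

# Dot substitutes from the table above, all mapped to U+002E.
DOT_SUBSTITUTES = "\u3002\u0701\u0702\u2024\ufe12\ufe52\uff61"
DOT_TABLE = str.maketrans(DOT_SUBSTITUTES, "." * len(DOT_SUBSTITUTES))

# Hypothetical, deliberately naive pattern for something that looks like a domain.
DOMAIN_RE = re.compile(r"\b[a-z0-9-]+\.(?:com|org|net)\b", re.IGNORECASE)


def normalize_dots(text: str) -> str:
    """Replace every known dot substitute with an ordinary full stop."""
    return text.translate(DOT_TABLE)


if __name__ == "__main__":
    sample = "cheap pills at example\u3002com"
    print(DOMAIN_RE.findall(sample))                  # nothing is found before normalization
    print(DOMAIN_RE.findall(normalize_dots(sample)))  # the domain is found afterwards
```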
-
Conclusion
I have presented two kinds of symbol substitution. Reversing them is simple, undemanding on system resources, and greatly increases the accuracy of filters based on the analysis of words and expressions.
-
-
Non-visual methods to protect the site from spam. Part 1. Statistics
Part 1. What the statistics say
Non-visual methods of protecting a site from spam rely on automatic analysis of the data coming from the visitor. The more data is analyzed, the more completely and accurately the visitor can be characterized and the better the decision on whether they are a spammer.
Systems that analyze such data usually accumulate statistics on visitor data and on the verdicts reached. We offer an overview of the statistics collected by us (CleanTalk, a service for protecting sites from spam).
Here I deliberately do not cite data from checking IP addresses against blacklists. Even without them, you can obtain enough data by analyzing only the contents of form fields and HTTP headers.
I’ll review the data on the message text, the nickname, the email address, the HTTP headers, and the results of a JavaScript test.
Analysis of these figures is algorithmically very simple and undemanding on resources, so it can be run before other, more resource-intensive checks.
The data reflect the real picture at the time of writing and are based on our analysis of current traffic (more than 2,000,000 requests per day). The data can be freely used in analyzing visitors to your sites. Note that a verdict based on any single criterion is not reliable; the best result is achieved with a comprehensive analysis.
-
Message text
The message text is certainly the main thing in spam. Consequently, spammers compose their posts in such a way that they clearly differ from normal messages on several criteria.
The following table shows the statistics that are, in my view, the most informative.
Message text parameters (average values)                              Not spam    Spam
Number of links, pcs                                                  1.47        4.27
Number of contacts (phone, e-mail), pcs                               1.72        6.38
Form filling time, sec                                                177         8
Ratio of message length to form filling time, symbols/sec             23.81       308.54
The number of links speaks for itself. The amount of contact information can also point to spam. The form filling time and, as a consequence, the typing rate differ most strongly.
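A minimal Python sketch of how two of these parameters might be computed for a single message. The link pattern and the function name are assumptions for illustration; the sketch only extracts the features and deliberately applies no threshold, since a verdict on one criterion alone is unreliable.

```python
import re

# Hypothetical, simplified pattern: anything that starts with http:// or https://
LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def message_features(text: str, fill_seconds: float) -> dict:
    """Return the number of links and the typing rate in symbols per second."""
    if fill_seconds > 0:
        rate = len(text) / fill_seconds
    else:
        # A form submitted in zero time has an unbounded rate.
        rate = float("inf")
    return {
        "links": len(LINK_RE.findall(text)),
        "symbols_per_sec": rate,
    }


if __name__ == "__main__":
    message = "Buy now http://a.example http://b.example"
    print(message_features(message, fill_seconds=2.0))
```

The resulting values can then be compared with the averages in the table together with the other criteria.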
-
The nickname of the visitor
The nickname can also tell a lot. The probable cause is the quality of the name-generation algorithms that spammers use.
Parameters of nickname (average values)                                           Not spam    Spam
Length, symbols                                                                   7.40        16.52
The number of delimiters, pcs                                                     1.89        3.80
The number of digits, pcs                                                         3.29        7.59
The length of a continuous sequence of consonant letters (for Latin), symbols     3.61        5.90
One of the spammer’s tasks is to avoid the error saying that a user with the same name already exists on the site. So, according to the statistics, the uniqueness of nicknames is currently achieved by brute force: length, inserted delimiters and digits. As a result, you get many nicknames with long runs of adjacent vowels and consonants, the latter being more frequent.
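The four nickname parameters from the table can be computed in a few lines of Python. This is a sketch under my own assumptions: the set of delimiter characters and the treatment of ‘y’ as a consonant are not specified in the statistics above, and the function name is hypothetical.

```python
import re

DELIMITERS = "._-"  # assumed set of delimiter characters
# Runs of Latin consonants; 'y' is treated as a consonant here (an assumption).
CONSONANT_RUN_RE = re.compile(r"[b-df-hj-np-tv-z]+")


def nickname_features(nickname: str) -> dict:
    """Return length, delimiter count, digit count and the longest consonant run."""
    runs = CONSONANT_RUN_RE.findall(nickname.lower())
    return {
        "length": len(nickname),
        "delimiters": sum(char in DELIMITERS for char in nickname),
        "digits": sum(char.isdigit() for char in nickname),
        "max_consonant_run": max((len(run) for run in runs), default=0),
    }


if __name__ == "__main__":
    print(nickname_features("xkrtz_john.99"))
```

The same function can be applied to the local part of an e-mail address, for which the statistics are similar.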
-
Name in e-mail
Everything said about nicknames is also true for the name in the e-mail address.
Parameters of name in e-mail (average values)    Not spam    Spam
Length, symbols                                  10.09       19.16
The number of delimiters, pcs                    1.62        4.12
The number of digits, pcs                        4.30        9.57
Note that the dot is often used as the delimiter: a character string is generated and dots are then inserted into it at random, which yields many e-mail names.
-
HTTP-headers
Spam bots forge their headers so as not to differ much from a browser.
However, the statistics show that this often holds only at the time the bot is written. Afterwards, the bot keeps working and sends clearly outdated headers, as can be seen in the table below.
The percentage of HTTP headers User-Agent                                  Not spam    Spam
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)                    0.01%       11.42%
Opera/9.80 (Windows NT 6.2; Win64; x64) Presto/2.12.388 Version/12.17      0.01%       10.84%
Ready-made spam solutions may also leave their own headers, in particular when an HTTP proxy is used. This is also reflected in our statistics.
The percentage of HTTP headers Via    Not spam    Spam
Mikrotik HttpProxy                    0.86%       33.07%
-
JavaScript-test
A JavaScript test can be a simple but very effective additional check. For example, the JS code sets the desired cookies; there are many options.
The most advanced (and expensive) bots pass JS tests. However, as the statistics show, a large percentage of spam comes from very simple programs that are unable to do so.
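One possible server-side half of such a test, sketched in Python. Everything here is an assumption for illustration: the page is presumed to contain a script that stores the expected value in a cookie, and the secret key, the form identifier and the function names are hypothetical. This is not a description of how CleanTalk implements its test.

```python
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical server-side secret


def expected_cookie(form_id: str) -> str:
    """Value that the page's JavaScript is expected to store in the cookie."""
    return hmac.new(SECRET_KEY, form_id.encode("utf-8"), hashlib.sha256).hexdigest()


def passes_js_test(form_id: str, cookie_value: Optional[str]) -> bool:
    """Return True when the submitted cookie matches the expected value.

    A bot that does not execute JavaScript never sets the cookie, so it
    arrives here with cookie_value equal to None.
    """
    if cookie_value is None:
        return False
    return hmac.compare_digest(
        expected_cookie(form_id).encode("utf-8"),
        cookie_value.encode("utf-8"),
    )


if __name__ == "__main__":
    good = expected_cookie("comment-form-1")
    print(passes_js_test("comment-form-1", good))
    print(passes_js_test("comment-form-1", None))
```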
Percentage of failing JS-test    Not spam    Spam
change cookies through JS        0.41%       68.53%
-
Conclusion
I have shown the statistics collected by our system to date. Again, for the most accurate spam/not-spam decision you need to analyze the indicators comprehensively, and in combination with other spam-checking methods.
-
-
Statistics for the personal blacklist
Dear Customers,
CleanTalk informs you that query statistics for the personal blacklist are now available in the dashboard.
You can track the number, date and time of all attempts to use forms on the website by users from your personal blacklist.
To view the statistics, go to the CleanTalk dashboard https://cleantalk.org/my/, select the website and go to Settings->Personal blacklists.
Remember that if you mark a request as Spam in the CleanTalk dashboard, the IP/email will be added to your personal blacklist.
To view existing entries, you can follow this link https://cleantalk.org/my/show_requests?int=week
If you have questions, we will be happy to answer them!
Supervise your anti-spam through the Dashboard.
-
CleanTalk Anti-Spam Released a New Version of the Spam FireWall
CleanTalk Inc, a cloud service protecting websites from spam bots, has announced the launch of a new version of the Spam FireWall, which is designed to block spam attacks on web sites.
The CleanTalk SpamFirewall manages and filters all inbound HTTP traffic to protect web sites from spam bots and to reduce the load on the web servers.
Spam FireWall blocks the most active spam bots before they get access to the web site. It prevents spam bots from loading the site’s pages, so your web server does not need to run all the scripts on those pages. It also prevents spam bots from scanning the site’s pages. As a result, Spam FireWall can significantly reduce the load on your web server. Spam FireWall also makes CleanTalk a two-step protection from spam bots: Spam FireWall is the first step and blocks the most active spam bots; CleanTalk Anti-Spam is the second step and checks all other requests at the moment comments, registrations, etc. are submitted.
How Spam FireWall works
-The visitor enters your web site.
-The HTTP request data is checked against nearly 5.8 million IP addresses of identified spam bots.
-If it is an active spam bot, it gets a blank page; if it is a visitor, it gets the site page. This is completely transparent to visitors.
-All CleanTalk Spam FireWall activity is logged during filtering.
CleanTalk’s Spam FireWall Features
-Protection from spam bots without access to the web site. Spam FireWall blocks most of the spam bots before they load the page of the website.
-Reducing the load on a web server. In order to post spam, many spam bots load the page; this puts a burden on the database and the server, and with a large number of spam attacks it can have a significant impact on the performance of the website.
-Protection against HTTP/HTTPS DDoS attacks. This is one of the most common types of DDoS attacks; its aim is to load a web server so heavily that it cannot handle other requests.
-Protection against XML-RPC attacks. This is one of the most common types of attacks on sites running WordPress; it is used to guess the username and password of the site administrator or to organize DDoS attacks. Spam FireWall’s SQL Protection provides an affordable, automated solution for protecting against a variety of SQL injection attacks.
-Spam FireWall’s logs allow you to monitor the work of the service and to report all incidents.
-Installation takes 60 seconds and does not require modification of configuration files.
-Spam FireWall is available for web sites on WordPress and Joomla
Spam bot messages (comments) are often disguised as posts by ordinary users but contain advertising links or text. The main goals of such messages are to redirect the user to a malicious resource, to advertise, or to raise the search position of the spammer’s site through links. This compromises the site and can spoil its reputation, and search engines lower the site’s position in the search results. That is why reliable protection from spam bots is the only way to prevent the undesirable effects of cyber attacks. CleanTalk provides reliable protection from attacks and spam bots and promotes stronger information security throughout the world.
CleanTalk Spam FireWall protection is based on private blacklists of IP addresses.
The main consumers are administrators and owners of web sites. The solutions offered by CleanTalk give them an effective, automated answer to many web site security problems and save time for business development.
Another area of use is hosting providers, for whom CleanTalk can reduce the load on web servers, saving resources and costs.
About CleanTalk
CleanTalk is a SaaS spam protection service for web sites. CleanTalk uses protection methods that are invisible to site visitors. Connecting to the service eliminates the need for CAPTCHA, questions and answers, and other protection methods that complicate the exchange of information on the site. Its solutions are reliable, easy and efficient. The module is completely invisible to visitors and lets you permanently abandon protection methods that impede visitors’ communication on the site. CleanTalk allows you to automate protection against spam and against registration by spam bots.
The CleanTalk team has been developing a cloud spam protection system for 4 years and has created a truly reliable anti-spam service designed to ensure your safety.
-
Manage Personal Black/White Lists
CleanTalk announces the ability to manage personal black/white lists. You can view, add, and delete their items in the Control Panel->Logs
How it works
Go to CleanTalk Dashboard
Select the web site and click “Logs”
Each record has a management menu. If you mark a record as “Spam”, its IP/Email will be added to your personal BlackList and will always be blocked on your website
If you mark record as “Not Spam” – CleanTalk will not check this IP/Email
To view personal lists click “Personal blacklists” under the record.
Here you can change the status: just click on it and it will be changed. If you delete an item, CleanTalk will check it as usual. You should delete or change the status of both the IP and the email, because if you delete only the IP, the visitor will still be blocked: their email is still in your personal blacklist.
You can also add other IPs or emails to your personal BlackLists or WhiteLists. Enter the IP or email, select the status and click Save.
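The reason both entries must be removed can be shown with a small Python sketch. The data structure, the function name, the sample addresses and the rule that a blacklist entry takes precedence over a whitelist entry are all assumptions made for illustration, not a description of the service internals.

```python
def personal_verdict(ip: str, email: str, personal_lists: dict) -> str:
    """Decide what to do with a request using hypothetical personal lists.

    personal_lists maps an IP or an email to "blacklist" or "whitelist".
    """
    statuses = {personal_lists.get(ip), personal_lists.get(email)}
    if "blacklist" in statuses:
        return "block"        # either the IP or the email is blacklisted
    if "whitelist" in statuses:
        return "skip_check"   # marked as "Not Spam": not checked
    return "check"            # no personal entry: checked as usual


if __name__ == "__main__":
    lists = {"203.0.113.7": "blacklist", "bot@example.com": "blacklist"}
    print(personal_verdict("203.0.113.7", "bot@example.com", lists))
    del lists["203.0.113.7"]  # only the IP is deleted
    print(personal_verdict("203.0.113.7", "bot@example.com", lists))
```

In this sketch the second call still returns "block", because the email remains in the blacklist.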
-
How to protect your WordPress site against spam and spam bots
There are many plugins to protect against spam, and almost all of them have some disadvantages. In our view, the optimal choice is the cloud service CleanTalk.
Since this is a cloud service that obtains and analyzes data from over 100,000 web sites, CleanTalk protects against spam very effectively. Algorithms that analyze the behavior of spam bots increase the efficiency of the service up to 99.998%. It is one of the fastest anti-spam plugins and does not load the server or the database.
To start using CleanTalk on your WordPress site, follow these steps:
Go to WordPress Dashboard->Plugins->Add New and in the search bar, type CleanTalk and click Install.
Activate the plugin and go to the CleanTalk settings.
To connect the plugin to the service, you’ll need your Access key. To get the key, click the “Get access key” button.
You will be taken to the CleanTalk website. You can change the email used to register for the service.
Push the button and get your access key.
Return to the plugin settings, insert the access key and click “Save Changes”. The installation and configuration of the plugin are complete; changes in Advanced Settings are needed only in rare cases.
To test the plugin, log out of the administrator account and go to your website. Write a test review or make a test registration with the e-mail *@cl*******.org; these messages will be blocked.
Next, you should get a message about the blocking
Great, your website is protected from spam bots!
Similarly, you can check any form on your website.
Additional CleanTalk features: Dashboard, viewing logs.
To view the service logs, go to the CleanTalk Dashboard. Or log in to your WordPress Dashboard->Settings->CleanTalk and click “Click here to get anti-spam statistics”
If you have any questions you can always contact us. We will be happy to help you.
For more info
-
Short statistics of six months 2015
We decided to sum up the interim results for the first six months of 2015.
These are the results observed for spam attacks.
The number of spam attacks prevented on sites, by CMS:
WordPress 241 753 033
Joomla 35 511 344
phpBB 4 019 935
Drupal 2 633 882
In the first six months of 2015 CleanTalk prevented 331 395 138 spam attacks.
We also observe a reduction of about 40-50% in the average number of spam attacks per day per site compared to 2014. Perhaps this is because spam bots that cannot post spam on a site do not want to spend more time and resources on it.
-
Spam FireWall – how to reduce CPU usage on website and to block DDoS attacks
The CleanTalk SpamFirewall manages and filters all inbound HTTP traffic to protect web sites from spam bots and to reduce the load on the web servers.
CleanTalk has an advanced option, “Spam FireWall”, for WordPress and Joomla!. This option blocks the most active spam bots before they get access to the web site. It prevents spam bots from loading the site’s pages, so your web server does not need to run all the scripts on those pages. It also prevents spam bots from scanning the site’s pages.
As a result, Spam FireWall can significantly reduce the load on your web server.
Spam FireWall also makes CleanTalk a two-step protection from spam bots: Spam FireWall is the first step and blocks the most active spam bots; CleanTalk Anti-Spam is the second step and checks all other requests at the moment comments, registrations, etc. are submitted.
How Spam FireWall works
• The visitor enters your web site.
• The HTTP request data is checked against nearly 5.8 million IP addresses of identified spam bots.
• If it is an active spam bot, it gets a blank page; if it is a visitor, it gets the site page. This is completely transparent to visitors.
All CleanTalk Spam FireWall activity is logged during filtering. The logs will be available for viewing in the CleanTalk Dashboard from 10/15/2015.
Spam FireWall DDos Protection
Spam FireWall can mitigate HTTP/HTTPS DDoS attacks, in which an intruder uses GET/POST requests to attack your website. Spam FireWall blocks all requests from bad IP addresses: your website serves the offender a special page instead of the site’s pages. As a result, Spam FireWall can help reduce CPU usage on your server.
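The idea of answering blacklisted addresses before any page is built can be sketched in Python as follows. This is only an illustration under my own assumptions, using the standard `ipaddress` module: the function names, the sample networks and the empty string used as the blank page are hypothetical, and the real Spam FireWall works with CleanTalk’s private blacklists.

```python
import ipaddress
from typing import Callable, Iterable, List, Tuple


def load_blacklist(entries: Iterable[str]) -> List:
    """Parse single IPs or CIDR ranges into network objects."""
    return [ipaddress.ip_network(entry) for entry in entries]


def handle_request(remote_addr: str, blacklist: List,
                   render_page: Callable[[], str]) -> Tuple[bool, str]:
    """Return (blocked, body) for a request.

    render_page is called only for addresses that are not blacklisted,
    so the site's own scripts never run for a blocked bot.
    """
    address = ipaddress.ip_address(remote_addr)
    if any(address in network for network in blacklist):
        return True, ""  # blank page for the bot
    return False, render_page()


if __name__ == "__main__":
    blacklist = load_blacklist(["203.0.113.0/24", "198.51.100.23"])
    print(handle_request("203.0.113.9", blacklist, lambda: "<html>site page</html>"))
    print(handle_request("192.0.2.10", blacklist, lambda: "<html>site page</html>"))
```

Because the page is rendered lazily, a flood of requests from blacklisted addresses costs the server only the address lookup, which is the source of the CPU savings described above.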