Category: WordPress

  • New anti-spam checks for WordPress, XenForo, phpBB 3.1, SMF, Bitrix

    We are pleased to announce that we have released new versions of our plugins for WordPress, XenForo, phpBB 3.1, SMF, and Bitrix.

    The new versions add several new spam checks to improve the anti-spam service.

    Mouse tracking and time zone monitoring give good results against spam bots that simulate the behavior of real visitors.

    These checks will be added to the plugins for other CMSs soon.

    Please update your anti-spam plugins to the latest version:

    WordPress
    XenForo
    phpBB 3.1
    Simple Machines Forum
    Bitrix

  • Breeding Business: from ordinary blog to extraordinary magazine

    A geek at heart, I have always coded little projects on localhost and built a few websites that failed. I guess I never really took the Internet seriously.

    Then I realized that the jobs I was doing in luxury hospitality were not making me happy. I just loved coming back home and writing, developing and designing. It’s just what I love. So I started looking at opportunities to generate a very small income that could make a website sustainable. And I had zero money to invest.

    In recent years, WordPress and blogging have been a huge hit, and a lot of people go for it. Many of them think about monetization before they have thought about their content; I took it the other way around.

    Why Blogging About Dog Breeding?

    When I set my mind to starting a blog, I looked at the usual ways of finding the perfect “keyword”, “topic”, or “niche”. These include Google Keyword Planner, Google Trends, and some paid software. I ended up with three topics that were seemingly searched for and that I was happy to write posts on.

    Then I picked the best topics and started writing. That is when I realized I couldn’t write about anything other than what I truly loved: responsible and ethical dog breeding. I was writing one article after another. It just felt right.

    Breeding dogs has been running through several generations of my family, and although I haven’t done it extensively myself, I am passionate about canine genetics and the mechanisms that give you the best bloodline of all.

    Dog breeding is a passion of mine and it would be hard for me not to write about it.

    What Is Breeding Business?

    Breeding Business was born after I wrote a few articles. At the time I was promoting my articles in Facebook Groups (and eventually got suspended!) because Google wasn’t sending me enough traffic at first.

    The website consists of a lot of articles published in different categories: how-tos, interviews with breeders, reviews of dog breeding supplies, and of course in-depth articles on how to breed dogs.

    After just a few weeks, some visitors started asking which books we recommended. Unfortunately, most books are either too narrow in their topics or too breed-specific. A dog is a dog, and the principles remain the same for a Chihuahua or a Rottweiler.

    Therefore, we created our very own ebook, The Dog Breeder’s Handbook. We made it in iBooks Author, a free application built by Apple, because at the time I didn’t know whether the ebook was going to be a hit or a miss. I like to be in motion, try things, and if they fail, move on to the next one.

    The Dog Breeder’s Handbook offers all the theoretical knowledge dog breeders need and a lot of actionable tips for them to put into practice. The launch was slow because traffic was low, but the book was still generating a few hundred dollars every month. This is what kept me going and made me believe in it even more.

    From then on, I decided to add another product many visitors were hinting at: a WordPress plugin for dog breeders. I built it in a few weeks, and today it is a very good seller. I release updates based on user feedback and have a similar project to be released soon.

    Challenges When Growing a Simple Blog Into an Online Magazine

    When you work alone and see the traffic (and revenue) growing, questions start to pop into your mind.

    It’s time for some business decisions

    A blogger and solo entrepreneur always strives for steady growth. I do not identify with the mega-growth startups we read about everywhere. To each their own!

    With Breeding Business, growth has been great, especially since Google started sending traffic our way. We didn’t follow any specific strategy; we just put out great content. Often.

    Yet we’re still asking ourselves a million questions…

    • Should I add another product or should I focus and grow these?
    • Communities around blogs are all the hype; should I start one?
    • Is the traffic growth normal or too slow?
    • Subscriptions are so popular these days, but what to offer?

    These are business decisions to make. I added another product: a course. It never took off, mainly because it largely duplicated what was in the ebook. We’re thinking about a new use for courses in the future because I could see people were interested.

    Communities are great, but there is nothing worse than a dead forum, so we never took that risk; we are waiting to have a bigger email list before perhaps one day launching a community. Subscriptions are great but just not for us right now. A lot of blogs start charging a monthly or yearly fee for members to be part of a special club, but most of them see huge churn and give up on that model after a few months.

    Growth requires a technical overhaul, too

    Our traffic has been growing very well thanks to search engines. This is why we needed a quality anti-spam solution, and CleanTalk has been doing a sublime job of keeping fake user accounts and comments away.

    With traffic growth comes a whole new set of questions:

    • Why am I not converting more visitors into opt-ins or customers?
    • GTmetrix and page speed tests are giving me low scores; how can I optimize my website?
    • Why do so many people read one article and leave?

    These are technical issues that truly take time to fix. There are two main ways we could tackle them:

    1. Patch each little issue one by one
    2. Build a brand new website from scratch with these issues factored in

    For a few months, we patched issues one by one, but today I am almost finished with a brand-new version of the website, to be released in two or three months after extensive testing. We’re also pairing the new website with a move from cloud hosting to a VPS (a tenfold increase in the monthly hosting cost…).

    Restructure the tree of information

    Our current website was set up when we had around 20-30 articles. We have over 300 articles today. People aren’t visiting other pages because the information is badly structured and they can’t find their way around.

    Categories are being completely revamped. Sections we thought would attract a lot of people ended up being graveyards, and vice versa. So we’re cleaning up the way posts are categorized and tagged while updating old pages as well.

    Speed and page load

    Google apparently uses your website’s loading speed as a ranking signal. My website currently performs very poorly in terms of page load speed.

    And these results come after several fixes here and there. So speed is the second main focus of the update. We’re also making sure the website loads much faster on mobile devices thanks to wp_is_mobile(), the WordPress function that detects mobile devices: we load lower-quality images and fewer widgets.

    Another WordPress optimization is using the Transients API for our most repeated and complex queries, such as our top menu, footer, home page queries, related posts, etc. It works simply: the API stores cached data in the database temporarily, with an expiration time. Instead of retrieving the full menu at each page load, a transient requires only a single database call to fetch the menu.
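    The Transients API itself is PHP (set_transient() and get_transient() in WordPress). As a language-neutral illustration of the get-or-compute pattern it enables, here is a minimal Perl sketch in which an in-memory hash stands in for the database; the names set_cached, get_cached and remember are hypothetical and are not part of WordPress.

```perl
use strict;
use warnings;

# Hypothetical in-memory store standing in for the database table that
# WordPress uses for transients (illustration only).
my %store;

# Save a value together with its expiry time (seconds from now).
sub set_cached {
    my ($key, $value, $ttl) = @_;
    $store{$key} = { value => $value, expires => time() + $ttl };
    return;
}

# Return the cached value, or undef if it is missing or has expired.
sub get_cached {
    my ($key) = @_;
    my $entry = $store{$key};
    return undef unless $entry;
    if (time() >= $entry->{expires}) {
        delete $store{$key};    # discard stale data, like an expired transient
        return undef;
    }
    return $entry->{value};
}

# Get-or-compute: call the expensive builder only on a cache miss.
sub remember {
    my ($key, $ttl, $builder) = @_;
    my $value = get_cached($key);
    return $value if defined $value;
    $value = $builder->();
    set_cached($key, $value, $ttl);
    return $value;
}

# Usage: the "menu query" runs once; the second call is served from the cache.
my $builds     = 0;
my $build_menu = sub { $builds++; return [ 'Home', 'Articles', 'Shop' ] };
my $first  = remember('top_menu', 3600, $build_menu);
my $second = remember('top_menu', 3600, $build_menu);
print "menu built $builds time(s)\n";
```

    The design choice is the same as with transients: the expensive query runs only on a miss, and the expiry time bounds how stale the cached menu can get.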

    Add new UX features

    The new version of Breeding Business brings its own set of new UX features: more AJAX calls and fewer page refreshes, more white space, and easier scrolling through the entire page. We’ve also decluttered the article footer so that our calls to action catch visitors’ eyes.

    Conclusion is… One man can only do so much!

    Everything written here is what I do daily: article writing, support emails, plugin updates, website updates, email outreach, designing illustrations, social media promotion, bookkeeping and accounting, strategizing and long-term planning, etc. And I’m not helping myself by adding a new recurring item to our upcoming version: biweekly giveaways!

    Over the last weeks, I realized how stupid it is to rely on your own self only. It’s self-destructive and counterproductive. I genuinely believe that delegating any of these tasks will result in a loss of quality and will cost me money.

    Yet, I have to leave my ego at the door and put some faith in other people. Sure, I may work with some disappointing people at first but it is also my duty to teach them how I want them to work.

    This is my focus for 2017 — learn how to surround myself with the right people (or person) to free some time for me to focus on what I do best.


    About the author

    Lazhar is the founder of Breeding Business, a free online magazine educating responsible dog breeders all around the world through in-depth dog breeding articles, interviews, ebooks and comprehensive guides.

  • What is AMP (Accelerated Mobile Pages)? How to set up CleanTalk for AMP

    What is AMP?

    Accelerated Mobile Pages (AMP) is a framework for creating static-content web pages that load almost instantly on mobile devices. It consists of three parts:

    1. AMP HTML: HTML with some restrictions for reliable performance and some extensions for building rich content.
    2. AMP JS: a library that ensures fast rendering of pages. Third-party JavaScript is not allowed.
    3. Google AMP Cache: a proxy-based content delivery network for delivering all valid AMP documents. It fetches AMP HTML pages, caches them, and improves page performance automatically.

    Advantages

    • Lightweight versions of standard web pages that load at high speed.
    • Instant loading of multimedia content: videos, animations, graphics.
    • Identical encoding: the same fast-rendered website content on different devices.
    • The AMP project is open source, which enables free information sharing and contribution of ideas.
    • A possible SEO advantage, as page load speed is one of the ranking factors.
    • There are plugins for popular CMSs that make it easier to use AMP on your website.

    How to use it in WordPress

    When choosing an AMP plugin, keep the following in mind:

    • Integration with an SEO plugin for attaching the corresponding metadata.
    • Analytics gathering, with traffic tracking for your AMP pages.
    • Ad display, if you are a publisher.

    Available plugins in the WordPress catalog:

    1. AMP by Automattic
    2. Facebook Instant Articles & Google AMP Pages by PageFrog
    3. AMP – Accelerated Mobile Pages
    4. AMP Supremacy
    5. Custom AMP (requires installed AMP by Automattic)

    As an example, let’s install and activate AMP by Automattic and create a new post with multimedia content. Note that it has to be a post, not a page: pages and archives are not currently supported.

    The AMP by Automattic plugin automatically converts your post into an accelerated version, so you don’t have to duplicate it yourself. Just add /amp/ (or ?amp=1) to the end of the post’s URL.
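    A minimal Perl sketch of deriving the AMP address of a post, assuming the two URL forms mentioned above: /amp/ for pretty permalinks and amp=1 as a query parameter otherwise. The helper name amp_url is hypothetical, and appending &amp=1 to a URL that already has a query string is our assumption about how the plugin treats plain permalinks.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the AMP URL of a post from its permalink.
sub amp_url {
    my ($url) = @_;
    # Plain permalinks such as https://example.com/?p=123 already carry a
    # query string, so add amp=1 as one more parameter (assumption).
    return "$url&amp=1" if $url =~ /\?/;
    # Pretty permalinks: make sure there is a trailing slash, then
    # append the amp/ endpoint.
    $url .= '/' unless $url =~ m{/\z};
    return $url . 'amp/';
}

print amp_url('https://example.com/my-first-post/'), "\n";
print amp_url('https://example.com/?p=123'), "\n";
```

    Fragment identifiers (#...) are not handled in this sketch; strip them before building the AMP URL.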

    How to setup CleanTalk for AMP

    Please make sure that the option “Use AJAX for JavaScript check” is disabled, as it prevents regular JavaScript execution.

    The option is here:

    WordPress Admin Page → Settings → CleanTalk, and uncheck SpamFireWall.

    Then click Advanced settings → disable “Use AJAX for JavaScript check” → Save Changes.

    Other options do not interfere with AMP posts. The CleanTalk Anti-Spam plugin will protect all data submission fields that are rendered after the conversion.

    For now, most AMP plugins remove the ability to post comments and submit contact form data on accelerated pages.

    Google validation

    Now you need to validate your website’s structured data with Google’s validator, the Structured Data Testing Tool:

    https://search.google.com/structured-data/testing-tool/

    If you skip this step, the search bot will simply ignore your post and no one will see it in the search results.

    Copy and paste the link to your AMP post, check the result, and fix the problems the tool points out.

    After that, the AMP version of your post is ready to use.

    Links

    AMP project:
    https://www.ampproject.org/

    AMP blog:
    https://amphtml.wordpress.com/

    AMP plugins in the WordPress catalog:
    https://wordpress.org/plugins/search.php?q=AMP

    Google Search recommendations of how to create accelerated mobile pages:
    https://support.google.com/webmasters/answer/6340290?hl=en

  • How to reduce the risk of brute-force attacks on WordPress

    Until CleanTalk launched its security plugin, I didn’t pay much attention to the security of my WordPress admin account and relied only on the complexity of the password.

    The most dangerous case is when bots use brute force to guess the password to the site’s administrator account. This can lead to very serious problems, as the attacker gets full access to the administrator account: malicious code can be added to your website, and the site can be added to a botnet and take part in other attacks or spread viruses. The consequences for your reputation can be severe.

    When the security plugin was launched, I began to receive reports from it with statistics on failed login attempts to the WordPress admin account. There were 4 to 25 such attempts every day, from different IP addresses. These were bots trying to guess the password.

    What I noticed:

    1. Bots knew my login and were trying to guess its password.
    2. I do not use the default username admin; I changed it.
    3. The blog has other admin accounts, but over several days of observation there were no attempts to break into them.

    How did the bots find out my account name, and why didn’t they try to hack the other administrator accounts? Quite simply: I publish posts and write comments under my account, while the other accounts were made for employees, the host, and other people who only work in the website’s dashboard.

    Based on this, I realized that bots find out logins by parsing pages, since many site owners publish posts and comments from the admin account.

    For example, when you publish a blog post, the author link will look like this: http://example.com/author/admin***/. Bots scan the code of all pages of your website looking for links of this type and collect the links to all accounts.

    The same thing happens if you write a comment from the admin account, except that the link looks slightly different: http://example.com/members/admin***/
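    To see what your own pages reveal, you can look for the same two link patterns a bot would. The Perl sketch below is a minimal illustration: it scans HTML for /author/NAME/ and /members/NAME/ links and collects the account names. The sample markup and the function name exposed_logins are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect account slugs exposed through author and
# member profile links, the two URL patterns described above.
sub exposed_logins {
    my ($html) = @_;
    my %found;
    while ($html =~ m{/(?:author|members)/([^/"'\s?#]+)/}g) {
        $found{$1}++;    # count how often each slug appears
    }
    return \%found;
}

# Sample markup of a post page (made up for illustration).
my $page = join "\n",
    '<a href="http://example.com/author/siteowner/">Posted by siteowner</a>',
    '<a href="http://example.com/members/siteowner/">siteowner</a> replied',
    '<a href="http://example.com/author/guestwriter/">guestwriter</a>';

my $logins = exposed_logins($page);
print "$_ ($logins->{$_} link(s))\n" for sort keys %$logins;
```

    In WordPress the slug in author links is the user’s nicename, which by default is derived from the login, so any match for an administrator account is worth investigating.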

    Even if you published a post or comment from the admin account only once, bots will find it and try to crack the account.

    I have described one possible way of obtaining a list of accounts to attack; there may be others. But experience has shown that if the WordPress administrator account is not used to publish posts and comments on the website, bots do not know about it.

    What to do to minimize the chance of the website’s administrator account being hacked:

    1. Do not publish posts or comments from the administrator account.
    2. Create an account with another role, such as Author or Editor, for each administrator. It all depends on your needs.
    3. Replace the current administrator user. Attention! Back up your website and databases first. I can’t recommend this step; if you take it, you do so at your own risk, as it may lead to undesirable consequences.

    To do this, create a new user with administrator rights and a user with another role, such as Author. Log in to the dashboard with the new administrator account and check that it can manage the site, settings, and users.

    Go to “Users” and delete the previous admin account. WordPress will ask you whom to attribute its content to; this is where the Author user you created in advance comes in handy. Reassign the content to it and use that account to publish posts and comments from now on.

    The same can be done for other administrator accounts. But most WordPress users would rather install one of the plugins that protect against brute-force attacks, such as Security & Firewall by CleanTalk.
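    Brute-force protection commonly works by limiting failed logins. Here is a minimal Perl sketch of that general idea, assuming a hypothetical rule of at most 5 failures per IP address within 15 minutes; the thresholds and function names are illustrative and are not taken from CleanTalk’s plugin.

```perl
use strict;
use warnings;

my $max_failures = 5;          # hypothetical threshold
my $window       = 15 * 60;    # hypothetical sliding window, in seconds
my %failures;                  # IP address => list of failure timestamps

# Record a failed login and keep only the timestamps inside the window.
sub record_failure {
    my ($ip, $now) = @_;
    my @recent = grep { $now - $_ < $window } @{ $failures{$ip} || [] };
    push @recent, $now;
    $failures{$ip} = \@recent;
    return;
}

# An IP is locked out once it reaches the threshold within the window.
sub is_locked_out {
    my ($ip, $now) = @_;
    my @recent = grep { $now - $_ < $window } @{ $failures{$ip} || [] };
    return @recent >= $max_failures ? 1 : 0;
}

# Usage: a bot guessing passwords from one address every 10 seconds.
my $t0 = 1_000_000;
record_failure('203.0.113.7', $t0 + 10 * $_) for 0 .. 5;
print is_locked_out('203.0.113.7', $t0 + 60)        ? "locked\n" : "allowed\n";
print is_locked_out('203.0.113.7', $t0 + 60 + 3600) ? "locked\n" : "allowed\n";
```

    Per-IP limits alone do little against attempts spread across many addresses, like the 4 to 25 daily attempts from different IPs described above, which is one more reason not to expose the administrator login in the first place.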

  • CleanTalk launches a project to ensure the safety of websites

    CleanTalk is launching a major project to create a cloud service for website security. The project will include several functions: protection against brute-force attacks, a vulnerability scanner, and virus removal.

    Each function will have a number of features that help you easily keep your website safe from hackers.


  • SpamFireWall – blocking spambots’ access to the site

    Every website owner or webmaster faces the scourge of spam in comments and contact forms, as well as registrations by spambots posing as users. The site’s forms have to process these messages, which consumes server resources. Some spam bots load the page first in order to bypass anti-spam protection, which consumes even more resources. In small amounts this is imperceptible, but when a website receives thousands of such requests per day, it can significantly affect the server’s CPU load.

    Now we will tell you about a new option in the CleanTalk anti-spam plugin that effectively repels spambot attacks on your website. The option is called SpamFireWall (SFW): it blocks POST and GET requests from the most active spambots and does not let them load the server.

    How it works

    1. The user visits the website.
    2. The visitor’s IP address is checked against a database containing records of more than two million IP addresses that belong to spambots.
    3. If the IP address is in the database, the site displays a special page. Ordinary users will not notice anything, as the protection works invisibly.
    4. All information about the process is stored in the database and is available in the dashboard.
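    Steps 2 to 4 can be sketched in Perl as a set lookup plus a log. This is an illustration under assumptions, not SpamFireWall’s code: the blocklist here is a small in-memory hash of documentation-range addresses, whereas the real database holds millions of entries maintained by CleanTalk, and the names sfw_check and %sfw_log are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical local copy of spambot IP addresses (documentation ranges).
my %spam_ips = map { $_ => 1 } qw(203.0.113.7 198.51.100.23);

my %sfw_log;    # hypothetical log: IP address => requests shown the special page

# Decide what to serve for a visitor: the normal page or the special page.
sub sfw_check {
    my ($ip) = @_;
    if ($spam_ips{$ip}) {
        $sfw_log{$ip}++;    # step 4: keep the event for the dashboard
        return 'special_page';
    }
    return 'pass';          # ordinary visitors see nothing unusual
}

for my $ip ('192.0.2.10', '203.0.113.7', '203.0.113.7') {
    printf "%-12s -> %s\n", $ip, sfw_check($ip);
}
```

    A hash gives constant-time lookups, which matters when every request to the site passes through this check.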

    The special page displayed on suspected spam activity costs little time for users who see it by mistake: after 3 seconds they are taken to the requested page automatically, or sooner if they click the link.

    This blocks all HTTP/HTTPS traffic from spam-active IP addresses. Thus, besides spam, these IP addresses can no longer carry out other types of attacks on the website: brute force, DDoS, SQL injection, site scanning by spambots, referral spam, etc.

    SpamFireWall lets users configure their own blacklists, to which you can add both individual IP addresses and whole networks.
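    Matching an address against a network entry in CIDR notation (for example 203.0.113.0/24) rather than a single IP comes down to comparing the network bits. A minimal IPv4-only Perl sketch using the core Socket module; the blacklist entries and helper names are hypothetical.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Convert a dotted IPv4 address to a 32-bit integer (undef if invalid).
sub ip_to_int {
    my ($ip) = @_;
    my $packed = inet_aton($ip);
    return undef unless defined $packed;
    return unpack('N', $packed);
}

# True if $ip lies inside $entry, which is either "a.b.c.d" or "a.b.c.d/len".
sub ip_in_entry {
    my ($ip, $entry) = @_;
    my ($net, $len) = split m{/}, $entry;
    $len = 32 unless defined $len;
    my ($ip_int, $net_int) = (ip_to_int($ip), ip_to_int($net));
    return 0 unless defined $ip_int && defined $net_int;
    # Build the netmask; a /0 entry matches every address.
    my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
    return (($ip_int & $mask) == ($net_int & $mask)) ? 1 : 0;
}

# Hypothetical personal blacklist: one network and one single address.
my @blacklist = ('203.0.113.0/24', '198.51.100.23');

sub is_blacklisted {
    my ($ip) = @_;
    for my $entry (@blacklist) {
        return 1 if ip_in_entry($ip, $entry);
    }
    return 0;
}

print "$_: ", (is_blacklisted($_) ? "blocked" : "allowed"), "\n"
    for qw(203.0.113.99 198.51.100.23 198.51.100.24);
```

    Note that inet_aton also accepts host names and resolves them via DNS, so only literal addresses should be passed in; IPv6 would need a different helper.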

    Currently, SpamFireWall is available for WordPress, Joomla, Drupal, Bitrix, SMF, MediaWiki, and IPS Community Suite. In addition, you can use an API method to get a list of spam-active networks (https://cleantalk.org/help/api-spam-check).

    SFW request logging

    All requests that triggered SFW are stored in a log and are then available in the control dashboard.

    In the statistics you can see the number of blocked requests, as well as the number of requests that were blocked but then still reached the site. At the moment, the SFW database contains 3.22 million IP addresses. Over 7 days, from 3 to 10 May, SFW blocked 3,858,562 requests.

    About the service CleanTalk

    CleanTalk is a cloud service that protects websites from spam bots. CleanTalk uses protection methods that are invisible to website visitors. This allows you to abandon protection methods that require users to prove they are human (captcha, question-answer, etc.).

  • Changing the title of the WordPress plugin

    We changed the old title of the WordPress plugin, “Anti-Spam by CleanTalk”, to the new “Spam protection by CleanTalk”. Don’t worry: we just want to test how people perceive long and short titles.

  • Non-visual methods to protect the site from spam. Part 3. Repeats

    Continuation of the article Non-visual methods to protect the site from spam

    Part 3: Repeats of substrings

    As mentioned earlier, non-visual methods of protecting a site against spam rely on text analysis. One of the most common spam signals is the presence of repeated strings. As always, these examples are taken from actual CleanTalk data.

    The search for such repeats should use as few resources as possible. It is best run after the checks from the first and second parts of this article, which eliminate obvious spam and bring the text into a form suitable for analysis. Below I give sample code as well as some statistics.

    1. Sample code

    We use a function that finds the longest repeated substrings with the naive algorithm described here: http://algolist.manual.ru/search/lrs/naive.php

    Example output for the message 'sale for sale for sale' is shown below.

     s  a  l  e     f  o  r     s  a  l  e     f  o  r     s  a  l  e
    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21

    s  0   +  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .  .  .
    a  1   .  +  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .  .
    l  2   .  .  +  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .
    e  3   .  .  .  +  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +
    4   .  .  .  .  +  .  .  .  +  .  .  .  .  +  .  .  .  +  .  .  .  .
    f  5   .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .
    o  6   .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .
    r  7   .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .  .  .  .  .
    8   .  .  .  .  .  .  .  .  +  .  .  .  .  +  .  .  .  +  .  .  .  .
    s  9   .  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .  .  .
    a 10   .  .  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .  .
    l 11   .  .  .  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +  .
    e 12   .  .  .  .  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .  .  +
    13   .  .  .  .  .  .  .  .  .  .  .  .  .  +  .  .  .  +  .  .  .  .
    f 14   .  .  .  .  .  .  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .  .
    o 15   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  +  .  .  .  .  .  .
    r 16   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  +  .  .  .  .  .
    17   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  +  .  .  .  .
    s 18   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  +  .  .  .
    a 19   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  +  .  .
    l 20   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  +  .
    e 21   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  +

    $VAR1 = {
    'sale' => 3,
    'for sale' => 2
    };
    

    Here is the function in Perl with minimal changes. For convenience, this is the full program, including the code that prints the matrix above.

```perl
#!/usr/bin/perl -w

use strict;
use utf8;
use Data::Dumper;

binmode(STDOUT, ':utf8');

my $min_longest_repeat_length = 4;

my $message = 'sale for sale for sale';
my %longest_repeates = ();

get_longest_repeates(\$message, \%longest_repeates);
print Dumper(\%longest_repeates);

sub get_longest_repeates {
    my $test_ref = shift;    # Reference to the text to analyse
    my $reps_ref = shift;    # Reference to the result hash

    my @symbols = split //, $$test_ref;
    my $m_len = scalar @symbols;

    my @matrix = ();    # Square matrix of symbol matches

    # Fill the matrix to the right of the main diagonal
    for (my $i = 0; $i < $m_len; $i++) {    # Rows
        $matrix[$i] = [];
        for (my $j = $i; $j < $m_len; $j++) {    # Columns to the right of the main diagonal only
            $matrix[$i][$j] = 1 if $symbols[$i] eq $symbols[$j];
        }
    }

    # Analyse the diagonals to the right of the main diagonal and fill in the results
    my %repeats_tmp = ();    # Hash of repeats: substring => { position => 1 }
    my ($i, $j);
    # Walk the diagonals from right to left, i.e. from short to long repeats
    for ($i = $m_len - 1; $i > 0; $i--) {
        my $repeat = '';
        my $repeat_pos = undef;
        my $repeat_temp;

        for ($j = $i; $j < $m_len; $j++) {
            if (defined($matrix[$j-$i][$j]) && $matrix[$j-$i][$j] == 1) {
                $repeat_temp = $repeat;
                $repeat_temp =~ s/^ //;
                # If the repeat collected so far is already in the hash of repeats
                if (defined($repeats_tmp{$repeat_temp})) {
                    $repeat_pos = $j - length($repeat_temp);
                    $repeats_tmp{$repeat_temp}{$repeat_pos} = 1;
                    $repeat = $symbols[$j];    # start a new repeat from this symbol
                } else {
                    $repeat .= $symbols[$j];
                }
            } else {
                # The run on this diagonal ended: store it if it is long enough
                if ($repeat ne '') {
                    $repeat =~ s/^ //;
                    $repeat_pos = $j - length($repeat);
                    if (length($repeat) >= $min_longest_repeat_length) {
                        if (defined($repeats_tmp{$repeat})) {
                            $repeats_tmp{$repeat}{$repeat_pos} = 1;
                        } else {
                            $repeats_tmp{$repeat} = {$repeat_pos => 1};
                        }
                    }
                    $repeat = '';
                }
            }
        }
        # A run that reaches the end of the diagonal
        if ($repeat ne '') {
            $repeat =~ s/^ //;
            $repeat_pos = $j - length($repeat);
            if (length($repeat) >= $min_longest_repeat_length) {
                if (defined($repeats_tmp{$repeat})) {
                    $repeats_tmp{$repeat}{$repeat_pos} = 1;
                } else {
                    $repeats_tmp{$repeat} = {$repeat_pos => 1};
                }
            }
            $repeat = '';
        }
    }

    # Count = number of stored positions plus one for the first occurrence
    foreach (keys %repeats_tmp) {
        $$reps_ref{$_} = 1 + scalar keys %{$repeats_tmp{$_}};
    }

    # Print the matrix for diagnostics
    print "\n";
    print ' ' x 5;    # align with the row labels below
    for (my $i = 0; $i < $m_len; $i++) {
        print '  ' . $symbols[$i];
    }
    print "\n";
    print ' ' x 5;
    for (my $i = 0; $i < $m_len; $i++) {
        printf '%3d', $i;
    }
    print "\n";
    print "\n";
    for (my $i = 0; $i < $m_len; $i++) {
        print $symbols[$i];
        printf '%3d ', $i;
        for (my $j = 0; $j < $m_len; $j++) {
            my $value = '.';
            $value = '+' if (defined $matrix[$i][$j] && $matrix[$i][$j] == 1);
            printf('  %1s', $value);
        }
        print "\n";
    }
    print "\n";
}
```

    2. Statistics of repeats

    We selected the minimum repeat length threshold (I do not give the exact value here) that gave the best efficiency in our tests. The distribution of the number of repeats is as follows:

    | Number of repeats | In spam, % | In non-spam, % |
    |---|---|---|
    | 2 | 78.58 | 90.28 |
    | 3 | 11.93 | 4.86 |
    | 4 | 4.45 | 2.08 |
    | 5 | 2.30 | 1.39 |
    | 6 | 1.93 | 0 |
    | 7 | 0.22 | 0 |
    | 8 | 0.37 | 0 |
    | 9 | 0.07 | 0 |

    3. Conclusion

    I have shown an implementation of the naive algorithm for finding repeated substrings in text. Both the number of repetitions and the repeated strings themselves (e.g., compared against stop words) can be used in the analysis. I repeat that, in the fight against spam, combined checks are more effective.
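    As a sketch of how the output of get_longest_repeates could feed a decision, the Perl code below takes a hash of the same shape (substring => count), reports the highest count, and checks the repeats against a stop-word list. The threshold of 6 is our reading of the table above, where 6 or more repeats occurred only in spam; the stop words and the function name are hypothetical, and such a signal should be combined with other checks.

```perl
use strict;
use warnings;

# Threshold read from the table above: 6+ repeats occurred only in spam.
my $max_repeats_threshold = 6;
# Hypothetical stop-word list, for illustration only.
my @stop_words = ('for sale', 'cheap', 'casino');

# Hypothetical helper: turn a hash of repeats (substring => count), the
# shape produced by get_longest_repeates, into simple spam signals.
sub repeat_signals {
    my ($reps_ref) = @_;
    my $max_count = 0;
    my @stop_hits;
    for my $repeat (keys %$reps_ref) {
        my $count = $reps_ref->{$repeat};
        $max_count = $count if $count > $max_count;
        # Record repeated substrings that contain a stop word.
        for my $word (@stop_words) {
            if (index(lc($repeat), $word) >= 0) {
                push @stop_hits, $repeat;
                last;    # one hit per repeat is enough
            }
        }
    }
    return {
        max_count  => $max_count,
        too_many   => ($max_count >= $max_repeats_threshold ? 1 : 0),
        stop_words => [ sort @stop_hits ],
    };
}

# The result from the example above.
my $signals = repeat_signals({ 'sale' => 3, 'for sale' => 2 });
print "max repeats: $signals->{max_count}\n";
print "too many repeats: $signals->{too_many}\n";
print "stop-word repeats: @{ $signals->{stop_words} }\n";
```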

    Learn more about CleanTalk Anti-Spam.


  • Non-visual methods to protect the site from spam. Part 2. The true face of symbols

    Continuation of the article Non-visual methods to protect the site from spam

    Part 2: The true face of symbols

    Non-visual methods of protecting a website from spam rely, in particular, on analysis of the submitted text. Spammers use many techniques to complicate this analysis. Here I show examples of one of them: symbol substitution. The examples are taken from actual CleanTalk data.

    Symbol substitution is very simple, but it can stop stop-word filters from working and make Bayesian filters and language-detection filters work worse. Therefore, before applying these filters, it makes sense to restore the symbols’ true face.

    Let me say right away that replacing symbols directly, for example turning a national letter that looks like the Latin ‘a’ into the Latin ‘a’ itself, is totally unacceptable without analyzing the language and context. Likewise, replacing letters that look like zero with zero is only possible when you know exactly what you are looking for in the text (for example, telephone numbers).

    However, replacing characters is acceptable when the meaning of the text is preserved after the change, and it is necessary in order to reduce certain sets of special symbols to a single one.

    Here I will show two of the most interesting symbol substitution methods we have encountered.

    1. Restoring the normal typeface

    Spammers do everything to make their text conspicuous, even at a cursory glance. Fortunately for them, Unicode provides sets of Latin characters in extended typefaces. Fortunately for us, this is easily corrected.

    The most common method substitutes Latin characters with the same Latin letters taken not from the main range of the Latin alphabet but from these extended ranges.

    Converting these characters back to ordinary Latin takes a simple regular expression. After this change, subsequent filters work better and faster, because the range of input characters is greatly narrowed.
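    The author’s own regular expression is not given. As an alternative sketch using a different technique, Perl’s core Unicode::Normalize module can fold many such typeface variants back to plain Latin with compatibility normalization (NFKC): Unicode defines, for example, MATHEMATICAL BOLD CAPITAL B and FULLWIDTH LATIN CAPITAL LETTER B as compatibility variants of B. NFKC also rewrites other characters (ligatures, superscript digits and so on), so treat it as a pre-filter step, and note that it leaves look-alike letters from other scripts, such as Cyrillic, untouched, in line with the warning above.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC);

binmode(STDOUT, ':encoding(UTF-8)');

# "BUY" written with Mathematical Bold letters and with fullwidth letters.
my $bold      = "\x{1D401}\x{1D414}\x{1D418}";
my $fullwidth = "\x{FF22}\x{FF35}\x{FF39}";
# Cyrillic "а" (U+0430) only looks like Latin "a"; NFKC keeps it as is.
my $cyrillic  = "\x{0430}";

for my $text ($bold, $fullwidth, $cyrillic) {
    my $plain = NFKC($text);
    printf "%s -> %s (first code point U+%04X)\n", $text, $plain, ord($plain);
}
```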

    2. Replacing the dot

    The dot is used much more widely than as a punctuation mark: it serves as a field delimiter and as a separator in numbers, including spam phone numbers, etc.

    So we need to reduce the variety of dots used in spam to a single one.

    The most common substitutes for the dot that we have encountered are shown below.

    | Substitute, code | Substitute, view |
    |---|---|
    | U+3002 | 。 |
    | U+0701 | ܁ |
    | U+0702 | ܂ |
    | U+2024 | ․ |
    | U+FE12 | ︒ |
    | U+FE52 | ﹒ |
    | U+FF61 | ｡ |

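    The dots can be replaced with a simple Perl transliteration (tr///):

```perl
# $text holds the message being checked (illustrative variable name).
# Map each look-alike dot to an ordinary full stop (U+002E); tr/// reuses the
# last replacement character for the rest of the search list.
$text =~ tr/\x{3002}\x{0701}\x{0702}\x{2024}\x{FE12}\x{FE52}\x{FF61}/./;
```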

    We have noticed that after the dots are replaced, subsequent filters work really effectively.
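    To show why this helps, here is a small runnable Perl sketch that applies the same transliteration and then looks for a phone number written with dots. The sample message, the helper name normalize_dots, and the phone-number pattern (three, three and four digits) are hypothetical.

```perl
use strict;
use warnings;

# Replace the look-alike dots listed above with an ordinary full stop.
sub normalize_dots {
    my ($text) = @_;
    $text =~ tr/\x{3002}\x{0701}\x{0702}\x{2024}\x{FE12}\x{FE52}\x{FF61}/./;
    return $text;
}

# Hypothetical spam message: ideographic and small full stops in the number.
my $message = "Call now 555\x{3002}123\x{FE52}4567";

my $pattern = qr/\b(\d{3}\.\d{3}\.\d{4})\b/;    # hypothetical phone format
for my $text ($message, normalize_dots($message)) {
    my ($phone) = $text =~ $pattern;
    print defined $phone ? "found $phone\n" : "no phone number found\n";
}
```

    The filter pattern stays simple because it only ever has to deal with one kind of dot.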

    3. Conclusion

    I have shown two ways of substituting symbols. Reversing the substitution is simple, has low system requirements, and greatly increases the accuracy of filters based on the analysis of words and expressions.

    Learn more about CleanTalk Anti-Spam.

  • Non-visual methods to protect the site from spam. Part 1. Statistics

    Part 1. What the statistics say

    Non-visual methods of protecting a site from spam rely on automatic analysis of the data coming from the visitor. The more data is analyzed, the more fully and accurately the visitor can be characterized and the decision made on whether he is a spammer or not.

    Systems that analyze such data usually accumulate statistics on visitor data and on the verdicts made. Here is an overview of the statistics we have collected at CleanTalk, a service that protects sites from spam.

    Here I deliberately leave out the analysis of IP addresses against blacklists. Even without it, you can obtain enough data by analyzing only the contents of form fields and HTTP headers.

    I’ll review data on the message text, the nickname, the email address, the HTTP headers, and the results of a JavaScript test.

    Analysis based on these figures is algorithmically very simple and undemanding of resources, so it can be run before other, more resource-intensive checks.

    The data reflect the real picture at the time of writing and are based on our analysis of current traffic (more than 2,000,000 requests per day). You are free to use the data to analyze visitors to your own sites. Note that judging by each criterion separately is unreliable; the best results are achieved with a comprehensive analysis.

    1. Message text

    The message text is certainly the main thing in spam. Consequently, the way spammers build their posts makes them clearly different from normal messages on several criteria.

    The following table shows what are, in my view, the most informative statistics.

    | Message text parameter (average value) | Not spam | Spam |
    |---|---|---|
    | Number of links | 1.47 | 4.27 |
    | Number of contacts (phone, e-mail) | 1.72 | 6.38 |
    | Form filling time, sec | 177 | 8 |
    | Ratio of message length to filling time, symbols/sec | 23.81 | 308.54 |

    The number of links speaks for itself. The amount of contact information is also a strong hint of spam. Form filling time and, as a consequence, typing speed differ the most.
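    The four metrics in the table are cheap to compute. A minimal Perl sketch, assuming the form records when it was shown so that the filling time is known; the link, e-mail and phone patterns are deliberately rough and, like the function name message_metrics, are hypothetical rather than CleanTalk’s actual rules.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the table's metrics for one submission.
sub message_metrics {
    my ($text, $fill_seconds) = @_;
    my $links   = () = $text =~ m{https?://\S+}gi;
    my $emails  = () = $text =~ /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
    my $phones  = () = $text =~ /\+?\d[\d ().-]{6,}\d/g;    # rough phone pattern
    my $seconds = $fill_seconds > 0 ? $fill_seconds : 1;   # avoid division by zero
    return {
        links         => $links,
        contacts      => $emails + $phones,
        fill_time     => $fill_seconds,
        chars_per_sec => length($text) / $seconds,
    };
}

my $m = message_metrics(
    'Best deals http://spam.example/a http://spam.example/b mail deals@spam.example',
    4,
);
printf "links=%d contacts=%d speed=%.1f chars/sec\n",
    $m->{links}, $m->{contacts}, $m->{chars_per_sec};
```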

    2. The visitor’s nickname

    The nickname can also tell a lot. The likely reason is the quality of the name-generation algorithms spammers use.

    | Nickname parameter (average value) | Not spam | Spam |
    |---|---|---|
    | Length, symbols | 7.40 | 16.52 |
    | Number of delimiters | 1.89 | 3.80 |
    | Number of digits | 3.29 | 7.59 |
    | Longest run of consecutive consonants (Latin), symbols | 3.61 | 5.90 |

    One of a spammer’s tasks is to avoid the error that a user with the same name already exists on the site. According to the statistics, spammers currently ensure unique nicknames head-on: by length and by inserting delimiters and digits. As a result, you get many nicknames with long runs of adjacent vowels and consonants, mostly consonants.
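    The nickname metrics from the table are easy to compute. A minimal Perl sketch; which characters count as delimiters (here dot, underscore and hyphen) and treating y as a consonant are our assumptions, and the function name name_metrics is hypothetical. The same function can be applied to the part of an e-mail address before the @.

```perl
use strict;
use warnings;

# Hypothetical helper: metrics from the nickname table for one name.
sub name_metrics {
    my ($name) = @_;
    my $delimiters = () = $name =~ /[._-]/g;    # assumed delimiter set
    my $digits     = () = $name =~ /\d/g;
    # Longest run of Latin consonants (y treated as a consonant).
    my $longest = 0;
    while ($name =~ /([b-df-hj-np-tv-z]+)/gi) {
        $longest = length($1) if length($1) > $longest;
    }
    return {
        length        => length($name),
        delimiters    => $delimiters,
        digits        => $digits,
        consonant_run => $longest,
    };
}

for my $name ('anna_k', 'xkrtvq.m-88.d_3017') {
    my $r = name_metrics($name);
    printf "%-20s len=%d delim=%d digits=%d consonants=%d\n",
        $name, @$r{qw(length delimiters digits consonant_run)};
}

# Usage with an e-mail address: analyse only the local part.
my ($local) = split /\@/, 'xkrtvq.m-88@example.com';
print "local part: $local\n";
```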

    3. Name in the e-mail address

    Everything said about nicknames also holds for the name part of the e-mail address.

    | E-mail name parameter (average value) | Not spam | Spam |
    |---|---|---|
    | Length, symbols | 10.09 | 19.16 |
    | Number of delimiters | 1.62 | 4.12 |
    | Number of digits | 4.30 | 9.57 |

    Note that the dot is often used as the delimiter: a character string is generated and dots are then inserted into it at random, which yields many distinct e-mail names.

    4. HTTP headers

    Spam bots forge their headers so that they do not differ much from those of a browser.

    However, the statistics show that this is often true only at the time the bot was written. Later the bot keeps working and sends clearly outdated headers, as the table below shows.

    | User-Agent header (share of requests) | Not spam | Spam |
    |---|---|---|
    | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) | 0.01% | 11.42% |
    | Opera/9.80 (Windows NT 6.2; Win64; x64) Presto/2.12.388 Version/12.17 | 0.01% | 10.84% |

    Ready-made spam solutions may also leave their own headers, in particular when using an HTTP proxy. This is also reflected in our statistics.

    | Via header (share of requests) | Not spam | Spam |
    |---|---|---|
    | Mikrotik HttpProxy | 0.86% | 33.07% |
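    A header check like this can be a few string comparisons. A minimal Perl sketch that uses the two User-Agent values and the Via value from the tables above as signatures; the scoring (one point per hit), the function name header_score and the sample header values are hypothetical. A match only raises suspicion: even these strings occur in a small share of legitimate traffic.

```perl
use strict;
use warnings;

# Outdated User-Agent strings from the table above, matched exactly.
my %suspicious_ua = map { $_ => 1 } (
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
    'Opera/9.80 (Windows NT 6.2; Win64; x64) Presto/2.12.388 Version/12.17',
);

# Hypothetical helper: count header signals for one request.
sub header_score {
    my (%headers) = @_;
    my $score = 0;
    my $ua  = $headers{'User-Agent'} // '';
    my $via = $headers{'Via'}        // '';
    $score++ if $suspicious_ua{$ua};
    $score++ if $via =~ /Mikrotik HttpProxy/i;    # proxy signature from the table
    return $score;
}

# Made-up request headers for illustration.
print header_score(
    'User-Agent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
    'Via'        => '1.1 192.0.2.1 (Mikrotik HttpProxy)',
), "\n";
```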

    5. JavaScript test

    Another simple but very effective check is a JavaScript test: for example, JS code on the page sets a required cookie. There are many possible variations.

    The most advanced (and expensive) bots pass JS tests. However, as the statistics show, a large share of spam comes from very simple programs that cannot do so.

    | JS test failure rate | Not spam | Spam |
    |---|---|---|
    | Cookie change through JS | 0.41% | 68.53% |
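    A minimal sketch of the server side of such a test, assuming the page embeds a token that its JavaScript copies into a cookie, so a client that does not run JavaScript never sends it. The cookie name js_check, the secret, and the HMAC-over-timestamp token format are hypothetical illustrations, not CleanTalk’s implementation; Digest::SHA is a core Perl module.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $secret = 'change-me';    # hypothetical server-side secret

# Token the page's JavaScript is expected to store in a cookie (hypothetical format).
sub expected_token {
    my ($form_time) = @_;
    return "$form_time:" . hmac_sha256_hex($form_time, $secret);
}

# Parse a Cookie header and check the hypothetical "js_check" cookie.
sub passes_js_test {
    my ($cookie_header, $form_time) = @_;
    my %cookies = map { split /=/, $_, 2 } split /;\s*/, ($cookie_header // '');
    my $got = $cookies{js_check};
    return 0 unless defined $got;    # no cookie: JavaScript did not run
    return $got eq expected_token($form_time) ? 1 : 0;
}

my $form_time = 1_700_000_000;    # when the form was rendered
my $browser   = 'lang=en; js_check=' . expected_token($form_time);
my $bot       = 'lang=en';
print "browser: ", passes_js_test($browser, $form_time), "\n";
print "bot: ",     passes_js_test($bot, $form_time),     "\n";
```

    Signing the token means a bot cannot simply send a fixed cookie value; a fuller version would also reject old timestamps.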

    6. Conclusion

    I have shown the statistics collected by our system so far. Again, for the most accurate spam/not-spam decision you need to analyze the indicators comprehensively, together with other spam-checking methods.

    Learn more about CleanTalk Anti-Spam.