recaptcha Archives - CleanTalk's blog

Who is this article for?

We’ve been closely following the thread https://github.com/google/recaptcha/issues/235 and noticed that, despite being closed, users continue to report issues.

We’ve decided to investigate the problem and share our findings with you.

How ReCaptcha v3 works
What is a score
Why you might get a score other than 0.9 in ReCaptcha v2
Why you always get a score of 0.9 in ReCaptcha v3
Our testing process
How to get an accurate score in a test environment
CleanTalk’s solutions

Research Objective

Users complain that when testing ReCaptcha v3, they always receive the same score of 0.9. However, in the same environments with ReCaptcha v2, the score varies.

What is a Score?

The score is the result of the ReCaptcha check. The closer it is to 1, the more likely the visitor is human. The closer it is to 0, the more likely the visitor is a bot.

How ReCaptcha v3 Works

Note: The following findings are based on publicly available code and our interpretation.

A user integrates the ReCaptcha script on a form page.
A unique frontend token is added to each form.
The script loads additional obfuscated code.
The obfuscated code collects frontend data (a “black box” not accessible due to Google’s code obfuscation).
Aggregated and encoded data + frontend token is sent to Google’s cloud to get a result token.
The result token is sent to the backend of the testing environment.
The backend validates the token via Google’s API, sending the backend token, result token, and the visitor’s IP address.
Based on the score result, the backend environment can decide whether to allow the visitor to proceed.

The backend environment decides whether to allow the visitor to proceed based on the score.

We believe ReCaptcha v3 relies on machine learning based on the traffic environment. The exact decision-making algorithms are proprietary and remain a trade secret of Google.

Why You Get Score <> 0.9 in ReCaptcha v2

ReCaptcha v2 does not use machine learning for decision-making.
It operates in one of two modes:

in the user interaction mode (presence of click-the-flag mechanism on the page).
In silent mode (reCaptcha v2 badge on the page).

The data collection and processing occur in real time, allowing for accurate, immediate results. Learn more: https://developers.google.com/recaptcha/docs/versions.

Why You Always Get a Score = 0.9 in ReCaptcha v3:

ReCaptcha v3 relies on machine learning based on traffic data.
A consistent score of 0.9 indicates the system lacks sufficient data about your typical traffic to make an accurate decision. To avoid false positives, the system grants a 0.9 score to all visitors until trained.

Our Testing Process

Test Environment

A PHP website running WordPress 6.2.
ReCaptcha v3 integrated according to instructions.

Bot

A simple bot created in Python using Selenium.

The bot was run from three IP addresses, emulating the following parameters

headless
user agents
headers
clicks
form submissions

Process

The bot ran for 24 hours, performing sequential visits and form submissions with random parameters.

No live traffic was sent to the site.

Results

All bot requests returned a score of 0.9.
The score did not change over time.
No statistics appeared in Google Analytics.
We hypothesize that traffic presence, volume, and quality in Google Analytics may act as a training marker for the ReCaptcha system.

How to Get an Accurate Score in a Test Environment

The recaptcha v3 model assumes long-lasting training on live traffic.

This means that the test environment must be loaded in the same way as the production environment. Which will undoubtedly cause some difficulties in deploying such an environment and getting the payload.

We believe that to get the right score a user will have to turn to testing in a productive environment.

However, the policy of most companies we know of (including CleanTalk of course) restricts any testing in a production environment.

Unfortunately, we couldn’t find specific terms for the duration of training in Google’s official documentation. We believe that the duration of training depends on the following parameters:

Traffic load
Ratio of bots to real users
Percentage of “intelligent” bots among total bot traffic

Without live traffic, no settings or configurations will yield an accurate score in a test environment.

CleanTalk’s Solutions

CleanTalk Check Bot

Decisions are made online without machine learning.
Simpler integration—no need to manually add tokens to forms.
Extensive documentation available: GitHub CleanTalk API
Immediate and relevant testing results.
Technical support response within 24 hours.

Anti-Spam SAAS for CMS

CleanTalk provides a cloud-based anti-spam service for websites, blocking spam in real time without CAPTCHAs. It integrates with CMS platforms like WordPress and Joomla, securing comments, registrations, and contact forms. Features include SpamFireWall to block spambots, email validation, and detailed logs, ensuring seamless protection and improved user experience.

Anti-Spam CleanTalk API

CleanTalk offers a suite of APIs that integrate anti-spam functionalities into various applications. The Anti-Spam API includes methods like

check_newuser() for registration checks;
check_message() for evaluating comments and contact form submissions;
send_feedback() for moderator inputs.

The Database (Blacklists) API provides

spam_check() to verify IP and email records against CleanTalk’s database;
backlinks_check() to detect domains associated with spam;
the ip_info() method returns country codes for IP addresses.

For managing personal lists and uptime monitoring, the Dashboard API offers dedicated methods. These APIs enable developers to enhance their applications’ security and spam prevention capabilities effectively.

Tag: recaptcha

Recaptcha v3 always returns 0.9 score – research by CleanTalk