  • Recaptcha v3 always returns 0.9 score – research by CleanTalk

    Who is this article for?

    We’ve been closely following the thread https://github.com/google/recaptcha/issues/235 and noticed that, although the issue is closed, users continue to report problems.

    We’ve decided to investigate the problem and share our findings with you.

    • How ReCaptcha v3 works
    • What is a score
    • Why you might get a score other than 0.9 in ReCaptcha v2
    • Why you always get a score of 0.9 in ReCaptcha v3
    • Our testing process
    • How to get an accurate score in a test environment
    • CleanTalk’s solutions

    Research Objective

    Users complain that when testing ReCaptcha v3, they always receive the same score of 0.9. However, in the same environments with ReCaptcha v2, the score varies.

    What is a Score?

    The score is the result of the ReCaptcha check. The closer it is to 1, the more likely the visitor is human. The closer it is to 0, the more likely the visitor is a bot.
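    To make the score concrete, here is a minimal sketch of how a backend might turn it into an action. The function name and both thresholds (0.7 and 0.3) are our own illustrative assumptions, not values recommended by Google; each site would need to tune them against its own traffic.

```python
def classify_score(score: float, allow_at: float = 0.7, block_below: float = 0.3) -> str:
    """Map a ReCaptcha v3 score in [0.0, 1.0] to an action.

    The thresholds are hypothetical defaults chosen for illustration only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score >= allow_at:
        return "allow"      # close to 1: most likely a human
    if score < block_below:
        return "block"      # close to 0: most likely a bot
    return "challenge"      # in between: ask for extra verification
```

    With these assumed thresholds, a site that always receives 0.9 would allow every visitor, which is why a constant score is a practical problem.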

    How ReCaptcha v3 Works

    Note: The following findings are based on publicly available code and our interpretation.

    1. A user integrates the ReCaptcha script on a form page.
    2. A unique frontend token is added to each form.
    3. The script loads additional obfuscated code.
    4. The obfuscated code collects frontend data (a “black box” not accessible due to Google’s code obfuscation).
    5. The aggregated, encoded data and the frontend token are sent to Google’s cloud, which returns a result token.
    6. The result token is sent to the backend of the testing environment.
    7. The backend validates the token via Google’s API, sending the backend token, result token, and the visitor’s IP address.
    8. Based on the score result, the backend environment can decide whether to allow the visitor to proceed.
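    Steps 7 and 8 can be sketched as follows, using only the Python standard library. We assume that the “backend token” above corresponds to the `secret` parameter of Google’s `siteverify` endpoint and the “result token” to its `response` parameter. The helper names and the 0.5 threshold are our own choices for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_payload(secret: str, token: str, remote_ip: Optional[str] = None) -> bytes:
    """Encode the form fields sent to the siteverify endpoint (step 7)."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # the visitor's IP address is optional
    return urllib.parse.urlencode(fields).encode("ascii")


def is_allowed(result: dict, min_score: float = 0.5) -> bool:
    """Decide whether to let the visitor proceed (step 8).

    min_score is a hypothetical threshold, not an official recommendation.
    """
    return bool(result.get("success")) and float(result.get("score", 0.0)) >= min_score


def verify_token(secret: str, token: str, remote_ip: Optional[str] = None,
                 timeout: float = 5.0) -> dict:
    """POST the token to Google and return the parsed JSON response."""
    request = urllib.request.Request(
        VERIFY_URL,
        data=build_verify_payload(secret, token, remote_ip),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

    A backend would call `is_allowed(verify_token(secret, token, ip))` once per form submission. The decision is kept separate from the network call so that it can be tested without contacting Google.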

    We believe ReCaptcha v3 relies on machine learning based on the traffic environment. The exact decision-making algorithms are proprietary and remain a trade secret of Google.

    Why You Get a Score Other Than 0.9 in ReCaptcha v2

    ReCaptcha v2 does not use machine learning for decision-making.
    It operates in one of two modes:

    1. In user interaction mode (an “I’m not a robot” checkbox on the page).
    2. In silent mode (a ReCaptcha v2 badge on the page).

    The data collection and processing occur in real time, allowing for accurate, immediate results. Learn more: https://developers.google.com/recaptcha/docs/versions.

    Why You Always Get a Score of 0.9 in ReCaptcha v3

    ReCaptcha v3 relies on machine learning based on traffic data.
    In our interpretation, a consistent score of 0.9 indicates that the system lacks sufficient data about your typical traffic to make an accurate decision. To avoid false positives, it appears to assign a score of 0.9 to all visitors until it has been trained.

    Our Testing Process

    Test Environment

    • A PHP website running WordPress 6.2.
    • ReCaptcha v3 integrated according to instructions.

    Bot

    A simple bot created in Python using Selenium.

    The bot was run from three IP addresses and varied the following parameters:

    • headless
    • user agents
    • headers
    • clicks
    • form submissions

    Process

    The bot ran for 24 hours, performing sequential visits and form submissions with random parameters.
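    A bot of this kind can be sketched as below. This is not the exact script we ran. The user agent strings, the CSS selectors, and the function names are illustrative assumptions, and custom request headers are left out because plain Selenium does not set them directly. Selenium is imported inside `run_visit` so that the parameter generator can be used without it.

```python
import random

# Hypothetical pool of user agents; a real run would use a larger, current list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
]


def random_visit_params(rng: random.Random) -> dict:
    """Pick the random parameters for a single visit."""
    return {
        "headless": rng.random() < 0.5,
        "user_agent": rng.choice(USER_AGENTS),
    }


def run_visit(url: str, params: dict, form_selector: str = "form") -> None:
    """Open the page, click into a field, type a message, and submit the form."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    if params["headless"]:
        options.add_argument("--headless=new")
    options.add_argument(f"--user-agent={params['user_agent']}")

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        form = driver.find_element(By.CSS_SELECTOR, form_selector)
        field = form.find_element(By.CSS_SELECTOR, "textarea, input[type='text']")
        field.click()
        field.send_keys("test message")
        form.submit()
    finally:
        driver.quit()  # always release the browser, even if a step fails
```

    Calling `run_visit(url, random_visit_params(random.Random()))` in a loop reproduces the sequential visits described above.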

    No live traffic was sent to the site.

    Results

    • All bot requests returned a score of 0.9.
    • The score did not change over time.
    • No statistics appeared in Google Analytics.
      We hypothesize that traffic presence, volume, and quality in Google Analytics may act as a training marker for the ReCaptcha system.
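    A check for this kind of flat result can be sketched as follows, assuming the scores returned by the backend were logged as a list of floats. The function name is ours, and any sample input is hypothetical rather than our raw data.

```python
from collections import Counter
from typing import Sequence


def summarize_scores(scores: Sequence[float]) -> dict:
    """Report how many scores were logged and whether they are all identical."""
    if not scores:
        raise ValueError("no scores were logged")
    counts = Counter(scores)
    return {
        "count": len(scores),
        "distinct": sorted(counts),    # every distinct score value seen
        "is_flat": len(counts) == 1,   # True when every request got the same score
    }
```

    A run in which `is_flat` stays true over a long period matches the behaviour users describe in the GitHub thread.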

    How to Get an Accurate Score in a Test Environment

    The ReCaptcha v3 model appears to assume long-term training on live traffic.

    This means that the test environment must carry the same load as the production environment, which makes such an environment difficult to deploy and to supply with realistic traffic.

    We believe that to get a meaningful score, a user will have to test in a production environment.

    However, the policy of most companies we know of (including CleanTalk of course) restricts any testing in a production environment.

    Unfortunately, we couldn’t find a stated training duration in Google’s official documentation. We believe that the duration of training depends on the following parameters:

    • Traffic load
    • Ratio of bots to real users
    • Percentage of “intelligent” bots among total bot traffic

    Based on our tests, without live traffic no settings or configurations will yield an accurate score in a test environment.

    CleanTalk’s Solutions

    CleanTalk Check Bot

    • Decisions are made online without machine learning.
    • Simpler integration—no need to manually add tokens to forms.
    • Extensive documentation available: GitHub CleanTalk API
    • Immediate and relevant testing results.
    • Technical support response within 24 hours.

    Anti-Spam SaaS for CMS

    CleanTalk provides a cloud-based anti-spam service for websites, blocking spam in real time without CAPTCHAs. It integrates with CMS platforms like WordPress and Joomla, securing comments, registrations, and contact forms. Features include SpamFireWall to block spambots, email validation, and detailed logs, ensuring seamless protection and improved user experience.

    Anti-Spam CleanTalk API

    CleanTalk offers a suite of APIs that integrate anti-spam functionalities into various applications. The Anti-Spam API includes methods like

    • check_newuser() for registration checks;
    • check_message() for evaluating comments and contact form submissions;
    • send_feedback() for moderator inputs.
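    As a rough sketch, a `check_message` request body could be built as below. The endpoint URL, the field names, and the `allow` response field follow our reading of the public CleanTalk API documentation and should be treated as assumptions to verify against the current docs. The helper names are hypothetical.

```python
import json

# Assumed endpoint; confirm against the current CleanTalk API documentation.
CLEANTALK_URL = "https://moderate.cleantalk.org/api2.0"


def build_check_message(auth_key: str, sender_ip: str, sender_email: str,
                        message: str, js_on: int = 1, submit_time: int = 0) -> bytes:
    """Build the JSON body for a check_message call (field names are assumed)."""
    body = {
        "method_name": "check_message",
        "auth_key": auth_key,
        "sender_ip": sender_ip,
        "sender_email": sender_email,
        "message": message,
        "js_on": js_on,              # 1 if JavaScript ran in the visitor's browser
        "submit_time": submit_time,  # seconds the visitor spent filling in the form
    }
    return json.dumps(body).encode("utf-8")


def is_message_allowed(response: dict) -> bool:
    """Interpret the response, assuming an integer `allow` field (1 means allowed)."""
    return int(response.get("allow", 0)) == 1
```

    The body would be sent as an HTTP POST to `CLEANTALK_URL`. Sending is left out here so that the sketch stays independent of any HTTP client.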

    The Database (Blacklists) API provides

    • spam_check() to verify IP and email records against CleanTalk’s database;
    • backlinks_check() to detect domains associated with spam;
    • ip_info() to return country codes for IP addresses.

    For managing personal lists and uptime monitoring, the Dashboard API offers dedicated methods. These APIs enable developers to enhance their applications’ security and spam prevention capabilities effectively.