
Author: Vitalii

  • How to Reduce Server Load by Simply Filtering Bad Traffic


    When a website starts slowing down, many teams immediately think about scaling infrastructure: adding CPU, RAM, more servers, or optimizing the database. In reality, a significant part of server load is often caused not by real users, but by automated traffic — bots, scrapers, vulnerability scanners, spam robots, and aggressive crawlers.

    These requests continuously scan pages, submit forms, probe URLs, and hammer search, login, registration, and API endpoints. As a result, your server wastes resources processing useless traffic: PHP workers stay occupied, database connections pile up, memory is consumed, and response times degrade for legitimate visitors.


    Why Blocking Traffic Early Matters

    Once a malicious or unwanted request reaches backend logic, some resources have already been spent. That is why one of the most effective ways to reduce load is to block suspicious traffic as early as possible, before expensive application code runs.
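    To make this concrete, here is a minimal sketch of an early-exit filter placed at the top of a front controller, before the application bootstraps. The blocklist source (a plain array here) and the function name are illustrative assumptions, not part of any specific library:

    ```php
    <?php
    // Hypothetical early-exit filter: runs before autoloading and
    // framework bootstrap, so a blocked request costs almost nothing.

    function shouldBlockEarly(string $ip, array $blocklist): bool
    {
        return in_array($ip, $blocklist, true);
    }

    // At the very top of index.php, before the application loads.
    // Example addresses from the documentation range 203.0.113.0/24.
    $blocklist = ['203.0.113.7', '198.51.100.23'];
    $ip = $_SERVER['REMOTE_ADDR'] ?? '';

    if (shouldBlockEarly($ip, $blocklist)) {
        http_response_code(403);
        exit; // no PHP worker time spent on application code
    }
    ```

    In a real deployment the list would come from an IP reputation feed rather than a hard-coded array, but the principle is the same: the cheap check runs first, and expensive code runs only for traffic that passes it.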

    Even basic filtering can provide immediate benefits:

    • lower CPU usage;
    • fewer PHP worker bottlenecks;
    • reduced MySQL load;
    • faster page responses for real users;
    • better stability during traffic spikes;
    • cleaner analytics data.

    A Practical Solution for PHP Projects — Anti-Crawler PHP Library

    For PHP-based websites and services, a useful option is CleanTalk php-anticrawler — an open-source Anti-Crawler PHP Library designed to detect and filter unwanted bot traffic.

    It can be integrated into PHP applications as an additional protection layer without requiring a major architecture rebuild.

    Real Usage Example: CleanTalk.org

    The library has already been successfully connected to the CleanTalk website in the Blacklists section. Over the last 60 days, the system processed a large volume of traffic and showed clear filtering results:

    • BLOCKED: 1,791,250 requests
    • LEGITIMATE: 460,502 requests

    This means a significant share of unwanted automated traffic was stopped before it could consume backend resources. The traffic chart for this period shows clearly how early filtering reduces unnecessary server load.
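    The figures above work out to roughly four blocked requests for every legitimate one:

    ```php
    <?php
    // Blocked share over the 60-day window, from the figures above.
    $blocked = 1791250;
    $legitimate = 460502;

    $share = $blocked / ($blocked + $legitimate) * 100;
    printf("Blocked: %.1f%% of all requests\n", $share); // roughly 79.5%
    ```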

    What the Anti-Crawler PHP Library Can Help With

    The library helps detect and limit suspicious requests using signals such as:

    • IP address reputation;
    • abnormal request frequency;
    • bot-like behavior patterns;
    • technical signs of automated clients;
    • aggressive crawling activity.
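    The "abnormal request frequency" signal can be sketched as a sliding-window counter per IP. The function name, threshold, and window below are illustrative, not the library's actual API:

    ```php
    <?php
    // Sliding-window rate check: keep recent request timestamps per IP
    // and flag the client once it exceeds the limit within the window.

    function isRateLimited(array &$log, string $ip, int $limit, int $windowSec, int $now): bool
    {
        // Drop timestamps that fell outside the window, then record this hit.
        $log[$ip] = array_values(array_filter(
            $log[$ip] ?? [],
            fn (int $t): bool => $t > $now - $windowSec
        ));
        $log[$ip][] = $now;

        return count($log[$ip]) > $limit;
    }

    $log = [];
    // Five requests in the same second from one IP, limit of 3 per 10 s:
    $flagged = false;
    for ($i = 0; $i < 5; $i++) {
        $flagged = isRateLimited($log, '198.51.100.9', 3, 10, 1000);
    }
    // $flagged is now true: the client exceeded 3 requests in the window
    ```

    A production implementation would persist the counters in shared storage rather than a request-local array, but the detection logic has the same shape.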

    It is especially useful for protecting:

    • login pages;
    • registration forms;
    • contact forms;
    • search pages;
    • REST/API endpoints;
    • resource-heavy pages.

    Business Benefits

    Many companies try to solve load issues by upgrading servers or paying for more infrastructure. But if a large share of requests has no business value, reducing useless traffic is often the smarter first step.

    Filtering bad traffic can help:

    • lower hosting and infrastructure costs;
    • reduce downtime and overload incidents;
    • improve website speed and uptime;
    • increase conversion rates through faster UX;
    • clean traffic reports and analytics.

    Best Use Cases

    This approach is highly effective for:

    • eCommerce stores;
    • SaaS platforms;
    • WordPress and other PHP CMS websites;
    • lead generation websites;
    • public API services.

    Final Thoughts

    Not every performance issue requires more servers. In many cases, the first step should be identifying how much of your resources are wasted on useless automated traffic.

    For PHP projects, the Anti-Crawler PHP Library by CleanTalk can be a practical way to reduce backend load, improve performance, and protect your website from unwanted traffic.

  • Reducing Disk Load in High-Traffic PHP Applications: Switching from SQLite to Redis for Anti-Crawler Storage


    Automated crawlers and scraping bots are a growing problem for modern websites. While search engine bots are useful, many other crawlers generate excessive traffic, scrape content, or overload servers.

    To help website owners control this type of traffic, we recently released the Anti-Crawler PHP Library by CleanTalk, an open-source tool designed to detect and limit aggressive crawlers before they cause performance problems.

    GitHub repository: https://github.com/CleanTalk/php-anticrawler

    The library analyzes incoming requests and applies rate-limiting logic to detect crawler-like behavior. Once a bot exceeds defined limits, the system blocks or restricts further requests.

    In the first version of the library we chose SQLite as the storage backend. SQLite allowed the library to work immediately after installation without requiring additional infrastructure such as Redis or Memcached.

    However, after deploying the library on our own high-traffic website cleantalk.org, we encountered an unexpected performance issue: disk load increased significantly.

    The fix turned out to be a simple architectural change that completely removed the extra disk load while improving scalability.

    The First Version of the Anti-Crawler Library

    The goal of the library was to provide a simple crawler protection mechanism for PHP applications. Typical anti-crawler logic requires storing temporary request data. Each request updates this data so the system can determine whether a visitor behaves like a normal user or an automated crawler. Because the data must be updated frequently, the storage backend plays a critical role in overall performance.

    Why SQLite Was Chosen

    For the initial release we selected SQLite for several reasons:

    1. Zero configuration. SQLite is included in most PHP environments and does not require running an additional service.
    2. Single-file storage. All data is stored in a single database file, making installation extremely simple.
    3. Good performance for moderate workloads. SQLite performs very well for many typical web applications.
    4. Easy deployment. Users could install the library without modifying their infrastructure.

    This approach allowed the library to work immediately after installation and made it suitable for shared hosting environments. For many websites this configuration works perfectly. However, high-traffic environments behave differently.

    Deploying the Library on a High-Traffic Website

    After releasing the first version of the library, we deployed it on our own website, https://cleantalk.org. Our infrastructure handles a large volume of traffic, including both legitimate users and automated bots. Shortly after enabling the library, our monitoring systems detected something unusual: disk activity had increased noticeably. After analyzing the metrics, we found that disk load had grown by approximately 30%.

    This was unexpected because the library itself performs only lightweight operations. The problem was not CPU usage or memory consumption. Instead, the issue was directly related to disk I/O. Further investigation showed that the additional disk operations were coming from the SQLite database used by the anti-crawler system.

    Why SQLite Became a Bottleneck

    SQLite is a reliable and efficient embedded database, but its design has limitations under certain workloads. The anti-crawler system generates a very specific traffic pattern. For each HTTP request the library needs to:

    • read crawler counters
    • update request statistics
    • write the updated data back to storage

    This means the database receives frequent write operations.

    Because SQLite stores data on disk, every update results in disk activity. Under high traffic this leads to a large number of disk writes. SQLite also uses file-level locking to ensure consistency. When many requests attempt to update the database simultaneously, additional locking overhead appears.

    As a result, frequent writes combined with locking increased disk activity on our production servers.
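    The per-request write pattern can be sketched with PDO SQLite. The example uses an in-memory database for self-containment; the real library writes to a file, which is exactly what turns every counter update into disk I/O (the upsert syntax assumes SQLite 3.24+):

    ```php
    <?php
    // One write per HTTP request: read-modify-write collapsed into an
    // upsert. On a file-backed database, each execute() hits the disk
    // and takes the file lock.

    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE hits (ip TEXT PRIMARY KEY, cnt INTEGER NOT NULL)');

    $stmt = $db->prepare(
        'INSERT INTO hits (ip, cnt) VALUES (:ip, 1)
         ON CONFLICT(ip) DO UPDATE SET cnt = cnt + 1'
    );

    // Two requests from the same IP -> two writes.
    $stmt->execute([':ip' => '198.51.100.9']);
    $stmt->execute([':ip' => '198.51.100.9']);

    $cnt = (int) $db->query("SELECT cnt FROM hits WHERE ip = '198.51.100.9'")
                    ->fetchColumn();
    // $cnt is 2
    ```

    Multiply those two writes by every concurrent request on a busy site, add the file lock each one must take, and the 30% disk-load increase stops being surprising.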

    Moving the Storage Layer to Redis / KeyDB

    To eliminate disk operations we needed a storage system optimized for frequent updates. The natural solution was an in-memory data store, so we added support for: Redis and KeyDB. Both systems keep data in memory and provide extremely fast read and write operations. This approach removes disk I/O and allows the crawler detection logic to update counters much more efficiently.
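    With Redis, the same counter becomes a single atomic in-memory operation. The sketch below uses the phpredis extension's `incr` and `expire` commands; the key naming scheme and TTL are illustrative choices, not the library's actual schema:

    ```php
    <?php
    // In-memory counter on Redis/KeyDB: no disk writes, no file locks.

    function antiCrawlerKey(string $ip): string
    {
        return 'ac:hits:' . $ip;
    }

    function recordHit(Redis $redis, string $ip, int $windowSec = 60): int
    {
        $key = antiCrawlerKey($ip);
        $hits = $redis->incr($key);           // atomic increment
        if ($hits === 1) {
            $redis->expire($key, $windowSec); // the window expires on its own
        }
        return $hits;
    }

    // Usage (requires a running Redis or KeyDB server):
    // $redis = new Redis();
    // $redis->connect('127.0.0.1', 6379);
    // $hits = recordHit($redis, $_SERVER['REMOTE_ADDR'] ?? '');
    ```

    Because `INCR` is atomic on the server side, concurrent requests never contend for a lock the way they do with a shared SQLite file.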

    The Anti-Crawler PHP Library was updated to support multiple storage backends. Users can now choose between:

    • SQLite (default)
    • Redis
    • KeyDB

    SQLite remains useful for simple deployments, while Redis or KeyDB can be enabled for high-traffic environments. The crawler detection logic itself remains unchanged — only the storage backend is replaced.
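    One common way to keep detection logic unchanged while swapping storage is to hide the backend behind a small interface. This is an illustrative sketch of that idea, not the library's actual class layout:

    ```php
    <?php
    // Detection code depends only on the interface; SQLite, Redis, or
    // KeyDB adapters can be dropped in without touching it.

    interface CounterStorage
    {
        public function increment(string $key): int;
    }

    // In-memory stand-in used here so the sketch is self-contained.
    final class ArrayStorage implements CounterStorage
    {
        private array $counters = [];

        public function increment(string $key): int
        {
            $this->counters[$key] = ($this->counters[$key] ?? 0) + 1;
            return $this->counters[$key];
        }
    }

    function registerRequest(CounterStorage $storage, string $ip, int $limit): bool
    {
        return $storage->increment('hits:' . $ip) > $limit;
    }

    $storage = new ArrayStorage();
    $blocked = false;
    for ($i = 0; $i < 4; $i++) {
        $blocked = registerRequest($storage, '203.0.113.5', 3);
    }
    // $blocked is true after the fourth request
    ```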

    Results After Switching to Redis

    After switching the storage backend to Redis on our production servers we immediately saw improvements. Disk activity returned to normal because the crawler counters were now stored in memory instead of on disk. The previous 30% increase in disk load disappeared, and request processing became faster. The Redis-based architecture also scales better under heavy traffic and avoids locking issues associated with file-based databases.

    [Chart: disk I/O before and after switching the storage backend to Redis]

    When to Use SQLite vs Redis

    Both storage options remain available because they fit different environments.

    SQLite works well for:

    • small and medium websites
    • environments without Redis
    • simple installations

    Redis or KeyDB is recommended for:

    • high-traffic websites
    • infrastructure already using Redis
    • environments with heavy bot traffic

    How to Use the Anti-Crawler PHP Library

    The library is open source and available on GitHub: https://github.com/CleanTalk/php-anticrawler. It can be integrated into any PHP application to detect aggressive crawlers and limit automated traffic.

    Installation

    composer require cleantalk/php-anticrawler

    Quick start: https://github.com/CleanTalk/php-anticrawler?tab=readme-ov-file#anti-crawler-php-library-by-cleantalk

    Conclusion

    Switching the storage backend of our Anti-Crawler PHP Library from SQLite to Redis/KeyDB allowed us to eliminate the disk I/O overhead that appeared under high traffic. This small architectural change removed the 30% disk load increase and made the crawler detection system faster and more scalable for busy websites.

    On cleantalk.org, the Anti-Crawler PHP Library serves about 20k sessions weekly, which amounts to roughly 500k hits per week (about 25 hits per session).

    Anti-Crawler PHP Library by CleanTalk

    Protect your website from aggressive crawlers, automated scraping, and unwanted bot traffic using the CleanTalk Anti-Crawler PHP library.