
  • Reducing Disk Load in High-Traffic PHP Applications: Switching from SQLite to Redis for Anti-Crawler Storage

    Automated crawlers and scraping bots are a growing problem for modern websites. While search engine bots are useful, many other crawlers generate excessive traffic, scrape content, or overload servers.

    To help website owners control this type of traffic, we recently released the Anti-Crawler PHP Library by CleanTalk, an open-source tool designed to detect and limit aggressive crawlers before they cause performance problems.

    GitHub repository: https://github.com/CleanTalk/php-anticrawler


    The library analyzes incoming requests and applies rate-limiting logic to detect crawler-like behavior. Once a bot exceeds defined limits, the system blocks or restricts further requests.
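
The rate-limiting idea can be sketched as a per-IP counter inside a fixed time window. The sketch below is our own simplified illustration, not the library's actual code; the function name, limits, and data layout are made up:

```php
<?php
// Simplified per-IP rate limiter: counts requests in a fixed time
// window and flags an IP once it exceeds the allowed limit.
// Illustrative sketch only — not the library's real implementation.

function isRateLimited(array &$counters, string $ip, int $now, int $limit = 60, int $window = 60): bool
{
    // Start a new window if none exists or the previous one expired
    if (!isset($counters[$ip]) || $now - $counters[$ip]['start'] >= $window) {
        $counters[$ip] = ['start' => $now, 'hits' => 0];
    }
    $counters[$ip]['hits']++;

    // The IP is treated as a crawler once it exceeds the limit
    return $counters[$ip]['hits'] > $limit;
}

$counters = [];
$blocked = false;
for ($i = 0; $i < 100; $i++) {
    $blocked = isRateLimited($counters, '203.0.113.7', time(), 60, 60);
}
// 100 requests within one window against a limit of 60 → flagged
```

    In a real deployment the `$counters` array would live in shared storage (SQLite, Redis, or KeyDB), which is exactly the storage-backend question the rest of this post is about.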

    In the first version of the library we chose SQLite as the storage backend. SQLite allowed the library to work immediately after installation without requiring additional infrastructure such as Redis or Memcached.

    However, after deploying the library on our own high-traffic website cleantalk.org, we encountered an unexpected performance issue: disk load increased significantly.

    The fix turned out to be a simple architectural change that completely removed the extra disk load while improving scalability.

    The First Version of the Anti-Crawler Library

    The goal of the library was to provide a simple crawler protection mechanism for PHP applications. Typical anti-crawler logic requires storing temporary request data. Each request updates this data so the system can determine whether a visitor behaves like a normal user or an automated crawler. Because the data must be updated frequently, the storage backend plays a critical role in overall performance.

    Why SQLite Was Chosen

    For the initial release we selected SQLite for several reasons:

    1. Zero configuration. SQLite is included in most PHP environments and does not require running an additional service.
    2. Single-file storage. All data is stored in a single database file, making installation extremely simple.
    3. Good performance for moderate workloads. SQLite performs very well for many typical web applications.
    4. Easy deployment. Users could install the library without modifying their infrastructure.

    This approach allowed the library to work immediately after installation and made it suitable for shared hosting environments. For many websites this configuration works perfectly. However, high-traffic environments behave differently.
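
The zero-configuration property is visible in a few lines: the PDO SQLite driver bundled with PHP creates the single database file on first connection, with no server to run. This is a generic sketch; the real library's schema and file location may differ:

```php
<?php
// SQLite needs no server: PHP's bundled PDO SQLite driver creates
// the single database file on the first connection.
// Generic sketch only — the real library's schema may differ.

$file = sys_get_temp_dir() . '/anticrawler-demo.sqlite';
@unlink($file); // start fresh for the demo
$db = new PDO('sqlite:' . $file);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE IF NOT EXISTS counters (ip TEXT PRIMARY KEY, hits INTEGER NOT NULL)');

// Each incoming request performs an upsert: insert the IP or bump its counter
$stmt = $db->prepare(
    'INSERT INTO counters (ip, hits) VALUES (:ip, 1)
     ON CONFLICT(ip) DO UPDATE SET hits = hits + 1'
);
$stmt->execute([':ip' => '203.0.113.7']);
$stmt->execute([':ip' => '203.0.113.7']);

$hits = (int) $db->query("SELECT hits FROM counters WHERE ip = '203.0.113.7'")->fetchColumn();
```

    Note that the upsert in this pattern runs once per HTTP request, which is what makes the write frequency of the storage backend matter so much under load.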

    Deploying the Library on a High-Traffic Website

    After releasing the first version of the library, we deployed it on our own website, https://cleantalk.org. Our infrastructure handles a large volume of traffic, including both legitimate users and automated bots. Shortly after enabling the library, our monitoring systems detected something unusual: a noticeable increase in disk activity. After analyzing the metrics, we found that disk load had increased by approximately 30%.

    This was unexpected because the library itself performs only lightweight operations. The problem was not CPU usage or memory consumption. Instead, the issue was directly related to disk I/O. Further investigation showed that the additional disk operations were coming from the SQLite database used by the anti-crawler system.

    Why SQLite Became a Bottleneck

    SQLite is a reliable and efficient embedded database, but its design has limitations under certain workloads. The anti-crawler system generates a very specific traffic pattern. For each HTTP request the library needs to:

    • read crawler counters
    • update request statistics
    • write the updated data back to storage

    This means the database receives frequent write operations.

    Because SQLite stores data on disk, every update results in disk activity. Under high traffic this leads to a large number of disk writes. SQLite also uses file-level locking to ensure consistency. When many requests attempt to update the database simultaneously, additional locking overhead appears.

    As a result, frequent writes combined with locking increased disk activity on our production servers.
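
The locking behavior is easy to reproduce with two PDO connections to the same database file: while one connection holds a write transaction, a second writer is rejected with "database is locked". This is a standalone demonstration of SQLite's file-level locking, not code from the library:

```php
<?php
// Demonstrates SQLite's file-level write locking: while one
// connection holds a write transaction, another writer is rejected.

$file = sys_get_temp_dir() . '/anticrawler-lock-demo.sqlite';
@unlink($file);

$writerA = new PDO('sqlite:' . $file);
$writerB = new PDO('sqlite:' . $file);
$writerA->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$writerB->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$writerB->exec('PRAGMA busy_timeout = 0'); // fail immediately instead of waiting

$writerA->exec('CREATE TABLE counters (ip TEXT PRIMARY KEY, hits INTEGER)');

$writerA->exec('BEGIN IMMEDIATE'); // acquire the database write lock
$locked = false;
try {
    $writerB->exec("INSERT INTO counters VALUES ('203.0.113.7', 1)");
} catch (PDOException $e) {
    $locked = true; // SQLITE_BUSY: "database is locked"
}
$writerA->exec('COMMIT');
```

    Under production traffic, many PHP workers play the role of `$writerB` at once, so this contention turns into retries, waiting, and extra disk activity.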

    Moving the Storage Layer to Redis / KeyDB

    To eliminate disk operations we needed a storage system optimized for frequent updates. The natural solution was an in-memory data store, so we added support for Redis and KeyDB. Both systems keep data in memory and provide extremely fast reads and writes. This approach removes disk I/O from the hot path and lets the crawler detection logic update counters much more efficiently.
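
With Redis, the per-request counter collapses into an atomic INCR plus a TTL that expires the window automatically. The sketch below is our own illustration: `allowRequest`, the key prefix, and the in-memory stand-in class are made up. The function only relies on `incr()`/`expire()` methods, so a phpredis `\Redis` client would slot in the same way; the stand-in exists only so the example runs without a server:

```php
<?php
// Redis-style counter: one atomic INCR per request plus a TTL that
// expires the window automatically. $client only needs incr()/expire(),
// so a real phpredis \Redis instance works the same way as this
// in-memory stand-in (used here so the example runs without a server).

class InMemoryCounterStore
{
    private array $data = [];

    public function incr(string $key): int
    {
        return $this->data[$key] = ($this->data[$key] ?? 0) + 1;
    }

    public function expire(string $key, int $ttl): bool
    {
        return true; // no-op in the stand-in; real Redis sets the key's TTL
    }
}

function allowRequest(object $client, string $ip, int $limit = 60, int $window = 60): bool
{
    $key = 'ac:' . $ip;                 // hypothetical key prefix
    $hits = $client->incr($key);        // atomic, in memory: no disk I/O
    if ($hits === 1) {
        $client->expire($key, $window); // the window resets itself via TTL
    }
    return $hits <= $limit;
}

$store = new InMemoryCounterStore();
$allowed = true;
for ($i = 0; $i < 61; $i++) {
    $allowed = allowRequest($store, '203.0.113.7');
}
// the 61st request exceeds a limit of 60
```

    Because INCR is atomic on the server side, concurrent PHP workers never contend on a file lock, which is the scalability gain described below.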

    The Anti-Crawler PHP Library was updated to support multiple storage backends. Users can now choose between:

    • SQLite (default)
    • Redis
    • KeyDB

    SQLite remains useful for simple deployments, while Redis or KeyDB can be enabled for high-traffic environments. The crawler detection logic itself remains unchanged — only the storage backend is replaced.

    Results After Switching to Redis

    After switching the storage backend to Redis on our production servers we immediately saw improvements. Disk activity returned to normal because the crawler counters were now stored in memory instead of on disk. The previous 30% increase in disk load disappeared, and request processing became faster. The Redis-based architecture also scales better under heavy traffic and avoids locking issues associated with file-based databases.

    When to Use SQLite vs Redis

    Both storage options remain available because they fit different environments.

    SQLite works well for:

    • small and medium websites
    • environments without Redis
    • simple installations

    Redis or KeyDB is recommended for:

    • high-traffic websites
    • infrastructure already using Redis
    • environments with heavy bot traffic

    How to Use the Anti-Crawler PHP Library

    The library is open source and available on GitHub: https://github.com/CleanTalk/php-anticrawler. It can be integrated into any PHP application to detect aggressive crawlers and limit automated traffic.

    Installation

    composer require cleantalk/php-anticrawler

    Quick start: https://github.com/CleanTalk/php-anticrawler?tab=readme-ov-file#anti-crawler-php-library-by-cleantalk

    Conclusion

    Switching the storage backend of our Anti-Crawler PHP Library from SQLite to Redis/KeyDB allowed us to eliminate the disk I/O overhead that appeared under high traffic. This small architectural change removed the 30% disk load increase and made the crawler detection system faster and more scalable for busy websites.

    Anti-Crawler PHP Library by CleanTalk

    Protect your website from aggressive crawlers and automated scraping.

    GitHub: https://github.com/CleanTalk/php-anticrawler