Our Magento 2 shop (Magento 2.4.1) runs with Redis configured. We also have a scheduled task that monitors the shop through the health check: it requests www.magento2shop.com/health_check.php and verifies that the status code is 200.
The whole setup actually works quite well, except when the cache is flushed!
By flushing the cache I mean one of the following actions:
- Admin > Cache-Management > Click on "Flush Magento Cache"
- Admin > Cache-Management > Select Configuration cache > Click on "Submit"
- CLI: bin/magento cache:clean
- Or CLI: bin/magento cache:clean config
Then the following behavior is observed:
- The CPU utilization of the server suddenly reaches 100% (normally it is around 5%)
- The PHP-FPM logs contain many entries like the following:
[pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 44 idle, and 252 total children
[pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 49 idle, and 258 total children
...
[pool www] server reached pm.max_children setting (300), consider raising it
- At that point the server usually crashes and is no longer available, causing our nginx server to respond with a 502 error (Bad Gateway)
- The number of connections to the Redis server also increases (from about 10 to 1000), but this is not a problem for the Redis server, because it can handle up to 10,000 connections (or something like this)
What seems to be the problem is:
- When the cache is flushed, Magento connects to the Redis server, either to flush the Redis cache or to rebuild it.
- Meanwhile, the health check also tries to connect to Redis. This can be observed by running
redis-cli monitor
which produces output like the following:
... "HGET" "zc:k:xxx_TEST_CACHE_ID" "m"
... "HGET" "zc:k:xxx_TEST_CACHE_ID" "m"
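To see how much of the traffic comes from the health check, the monitor output can be tallied per command and key. The following is only a rough sketch: the function name count_commands is made up for illustration, and it assumes each monitor line contains the command and its arguments as double-quoted strings, as in the lines above.

```python
import re
from collections import Counter

# Matches double-quoted fields, allowing backslash-escaped characters inside.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def count_commands(lines):
    """Count (command, key) pairs found in `redis-cli monitor` output lines."""
    counts = Counter()
    for line in lines:
        fields = QUOTED.findall(line)
        if not fields:
            continue  # skip lines without a quoted command, e.g. the initial "OK"
        command = fields[0].upper()
        key = fields[1] if len(fields) > 1 else ""
        counts[(command, key)] += 1
    return counts


# The two sample lines from the post; real input would be the captured monitor output.
sample = [
    '... "HGET" "zc:k:xxx_TEST_CACHE_ID" "m"',
    '... "HGET" "zc:k:xxx_TEST_CACHE_ID" "m"',
]
for (command, key), n in count_commands(sample).most_common(5):
    print(n, command, key)
```

If the TEST_CACHE_ID key dominates the tally during a flush, that would support the suspicion that the health check is what piles up requests.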
Possible fixes could be:
- Reducing the functionality of the health check, i.e. simply not checking the connection to Redis
- Deactivating the health check during cache flush operations
But in my eyes this would not really solve the problem and would only cure the symptoms. Moreover, we cannot be sure whether the problem also occurs with other scheduled operations such as cron jobs. So I'm looking for a more sophisticated solution.
My Answer:
One possible solution could be to implement a locking mechanism around cache flush operations, so that the health check does not query the Redis server while a flush is running. Because the flush and the health check run in separate processes, this needs an inter-process lock (for example a lock file or a semaphore) rather than an in-process mutex.
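As a rough illustration of the locking idea, here is a minimal sketch using an advisory file lock. It is not Magento code: the lock file path, the function names and the "skipped" status are all assumptions made for the example, and fcntl is only available on Unix-like systems.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file location; not something Magento creates itself.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "cache_flush.lock")


@contextmanager
def flush_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock for the duration of a cache flush."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def flush_in_progress(path=LOCK_PATH):
    """Return True if the flush lock is currently held through another handle."""
    with open(path, "a") as handle:
        try:
            # Non-blocking attempt: fails immediately if a flush holds the lock.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(handle, fcntl.LOCK_UN)
        return False


def health_check(check_redis, path=LOCK_PATH):
    """Skip the Redis probe while a flush is running, otherwise run it."""
    if flush_in_progress(path):
        return "skipped"
    return "ok" if check_redis() else "fail"
```

The flush itself (for example a wrapper script that calls bin/magento cache:clean) would run inside "with flush_lock():", and the health check would report "skipped" instead of piling up on Redis in the meantime. Note that this only covers flushes started through such a wrapper, not ones triggered from the admin panel.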
Another approach could be to tune the Redis setup to handle concurrent connections more efficiently. This could involve raising the maxclients setting in the Redis configuration or using connection pooling (for example persistent connections) to better manage the connections.
Additionally, you could consider optimizing the health check process to reduce its impact on server resources during cache flush operations. This could involve reducing the frequency of health checks or optimizing the health check script to be more resource-efficient.
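One way to make the health check cheaper is to reuse the last result for a few seconds instead of probing Redis on every request. The sketch below shows the idea only; the class name, the 10-second default and the injectable clock are assumptions for illustration. In a per-request runtime such as PHP-FPM the cached value would have to live somewhere shared between requests, not in process memory.

```python
import time


class CachedHealthCheck:
    """Re-run the expensive probe at most once per `ttl` seconds."""

    def __init__(self, probe, ttl=10.0, clock=time.monotonic):
        self._probe = probe          # callable returning a truthy value when healthy
        self._ttl = ttl              # how long a result stays valid, in seconds
        self._clock = clock          # injectable so the behavior can be tested
        self._last_result = None
        self._expires_at = None

    def status(self):
        """Return the cached result, refreshing it once the TTL has expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._last_result = bool(self._probe())
            self._expires_at = now + self._ttl
        return self._last_result
```

With this in place, a burst of health-check requests during a flush results in at most one Redis probe per TTL window rather than one per request.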
Overall, it may require a combination of these approaches to effectively address the issue and prevent server crashes during cache flush operations while the health check is being performed.