php5-fpm randomly starts consuming a lot of CPU

eugene_b

php5-fpm randomly starts consuming a lot of CPU
December 05, 2012 03:58PM

Registered: 11 years ago
Posts: 3

I've run into a really strange problem which I am not sure how to debug further. I have an NGINX + PHP5-FPM + APC Amazon Ubuntu instance and there is a website installed on it which is a complex PHP framework. While trying to debug the problem, I've reduced the flow to this: a lot of big classes get included, main objects are created, session is started, array of configs is retrieved from memcached, an XML file is retrieved from memcached, HTML templates are included, output is sent to the client.

Then I use http_load tool to put the website under the load of 20 requests per second: http_load -timeout 10 -rate 20 -fetches 10000 ./urls.txt

What happens next is rather strange. `top` shows a bunch of php5-fpm processes spawned each taking a few % of CPU and everything runs smoothly, like this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28440 www-data 20 0 67352 10m 5372 S 4.3 1.8 0:20.33 php5-fpm
28431 www-data 20 0 67608 10m 5304 S 3.3 1.8 0:16.77 php5-fpm
28444 www-data 20 0 67352 10m 5372 S 3.3 1.8 0:17.17 php5-fpm
28445 www-data 20 0 67352 10m 5372 S 3.0 1.8 0:16.83 php5-fpm
28422 www-data 20 0 67608 10m 5292 S 2.3 1.8 0:18.99 php5-fpm
28424 www-data 20 0 67352 10m 5368 S 2.0 1.8 0:16.59 php5-fpm
28438 www-data 20 0 67608 10m 5304 S 2.0 1.8 0:17.91 php5-fpm
28439 www-data 20 0 67608 10m 5304 S 2.0 1.8 0:23.34 php5-fpm
28423 www-data 20 0 67608 10m 5292 S 1.7 1.8 0:20.02 php5-fpm
28430 www-data 20 0 67608 10m 5300 S 1.7 1.8 0:15.77 php5-fpm
28433 www-data 20 0 67352 10m 5372 S 1.7 1.8 0:17.08 php5-fpm
28434 www-data 20 0 67608 10m 5292 S 1.7 1.8 0:18.56 php5-fpm
20648 memcache 20 0 51568 8192 708 S 1.3 1.3 2:51.06 memcached
28420 www-data 20 0 69876 13m 6300 S 1.3 2.3 0:20.89 php5-fpm
28421 www-data 20 0 67608 10m 5300 S 1.3 1.8 0:21.19 php5-fpm
28429 www-data 20 0 9524 2260 992 S 1.3 0.4 0:11.68 nginx
28435 www-data 20 0 67608 10m 5304 S 1.3 1.8 0:18.58 php5-fpm
28437 www-data 20 0 67352 10m 5372 S 1.3 1.8 0:17.87 php5-fpm
28441 www-data 20 0 67608 10m 5292 S 1.3 1.8 0:20.75 php5-fpm

Then after some time which can be anywhere between one second and minutes, several (usually two) php5-fpm processes suddenly consume all the CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28436 www-data 20 0 67608 10m 5304 R 48.5 1.8 0:23.68 php5-fpm
28548 www-data 20 0 67608 10m 5276 R 45.2 1.7 0:07.62 php5-fpm
28434 www-data 20 0 67608 10m 5292 R 2.0 1.8 0:23.28 php5-fpm
28439 www-data 20 0 67608 10m 5304 R 2.0 1.8 0:26.63 php5-fpm

At this point everything gets stuck and all new HTTP requests timeout. If I stop http_load tool, the php5-fpm will hang there for many minutes. Interestingly enough if I do `php5-fpm stop`, the php5-fpm processes will disappear but any commands that make use of filesystem will have problems executing. E.g. if I try to download a file via ssh, `top` will show the following, taking many minutes to initiate the actual download:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3298 sshd 20 0 7032 876 416 R 75.2 0.1 0:04.52 sshd
3297 sshd 20 0 7032 876 416 R 24.9 0.1 0:04.49 sshd

PHP error log usually has this:
[05-Dec-2012 20:31:39] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 58 total children
[05-Dec-2012 20:32:08] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 66 total children

Nginx error log is flooded with these entries:
2012/12/05 20:31:36 [error] 4800#0: *5559 connect() to unix:/dev/shm/php-fpm-www.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: ..., server: ec2-....compute-1.amazonaws.com, request: "GET /usa/index.php?page=contact_us HTTP/1.0", upstream: "fastcgi://unix:/dev/shm/php-fpm-www.sock:", host: "ec2-....compute-1.amazonaws.com"

PHP-FPM slow log doesn't show anything interesting, swapping never happens and I didn't manage to gather any other interesting facts about the problem. I've gone through many iterations of config file changes, the most recent ones being
nginx.conf: http://pastebin.com/uaD56hJF
pool.d/www.conf: http://pastebin.com/mFeeUULC

Edited 1 time(s). Last edit at 12/05/2012 04:01PM by eugene_b.

Reply Quote

eugene_b

Re: php5-fpm randomly starts consuming a lot of CPU
December 05, 2012 05:36PM

Registered: 11 years ago
Posts: 3

Also just found that dmesg reports errors like this

[6483131.164331] php5-fpm[28687]: segfault at b6ec8ff4 ip b78c3c32 sp bff551f0 error 4 in ld-2.13.so[b78b5000+1c000]

Reply Quote

sjark

Re: php5-fpm randomly starts consuming a lot of CPU
December 06, 2012 04:40AM

Registered: 13 years ago
Posts: 17

Try to change the pool settings for php-fpm to
pm = static
pm.max_children = 100

or how meney children you need. php-fpm sems more stable under static

also check your apc.shm_size / apc.max_file_size in php.ini or where it is saved at.

Take a look at www.dotdeb.org they have a nice nginx with php-fpm.

Reply Quote

eugene_b

Re: php5-fpm randomly starts consuming a lot of CPU
December 07, 2012 05:32PM

Registered: 11 years ago
Posts: 3

I did try to play with these values including a set up like that without any luck unfortunately.

We've got a new Amazon EC2 micro instance just in case, to exclude possible hardware issues. Also I am using php-fastcgi now to exclude possible fpm bugs. Other differences are minor, I think the only thing that change is Ubuntu->Debian. The same problem still happens except that now server manages to slightly recover after the max_execution_time seconds (and then spikes again).

I tried playing with a separate test.php and I am not sure if it's the same issue but at least in top it looks the same. I created a test.php and included a bunch of libs that belong to our framework. The libs don't do anything except for defining classes or including other libs that define classes. I checked with APC and all of this gets successfully served by it. The I started pressuring test.php with 200 requests per second and after some time the same thing happened. Except that now I managed to get some errors saying "too many open files". It doesn't happen always though, sometimes it just starts timing out without outputting the error and a few php processes are stuck consuming all CPU. I played only a bit with it but I think there is a correlation here - by controlling the number of included libs or slightly varying requests/second rate, I can control when the CPU spike will happen. I increased the relevant OS variables but the issue is still there although it takes longer for it to happen (also note that I've set the limits to values N times larger than the total number of requests I do during tests).

s.file-max = 70000
...
* soft nofile 10000
* hard nofile 30000
...
worker_rlimit_nofile 10000;
...
(reloaded all the configs and made sure the new system vars actually took affect)

So the next best and only explanation I can come up with so far is that even though APC is supposed to pull files from memory, internally it is implemented in a way that still uses a file descriptor whenever PHP include-s are called. And either because it releases them with a delay or when at some unfortunate moment too many requests arrive at the same moment, system runs our of descriptors and newly arriving HTTP requests get quickly stacked into a huge queue. I'll try to test this somehow.

Reply Quote

php5-fpm randomly starts consuming a lot of CPU

Online Users