Monitoring Per-Process Memory Usage with Munin

Warning: This post is now pretty damn old, and refers to the situation before PHP 5.3, when the FPM extension was a set of patches you applied yourself, and handling pools of PHP workers was a pain. It's now much, much better: use FPM and laugh all the way to the memory bank.

Also: The Munin plugin referred to is redundant: it's now included in the default plugin set as 'multips_memory'.

Something Emporium and a few associated websites are run off my VPS server from Rimuhosting. (I used to work for Rimuhosting: they're awesome.) Recently, I moved from running a traditional mod_php/mpm_prefork setup to a multi-threaded mod_fcgid/mpm_worker setup.

In a mod_fcgid PHP setup, a few persistent php-cgi workers are set up and then requests for dynamic content are farmed out to them. By stripping PHP out of the apache workers, you get a setup that uses far less memory to serve each connection, and you're free to use a multi-threaded MPM (PHP is not reliably thread-safe, so you can't do that with mod_php).

OK, so, I've set this up, which is a process with spotty documentation. I've got it working nicely, and I've been able to reduce the stack's memory footprint considerably and increase its performance.

The snag so far is this:

This graph shows total memory usage on the server, categorised by type, as we transitioned from mod_php to mod_fcgid. It's a graph provided by Debian's default install of Munin. This VPS has 224MB of memory, so it's something I have to watch fairly closely. You'll notice the sawtooth shape of the graph after the changeover. Here's a closeup of that:

This is pretty good evidence of some sort of memory leak. There's a process there that's allocating memory and can't (or won't) deallocate it. It's being reset at fairly regular intervals, either by monit or perhaps by some other worker handler: apache, mod_fcgid. Actually, it could be almost any process. Damn.

So, my next task was to identify the culprit. Which comes down to the following question: which processes are using memory in this sawtooth pattern? Sounds like a task for Munin, huh. But out of the box, Debian's Munin ~~doesn't~~ (didn't: try multips_memory from version 1.4.2) have a recipe for graphing the memory usage of individual processes.

Which led me to write a Munin "plugin", multimemory. Based off multips, this plugin lets you specify a bunch of process names, and it'll graph the total memory usage of the processes matching each one. It's available at GitHub if anyone wants it.
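The core of a plugin like this can be sketched in a few lines of shell. This is a rough illustration, not the actual multimemory source: the function name sum_rss_kib is made up here, and it assumes input lines of the form "RSS COMMAND", which is what a procps-style ps prints when asked for just those two columns.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the real plugin.
# Reads "rss_in_KiB command_name" lines on stdin and prints the total
# RSS of every process whose command name matches the pattern in $1.
sum_rss_kib() {
    awk -v pat="$1" '
        $2 ~ pat { total += $1 }        # add up RSS for matching commands
        END      { print total + 0 }    # "+ 0" prints 0 when nothing matched
    '
}

# Live usage (assumes a procps-style ps on Linux):
#   ps -e -o rss= -o comm= | sum_rss_kib php-cgi
```

Keeping the summing separate from the ps call makes it easy to try the logic on canned input before pointing it at a live server.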

It gave me the following graph, which, twelve hours later, makes the culprit in the case of the leaking memory pretty clear.

Next steps: wait for PHP 5.3's new garbage collector. Well, more likely I'll just jump ship to 5.3 sooner, stable version be damned. That, and reduce the number of requests each php-cgi worker gets to serve before it's replaced.
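For the request limit, php-cgi reads the PHP_FCGI_MAX_REQUESTS environment variable and exits after serving that many requests. A wrapper script along these lines is one way to set it. Treat it as a sketch: the function name, the default of 500 and the /usr/bin/php-cgi path are assumptions for illustration, not values from my actual setup.

```shell
#!/bin/sh
# Hypothetical wrapper for starting a php-cgi worker under mod_fcgid.
# $1: requests a worker serves before exiting (500 is an assumed default)
# $2: path to the php-cgi binary (assumed default location)
run_php_worker() {
    PHP_FCGI_MAX_REQUESTS="${1:-500}"
    export PHP_FCGI_MAX_REQUESTS
    # exec replaces the shell, so the worker is the process the
    # web server started and can be signalled directly.
    exec "${2:-/usr/bin/php-cgi}"
}

# As the last line of a wrapper script:
#   run_php_worker 200
```

A lower limit means leaked memory is reclaimed sooner, at the cost of respawning workers more often.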

Apparently, this isn't PHP's fault:

PHP is glue. It is the glue used to build cool web applications by sticking dozens of 3rd-party libraries together and making it all appear as one coherent entity through an intuitive and easy to learn language interface. The flexibility and power of PHP relies on the stability and robustness of the underlying platform. It needs a working OS, a working web server and working 3rd-party libraries to glue together. When any of these stop working PHP needs ways to identify the problems and fix them quickly. When you make the underlying framework more complex by not having completely separate execution threads, completely separate memory segments and a strong sandbox for each request to play in, feet of clay are introduced into PHP's system.

Sure, I get it, you like having your own process to play around in. But dammit, PHP, why don't you clean up after yourself each request? Is this the experience of everyone who runs PHP with fcgid?

Comments

Nice Work!
But you forget a description of the plugin.sh!
:)

plugin.sh came with my copy of Munin (which is just the Debian package) - I didn't write it myself.

dominic@web:/$ cat /usr/share/munin/plugins/plugin.sh
# -*- shell -*-
# Support functions for shell munin plugins
#

clean_fieldname () {
    # Clean up field name so it complies with munin requirements.
    #
    # usage: name="$(clean_fieldname "$item")"
    #
    echo "$@" | sed -e 's/^[^A-Za-z_]/_/' -e 's/[^A-Za-z0-9_]/_/g'
}
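
To see what that helper does with typical process names, here is a small standalone example. The function body is copied from plugin.sh above so the snippet runs by itself; the sample names are just illustrations.

```shell
#!/bin/sh
# Same function as in plugin.sh, repeated so this snippet is self-contained.
clean_fieldname () {
    echo "$@" | sed -e 's/^[^A-Za-z_]/_/' -e 's/[^A-Za-z0-9_]/_/g'
}

clean_fieldname "php-cgi"    # hyphen is replaced: prints php_cgi
clean_fieldname "2fast"      # leading digit is replaced: prints _fast
```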

Thanks, I'm using this now!

One minor improvement, change "{print total}" to "{print total * 1000}" (or 1024?) on your ps/grep line. This will make the min/max values in the legend correct as far as displaying the right abbreviation (M, k, etc) - for example in your graph above it shows mysql using "17.54k" instead of what I imagine is really "17.54M"...
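
On the 1000-or-1024 question: ps on Linux reports RSS in units of 1024 bytes, so 1024 should be the exact factor. A quick sketch of the conversion, using a hypothetical helper name and a made-up reading of 17540:

```shell
#!/bin/sh
# Hypothetical helper: convert a ps RSS figure (KiB) to bytes so the
# grapher can choose the unit prefix itself.
kib_to_bytes() {
    echo $(( $1 * 1024 ))
}

kib_to_bytes 17540    # prints 17960960
```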

Thanks for the suggestion! I've done that in the latest version on github:
http://github.com/dominics/munin-plugins/commit/d03792c5c9df6c7f771f7847d8301d226c235671

Note that a normal UNIX process will not return free'ed memory back to the system. It will simply reuse it itself. That means that as a process runs, your PHP processes included, the amount of memory taken by that process is equal to the most memory any script needed to allocate, and it won't shrink back down until the process exits. So you will always see the memory usage graph either going up or staying flat. Hopefully in a stable system it will flatten out at some point. But people often mistake this upward trend for a memory leak. Not sure this is what is happening here, but it is something to keep in mind.
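
One way to tell the two cases apart on Linux is to compare a worker's current resident size with its peak, both of which the kernel exposes in /proc/<pid>/status as VmRSS and VmHWM. The sketch below assumes that file format, and the function name is made up for illustration. Sampled over time, a peak that levels off suggests ordinary high-water-mark behaviour, while one that keeps climbing points towards a leak.

```shell
#!/bin/sh
# Hypothetical helper: print "current_rss_kB peak_rss_kB" from a
# /proc/<pid>/status style file given as $1.
rss_and_peak() {
    awk '
        /^VmRSS:/ { rss = $2 }     # current resident set size, in kB
        /^VmHWM:/ { peak = $2 }    # peak resident set size, in kB
        END       { print rss + 0, peak + 0 }
    ' "$1"
}

# Live usage for the current shell:
#   rss_and_peak /proc/$$/status
```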

Yeah, I don't think that's what I'm seeing in this case.

I'm having trouble with this plugin. All processes report 'nan'. Under 'Information' it says things like "Processes matching this regular expression: /cgi-\<php-cgi\>/".

What's going wrong? Thanks.

What does your configuration line look like? It should be something like:

[multimemory]
env.names apache2 mysqld php

Those arguments determine the process names to collect memory stats for. The plugin needs this config before it'll work.
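
For anyone wondering how that setting reaches the plugin: Munin hands each env.something entry to the plugin as an environment variable, so env.names shows up as $names. Here is a rough sketch of a "config" handler built on that. The function name and graph labels are invented for illustration and the real plugin's output may differ.

```shell
#!/bin/sh
# Hypothetical "config" handler, not the real plugin's code.
print_config() {
    if [ -z "${names:-}" ]; then
        echo "env.names is not set" >&2
        return 1
    fi
    echo "graph_title Memory usage by process"
    echo "graph_vlabel bytes"
    for name in $names; do
        # Munin field names may only contain letters, digits and underscores.
        field="$(echo "$name" | sed -e 's/^[^A-Za-z_]/_/' -e 's/[^A-Za-z0-9_]/_/g')"
        echo "$field.label $name"
    done
}

# With names="apache2 mysqld php-cgi" in the environment, print_config
# emits one "<field>.label <name>" line per process name.
```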

Hi,

cool plugin, and it's even documented! It is in munin trunk (will be in 1.4.2) with some modifications to make it more unix-generic: http://munin-monitoring.org/browser/trunk/plugins/node.d/multips_memory.in

Thanks for sharing the plugin, looks like it will come in handy for diagnosing some memory issues I'm dealing with myself.

As to your problem, was FCGI really that big a win? Deployment of php apps with apache mpm-prefork + mod_php is incredibly well understood, and it is the default and best tested approach for most opensource PHP apps. You've traded that for a bunch of troubleshooting. Judging from your graphs, once you get things smoothed out, you'll save, what, 20MB? I guess that's something on a VPS with 224MB of memory, but even then, it's maybe a 25% increase in memory available for cache and buffers?

I took a different approach on my VPS. I set up nginx to serve static files and reverse proxy everything else to apache + mod_php. It's still nonstandard, but Apache + mod_php does what it's good at, and nginx does what it is good at, and the interface between the two is simple. After the initial config, I've had to do very little fussing.

The config could be even simpler if I'd forgone serving the static files with nginx, which is not as important as it would seem, because apache hands off everything (static files, dynamically generated HTML, etc) to nginx, which buffers it, leaving Apache free to move on to a new request.

I think apache should provide an easy option of using a hybrid configuration. One of their threading or polling servers as a front end, handling static files, buffering dynamic requests, redirects, etc, and prefork for mod_php, etc.

As it turns out, it probably wasn't much of a win at all, but it's been useful to keep things going for a few months. Now I'm looking at just adding more memory to the server, as money is easier to spend than time in this case.

I've experimented with both setups.
I found:
- Fcgid memory with mpm worker is highly configurable and predictable, and thus scalable.

The problem with fastcgi is the way it forks to handle large numbers of requests.
This was very unreliable:
once I ramped up the request concurrency, a fair number of requests were not served.
I also saw a large performance drop in my siege benchmark when I added a session cookie for my app (and thus increased the number of MySQL queries, etc.).

There's also a problem with running an opcode cache like APC: its memory is not shared between processes, so each process gets its own separate APC cache, which quickly fills up.
This seems to be the case for xcache and eaccelerator under fastcgi as well.

I ended up reverting to apache prefork and mod_php with Varnish as a front-end cache. This means I am using a fair amount of RAM, and when demand increases my server will start swapping. Even on a constrained Linode with RAM already added, I will definitely add even more RAM in the future.

However, having 100% of requests served is more important to me, and apache prefork can be tuned to scale.
Of course, when the sites get hammered I won't be able to serve all the requests, so I have to keep adding RAM and changing the Apache config, but it's still more reliable and resource-friendly than Fastcgi, in my opinion.

Thanks, I'll try this on my VPS