Reining In LVM pvmove Memory Leakage
If you use the Linux Logical Volume Manager, pvmove can be a wonderful utility. It lets you move a logical volume (or many, in parallel) from one physical volume to another. Engine Yard was recently attacked by raptors. To help deal with this, we needed to move our data off of the affected disks before they exploded. pvmove to the rescue.
We hit a few bumps along the way. The biggest was that the pvmove command would slowly consume massive quantities of memory until the node ran out. This was both vexing and confusing, since we were fairly sure the pvmove command itself wasn't doing any of the actual data copying.
Some digging in the source code revealed that pvmove relies on a generic polling daemon implemented within LVM. A little more inspection showed that this daemon scans the volume groups for an in-progress move and then checks on that move's progress. It repeats this at a fixed interval, 15 seconds by default.
The functions that scan the volume groups don't release the memory they allocate, at least not the way the polling code currently calls them. For most LVM commands this is fine: the data is loaded only once per invocation, and the process exits soon after. pvmove, however, keeps running and rescans on every interval, so it leaks a little more memory with each poll.
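To see why a one-shot command gets away with this while a long-running poller does not, here is a toy model in Python. It is not LVM code: `Poller`, `scan_volume_groups`, and `check_progress` are hypothetical stand-ins, and the only assumption is that each scan allocates metadata that nothing ever frees.

```python
from dataclasses import dataclass, field


@dataclass
class Poller:
    """Toy model of a polling daemon whose scan step never frees memory."""

    # Every scan's metadata ends up here and is never released,
    # mimicking allocations that are only reclaimed at process exit.
    retained: list = field(default_factory=list)

    def scan_volume_groups(self) -> dict:
        # Hypothetical stand-in for the VG scan: "allocates" 1 unit of metadata.
        metadata = {"vg": "vg0", "size_units": 1}
        self.retained.append(metadata)  # the leak: nothing ever removes this
        return metadata

    def check_progress(self, metadata: dict) -> bool:
        # Placeholder: a real poller would ask how far the move has got.
        return False

    def poll(self, iterations: int) -> None:
        # One scan plus one progress check per interval (sleeping omitted).
        for _ in range(iterations):
            self.check_progress(self.scan_volume_groups())

    def leaked_units(self) -> int:
        return sum(m["size_units"] for m in self.retained)


# A one-shot command scans once and exits, so the loss is bounded.
one_shot = Poller()
one_shot.poll(1)

# A poller running for an hour at a 15-second interval scans 240 times.
long_running = Poller()
long_running.poll(3600 // 15)

print(one_shot.leaked_units(), long_running.leaked_units())  # 1 240
```

The leaked total grows with the number of polls, so anything that reduces how often the poller scans also reduces how fast memory disappears.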
While some deep work on the source code could eventually fix this leak, we took the easy way out. We realized that simply cranking up the polling interval cuts the rate of leakage to an acceptable level. On our test server, we measured a leak of around 120MB per hour at the default 15-second polling interval. Increasing the polling interval to five minutes reduces the leak to less than 10MB per hour, at the expense of an average of 2.5 minutes (half the polling interval) of "wasted" time between pvmove invocations. Not too shabby.
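Those numbers are consistent with a fixed cost per poll. Assuming every poll leaks the same amount (an assumption, not something we measured directly), 120MB per hour at 15 seconds works out to 240 polls and about 0.5MB per poll, and the expected leak and average wait for any other interval follow from that:

```python
SECONDS_PER_HOUR = 3600

# Measured on our test server at the default interval.
MEASURED_LEAK_MB_PER_HOUR = 120.0
DEFAULT_INTERVAL_S = 15

# Assumption: every poll leaks the same amount (0.5 MB here).
LEAK_MB_PER_POLL = MEASURED_LEAK_MB_PER_HOUR / (SECONDS_PER_HOUR / DEFAULT_INTERVAL_S)


def leak_mb_per_hour(interval_s: float) -> float:
    """Estimated leak for a given polling interval in seconds."""
    polls_per_hour = SECONDS_PER_HOUR / interval_s
    return polls_per_hour * LEAK_MB_PER_POLL


def average_wait_s(interval_s: float) -> float:
    """A move finishing at a random moment waits half an interval on average."""
    return interval_s / 2


for interval in (15, 60, 300):
    print(f"-i {interval}: ~{leak_mb_per_hour(interval):.1f} MB/hour, "
          f"~{average_wait_s(interval):.0f} s average wait")
```

At a five-minute interval this predicts about 6MB per hour and a 150-second average wait, in line with the figures above.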
Changing "pvmove /dev/a /dev/b" to "pvmove -i 300 /dev/a /dev/b" makes a big difference.
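If you script your moves, a small wrapper keeps the longer interval from being forgotten. This is a minimal sketch assuming Python 3 and the standard `pvmove` binary on the PATH; `pvmove_command` and `run_pvmove` are hypothetical helper names of our own, and `-i` is pvmove's polling interval option, in seconds.

```python
import subprocess
from typing import List

# Five minutes between progress polls, as discussed above.
DEFAULT_POLL_INTERVAL_S = 300


def pvmove_command(source_pv: str, dest_pv: str,
                   interval_s: int = DEFAULT_POLL_INTERVAL_S) -> List[str]:
    """Build the argv for moving extents from source_pv to dest_pv."""
    if interval_s <= 0:
        raise ValueError("polling interval must be a positive number of seconds")
    return ["pvmove", "-i", str(interval_s), source_pv, dest_pv]


def run_pvmove(source_pv: str, dest_pv: str,
               interval_s: int = DEFAULT_POLL_INTERVAL_S) -> None:
    """Run pvmove (normally needs root); raises CalledProcessError on failure."""
    subprocess.run(pvmove_command(source_pv, dest_pv, interval_s), check=True)
```

For example, `run_pvmove("/dev/a", "/dev/b")` runs the same command as `pvmove -i 300 /dev/a /dev/b`.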
Hopefully this can help out some other people. Enjoy.
Labels: hackhackhack, lvm, pvmove, raptors

