Reining In LVM pvmove Memory Leakage

If you use the Linux Logical Volume Manager, pvmove can be a wonderful utility.  It lets you move a logical volume (or many, in parallel) from one physical volume to another.  Engine Yard was recently attacked by raptors.  To help deal with this, we needed to move off of the disks before they exploded.  pvmove to the rescue.
It turns out that we hit a few bumps.  The biggest was that the pvmove command would slowly consume massive quantities of memory until a node would run out.  This was both vexing and confusing, as we were pretty sure that the pvmove command wasn’t really doing any of the actual moving work.
Some digging in the source code indicated that the pvmove command utilized some sort of generic polling daemon implemented within LVM.  A little more inspection showed that this daemon would scan the volume groups for an in-progress move, then would check on the progress of the move.  This was done on some interval (15 seconds by default).
It turns out that the functions that scan the volume groups don’t really release memory properly (at least not the way they are currently used).  In most LVM invocations, this is fine.  The data is only loaded once each invocation.  For pvmove, this caused a leak of memory every interval.
While some deep work on the source code could eventually fix this memory leak, we took the easy way out.  We realized that you can just crank up the polling interval, which cuts the rate of the leak to an acceptable level.  On our test server, we measured a leak of around 120MB per hour at the default 15-second polling interval.  Increasing the polling interval to five minutes reduces the leak to less than 10MB per hour.  The cost is that pvmove only notices a finished move at its next poll, so each invocation sits idle for an average of 2.5 minutes of “wasted” time before the next pvmove can start.  Not too shabby.
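As a rough sanity check on those numbers, here is a back-of-the-envelope sketch.  It assumes the leak is a fixed amount per poll, which fits what we saw but is not something we verified in the LVM source.  The function and variable names are made up for this example.

```shell
#!/bin/sh
# Estimate the hourly leak at a different polling interval.
# Assumption: every poll leaks the same amount of memory, so the hourly
# leak scales with the number of polls per hour.
BASE_INTERVAL=15      # default polling interval, in seconds
BASE_LEAK_MB=120      # leak measured per hour at the default interval

# Usage: leak_mb_per_hour NEW_INTERVAL_SECONDS
leak_mb_per_hour() {
    new_interval=$1
    base_polls=$((3600 / BASE_INTERVAL))                    # polls per hour at the default
    leak_kb_per_poll=$((BASE_LEAK_MB * 1024 / base_polls))  # KB leaked on each poll
    new_polls=$((3600 / new_interval))                      # polls per hour at the new interval
    echo $((new_polls * leak_kb_per_poll / 1024))           # estimated MB per hour
}

leak_mb_per_hour 15     # prints 120
leak_mb_per_hour 300    # prints 6
```

That works out to roughly half a megabyte per poll, and about 6MB per hour at a five-minute interval, which lines up with the "less than 10MB per hour" we observed.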
Changing “pvmove /dev/a /dev/b” to “pvmove -i 300 /dev/a /dev/b” makes a big difference.
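If you script a lot of moves, it can help to keep the interval in one place.  What follows is only a sketch: slow_pvmove, PVMOVE_INTERVAL and DRY_RUN are hypothetical names invented here, not part of LVM.  The only real pvmove option it relies on is -i.  It defaults to a dry run that just prints the command, so you can check it before touching any disks.

```shell
#!/bin/sh
# Hypothetical wrapper: run pvmove with a slower polling interval.
# PVMOVE_INTERVAL (seconds) and DRY_RUN are names made up for this sketch.
# Usage: slow_pvmove SOURCE_PV DEST_PV
slow_pvmove() {
    src=$1
    dst=$2
    interval=${PVMOVE_INTERVAL:-300}    # default to five minutes between polls
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: only show the command that would be executed.
        echo "pvmove -i $interval $src $dst"
    else
        pvmove -i "$interval" "$src" "$dst"
    fi
}
```

Calling slow_pvmove /dev/a /dev/b prints the command line from above.  Setting DRY_RUN=0 actually runs it.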
Hopefully this can help out some other people.  Enjoy.

About Jayson Vantuyl

I live in San Francisco, California; have an awesome son, David; and make machines do (subjectively) interesting things. I'm generally an all around handy fellow.

3 comments

  • Dan said: January 24, 2010 4:32 am

    This helps a lot - thanks. Our servers have 18 hard drives with many physical volumes, volume groups, and logical volumes. Plus, we run pvmove in a Xen dom0 with only 512 MB RAM. This combination caused a move of only 30 GB to fail miserably.

  • Alexandre said: April 14, 2010 5:36 pm

    Soooooooo Helpful!!! Having TB of data to move I could not imagine to miss this tweak.

  • DavidR said: February 28, 2011 11:16 am

    Thanks so much for posting this! I don't know if anyone is still following this thread, but my 6-disk LVM setup choked while I was using pvmove to remove an old 1 TB disk. (I'm running Ubuntu 10.10, with everything pretty much up to date.) I'm in the middle of another pvmove attempt now, using the "-i 300" option and am hopeful that will solve my problem. (I'm already beyond the percent moved where things got hung up previously.)

    By the way, for anyone who sees this after having a pvmove command lock up your system, you may need to use "pvmove --abort" to clear a hidden logical volume that is probably left behind after your system freezes.
