Diamond Notes

Just another WordPress weblog

Mysqlpdump Test

I should be sleeping. But I am playing :). I have fairly serious dump of a couple of servers that will probably occur late this week. I have been tweaking various things and testing various configurations to get the best dump and restore speeds. Then, in the last couple of days, I ran across this by Monty Taylor:

http://mysql-ha.com/2007/09/12/mysqlpdump-for-parallel-dumps/

which links to this site:

http://www.fr3nd.net/projects/mysqlpdump/

It is a python wrapper script around mysqldump that executes the dumps using multiple threads — from what I see it each thread dumps an individual table.
I executed some tests on a slave that we run backups and a small DB that is right at 900M in size. It was fairly eye opening:

1:37 — 917M mysqlpdump.py -u root -ppassword -v -d /var/backup/directory -i database
2:23 — 915M mysqldump database -u root -ppassword > /var/backup/directory/database.sql

So, on this small database, it made a significant % speed increase. This was with the default 5 threads. Next I increased the thread count to 10, 15 and then 20. The results are:
1:50 — mysqlpdump.py -u root -ppassword -v -t 10 -d /var/backup/directory -i database
1:40 — mysqlpdump.py -u root -ppassword -v -t 15 -d /var/backup/directory -i database
1:59 — mysqlpdump.py -u root -ppassword -v -t 20 -d /var/backup/directory -i database

While these aren’t controlled test I think it is clear that the mysqlpdump utility script is faster than the mysqldump command. Speed was best with a thread count of five. I don’t have an explanation for this other than the possibility that my server (a dual-xeon) is really just simulating a higher thread count. I don’t know. Not as much a hardware person as I might like. Even so, just with default settings it is a nice little speed-up.

As for the restore - Brian Aker says in this post: http://krow.livejournal.com/519655.html that there is an option now to mysqlimport to specify a number of threads..the reverse of this script. Or, as an alternative, I could use the mysqlprestore script developed by Monty Taylor after he looked at mysqlpdump. Not sure which I will use, but it seems that both will do the job.
Cool stuff. Hoping something gets directly integrated into mysqldump soon. Or, even better, an online backup that didn’t disrupt things :)

4 Comments so far

  1. Chris September 16th, 2007 10:47 pm

    Using too many threads will work against you because you’ll be saturating your disk i/o - that is, your disk(s) can only read/write so fast. By having more reading all over the place, it’s going to slow things down as the disk moves around looking for the data.

  2. bmurphy September 16th, 2007 10:58 pm

    The disk I/O issue is certainly a logical possibility. And, this backup was done to a NFS mounted partition. Not exactly awe-inspiring for speed I suppose. Thanks for the input!!

    Keith

  3. Xaprb September 26th, 2007 2:35 pm

    Chris is right about the disk, but it also depends on how much CPU you have. If dumping pegs a CPU and you have two CPUs, you won’t benefit from running more than two dumps in parallel. You can see the bottleneck by watching vmstat while the dumps are going.

    On one of my systems, plain ol’ mysqldump takes about 65% CPU on a dual-core machine. Looking at top I see one CPU is running around 50-60% and another about 60-70%. I am too lazy to verify, but I assume one is running mysqld and one is running mysqldump.

    I’m getting ready to release the parallel dump script I wrote at work. It’ll be part of MySQL Toolkit. It isn’t all that different from mysqlpdump, but it has some extra features and I designed it from the start for SELECT INTO OUTFILE, which is much faster when it comes time to reload. Plus there are some features Peter Z requested for “safe” dumps.

  4. bmurphy September 26th, 2007 2:55 pm

    I didn’t check the cpu usage. Should have. You could be right.

    thanks,

    Keith

Leave a reply