Testing mysqldump and mysqlimport
So we are doing a major project of migrating from version 4.1.22 to 5.0.45. I am testing various things in preparation for this migration. We have a large amount of data to dump and re-load so it is important that this is done in the most efficient way possible. I began doing some testing to see what was possible. I have heard that using a mysqldump command to dump everything in one file (and then importing with mysql -u < databases.sql) is slower than using mysqlimport but I wanted to test for myself. The results were not that surprising but I thought it might help others if I post the summary here.
I set things up on our test server. All database data was stored on an individual hard drive (sdb). The dump was written and read from our Coraid (which is still attached to the server from the previous tests). I did two sets of tests..one with the innodb buffer set to one gigabyte and one set of test with the innodb buffer set to two gigabytes.
Test Results
The first two series of test were run with the innodb buffer set to two gigs.
importing database3 with mysql -u root -p database3 < /data2/database3.sql
| start time | stop time | elapsed time |
| 18:03:00 | 18:27:19 | 24:19 |
| 18:27:34 | 18:51:51 | 24:17 |
| 18:52:03 | 19:16:20 | 24:17 |
| 19:16:33 | 19:40:47 | 24:14 |
| 19:40:59 | 20:05:25 | 24:24 |
| 20:05:37 | 20:29:56 | 24:19 |
| 20:30:09 | 20:54:29 | 24:20 |
| 20:54:41 | 21:18:57 | 24:22 |
| 21:19:10 | 21:43:31 | 24:21 |
| 21:43:43 | 22:07:56 | 24:21 |
importing database3 with mysqlimport -u root -p database3 /data2/database3/*.txt
| start time | stop time | elapsed time |
| 10:21:58 | 10:40:55 | 18:57 |
| 22:08:08 | 22:27:06 | 18:58 |
| 22:27:19 | 22:46:05 | 18:46 |
| 22:46:17 | 23:05:00 | 18:43 |
| 23:05:13 | 23:23:51 | 18:38 |
| 23:24:04 | 23:42:52 | 18:48 |
| 23:43:05 | 00:01:55 | 18:50 |
| 00:02:08 | 00:21:04 | 18:56 |
| 00:21:17 | 00:40:05 | 18:38 |
| 00:40:17 | 00:58:58 | 18:41 |
The next series of test were run with the innodb buffer set to one gig.
importing database3 with mysql -u root -p database3 < /data2/database3.sql
| start time | stop time | elapsed time |
| 04:35:09 | 04:58:28 | 23:19 |
| 04:58:37 | 05:22:01 | 23:24 |
| 05:22:11 | 05:45:55 | 23:44 |
| 05:46:05 | 06:09:42 | 23:37 |
| 06:09:52 | 06:33:31 | 23:39 |
| 06:33:40 | 06:57:22 | 23:42 |
| 06:57:32 | 07:21:17 | 23:45 |
| 07:21:26 | 07:45:07 | 23:41 |
| 07:45:16 | 08:08:56 | 23:40 |
| 08:09:06 | 08:32:43 | 23:37 |
importing database3 with mysqlimport -u root -p database3 /data2/database3/*.txt
| start time | stop time | elapsed time |
| 08:32:53 | 08:49:27 | 16:34 |
| 08:49:35 | 09:06:17 | 16:42 |
| 09:06:28 | 09:23:00 | 16:32 |
| 09:23:10 | 09:39:46 | 16:36 |
| 09:39:57 | 09:56:35 | 16:38 |
| 09:56:45 | 10:13:25 | 16:40 |
| 10:13:35 | 10:30:04 | 16:29 |
| 10:30:15 | 10:46:49 | 16:34 |
| 10:46:59 | 11:03:36 | 16:37 |
| 11:03:47 | 11:20:25 | 16:48 |
The results are interesting. It does prove that the mysqlimport works faster. Quite a bit faster actually. I thought that increasing the memory for the innodb buffer would speed things up also. This was not the case and I am not sure why. I didn’t change anything else. Running the top command didn’t show any apparent problems but I am guessing that a two gigabytes buffer was just to much for the setup.
5 Comments so far
Leave a reply
You should check out this posting from Brian Aker: http://krow.livejournal.com/519655.html
I’ve tested this on a ~1TB database running on FreeBSD, and it was insanely fast compared to singlethreaded loading. This was against 5.0 and 4.1 using the 5.1 mysqlimport binary. Just use -T with mysqldump.
One word of warning — mysqlimport might be multithreaded, but replication is not. I lagged up a bunch of slaves pretty badly when I didnt think of that.
Good luck!
-Peter
PS What kind of space savings did you see going to 5.0? For my data, it was about 43%.
How much memory did you have and did you use innodb_flush_method=O_DIRECT ?
There is OS caching on the top of Innodb caching which can affect things a lot.
Ie on 4GB 2GB of RAM will mean there is 2GB of Innodb cache and 2GB of OS cache which effectively duplicate each other while 1:3GB will give 3GB of effective OS cache.
Not to mention swapping and memory pressure for reading mysqlimport data file may affect things.
Here is what I found that is important to import data into innodb tables:
* large innodb log size (the largest the better, you can still reduce it afterwards)
* innodb_flush_log_at_trx_commit = 2 if you can sacrifice the security while the import is progressing. Revert to zero afterwards.
* load in primary key order (there is an option in mysqldump to generate a primary key ordered dump)
* SET UNIQUE_CHECKS=0 and SET FOREIGN_KEY_CHECKS=0 (I don’t know if there is an impact on mysqlimport)
* load several tables in parallel (this is easy in the mysqlimport case)
thanks for the responses. I am going to work on this some more with your suggestions. I will put up a new posting in a couple of days. I need to have the testing wrapped up by Friday so you can expect an update by then.
I am running tests right now with innodb_flush_method=O_DIRECT. It doesn’t seem to be making any difference. As for the difference in space question..I am only testing with one fairly small database. Under 4.1 it is 1468M and under 5.0 it is 1315M. So the difference is 10.4% if I calculated correctly. Definitely smaller.