Diamond Notes

Just another WordPress weblog

100 million rows — oh my!!

What do you do when a developer walks up and says I need this file of approximately 100 million rows imported into this table?

I don’t know if you have ever had the pleasure of doing a file import on something like that but it is glacially slow when done via one large file.  To be honest, I understand Innodb well enough to know why it is slow, but not well enough to explain why so I have to leave it at that.
That being said, I decided this time around to break things up.  On of the characteristics of Innodb is that imports happen much faster when they are smaller transactions.  So, I used the Unix/Linux ’split’ command to convert the file with 100 million rows into 100 files of one million rows.  Next I wrote a bash script to actually pull in each file.  Here it is:

#!/bin/bash
for i in $( ls -1 /data/tmp ); do
mysql -e “load data infile ‘/data/tmp/$i’ into table db_name.table.name;”
done

You very likely will need to change  the load data statement somewhat.  And of course you will want to change the directory that the script is reading the files from (this is the directory where all the split files are stored .. and don’t put any other files in there or it will try and read them also).

I know this is “simple” or “trivial” for many but maybe it is useful for someone.

As a rough benchmark it looked like after I split things up the files were being pulled in at a rate of just under 20 seconds.  So, 20 seconds x 100 = 2000 seconds (or 33 minutes).  The import of the single file never really finished but it ran for hours.

2 Comments so far

  1. Morgan Tocker January 30th, 2008 12:45 pm

    FWIW, this also only works on newer MySQL 5.0 releases (5.0.26+):
    ;)

    In theory, you could probably reload the files in parallel for an even greater speed improvement.

  2. J Jorgenson January 30th, 2008 2:52 pm

    Another performance trick is to disable the Indexes (keys) on the table before the massive data load:
    alter table xyz disable keys;
    load data infile ‘/data/tmp/xyz.txt’ into table db_name.table.name
    alter table xyz enable keys;

    The result is the keys are rebuilt in 1 step and the overall loading process is faster than loading the data with keys enabled.

    If you are attempting to transfer MyISAM tables between physical servers, use FTP of the raw data files: .MYD, .MYI, .frm.

    – JJ –

Leave a reply