Tip of the Day — Speeding up Imports by Sorting

When doing large imports into a table it can be very helpful to do different optimizations to speed up the import.  Sometimes this can make a dramatic difference.  For example, just recently I imported 106 million rows into a MyISAM table.  Initially I just took the data and imported it into the server.  This took about 60 hours.  Then, because a mistake was made in the setup of the data, we had to re-import.  This time the data was sorted before importing.  While this operation took several hours, the import time dropped to one hour.  Sorting the data might not always be practical, but in this case it worked — and worked dramatically.

There are other things to do that can speed up data imports.  Sounds like good topics for future tips!!

8 comments ↓

#1 Brian Ivins on 03.03.08 at 10:55 am

Out of curiosity, how did you do the import into the database? Did you use mysqlimport/load data infile?
Were keys enabled on the table while you did the import?

#2 RST on 03.03.08 at 11:19 am

> Sorting the data might not always be practical,
> but in this case it worked — and worked
> dramatically

Why? I guess it has something to do with the indexes, but I’d like to be sure.

#3 William on 03.03.08 at 11:27 am

Interesting. I would expect that behavior if you were importing to a innodb table, as it orders the data by primary key. Does MyIsam do the same thing? Is that why its faster?

#4 admin on 03.03.08 at 1:06 pm

So the table consisted of two columns. One was a hash of something (not important). This was what was sorted. This hash column became the primary key in the table. No disabling of keys was done on the import, just sorting the data before import.

William …. not exactly sure, but I suspect that is case.

#5 Justin Swanhart on 03.03.08 at 4:42 pm

What storage engine are you using?

If you are using MyISAM, this optimization should have about the same effect as disabling keys on the table, importing and the re-enabling keys.

If this is InnoDB, then sorting the import data can make a tremendous difference, because InnoDB will have to do a lot less work splitting pages, etc.. The disable/enable key trick doesn’t really work with InnoDB, so sorted imports make a lot of sense with this engine. I’d be curious how this affects performance on Solid/Falcon/Primebase/etc…

#6 admin on 03.03.08 at 5:20 pm

It was MyISAM

#7 William on 03.03.08 at 5:53 pm

I think I understand now. The Keys weren’t turned off. So with each disordered insert it had to seek to put the entry in the index. I’m pretty sure that the actual MyIsam table is not in primary key order, so the performance penalty for importing a unsorted list into innodb would have been higher.

#8 Log Buffer #87: a Carnival of the Vanities for DBAs on 03.07.08 at 10:48 am

[…] Diamond Notes, a tip on speeding up imports by sorting, coming from a 60-hour, 106 million row MyISAM […]

Leave a Comment