Tricks with mysqldump

More than once I have wished that mysqldump had never been created. In many backup scenarios it is better to use other tools such as xtrabackup. However, there are still times when I end up using mysqldump. For one thing, it is included with the MySQL packages; it's "always there". Secondly, it sometimes really is a good fit for the task. And occasionally, with a bit of work, you can make it do neat tricks.

I'm going to cover a few of the tricks I have used over the years. Hopefully you will find them helpful. In some cases one of these might even save your bacon.

Trick One:

One of the annoying issues with mysqldump is its lack of flexibility. It's very easy to dump all databases (the --all-databases option) or one or more databases (--databases). Even dumping a single table isn't too difficult. But what about all the tables that start with "a"? You could specify each table on the command line, but I have ADD and I can't focus for more than a couple of tables :). So, here is one trick to emulate pattern matching to an extent...

mysql --skip-column-names information_schema -e "SELECT table_name FROM TABLES WHERE table_schema='schema_name' AND table_name LIKE 'a%'" > tables.txt

Play around with that a bit and you can probably get the results you want. There is even a REGEXP operator that can give you some nice results. If you need to make multiple selects, you can use ">>" redirection instead of ">" so that the results are appended to the end of the text file instead of overwriting what is already there.
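For example, something along these lines (the schema name and pattern here are just placeholders) should append every table whose name starts with "a" or "b" to the same list:

mysql --skip-column-names information_schema -e "SELECT table_name FROM TABLES WHERE table_schema='schema_name' AND table_name REGEXP '^(a|b)'" >> tables.txt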

Once you have your list of tables you can then do the actual dump:

mysqldump schema_name `cat tables.txt` > tables.sql
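If you would rather skip the intermediate file entirely, the two steps can likely be combined with command substitution. This is just a sketch using the same placeholder schema name:

mysqldump schema_name `mysql --skip-column-names information_schema -e "SELECT table_name FROM TABLES WHERE table_schema='schema_name' AND table_name LIKE 'a%'"` > tables.sql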

Trick Two:

With mysqldump there is no option to encrypt backups. Sometimes a client is required, due to regulations, to have backups encrypted. Sometimes they just want the backup encrypted. As with the previous trick, combining bash scripting and mysqldump makes this possible. With the method shown, the backup never sits on disk unencrypted.

First you need GPG (GNU Privacy Guard) installed and configured. This is how the encryption is done. You need a "public key" generated for the server where you are making the backup.

Step One - GPG Setup

Setting up GPG is fairly simple. You just run it with the --gen-key option:

[root]# gpg --gen-key
gpg (GnuPG) 1.4.5; Copyright (C) 2006 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.
Please select what kind of key you want:
   (1) DSA and Elgamal (default)
   (2) DSA (sign only)
   (5) RSA (sign only)
Your selection? 

I would go with DSA and Elgamal. Once you choose your encryption method, it will ask a series of questions, including key size, how long the key should last before expiring, name, email address, and a comment. I would use 2048 for the key size and tell it to never expire (0). Please understand that you should not use these values just because I say that I use them. Use what is best for you.

The last question asks you to input a passphrase. You have two choices: either leave it empty for no passphrase, or generate a password and then supply it from a file stored somewhere and specified on the command line. Since keeping the passphrase around in a text file isn't really safe, the only recommendation I have is to not use a passphrase. That isn't necessarily safe either, but I am not aware of a better option.
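If you need to script the key creation itself (say, across many servers), GnuPG also supports unattended key generation from a parameter file. The exact fields vary a little between GnuPG versions, so treat the following as a rough sketch (the name and email are placeholders) and check the "unattended key generation" section of your GnuPG documentation:

gpg --batch --gen-key <<EOF
Key-Type: DSA
Key-Length: 2048
Subkey-Type: ELG-E
Subkey-Length: 2048
Name-Real: Backup User
Name-Email: backup@example.com
Expire-Date: 0
%commit
EOF

Leaving out a Passphrase line is meant to match the no-passphrase choice discussed above, though newer GnuPG versions may want an explicit %no-protection directive instead.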

Step Two - Backup

Now that you have generated your key, it's time to run the backup:

# mysqldump database_name | gpg --encrypt -r 'user_id' | gzip > dump.sql.gz

This is a three-step process: the mysqldump, piped to the gpg process, piped to gzip for compression. Because the results are piped, everything stays in memory (and off disk) until it lands in the compressed file.

Of course, the "--encrypt" option is used to encrypt the data. The "-r" option specifies the recipient of the encrypted file, which in this case is the backup user.
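Putting it all together, a minimal backup script might look something like the sketch below. The paths, database name, and key ID are placeholders, and pipefail is there so a failed mysqldump doesn't silently leave you with a truncated encrypted backup:

#!/bin/bash
# Sketch: compressed, encrypted mysqldump backup
set -o pipefail

BACKUP_DIR=/backup_dir        # placeholder path
DB=database_name              # placeholder database name
RECIPIENT=user_id             # GPG key ID of the backup user
OUTFILE="$BACKUP_DIR/$DB-$(date +%F).sql.gz"

# dump, encrypt, then compress -- nothing touches disk unencrypted
mysqldump "$DB" | gpg --encrypt -r "$RECIPIENT" | gzip > "$OUTFILE" || echo "Backup of $DB failed" >&2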

When decrypting the backup, you use the --decrypt option, similar to this:

gpg -r user_id --output dump.sql --decrypt encrypted-file.gpg
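One thing to keep in mind: the backup produced above has gzip as its outer layer, so to restore that particular file you decompress first, then decrypt, then feed the result to mysql. Something along these lines should work (the database name is a placeholder):

gunzip -c dump.sql.gz | gpg --decrypt | mysql database_name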

Trick Three:

Within the last few months, an issue arose at a client where weekly backups of prod were being made and then restored to dev, and we were trying to trim down the time this cycle was taking. These backups include what amounts to archival data that was not being used in dev, something like 100 GB of data spread across 40 or so tables. I really don't remember whose idea it was (my colleague's or mine?), but a colleague implemented a bit of sed magic that would scan the dump file and replace the InnoDB storage engine for these archival tables with the Blackhole engine. The inserts into the archival tables still happen, of course, but you are essentially sending them to /dev/null. They happen MUCH faster than inserts into an InnoDB table. Massive time savings resulted, with only one line of actual code added to the backup script.

Here is the cleaned up code:

echo "Starting Blackhole replacement: `date`"
sed -i -e '/KEY `tablename_productid`/{n;s/InnoDB/Blackhole/;}' /backup_dir/backup.sql
echo "Singularity created: `date`"

We had to be careful. Most tables in this database were InnoDB, and we didn't want to convert all of them, just the archival tables. What we did was scan for a certain index ("tablename_productid") that each of the archival tables contained and that was the last index in each archival table's definition. The next line was the table's storage engine definition, so there we replaced InnoDB with Blackhole. The sed tool is very powerful and can serve you well if you take the time to learn its use.
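Before loading the modified dump, it is worth a quick sanity check; for example, counting how many tables ended up on the Blackhole engine and confirming it matches the number of archival tables you expected (this assumes the engine appears as ENGINE=InnoDB in your dump, which is the usual mysqldump output):

grep -c 'ENGINE=Blackhole' /backup_dir/backup.sql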

Links:

http://dev.mysql.com/doc/refman/5.6/en/pattern-matching.html

http://dev.mysql.com/doc/refman/5.6/en/mysqldump.html

echo "Starting Blackhole replacement: `date`"echo "Singularity created: `date`"

Comments

I would have added another trick: how to dump only certain columns of a table. For example, let's say you want to transfer data from log tables across several db servers to a single table on an archive server. All the tables have the same name and structure, but you have an annoying auto-incremented ID that will probably create conflicts when you insert all your dumps into that single table. Given a simple table with columns ID, DATE, EVENT, you can write something like this:

mysqldump DATABASE TABLE --where='0=1 UNION SELECT NULL, DATE, EVENT FROM TABLE'

Basically you do an SQL injection in that convenient --where option. Add some extra options to mysqldump and some 'pipe' magic and you can do the transfer in one command:

mysqldump --skip-opt --compact -cte -h LOG_SERVER DATABASE TABLE --where='0=1 UNION SELECT NULL, DATE, EVENT FROM TABLE' | mysql -h CENTRAL_SERVER DATABASE

I really wish MySQL's SELECT .. INTO OUTFILE supported writing into a named pipe so that SQL injection like the previous comment's wasn't necessary to get data from one database into another conveniently. There has been a feature request open on that since 4.1, I think :)

You could maybe use a FEDERATED table and insert into it with INSERT .. SELECT but this will perform poorly given that each row will be sent as an individual insert statement.
