Diamond Notes

Backups

I have been either a DBA or a system administrator since 1997.  That is ten years now.  A long time.  Not to poke fun at them, but the majority of the people I currently work with were between the ages of seven and sixteen when I first started getting paid to work on a database.  After that length of time some things should just be in your core - part of your DNA.  Backups are one of those things.  You KNOW you back up anything of importance.  You KNOW that you back up regularly and you KNOW that you restore periodically to make sure that those backups are working.  Right?

In 10 years of work I recall one backup disaster.  In January of 1998 I started a dial-up ISP with one server running Windows NT.  Please stop laughing.  Windows was the only thing at the time that I had enough experience to do this with.  So, I was diligently backing up my one server on a tiny tape drive (4mm maybe?).  I made a full backup once a week, incremental backups every night, and even kept copies off site.  Standard stuff.  Because I only had one server I did not (and could not) perform a test restore.

One night I applied a service pack to the server and it had to be rebooted when done.  I rebooted the server around 3 am.  It blue-screened.  Not only did this server host the websites of my customers, the 3Com rack of modems used RAS in Windows to authenticate users before they were allowed online access.  No one could get online.  So, after maybe an hour and a half of trying to get the server back up I decided to do a bare-metal restore.  After the initial install of the OS (and then the service pack .. which worked the second time) I restored the directories off the tape.  Everything looked good.  Then I realized my user accounts didn’t get restored!!  I didn’t know that the user accounts were never being backed up.  Because I had never done a test restore I never had a chance to discover this was the case.

I had around 300 or 350 customers at the time and a cardboard box full of file folders with a single sheet of paper in each folder with the user’s information.  So at 6:00 AM that morning, after being up for around 23 hours, I began putting in those user accounts from the sheets in the folders.  By 8:00 AM the phone was ringing off the hook with people who couldn’t get online.  Many of the user information sheets I had didn’t have the right password.  I was so tired by that point that I am sure I mis-typed many of the passwords.

A shortened version of the story is that it took three days to get the majority of the problems sorted out.  I know I lost some customers from this experience.  Because of this experience I have been rather paranoid about ensuring backups are performed and working as they should.
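The lesson of that story is that a backup only counts once a restore has been checked. Here is a minimal sketch of such a check in Python, under some loud assumptions: the backup is a tar archive whose member paths are relative to the source directory, and the function names (`file_digests`, `verify_backup`) are made up for illustration. It restores into a scratch directory and compares every source file against what came back, so a missing item (like my user accounts) would show up in the report.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verify_backup(source_dir: Path, archive: Path) -> list[str]:
    """Restore the archive into a scratch directory and list every problem.

    An empty list means each file under source_dir came back byte-for-byte.
    Assumption: archive members are stored relative to source_dir.
    """
    expected = file_digests(source_dir)
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            tar.extractall(scratch)
        restored = file_digests(Path(scratch))

    problems = []
    for rel, digest in expected.items():
        if rel not in restored:
            problems.append(f"missing from backup: {rel}")
        elif restored[rel] != digest:
            problems.append(f"content differs: {rel}")
    return problems
```

Calling `verify_backup(Path("/srv/data"), Path("/backups/weekly.tar"))` (both paths are placeholders) right after a backup finishes only proves the archive matches the files you thought to look at. Anything that lives outside the file tree, such as an account database, still needs its own restore test.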

Now fast forward nine years.  Not too long ago I set up a development cluster of two servers - each running a SQL node and a data node.  As I have blogged about on multiple occasions, I am not that experienced with MySQL Cluster.  This is new territory for me.  Even so, that is no excuse for not doing fundamental things like ensuring that backups are being done.  I have no excuse, but I didn’t set up backups on this cluster.  About a week ago I worked on the cluster and changed some of the configuration parameters.  After you change the configuration of the cluster you have to restart each node so that it reads the new configuration.  On data nodes, for configuration changes to take effect, you usually have to start with the --initial parameter.  This actually cleans out the data logs.  If all (in this case two) data nodes are shut down at the same time and restarted with the --initial parameter, you know what you get when you log in to the SQL node?  A list of empty databases.

It was at that point that I realized that not only did I not have regular backups running, I didn’t even dump the database before I started working on it.  I HAD NO BACKUPS.
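What a pre-change safety net could have looked like, as a rough Python sketch rather than anything I actually ran: take a logical dump with `mysqldump` through the SQL node and request a native online backup from the management client before touching the configuration. The function names and the dump directory are hypothetical, credentials are assumed to come from an option file, and both tools are assumed to be on the PATH. A dump taken from a live cluster may not be a consistent snapshot, which is why the native backup is requested as well.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def pre_change_commands(dump_dir: Path, stamp: str) -> list[list[str]]:
    """Build the commands to run before changing the cluster configuration."""
    dump_file = dump_dir / f"all-databases-{stamp}.sql"
    return [
        # Logical dump of every database, taken through the SQL node.
        ["mysqldump", "--all-databases", f"--result-file={dump_file}"],
        # Native online backup, requested from the management client.
        ["ndb_mgm", "-e", "START BACKUP"],
    ]


def run_pre_change_backup(dump_dir: Path) -> None:
    """Run each command in order; a non-zero exit status raises an error."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dump_dir.mkdir(parents=True, exist_ok=True)
    for command in pre_change_commands(dump_dir, stamp):
        subprocess.run(command, check=True)
```

Keeping the command list separate from the code that runs it means the list can be printed and checked by eye before anything touches the cluster.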

I will spare the readers a lot of details about what went on after that.  Just remember that backups don’t only affect you; they affect everyone on your team.

Hope this post makes you pause and think!!

2 Comments so far

  1. Geert Vanderkelen October 14th, 2007 3:04 am

    Ah, the pain of backups.. ‘great’ story there! I can tell a few of my own, but it’s Sunday and I want to keep happy thoughts :)
    Just a note on what you did restarting the data nodes.

    This is a gotcha a few people have already run into: the EVIL --initial option. I can’t say it enough: NEVER use it. There are (rare) cases where you need to clean up the file systems of the data nodes, but do it manually, and if you have disk space, move the file system directory away (remove it later when all is done and working fine).

    Never, ever start your data nodes with the --initial option. I have seen folks just go into the bash history and start the node with the previous command. Disasters occur then. --initial is not needed for the first start at all. I’m on a crusade to change the name of the option :)
    Good we have hot backups in MySQL Cluster, but there is indeed nothing wrong with doing a mysqldump of the schema too (data is going to be inconsistent anyway). Sometimes the structure of a schema is more important than the data.

  2. Glennie October 14th, 2007 3:25 am

    I just had this nasty experience of not having a backup and blogged about it. It’s strange to admit that people we don’t know had to face the same problems at the same time!
