Maintenance Plan
Whenever you need to perform maintenance, it is critical that you take the time beforehand to write down what is going to happen, step by detailed step. Have it reviewed, critiqued and torn apart. Then, when it is time for maintenance, you have an easily executed plan with much less chance for error.
Let me elaborate.
I know people who just fly by the seat of their pants. Impressive maybe, but really irresponsible. It is way too easy to make mistakes.
Take the time to craft a skeleton plan to build upon. What I mean by that is a written plan that contains only the things that happen every time you do maintenance. Before maintenance, for example, you need to tell the proper people that maintenance is happening, turn off monitoring, and make changes to your backup plans if necessary. After maintenance, you will need to update any notices about the maintenance being performed. These can be added to your skeleton plan because they are going to happen every time. A checklist is very helpful. It ensures that things don’t get missed.
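Here is a minimal sketch of what a reusable skeleton plan could look like if you keep it in code instead of a text file. The names `MaintenancePlan` and `skeleton_plan` are made up for illustration; the fixed steps are the ones listed above, and the job-specific steps are filled in per maintenance window.

```python
from dataclasses import dataclass, field


@dataclass
class MaintenancePlan:
    """A written plan: fixed pre/post steps plus job-specific steps."""
    title: str
    pre_steps: list = field(default_factory=list)
    main_steps: list = field(default_factory=list)
    post_steps: list = field(default_factory=list)

    def render(self) -> str:
        """Return the plan as a numbered checklist, ready for review."""
        lines = [self.title]
        number = 1
        sections = (
            ("Before maintenance", self.pre_steps),
            ("Maintenance", self.main_steps),
            ("After maintenance", self.post_steps),
        )
        for heading, steps in sections:
            lines.append(heading + ":")
            for step in steps:
                lines.append(f"  [ ] {number}) {step}")
                number += 1
        return "\n".join(lines)


def skeleton_plan(title: str) -> MaintenancePlan:
    """Hypothetical helper: a plan pre-filled with the every-time steps."""
    return MaintenancePlan(
        title=title,
        pre_steps=[
            "Tell the proper people that maintenance is happening",
            "Turn off monitoring",
            "Change backup plans if necessary",
        ],
        post_steps=[
            "Update notices about the maintenance being performed",
        ],
    )


if __name__ == "__main__":
    plan = skeleton_plan("Example maintenance window")
    plan.main_steps.append("Upgrade the database server")  # job-specific step
    print(plan.render())
```

The point of the design is simply that the every-time steps live in one place, so each new plan starts from the same checklist and only the middle section changes.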
Take your time. Do it right!!
3 Comments so far
Good point! Also see my blog post on checklists:
http://karwin.blogspot.com/2007/12/how-to-save-100-million.html
That is awesome. I wish I had seen this before writing the post. Not sure how I missed it.
Thanks!!!
It’s especially important in so many ways. You can have someone review your checklist so you get a “second set of eyes”. You also get a bit of CYA as well — if something happens that was unexpected, having it be unexpected to more than one party is very useful.
I tend to write things out step by step so I could hand it off to anyone with some knowledge. For instance, part of a plan to promote a slave to a master might be:
9) turn off mysql on the master
10) RESET the slave settings on the new master
11) RESET the master settings on the new master
12) CHANGE MASTER TO on all the slaves to point them to the new master
and then I’ll have rollback plans, like “until step 9, rollback consists of turning on the master mysql instance. For steps 10-12, rollback consists of that plus making sure all slaves read from the old master”.
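As a rough sketch, the rollback rule for the steps above could be written down as a small lookup, so that whoever runs the plan does not have to work it out under pressure. The function name and the wording of the actions are my own for illustration; steps 1 through 8 are not shown in this excerpt, so they are only covered by the "until step 9" rule.

```python
# Rollback actions, paraphrased from the plan excerpt above.
ROLLBACK_MASTER = "Turn the old master's mysql instance back on"
ROLLBACK_SLAVES = "Make sure all slaves read from the old master"

LAST_STEP = 12  # the excerpt ends at step 12


def rollback_actions(failed_step: int) -> list:
    """Return the rollback actions needed if the plan stops at failed_step."""
    if not 1 <= failed_step <= LAST_STEP:
        raise ValueError(f"unknown step: {failed_step}")
    actions = [ROLLBACK_MASTER]
    if failed_step >= 10:
        # From step 10 on, replication settings have been touched as well.
        actions.append(ROLLBACK_SLAVES)
    return actions


if __name__ == "__main__":
    for step in (9, 11):
        print(step, rollback_actions(step))
```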
Also, putting time frames on things is helpful, e.g., “if backing up takes longer than 20 minutes, abort/notify customer service the downtime will be longer than you thought/get a cup of coffee.”
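If a step is scripted, the time frame can be enforced rather than just written down. This is only a sketch: `run_with_time_limit` and the `notify` callback are hypothetical names, and the command being run is a stand-in. It relies on the `timeout` argument of Python's standard `subprocess.run`, which kills the child process and raises `subprocess.TimeoutExpired` when the limit is exceeded.

```python
import subprocess
import sys


def run_with_time_limit(command, limit_seconds, notify):
    """Run command; abort it and call notify() if it exceeds the limit.

    Returns True if the command finished in time, False if it was aborted.
    A command that fails on its own raises subprocess.CalledProcessError.
    """
    try:
        subprocess.run(command, check=True, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        notify(f"Step exceeded {limit_seconds} seconds and was aborted: {command}")
        return False
    return True


if __name__ == "__main__":
    # Stand-in for a real backup command; sleeps longer than the limit.
    slow_step = [sys.executable, "-c", "import time; time.sleep(5)"]
    finished = run_with_time_limit(slow_step, 1, print)
    print("finished in time:", finished)
```

For a real backup the limit would be the 20 minutes from the plan, and `notify` would be whatever reaches customer service.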