Archive for August, 2007
MySQL Forge and Worklog
Two good resources for DBA’s are the MySQL forge and the MySQL worklog. The forge is the place for user contributed code, snippets, and documentation. It is at http://forge.mysql.com. Another, less known, resource is the worklog. The worklog has a link at the forge or you can go directly there at http://forge.mysql.com/worklog/.
It shows what upcoming features are being worked on and in what version the features are planned for release. For example, WL#2387, is an entry entitled “Replication Master Filtering”. When this filter is enabled it will allow you to implement filtering on the master server instead of the slave. This will cut down on unnecessary network traffic. Currently, you can filter on the slave, say only database X, but the data still gets sent across the network. There is no version set for potential release of this feature.
I wasn’t aware of the worklog until the last couple of days. Someone mentioned it in a blog posting and I looked it up for myself. I am afraid I can’t figure out who it was that did the original posting, but if they want to contact me or post a comment..I will be happy to update this.
No commentsWebDevRadio Podcast
At my first Raleigh area MySQL users group meeting I attended in July, which I have blogged about previously, I met Michael Kimsal. He is the organizer of the group and does a fabulous job. For some reason he wanted to interview me for the WebDevRadio podcast that he has been doing for quite some time. I’m not sure why
So, last night, he posted the interview. I am listening to it as I type. How embarrassing!!! For those who choose to listen..yes..I am from the south (of the US). I talk –very– slowly. Please be patient with me!! In the podcast we discuss some of the things I do at work, MySQL clusters, the Querysniffer tool one of my co-workers developed at work and the MySQL magazine.
It is available at http://www.webdevradio.com
1 commentMySQL with SSD drives
Hey everyone..I want to get some feedback. I was wondering if anyone had experience with SSD (Solid State Disk) drives. They are basically flash memory mounted and designed to be used as a replacement for a typical hard drive. The prices are really falling on SSDs and, despite people who say otherwise, they are now as reliable, or more reliable, than typical hard drives. MTBF was rated at 2,000,000 hours for one drive and they all seem to have good warranties (I saw five years on one drive).
What makes me drool is not the throughput of a typical drive (which is about the same as a high-end drive) but the .1 ms random-access seek time. Yes, that is POINT one!! As in 1/10th of a millisecond!! I saw one that was advertised to have a seek time of .03 - .1 milli-seconds.
They are still very expensive. But, what if I replaced the hard drive holding my data files with one of these drives. Now my seek time for reading tables off disk is essentially zero. I wonder how that would affect performance. Ideally I would like three drives, one for the OS, one for the log files, and one for the data. I bet your I/O bus bandwith/speed would suddenly become your new bottleneck - not your hard drives themselves.
Size of the drive is an issue. Sixty-four GB size drives are now common and 128GB are becoming available. I have several databases that would not fit on the current drives, but still I think I could manage to split things up if needed. Once 128GB becomes the norm even this wouldn’t be a problem.
So, if anyone has had any experience with these I would love to hear from you. Benchmarks would be great!! Now if I could just get a half dozen to throw in my Coraid and test it out..
Replication Trick
I just wanted to write a quick post about something I did earlier today. For many of you this will be no trick, but for new DBA’s it is something you probably haven’t run across before.
So here is the problem. I have a master server that has a slave currently replicating from it. I need to add another slave to the mix for an upgrade we are performing. I can snapshot the data from the master using LVM snapshots and rsync it across, but even then you have to lock the tables for a few seconds. Not something you want to do on a production master if you can avoid it.
Instead I use the current slave to bring the new slave online. Here are the steps:
Assuming database1-a is current master, database1-b is current slave and database1-c is the going to be the new slave:
-
stop the slave thread from running on database1-b
stop slave
-
execute table lock on database1-b
flush tables with READ lock; -
assess free lvm space with the pvs command
-
create snapshot partition on database1-b
lvcreate -s -n snap-data -L 10G /dev/database-1b/data
-
show slave status on database1-b
SHOW slave status; -
release table lock
unlock tables;
-
restart slave thread on database1-b
start slave;
-
mount new snapshot
mount /dev/database1-a/snap-data /mnt
-
rsync from database1-b to database1-c
rsync -Wav --delete --progress /mnt/mysql/ database1-c:/data/mysql
-
get mysql up and running on database1-c
/etc/init.d/mysql start
-
configure database1-c to point to the new master as follows
>reset slave; >change master to master_log_file= ‘mysql-bin.160150′ , master_log_pos= “current position as of snapshot”, master_user= ‘replication-user’ , master_password= ‘master-password’ , master_host= ‘IP OF database1-a’;
All of this information comes from the show slave status command we issued earlier.
-
start up the slave on database1-c
start slave;
The important thing here is that we use the log position shown in the slave status command and the master_host parameter points to database1-a … NOT … database1-b which is where we got the data from.
Hope this is useful to someone.
3 commentsFeature Requests for Query Tool
Haven’t posted since late last month. Been busy. We have decided to expand the project around the query sniffer tool to make something a little more all-inclusive. One of my major goals is to have a toolset that I can use to grab queries on the fly from a production server (as querysniffer will) and then feed that it another tool that uses that data to fire them off at a test server to see how the server handles a production load. Kind of like Jeremy Zawodny’s Mybench program on steroids.
Another part to this toolkit is a query analyzer. The idea is this. Grab data with querysniffer and use this tool to analyze the data and report statistics for your production query data. The obvious stat is as follows:
- ranking the queries by number of times in data (showing that query XY executed 1,000 times in 100 MB of captured data, query XZ executed 900 times and query AA execute 765 times and so forth.)
That one is being implemented. I am curious as to what others think would be good stats to gather with the stats part of the package. For now at least, stay away from stats requiring the tracking of time. We are working on how to implement time stamping, but it may not even be possible. Feel free to comment on the posting with statistics features. If you want, you can email me at “kmurphy” at “icontact.com” with these request.
We are making significant progress on the tool bundle. I hope to release some type of beta next week. I actually tested the querybench program yesterday with data from a production server but it didn’t work. Releasing a beta next week isn’t a promise but we are working hard on it and I think there is a reasonable chance it will happen. I will post as soon as there is something for everyone to work with.