Archive for the 'Clustering' Category
Calling on Sales
I had an interesting experience two days ago. I actually had a sales person talk me out of using a rather expensive product. Really. And yes, this does relate to MySQL!!
We had been evaluating using the MySQL Cluster product here at work for an application we are developing. Since I have never used a cluster in production before we looked into support options as a failsafe if we run into problems. Turns out MySQL AB is the only option out there for real in-depth support when it comes to the clustering product.
So I called MySQL to see what was it was going to cost and such. Let me just say up-front that I do not like talking to sales people. In today’s marketing-driven world it is almost always an unpleasant experience of someone trying to sell you something that you don’t necessarily need.
I won’t blog the names of the two people I talked too. Mostly because I don’t remember. Sorry guys :(. I was connected to the actual salesperson first. We spent about five minutes talking about what my company was looking for and then he conferenced me in with a sales engineer. This is where it got interesting.
The sales engineer and I talked for maybe ten to fifteen minutes. During this time he asked me questions like:
- what is our current application like?
- total amount of data expected in the cluster down the road?
- why are we looking at a cluster solution?
After listening to my thoughts on our application and what we are trying to do he made the suggestion that a cluster might not really fit the project. There are two basic guidelines that he had that we did not really fit within:
- the dataset should be at most 80gb
- our queries should be fairly simple — not a lot of joins
While our dataset might be less than 80gb for quite some time I fully expect it to exceed that within a year to a year and a half. And unfortunately, due to the nature of the application, our queries will never be that simple. It looks like we are going to go with a more conventional solution.
MySQL AB does have a program were an engineer visits you on-site and very closely evaluates the needs of your company - running load tests and such on a cluster. While this certainly isn’t cheap it is a great option for companies that just aren’t sure if clustering is something for them.
I will try and write a wrap-up posting on clustering to hopefully help other people evaluate if it is appropriate for them. While I enjoyed working on our test/devel cluster my life would have been a little easier if I had realized some of this up front!
6 commentsCluster Tests - Part Two
A few days ago I wrote about my testing of clustering. I wasn’t able to finish writing about all the tests I had already run and I had a few left to run. As of tonight I have finished up the testing. There is one that I will need to go back and re-visit but I am overall very happy with how things have gone:
- running ndbd –initial on one node
- running ndbd –initial on both data nodes at start-up
As I have documented I have had some trouble starting up ndb nodes. It has been recommended by multiple people that you should almost never run ndbd –initial. I am really starting to have that point of view. Bascially the outcome of these tests were that you ended up with no data. There are very few, if any, configuration changes to the cluster that require running ndbd –initial.
- making configuration changes
Simple enough. Make your changes, shutdown and restart the management node, followed by the ndb nodes and then the sql nodes. (Almost) never had any trouble with it — see test on management node.
- rolling upgrade from 5.1.20 to 5.1.22
The only trouble I had here was with file paths. The idea is that you upgrade or downgrade your cluster from one version to another. I upgraded from 5.1.20 to 5.1.22. As is normal you start with the management node, shut it down, upgrade the binaries, and restart it. Running ’show’ on the management client should then show the new version on the management node. Rinse and repeat with the data nodes and then the sql nodes. With my setups I use the binaries downloaded from dev.mysql.com so I download the tar.gz file to /usr/local/ and then unzip/untar the binaries and then softlink the directory to /usr/local/mysql. This causes some difficulties when doing the upgrade. Even so, it is “do-able”.
- convert ndbcluster - memory - table to disk-based table
This worked. I am not going to take the time to outline the exact procedure here (maybe another blog posting some time :)). The only thing I am not sure of is if the file you create is auto-expanding. I don’t think it is but don’t know this for certain.
- add secondary management node
I already wrote about this. It didn’t go well. Not at all. Could be easier and is definitely something I need to re-do. We need a secondary management node for backup.
- backups and restores
I just completed this tonight. Went well. I did a mysqldump of the cluster and then a online backup of the cluster. I restored the cluster from the online backup and then verified it was correct by restoring the mysqldump on another test server and comparing the two. All was well :) I will probably blog about this procedure soon as even though it is documented places it isn’t necessarily clear or in a “tutorial” fashion.
1 commentFun with Running a Cluster on Two Servers
In preparation for rolling out a production cluster I had to set up a development cluster. I set things up on two servers so that a SQL node and a ndb node ran on each server. What fun. Right.
To start of with I set the two servers up as two virtual machine images. My initial problem came because I was running both of the VMs on the same physical server. Heartbeats were being missed and nodes were dropping off because both servers would try and write to the same physical drive at the same time. This was causing latency issues/dropped heartbeats and then dropped nodes.
It took probably a day and a half to figure this out. We then moved one of the VMs to another physical server and this resolved the latency issue.
I thought I was home free. Should have known better. A couple of days later one of the developer asked if I was having a problem with the development cluster. I sat down and start looking around. The SQL node on one of the servers was not running. I looked in the .err file (and if you don’t have it set in your configuration files…shame on you). Hmmm… a nice little dump of the mysqld daemon. It seg faulted. Here is the information from a later dump (every dump looked very much the same):
071018 16:38:14 - mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=33
max_threads=60
threads_connected=29
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 147493 K
bytes of memory
Hope that’s ok; if not, decrease some variables in the equation.
thd: 0×174f760
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
Cannot determine thread, fp=0xb, backtrace may not be correct.
Bogus stack limit or frame pointer, fp=0xb, stack_bottom=0×40bf0000, thread_stack=131072, aborting backtrace.
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort…
thd->query at 0×1759c50 = REPLACE INTO file (fid, dmid, dkey, length, classid, devcount) VALUES (’424939′,’4′,’Post_416_Content’,'15′,’0′,0)
thd->thread_id=27
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
071018 16:38:14 mysqld_safe Number of processes running now: 0
071018 16:38:14 mysqld_safe mysqld restarted
071018 16:38:14 [Note] Plugin ‘InnoDB’ disabled by command line option
071018 16:38:17 [Note] Starting MySQL Cluster Binlog Thread
071018 16:38:18 [Note] Event Scheduler: Loaded 0 events
071018 16:38:18 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: ‘5.1.22-rc’ socket: ‘/var/run/mysqld/mysqld.sock’ port: 3306 MySQL Community Server (GPL)
The statement in the variable dump (REPLACE INTO) was in every dump. Still really don’t know why that is the case, but there is no syntax error with the statement. The statement itself is actually code being executed as part of the Mogile Filesystem.
The system was dumping a mysqld daemon about three times in a twenty-four hour period. Every time the dump looked almost identical to what I listed above.
To make a long story a little shorter — because of the issues I had with latencies and nodes shutting down on the VM I decided to move the cluster from the two VMs to two physical computers. Memory was identical (one gig for each) between the VMs and the physical servers. So, off we go. I honestly was very hopeful at this point that my problems were over.
It didn’t take long and the mysqld daemon seg faults again. Wow, this is getting frustrating. I was monitoring around the clock the two servers. Time for some (more) configuration changes. I had configured the ndbd daemon so that it was locked into memory. This is a good thing and something I HIGHLY recommend. The ‘top’ command showed that it was steadily holding about 80% of the total memory. I lowered that down to 71.3% and added some configuration parameters to the my.cnf to try and keep it smaller. I restarted the mysql daemon yesterday at one p.m. so it has been running for thirty-four hours. Everything looks good. I have been running the developer unit tests repeatedly including the one test that exercises the mogile filesystem. I am starting to breath a little easier that things might be going well this time. Total time spent on this one problem — a week. What a waste. Now I was doing other things too but this was consuming a major portion of my days (and nights).
Others might argue with this, but I would never put the SQL nodes on the same servers as the ndbd nodes for production. Some say you can run multiple ndbd nodes on the same server and I am more comfortable with that since I can lock the ndbd daemon into memory and know its not going to change (my ndbd nodes on those two servers have been at exactly 71.3% since I started them up. If I had servers for the ndbd nodes that had 16+ gigs of RAM I might start allocating 4 gigs of RAM to a ndbd daemon with 3+ daemons per node. My understanding is that this helps keep the transactional logs for the nodes under control. When you do a ndbd node restart it takes less time for a node to get up and running because of the smaller files to read. I might be mistaken and its too late for me to look it up :) Anyone got other reasons or maybe (if I am right) someone can elaborate.
4 commentsAdventures with Clusters
Ah, the fun. I haven’t been writing much lately. Been spending a great deal of time wrestling with clusters. The latest event started last night around 9:30 pm and ended at 6:15 this morning.
I began by performing a rolling upgrade on our new products beta testing / production cluster taking it from 5.1.20 to 5.1.22 (rc). That went off without a hitch. That was the third or fourth time I have done a rolling upgrade and I haven’t had a problem with it yet. It’s actually kind of cool to do a ’show’ from the management client and see the new version and the old version running at the same time.
In light of my previous posts on backups you will be happy to know that I did a backup of the data before I began this little process. About 15 minutes until 3:00 a.m. I began to add a second management node. I thought this was a fairly well documented process. You add the information about the second node to the config.ini file and copy the new config.ini to the second management node. In addition you have to add information to the my.cnf files of each sql/data nodes.
Pretty straightforward. Or so I thought. I was putting the backup management node on a lightly used server. I updated the primary management node’s config.ini and copied it over the second node. Then, from the management client, I stopped the management node. I then restarted it from the shell and then started the management client back up and did a ’show’.
Wait, this doesn’t look right. The IPs didn’t look right. The data and sql nodes were not even showing as running. Kill the managment node, check the other daemons (running), checked the config.ini file (correct) and restarted the management node. Not looking right again. Hmmmm…fine. I manually killed everything, updated the config.ini so it only shows primary management node and then restart the management node. Now the management client ’show’ command looks correct. Alright, fine, time to start the data nodes.
I have always started the cluster up in this manner: management node, data nodes (one by one..waiting until each starts up before working on the next), and then sql nodes (once again, one by one). As I said, the management node was now up and running correctly. Time to start the first data node. The data node didn’t want to start. Management node show the data node starting, but it never starts. I try various things, google things, etc. Nothing. Then I try starting ndbd with the –initial parameter. Still hanging. By this time I had looked at the log files and did a status command from the management client. It showed that the data node was hanging in phase 0.
By now I am getting nervous. The cluster has to be back up before 9:00 am or our testing won’t be able to continue. I did find one posting very similar to my problem, but there was no resolution. Actually, the resolution of one of the repliers was “I had a similar problem and ended up doing a re-install of both mysql and the OS. Hmmmm…. not an option for me.
To make a long and painful story a bit shorter, at 6.00 am while sitting and staring at the manaagment console showing a “starting” data node (ndbd –initial) that wasn’t really starting I decided to start up the second data node at the same time. I didn’t know what this could do, but I figured at this point it couldn’t be any worse than what I was doing. Suddenly the phase changes on the starting data nodes and all is well.
It never crossed my mind do do this before. I have been told for quite some time you must start the management node, then each date node in turn.
Here is what the management log file shows:
2007-10-17 05:56:05 [MgmSrvr] INFO — Node 3: Initial start, waiting for 0000000000000010 to connect, no des [ all: 0000000000000018 connected: 0000000000000008 no-wait: 0000000000000000 ]
2007-10-17 05:56:08 [MgmSrvr] INFO — Node 3: Initial start, waiting for 0000000000000010 to connect, no des [ all: 0000000000000018 connected: 0000000000000008 no-wait: 0000000000000000 ]
2007-10-17 05:56:11 [MgmSrvr] INFO — Node 3: Initial start, waiting for 0000000000000010 to connect, no des [ all: 0000000000000018 connected: 0000000000000008 no-wait: 0000000000000000 ]
2007-10-17 05:56:14 [MgmSrvr] INFO — Node 3: Initial start, waiting for 0000000000000010 to connect, no des [ all: 0000000000000018 connected: 0000000000000008 no-wait: 0000000000000000 ]
2007-10-17 05:56:17 [MgmSrvr] INFO — Node 3: Initial start, waiting for 0000000000000010 to connect, no des [ all: 0000000000000018 connected: 0000000000000008 no-wait: 0000000000000000 ]
This is what actually tipped me off in the end. For the longest time I thought the data node was not communicating with the management node. Turns out it was looking for the second data node.
The second management node log file showed this:
NDB Cluster Management Server. Version 5.1.22 (rc)
Id: 1, Command port: 1186
setEventReportingLevelImpl: failed 2!
setEventReportingLevelImpl: failed 2!
Googling this error gave me this page. This was very -very- similar to my exact experience. Doesn’t look like these guys ever came up with a solution. I don’t know that I found a “solution”, but I did get the cluster back up without re-installing cluster or the OS. I did have to re-install the DB from the dump I took. Whew..glad I did that backup.
Anyone seen this behavour before? And what does it really mean by starting both nodes at the same time. I really thought this was no-no. Are there “rules” about this startup?
5 commentsMySQL Cluster Testing - Part One
So today I really started testing MySQL Cluster. I have been using it for around six to eight weeks. However, we are very soon rolling out an application that will use a cluster in the back so I needed to understand what was really going on.
I think I have said before that the MySQL cluster product is very different than the “standard” MySQL server. It is important that if you are evaluating the MySQL Cluster for your use that you understand that. Take time to understand what is going on. Don’t be easily frustrated. And don’t expect it to be some silver bullet that will solve your database problems.
Some of these test are rather simplistic. However, one of the things I have found is that the MySQL Clustering product is moving very very fast. And much of the documentation is dated. I have the MySQL Clustering book and it is very well written, but it was written either before MySQL 5.1 was released or just shortly after. They do not cover on-disk tables or any of the other new features of 5.1. So, even though I thought I knew what the outcome of most of these test would be, it was worthwhile to perform them.
The one place that I have found that the information is consistently up to date is at MySQL AB’s website. That shouldn’t be a surprise, but the documentation on 5.1 should be required reading . This overall lack of documentation is one of the big reasons that I am taking the time to write about my clustering experience.
On to the tests!!
I have twelve tests currently planned. There is always the possibility that I might add some more over the next few days. Eight of the test I performed today. I will write about the first five and then do another posting or two over the next week with the other tests.
One note about the configuration. I have two servers set up as a cluster. Each server runs a SQL node and data node. One server also runs the management node. Not an ideal setup, but the best I could do. I began testing running 5.1.20 and one of the test is a rolling upgrade to 5.1.22 (rc).
Test one: shut of power on one server.
This simple test required that I shut of the power to the server NOT running the management node. Then, I verified that the cluster was still up and operational. Finally, I booted the server, started the data node and then the SQL node. Afterwards I verified once again that the cluster was up and operating as expected.
Everything went fine with cluster availability throughout and no loss of data. The reason I didn’t bring down the server with the management node is that I know that this would have caused an arbitration issue and the cluster would have been shut down. If the management node is on located on a third server then either of the two sql/data node servers should be able to be shut down.
Test two: shut of power to both servers.
This is just simulating a complete power failure. After both servers were shut down I booted them back up and then restarted the cluster in the order that MySQL specifies: management node, data nodes and last the SQL nodes (making sure each one starts before moving on to the next).
This worked exactly as expected with a complete recovery. Even so, this was a fairly simplistic exercise with no data being read from or written to the database at the time the power was shut off. I would like to test both that and what happens when a node doesn’t really come up properly because of corruption.
Test three: kill management node.
According to everything I have read and in my (limited) experience, you can bring the management node down and the cluster does not stop. This is because for normal operation the management node is not needed. You really only use the management node to tell the nodes their configuration on startup and to stop either nodes or the entire cluster. There are other functions, but none that impact the ongoing operation of the cluster.
This test worked as expected. I brought the management node down and verified that the cluster was still up and running. I created a database and verified that it showed up in the other SQL node.
Test four: killing ndbd daemon.
In this test I executed a ‘kill -9′ command against the process id of one of the ndbd daemons. I should have been able to kill either one of the data nodes and have the cluster stay operational.
Everything worked as advertised. I killed one data node and the cluster stayed up. I verified through the opposite servers SQL node and then restarted the data node.
Test five: Stopping (and restarting) a SQL node.
I know, this is simplistic stuff. But, it’s good just to verify this. We will have a load balancer in front of our SQL nodes so that to the application there is just one IP address for however many SQL nodes we have. Any node should be able to go down without affecting operation.
This worked of course. I shut down one SQL node and verified through the other SQL node that the cluster was up and worked as expected.
The remainder of the tests I have planned cover the following:
- running ndbd –initial on one node
- running ndbd –initial on both data nodes at start-up
- making configuration changes
- rolling upgrade from 5.1.20 to 5.1.22
- convert ndbcluster - memory - table to disk-based table
- add secondary management node
- backups and restores
Hope this has been useful. I will write about the other tests as soon as possible.
6 commentsUsing Disk-based Tables in MySQL Cluster (5.1.x)
I have one project going on that is using a MySQL Cluster. We have some tables that are going to grow extremely rapidly, so I am going to convert them to storing (non-indexed) columns on disk. Before I started working on this, I had the naive thought that this was a fairly simple thing. Maybe a table alter command of some type. Nope. Not the case. As detailed here: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-disk-data-objects.html there are several steps required to create a table that is stored on disk.
1. Create a log file group. This is used to create undo logs for the table.
2. Create a tablespace and assign the log file group to the tablespace and also assign at least one data file.
3. Create a “Disk Data Table” that uses the tablespace on disk for storage of table data.
This doesn’t really help my particular situation out as I already have the table created. While there isn’t a huge amount of data in it currently, I didn’t want to dump the data out of the table(s), rebuild them with this method and then re-import the data. I know, I am lazy :) So, I asked on the cluster mailing list what I could do and Stewart Smith replied:
“It’s not an online operation, but you can ALTER TABLE foo TABLESPACE ts1 STORAGE DISK ENGINE=NDB; to convert table foo to be a disk data table. It’s a copying alter, so you should be in single user mode and the table will be locked.”
So, I can create the log files and table space, shut my cluster down to single-user mode, and then just alter the table. Simple(r) even for the lazy among us. If you are using a MySQL Cluster you absolutely should be subscribed to the cluster mailing list. It (and the other MySQL mailing lists) can be subscribed to at http://lists.mysql.com.
Thanks Stewart!!
1 commentAdding a SQL node to a cluster
Just a quick post. This information is fairly readily available but more information is better right?
Adding a SQL node to a cluster is a fairly simple task:
Step 1: Add the new SQL node to the config.ini file on the management node.
Step 2: Make sure that the SQL node is configured identically (i.e. my.cnf) to the other cluster SQL nodes,
Step 3: Issue the ’stop’ command from the ndb_mgm console and then do a normal restart of the cluster: management node first, data nodes second and then SQL nodes.
Step 4: Issue the ’show’ command from the ndb_mgm console and you see the new SQL node (it takes about a minute to show up)
Hope that is useful for someone!!
No commentsBeginning MySQL Cluster
So, at work, we are implementing a cluster. Actually, we are implementing several clusters. This is my first production experience with clustering. Trust me, it’s a whole different animal.
A couple of observations:
- I find it much easier to split up the sql and data nodes. Originally I had put a SQL node and a data node on the same physical server. Because I didn’t go with the default locations for the data I kept having trouble. Not sure why, but once I separated things out it became much easier.
- Having multiple SQL nodes creates some complexity. We want a single IP address for the application so, obviously, we have to have a load balancer of some type. This isn’t quite as easy as it might seem. We have already had one situation where the cluster thought a SQL node failed and shut it off. The SQL node itself was up and running and unless you actually tried to execute a SQL command on the node and you couldn’t tell the SQL node wasn’t working properly.
A typical load balancer might either just ping the IP or possibly try and log in to the SQL node. Both of those would have worked in this case. So, the load balancer would assume everything was cool and keep trying to serve connections to the SQL node. This is still being investigated and any feedback would be welcome!!
- scripting is going to be your friend. It’s one thing to make a change on three or four servers. It’s another to make a change to twenty. That is one of the cool things about the MySQL Clustering book..quite a few scripts. I have found quite a few on the old ‘net also.
There will be more posts about this in the future I am sure.
3 comments