Diamond Notes

Just another WordPress weblog

Archive for February, 2008

MySQL for Your Database Warehouse??

A couple of days ago I listened to Stewart Smith’s talk on upcoming MySQL features from the MySQL mini-conf at linux.conf.au.  This triggered other thoughts and the outcome was this post.  Hope you enjoy!

I realize that MySQL is used in various places as a tool for data warehousing.  It isn’t common knowledge, but the predecessor to MySQL, TCX (founded by Monty Widenius), was a data warehousing company - http://en.wikipedia.org/wiki/Michael_Widenius.  That being said, MySQL isn’t know for serving as the centerpiece of a data warehouse.  OLTP (On Line Transaction Processing) — ie the typical website — is the most common use for MySQL.

I should stop pulling out the crystal ball, but I just can’t seem to help it.  Several weeks ago Monty made a few waves by announcing the release of a new storage engine - Maria.  The blog posting is here: Monty’s announcement.   Briefly, the new Maria engine seems to be a replacement for the aging MyISAM engine.  MyISAM is a great engine, and has certainly proven its worth.  Even so, without transactional support, it just isn’t a good choice.  The point of Maria seems to be a lightweight engine that will provide transactional support.  In addition, as Stewart points out, it will be “crash safe”(unlike MyISAM) in the initial release.

When  I first read the post I pretty much said to myself “Nice . . . another storage engine.”  I didn’t even blog about it.  But, on further thought, I think this might be a good opportunity for MySQL down the road.  Imagine Maria replacing MyISAM in the “MySQL Universe”.  This is speculation somewhat as there are no released benchmarks for Maria, but it seems that if it is going to replace MyISAM it wouldn’t be suprising if it is going to enjoy a better read rate than Innodb (as MyISAM always has).  If it has ACID compliance, wouldn’t it be a great potential engine for a data warehouse?  Lightweight, fast reads and acid compliant/transactionally safe.  Sounds like a good thing to me if you are data warehouse engineer.

It will be a long time until this takes place (plans are for inclusion in the 6.0 release tree) and is stable, but it is intriguing.

7 comments

Maintenance Plan

Whenever you need to perform maintenance it is critical that you take the time before to write down what is going to happen.  Step by detailed step.  Have it reviewed, critiqued and torn apart.  Then, when it is time for maintenance, you have an easily executed plan with much less change for error.

Let me elaborate.

I know people who just fly by the seat of their pants.  Impressive maybe, but really irresponsible.  It is way too easy to make mistakes.

Take the time to craft a skeleton plan to build upon.  What I mean by that is a written plan that only has the things that happen every time you do maintenance.  Things such as before maint you need to tell the proper people that maint is happening, turn off monitoring, make changes to your backup plans if necessary — things like that.  Post maint you will need to update notices about  maintenance being performed.  These can be added to your skeleton plan because they are going to happen every time.   A checklist is very helpful.  It insures that things don’t get missed.

Take your time.  Do it right!!

3 comments

New Benchmarks Coming Soon — Your Chance to Participate!

I am going to be getting a new master/slave pair of production servers. I will have between a one and two weeks before they have to go into production.  This will be a great chance to do a little benchmarking. I am going to begin writing up what I plan on testing in the next few days. I would love to here your suggestions about what to test!!

Here is the specs as I know them:

8 gigs of RAM

4 or 6 disks — 10k or 15k drives on a RAID controller with 256MB write-back cache.

4 or 8 cores (cpu) — most likely eight though

debian etch

MySQL 5.0.45

Things I want to test:

write-back turned on vs write-back turned off on RAID controller

xfs/reiserfs/ext3 on the RAID partition

I/O schedular (debian defaults to CFQ — want to test deadline also)

innodb-flush-log-at-trx-commit = 0 vs 1

I will be using our QPP (query processing programs) tools to run tests with production data and production client requests.

I very likely will not have time to run all the tests I want. However, the more tests the merrier. Please comment with your test ideas.

7 comments

MUG Meeting Tonight — Raleigh/Durham/Chapel Hill

Sorry I didn’t post this earlier.   Our monthly MUG meeting is tonight.  Meeting starts at 7:00 — people begin gathering at 6:30 pm.  Come join the fun!!  We meet at 2635 Meridian Parkway, Suite 200  Durham, NC

No comments

New Querysniffer Version

This is a “reprint” of a post I did yesterday.  For some reason Planetmysql didn’t pick it up and I wanted to make sure people knew about the new version of querysniffer.

We updated querysniffer just a bit. A user pointed out that the sniffer doesn’t actually start recording data until a “use” statement is issued. Ian added a flag so you can change this behavior if you want.

There is a memory leak when the sniffer runs for an exteneded period of time. We are working on that. Should have something soon. The new sniffer files are available at http://www.paragon-cs.com/queryprogs/querysniffer.tar.gz

No comments

Can We Scale?

Before I get going I would like to beg MySQL AB/Sun to not get upset. Call it constructive criticism.
I was laying in bed last night thinking. I should have been sleeping, but I was thinking about how MySQL could create a better product. Not for version 6.0 but post-6.0. What would it really take for our little dolphin to be the number one database product in the world — not just the open-source world.

I was thinking about how in 2001 I worked at a little company called Netzero. In addition to the other 800 servers that our team of system administrators had to manage we had two large Sun servers. I don’t recall the model anymore but they were the equivalent of a modern Sunfire and could hold up to 64 gigabytes of RAM and 64 (Sparc) CPUs. They were each the size of a refrigerator and one was located on the east coast of the US and one on the west coast of the US. They ran Solaris and Oracle. It was our billing application. Each of these servers at the time had 32 gigs of RAM and 32 CPUs. I have no clue how much the hardware and software cost but I am certain the pricetag was over one million dollars (USD).

Fast forward seven years. Today I work at iContact (remember kids..the views and opinions expressed in this blog are mine and mine alone. They are not the views and opinions of my employers. Although they should be :)). We have several servers that each have two cpu chips with four cores on each chip. A total of eight cores. And eight gigs of RAM. This is not all that uncommon these days. The premium over dual-core chips was a couple of hundred dollars. The problem is that MySQL doesn’t really utilize all these cores. And, if a server has a lot of memory, it wouldn’t really utilize all of it. I know “a lot of memory” is vague and to be more precise I believe it doesn’t really use more than 64 gigs. I read that “limit” somewhere but do not quote me on it. In 2001 32 gigs was a tremendous amount of memory. Today it isn’t so much. I recently looked at a Sun server that ran either Solaris or Linux and could support up to 32 cores and 256 gigs of RAM. For an entry price (not fully loaded of course) of about $25,000.

Now, peer into your crystal ball a little. In five more years it is a fairly safe bet that you can, for a reasonable cost, buy a CPU with sixteen cores. Take a server board with two CPU slots and bingo..you have 32 cores. Now throw on 64 or 128 gigs of RAM and you have a very nice server capable of some serious processing.

This isn’t fantasy. It is what is coming down the road. I want my database program to be able to utilize a setup like this. I don’t know what it takes to parallelize queries (as much as is possible). I don’t know what it takes to break a complex query up and run it over multiple threads/cores. I don’t know what it takes to have the daemon utilize multiple cores (I am not sure it is even possible). Maybe you have to have multiple daemons running on different cores communicating between each other. I know it isn’t trivial. It probably is extremely difficult. I would guess that it is going to require a fundamental restructuring of the inner workings of MySQL. Just googling “parallel dbms” and reading some of the academic papers is kind of mind-boggling. The one thing I do know is: IT HAS TO BE DONE!

Wishing it were different won’t change it. Intel and AMD are both committed to building up the number of cores and not the megahertz. If MySQL can’t scale with the cpu count down the road it will become irrelevant. On the other hand, MySQL has the opportunity to become a market leader by building a product that is truly revolutionary. Why should a DBMS that has a price tag in the hundreds of thousands of dollars be the only one that can really utilize many many cores?

I would think that the RAM utilization issue is much simpler to resolve. I really don’t know what the issue is here so I am speaking completely off the top of my head. Probably means I am incorrect. Just remember, as time goes on, higher amounts of RAM are going to become much more common. While 16 and 32 gigs of RAM in a server are fairly common now, 64 gigs and more will be common in just a few years.

I am not going to go into any depth here, and I don’t want to start any flame-wars, but sometimes I think we get stuck on only using scale-out instead of scale-up. We scale out, not because it is cheaper (it can be argued that it isn’t really in terms of hardware footprint, energy and hosting costs and system/db admin costs) but it is easier (or seems easier). And that is OK. Scaling out makes perfect sense in many situations and if you are looking at scaling your application I would recommend that you at least understand how to scale out. As a good reference I would recommend the upcoming second edition of High Performance MySQL as they have several chapters devoted to these issues. It is going to be released in April at the MySQL Users Conference. When the discussion turns to scaling up it is almost taboo. Maybe it is because in the past MySQL didn’t scale up very well. If this trend continues and some of these “log jams” aren’t resolved companies will switch to other products that better utilize modern hardware. They can’t buy equipment and only have it 10% utilized. It doesn’t make financial sense.

Interesting that in the descriptions I see of the Falcon engine it is described as being built for 64-bit hardware/software with multiple CPUs. I don’t know how much of a difference the storage engine can make. I would think that the upper layers of the MySQL software are just as important to be built with support for multiple (16/32/64) cpus and oodles of RAM.

Obviously I glossed over a major part of the equation (I/O). You can build a great system with all this and be hobbled by your I/O. For the sake of the keeping this post below the length of War of Peace I am going to skip this part. However, it is one very good reason to scale out - not up, as extremely fast I/O can be very expensive and this doesn’t really look to change in (near) future.

It’s fun to want new features. Some of them are even very important. But to me if MySQL Server can be structured to take advantage of the cheap CPU and RAM power that will be available in the near future it will be really be quite a feat. And one that will be sure to help accelerate it’s growth. Why should Oracle have the title of the most throughput on one server (at least I think it is them…even if it isn’t .. it serves the point)?

14 comments

New Querysniffer version

We updated querysniffer just a bit.  A user pointed out that the sniffer doesn’t actually start recording data until a “use” statement is issued.  Ian added a flag so you can change this behavior if you want.

There is a memory leak when the sniffer runs for an exteneded period of time.  We are working on that. Should have something soon.  The new sniffer files are available at http://www.paragon-cs.com/queryprogs/querysniffer.tar.gz

No comments

« Previous Page