MySQL for Your Database Warehouse??
A couple of days ago I listened to Stewart Smith’s talk on upcoming MySQL features from the MySQL mini-conf at linux.conf.au. This triggered other thoughts and the outcome was this post. Hope you enjoy!
I realize that MySQL is used in various places as a tool for data warehousing. It isn’t common knowledge, but the predecessor to MySQL, TCX (founded by Monty Widenius), was a data warehousing company - http://en.wikipedia.org/wiki/Michael_Widenius. That being said, MySQL isn’t know for serving as the centerpiece of a data warehouse. OLTP (On Line Transaction Processing) — ie the typical website — is the most common use for MySQL.
I should stop pulling out the crystal ball, but I just can’t seem to help it. Several weeks ago Monty made a few waves by announcing the release of a new storage engine - Maria. The blog posting is here: Monty’s announcement. Briefly, the new Maria engine seems to be a replacement for the aging MyISAM engine. MyISAM is a great engine, and has certainly proven its worth. Even so, without transactional support, it just isn’t a good choice. The point of Maria seems to be a lightweight engine that will provide transactional support. In addition, as Stewart points out, it will be “crash safe”(unlike MyISAM) in the initial release.
When I first read the post I pretty much said to myself “Nice . . . another storage engine.” I didn’t even blog about it. But, on further thought, I think this might be a good opportunity for MySQL down the road. Imagine Maria replacing MyISAM in the “MySQL Universe”. This is speculation somewhat as there are no released benchmarks for Maria, but it seems that if it is going to replace MyISAM it wouldn’t be suprising if it is going to enjoy a better read rate than Innodb (as MyISAM always has). If it has ACID compliance, wouldn’t it be a great potential engine for a data warehouse? Lightweight, fast reads and acid compliant/transactionally safe. Sounds like a good thing to me if you are data warehouse engineer.
It will be a long time until this takes place (plans are for inclusion in the 6.0 release tree) and is stable, but it is intriguing.
7 Comments so far
Leave a reply
But it wouldn’t be column oriented (http://en.wikipedia.org/wiki/Column-oriented_DBMS), which is central to data warehouses. The ability to run one query across several nodes would also not be there so it would not be any good as a data warehouse with reasonable amounts of data.
So, I am going to go out on a very thin limb. Let me preface this by stating I AM NOT A DATA WAREHOUSE EXPERT. I work on OLTP systems. However, googling a bit and it seems to me that while column-oriented is nice and might provide a speed-boost it isn’t the be-all to end all of data warehouses. Maybe it is a size thing where these column-oriented db’s are used for much larger datasets.
As I said, I am just speculating. But I do think that it is being dismissive to say that just because it isn’t column-oriented means that it can’t be used for data warehousing.
I don’t think that the claim that “Column Oriented is central to data warehouses” holds.
It may be true that for ultra large DW’s column oriented DB’s may offer better performance than row-based ones, but there is a very large segment of small (10 GB) to mid-sized (100 GB) data warehouse applications that can get by fine with row-based db’s. Partitioning allows some more stretching towards the TB range.
Of course, if you are convinced you really do need column-based storage, you can use Infobright’s Brighthouse storage engine for MySQL.
MySQL doesn’t talk about this, but as they push more SQL processing down to the data nodes, then MySQL Cluster (NDB) will become a viable data warehouse solution.
I agree that it could be a possibility. However, currently, cluster is pretty limited in the amount of data it can handle. I was told by a sales engineer a couple of months ago that 80 gigs was their self-proclaimed “don’t go above this” limit.
Of course that could (and I hope will) change down the road.
[…] his Diamond Notes, Keith Murphy looks into the future of MySQL for Your Database Warehouse. “It isn’t common knowledge, but the predecessor to MySQL, TCX (founded by Monty […]
[…] contro se il vostro interesse è verso il DataWarehouse vi potrebbe interessare l’articolo di Diamond Notes. Se siete indecisi nella scelta del DBMS adatto a voi un post di DBMS2 potrebbe esservi molto […]