April 17th, 2008 — MySQL
One of the benefits of going to the conference is that you can take various certification tests (CMA, CMDBA, CMDEV and the CMCDBA) for a very low price ($25.00 a test).
I planned on taking the first CMDBA (the cert requires two test) and I was able to take the first test this morning. I work daily in an environment where all I do is “DBA” functions (I don’t do any coding). Plus, I have been studied the certification book for a while. I was fairly confident that I could pass the first test. Even so, I was nervous going in to take the test. Not enough sleep this week, overloaded with session information. Not really ideal. However, I really breezed through it.
Then I started thinking I should take the second test. I didn’t study for it and while we run 5.0 in production we don’t currently use any stored procedures/triggers/etc. So, I don’t have a great deal of experience on something that was probably a third of the test. Well, I did take it. And while many people would probably think it was actually easier than the first test I was a little worried I might not pass. However, about an hour ago, I did find out that I passed the second test so I am “unofficially” a CMDBA. Yeah!!
Should you do it? Well, for conference attendees I am afraid it is too late. Testing stopped at 3:00 pm today or something like that. However, MySQL offers the tests through Pearson/vue (sp?) testing centers. It is worth it, and does show you know a great deal about the MySQL server. I wouldn’t recommend you try and take this test without production experience. The certification guide says that you need six months of production experience if I am not mistaken. I agree, although you better study well also!
April 17th, 2008 — Benchmarking, MySQL
Goals
Make Innodb scale on big servers
- fix bottlenecks on big SMP server
- utilize server with many disk
- support thousands of database connections
- handle corruption in mormroy and on disk
- make query plans perdictable
- support thousands of tables and accounts
Mentioned a problem with SANs?
Desirable Features
- throughput scales with number of CPU cores
- efficient support for 128 GB buffer cache
- performance for servers with man and or remote disk
- recover from corruption
CPU Problems
- Mutex implementations favor portability over performance
- Mutex hotspots
- buffer cache
- memory allocation
- transaction log
- adaptive hash latch
Symtops of CPU problems
- adaptive hash latch contention
- SHOW INNODB STATUS displays the session that holds the adaptive hash latch
- a background thread logs SHOW INNODB STATUS into the error log when there is along lock wait
- excessive mutex contention
- server has many queries, is slow and is not IO bound
- on linux, vmstat will report a lot of idle time
Workarounds for CPU problems
- Upgrade
- MySQL 5.0.30 has import improvements
- MySQL 5.0.54 fixes a bug that causes some threads to miss wakeup signals
- experiment with innodb_thread_concurrency to limit the number of threads that run concurrently
Making the InnoDB RW-mutex fast
- Use atomic ops to change internal state (replacing the Innodb spin lock mutex)
- use separate events to wake readers and writers
Work in progress for CPU problems
- change the innodb mutex to use atomic ops
- change the Innodb read-write mutex to use atomic ops
Performance Tests
Eight-CPU core server. Data is from the sbtest table generated by sysbench. Data is cached (key buffer, InnoDB buffer cache).
It was amazing the difference that these patches make. The scalability for these servers (number of sessions) was almost linear.
Memory
To support a 128GB buffer cache:
- data structures must scale
- walking a list with 8M page entries might be slow
- Resources need to be split
- more than one mutex might be needed for the buffer cache and LRU chain
- Detection of corruption is more important
- memory will be corrupted by software and hardware bugs
Detect and correct corruption
Features in InnoDB
- page checksums to detect corruption that occurs after a page has been written to disk
- doublewrite buffer to correct partial page writes that occur during a server crash
Innodb crashes when it reads a page with an invalid checksum
if this is a page for a secondary index, than the index can be rebuilt
Disk performance
- Innodb uses one background thread to process prefetch requests.
- Innodb uses one background thread to flush dirty buffer cache pages to disk. This is fine as long as writes go to the OS buffer cache. Otherwise, writes may be slow
- The background IO threads assume a server with one disk and don’t run fast enough when there is work to be done
Fixes
Google patch adds support for multiple background IO threads.
The Google patch will soon have an option to tune the maximum rate of background IO
Connections
- need to support thousands of connections, but not will to use one thread per connection
- MySQL 6 separates threads from connections
Query Plans
- innodb uses sampling to gather stats for the optimizer
- stats are not stored on disk
- stats are collected the first time a table is used after startup and after many rows has been modified
stats can be off between slaves with identical dataset. This happens (sometimes) except if the is a unique key
Overall a good talk. Mark got off several times on things that, frankly, if you weren’t deeply involved in the Innodb code you wouldn’t understand (code snippets). However, it sounds like they are solving some important problems. I am really looking forward to these being put into the “official” MySQL code.
April 17th, 2008 — MySQL
Talk about:
MySQL Server limitations: skeletons, “official secrets”, embarrassing things in the server
Why this Talk
- MySQL and Sun needs to become more transparent in what they are doing
- when users know the limitations they can go around them
- It’s easier to trust someone when they acknowledge a problem (hear hear!!)
Threads
- Problems one connection per thread doesn’t work in all cases
- no way to give priority to thread
- no way to ensure that we have X active threads running
Symptoms
- too many context switches
- no multi-core cpu scalability (efficiently)
- Solutions (one of many)
- –thread-handling=–pool-off-threads (MySQL 6.0)
Work to be done
- all Innodb concurrency patches
- spawn more threads when threads are blocked
- removing overal mutex contention in the server
- give higher/lower priority to some threads
- allow ’super’ user to login when all threads are in use
Memory as a resource
Single biggest problem is no single memory allocator (server/engine)
Privileges
Problems
not modular/pluggable (extendable to things such LDAP)
no ROLES
Symptoms
- hard to maintain lots of users
- hard/impossible to use external authentication
Solution
get the community to implement an authenctication module (four summer of code projects for LDAP)
Pluggable storage engines
Problems
storage engines are depending on internal MySQL structures
Symptoms
storage engines con only be used with the exact MySQL server version they were compiled against
Parser
Problems
- state machine to loarge
- not pluggable
- not cacheable
- still Bison
- Bad error messages
Symptoms
- Parsing has a high overhead for simple queries (12% time spent in parser)
- Parser takes a lot of code space
Modularity
Problems
- Server is very monolithic
- Few defined interfaces (not often stable)
- Server and libraries not documented
- Multiple Execution paths
- no rewrite state for optimizer
Symptoms
- hard to change code without introducing bugs
- hard for newcomers to understand the server
Stored Procedures and Triggers
Problems
- stored procedure are not cacheable across connections
- we only support SQL (but this is changing )
- Pre-locking of all tables (deadlock-free algorithm)
- all cursors are materialized
- trigger code is not shared across open tables
- no constaint of resources
- we don’t support stored procedures as table
Replication
Problems
- replication is not fail safe
- no synchronous options
- no checking consistency option
- setup and resync of slave is complicated
- single thread on the slave
- no multi-master
- only InnodDB synchronizes with the replication (binary) log
Symptoms
- slaves can catch up with master
- hard to do clean fail overs
- we are dependent on Innodb
Solutions
- use backup to setup slave
- replicate CHECKSUM TABLE and do consistency checking on slave (such as maatkit)
Table names
Problems
- tables are stored as files (nam.frm)
- file system may be case sensitive
- falcon has it’s own interpretation of how things should be done
Symptoms
- SELECT * from TableName and SELECT * from ‘TABLEname’ may or may not refer to different tables depending on file system
- hard to move apps between OS’s
- doing alter table of all tables to Falcon may delete data from tables on Unix fro table names that only differ in case
Solutions
- use –lower-case-table-names when running Windows and Mac
- add modes to be backword compatabile, ANSI compatible and PostgreSQL compatible
Why Falcon and Maria
Problem
MySQL/Sun doesn’t have it’s own transactional storage engine
Solution
Falcon/Maria
Interesting. This is the first time I have heard MySQL officially say this although I suspected this for some time.
Open Source Project?
This part of the discussion was a very frank assessment of how things are. Monty would like to make things transparent. He would like to change the development model to attract outside developers. One major change is giving outside developers commit and decision rights to the MySQL server code base.
Release Policy
Problems
- MySQL ships releases before they are ready
- Benchmarks are given out which show “partial truths”
Symptoms
- MySQL 5.1 was declared RC way too early
- features are removed in RC releases
- major code changes are done each month in RC code
- Users are not happy with the releases until six months after GA (see 5.0)
- Critical bugs are still open in 5.1 and not scheduled to be fixed before GA:
- – bug 989
- –bug 30414
Solution
- wait to declare something GA until code stabilizes and critical bugs are fixed
- create a release policy and an independent release policy board
The good news
Monty says Sun is much more open source friendly than MySQL AB has been lately and is driving MySQL in the right direction.
Overall, this was a very fair assessment of the current state of MySQL. It is good to know that people are thinking about these things. While I have always thought they worried about this “stuff”..sometimes it feels like MySQL doesn’t because of a lack of information coming from them. Can I just say that Monty is really cool? He has no need to continue working and yet he obviously enjoys immensely what he does. And his honesty is very refreshing.
April 16th, 2008 — MySQL
I have a confession to make. I am struggling this afternoon. For two or three weeks before the conference I had many things to do and managed to wear myself out. Then, last Thursday, I jumped on a plane (for the second time in a week) and flew cross-country to come out here. I then spent two days taking a look at Kickfire’s new technology. Sunday night came around and the conference started (well, not officially, but it was good community dinner). Oh, and I was working hard to get the latest issue of MySQL Magazine out Tuesday. The first two days for the most part I have been OK. Then, because I was a bit run down, I went to bed fairly early last night (10 ‘ish).
I woke up at 5:00 AM. Couldn’t go back to sleep. Now I am paying for it. I haven’t blogged in the last two sessions simply because I don’t have the energy. And in the last session (Baron’s great session on the Query Cache) I started dozing off at one point. Shhh…don’t tell him. Oh wait, he will probably read this. I couldn’t help it and it certainly wasn’t his fault. He did a great job.
I have woken up a bit since then (the caffeine helps sometimes!!). Now I am listening to Ann Harrison talk about Falcon and explain how it is quite a bit different than Innodb. However, for the rest of the day I am going to skip blogging the sessions. Too much concentration required. My point to all this is that you shouldn’t go to the conference run down. Get as much rest as you can. We end up in information overload somewhat as it is so don’t hobble yourself by not resting up beforehand.
April 16th, 2008 — MySQL
I would recommend that you download the video of this!! Sheeri posted it here.
The numbers in parentheses are Alexa rankings.
Moderator - Kaj Arno
(1317) Monty Taylor - MySQL
(905) Matt Ingenthron - Sun
(39) John Allspaw - Flickr
(13) Frank mash - Fotolog
(9) Domas Mituzas - Wikipedia
(6) Jeff Rothschild - Facebook
(2) Paul Tuckfield - YouTube
Question One: Number of MySQL servers
MySQL one master/three slaves
Sun four servers
Flickr 166
Fotolog 37
Wikipedia
Facebook 1,800 (900m/900s)
YouTube
Question Two: Number of MySQL DBAs
MySQL 1/10th
Sun 1.5
Flickr 0 (normally 1)
Fotolog 1
Wikipedia Technical Team
Facebook 2
YouTube 3
Question Three: Number of Web Servers
MySQL 2
Sun 160
Flickr 244
Fotolog 70
Wikipedia
Facebook 10,000
YouTube
Question Four: Number of Memcached servers
MySQL 2
Sun 8
Flickr 14
Fotolog 40
Wikipedia 79
Facebook 805
YouTube
Question Five: Version of MySQL
MySQL 5.23-2rc
Sun 5.0.21
Flickr 5.0.51
Fotolog 4.11
Wikipedia 4.4
Facebook 5.0.44
YouTube 5.0.24
Question Six: Operating System on Server
MySQL Fedora
Sun OpenSolaris
Flickr Linux
Fotolog Solaris 10
Wikipedia Fedora/Ubuntu
Facebook Fedora/RHEL
YouTube SuSE 9
Question Seven: What happens if a server fails?
Flickr - Federated setup for failover. Can loose any one side of the shard.
Wikipedia - if a master fails they replace with slave
Facebook - archive binlogs, promote slave
Fotolog - mount snapshots?
Youtube - SAN; shards with a master and multiple slaves so they promote slaves
Question Eight: What is Their Crucial Scaling Technology
Facebook doesn’t use SAN - they do use RAID 10 with 2.5″ drives
Fotolog — UltraSparc T1 — excellent master UltraSparc T2 — excellent slave — uses SAN
This was interesting to me. Frank (Fotolog) said they use a SAN to keep things manageable (only two dbas with the second one just hired). Facebook says they don’t use SAN because they didn’t want to limit themselves.
Next they got off on discussion about power. This varied quite a bit with YouTube pretty much dismissing power concerns. Of course Frank from Fotolog then pointed out that when they (Fotolog) want to expand in a datacenter — the datacenter has to get Google’s approval…hmmm..no wonder Google isn’t worried about it. Fotolog and Facebook were very much in favor of power savings. I think there is more than just saving a little power, you get cooling and space (if smaller of course) savings.
April 16th, 2008 — MySQL
Disclaimer: I have talked to Kickfire extensively before the conference. I have blogged several times and written and article for MySQL Magazine about this. I have not received any payment from Kickfire.
C2 (now Kickfire) approached MySQL about a year and a half ago with the idea of building a chip that would run SQL in hardware. Since this time they have developed what amounts to an “appliance” that is now the number one ranked server on the TPC-H benchmarks in price/performance and also number one in performance in non-clustered environment.
Kickfire is looking to reduce two bottlenecks of a data warehousing system - the I/O bottleneck and the CPU bottleneck.
Raj didn’t talk about the CPU bottleneck much. However, some of their techniques bypass caches to improve performance. Also, the SQL Chip is going to offload much of the load from the CPU because it does in the SQL Chip what was being done in the internal CPU.
I/O bottleneck resolution approaches:
- Column store format and compression of data (including on the fly reading of compressed data)
- intelligent indexing
- smart pre-fetching
In addition they developed the aforementioned SQL chip. The benefits of this:
- execution in hardware of SQL commands
- parallel query execution
Raj then went through some of the benchmarks. I released these numbers in the magazine article. It is available at http://mysqlzine.net (spring 2008 issue).
I think that there was a lot of wondering (as in “is this real?”) before the conference. I think a number of questions have been answered. It will be interesting to see what comes out of all this!
April 16th, 2008 — MySQL
Why Falcon?
Because the world is changing. He emphasized the fact that hardware is changing rapidly (something I have harped about quite a bit). When Falcon development started two socket boards where pretty rare. Now, quad-cores, many threads. Relative to CPUs and memory, disks are getting slower and slower and slower
Where Applications are going
- batch - dead
- timesharing - dead
- departmental computing dead
- client server - fading fast
- application server for most of us
- web services for the really big guy
The Challenge
exhaust CPU and memory and avoid the disk
Falcon tradeoffs
- use memory (page cache) to avoid disk reads
- use memory (record cache) to avoid the page cache manipulation
- use CPU to find the fastest path to a record
- use CPU to minimize record size
- synchronize most data structures with user mode read/write locks
- synchronize high contention data structures with interlocked instructions
The Falcon Architecture
- incomplete in-memory database with disk backfill
- multi-version concurrency control in memory
- updates in memory until commit
- group commits in a single serial log write
- post-commit multi-threaded pipe line to move updates to disk
Incomplete in-memory database
- selected records cached in memory
- separate cache for disk pages
- record cache is 15% the cost of a page cache hit
- record cache is more memory efficient than page cache
Record Encoding - Cache Efficiency
Records encoded by value, not declaration — what this means is that the string “abc” occupies the same space in varch(3) or varchar(4096) or char(3). The number 7 is the same if its a small int, medium int, int, decimal or numeric.
MVCC
- update operations create new record versions
- readers don’t block writers (hmmmm)
- everyone sees a consistent view of the data
Updates are in memory until commit
- updates are held in memory pending commit
- index changes arre held in memory peding commit
- verb rollback is dirt cheap
- transaction rollback is dirt cheap
At commit time
- pending record updates are flushed to serial log
- pending index updates are flushed to serial log
- commit record written to serial log
- serial log flushed to the oxide and the transaction is committed
What happens when we run out of memory?
- large transaction flushes uncommitted data early to the serial log (called “chilled”)
- these records can be fetched from the serial log (called “thawed”)
- scavenger garbage collects unloved records periodically
- when things get really bad, entire record chains are flushed to backlog
Falcon is definitely oriented towards OLTP, large volume, fairly small statements.
Falcon Weaknesses
- Transactions are ACID but not serializable
- Latency advantage disappears at saturation
- Very large transactions degrade performace
- optimized for Web, not batch operations
Falcon Strengths
- Runs like a memory database when data fits in cache
- scales like a disk-based database when data doesn’t fit in cache
- lowest possible latency for Web applications
- absorbs huge spiky loads
When should you use what?
- if you don’t need ACID, MyISAM is probably fastest
- for Uniprocessors and small memory systems, Innodb is a good choice
- For large transaction batch, InnoDB maybe be best match
- for multicores and large number of threads, Falcon is probably best
- For the web Falcon is hard to beat
Sounds to me like Falcon is really coming along. My question would be if the single-threaded nature of the MySQL server itself will hold Falcon back down the road.
April 16th, 2008 — MySQL
The MySQL Sandbox is a way to set up a new test server or servers in just a few seconds.
Features
- not an official MySQL product
- GPL
- installs from a tarball in a few seconds
- installs side servers, not the main one (doesn’t disturb running servers!!!)
- creates a separate environment — data directory, port and socket
Sandbox commands
- start
- stop
- clear ( creates a fresh installation)
- use
- start_all
- stop_all
- clear_all
- multi_cmd
m for master and s for slave
quick install
./express_install.pl /path/mysql_os_5.1.tar.gz
This would install a single server
For replication
./set_replication.pl /path/mysql_os.tar.gz
for multiple servers
./set_many 5.0.51
and multiple servers of different versions
./set _custom_many.pl 5.0.51 5.1.23 6.0.5
(can specify full path for configurations
–how_man_nodes option creates theh specified number of nodes (–how_many_nodes=4 creates four hosts)
–help prints the help screen
Fine Tuning
./express_install.pl /path/mysql_OS.tar.gz [option] [option] …
./express_install.pl /path/mysql-OS.tar.gz –my_file={small|medium|large|hug}
or you can specify a path to a custom my.cnf
Commands
start simply start the server, creates the pid and socket files
stop stops the server
clear stops the server and removes all files in the data directory and clears the test database
use start the mysql client
- uses the credentials in the my_sandbox.cnf fiel
- deafault username and password msandbox/msandbox and root password of msandbox
my {sqldump|sqlbinlog|sqladmin|}
For Replication
start_all
starts the master and then the slave
stop _all
stops the slave and then the master
clear_all
clear all the slaves and then the master
For Multi-node Sandbox
start_all
stop_all
clear_all
multi_cmd
executes command on all nodes
This is a really cool looking tool. I have heard of it before, but never taken the time to download it and evaluate it. It would be really nice to use this on our test machine. If you do much testing at all I would recommend you download it and take it for a spin!!
Resources
http://sf.net/prjects/mysql-sandbox
April 15th, 2008 — MySQL, News
So multiple people have blogged about this announcement already. The Innodb company has announced a new plug-in that can be a drop-in replacement for the current Innodb code in MySQL 5.1. Evidently it is really easy also. It has some interesting new features including non-blocking index creation if I am understanding things correctly. That to me seems to be the big new feature to me.
However, the reason why I am pointing this announcement out when others already have is that Peter Zaitsev did a talk today about the scalablitiy issues of Innodb dbs this afternoon. I blogged during the talk here. He pointed out (with benchmarks) multiple times that the 5.1 server (with Innodb tables) was actually not working as well as 5.0 currently does. If I understand correctly he didn’t have a chance to test the new plug-in (not for certain though — I was unclear about that). They possibly could have fixed these regression issues in the new version but I would love to see tests of it. Just a thought before we all get to excited :) I like the idea personally that the Innodb plug-in isn’t so tightly coupled with the MySQL server. Good to see them releasing something. Hope they continue! Scaling improvements above 8 cores would be nice.
April 15th, 2008 — MySQL
Scalability: the ability for a server to grow up (and technically down…but they don’t consider this)
Performs: response time is decent (ie not 30 seconds), easy to measure in production. Make sure there is enough system capacity (system loaded with degrading in response time and failing). With system operations you need to be able to have reasonable times for activities (24 hr backup of 2 terabytes is not acceptable).
Application Growth Parameters
- load - growing from 100 q/sec to 1,000 q/sec
- database size growing from 10 GB to 100 GB
- data distribution - going from 10 friends on average to 250 average
- launching new features - adding full text search can add a heavy load
As applicatons grow all the parameters tend to grow at once. This includeds more complex queries on larger databases. To maintain performance you can optimize the application: schema, queries, caching, sharding and replication.
What is a platform?
Testing done through Micro Benchmarks (simple operations designed to test some particular aspect of behavior). Peter didn’t really explain this enough for me…other than a general idea not sure about it.
Issue with Innodb
- large tables are hard to deal
- backup - physical backup is a must
- can’t move separate tables between servers
- no repair functionality (corruption means dump and restore)
- ALTER TABLE is very slow (MM replication helps)
- table maint with OPTIMIZE TABLE
Online Index Creation in InnoDB
plugin just announced today by Innodb
Innodb can now build indexes without rebuilding whole table (index build done by sort which is much faster). It is ten times faster loading data and adding indexes (separately).
Dealing with Many Tables in Innodb
many people avoid the problem of large table by creating many small tables.
using innodb_file_per_table=1 will resolve this problem.
“Warmup” is still a problem. In 5.0 only one table can be opened at a time. stats update on open makes it quite slow.
Large Buffer Pool Size
In general, the more memory you can get for buffer pool the better it is
Warmup will take a long time. With a 32GB buffer pool, it can take an hour to fill at 600 pages/sec fill rate. SHOW STATUS and SHOW INNODB STATUS gets expensive because of global locking
In addition, shutdown will take longer with large buffer pool.
You can set innodb_max_dirty_pages_pct=0 in advance and it will help with the flushing of the buffer pool.
Peter discussed benchmarking 5.0 and 5.1. It appears that 5.1 has some regression issues! They tested up to 64 threads in a test. In all cases 5.1 simply did not perform as well. Quad-core chips did not perform as well as dual core.
Looks to me like 5.1 has bit to go in respect to Innodb. However, I don’t believe they had the latest patches. He kept mentioning Mark Callhan??