Mark Callaghan — Scaling InnoDB/MySQL @ UC
Goals
Make Innodb scale on big servers
- fix bottlenecks on big SMP server
- utilize server with many disk
- support thousands of database connections
- handle corruption in mormroy and on disk
- make query plans perdictable
- support thousands of tables and accounts
Mentioned a problem with SANs?
Desirable Features
- throughput scales with number of CPU cores
- efficient support for 128 GB buffer cache
- performance for servers with man and or remote disk
- recover from corruption
CPU Problems
- Mutex implementations favor portability over performance
- Mutex hotspots
- buffer cache
- memory allocation
- transaction log
- adaptive hash latch
Symtops of CPU problems
- adaptive hash latch contention
- SHOW INNODB STATUS displays the session that holds the adaptive hash latch
- a background thread logs SHOW INNODB STATUS into the error log when there is along lock wait
- excessive mutex contention
- server has many queries, is slow and is not IO bound
- on linux, vmstat will report a lot of idle time
Workarounds for CPU problems
- Upgrade
- MySQL 5.0.30 has import improvements
- MySQL 5.0.54 fixes a bug that causes some threads to miss wakeup signals
- experiment with innodb_thread_concurrency to limit the number of threads that run concurrently
Making the InnoDB RW-mutex fast
- Use atomic ops to change internal state (replacing the Innodb spin lock mutex)
- use separate events to wake readers and writers
Work in progress for CPU problems
- change the innodb mutex to use atomic ops
- change the Innodb read-write mutex to use atomic ops
Performance Tests
Eight-CPU core server. Data is from the sbtest table generated by sysbench. Data is cached (key buffer, InnoDB buffer cache).
It was amazing the difference that these patches make. The scalability for these servers (number of sessions) was almost linear.
Memory
To support a 128GB buffer cache:
- data structures must scale
- walking a list with 8M page entries might be slow
- Resources need to be split
- more than one mutex might be needed for the buffer cache and LRU chain
- Detection of corruption is more important
- memory will be corrupted by software and hardware bugs
Detect and correct corruption
Features in InnoDB
- page checksums to detect corruption that occurs after a page has been written to disk
- doublewrite buffer to correct partial page writes that occur during a server crash
Innodb crashes when it reads a page with an invalid checksum
if this is a page for a secondary index, than the index can be rebuilt
Disk performance
- Innodb uses one background thread to process prefetch requests.
- Innodb uses one background thread to flush dirty buffer cache pages to disk. This is fine as long as writes go to the OS buffer cache. Otherwise, writes may be slow
- The background IO threads assume a server with one disk and don’t run fast enough when there is work to be done
Fixes
Google patch adds support for multiple background IO threads.
The Google patch will soon have an option to tune the maximum rate of background IO
Connections
- need to support thousands of connections, but not will to use one thread per connection
- MySQL 6 separates threads from connections
Query Plans
- innodb uses sampling to gather stats for the optimizer
- stats are not stored on disk
- stats are collected the first time a table is used after startup and after many rows has been modified
stats can be off between slaves with identical dataset. This happens (sometimes) except if the is a unique key
Overall a good talk. Mark got off several times on things that, frankly, if you weren’t deeply involved in the Innodb code you wouldn’t understand (code snippets). However, it sounds like they are solving some important problems. I am really looking forward to these being put into the “official” MySQL code.
2 Comments so far
Leave a reply
[…] notes from others are available as well. No Comments Leave a Commenttrackback addressThere was an error with your comment, please try […]
Thanks