Diamond Notes

Just another WordPress weblog

Mark Callaghan — Scaling InnoDB/MySQL @ UC

Goals

Make Innodb scale on big servers

  • fix bottlenecks on big SMP server
  • utilize server with many disk
  • support thousands of database connections
  • handle corruption in mormroy and on disk
  • make query plans perdictable
  • support thousands of tables and accounts

Mentioned a problem with SANs?

Desirable Features

  • throughput scales with number of CPU cores
  • efficient support for 128 GB buffer cache
  • performance for servers with man and or remote disk
  • recover from corruption

CPU Problems

  • Mutex implementations favor portability over performance
  • Mutex hotspots
    • buffer cache
    • memory allocation
    • transaction log
    • adaptive hash latch

Symtops of CPU problems

  • adaptive hash latch contention
    • SHOW INNODB STATUS displays the session that holds the adaptive hash latch
    • a background thread logs SHOW INNODB STATUS into the error log when there is along lock wait
  • excessive mutex contention
  • server has many queries, is slow and is not IO bound
  • on linux, vmstat will report a lot of idle time

Workarounds for CPU problems

  • Upgrade
    • MySQL 5.0.30 has import improvements
    • MySQL 5.0.54 fixes a bug that causes some threads to miss wakeup signals
  • experiment with innodb_thread_concurrency to limit the number of threads that run concurrently

Making the InnoDB RW-mutex fast

  • Use atomic ops to change internal state (replacing the Innodb spin lock mutex)
  • use separate events to wake readers and writers

Work in progress for CPU problems

  • change the innodb mutex to use atomic ops
  • change the Innodb read-write mutex to use atomic ops

Performance Tests

Eight-CPU core server. Data is from the sbtest table generated by sysbench. Data is cached (key buffer, InnoDB buffer cache).

It was amazing the difference that these patches make. The scalability for these servers (number of sessions) was almost linear.

Memory

To support a 128GB buffer cache:

  • data structures must scale
    • walking a list with 8M page entries might be slow
  • Resources need to be split
    • more than one mutex might be needed for the buffer cache and LRU chain
  • Detection of corruption is more important
    • memory will be corrupted by software and hardware bugs

Detect and correct corruption

Features in InnoDB

  • page checksums to detect corruption that occurs after a page has been written to disk
  • doublewrite buffer to correct partial page writes that occur during a server crash

Innodb crashes when it reads a page with an invalid checksum

if this is a page for a secondary index, than the index can be rebuilt

Disk performance

  • Innodb uses one background thread to process prefetch requests.
  • Innodb uses one background thread to flush dirty buffer cache pages to disk. This is fine as long as writes go to the OS buffer cache. Otherwise, writes may be slow
  • The background IO threads assume a server with one disk and don’t run fast enough when there is work to be done

Fixes

Google patch adds support for multiple background IO threads.

The Google patch will soon have an option to tune the maximum rate of background IO

Connections

  • need to support thousands of connections, but not will to use one thread per connection
  • MySQL 6 separates threads from connections

Query Plans

  • innodb uses sampling to gather stats for the optimizer
  • stats are not stored on disk
  • stats are collected the first time a table is used after startup and after many rows has been modified

stats can be off between slaves with identical dataset. This happens (sometimes) except if the is a unique key

Overall a good talk. Mark got off several times on things that, frankly, if you weren’t deeply involved in the Innodb code you wouldn’t understand (code snippets). However, it sounds like they are solving some important problems. I am really looking forward to these being put into the “official” MySQL code.

2 Comments so far

  1. […] notes from others are available as well. No Comments Leave a Commenttrackback addressThere was an error with your comment, please try […]

  2. Sefa Şahin June 16th, 2008 3:23 am

    Thanks

Leave a reply