Diamond Notes

Jim Starkey and Ann Harrison — Falcon From the Beginning @ UC

Why Falcon?

Because the world is changing. He emphasized the fact that hardware is changing rapidly (something I have harped about quite a bit). When Falcon development started, two-socket boards were pretty rare. Now there are quad-cores and many threads. Relative to CPUs and memory, disks are getting slower and slower and slower.

Where Applications are going

  • batch - dead
  • timesharing - dead
  • departmental computing - dead
  • client server - fading fast
  • application server for most of us
  • web services for the really big guys

The Challenge

exhaust CPU and memory and avoid the disk

Falcon tradeoffs

  • use memory (page cache) to avoid disk reads
  • use memory (record cache) to avoid the page cache manipulation
  • use CPU to find the fastest path to a record
  • use CPU to minimize record size
  • synchronize most data structures with user mode read/write locks
  • synchronize high contention data structures with interlocked instructions

The Falcon Architecture

  • incomplete in-memory database with disk backfill
  • multi-version concurrency control in memory
  • updates in memory until commit
  • group commits in a single serial log write
  • post-commit multi-threaded pipeline to move updates to disk

Incomplete in-memory database

  • selected records cached in memory
  • separate cache for disk pages
  • a record cache hit is 15% of the cost of a page cache hit
  • record cache is more memory efficient than page cache
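
To make the two caches concrete, here is a minimal sketch of a record cache sitting in front of a page cache. Everything in it (the class name `TwoLevelStore`, the dict standing in for the disk, the fixed number of records per page) is my own assumption for illustration, not Falcon's actual code.

```python
class TwoLevelStore:
    """Hypothetical record cache in front of a page cache in front of 'disk'."""

    def __init__(self, disk_pages, records_per_page=4):
        self.disk_pages = disk_pages      # page_no -> list of records (stand-in for disk)
        self.records_per_page = records_per_page
        self.page_cache = {}              # page_no -> list of records
        self.record_cache = {}            # record_id -> record
        self.stats = {"record_hits": 0, "page_hits": 0, "disk_reads": 0}

    def fetch(self, record_id):
        # Cheapest path: the record itself is already cached.
        if record_id in self.record_cache:
            self.stats["record_hits"] += 1
            return self.record_cache[record_id]

        # Otherwise locate the page that holds the record.
        page_no, slot = divmod(record_id, self.records_per_page)
        if page_no in self.page_cache:
            self.stats["page_hits"] += 1
        else:
            # Most expensive path: go to disk and keep the page around.
            self.stats["disk_reads"] += 1
            self.page_cache[page_no] = self.disk_pages[page_no]

        record = self.page_cache[page_no][slot]
        self.record_cache[record_id] = record   # only the selected record is kept
        return record
```

Fetching the same record twice touches the page cache only the first time; the second fetch is served straight from the record cache.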

Record Encoding - Cache Efficiency

Records are encoded by value, not by declaration. What this means is that the string “abc” occupies the same space in a varchar(3), a varchar(4096) or a char(3). The number 7 takes the same space whether it's a small int, medium int, int, decimal or numeric.
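
A rough sketch of the idea, assuming a made-up byte format (a one-byte tag plus a length prefix). The functions `encode_value` and `encode_record` are hypothetical and are not Falcon's real encoding; the point is only that the declared column type never influences the bytes produced.

```python
def encode_value(value):
    """Encode a value by what it is, not by its declared column type."""
    if isinstance(value, int):
        # Size the integer to the value itself: small numbers take one byte.
        length = max(1, (value.bit_length() + 8) // 8)
        return b"I" + bytes([length]) + value.to_bytes(length, "big", signed=True)
    if isinstance(value, str):
        data = value.encode("utf-8")
        # Two-byte length prefix followed by only the bytes actually used.
        return b"S" + len(data).to_bytes(2, "big") + data
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def encode_record(columns):
    """columns is a list of (declared_type, value); the declared type is ignored."""
    return b"".join(encode_value(value) for _declared_type, value in columns)
```

With this sketch, `encode_record([("varchar(3)", "abc")])` and `encode_record([("varchar(4096)", "abc")])` produce identical bytes, and so do a 7 declared as a small int and a 7 declared as an int.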

MVCC

  • update operations create new record versions
  • readers don’t block writers (hmmmm)
  • everyone sees a consistent view of the data
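
Here is a toy illustration of how new record versions give each reader a consistent view. It assumes monotonically increasing commit ids and only tracks committed versions; `VersionedRecord` is a name I made up and the real engine is certainly more involved.

```python
class VersionedRecord:
    """Hypothetical version chain for one record, newest version last."""

    def __init__(self):
        self.versions = []  # list of (commit_id, value)

    def write(self, commit_id, value):
        # An update never overwrites; it appends a new version.
        self.versions.append((commit_id, value))

    def read(self, snapshot_id):
        # A reader sees the newest version committed at or before its snapshot.
        for commit_id, value in reversed(self.versions):
            if commit_id <= snapshot_id:
                return value
        return None  # the record did not exist yet for this reader
```

A reader that took its snapshot at commit 1 keeps seeing the version from commit 1 even after a writer adds a version at commit 2, and neither has to wait for the other.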

Updates are in memory until commit

  • updates are held in memory pending commit
  • index changes are held in memory pending commit
  • verb rollback is dirt cheap
  • transaction rollback is dirt cheap
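
Why rollback is so cheap becomes obvious in a sketch: if uncommitted changes only live in memory, undoing them means throwing away in-memory state. The `Transaction` class below is purely illustrative (a dict of pending changes plus an undo list); the names and structure are my assumptions, not Falcon internals.

```python
class Transaction:
    """Hypothetical transaction that keeps all changes in memory until commit."""

    def __init__(self, committed):
        self.committed = committed  # shared dict: key -> committed value
        self.pending = {}           # this transaction's uncommitted changes
        self.undo = []              # (key, had_pending_value, old_pending_value)

    def set(self, key, value):
        self.undo.append((key, key in self.pending, self.pending.get(key)))
        self.pending[key] = value

    def get(self, key):
        if key in self.pending:
            return self.pending[key]
        return self.committed.get(key)

    def savepoint(self):
        # Marks the start of a verb (statement).
        return len(self.undo)

    def rollback_to(self, mark):
        # Verb rollback: unwind only the changes made since the mark.
        while len(self.undo) > mark:
            key, had_value, old_value = self.undo.pop()
            if had_value:
                self.pending[key] = old_value
            else:
                del self.pending[key]

    def rollback(self):
        # Transaction rollback: nothing reached disk, so just forget everything.
        self.pending.clear()
        self.undo.clear()

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()
        self.undo.clear()
```

Neither kind of rollback touches the committed data at all, which is where the "dirt cheap" comes from in this model.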

At commit time

  • pending record updates are flushed to serial log
  • pending index updates are flushed to serial log
  • commit record written to serial log
  • serial log flushed to the oxide and the transaction is committed
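
The commit sequence, combined with the group commit idea from the architecture list, might look roughly like this. The log line format and the `SerialLog` class are invented for the example; a real engine would write binary records and force them to disk with something like fsync rather than a plain `flush()`.

```python
class SerialLog:
    """Hypothetical serial log that commits a group of transactions per flush."""

    def __init__(self, stream):
        self.stream = stream   # any writable text stream (file, io.StringIO, ...)
        self.flushes = 0

    def group_commit(self, transactions):
        """transactions: list of (txn_id, record_updates, index_updates)."""
        for txn_id, record_updates, index_updates in transactions:
            # 1. pending record updates go to the log
            for update in record_updates:
                self.stream.write(f"{txn_id} RECORD {update}\n")
            # 2. pending index updates go to the log
            for update in index_updates:
                self.stream.write(f"{txn_id} INDEX {update}\n")
            # 3. the commit record follows them
            self.stream.write(f"{txn_id} COMMIT\n")
        # 4. one flush covers every transaction in the group
        self.stream.flush()
        self.flushes += 1
```

Two transactions committed together cost a single flush in this sketch, which is the whole point of grouping commits into one serial log write.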

What happens when we run out of memory?

  • large transaction flushes uncommitted data early to the serial log (called “chilled”)
  • these records can be fetched from the serial log (called “thawed”)
  • scavenger garbage collects unloved records periodically
  • when things get really bad, entire record chains are flushed to backlog
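
Chilling and thawing can be pictured with a small model. This is only my reading of the first two bullets: the class `ChillableTransaction`, the record-count memory limit, and the list standing in for the serial log are all assumptions, and the scavenger and backlog are left out entirely.

```python
class ChillableTransaction:
    """Hypothetical large transaction that spills uncommitted data to the log."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # max pending records kept in memory
        self.pending = {}                 # record_id -> value, still in memory
        self.chilled = {}                 # record_id -> position in serial_log
        self.serial_log = []              # stand-in for the serial log file

    def update(self, record_id, value):
        # A newer in-memory version supersedes any chilled one.
        self.chilled.pop(record_id, None)
        self.pending[record_id] = value
        if len(self.pending) > self.memory_limit:
            self.chill()

    def chill(self):
        # Flush uncommitted records to the log early and free the memory.
        for record_id, value in self.pending.items():
            self.chilled[record_id] = len(self.serial_log)
            self.serial_log.append(value)
        self.pending.clear()

    def fetch(self, record_id):
        if record_id in self.pending:
            return self.pending[record_id]
        if record_id in self.chilled:
            # Thaw: read the record back from the serial log.
            return self.serial_log[self.chilled[record_id]]
        return None
```

In this model a chilled record is still readable by its own transaction, it just costs a trip to the log instead of a memory lookup.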

Falcon is definitely oriented towards OLTP, large volume, fairly small statements.

Falcon Weaknesses

  • Transactions are ACID but not serializable
  • Latency advantage disappears at saturation
  • Very large transactions degrade performance
  • optimized for Web, not batch operations

Falcon Strengths

  • Runs like a memory database when data fits in cache
  • scales like a disk-based database when data doesn’t fit in cache
  • lowest possible latency for Web applications
  • absorbs huge spiky loads

When should you use what?

  • if you don’t need ACID, MyISAM is probably fastest
  • for uniprocessors and small memory systems, InnoDB is a good choice
  • For large batch transactions, InnoDB may be the best match
  • for multicores and large number of threads, Falcon is probably best
  • For the web Falcon is hard to beat

Sounds to me like Falcon is really coming along. My question would be whether the single-threaded nature of the MySQL server itself will hold Falcon back down the road.
