Jim Starkey and Ann Harrison — Falcon From the Beginning @ UC
Why Falcon?
Because the world is changing. He emphasized that hardware is changing rapidly (something I have harped on quite a bit). When Falcon development started, two-socket boards were pretty rare. Now there are quad-cores and many threads. Relative to CPUs and memory, disks are getting slower and slower.
Where Applications are going
- batch - dead
- timesharing - dead
- departmental computing - dead
- client server - fading fast
- application server for most of us
- web services for the really big guys
The Challenge
exhaust CPU and memory and avoid the disk
Falcon tradeoffs
- use memory (page cache) to avoid disk reads
- use memory (record cache) to avoid the page cache manipulation
- use CPU to find the fastest path to a record
- use CPU to minimize record size
- synchronize most data structures with user mode read/write locks
- synchronize high contention data structures with interlocked instructions
The Falcon Architecture
- incomplete in-memory database with disk backfill
- multi-version concurrency control in memory
- updates in memory until commit
- group commits in a single serial log write
- post-commit multi-threaded pipeline to move updates to disk
Incomplete in-memory database
- selected records cached in memory
- separate cache for disk pages
- a record cache hit is 15% of the cost of a page cache hit
- record cache is more memory efficient than page cache
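To make the lookup order concrete, here is a toy sketch in Python. Everything in it (`TinyStore`, `fetch`, the dictionary standing in for the disk) is a made-up name for illustration, not Falcon's actual code. It only shows the idea of checking a record cache first, then a page cache, and going to disk only as a last resort.

```python
class TinyStore:
    """Toy two-tier cache: record cache in front of a page cache in front of 'disk'."""

    def __init__(self, disk_pages, records_per_page=4):
        self.disk_pages = disk_pages          # page_no -> {record_id: value}; stands in for the disk
        self.records_per_page = records_per_page
        self.page_cache = {}                  # page_no -> page contents
        self.record_cache = {}                # record_id -> value
        self.stats = {"record_hits": 0, "page_hits": 0, "disk_reads": 0}

    def fetch(self, record_id):
        # 1. Cheapest path: the record itself is already cached.
        if record_id in self.record_cache:
            self.stats["record_hits"] += 1
            return self.record_cache[record_id]
        # 2. Next: the page holding the record is cached.
        page_no = record_id // self.records_per_page
        if page_no in self.page_cache:
            self.stats["page_hits"] += 1
        else:
            # 3. Slowest path: backfill the page from disk.
            self.stats["disk_reads"] += 1
            self.page_cache[page_no] = self.disk_pages[page_no]
        value = self.page_cache[page_no][record_id]
        self.record_cache[record_id] = value
        return value


disk = {0: {0: "a", 1: "b", 2: "c", 3: "d"}, 1: {4: "e", 5: "f", 6: "g", 7: "h"}}
store = TinyStore(disk)
store.fetch(1)   # goes to disk
store.fetch(2)   # same page, so a page cache hit
store.fetch(1)   # record cache hit
```

After these three calls, `store.stats` records one hit of each kind. The sketch leaves out eviction entirely, which is where a real engine spends most of its effort.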
Record Encoding - Cache Efficiency
Records are encoded by value, not declaration. This means that the string “abc” occupies the same space in varchar(3), varchar(4096), or char(3). The number 7 takes the same space whether it's a small int, medium int, int, decimal, or numeric.
MVCC
- update operations create new record versions
- readers don’t block writers (hmmmm)
- everyone sees a consistent view of the data
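Here is a toy version chain in Python to illustrate the three points above. The names (`VersionedRecord`, `snapshot_id`, `commit_id`) are hypothetical, and using a single increasing number for visibility is a simplification of mine; the talk did not go into Falcon's actual visibility rules.

```python
class VersionedRecord:
    """Toy MVCC record: each update adds a new version instead of overwriting."""

    def __init__(self):
        self.versions = []   # newest first: (commit_id, value)

    def write(self, commit_id, value):
        # An update never touches older versions, so readers of them are undisturbed.
        self.versions.insert(0, (commit_id, value))

    def read(self, snapshot_id):
        # A reader sees the newest version committed at or before its snapshot.
        for commit_id, value in self.versions:
            if commit_id <= snapshot_id:
                return value
        return None   # the record did not exist yet for this snapshot


rec = VersionedRecord()
rec.write(commit_id=1, value="v1")
reader_snapshot = 1                  # a reader starts here
rec.write(commit_id=2, value="v2")   # a writer proceeds without waiting for the reader
```

The reader holding `reader_snapshot` still gets `"v1"` from `rec.read(reader_snapshot)`, while a later snapshot sees `"v2"`.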
Updates are in memory until commit
- updates are held in memory pending commit
- index changes are held in memory pending commit
- verb rollback is dirt cheap
- transaction rollback is dirt cheap
At commit time
- pending record updates are flushed to serial log
- pending index updates are flushed to serial log
- commit record written to serial log
- serial log flushed to the oxide and the transaction is committed
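The pending-until-commit flow and the commit steps above can be sketched roughly like this in Python. `ToyTransaction` and the list standing in for the serial log are made up for illustration, and the sketch skips group commit and the actual flush to disk.

```python
class ToyTransaction:
    """Toy transaction: changes stay in memory until commit writes them to a log."""

    def __init__(self, log):
        self.log = log                # a shared list standing in for the serial log
        self.pending_records = []
        self.pending_indexes = []

    def update(self, record_id, value, index_key):
        self.pending_records.append((record_id, value))
        self.pending_indexes.append((index_key, record_id))

    def rollback(self):
        # Nothing durable was written, so rollback just drops the pending lists.
        self.pending_records.clear()
        self.pending_indexes.clear()

    def commit(self):
        for item in self.pending_records:
            self.log.append(("record", item))
        for item in self.pending_indexes:
            self.log.append(("index", item))
        self.log.append(("commit", None))
        # A real engine would force the log to disk here before reporting success.
        self.pending_records.clear()
        self.pending_indexes.clear()


serial_log = []
t1 = ToyTransaction(serial_log)
t1.update(1, "abc", "abc")
t1.rollback()                 # leaves the log untouched
t2 = ToyTransaction(serial_log)
t2.update(2, "xyz", "xyz")
t2.commit()                   # record entry, index entry, then the commit record
```

This is why rollback is described as dirt cheap: until commit there is nothing on disk to undo.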
What happens when we run out of memory?
- large transactions flush uncommitted data early to the serial log (called “chilled”)
- these records can be fetched from the serial log (called “thawed”)
- scavenger garbage collects unloved records periodically
- when things get really bad, entire record chains are flushed to backlog
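A rough sketch of chilling and thawing in Python, under my own assumptions: `ChillingBuffer`, the record-count limit, and the list used as a log are all hypothetical, and the scavenger and backlog are left out. It only shows uncommitted records being pushed out to a log when memory runs short and fetched back on demand.

```python
class ChillingBuffer:
    """Toy chill/thaw: spill uncommitted records to a log when memory runs short."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.in_memory = {}     # record_id -> value
        self.chilled = {}       # record_id -> position in the log
        self.log = []           # stands in for the serial log

    def put(self, record_id, value):
        self.chilled.pop(record_id, None)   # a newer value supersedes any chilled copy
        self.in_memory[record_id] = value
        if len(self.in_memory) > self.max_in_memory:
            self.chill()

    def chill(self):
        # Flush every in-memory record to the log and remember where it went.
        for record_id, value in self.in_memory.items():
            self.chilled[record_id] = len(self.log)
            self.log.append(value)
        self.in_memory.clear()

    def get(self, record_id):
        if record_id in self.in_memory:
            return self.in_memory[record_id]
        # Thaw: fetch the record back from the log.
        return self.log[self.chilled[record_id]]


buf = ChillingBuffer(max_in_memory=2)
buf.put(1, "a")
buf.put(2, "b")
buf.put(3, "c")    # exceeds the limit, so all three records are chilled
buf.put(4, "d")    # stays in memory
```

Here `buf.get(2)` is served from the log while `buf.get(4)` comes straight from memory.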
Falcon is definitely oriented towards OLTP, large volume, fairly small statements.
Falcon Weaknesses
- Transactions are ACID but not serializable
- Latency advantage disappears at saturation
- Very large transactions degrade performance
- optimized for Web, not batch operations
Falcon Strengths
- Runs like a memory database when data fits in cache
- scales like a disk-based database when data doesn’t fit in cache
- lowest possible latency for Web applications
- absorbs huge spiky loads
When should you use what?
- if you don’t need ACID, MyISAM is probably fastest
- for uniprocessors and small memory systems, InnoDB is a good choice
- for large batch transactions, InnoDB may be the best match
- for multicores and large number of threads, Falcon is probably best
- for the Web, Falcon is hard to beat
Sounds to me like Falcon is really coming along. My question would be whether the single-threaded nature of the MySQL server itself will hold Falcon back down the road.