Sunday, October 14, 2012

Cassandra

http://en.wikipedia.org/wiki/Apache_Cassandra
http://perspectives.mvdirona.com/2008/07/12/FacebookReleasesCassandraAsOpenSource.aspx


  • Designed to handle very large amounts of data
  • Spread out across many commodity servers 
  • Provides highly available service with no single point of failure
  • Initially developed at Facebook, where it powered Inbox Search until late 2010
  • Jeff Hammerbacher's description: a BigTable data model running on an Amazon Dynamo-like infrastructure (a distributed storage system)
  • Write-ahead logging and indexing
  • Structured key-value store 
  • Tunable consistency
  • Keys map to multiple values, which are grouped into column families
  • Column families are fixed when the database is created
  • Columns can be added to a family at any time
  • Hybrid between a column-oriented DBMS and a row-oriented store
  • BigTable modeling
  • Eventual consistency
  • Gossip protocol for cluster membership maintenance
  • Master-master: every node serves read/write requests (inspired by Dynamo)
  • Writes:
    • Write to arbitrary node in Cassandra cluster
    • Request sent to node owning the data
    • Node writes to the log first, then applies the write to its in-memory copy
    • No locks in critical path
    • Sequential disk accesses
    • Behaves like a write-through cache
    • Atomicity guarantee for a key
    • Always writable
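
The write path in the notes above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the names Node, Cluster, apply_write, owner and write are all made up for illustration, ownership is decided by a plain hash modulo the node count (a stand-in for Dynamo-style partitioning), and replication and consistency levels are left out entirely.

```python
import hashlib
import os


class Node:
    """Hypothetical storage node: commit log on disk plus an in-memory table."""

    def __init__(self, name, log_dir):
        self.name = name
        self.log_path = os.path.join(log_dir, name + ".log")
        self.memtable = {}  # row key -> {column name: value}

    def apply_write(self, key, columns):
        # Step 1: append to the log first. Appends are sequential disk
        # accesses, and no lock is taken on existing data.
        line = key + "\t" + repr(sorted(columns.items())) + "\n"
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(line)
            log.flush()
            os.fsync(log.fileno())
        # Step 2: only then apply the write to the in-memory copy.
        self.memtable.setdefault(key, {}).update(columns)


class Cluster:
    """Hypothetical cluster in which any node can accept a write."""

    def __init__(self, names, log_dir):
        self.nodes = [Node(name, log_dir) for name in names]

    def owner(self, key):
        # Assumption: hash modulo node count stands in for real partitioning.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def write(self, contact_index, key, columns):
        # The client contacts an arbitrary node, which forwards the request
        # to the node owning the key.
        contact = self.nodes[contact_index]
        target = self.owner(key)
        target.apply_write(key, columns)
        return contact.name, target.name
```

Calling Cluster(["a", "b", "c"], some_dir).write(0, "user42", {"name": "Ann"}) returns the name of the contacted node and the name of the owning node. Afterwards the owner's log file holds one appended line for the key, and its memtable holds the new column alongside any columns written earlier for the same key.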