Lesson 2: Building a Big Data Infrastructure Part 2

Structured Storage & Cassandra

Structured Data

  1. Organized like tables (rows and columns)

  2. Fast write and query times

Cassandra

  1. Data model based on Google's BigTable; distribution design based on Amazon's Dynamo

  2. Distributed

  3. Column Oriented Database

  4. Open source; originally developed at Facebook

  5. Used at Twitter, LinkedIn, Netflix, etc.

Column Oriented Properties

  1. Column names are not fixed in advance; each row can have its own columns

  2. Wide rows

  3. Rows occupy non-contiguous disk space
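
The properties above can be sketched with a toy in-memory model. This is only an illustration of the idea (flexible column names, rows that grow wide), not Cassandra's API or storage engine; the names `table`, `put`, and `get_slice` are hypothetical.

```python
from collections import defaultdict

# Toy model: row key -> {column name -> value}.
# Illustrative only; this is not how Cassandra stores data on disk.
table = defaultdict(dict)

def put(row_key, column, value):
    """Write one column into a row; the column need not exist in other rows."""
    table[row_key][column] = value

def get_slice(row_key, start, end):
    """Return the columns of one row whose names fall in [start, end], sorted by name."""
    row = table[row_key]
    return [(c, row[c]) for c in sorted(row) if start <= c <= end]

# Two rows in the same table with different sets of columns.
put("user:1", "email", "a@example.com")
put("user:1", "name", "Ada")
put("user:2", "name", "Bob")
put("user:2", "last_login", "2013-05-01")

print(sorted(table["user:1"]))          # columns present in row user:1
print(get_slice("user:2", "a", "m"))    # slice of row user:2 by column name
```

Because columns are kept sorted by name within a row, reading a contiguous range of columns (a slice) is the natural query in this model.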

Other Column Oriented Data Stores

  1. BigTable

  2. HBase

  3. DynamoDB (primarily a key-value store)

CAP Theorem

  1. Consistency

  2. Availability

  3. Partition Tolerance

Cassandra relaxes consistency in favor of availability and partition tolerance (the consistency level is tunable per request)
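
A minimal sketch of the trade-off, assuming the usual quorum rule for replicated stores: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas only when R + W > N. The function names here are hypothetical and this is arithmetic only, not a Cassandra client call.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def read_sees_latest_write(n, r, w):
    """True if every read set of size r must overlap every write set of size w."""
    return r + w > n

n = 3  # assumed replication factor for the example

# Quorum reads and quorum writes always overlap on at least one replica.
print(quorum(n), read_sees_latest_write(n, quorum(n), quorum(n)))

# Reading and writing a single replica is faster and more available,
# but a read may miss the latest write (relaxed consistency).
print(read_sees_latest_write(n, 1, 1))
```

Choosing smaller R and W keeps the system responsive when replicas are slow or unreachable, at the cost of possibly reading stale data.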

Cassandra is Good For

  1. Time Series Data

  2. Event Data

  3. Timelines

  4. High Volume
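
Time series and event data fit the wide-row model because each source can get one row per time bucket, with one column per timestamp. The sketch below illustrates that layout in plain Python; the bucketing by day and the names `partition_key`, `record`, and `read_range` are assumptions made for the example, not something prescribed by the lesson.

```python
from datetime import datetime

# Toy store: (source id, day) -> {timestamp -> value}. Illustrative only.
rows = {}

def partition_key(source_id, ts):
    """One wide row per source per day, so no single row grows without bound."""
    return (source_id, ts.strftime("%Y-%m-%d"))

def record(source_id, ts, value):
    """Append an event as a new column in the row for that source and day."""
    rows.setdefault(partition_key(source_id, ts), {})[ts] = value

def read_range(source_id, day, start, end):
    """Return events for one source and day with start <= timestamp <= end, in time order."""
    row = rows.get((source_id, day), {})
    return [(t, row[t]) for t in sorted(row) if start <= t <= end]

record("sensor-1", datetime(2013, 5, 1, 9, 0), 20.5)
record("sensor-1", datetime(2013, 5, 1, 9, 5), 20.7)
record("sensor-1", datetime(2013, 5, 2, 9, 0), 19.9)

morning = read_range("sensor-1", "2013-05-01",
                     datetime(2013, 5, 1, 9, 0), datetime(2013, 5, 1, 9, 3))
print(len(rows), morning)
```

Writes are simple appends to a known row, and a time-range query touches a single row, which is why this layout suits high-volume event streams.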

On to the install...
