Working with Big Data

Intro

This site accompanies the Addison-Wesley LiveLessons: Working with Big Data: Infrastructure, Algorithms, and Visualizations, available on Safari Online.

The goal of these LiveLessons is to cover the main aspects of big data at a high level. For instance, instead of going into detail on the many nuances of Hadoop, we'll just get it set up and use it in conjunction with other tools like Cassandra. From there, we'll look at how to work with and integrate algorithms, and close out with tools for visualization. Along the way, we'll work through the full stack to see how the different pieces of a big data system fit together.

Table of Contents

Lesson 1: Unstructured storage and Hadoop
1. Set up a basic Hadoop installation
2. Write data into the Hadoop Distributed File System (HDFS)
3. Write a Hadoop streaming job to process text files
Lesson 2: Structured storage and Cassandra
1. Set up a basic Cassandra installation
2. Create a Cassandra schema for storing data
3. Store and retrieve data from Cassandra using Ruby
4. Write data into Cassandra from a Hadoop streaming job
5. Use Hadoop to parallelize writes
Lesson 3: Real-time processing and messaging
1. Set up the Kafka messaging system
2. Publish and consume data from Kafka using Ruby
3. Aggregate log files into Hadoop with Kafka and a Ruby consumer
4. Create horizontally scalable message consumers
5. Sample messages using Kafka's partitioning
6. Create redundant message consumers for high availability
Lesson 4: Running machine learning algorithms in a big data architecture
Lesson 5: Experimentation and running algorithms in production
Lesson 6: Basic visualizations

Authors and Contributors

Paul Dix (@pauldix)

Discussion

Ask questions about the lectures, or about anything related to big data, in the Working with Big Data Group.
