Lesson 4: Working with Algorithms

K-Nearest Neighbors

Algorithms in Big Data

  1. Machine Learning

  2. Efficiency, e.g., Bloom Filters

  3. Distributed Computing, e.g., Paxos

Machine Learning

  1. Classification (Supervised Learning)

  2. Clustering (Unsupervised Learning)

  3. Semi-supervised Learning

  4. Dimensionality Reduction

  5. Collaborative Filtering

Machine Learning is Good For

  1. Spam Detection

  2. Topic Assignment

  3. Sentiment Detection

  4. Product, Music, or Content Recommendations

  5. Ranking Search Results

  6. Ad Targeting

Classification Workflow

  1. Given labeled training data, train a model

  2. Test the model on held-out data

  3. Update the model based on testing

  4. Given a new instance, make a prediction
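
The four steps above can be sketched with a tiny k-nearest-neighbors classifier. This is a minimal illustration, not the lesson's implementation: the data points, the labels "a" and "b", the candidate values of k, and the helper names knn_predict and accuracy are all made up for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # `train` is a list of (features, label) pairs; distance is Euclidean.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out instances whose predicted label is correct."""
    correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(held_out)


# Step 1: labeled training data (toy points, assumed for illustration).
# For KNN, "training" amounts to storing these examples.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.3), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b"),
]

# Step 2: test on held-out data that was not used for training.
held_out = [((1.1, 1.1), "a"), ((4.0, 4.0), "b")]

# Step 3: update the model based on testing. Here that means choosing
# the k with the best held-out accuracy (ties go to the first candidate).
best_k = max([1, 3, 5], key=lambda k: accuracy(train, held_out, k))

# Step 4: given a new instance, make a prediction.
prediction = knn_predict(train, (3.9, 4.1), best_k)
print(best_k, prediction)
```

In practice the held-out set would be much larger, and a separate test set would usually be kept aside so that tuning k does not inflate the reported accuracy.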

Clustering Workflow

  1. Given unlabeled training data, find clusters

  2. Examine the clusters for "approximate correctness"

  3. Update the representation or the algorithm and repeat
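
The clustering loop above can be sketched with k-means (Lloyd's algorithm), chosen here only as a familiar example; the outline does not name a specific clustering algorithm. The points, the starting centroids, and the helper names assign, kmeans, and spread are assumptions made for illustration.

```python
import math


def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # New centroid = mean of the cluster; keep the old one if it is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)


def spread(centroids, clusters):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum(math.dist(p, c) ** 2
               for c, cluster in zip(centroids, clusters)
               for p in cluster)


# Step 1: unlabeled data (toy points, assumed for illustration).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3),
          (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
centroids_2, clusters_2 = kmeans(points, [points[0], points[3]])

# Step 2: examine the clusters, here by size and tightness.
sizes_2 = [len(c) for c in clusters_2]
spread_2 = spread(centroids_2, clusters_2)

# Step 3: change the algorithm's settings (a different k) and compare.
centroids_1, clusters_1 = kmeans(points, [points[0]])
spread_1 = spread(centroids_1, clusters_1)
print(sizes_2, spread_2, spread_1)
```

There are no labels to score against, so "approximate correctness" is judged indirectly: by measures such as the spread above, or by inspecting whether the groups make sense for the problem.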

On to implementing KNN...
