Lesson 4: Working with Algorithms
Algorithms in Big Data
Machine Learning
Efficiency, e.g., Bloom Filters
Distributed Computing, e.g., Paxos
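To make the efficiency example concrete, here is a minimal sketch of a Bloom filter: a compact set-membership structure that never reports a false negative but can report a false positive. The class name, the default sizes, and the use of salted SHA-256 to get several hash positions are all assumptions chosen for illustration, not a tuned implementation.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary defaults picked for this sketch.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("spam@example.com")
```

An item that was added is always reported as present. An item that was never added is usually reported as absent, and the false-positive rate rises as more bits get set.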
Machine Learning
Classification or Supervised Learning
Clustering or Unsupervised Learning
Semi-supervised Learning
Dimensionality Reduction
Collaborative Filtering
Machine Learning is Good For
Spam Detection
Topic Assignment
Sentiment Detection
Product, Music, or Content Recommendations
Ranking Search Results
Ad Targeting
Classification Workflow
Given labeled training data, train a model
Test the model on held-out data
Update the model based on testing
Given a new instance, make a prediction
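The four steps above can be sketched with a deliberately simple model. This example uses a nearest-centroid classifier (not KNN) on made-up two-dimensional data. The function names, the "ham"/"spam" labels, and the data points are all hypothetical.

```python
from collections import defaultdict


def train(examples):
    """Step 1: average the feature vectors of each label into a centroid."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, features):
    """Step 4: return the label whose centroid is closest to the instance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, examples):
    """Step 2: fraction of held-out examples predicted correctly."""
    correct = sum(1 for features, label in examples if predict(model, features) == label)
    return correct / len(examples)


# Made-up labeled data: (features, label).
labeled = [
    ([1.0, 1.0], "ham"), ([1.2, 0.8], "ham"), ([0.9, 1.1], "ham"), ([1.1, 1.3], "ham"),
    ([5.0, 5.0], "spam"), ([5.2, 4.8], "spam"), ([4.9, 5.1], "spam"), ([5.1, 5.3], "spam"),
]
train_set = labeled[:3] + labeled[4:7]
holdout_set = [labeled[3], labeled[7]]

model = train(train_set)
holdout_accuracy = accuracy(model, holdout_set)
# Step 3: if held-out accuracy is poor, revisit the features or the model and retrain.
prediction = predict(model, [4.7, 5.2])
```

Keeping the held-out examples out of `train` is the point of step 2: accuracy measured on the training data itself would overstate how well the model generalizes.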
Clustering Workflow
Given unlabeled training data, find clusters
Examine the clusters for "approximate correctness"
Update the representation or the algorithm and repeat
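As one possible illustration of this loop, the sketch below runs k-means on made-up unlabeled points and computes a rough quality number to examine. K-means is only one clustering algorithm among many. The naive "first k points" initialization, the function names, and the data are assumptions made for this example.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100):
    """Step 1: alternate between assigning points and moving centroids."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


def within_cluster_ss(points, centroids, assignments):
    """Step 2: total squared distance from each point to its centroid."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))


# Made-up unlabeled data with two visible groups.
points = [[1.0, 1.0], [8.0, 8.0], [1.5, 2.0], [9.0, 8.5], [1.0, 0.5], [8.5, 9.0]]
centroids, assignments = kmeans(points, k=2)
spread = within_cluster_ss(points, centroids, assignments)
# Step 3: if the clusters look wrong, change k, the features, or the algorithm and rerun.
```

With no labels there is no accuracy to compute, so "approximate correctness" usually means inspecting cluster members by hand alongside a summary number such as `spread`.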
On to implementing KNN...