Course Preview

Algorithms for Big Data
Instructor: Jelani Nelson
Department: Computer Science
Institution: Harvard University
Platform: Independent
Price: Free
Big data is data so large that it does not fit in the main memory of a single machine, and the need to process big data by efficient algorithms arises in Internet search, network traffic monitoring, machine learning, scientific computing, signal processing, and several other areas. This course will cover mathematically rigorous models for developing such algorithms, as well as some provable limitations of algorithms operating in those models. Some topics we will cover include: 1. Sketching and Streaming. Extremely small-space data structures that can be updated on the fly in a fast-moving stream of input. 2. Dimensionality reduction. General techniques and impossibility results for reducing data dimension while still preserving geometric structure. 3. Numerical linear algebra. Algorithms for big matrices (e.g. a user/product rating matrix for Netflix or Amazon). Regression, low rank approximation, matrix completion, ... 4. Compressed sensing. Recovery of (approximately) sparse signals based on few linear measurements. 5. External memory and cache-obliviousness. Algorithms and data structures minimizing I/Os for data not fitting on memory but fitting on disk. B-trees, buffer trees, multiway mergesort, ... This course is intended for both graduate students and advanced undergraduate students satisfying the below prerequisites.