This course will explore the mathematical foundations of a rapidly evolving field: large-scale machine learning. We will focus on recent texts in machine learning, statistics, and optimization, with the goal of understanding the tradeoffs that drive algorithmic design in this new discipline. These tradeoffs revolve around statistical accuracy, scalability, algorithmic complexity, and implementation.
Sample topics include:
- Optimization and Learning
- Memorization, Generalization, and Algorithmic Stability
- Stochastic Methods for Convex and Nonconvex Settings
- Expressive Power of Neural Nets, Hardness, and Recent Results
- Large Scale Learning and Systems
- System Tradeoffs, Platforms, and Modern Architectures
- Centralized and Decentralized Distributed Optimization
- Delays, Communication Bottlenecks, and Adversarial Attacks
[syllabus]
Lectures
Week 1
- Introduction and Course Overview (slides: [lecture 1], other: [Quiz 0])
- Concentration of the Empirical Risk via Parameter Count Bounds (slides: [lecture 2], reading material: Chapter 4 of [2])
Week 2
- VC-dimension and Compression-based Generalization Bounds (slides: [lecture 3], reading list)
- Limitations of Rademacher Complexity; moving forward with Stability (slides: [lecture 4], reading list)
Week 3
- Generalization and Stability of Global Minima (slides: [lecture 5], reading list)
- Computational Aspects of the ERM and Gradient Descent (slides: [lecture 6], reading list)
Week 4
- How fast is Gradient Descent? (slides: [lecture 7], reading list)
- A Primer on SGD! (slides: [lecture 8], reading list)
Week 5
- The PL Land of Non-Convexity (slides: [lecture 9], reading list)
- When are Local Minima Good, and Neural Nets Easy to Fit? (slides: [lecture 10], reading list)
Week 6
- Understanding Deep Learning: Is it all about SGD? (slides: [lecture 11], reading list)
- Advances and Challenges of Large-scale Learning (slides: [lecture 12])
Week 7
- Scaling up SGD with Mini-batches (slides: [lecture 13], reading list)
- Lifting Synchronization Barriers in Distributed Learning (slides: [lecture 14], reading list)
Week 8: Spring Break
Week 9
- Letting the Hogs go Wild, while converging fast (slides: [lecture 15], reading list)
- Taming the Wild Hogs: Reproducible Parallel Machine Learning (slides: [lecture 16], reading list)
Week 10
- Project Presentations
- Model Pruning and Deep Compression (slides: [lecture 17], reading list)
Week 11
- From Model Pruning to Sparse Updates and the Lottery Ticket Hypothesis (slides: [lecture 18], reading list)
Week 12
- The Advantages and Challenges of Binary Neural Networks (slides: [lecture 19], reading list)
- Communication Primitives for Distributed Training (slides: [lecture 20], reading list)
Week 13
- Overcoming Communication Bottlenecks with Gradient Compression
- Decentralized and Federated Learning
Week 14
- Worst-case Model Robustness
- Robustness and Fault Tolerance during Training
Week 15
- Open Problems in Large-scale Learning
- Project Poster Presentations