This course will explore the mathematical foundations of a rapidly evolving field: large-scale optimization and machine learning. We will focus on recent texts in machine learning, optimization, and randomized algorithms, with the goal of understanding the tradeoffs that drive algorithmic design in this discipline. These tradeoffs revolve around statistical accuracy, scalability, algorithmic complexity, and implementation.
Sample topics include:
- Optimization and Learning
- Stochastic Methods for Convex and Nonconvex Settings
- Overfitting, Generalization, and Algorithmic Stability
- Expressive Power of Neural Nets, Hardness, and Recent Results
- Large Scale Learning and Systems
- System Tradeoffs, Platforms, and Bottlenecks
- Synchronous and Asynchronous Distributed Optimization
- Stragglers and Adversarial Attacks during Distributed Learning
[syllabus]
Lectures
Week 1
- Introduction and Course Overview. Slides: [lecture 1]
- Concentration of the Empirical Risk
Week 2
- Computational Aspects of the ERM and Families of Loss Functions
- Convexity in Learning and Intro to Gradient Descent
Week 3
- Convergence Rates and Complexity of Gradient Descent
- The Stochastic Gradient Method
Week 4
- Accelerating SGD with Variance Reduction
- Random Coordinate Descent and Importance Sampling
Week 5
- SGD Bounds for some Nonconvex Problems
- Stepsize Rules (of Thumb), Subgradients, and Projections
Weeks 6–13
- Challenges in Distributed and Parallel Machine Learning. Lecture slides: [pptx]
- Scaling-up SGD with Mini-Batches (a short SGD sketch appears after the schedule)
- Lifting the Synchronization Barriers and Freeing the Locks
- Understanding Hogwild!: Convergence Rates and Challenges
- No Class
- Proposal Presentations
- Serial Equivalence in Asynchronous Machine Learning
- Communication Bottlenecks and Gradient Quantization
- Mitigating Stragglers in Distributed Computation
- Inference on Deep Networks, Model Compression and Quantization
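
For concreteness, here is a minimal mini-batch SGD sketch in the spirit of the stochastic gradient and mini-batching lectures above. It is only an illustration under assumed choices: the least-squares objective, the function name `minibatch_sgd`, and the hyperparameter values are placeholders, not the exact setup used in class.

```python
import numpy as np

def minibatch_sgd(X, y, stepsize=0.1, batch_size=32, epochs=20, seed=0):
    """Minimize the least-squares loss (1/2n) * ||Xw - y||^2 with mini-batch SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle the data once per epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Mini-batch gradient: X_b^T (X_b w - y_b) / |batch|
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            w -= stepsize * grad  # constant stepsize; decaying rules are covered in Week 5
    return w

# Tiny usage example on synthetic data.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 5))
    w_true = np.arange(1.0, 6.0)
    y = X @ w_true + 0.01 * rng.standard_normal(1000)
    w_hat = minibatch_sgd(X, y)
    print(np.round(w_hat, 2))  # should be close to [1. 2. 3. 4. 5.]
```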