This course will explore the mathematical foundations of a rapidly evolving new field: large-scale optimization and machine learning. We will focus on recent texts in machine learning, optimization, and randomized algorithms, with the goal of understanding the tradeoffs that are driving algorithmic design in this new discipline. These tradeoffs revolve around statistical accuracy, scalability, algorithmic complexity, and implementation.
Sample topics include:
 Optimization and Learning
   Stochastic Methods for Convex and Nonconvex Settings
   Overfitting, Generalization, and Algorithmic Stability
   Expressive Power of Neural Nets, Hardness, and Recent Results
 Large Scale Learning and Systems
   System Tradeoffs, Platforms, and Bottlenecks
   Synchronous and Asynchronous Distributed Optimization
   Stragglers and Adversarial Attacks during Distributed Learning
[syllabus]
Lectures
Week 1

Introduction and Course Overview
Slides: [lecture 1]

Concentration of the Empirical Risk
Week 2

Computational Aspects of the ERM and Families of Loss Functions

Convexity in Learning and Intro to Gradient Descent
Week 3

Convergence Rates and Complexity of Gradient Descent

The Stochastic Gradient Method
Week 4

Accelerating SGD with Variance Reduction

Random Coordinate Descent and Importance Sampling
Week 5

SGD Bounds for some Nonconvex Problems

Stepsize Rules (of Thumb), Subgradients, and Projections

Challenges in Distributed and Parallel Machine Learning
Lecture slides: [pptx]

Scaling Up SGD with Mini-Batches

Lifting the Synchronization Barriers and Freeing the Locks

Understanding Hogwild!: Convergence Rates and Challenges

No Class

Proposal Presentations

Serial Equivalence in Asynchronous Machine Learning

Communication Bottlenecks and Gradient Quantization

Mitigating Stragglers in Distributed Computation

Inference on Deep Networks, Model Compression and Quantization

Robustness of Predictions and Adversarial Examples
Lecture slides: [pptx]
Week 6
Week 7
Week 8
Week 9
Week 11
Week 12
Week 13