ECE 901: Large-scale Machine Learning and Optimization

Electrical and Computer Engineering
Fall 2016


This course explores the mathematical foundations of a rapidly evolving field: large-scale optimization and machine learning. We will focus on recent texts in machine learning, optimization, and randomized algorithms, with the goal of understanding the tradeoffs that drive algorithmic design in this new discipline. These tradeoffs revolve around statistical accuracy, scalability, algorithmic complexity, and implementation.

Sample topics include:
  • Optimization and Learning
    • Stochastic Methods for Convex and Nonconvex Settings
    • Constrained and Projection-free Optimization
    • Overfitting, Generalization, and Algorithmic Stability
  • Large Scale Learning and Systems
    • System Tradeoffs and Platforms
    • Asynchronous Stochastic Optimization
    • Distributed Learning
  • Matrix Methods for ML and Semidefinite Optimization
    • Matrix Tools for Machine Learning
    • Dimensionality Reduction
    • Semidefinite Optimization and Randomized Rounding
[syllabus]

Lectures

Week 1

  • Introduction and Course Overview
    Slides: [lecture 1]
  • Concentration of the Empirical Risk
    Reading material: Chapters 4 and 6 of [2], Sections 2.2 and 2.3 of [3].
    Lecture scribe: [pdf]
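
    For a fixed hypothesis h with losses in [0, 1] and n i.i.d. samples, the basic concentration
    statement behind this lecture is Hoeffding's inequality applied to the empirical risk R_n(h)
    (a standard fact stated here for reference, not a quote from the readings):

        \Pr\big[\, |R_n(h) - R(h)| \ge \varepsilon \,\big] \le 2 \exp(-2 n \varepsilon^2),

    so the empirical risk of a single hypothesis is within roughly 1/sqrt(n) of the true risk with
    high probability; the readings build uniform convergence over a hypothesis class on top of this.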

Week 2

  • Computational Aspects of the ERM and Families of Loss Functions
    Reading material: Chapters 6 and 12 of [2].
    Lecture scribe: [pdf]
  • Convexity in Learning and Gradient Descent
    Reading material: Chapter 12 of [2], Chapter 3 of [1].
    Lecture scribe: [pdf]
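
    As a companion to this lecture, here is a minimal gradient descent sketch on a least-squares
    objective. The problem instance and the 1/L step size are illustrative assumptions, not part of
    the official course material.

        import numpy as np

        # Gradient descent on f(x) = 0.5 * ||A x - b||^2, with gradient A^T (A x - b).
        # The step size 1/L uses the smoothness constant L = lambda_max(A^T A).
        rng = np.random.default_rng(0)
        A = rng.standard_normal((50, 10))
        b = rng.standard_normal(50)

        L = np.linalg.eigvalsh(A.T @ A).max()   # smoothness constant of f
        x = np.zeros(10)
        for t in range(200):
            grad = A.T @ (A @ x - b)            # full gradient
            x = x - (1.0 / L) * grad            # constant step size 1/L

        print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)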

Week 3

  • Convergence and Complexity of Gradient Descent
    Reading material: Chapter 3 of [1], Chapter 14 of [2].
    Lecture scribe: [pdf]
  • The Stochastic Gradient Method
    Reading material: Chapter 6 of [1], Chapter 14 of [2], Section 3 of [3], Chapter 2 of [4].
    Lecture scribe: [pdf]
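
    A minimal sketch of the stochastic gradient method on a finite-sum least-squares problem;
    the uniform sampling and the 1/sqrt(t) step-size schedule are standard illustrative choices,
    not the specific rules analyzed in the readings.

        import numpy as np

        # SGD on f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2:
        # each step uses the gradient of a single uniformly sampled example.
        rng = np.random.default_rng(0)
        n, d = 200, 10
        A = rng.standard_normal((n, d))
        b = rng.standard_normal(n)

        x = np.zeros(d)
        for t in range(5000):
            i = rng.integers(n)                       # sample one example uniformly
            grad_i = (A[i] @ x - b[i]) * A[i]         # stochastic gradient for example i
            x = x - (0.1 / np.sqrt(t + 1)) * grad_i   # decaying step size

        print("final average loss:", 0.5 * np.mean((A @ x - b) ** 2))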

Week 4

  • Accelerating SGD with Variance Reduction
    Reading material: [5], Section 6.3 of [1], Section 5.3 of [3].
    Lecture scribe: [pdf]
  • Random Coordinate Descent and Importance Sampling
    Reading material: Section 6.4 of [1], Section 7.3 of [3], [6], [7].
    Lecture scribe: [pdf]
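
    A minimal random coordinate descent sketch with importance sampling, where coordinate j is
    sampled with probability proportional to its coordinate-wise smoothness constant L_j; the
    quadratic instance is an illustrative assumption.

        import numpy as np

        # Coordinate descent on f(x) = 0.5 * ||A x - b||^2.
        # Coordinate j has smoothness L_j = ||A[:, j]||^2 and is sampled with prob ~ L_j.
        rng = np.random.default_rng(0)
        A = rng.standard_normal((100, 20))
        b = rng.standard_normal(100)

        L = (A ** 2).sum(axis=0)          # per-coordinate smoothness constants
        p = L / L.sum()                   # importance-sampling probabilities

        x = np.zeros(20)
        residual = A @ x - b              # maintained so each update costs O(n)
        for t in range(2000):
            j = rng.choice(20, p=p)               # sample a coordinate
            g_j = A[:, j] @ residual              # partial derivative wrt x_j
            x[j] -= g_j / L[j]                    # exact 1/L_j coordinate step
            residual -= (g_j / L[j]) * A[:, j]    # keep residual consistent

        print("final objective:", 0.5 * residual @ residual)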

Week 5

  • Stepsize Rules (of Thumb), Subgradients, and Projections
    Reading material: [8], [4], Sections 1.2 and 3.1 of [1].
    Lecture scribe: [pdf]
  • Avoiding Projections with Frank-Wolfe
    Reading material: Section 3.3 of [1], [9], [10].
    Lecture scribe: [pdf]
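
    A minimal Frank-Wolfe (conditional gradient) sketch over the l1 ball, where the linear
    minimization oracle returns a signed coordinate vertex so no projection is ever computed;
    the instance and the classic 2/(t+2) step size are illustrative choices.

        import numpy as np

        # Frank-Wolfe: minimize f(x) = 0.5 * ||A x - b||^2 over { x : ||x||_1 <= 1 }.
        rng = np.random.default_rng(0)
        A = rng.standard_normal((60, 30))
        b = rng.standard_normal(60)

        x = np.zeros(30)
        for t in range(200):
            grad = A.T @ (A @ x - b)
            j = np.argmax(np.abs(grad))         # LMO: best vertex of the l1 ball
            s = np.zeros(30)
            s[j] = -np.sign(grad[j])
            gamma = 2.0 / (t + 2)               # standard Frank-Wolfe step size
            x = (1 - gamma) * x + gamma * s     # convex combination stays feasible

        print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2, "l1 norm:", np.abs(x).sum())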

Week 6

  • The Expressive Power of Neural Nets and their Intractability
    Reading material: Chapter 20 of [2].
    Lecture scribe: [pdf]
  • SGD for Nonconvex Problems and Overparameterized NNs
    Reading material: [11], [12].
    Lecture scribe: [pdf]

Week 7

  • Algorithmic Stability, Generalization, and Regularizers
    Reading material: Chapter 13 of [2], [13].
    Lecture scribe: [pdf]
  • Stability of Stochastic Gradient Descent
    Reading material: [14].
    Lecture scribe: [pdf]

Week 8


Week 10

  • Serial Equivalence in Parallel Machine Learning
    Reading material: [17]
    Lecture scribe: [pdf]

Week 11


Final Week

Paper Presentations

Day 1: Algorithms and Convergence (Oct. 24)


Day 2: Neural Nets (Nov. 11)


Day 3: Systems and Practical Considerations (Nov. 15)


Day 4: Learning Theory (Nov. 22)


Day 5: Statistical Validity and Bias (Nov. 29)

References

  1. Convex Optimization: Algorithms and Complexity, by Sébastien Bubeck.
  2. Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David.
  3. Optimization Methods for Large-Scale Machine Learning, by Léon Bottou, Frank E. Curtis, and Jorge Nocedal.
  4. Introductory Lectures on Stochastic Optimization, by John Duchi.
  5. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, by Rie Johnson and Tong Zhang.
  6. Efficiency of coordinate descent methods on huge-scale optimization problems, by Yurii Nesterov.
  7. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, by Hamed Karimi, Julie Nutini, and Mark Schmidt.
  8. Subgradients (course notes), by Stephen Boyd, John Duchi, and Lieven Vandenberghe.
  9. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization, by Martin Jaggi.
  10. Projection-free Online Learning, by Elad Hazan and Satyen Kale.
  11. Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming, by Saeed Ghadimi and Guanghui Lan.
  12. No bad local minima: Data independent training error guarantees for multilayer neural networks, by Daniel Soudry and Yair Carmon.
  13. Stability and Generalization, by Olivier Bousquet and André Elisseeff.
  14. Train faster, generalize better: Stability of stochastic gradient descent, by Moritz Hardt, Benjamin Recht, and Yoram Singer.
  15. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, by Feng Niu, Benjamin Recht, Christopher Ré and Stephen J. Wright.
  16. Perturbed Iterate Analysis for Asynchronous Stochastic Optimization, by Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan.
  17. CYCLADES: Conflict-free Asynchronous Machine Learning, by Xinghao Pan, Maximilian Lam, Stephen Tu, Dimitris Papailiopoulos, Ce Zhang, Michael I. Jordan, Kannan Ramchandran, Christopher Ré, Benjamin Recht.
  18. The phase transition in site percolation on pseudo-random graphs, by Michael Krivelevich.

Downloads and useful links