October 4, 2019
2:00 PM - 3:00 PM
Title: On the Dynamics of Neural Network Optimization
Abstract: Virtually all modern deep learning systems are trained with some form of local descent algorithm over a high-dimensional parameter space. Despite its apparent simplicity, the mathematical picture of the resulting setup contains several mysteries that combine statistics, approximation theory and optimization. To make progress, authors have recently focused on the so-called ‘overparametrised’ regime, which studies asymptotic properties of the algorithm as the number of neurons grows. In particular, neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for their favorable training properties. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that, under appropriate assumptions, converges to a globally optimal solution for networks with a single hidden layer.
In this talk, we will review recent progress on this problem, and will describe a non-local mass transport dynamics that leads to a modified PDE with the same minimizer. We implement this non-local dynamics as a stochastic neuronal birth-death process and we prove that it accelerates the rate of convergence in the mean-field limit. We will illustrate our algorithms with empirical examples to provide intuition for the mechanism through which convergence is accelerated, and discuss current open problems in this research direction.
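To give a rough feel for what a neuronal birth-death process could look like on a finite set of neurons, here is a minimal Python sketch. It is an illustration under stated assumptions, not the speakers' algorithm: the function name `birth_death_step`, the parameters `alpha` and `dt`, and the choice to keep the population size fixed by overwriting a randomly chosen neuron are all hypothetical. The assumed rule is that a neuron whose potential exceeds the population average is removed, and one below the average is duplicated, with a probability that grows with the size of the gap.

```python
import math
import random


def birth_death_step(params, potentials, alpha, dt, rng):
    """One birth-death resampling step on a fixed-size population of neurons.

    params     : list of neuron parameters (any objects)
    potentials : list of floats; potentials[i] is an assumed per-neuron potential
    alpha, dt  : hypothetical rate constant and time step
    rng        : random.Random instance used for all random draws
    """
    n = len(params)
    if n == 0:
        return []
    mean_v = sum(potentials) / n
    new_params = list(params)
    for i in range(n):
        excess = potentials[i] - mean_v
        # Probability that a birth or death event fires during a step of length dt.
        prob = 1.0 - math.exp(-alpha * dt * abs(excess))
        if rng.random() >= prob:
            continue
        # Pick a different neuron uniformly at random.
        j = rng.choice([k for k in range(n) if k != i])
        if excess > 0:
            # Death: neuron i is replaced by a copy of neuron j.
            new_params[i] = params[j]
        else:
            # Birth: neuron i is duplicated, overwriting neuron j.
            new_params[j] = params[i]
    return new_params
```

In a training loop, a step like this would be interleaved with ordinary gradient updates of the surviving neurons; how the potential is computed from the loss is left open here.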
Joint work with G. Rotskoff (NYU), S. Jelassi (Princeton) and E. Vanden-Eijnden (NYU).
Bio: Joan Bruna is an Assistant Professor at the Courant Institute, New York University (NYU), in the Department of Computer Science, Department of Mathematics (affiliated) and the Center for Data Science, since Fall 2016. He belongs to the CILVR group and to the Math and Data groups. From 2015 to 2016, he was an Assistant Professor of Statistics at UC Berkeley and part of BAIR (Berkeley AI Research). Before that, he worked at FAIR (Facebook AI Research) in New York. Prior to that, he was a postdoctoral researcher at the Courant Institute, NYU. He completed his PhD in 2013 at Ecole Polytechnique, France. Before his PhD he was a Research Engineer at a semiconductor company, developing real-time video processing algorithms. Even before that, he did an MSc at Ecole Normale Superieure de Cachan in Applied Mathematics (MVA) and a BA and MS at UPC (Universitat Politecnica de Catalunya, Barcelona) in both Mathematics and Telecommunication Engineering. For his research contributions, he has been awarded a Sloan Research Fellowship (2018), an NSF CAREER Award (2019) and a best paper award at ICMLA (2018).