TITLE: Towards Deeper Understandings of Deep Learning
Recent breakthroughs in machine learning often involve learning highly non-convex models, especially deep neural networks. Although many empirical studies have demonstrated the success of these methods, a formal account of the principles behind them is less established.
This talk will present a few recent results toward developing such principles. In particular, we focus on over-parameterized neural networks for multi-class classification. We will show that stochastic gradient descent (SGD) on over-parameterized deep neural networks provably finds a global minimum of the training objective. Moreover, we prove that this perfect fit of the training data also generalizes to the test set when the labels are generated by certain teacher networks.
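The flavor of the over-parameterization result can be illustrated with a toy experiment (a minimal sketch, not the setting of the talk itself): a wide two-layer ReLU network, with width much larger than the number of training points, trained by full-batch gradient descent on a squared loss. As in many such analyses, only the first layer is trained and the output layer is fixed at random; all sizes, seeds, and hyperparameters below are illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 6, 5, 512          # samples, input dim, width (over-parameterized: m >> n)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.normal(size=n)                          # arbitrary real labels

W = rng.normal(size=(m, d)) / np.sqrt(d)        # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed random output layer

lr = 1.0
for step in range(20000):
    Z = X @ W.T                  # (n, m) pre-activations
    H = np.maximum(Z, 0.0)       # ReLU
    err = H @ a - y              # residuals on the training set
    # gradient of 0.5 * mean squared error w.r.t. W
    G = ((err[:, None] * (Z > 0)) * a[None, :]).T @ X / n
    W -= lr * G

final_loss = 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2)
print(final_loss)  # driven (numerically) to zero: the net interpolates the data
```

With width 512 against only 6 samples, gradient descent drives the training loss to machine-level zero, mirroring the "finds the global minimum" phenomenon at toy scale.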
This talk will also cover how the above results serve as a step toward a theory behind the “magic” of learning rate decay in training neural networks, as well as how the identity mapping in ResNet aids the learning process.
Yuanzhi Li is a postdoctoral researcher in the Computer Science Department at Stanford University. He obtained his Ph.D. from Princeton University, advised by Sanjeev Arora. His research interests include deep learning, non-convex optimization, and online learning.