
Project: Saddles in Deep Learning

Convergence to local minima is well understood in the non-convex optimization literature, and deep neural networks are likewise commonly assumed to converge to local minima. However, saddle points proliferate as the dimension increases. In this work we hypothesise that:

  1. All deep networks converge to degenerate saddles
  2. Good saddles are good enough

We coin a new term, the good saddle, and empirically verify that deep networks converge to such points: the Hessian at convergence has a significant number of zero eigenvalues, indicating flatness in many directions that makes it difficult for gradient descent to escape. The figure above is a toy example of an error surface exhibiting this flatness; different gradient descent algorithms escape the flat region in different ways.
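
As a minimal sketch of what "flat in many directions" means, the snippet below (an illustrative assumption, not the project's code) builds a toy loss that is quartic along some axes and quadratic along the rest, runs plain gradient descent, and counts the near-zero Hessian eigenvalues at the point it reaches. The surface, step size, and thresholds are all hypothetical choices for illustration.

```python
import numpy as np

# Toy "error surface" with a degenerate critical point at the origin:
# quartic (flat) in the first k directions, quadratic in the rest.
# Illustrative stand-in for the loss surfaces discussed above.
def loss(w, k=5):
    return np.sum(w[:k] ** 4) + np.sum(w[k:] ** 2)

def grad(w, k=5):
    g = np.empty_like(w)
    g[:k] = 4 * w[:k] ** 3
    g[k:] = 2 * w[k:]
    return g

def numerical_hessian(f, w, eps=1e-4):
    """Finite-difference Hessian of f at w."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            H[i, j] = (f(w_pp) - f(w_pm) - f(w_mp) + f(w_mm)) / (4 * eps ** 2)
    return H

# Plain gradient descent: progress stalls along the flat (quartic) directions.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
for _ in range(5000):
    w -= 0.05 * grad(w)

H = numerical_hessian(loss, w)
eigvals = np.linalg.eigvalsh(H)
near_zero = np.sum(np.abs(eigvals) < 1e-2)
print("loss at convergence:", loss(w))
print("Hessian eigenvalues:", np.round(eigvals, 4))
print("near-zero eigenvalues:", near_zero, "of", eigvals.size)
```

In this toy setting roughly half of the eigenvalues come out near zero, which mirrors the kind of Hessian spectrum described above for the points deep networks converge to.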

The source code for this project can be found here.
