Convergence to local minima is well understood in the non-convex optimisation literature, and deep neural networks are likewise assumed to converge to local minima. However, the number of saddle points proliferates as the dimension increases. In this work we hypothesise that
- All deep networks converge to degenerate saddles
- Good saddles are good enough
We coin a new term, the good saddle, and empirically verify that deep networks converge to such points: the Hessian at convergence has a significant number of zero eigenvalues, indicating flatness in many directions and making the region difficult for gradient descent to escape. The figure above is a toy example of an error surface exhibiting this flatness; different gradient descent algorithms escape the flat region in different ways.
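As a minimal illustration of the flatness criterion described above (a sketch of the idea, not this project's code), the snippet below builds a toy two-parameter loss that is curved in one direction and perfectly flat in the other, estimates its Hessian at a critical point by finite differences, and counts the near-zero eigenvalues. The function names and the zero-eigenvalue threshold are assumptions chosen for illustration.

```python
import numpy as np

def loss(w):
    # Toy error surface: curved in w[0], completely flat in w[1].
    return w[0] ** 2

def numerical_hessian(f, w, eps=1e-4):
    """Estimate the Hessian of f at w via central finite differences."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            H[i, j] = (f(w_pp) - f(w_pm) - f(w_mp) + f(w_mm)) / (4 * eps ** 2)
    return H

w_star = np.zeros(2)  # a critical point of the toy loss
H = numerical_hessian(loss, w_star)
eigvals = np.linalg.eigvalsh(H)

# Count eigenvalues close to zero: each one marks a flat direction
# in which plain gradient descent receives (almost) no signal.
flat_directions = int(np.sum(np.abs(eigvals) < 1e-3))
print(flat_directions)  # -> 1 (the w[1] direction is flat)
```

In a real network the Hessian is far too large to form explicitly; in practice one would estimate its spectrum with Hessian-vector products (e.g. via automatic differentiation) rather than finite differences.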
The source code for this project can be found
here