A great overview of the topic. Thank you for sharing it.
@bobitsmagic4961 · 3 months ago
On the slide at 33:00 the Jacobian is used instead of the Hessian. When the network has only a single output and we use the least-squares loss, would the Newton step collapse to gradient descent with the gradient merely rescaled (divided by the squared norm of the Jacobian)? It feels like we are just throwing away all curvature information at that point.
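For what it's worth, here is a minimal NumPy sketch of that intuition (my own toy setup, not from the video): with a single scalar residual r and row Jacobian J, the Gauss-Newton matrix JᵀJ is rank one, and the pseudoinverse step ends up pointing exactly along the gradient, with only the step length set by JᵀJ.

```python
import numpy as np

# Hypothetical setup: one scalar residual r with a 1x5 Jacobian row J.
rng = np.random.default_rng(0)
J = rng.normal(size=(1, 5))              # Jacobian of the lone residual
r = 2.0                                  # scalar residual value

grad = r * J.T                           # gradient of the loss 0.5 * r^2
step = -np.linalg.pinv(J.T @ J) @ grad   # Gauss-Newton step; J^T J is rank one

# The pseudoinverse step collapses to -grad / ||J||^2: the direction is plain
# gradient descent, and only the scalar step length uses the curvature proxy J^T J.
print(np.allclose(step, -grad / (J @ J.T)))   # True
```

So in the single-output least-squares case the curvature proxy indeed contributes nothing to the direction, only to the scale of the step.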