The Problem with Gradient Descent

2,314 views

0Mean1Sigma

A day ago

Comments: 15
@42svb58 · 10 months ago
Incredible video. I've watched many videos on getting into data science and machine learning, and this is the best one in terms of graphics, content, pace, and complexity.
@korigamik · 9 months ago
Man, I really like this video! Can you share with us the code for the animations you used in the video?
@0mean1sigma · 9 months ago
Glad you liked the video. Unfortunately, the complete code for the animations got deleted accidentally, since I didn't have any kind of workflow back then (the animation of the network learning is available on GitHub, with the link in the video description). I've since improved on that and have uploaded the animation code for my latest video. I'm really sorry about this again.
@mister-8658 · a year ago
You have my vote for SoME.
@0mean1sigma · a year ago
I appreciate it, thanks a lot. I tried to keep it simple, and I'm a little worried that maybe I made it too simple (and, in turn, less interesting).
@mister-8658 · a year ago
@0mean1sigma I think you found a decent balance; here's hoping you place.
@sakchais · a year ago
These videos are excellent!
@0mean1sigma · a year ago
Thanks!!!
@swfsql · a year ago
Really good video and explanation! I'm not sure if this is what you meant, but I'd say gradient descent is pretty good, provided we have infinite steps to work with. If we do, we can make the updates approach zero and avoid any "jumps". But since we don't have infinite steps to work with, then yeah, some techniques help with the updates!
@0mean1sigma · a year ago
Not entirely true. I've tried training a basic neural net on the MNIST dataset using vanilla GD, and it fails to converge. For visualisation I kept the dimensions to 2, but in higher dimensions this becomes a serious problem. That's why you see optimisers like Adam used almost all the time in practice.
@swfsql · a year ago
@0mean1sigma I understand, but you didn't train with infinite time on your hands (pushing alpha towards ~0, like 1e-36). I agree that in practice, with limited training time, those problems with GD become real.
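To make this exchange concrete, here is a minimal NumPy sketch (not the author's code; the toy quadratic, learning rates, and step counts are illustrative assumptions). On an ill-conditioned surface, the largest step size that keeps vanilla GD stable leaves the shallow direction crawling, while Adam's per-coordinate scaling covers both directions in the same number of steps.

import numpy as np

# Toy loss with very different curvature along each axis:
# f(x, y) = 0.5 * (100 * x^2 + y^2). The narrow "valley" along y
# is the kind of landscape where vanilla GD crawls.
def grad(p):
    x, y = p
    return np.array([100.0 * x, y])

def vanilla_gd(p0, lr=0.009, steps=200):
    # lr must stay below 2/100 or the steep x-direction diverges,
    # which is what forces the tiny step size here.
    p = p0.copy()
    for _ in range(steps):
        p -= lr * grad(p)
    return p

def adam(p0, lr=0.1, steps=200, b1=0.9, b2=0.999, eps=1e-8):
    p = p0.copy()
    m = np.zeros_like(p)  # running mean of gradients
    v = np.zeros_like(p)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(p)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)  # bias corrections
        v_hat = v / (1 - b2 ** t)
        p -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return p

start = np.array([1.0, 1.0])
print("vanilla GD:", vanilla_gd(start))  # y is still far from 0 after 200 steps
print("Adam:      ", adam(start))        # both coordinates end up much closer to 0

Pushing lr towards zero (the "infinite steps" regime mentioned above) does eventually converge, but the number of steps needed grows with the condition number, which is the practical point being made.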
@KarmaKomet · a year ago
Great video!
@HanwenJin · a year ago
Good
@notu483 · 10 months ago
Thanks ❤
@ZephyrysBaum · a year ago
subbed!