That e^a trick shows that, even though algebra is such a pain, it comes in handy so often to make things run smoothly. It reminds me of the trick to avoid overflow in binary search: mid = low + ((high - low) / 2). My favorite thing about these lectures is the small hints for math and Python along the way. Thanks for being so detail-oriented!
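For anyone skimming the comments, here is a minimal sketch of the kind of stability trick being referred to, assuming it means the usual subtract-the-max rewrite of softmax / log-sum-exp; the function and variable names are illustrative, not the lecture's notebook code:

```python
import numpy as np

def log_softmax(a):
    # Naive version: np.log(np.exp(a) / np.exp(a).sum()) overflows for large a.
    # Since exp(a) / sum(exp(a)) == exp(a - m) / sum(exp(a - m)) for any constant m,
    # subtracting the max keeps every exponent <= 0 and avoids overflow.
    m = a.max()
    return (a - m) - np.log(np.exp(a - m).sum())

logits = np.array([1000.0, 1001.0, 1002.0])   # naive softmax would overflow here
print(log_softmax(logits))                     # finite, well-behaved values
```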
@michaelmuller136 · 5 months ago
Great, very enlightening. I liked the small details too, thank you!
@make_education · 9 days ago
Thanks a lot!
@MichaelChenAdventures · 7 months ago
Thank you Jeremy!
@markozege · 1 year ago
When we compare the result of the softmax with the one-hot vector (at 1:21:00), we take only the value of the softmax where the one-hot vector is one. Isn't this a missed opportunity to incorporate the other "wrong" predictions into the loss function? E.g. if the model is highly confident in its prediction for some other, wrong class (e.g. numbers that look similar), then penalising it more for this could further speed up training?
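For context, a minimal sketch of what the comment describes, assuming the standard cross-entropy with a one-hot target (illustrative names, not the exact lecture code): multiplying by the one-hot vector keeps only the predicted log-probability of the correct class.

```python
import numpy as np

def cross_entropy(log_probs, target):
    # With a one-hot target, the dot product keeps only the log-probability
    # of the correct class; every other entry is multiplied by zero.
    one_hot = np.zeros_like(log_probs)
    one_hot[target] = 1.0
    return -(one_hot * log_probs).sum()   # == -log_probs[target]

log_probs = np.log(np.array([0.7, 0.2, 0.1]))  # softmax output for 3 classes
print(cross_entropy(log_probs, target=0))      # small loss: confident and correct
print(cross_entropy(log_probs, target=2))      # large loss: correct class only got 0.1
```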
@SKULDROPR · 1 year ago
I think I understand what you are getting at. Focal loss lets you control the amount of penalty you are talking about for the other wrong predictions.
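For reference, a minimal sketch of focal loss as usually formulated, FL = -(1 - p_t)^gamma * log(p_t), applied on top of a softmax; the gamma value and names here are illustrative:

```python
import numpy as np

def focal_loss(logits, target, gamma=2.0):
    # Standard focal loss: scale cross-entropy by (1 - p_t)^gamma so that
    # examples the model already classifies confidently contribute little,
    # while confidently wrong examples keep (almost) their full penalty.
    a = logits - logits.max()              # subtract max for numerical stability
    probs = np.exp(a) / np.exp(a).sum()
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

logits = np.array([4.0, 1.0, 0.5])
print(focal_loss(logits, target=0))  # easy example: heavily down-weighted
print(focal_loss(logits, target=1))  # hard example: close to plain cross-entropy
```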
@amitaswal7359 · 1 year ago
If our prediction is wrong, the probability assigned to the correct class is small, so its log is a large negative number and the loss is already large; adding explicit penalties for the other classes wouldn't matter much.
@SKULDROPR · 1 year ago
@amitaswal7359 Now that I think about it, you are correct; it is no big deal either way.
@maxim_ml · 1 year ago
Softmax makes it so that the larger the probability for a wrong class is, the smaller the probability for the right class is. So there already is a penalty for having a high probability for the wrong class. Maybe having a loss that penalizes an uneven distribution of probabilities among the wrong classes would be useful. I guess Soft Labels already end up doing that.
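A tiny illustration of the coupling described above (made-up numbers, not from the lecture): raising the logit of a wrong class automatically lowers the softmax probability of the right class, so the loss -log p(correct) increases even though only the correct entry is read.

```python
import numpy as np

def softmax(a):
    a = a - a.max()                      # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum()

logits = np.array([2.0, 0.5, 0.5])       # class 0 is the correct one
bumped = np.array([2.0, 3.0, 0.5])       # same logits, but a wrong class is boosted

for x in (logits, bumped):
    p = softmax(x)
    print(p[0], -np.log(p[0]))           # p(correct) drops, so -log p(correct) rises
```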
@myfolder4561 · 4 months ago
I found the walkthrough of backpropagation in this lesson a bit lacking and jumpy. I highly recommend Andrej Karpathy's Zero to Hero series for those who are interested in digging a bit deeper into the math and stepping through how the chain rule is applied when deriving gradients.
@bomb3r422 · 3 months ago
I second that; it was a bit rushed and unclear. Andrej does a fantastic job of explaining backprop.
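For those who want a taste of that chain-rule derivation before diving into either series, here is a minimal sketch of a manual backward pass for a single linear layer with MSE loss (illustrative code, not the lesson's notebook):

```python
import numpy as np

# Forward: y_hat = x @ w + b, loss = mean((y_hat - y)**2)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
w = rng.normal(size=(3, 1))
b = np.zeros(1)
y = rng.normal(size=(5, 1))

y_hat = x @ w + b
loss = ((y_hat - y) ** 2).mean()

# Backward, applying the chain rule one step at a time:
d_y_hat = 2.0 * (y_hat - y) / y_hat.size   # d(loss)/d(y_hat) for the mean of squares
d_w = x.T @ d_y_hat                        # d(loss)/d(w) = x^T @ d_y_hat
d_b = d_y_hat.sum(axis=0)                  # d(loss)/d(b): sum over the batch
d_x = d_y_hat @ w.T                        # d(loss)/d(x), passed to earlier layers

print(d_w.shape, d_b.shape, d_x.shape)
```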