Hi Krish, as per the last chain-rule illustration, shouldn't dL/dW'11 be [dL/dO21 · dO21/dO11 · dO11/dW'11] + [dL/dO21 · dO21/dO12 · dO12/dW'11]? Please confirm.
@rahuldey6369 4 years ago
...but O12 is independent of W11; in that case, won't the 2nd term be zero?
@RETHICKPAVANSE 4 years ago
wrong bruh
@ayushprakash3890 4 years ago
we don't have the second term
@Ajamitjain 3 years ago
Can anyone clarify this? I too have this question.
@grahamfernando8775 3 years ago
@@Ajamitjain It should be dL/dW'11 = [dL/dO21 · dO21/dO11 · dO11/dW'11]
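For anyone landing on this thread later, here is the worked form, assuming the network drawn in the video (W'11 feeds only O11, while both O11 and O12 feed O21):

dL/dW'11 = dL/dO21 · dO21/dO11 · dO11/dW'11

The second candidate term, dL/dO21 · dO21/dO12 · dO12/dW'11, drops out because O12 never receives W'11, so dO12/dW'11 = 0. (Krish confirms in a reply further down that the dL/dO21 factor was missed on the board.)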
@mahabir05 4 years ago
I like how you explain and end your class with "never give up". It is very encouraging.
@manishsharma2211 4 years ago
Yes
@Xnaarkhoo 4 years ago
Many years ago in college I enjoyed watching videos from IIT - before the MOOC era. India had, and still has, many good teachers! It brings me joy to see that again. It seems Indians have a gene for pedagogy.
@Vinay1272 2 years ago
I have been taking a well-known, world-class course on AI and ML for the past 2 years, and none of the lecturers have made me as interested in a topic as you have in this video. This is probably the first time I have sat through a 15-minute lecture without distracting myself. What I realise now is that I didn't lack motivation or interest, nor was I lazy - I just did not have lecturers whose teaching inspired me enough to take interest in the topics; yours did. You have explained the vanishing gradient problem very well and very clearly. It shows how strong your concepts are and how knowledgeable you are. Thank you for putting out your content here and sharing your knowledge with us. I am so glad I found your channel. Subscribed forever.
@tosint 4 years ago
I hardly comment on videos, but this is a gem. One of the best videos explaining the vanishing gradient problem.
@lekjov6170 4 years ago
I just want to add the math: the derivative of the sigmoid function can be written as *derSigmoid = x * (1-x)* (where x is the sigmoid's output). As Krish Naik said, we have the maximum when *x=0.5*, giving us: *derSigmoid = 0.5 * (1-0.5) = 0.25*. That's the reason the derivative of the sigmoid function can't be higher than 0.25.
@ektamarwaha5941 4 years ago
COOL
@thepsych3 4 years ago
cool
@tvfamily6210 4 years ago
It should be: derSigmoid(x) = Sigmoid(x)·[1 - Sigmoid(x)], and it reaches its maximum at x=0. Plugging in: Sigmoid(0) = 1/(1+e^(-0)) = 1/2 = 0.5, thus derSigmoid(0) = 0.5·[1-0.5] = 0.25.
@benvelloor 4 years ago
@@tvfamily6210 Thank you!
@est9949 4 years ago
I'm still confused. The weight w should be in here somewhere. This seems to be missing w.
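A minimal sketch (my own illustration, not from the video) that checks the 0.25 bound numerically and shows where the weight comes in: the sigmoid's own derivative sigma'(z) = sigma(z)·(1 - sigma(z)) never exceeds 0.25, and the weight only appears once you chain through z = w·x + b.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25, reached at z = 0

z = np.linspace(-10, 10, 10001)
print(sigmoid_deriv(z).max())       # ~0.25

# Where the "missing w" lives: for a = sigmoid(w*x + b),
# da/dx = sigmoid_deriv(w*x + b) * w by the chain rule,
# so the weight multiplies the <= 0.25 sigmoid factor.
w, x, b = 0.8, 1.5, 0.1
print(sigmoid_deriv(w * x + b) * w)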
@ltoco4415 4 years ago
Thank you sir for making this confusing concept crystal clear. Your knowledge is GOD level 🙌
@PeyiOyelo 5 years ago
Sir, or as my Indian friends say, "Sar", you are a very good teacher; thank you for explaining this topic. It makes a lot of sense. I can also see that you're very passionate; however, the passion kind of makes you speed up the explanation a bit, which makes it a little hard to follow sometimes. I am also very guilty of this when I try to explain things that I love. Regardless, thank you very much for this and the playlist. I'm subscribed ✅
@amc8437 3 years ago
Consider reducing playback speed.
@gultengorhan2306 2 years ago
You are teaching better than many other people in this field.
@bhavikdudhrejiya4478 5 years ago
Very nice way to explain. Learned from this video:
1. Get the error: (Actual Output - Model Output)^2
2. To reduce the error, we backpropagate and update the weights
3. New Weight = Old Weight - Change in the Weight
4. Change in the Weight = Learning Rate x d(Error)/d(Old Weight)
5. With sigmoid activations, each derivative factor lies between 0 and 0.25, so in a deep network the change is tiny and the new weight comes out almost equal to the old weight
6. This is the vanishing gradient problem
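A tiny numeric illustration of points 4 to 6 (my own toy numbers, just to show the scale, assuming three sigmoid layers chained together and local weights of about 1 at each step):

learning_rate = 0.1
old_weight = 0.5

dL_dOut = 0.8                           # assumed derivative of the loss w.r.t. the final output
grad = dL_dOut * 0.25 * 0.25 * 0.25     # three chained sigmoid derivatives at their maximum
change = learning_rate * grad

new_weight = old_weight - change
print(grad, change, new_weight)         # 0.0125, 0.00125, 0.49875 -> barely moved
print(0.25 ** 10)                       # ~9.5e-07: with ten such layers the update all but vanishes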
@ToqaGhozlan 2 months ago
Many thanks! At 9:25 the output is 0.004. Even so, your explanation is the best, THX
@sapnilpatel1645 A year ago
The best explanation of the vanishing gradient so far.
@rushikeshmore8890 4 years ago
Kudos sir, I work as a data analyst and have read lots of blogs and watched videos, but today I finally got the concept clear. Thanks for all the material.
@satyadeepbehera2841 5 years ago
Appreciate your way of teaching, which answers fundamental questions. This "derivative of sigmoid ranging from 0 to 0.25" concept was mentioned nowhere else. Thanks for clearing up the basics.
@mittalparikh6252 4 years ago
Look for Mathematics for Deep Learning. It will help
@classictremonti7997 3 years ago
So happy I found this channel! I would have cried if I had found it and it was given in Hindi (or any language other than English)!!!!!
@deepthic6336 4 years ago
I must say this: normally I am the kind of person who prefers to study on my own and crack it. I never used to listen to lectures, because I just didn't understand them and I dislike the way some lecturers explain without passion (not all, though). But you are a gem and I can see the passion in your lectures. You are the best, Krish Naik. I appreciate it and thank you.
@piyalikarmakar5979 3 years ago
One of the best videos on clarifying the vanishing gradient problem. Thank you sir.
@marijatosic217 4 years ago
I am amazed by the level of energy you have! Thank you :)
@vikrantchouhan9908 2 years ago
Kudos to your genuine efforts. One needs sincere efforts to ensure that the viewers are able to understand things clearly and those efforts are visible in your videos. Kudos!!! :)
@al3bda 4 years ago
Oh my god, you are a good teacher. I really love how you explain and simplify things.
@koraymelihyatagan8111 2 years ago
Thank you very much, I was wandering around the internet to find such an explanatory video.
@himanshubhusanrath2492 3 years ago
One of the best explanations of vanishing gradient problem. Thank you so much @KrishNaik
@MrSmarthunky 4 years ago
Krish, you are earning a lot of good karma by posting such excellent videos. Good work!
@skiran5129 3 years ago
I'm lucky to have found this wonderful class. Thank you.
@yousufborno3875 4 years ago
You should get an Oscar for your teaching skills.
@sumeetseth22 4 years ago
Love your videos. I have watched and taken many courses, but no one is as good as you.
@mittalparikh6252 4 years ago
Overall I got the idea that you are trying to convey. Great work.
@aidenaslam5639 5 years ago
Great stuff! Finally understand this. Also loved it when you dropped the board eraser
@manujakothiyal3745 4 years ago
Thank you so much. The amount of effort you put is commendable.
@venkatshan4050 3 years ago
Killer explanation 🔥🔥. Simple and very clearly said.
@YashSharma-es3lr 3 years ago
Very simple and nice explanation. I understood it the first time through.
@benvelloor 4 years ago
Very well explained. I can't thank you enough for clearing all my doubts!
@MauiRivera 3 years ago
I like the way you explain things, making them easy to understand.
@meanuj1 5 years ago
Nice presentation, very helpful...
@elielberra2867 2 years ago
Thank you for all the effort you put into your explanations, they are very clear!
@classictremonti7997 3 years ago
Krish...you rock brother!! Keep up the amazing work!
@MsRAJDIP 5 years ago
Tomorrow I have an interview; I am clearing all my doubts with your videos 😊
@adityashewale7983 A year ago
Hats off to you sir. Your explanation is top level. Thank you so much for guiding us...
@DEVRAJ-np2og 6 months ago
Did you complete his full playlist?
@maheshsonawane8737 A year ago
Very nice, now I understand why the weights don't update in RNNs. The main point is that the derivative of the sigmoid is between 0 and 0.25. The vanishing gradient is mainly associated with saturating activations like the sigmoid. 👋👋👋👋👋👋👋👋👋👋👋👋
@swapwill 4 years ago
The way you explain is just awesome
@vishaljhaveri6176 3 years ago
Thank you, Krish Sir. Nice explanation.
@prerakchoksi2379 4 years ago
I am doing a deep learning specialization, and I feel this is much better than that.
@nabeelhasan6593 3 years ago
Very nice video sir, you explained the inner intricacies of this problem very well.
@b0nnibell_ 4 years ago
You, sir, made neural networks so much fun!
@sekharpink 5 years ago
You specified the derivative of the loss with respect to w11 dash incorrectly; you missed the derivative of the loss with respect to O21 in the equation. Please correct me if I am wrong.
@sekharpink 5 years ago
Please reply
@ramleo1461 5 years ago
Even I have this doubt
@krishnaik06 5 years ago
Apologies for the delay...I just checked the video and yes I have missed that part.
@ramleo1461 5 years ago
@@krishnaik06 Hey! You don't have to apologise; on the contrary, you are doing us a favour by uploading these useful videos. I was a bit confused and wanted to clear my doubt, that's all. Thank you for the videos... keep up the good work!!
@rajatchakraborty2058 4 years ago
@@krishnaik06 I think you have also missed the w12 part in the derivative. Please correct me if I am wrong
@yoyomemory6825 4 years ago
Very clear explanation, thanks for the upload.. :)
@aaryankangte6734 2 years ago
Sir, thank you for teaching us all the concepts from the basics. Just one request: if there is a mistake in your videos, please rectify it, because it confuses a lot of people who watch them; not everyone sees the comment section, and they may just believe what is said. Please look into this.
@benoitmialet9842 3 years ago
Thank you so much, great quality content.
@it029-shreyagandhi5 4 months ago
Great teaching skills !!!
@nola8028 2 years ago
You just earned a +1 subscriber ^_^ Thank you very much for the clear and educational video.
@hiteshyerekar2204 5 years ago
Nice video Krish. Please make practical, hands-on videos on gradient descent, CNN, and RNN.
@skviknesh 4 years ago
I understood it. Thanks for the great tutorial! My query is: the weight update vanishes as more layers are added, and when new weight ≈ old weight the training becomes useless. What would the output of such a model look like, and will we even reach the global minimum?
@అరుణాచలశివ3003 11 months ago
You are a legend, Naik sir.
@faribataghinezhad 2 years ago
Thank you sir for your amazing video. That was great for me.
@sunnysavita9071 5 years ago
Your videos are very helpful; good job and good work, keep it up...
@shmoqe 2 years ago
Great explanation, Thank you!
@RAZONEbe_sep_aiii_0819 4 years ago
There is a very big mistake at 4:14, sir: you didn't apply the chain rule correctly. Check the equation.
@jagritiprakash4336 4 years ago
I have the same doubt
@naresh8198 A year ago
crystal clear explanation !
@susmitvengurlekar 4 years ago
Understood completely! If the weights hardly change, there is no point in training and training. But I have a question: where can I use this knowledge and understanding I just acquired?
@tonnysaha7676 3 years ago
Thank you thank you thank you sir infinite times🙏.
@manikosuru5712 5 years ago
As usual, extremely good and outstanding... And a small request: can we expect this in Python code in the future?
@krishnaik06 5 years ago
Yes definitely
@hokapokas 5 years ago
Good job bro, as usual... keep up the good work. I have a request: please make a video on implementing backpropagation.
@krishnaik06 5 years ago
That video has already been made. Please have a look at my deep learning playlist.
@hokapokas 5 years ago
@@krishnaik06 I have seen that video, but it's not implemented in Python. If you have a notebook you can refer me to, please share it.
@krishnaik06 5 years ago
With respect to the Python implementation, please wait till I upload some more videos.
@muhammadarslankahloon7519 4 years ago
Hello sir, why is the chain rule explained in this video different from the very last chain rule video? Kindly clarify, and thanks for such an amazing series on deep learning.
@daniele5540 4 years ago
Great tutorial man! Thank you!
@arunmeghani1667 3 years ago
great video and great explanation
@krishj8011 3 years ago
Very nice series... 👍
@spicytuna08 3 years ago
You teach better than Ivy League professors. What a waste of money spending $$$ on college.
@BalaguruGupta 3 years ago
Thanks a lot sir for the wonderful explanation :)
@neelanshuchoudhary536 5 years ago
Very nice explanation, great :)
@salimtheone 2 years ago
very well explained 100/100
@Kabir_Narayan_Jha 5 years ago
This video is amazing and you are an amazing teacher; thanks for sharing such amazing information. Btw, are you from Bangalore?
@magicalflute 4 years ago
Very well explained. The vanishing gradient problem, as per my understanding, is that the optimizer cannot do its job (reduce the loss) because the old and new weights end up almost equal. Please correct me if I am wrong. Thanks!!
@nirmalroy1738 5 years ago
super video...extremely well explained.
@abdulqadar9580 2 years ago
Great effort, Sir.
@GunjanGrunge 3 years ago
that was very well explained
@nikunjlahoti9704 2 years ago
Great Lecture
@winviki123 5 years ago
Could you please explain why bias is needed in neural networks along with weights?
@Rising._.Thunder 5 years ago
It is because you sometimes want to control or shift a neuron's pre-activation into a certain range. For example, if a neuron's weighted input always lands between 9 and 10, you can set a bias of -9 so that the value going into the activation lies between 0 and 1.
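A quick sketch of that idea with toy numbers of my own: the bias simply shifts the weighted sum before the activation is applied, which keeps the sigmoid out of its saturated region.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([3.0, 3.5])     # inputs whose weighted sum lands around 9-10
w = np.array([1.5, 1.5])

z_no_bias = w @ x            # 9.75 -> sigmoid is saturated near 1.0
z_bias = w @ x - 9.0         # 0.75 -> sigmoid is in its sensitive range

print(sigmoid(z_no_bias))    # ~0.99994
print(sigmoid(z_bias))       # ~0.68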
@nazgulzholmagambetova1198 2 years ago
great video! thank you so much!
@melikad2768 4 years ago
Thank youuuu, it's really great :)
@dhananjayrawat317 4 years ago
best explanation. Thanks man
@gaurawbhalekar2006 4 years ago
excellent explanation sir
@khiderbillal9961 3 years ago
Thanks sir, you really helped me.
@gouthamkarakavalasa4267 A year ago
Gradient descent is applied to the cost function, right? -1/m Σ [y·log(y_pred) + (1-y)·log(1-y_pred)]... In that case, if it were applied to the activation function instead, how would the algorithm reach the global minimum?
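Gradient descent is indeed applied to the cost, but the activation's derivative still enters the gradient through the chain rule. A minimal sketch for a single sigmoid output neuron with binary cross-entropy, using my own notation z = w·x + b and y_pred = sigmoid(z):

dL/dw = dL/dy_pred · dy_pred/dz · dz/dw
      = [-y/y_pred + (1-y)/(1-y_pred)] · [y_pred·(1-y_pred)] · x
      = (y_pred - y) · x

For the output layer with this particular loss the sigmoid derivative cancels neatly, but every hidden layer still contributes its own sigma'(z) <= 0.25 factor as the chain rule is applied further back, which is where the vanishing effect discussed in the video comes from.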
@aishwaryaharidas2100 4 years ago
Should we again add a bias to the product of the hidden-layer outputs O11, O12 and the weights W4, W5?
@naughtyrana4591 4 years ago
Salutations to the guru 🙏
@narsingh2801 4 years ago
You are just amazing. Thanks.
@gowthamprabhu122 4 years ago
Can someone please explain why the derivative shrinks for each earlier layer? i.e., why does layer two have a lower derivative of its output with respect to its input?
@karth12399 4 years ago
Sir, you are saying the derivative of the sigmoid is between 0 and 0.25; I understand that. But how does that imply dO21/dO11 should be less than 0.25? Could you please help me understand that assumption?
@rish_hyun 4 years ago
He agreed that he got it slightly wrong without realising it; I found his comment somewhere in this thread.
@jsridhar72 4 years ago
The output of every neuron in a layer is the sigmoid of a weighted sum of its inputs. Since the sigmoid is applied as the activation function in every neuron (here O21 is the output after applying the sigmoid), the sigmoid-derivative factor is between 0 and 0.25.
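To make that precise (a small sketch with my own symbols, since this step isn't written out on the board): if z21 = w·O11 + w'·O12 + b and O21 = sigmoid(z21), then by the chain rule

dO21/dO11 = sigmoid'(z21) · w

The 0-to-0.25 bound applies to the sigmoid'(z21) part; the full factor is guaranteed to be below 0.25 only when |w| <= 1, which is typically the case with the small weight initialisations used in practice. Either way, multiplying several such factors together makes the product shrink rapidly.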
@sunnysavita9071 5 years ago
very good explanation.
@ambreenfatimah194 3 years ago
Helped a lot....thanks
@sowmyakavali2670 3 years ago
Hi Krish, everyone says that Wnew = Wold - η · dL/dWold. Theoretically we know that dL/dWold means a slope, whereas in a practical scenario L is a single scalar value and Wold is also a single scalar value, so how is dL/dWold actually calculated? And coming to the activation function, you explain it theoretically; can you explain it with practical values, and not just with a predefined function or module? We know how to find a module, import it and use it, but we don't know the practical calculation.
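Here is a small sketch with concrete numbers (my own toy setup, no ML library): for one sigmoid neuron, dL/dw can be computed two ways, with the chain-rule formula and by nudging w a tiny amount and watching how L changes. Both give (almost) the same slope.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 2.0, 1.0              # one training example: input and target
w, b = 0.3, 0.1              # current parameters

def loss(w_val):
    y_pred = sigmoid(w_val * x + b)
    return (y - y_pred) ** 2            # squared error, as in the video

# 1) chain rule: dL/dw = dL/dy_pred * dy_pred/dz * dz/dw
y_pred = sigmoid(w * x + b)
dL_dw = 2 * (y_pred - y) * y_pred * (1 - y_pred) * x

# 2) numerical slope: nudge w a little and see how L moves
eps = 1e-6
dL_dw_numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(dL_dw, dL_dw_numeric)  # both ~ -0.29
print(w - 0.1 * dL_dw)       # one gradient-descent step with learning rate 0.1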
@LazingOnSunday 2 years ago
This video is really goooooddd! Can anyone help me understand why the derivative value decreases as we go backward at 9:03? I am new to DL!
@vaseekaranchittibabu2571 4 months ago
Yes, I have the same doubt. Can someone explain this?
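A short worked answer (my own numbers, following the video's argument): every time you move one layer further back, the chain rule multiplies in one more sigmoid-derivative factor, and each of those factors is at most 0.25.

1 layer back: at most 0.25
2 layers back: at most 0.25 x 0.25 = 0.0625
3 layers back: at most 0.25 x 0.25 x 0.25 ~ 0.0156

So the further back a weight sits, the more small factors its gradient contains, and the closer that gradient gets to zero. That is why the derivative value keeps decreasing as you go backward through the network.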
@shahidabbas9448 5 years ago
Sir, I'm really confused about the actual y value; can you please explain it? I thought it would be our input value, but here there are many input values with one predicted output.
@gautam1940 5 years ago
This is an interesting fact to know. Makes me curious to see how ReLU overcame this problem
@sandipansarkar9211 4 years ago
Thanks Krish. The video was superb, but I am apprehensive that I might get lost somewhere. Please provide some reading references on this topic for a beginner. Cheers.
@ayushprakash3890 4 years ago
Is this equation correct? (It is used at the start of the video.) dL/dw11 = dO21/dO11 * dO11/dw11. Or should it be: dL/dw11 = dL/dO21 * dO21/dO11 * dO11/dw11?
@amitdebnath2207 8 months ago
Hats Off Brother
@Joe-tk8cx A year ago
Great video, one question: when you calculate the new weight as old weight - learning rate x derivative of loss with respect to weight, is that derivative of loss w.r.t. the weight the sigmoid function?
@ashwinshetgaonkar6329 2 years ago
Output O21 also depends upon O12, so its derivative should also be considered.
@FlyingTurtleLP 2 years ago
What I didn't get: what can the values of the derivative of the sigmoid function be? I don't think you mentioned it in the video.