I think this is the best way to teach: it discusses where things fail. Kudos to you.
@saketpathak5361 · 2 years ago
Morning sir... I have recently started watching your videos and the concepts are becoming more than clear to me. I have one request, though: can you please complete the clustering techniques in the ML playlist? Topics like DBSCAN and recommendation-system problems are not yet covered. It would be very, very helpful, sir, if you make those videos.
@ShubhamSharma-gs9pt · 2 years ago
thanks sir!! waiting for more and more uploads in this playlist!😊😊😊😊😊
@apudas7 · 3 months ago
Great, sir. The way you teach is really helpful.
@elonmusk4267 · 4 months ago
the legendary playlist
@pranavreddy9218 · 5 months ago
b11 (bias), a11 and z11: please make it clear which one is which. That is the only confusion; the rest is all awesome. Great series.
@apudas7 · 3 months ago
Thank you, sir. You're the reason I can continue learning DL.
@SARATCHANDRASRINADHU · 7 days ago
Superb explanation, sir.
@DipansuJoshi · 6 months ago
Thanks, sir! 💌💌 I love spending time on your KZbin channel.
@ali75988 · a year ago
17:03 Chain rule: y is an activation output, let's say y = a(u) at the final node. The chain rule doesn't take u (the weighted-sum variable at the output) into account, but it takes both a11 and z11 (the weighted-sum variable at node 11) into account. If anyone can explain why we skipped the weighted sum at the output but took it at the hidden-layer nodes, I would be thankful. Regards,
@sofiashahin4603 · a year ago
So for the last node the activation function gives the output y_pred, and we take that into account as well, since the chain goes L > y_pred > z_final > o11 > z_previous > w¹11 (complete chain). Because if we didn't, it wouldn't be differentiable! You can differentiate a function w.r.t. another only if it is a function of it. [Also, in the case of simple regression the activation terms are omitted, because O would always be w1x1 + w2x2 + b.] I hope my point was clear.
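To make that concrete, here is the chain written out, as a sketch assuming the setup discussed in this thread (one hidden layer, identity/linear activation at the output node, so y_hat = a21 = z21; the notation follows the comments above):

$$\frac{\partial L}{\partial w^1_{11}} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a_{11}} \cdot \frac{\partial a_{11}}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial w^1_{11}}, \qquad \hat{y} = z_{21} = w^2_{11}\,a_{11} + w^2_{21}\,a_{12} + b_{21}$$

So ∂y_hat/∂a11 = w²11. There is no separate activation-derivative factor at the output because the output activation is the identity (its derivative is 1); at the hidden node the activation is nonlinear, so ∂a11/∂z11 does appear.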
@NiranjanV823 · 2 months ago
47:40 How will the gradient for ReLU be large if x is large? The gradient of ReLU is constant, equal to 1, for x > 0.
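For what it's worth, a minimal numeric sketch of what seems to be meant there (an assumed toy setup, not the video's code): ReLU's own derivative is indeed only 0 or 1, but the backpropagated gradient also multiplies in the weight matrices at every layer, so large initial weights can still make it explode.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 10, 4
W = [rng.uniform(1.0, 2.0, size=(width, width)) for _ in range(depth)]  # deliberately large weights

x = np.abs(rng.normal(size=(width, 1)))         # positive input keeps every ReLU active here
acts = [x]
for Wl in W:                                    # forward pass through ReLU layers
    acts.append(np.maximum(0.0, Wl @ acts[-1]))

grad = np.ones((width, 1))                      # pretend dL/d(output) = 1
for Wl, a in zip(reversed(W), reversed(acts[1:])):
    grad = Wl.T @ (grad * (a > 0))              # ReLU' is just 0/1; the weights do the exploding
    print(f"mean |grad| after backprop through this layer: {np.abs(grad).mean():.3e}")
```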
@shubharya3418 · a year ago
Sir, in this video at 40:00 you said that a NN with tanh is affected more by the vanishing gradient problem than a NN with sigmoid for inputs close to zero (very small inputs). But the sigmoid activation function maps its inputs to values between 0 and 1; when the inputs are close to zero, the output of sigmoid is close to 0.5, and its gradient is at most about 0.25. On the other hand, the tanh activation function maps its inputs to values between -1 and 1; when the inputs are close to zero, the output of tanh is close to zero, and its gradient is close to 1. Therefore, a neural network with the tanh activation function is less likely to face the vanishing gradient problem than one with sigmoid, especially for inputs close to zero. Could you please look into this? Thanks. 🙏
@thelife5628 · a year ago
I am also having the same doubt, bro... did you get any solution for it?
@thelife5628 · a year ago
And for sigmoid, at 0 it has its maximum derivative value, ~0.25.
@thelife5628 · a year ago
Sir, we request you to please clarify this...
@xenon1787 · 4 months ago
I think he is wrong in saying that tanh will give the VGP earlier, but of course it will give it at deeper levels, since for e.g. (0.9)^n with large n the gradient will vanish.
@rakshitbazaz6960 · 15 days ago
Look at the curves of both: sigmoid will have larger values near zero than tanh, so it will be affected less.
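As a quick numeric check of the derivative values being debated here (a rough sketch: it looks only at the activation-derivative part of the backprop chain and ignores the weight factors that also multiply in):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

z = 0.1                                   # a pre-activation close to zero
d_sig = sigmoid(z) * (1 - sigmoid(z))     # sigmoid' peaks at 0.25
d_tanh = 1 - np.tanh(z) ** 2              # tanh' peaks at 1

for depth in (5, 10, 20):
    print(depth, d_sig ** depth, d_tanh ** depth)
# sigmoid's factor shrinks roughly like 0.25^depth, while tanh's stays near 1 for small z.
# Both shrink once activations saturate (large |z|), which is the usual vanishing-gradient regime.
```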
@shantanuyadav732 · 3 months ago
Great video sir, amazing explanation.
@pavantripathi1890 · 7 months ago
Thanks for the wonderful explanation!
@motivation_with_harsh · a year ago
You are the best teacher, sir ❤❤❤❤❤❤❤❤❤❤❤
@rafibasha4145 · 2 years ago
@17:15, bro, how is y_hat associated directly with z11? There should be the cumulative output z21, right?
@apratimmehta1828 · a year ago
Y_hat is z21, because the last node has linear activation. Hence a21 is the same as z21, and hence y_hat.
@thethreemusketeers4500 · 2 years ago
Thanks sir. Please complete the deep learning playlist, it's a request, sir.
@rockykumarverma980 · 27 days ago
Thank you so much sir 🙏🙏🙏
@krishcp7718 · a year ago
Hi Nitish, nicely presented video. At timestamp 16:42, in the derivative of the loss function with the chain rule, I think the middle term should be the partial derivative w.r.t. a21, and then a21 should be differentiated w.r.t. w2_11 and w2_21, not as given, because y_hat does not change directly because of a11; rather, it changes because of a21. Actually there needs to be a value a21, along with a bias b21, just like the values of neurons a11 and a12 in the previous layer. This is because in backprop the value of any neuron, or of the weights of its connections, is based on the directly connected neuron's weights and bias. Thanks a lot. Krish
@bibekrauth3408 · a year ago
Bro, Y_hat = a21, so it doesn't matter. The equations are correct.
@sujithsaikalakonda4863 · a year ago
@bibekrauth3408 You are correct, but a_21 changes with the change in the weights (w2_11 and w2_21), and this weighted sum depends on a_11 and a_12 respectively. If anything is wrong, please correct me.
@ickywitchy4667 · 10 months ago
@sujithsaikalakonda4863 The last layer has no activation function... you are totally wrong.
@ShubhamSingh-iq5kj · 2 years ago
Thank you for the amazing ML playlist, I am just about to complete it. 😇😇😇
@kindaeasy9797 · 6 months ago
9:30, how is the partial derivative equal to 0?
@thelife5628 · a year ago
14:24 Sir, you used a linear activation function in your output layer, but in the practical at 23:55 you are using sigmoid in the output layer. I tried using linear in the output layer and the final weights are still 0, but when I used sigmoid in the output layer I got non-zero constant weights.
@apratimmehta1828 · a year ago
Sir used linear activation in the last layer because he took a regression example in the theory, but he implemented a classification problem in the practical, hence he used sigmoid there.
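A minimal sketch of the underlying symmetry issue, assuming a toy 2-2-1 network with sigmoid units (this is not the video's notebook, just an illustration): with zero initialization the output weights can move, but both hidden neurons always compute the same activation and receive the same gradient, so their weights stay identical, which matches the "non-zero constant weights" observation above.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

x = np.array([0.7, 0.2])                  # one training example
y = 1.0
W1 = np.zeros((2, 2)); b1 = np.zeros(2)   # hidden layer, zero init
W2 = np.zeros(2);      b2 = 0.0           # output neuron, zero init
lr = 0.1

for _ in range(200):
    a1 = sigmoid(W1 @ x + b1)                    # both entries identical
    y_hat = sigmoid(W2 @ a1 + b2)
    d_out = (y_hat - y) * y_hat * (1 - y_hat)    # delta for squared-error loss with sigmoid output
    d_hid = d_out * W2 * a1 * (1 - a1)           # identical for both hidden neurons
    W2 -= lr * d_out * a1;         b2 -= lr * d_out
    W1 -= lr * np.outer(d_hid, x); b1 -= lr * d_hid

print(W1)   # the two rows are identical: both hidden neurons learned the same thing
print(W2)   # the two entries are equal, non-zero constants
```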
@researcher7410 · 11 months ago
Sir, please make a video on deep learning using PyTorch.
@narendraparmar1631 · 10 months ago
Really helpful, thanks.
@BhaskarMishra-e6q · a month ago
Sir, I think you forgot the 3rd attachment. Please share the Case 3 code notebook in the description.
@ankanmazumdar5000 · 2 years ago
Sir, just one suggestion for these kinds of long videos: kindly add chapters by splitting the video's timestamps. It would be really helpful for understanding where we are.
@tehreemqasim2204 · 8 months ago
excellent video
@suryakantacharya7933 · 5 months ago
Sir, when we take large random numbers and the gradients in ReLU come out large, can't we reduce the learning rate to bring them into range?
@rakeshkumarrout2629 · 2 years ago
Sir, where can I find the Discord link?
@namanmodi7536 · 2 years ago
Sir, you had said a video would be uploaded every 3 days, but now you are uploading one video every 3 weeks.
@debjitsarkar2651 · 2 years ago
Sir, how do I join your online 6-month AI class? Please reply.
@avishinde2929 · 2 years ago
Good morning sir ji, thank you so much sir 😊😊
@popularclips6681 · 3 months ago
Amazing
@sakshitattri4882 · 2 months ago
The derivative of tanh (at 0) will be 1, not 0… The formula for the derivative is 1 - tanh^2(x), right?
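Right; for reference, the standard textbook derivatives (not taken from the video) are:

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^2(x) \;\Rightarrow\; \tanh'(0) = 1, \qquad \frac{d}{dx}\sigma(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \;\Rightarrow\; \sigma'(0) = 0.25$$

Both derivatives approach zero only as |x| grows and the activations saturate.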
@dragnar4743 · a year ago
Near the end of the video, when we took ReLU with large weights, that was an exploding gradient, right?
@bhushanbowlekar4539 · a year ago
At timestamp 38:05, it should be that the derivative of tanh(0) is close to 1; similarly for sigmoid, the derivative of sigmoid(0) is approximately 0.25, not 0. Hence, at timestamp 40:05, sigmoid will reach the VGP faster than tanh, because this 0.25 is less than derivative(tanh) = 1. And at timestamp 46:30 it should be EGP. Am I right?
@anjalimishra2884 · a year ago
The link given in the description for the code is not running... it shows errors.
@harsh.gupta2021 · 9 months ago
Sir, can you please provide us with a soft copy of the notes used in the video?
@rafibasha4145 · 2 years ago
Nitish bro, upload 2 videos per week if possible.
@alroygama6166 · 2 years ago
The derivative of tanh(0) is not 0 but 1, sir. Please check.
@AhmedAli-uj2js · a year ago
yes, you are right
@sandipansarkar9211 · 2 years ago
Where is the dataset link?
@pavankumarbannuru6145 · 2 years ago
Sir, you are taking different x values multiplied by the weights; although the weights are the same, x is different for each, right? Then how come it will be the same? Please reply.
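A tiny numeric sketch of what seems to be going on there (assumed shapes and made-up numbers, not the video's example): each hidden neuron applies the same constant weight row to the same input vector, so every neuron's weighted sum comes out identical even though the individual x values differ.

```python
import numpy as np

x = np.array([0.3, 0.9])          # two different feature values
W1 = np.full((2, 2), 0.5)         # constant initialization: the same weight everywhere
b1 = np.full(2, 0.1)

z = W1 @ x + b1                   # z_j = 0.5*0.3 + 0.5*0.9 + 0.1 for every neuron j
print(z)                          # [0.7 0.7] -- identical for both hidden neurons
```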
@Naman_Bansal102 · 7 months ago
BEST VIDEO
@shaz-z506 · 2 years ago
47:57, is it an exploding gradient problem or a vanishing gradient?
@tanmaychakraborty7818 · 2 years ago
Exploding Gradient
@flakky626 · a year ago
@tanmaychakraborty7818 I had the same doubt; glad someone else noticed it too. I was confused there.
@tanmaychakraborty7818 · a year ago
@flakky626 No worries, hope it's clarified.
@waheedweins · 5 months ago
I am having trouble finding the dataset... can anyone help?
@AbdulRahman-zp5bp · 2 years ago
If we initialize the weights with big random values, will the exploding gradient problem occur?
@hammadkhalid7201 · a year ago
Sir, if I initialize small positive weights and biases, or big negative weights and biases, how will that lead to a vanishing gradient problem for sigmoid and tanh, when their gradients behave best if small inputs are provided? The smaller the inputs for calculating the gradients, the better the weight-update process.
@MohadisTanoli · 9 months ago
A smaller gradient will lead to almost no update of the weights, and that is the vanishing gradient problem.
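One way to see it, as a sketch in the notation used elsewhere in these comments and assuming the regression setup with a linear output node:

$$\frac{\partial L}{\partial w^1_{11}} = \frac{\partial L}{\partial \hat{y}} \cdot w^2_{11} \cdot \sigma'(z_{11}) \cdot x_1$$

With small initial weights, w²11 is tiny and σ'(z11) ≤ 0.25, so the product is tiny; every extra hidden layer multiplies in another small weight and another derivative of at most 0.25. The early-layer gradients, and hence the updates, shrink toward zero even though each individual derivative is at its "best" value.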
@teenagepanda8972 · 2 years ago
Thank you sir
@siddharthvij9087 · a year ago
Excellent video
@umangdobariya5680 · 2 years ago
very in-depth video
@farhadkhan3893 · 2 years ago
Thank you for your hard work.
@lingasodanapalli615 · 10 months ago
Why would the derivative of tanh(x) become zero if the activation becomes zero? It does not: the derivative of tanh(x) is sech^2(x), so the derivative of tanh(x) at x = 0 is 1, not zero. In the ReLU case it is zero because ReLU is f(x) = max(0, x), hence the derivative is zero for x ≤ 0.
@braineaterzombie3981 · 6 months ago
I may be wrong, but I think you have completely misunderstood tanh and confused it with the tan used in trigonometry. This is NOT the normal tan function; it just looks like tan(x) horizontally. So its derivative is not sec^2(x).
@RamanSharma-wl4ld · 3 months ago
@braineaterzombie3981 Bro, tanh(x) has derivative 1 at 0.
@mr.deep. · 2 years ago
Thanks
@tanmaychakraborty7818 · 2 years ago
You're really a legend.
@sandipansarkar9211 · 2 years ago
finished watching
@aniketsharma1943 · 2 years ago
A doubt about when we are taking derivatives of sigmoid for zero (weights/biases): earlier we considered one neuron as the output and took the derivative w.r.t. the weights; in this video we are also taking the derivative of the activation function w.r.t. z. Does this mean the same thing?
@tanmaychakraborty7818 · 2 years ago
Yes, it means the same; you can try deriving it manually. For example, for sigmoid we will have 1/(1 + e^-z). Now z = w11*x1 + ... and so on. Then y_hat = sigmoid(value). Now, to differentiate this you need the chain rule. Conclusion: your assumption is correct. Hope it helps.
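A quick numeric check of that chain rule, as a sketch with made-up numbers (not the video's example): differentiating sigmoid(w*x) with respect to w is exactly "derivative of the activation w.r.t. z" times "derivative of z w.r.t. w".

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

w, x = 0.4, 1.5
z = w * x
analytic = sigmoid(z) * (1 - sigmoid(z)) * x          # chain rule: sigmoid'(z) * dz/dw
eps = 1e-6
numeric = (sigmoid((w + eps) * x) - sigmoid((w - eps) * x)) / (2 * eps)
print(analytic, numeric)                              # the two values agree
```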