Weight Initialization Techniques | What not to do? | Deep Learning

34,172 views

CampusX

Comments: 74
@yashodhanmandke3843 (2 years ago)
I think this is the best way to teach, one which discusses where things fail. Kudos to you.
@saketpathak5361 (2 years ago)
Morning sir... I have recently started watching your videos and the concepts become more than clear to me after watching them. One request though: can you please complete the clustering techniques in the ML playlist? Topics like DBSCAN and recommendation-system type problems are not yet covered. It would be very, very helpful, sir, if you make those videos.
@apudas7 (1 month ago)
Great, sir. The way you teach is really helpful.
@ShubhamSharma-gs9pt (2 years ago)
Thanks sir!! Waiting for more and more uploads in this playlist! 😊😊😊😊😊
@DipansuJoshi (3 months ago)
Thanks, sir! 💌💌 I love spending time on your YouTube channel.
@apudas7 (18 days ago)
Thank you sir. You're the reason I can continue learning DL.
@elonmusk4267 (1 month ago)
The legendary playlist.
@pranavreddy9218 (2 months ago)
b11 (bias), a11 and z11: please make it clearer which one is which. That is the only confusion; everything else is awesome. Great series.
@shantanuyadav732 (9 days ago)
Great video sir, amazing explanation.
@ShubhamSingh-iq5kj (2 years ago)
Thank you for the amazing ML playlist, I'm just about to complete it. 😇😇😇
@pavantripathi1890 (4 months ago)
Thanks for the wonderful explanation!
@motivation_with_harsh (9 months ago)
You are the best teacher, sir ❤❤❤❤❤❤❤❤❤❤❤
@thethreemusketeers4500 (2 years ago)
Thanks sir. Please complete the deep learning playlist, it's a request, sir.
@researcher7410 (8 months ago)
Sir, please make videos on deep learning using PyTorch.
@namanmodi7536 (2 years ago)
Sir, you had said a video would be uploaded every 3 days, but now you are uploading one video every 3 weeks.
@ankanmazumdar5000 (2 years ago)
Sir, just one suggestion for these kinds of long videos: kindly add chapters by splitting the video into timestamps. It would be really helpful for knowing where we are.
@debjitsarkar2651 (2 years ago)
Sir, how do I join your online 6-month AI class? Please reply.
@narendraparmar1631 (8 months ago)
Really helpful, thanks.
@popularclips6681 (27 days ago)
Amazing
@rafibasha4145 (1 year ago)
Nitish bro, upload 2 videos per week if possible.
@krishcp7718 (1 year ago)
Hi Nitish, nicely presented video. At timestamp 16:42, in the derivative of the loss function via the chain rule, I think the middle term should be the partial derivative w.r.t. a21, and then a21 should be differentiated w.r.t. w2_11 and w2_21, not as given. y_hat does not change directly because of a11; rather, it changes because of a21. Actually there needs to be a value a21 along with a bias b21, just like the values of the neurons a11 and a12 in the previous layer. This is because in backprop the gradient of any neuron, or of the weights connecting to it, is based on the directly connected neuron's weights and bias. Thanks a lot. Krish
@bibekrauth3408 (1 year ago)
Bro, y_hat = a21, so it doesn't matter. The equations are correct.
@sujithsaikalakonda4863 (1 year ago)
@@bibekrauth3408 You are correct, but a21 changes with the change in the weights (w2_11 and w2_21), and these weights in turn act on a11 and a12 respectively. If anything is wrong, please correct me.
@ickywitchy4667 (7 months ago)
@@sujithsaikalakonda4863 The last layer has no activation function; you are totally wrong.
@suryakantacharya7933 (2 months ago)
Sir, when we take large random numbers and the gradients with ReLU come out large, can't we just reduce the learning rate to bring them back into range?
@mr.deep. (2 years ago)
Thanks
@avishinde2929 (2 years ago)
Good morning sir ji, thank you so much sir 😊😊
@teenagepanda8972 (2 years ago)
Thank you sir
@harsh.gupta2021 (6 months ago)
Sir, can you please provide us with a soft copy of the notes used in the video?
@tehreemqasim2204 (5 months ago)
Excellent video.
@rakeshkumarrout2629 (2 years ago)
Sir, where can I find the Discord link?
@shubharya3418 (1 year ago)
Sir, in this video at 40:00 you said that an NN with tanh is affected more by the vanishing gradient problem than an NN with sigmoid for inputs closer to zero (very small inputs). But the sigmoid activation function maps its inputs to values between 0 and 1; when the inputs are close to zero, the output of sigmoid is close to 0.5 and its gradient is at its maximum, about 0.25. On the other hand, the tanh activation function maps its inputs to values between -1 and 1; when the inputs are close to zero, the output of tanh is close to zero and its gradient is close to 1. Therefore, a neural network with tanh activations should be less likely to face the vanishing gradient problem than one with sigmoid, especially for inputs close to zero. Could you please look into this? Thanks. 🙏
@thelife5628 (1 year ago)
I am also having the same doubt, bro... did you find an answer to it?
@thelife5628 (1 year ago)
And for sigmoid, at 0 the derivative is at its maximum value, ~0.25.
@thelife5628 (1 year ago)
Sir, we request you to please clarify this...
@xenon1787 (1 month ago)
I think he is wrong in saying that tanh hits the vanishing gradient problem earlier, but of course it will still hit it in deeper networks: e.g. (0.9)^n vanishes when n is large.
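For anyone following this thread, here is a quick numeric check (plain NumPy; not code from the video) of the two derivative values being debated, plus a rough illustration of why chaining the smaller sigmoid factor across many layers shrinks the gradient faster:

```python
# Quick check, not from the video: derivative of sigmoid and tanh at z = 0,
# and the effect of multiplying such factors across many layers.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.0
d_sigmoid = sigmoid(z) * (1 - sigmoid(z))   # sigmoid'(0) = 0.25, its maximum
d_tanh = 1 - np.tanh(z) ** 2                # tanh'(0) = 1.0, its maximum
print(d_sigmoid, d_tanh)

# Chained over 20 layers, the sigmoid factor alone crushes the gradient,
# which is one reason sigmoid tends to hit the vanishing gradient problem
# sooner than tanh when activations sit near zero.
print(0.25 ** 20, 1.0 ** 20)
```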
@umangdobariya5680 (2 years ago)
Very in-depth video.
@siddharthvij9087 (1 year ago)
Excellent video
@farhadkhan3893 (1 year ago)
Thank you for your hard work.
@Naman_Bansal102 (4 months ago)
BEST VIDEO
@dragnar4743 (1 year ago)
Near the end of the video, when we took ReLU with large weights, that was the exploding gradient problem, right?
@AbdulRahman-zp5bp (2 years ago)
If we initialize the weights with big random values, will the exploding gradient problem occur?
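A rough sketch of the effect being asked about above (plain NumPy, a made-up 10-layer ReLU stack; not the video's notebook): with very large random initial weights, the activations, and with them the backpropagated gradients, blow up layer by layer, which is the exploding gradient problem.

```python
# Synthetic illustration, not the video's code: forward pass through a deep
# ReLU stack initialised with deliberately huge random weights.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)                     # one input vector of size 100
for layer in range(10):
    W = rng.standard_normal((100, 100)) * 5.0    # very large standard deviation
    x = np.maximum(W @ x, 0)                     # ReLU forward pass
    print(f"layer {layer}: mean |activation| = {np.abs(x).mean():.3e}")
```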
@sandipansarkar9211 (2 years ago)
Finished watching.
@anjalimishra2884 (1 year ago)
The code link given in the description is not working... it shows errors.
@bhushanbowlekar4539 (1 year ago)
At timestamp 38:05, it should be that the derivative of tanh(0) is close to 1; similarly for sigmoid, the derivative of sigmoid(0) is approximately 0.25, not 0. Hence, at timestamp 40:05, sigmoid will reach the VGP faster than tanh, because 0.25 is smaller than the tanh derivative of 1. And at timestamp 46:30 it should be EGP. Am I right?
@pavankumarbannuru6145 (2 years ago)
Sir, you're multiplying different x values by the weights; the weights are the same but x is different for each, right? Then how do they come out the same? Please reply.
@rafibasha4145 (1 year ago)
@17:15, bro, how is y_hat associated directly with z11? There should be the cumulative output z21, right?
@apratimmehta1828 (1 year ago)
Y_hat is z21, because the last node has a linear activation. Hence a21 is the same as z21, and hence the same as y_hat.
@sandipansarkar9211 (1 year ago)
Where is the dataset link?
@alroygama6166 (2 years ago)
The derivative of tanh at 0 is not 0 but 1, sir. Please check.
@AhmedAli-uj2js (1 year ago)
Yes, you are right.
@shaz-z506 (2 years ago)
At 47:57, is it an exploding gradient problem or a vanishing gradient?
@tanmaychakraborty7818 (2 years ago)
Exploding Gradient
@flakky626 (1 year ago)
@@tanmaychakraborty7818 I had the same doubt, glad someone else noticed it too. I was confused there.
@tanmaychakraborty7818 (1 year ago)
@@flakky626 No worries, hope that clarified it.
@tanmaychakraborty7818 (2 years ago)
You're really a legend.
@thelife5628 (1 year ago)
14:24 Sir, you used a linear activation function in the output layer, but in the practical at 23:55 you use sigmoid in the output layer. I tried using linear in the output layer and the final weights are still 0, but when I used sigmoid in the output layer I got non-zero constant weights.
@apratimmehta1828 (1 year ago)
Sir used a linear activation in the last layer because he took a regression example in theory, but he implemented a classification problem in the practical, hence he used sigmoid there.
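A minimal sketch of the experiment discussed in this thread (synthetic data; the layer sizes, optimizer and epoch count are assumptions, not the notebook from the video): initialise every weight and bias to zero, train briefly, and inspect which layers ever move away from 0. Exact values depend on the data, loss and activations, but the symmetry problem shows up either way.

```python
# Minimal sketch, not the video's notebook: zero initialisation in Keras.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

rng = np.random.default_rng(42)
X = rng.random((500, 2))
y = (X.sum(axis=1) > 1).astype(int)        # simple synthetic binary labels

model = Sequential([
    Input(shape=(2,)),
    Dense(8, activation='relu', kernel_initializer='zeros',
          bias_initializer='zeros'),
    Dense(1, activation='sigmoid', kernel_initializer='zeros',
          bias_initializer='zeros'),
])
model.compile(optimizer='sgd', loss='binary_crossentropy')
model.fit(X, y, epochs=25, verbose=0)

# With ReLU hidden units and all-zero init, the hidden activations and ReLU's
# gradient at 0 are both 0, so the hidden weights never receive a non-zero
# update; with other activation choices the weights can instead end up as
# identical non-zero constants (symmetry is never broken).
for i, layer in enumerate(model.layers):
    weights = layer.get_weights()
    if weights:                            # skip layers that have no weights
        print(f"layer {i} kernel:\n{weights[0]}")
```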
@kindaeasy9797 (3 months ago)
9:30, how is the partial derivative equal to 0?
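One way to see it, as a hypothetical hand computation in NumPy (a tiny 2-2-1 network with squared-error loss is assumed here, not taken from the video): with every weight and bias at 0, the hidden activations are 0 and the outgoing weights are 0, so every chain-rule term that multiplies them is 0 as well.

```python
# Hypothetical hand computation, not the video's example: gradients of a
# 2-2-1 ReLU network whose weights and biases are all initialised to zero.
import numpy as np

x = np.array([0.5, 1.5])                 # one training example, target y = 1
W1 = np.zeros((2, 2)); b1 = np.zeros(2)  # hidden layer, all zeros
W2 = np.zeros((2, 1)); b2 = np.zeros(1)  # output layer, all zeros

z1 = x @ W1 + b1                         # = [0, 0]
a1 = np.maximum(z1, 0)                   # ReLU -> [0, 0]
y_hat = (a1 @ W2 + b2)[0]                # = 0 (linear output)

delta2 = y_hat - 1.0                     # dL/dy_hat for L = 0.5 * (y_hat - y)^2
dW2 = np.outer(a1, delta2)               # all zeros, because a1 is zero
dz1 = (delta2 * W2[:, 0]) * (z1 > 0)     # all zeros, because W2 is zero
dW1 = np.outer(x, dz1)                   # all zeros as well
print(dW2, dW1, sep="\n")
```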
@hammadkhalid7201 (1 year ago)
Sir, if I initialize small positive weights and biases, or big negative weights and biases, how will that lead to a vanishing gradient problem for sigmoid and tanh, when their gradients behave best when small inputs are provided? The smaller the inputs used for calculating the gradients, the better the weight-update process.
@MohadisTanoli (7 months ago)
A smaller gradient will lead to almost no updating of the weights, and that is the vanishing gradient problem.
@kindaeasy9797 (3 months ago)
Wow
@ali75988 (11 months ago)
17:03 Chain rule: y is an activation function, let's say y = a(u) at the final node. The chain rule doesn't take u (the weighted-sum variable at the output) into account, but it takes both a11 and z11 (the weighted-sum variable at node 11) into account. If anyone can explain why we skipped the weighted sum at the output but took it at the hidden-layer nodes, I would be thankful. Regards,
@sofiashahin4603 (9 months ago)
So for the last node the activation function gives the output y_pred, and we take that into account too; the chain goes from L > y_pred > z_final > a11 > z_previous > w1_11 (the complete chain). Because if we didn't, it wouldn't be differentiable! You can differentiate a function w.r.t. another only if it is a function of it. [Also, in the case of simple regression the activation terms are omitted because the output would always be w1x1 + w2x2 + b.] I hope my point was clear.
@CODEToGetHer-rq2nf (10 months ago)
God level teacher ❤️🤌🏻
@waheedweins (2 months ago)
I am having trouble finding the dataset... can anyone help?
@herambvaidya3709 (1 year ago)
He is him 🫡
@yashjain6372 (1 year ago)
Best
@ajitkumarpatel2048 (2 years ago)
🙏🙏
@lingasodanapalli615 (7 months ago)
Why would the derivative of tanh(x) become zero if the activation becomes zero? It is not. The derivative of tanh(x) is sech^2(x), so the derivative of tanh(x) at x = 0 is 1, not zero. In the ReLU case it is zero, because for ReLU f(x) = max(0, x), hence the derivative at 0 is zero.
@braineaterzombie3981 (3 months ago)
I may be wrong... but I think you completely misunderstood tanh and confused it with the tan used in trigonometry. This is NOT the normal tan function; it just looks a bit like tan(x). So its derivative is not sec^2(x).
@RamanSharma-wl4ld (25 days ago)
@@braineaterzombie3981 Bro, tanh(x) has derivative 1 at 0.
@aniketsharma1943 (2 years ago)
A doubt about taking derivatives of sigmoid with zero weights/biases: earlier we considered one neuron as the output and took the derivative w.r.t. the weights; in this video we are also taking the derivative of the activation function w.r.t. z. Does this mean the same thing?
@tanmaychakraborty7818 (2 years ago)
Yes, it means the same. You can try to differentiate it manually: for sigmoid we have 1/(1 + e^(-z)), and z = w11*x1 + ... and so on, so y_hat = sigmoid(z). To differentiate this you need the chain rule. Conclusion: your assumption is correct. Hope it helps.
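A small symbolic check of that reply (SymPy; the single-neuron setup is a simplification assumed here, not code from the video): differentiating sigmoid(z) with z = w*x + b directly with respect to the weight gives exactly sigmoid'(z) * dz/dw, so the two views really are the same thing.

```python
# Symbolic check, not from the video: chain rule for a single sigmoid neuron.
import sympy as sp

w, x, b = sp.symbols('w x b')
z = w * x + b                          # weighted sum for one neuron
a = 1 / (1 + sp.exp(-z))               # sigmoid activation, y_hat = a
dadw = sp.diff(a, w)                   # direct derivative w.r.t. the weight
chain = a * (1 - a) * x                # sigmoid'(z) * dz/dw via the chain rule
print(sp.simplify(dadw - chain))       # prints 0: the two forms agree
```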