Dear Krish, The formula at 6:00 made my day. My month, actually. I built my CNN with the standard library only, the CUDA toolkit, and a pinch of OpenCV. Implementing this in my CNN's fully-connected section brought it alive, boosting its performance. Tonight we will clink glasses to your health! Thanks man!
@tharukabalasooriya32694 жыл бұрын
Gosh!! India is on another level of education, man!! From a higher dimension.
@jaskarankaur49714 жыл бұрын
I can't express how thankful I am to have stumbled upon your channel
@AdityaRajPVSS3 жыл бұрын
My search of so many days has ended here. Thank you very, very much. You are the best ML/DL guru so far. May God keep you healthy.
@hanman51955 жыл бұрын
You're amazing at explaining. An excellent guruji I finally found in the data science field.
@ayushmansharma4362 Жыл бұрын
This video is very good. I'm also watching the MIT deep learning videos; there they just brief the topics and don't explain the actual working in detail. This video is very easy to understand.
@AdmMusicc4 жыл бұрын
The video is great. However, could you explain why the respective techniques are good with sigmoid/relu?
@dcrespin2 жыл бұрын
It may be worth noting that instead of partial derivatives one can work with derivatives as the linear transformations they really are. Also, looking at the networks in a more structured manner makes it clear that the basic ideas of BPP apply to very general types of neural networks. Several steps are involved.

1.- More general processing units. Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong, beyond Euclidean spaces, to any Hilbert space. Derivatives are linear transformations, and the derivative of a neural processing unit is the direct sum of its partial derivatives with respect to the inputs and with respect to the weights. This is a linear transformation expressed as the sum of its restrictions to a pair of complementary linear subspaces.

2.- More general layers (any number of units). Single-unit layers can create a bottleneck that renders the whole network useless. Putting together several units in a unique layer is equivalent to taking their product (as functions, in the sense of set theory). The layers are functions of the inputs and of the weights of the totality of the units. The derivative of a layer is then the product of the derivatives of the units; this is a product of linear transformations.

3.- Networks with any number of layers. A network is the composition (as functions, in the set-theoretical sense) of its layers. By the chain rule the derivative of the network is the composition of the derivatives of the layers; this is a composition of linear transformations.

4.- Quadratic error of a function. ...

With the additional text down below this is going to be excessively long, so I will stop the itemized comments here. The point is that a sufficiently general, precise and manageable foundation for NNs clarifies many aspects of BPP. If you are interested in the full story and have some familiarity with Hilbert spaces, please google for our paper dealing with backpropagation in Hilbert spaces. A related article with matrix formulas for backpropagation on semilinear networks is also available.

We have developed a completely new deep learning algorithm called Neural Network Builder (NNB), which is orders of magnitude more efficient, controllable, precise and faster than BPP. The NNB algorithm assumes the following guiding principle: the neural networks that recognize given data, that is, the "solution networks", should depend only on the training data vectors. Optionally the solution network may also depend on parameters that specify the distances of the training vectors to the decision boundaries, as chosen by the user and up to the theoretically possible maximum. The parameters specify the width of chosen strips that enclose the decision boundaries, from which strips the data vectors must stay away.

When using the traditional BPP, the solution network depends, besides the training vectors, on guessing a more or less arbitrary initial network architecture and initial weights. Such is not the case with the NNB algorithm. With the NNB algorithm the network architecture and the initial (same as the final) weights of the solution network depend only on the data vectors and on the decision parameters. No modification of weights, whether incremental or otherwise, needs to be done. For a glimpse into the NNB algorithm, search on this platform for our video: NNB Deep Learning Without Backpropagation. Links to free demo software can be found in the video description.
The new algorithm is based on the following very general and powerful result (google it): Polyhedrons and Perceptrons Are Functionally Equivalent. For the conceptual basis of general NNs, see our article Neural Network Formalism. Regards, Daniel Crespin
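A minimal NumPy sketch of point 3 above, the derivative of a composed network as the composition (matrix product) of the layer derivatives; the two-layer network, its sizes, and the tanh unit are illustrative assumptions of mine, not taken from the papers mentioned:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first layer weights (4 units, 3 inputs)
W2 = rng.normal(size=(2, 4))   # second layer weights (2 units, 4 inputs)

def layer1(x):                 # nonlinear layer: tanh(W1 x)
    return np.tanh(W1 @ x)

def layer2(h):                 # linear output layer: W2 h
    return W2 @ h

def network(x):
    return layer2(layer1(x))

x = rng.normal(size=3)

# Derivative of each layer as a linear transformation (its Jacobian matrix).
J1 = np.diag(1.0 - np.tanh(W1 @ x) ** 2) @ W1   # d layer1 / d x, a 4x3 matrix
J2 = W2                                          # d layer2 / d h, a 2x4 matrix

# Chain rule: derivative of the composed network = composition of layer derivatives.
J_net = J2 @ J1

# Check against a central finite-difference approximation of the network Jacobian.
eps = 1e-6
J_fd = np.column_stack([(network(x + eps * e) - network(x - eps * e)) / (2 * eps)
                        for e in np.eye(3)])
print(np.allclose(J_net, J_fd, atol=1e-5))   # True
```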
@saurabhtripathi624 жыл бұрын
When to use which initialization technique
@minderiskrir9895 жыл бұрын
At 7:30 the Initialization Method is called Glorot, not Gorat.
@krishnaik065 жыл бұрын
Hey, thanks, I think I misspelt it. Apologies.
@hanman51955 жыл бұрын
@@krishnaik06 This shows your open heart to accept faults or mistakes. You're awesome, Sir :)
@muppurigopi95762 жыл бұрын
You are Amazing Sir.......................
@cristianchavez56744 жыл бұрын
Great Job Krish !
@aish_waryaaa3 жыл бұрын
Best video and explanation. Thank you, Sir...
@monikathakur-zu1eb4 жыл бұрын
Hi Krishna, please take some inputs and then calculate the initial weights with the Xavier technique.
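For anyone who wants to try it, here is a rough worked sketch of the Xavier/Glorot uniform rule in NumPy; the layer sizes (3 inputs feeding 4 units) are made-up numbers just for illustration:

```python
import numpy as np

fan_in, fan_out = 3, 4                      # 3 inputs feeding 4 units (illustrative sizes)
limit = np.sqrt(6.0 / (fan_in + fan_out))   # Glorot/Xavier uniform bound, about 0.926

rng = np.random.default_rng(42)
W = rng.uniform(-limit, limit, size=(fan_in, fan_out))   # initial weight matrix
print(limit)
print(W)
```

The normal variant instead samples from a Gaussian with standard deviation sqrt(2 / (fan_in + fan_out)).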
@sandipansarkar92114 жыл бұрын
Thanks. I understood the video, but I couldn't relate what was written below the video to what was shown in it. I need to hold on until I am able to solve problems on deep learning.
@anjalisharma65433 жыл бұрын
Hi Sir, I have one doubt: we can use the sigmoid activation function on the output layer for binary classification, and sigmoid can also be used in a hidden layer, but why can we not use the ReLU activation on the output layer even though we can use it on a hidden layer's neurons?
@ethyleneglycol4324 Жыл бұрын
Because sigmoid's output range is between 0 and 1. So you could take a threshold like 0.5 or 0.8 and say that if sigmoid's output is less than the threshold we put that data in class A, otherwise we put it in class B. But ReLU's output range is between 0 and infinity, so you can't do the same thing. ReLU is better suited for regression, i.e. when you want to predict a value for your data, like a country's population in 2030 or the price of product X after a rise in the price of product Y.
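A small NumPy sketch of that thresholding idea; the 0.5 cutoff and the toy scores are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.3, 0.1, 3.0])      # raw outputs of the last layer (toy values)
probs = sigmoid(z)                         # squashed into (0, 1), usable as probabilities
labels = (probs >= 0.5).astype(int)        # threshold at 0.5 -> class 0 or class 1
print(probs, labels)

relu_out = np.maximum(0.0, z)              # ReLU output is unbounded above,
print(relu_out)                            # so there is no natural probability threshold
```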
@ArthurCor-ts2bg4 жыл бұрын
Very lucid and insightful
@yousefsrour3316 Жыл бұрын
Thank you so much man, amazing.
@jagdishjazzy4 жыл бұрын
Isn't the vanishing gradient problem dependent on the type of activation function used rather than on how the weights are initialized?
@vatsal_gamit4 жыл бұрын
It depends upon the weights!! You are updating the weights with backpropagation.
@pranjalsingh12873 жыл бұрын
Actually, he meant the exploding gradient problem.
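Both actually matter: the backpropagated signal is a product of (activation derivative × weight) factors, so a saturating activation like sigmoid or badly scaled weights can each shrink or blow up the gradient. A toy back-of-the-envelope sketch (the layer count and weight values are made up):

```python
n_layers = 30

# Per-layer backprop factor is (activation derivative) * (weight).
# Sigmoid's derivative is at most 0.25, so with small weights the factor is << 1.
small_w, sigmoid_deriv = 0.5, 0.25
print((sigmoid_deriv * small_w) ** n_layers)     # ~8e-28 -> vanishing gradient

# With a non-saturating activation (ReLU in its active region has derivative 1),
# weights larger than 1 make the factor > 1 and the product blows up.
large_w, relu_deriv = 1.5, 1.0
print((relu_deriv * large_w) ** n_layers)        # ~1.9e5 -> exploding gradient
```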
@praveensingh-lx4dk4 жыл бұрын
When will you start deep learning classes? I am waiting for that.
@Skandawin785 жыл бұрын
Please mention the bias as well.. not sure how it is changed in backpropagation.
@good1142 жыл бұрын
Thank you Sir 🙏🙏🙏🙏♥️😊♥️
@nitayg13265 жыл бұрын
Wow! Didn't know about the maths behind weight initialization!
@marijatosic2174 жыл бұрын
Very helpful! Thank you!
@blackberrybbb5 жыл бұрын
Learned a lot! Thanks!
@soumyajitmukherjee23963 жыл бұрын
Can you share the research paper links for reference purposes?
@srinivasanpandian58744 жыл бұрын
Hello Krish, thanks for the good videos.. how do we know whether we need to use uniform or normal initialization?
@Skandawin785 жыл бұрын
Where do we specify the weight initialization method in Keras?
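If it helps, in Keras the usual place is the `kernel_initializer` argument of each layer; the layer sizes below are just placeholders:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    # He initialization pairs well with ReLU layers.
    tf.keras.layers.Dense(32, activation="relu", kernel_initializer="he_uniform"),
    # Glorot/Xavier initialization pairs well with sigmoid/tanh layers.
    tf.keras.layers.Dense(1, activation="sigmoid",
                          kernel_initializer=tf.keras.initializers.GlorotNormal(seed=42)),
])
```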
@mittalparikh62524 жыл бұрын
You can improve a lot by having your content ready before the video session to avoid mistakes in your explanations. At this point a lot of people are following you to understand this stuff, and a wrong explanation can affect their understanding.
@paragjp4 жыл бұрын
Hi, excellent lecture. Can you suggest a book to learn about weight initialization?
@fthialbkosh16324 жыл бұрын
Thanks a lot, an excellent explanation.
@yepnah35144 жыл бұрын
Hi!! Good videos. How would I go about entering the weight values? Do I need to use something like set_weights()?
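Yes, `set_weights()` works once the layer has been built; a minimal Keras sketch, where the 4-input/8-unit shapes are just an example:

```python
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dense(8)
layer.build(input_shape=(None, 4))            # creates the kernel (4x8) and bias (8,)

kernel = np.random.randn(4, 8) * 0.01         # your own initial values
bias = np.zeros(8)
layer.set_weights([kernel, bias])             # order: [kernel, bias]

print(layer.get_weights()[0].shape)           # (4, 8)
```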
@harishbabuk.s43444 жыл бұрын
Hi Krishna, what is the importance of weight initialization in an MLP, and without weight initialization can't we build a model just using the value of each dimension?
@RAZZKIRAN4 жыл бұрын
"Weights should be small" — does that refer to the number of input features, or to the numeric value of the weights?
@sumanthkaranam994 жыл бұрын
Sir, please make a video on batch normalization.
@rohitrn45684 жыл бұрын
Good stuff
@rahuldebdas23742 жыл бұрын
It's not Gorat initialization, it is GLOROT initialization
@poojav20124 жыл бұрын
Hi Krish, you explain very well and make learning very interesting. May I know which book you are referring to while teaching, so that I can buy that book?
@help2office4273 жыл бұрын
Krish uses research papers, not any book.
@prasantas21954 жыл бұрын
Hi Krish, in some of your videos subtitles are not enabled. With them it would be really MORE helpful.
@ahmedelsabagh69904 жыл бұрын
Simple and useful
@khushpreetsandhu98745 жыл бұрын
What if all the features are important? For e.g., I have 6 features and I am performing multi-class classification, so each feature is important for me. How would we initialize the class weights in this scenario?
@riffraff73585 жыл бұрын
Bro you are best....
@vishaljhaveri75653 жыл бұрын
Thank you sir.
@pratyushraj26402 жыл бұрын
Even if the weights are the same, the activations will not be the same... please, can anyone help?
@himanshukushwaha5064 Жыл бұрын
We have to remember it for the exam.
@ajayrana42963 жыл бұрын
How does it impact the algorithm...
@imranriaz97522 жыл бұрын
Thanks
@RnFChannelJr4 жыл бұрын
Great explanation sir, may I get the research paper? Thanks.
@fajarazka93934 жыл бұрын
sure sure
@KenchoDorji-pu6wm3 ай бұрын
Anyone watching in 2024? Must-watch video 😊.
@louerleseigneur45323 жыл бұрын
Thanks buddy
@alluprasad59765 жыл бұрын
Thank you! Will you make videos on CNNs please?
@krishnaik065 жыл бұрын
Yes I will
@ruchit96974 жыл бұрын
Here you gave the example of the weights being the same by taking all weights as 0, but if we take the same weights with a value greater than 0, then I don't think the problem would come up.
@NOCMG-ht9bd4 жыл бұрын
No dear, again the problem will occur, as it generates large new weights with negative values after updating. This is called the exploding gradient problem, as taught by Krish in the previous lecture.
@ruchit96974 жыл бұрын
@@NOCMG-ht9bd The exploding gradient problem will occur when the initialized weights have a very large value.... Here I am talking about his example of taking the same value of weights as zero..... What if we take the same normal value for all the weights.... then I don't think the same value of weights would be a problem.
@NOCMG-ht9bd4 жыл бұрын
But from experience we should not take the same values for the weights; again, it will create problems later on.
@NOCMG-ht9bd4 жыл бұрын
Also, weights should be according to the usefulness of the features, meaning if a feature is highly important then its weight must be high in comparison to the other features' weights, and vice versa.
@NOCMG-ht9bd4 жыл бұрын
Also, it depends on the problem domain and expert choice too.
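One way to see why identical non-zero weights still cause trouble: every hidden unit then computes the same output and receives the same gradient, so the units never become different. A minimal NumPy sketch (the toy data, the 0.5 constant, and the layer sizes are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                 # 5 samples, 3 features (toy data)
y = rng.integers(0, 2, size=(5, 1)).astype(float)

W1 = np.full((3, 4), 0.5)                   # every hidden weight identical (non-zero)
W2 = np.full((4, 1), 0.5)

h = sigmoid(x @ W1)                         # all 4 hidden units output the same value
out = sigmoid(h @ W2)

# Backprop one step: gradient w.r.t. W1 (squared-error loss for simplicity).
d_out = (out - y) * out * (1 - out)
d_h = (d_out @ W2.T) * h * (1 - h)
grad_W1 = x.T @ d_h

print(h[0])          # identical values -> identical hidden units
print(grad_W1)       # identical columns -> identical updates, symmetry is never broken
```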
@RAZZKIRAN4 жыл бұрын
fan_in means the number of inputs? Why the square root of (3)?
@pranavb97684 жыл бұрын
In the second layer it would be square root of 2
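On the sqrt(3): fan_in is the number of inputs feeding the layer, and I believe the sqrt(3) comes from the uniform distribution itself. A uniform variable on [-limit, limit] has variance limit²/3, so to hit a target variance of 1/fan_in the limit must be sqrt(3/fan_in) (and sqrt(6/fan_in) for the He rule's 2/fan_in). A quick numeric check, with fan_in = 3 just as an example:

```python
import numpy as np

fan_in = 3                                   # number of inputs to the layer (example value)
target_var = 1.0 / fan_in                    # desired variance of the initial weights

limit = np.sqrt(3.0 * target_var)            # = sqrt(3 / fan_in), the uniform bound
w = np.random.default_rng(0).uniform(-limit, limit, size=100_000)

print(limit)                                 # here sqrt(3/3) = 1.0
print(w.var(), target_var)                   # empirical variance ~ 1/3, matching the target
```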
@yashchauhan57105 жыл бұрын
Amazing
@atchutram98942 жыл бұрын
What is "very small", "small", or "large"? It makes no sense.
3 жыл бұрын
Even if all the weights are the same, we still have the bias, so the activation function will never be zero.
@Programming91312 жыл бұрын
Sir, if you had explained it in Hindi it would have been much better; if I could have understood it in English, I would have just understood it from Google.
@krishnakanthbandaru93085 жыл бұрын
Research paper link please
@zaidalattar24834 жыл бұрын
Great!...
@abdultaufiq22374 жыл бұрын
It's ahhhhhhmaaaazing........
@haneulkim49022 жыл бұрын
While training a deep neural network with 2 units in the final layer with a sigmoid activation function for binary classification (highly imbalanced), both weights of the final layer become 0, leading to the same score for all inputs since the sigmoid then only uses the bias. What are some reasons for this?