Layer Normalization is a technique used to stabilize and accelerate the training of transformers by normalizing each input across its features. It re-centers and re-scales the activations with learnable parameters, keeping activation distributions stable across layers. This helps reduce training time and improve model performance, making it a key component of transformer architectures.
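For reference, here is a minimal NumPy sketch of the idea (not code from the video; the function name layer_norm and the eps value are illustrative). It shows the key point covered later in the video: statistics are computed per token across the feature axis, not across the batch as in Batch Norm.

import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, seq_len, d_model). Mean/variance are taken over the
    # last (feature) axis, so every token is normalized independently.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma (scale) and beta (shift) are learnable; identity init here.
    return gamma * x_hat + beta

x = np.random.randn(2, 4, 8)            # 2 sequences, 4 tokens, d_model=8
out = layer_norm(x, np.ones(8), np.zeros(8))
print(out.mean(axis=-1))                 # ~0 for every token
print(out.std(axis=-1))                  # ~1 for every token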
Share your thoughts, experiences, or questions in the comments below. I love hearing from you!
============================
Did you like my teaching style?
Check my affordable mentorship program at : learnwith.campusx.in
DSMP FAQ: docs.google.com/document/d/1O...
============================
📱 Grow with us:
CampusX's LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
✨ Hashtags✨
#deeplearning #campusx #transformers #transformerarchitecture
⌚Time Stamps⌚
00:00 - Intro
02:20 - What is Normalization
03:50 - What do we normalize?
05:30 - Benefits of Normalization in DL
07:10 - Internal Covariate Shift
12:49 - Batch Normalization Revision
22:56 - Why don't we use Batch Norm in Transformers?
38:25 - How does Layer Normalization work?
43:00 - Layer Normalization in Transformer