Vision Transformer explained in detail | ViTs

Views: 3,673

Code With Aarohi

Days ago

Comments: 71

@soravsingla8782 1 month ago

Your videos are always unique & highly knowledgeable. Thank you

@CodeWithAarohi 1 month ago

Thank you!

@layamahmoudi6002 15 days ago

Thank you for the amazing video, it's absolutely perfect!

@CodeWithAarohi 14 days ago

I'm glad you found it helpful!

@TruthOnly_jayshreeRam 1 day ago

awesome, very nicely explained. Thanks Ma'am.

@CodeWithAarohi 11 hours ago

Most welcome 😊

@TruthOnly_jayshreeRam 11 hours ago

@@CodeWithAarohi Ma'am, can you please make a video on "memory-augmented zero-shot image captioning"?

@vcarvewood4545 1 month ago

You are an excellent teacher. I have been in love with your voice since the YOLOv8 tutorials. Attention to Aarohi is all we need.

@CodeWithAarohi 1 month ago

Thank you for the compliment! I'm really glad the tutorials and my voice have made learning enjoyable for you.

@revanb2781 2 days ago

Very impressive video, thank you.

@CodeWithAarohi 2 days ago

You're welcome!

@arnavthakur5409 22 days ago

Excellent content ma'am

@CodeWithAarohi 21 days ago

Glad you found it helpful!

@Sunil-ez1hx 22 days ago

Such an informative video🙏🙏

@CodeWithAarohi 21 days ago

Thanks, glad you found it helpful!

@eranfeit 8 days ago

Thank you for a great explanation

@CodeWithAarohi 2 days ago

You are welcome :)

@AsthaPatidar-w1t 1 month ago

Please make a video for Convolution to Vision Transformer in detail. And thanks for this video.

@CodeWithAarohi 1 month ago

Noted!

@pifordtechnologiespvtltd5698 1 month ago

Extremely Appreciated 👏👏👏

@CodeWithAarohi 1 month ago

Thank you so much 😀

@bharatto2220 1 month ago

Thank you for explaining things so elaborately and clearly. In some places it was too basic, such as the RGB part, so I would appreciate a timeline that lets me skip to the required part.

@CodeWithAarohi 1 month ago

Thank you, I will add a timeline.

@aneerimmco 1 month ago

Informative, thank you ma'am.

@CodeWithAarohi 1 month ago

Most welcome 😊

@munimahmed9374 1 month ago

Can you please explain the DeiT model? This ViT explanation is the best video on ViT I have found on the internet. Thanks a lot.

@madhavanu6980 1 month ago

Ma'am, please explain the paper on transformers for remote sensing classification, because you explain things well and in an easily understandable manner.

@Ishaheennabi 1 month ago

It's a great video, thanks for it.

@CodeWithAarohi 1 month ago

Glad you liked it!

@satvik4225 1 month ago

43:20 You said that we do element-wise addition of the patch representation and the position embedding, which means their dimensions are the same. The patch representation has size 768x1, and you also said the length of the position embedding vector is 512. How can the element-wise addition be done? Did you mean the linearly projected vector of each patch, which has a dimension of 512? I learnt a lot, thanks.

@CodeWithAarohi 1 month ago

@@satvik4225 Every patch is a 512x1 vector after the linear embedding layer. The position encoding is then added to these embedded patches.

@satvik4225 1 month ago

Okay, thanks. It was a bit confusing because you said here that the patch representation is the flattened patches. Thanks again.
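
The dimension question in this thread can be checked with a small sketch. It assumes 16x16 RGB patches, so each flattened patch has 16 * 16 * 3 = 768 values, and it uses the embedding size of 512 quoted in the reply. The 32x32 image, the random weights, and the names `extract_patches`, `linear`, and `pos_embed` are illustrative assumptions, not the code from the video, and the class token is left out for brevity.

```python
import random

random.seed(0)

PATCH = 16        # patch side length in pixels (assumed)
CHANNELS = 3      # RGB
EMBED_DIM = 512   # embedding size quoted in the reply
IMAGE_SIDE = 32   # tiny assumed image so that pure Python stays fast


def extract_patches(image, patch):
    """Split a square H x W x C nested-list image into flattened patches."""
    side = len(image)
    patches = []
    for top in range(0, side, patch):
        for left in range(0, side, patch):
            flat = [
                value
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ]
            patches.append(flat)
    return patches


def linear(vec, weight, bias):
    """Compute weight @ vec + bias, where weight has shape (out, in)."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


# Random stand-in for a real image.
image = [[[random.random() for _ in range(CHANNELS)]
          for _ in range(IMAGE_SIDE)]
         for _ in range(IMAGE_SIDE)]

patches = extract_patches(image, PATCH)
flat_dim = PATCH * PATCH * CHANNELS  # 768 values per flattened patch

# Learnable parameters in a real model; random placeholders here.
weight = [[random.gauss(0, 0.02) for _ in range(flat_dim)]
          for _ in range(EMBED_DIM)]
bias = [0.0] * EMBED_DIM
pos_embed = [[random.gauss(0, 0.02) for _ in range(EMBED_DIM)]
             for _ in patches]

# Step 1: project every 768-value patch to a 512-value embedding.
embedded = [linear(p, weight, bias) for p in patches]

# Step 2: the element-wise addition works because both vectors have 512 values.
tokens = [[e + q for e, q in zip(emb, pos)]
          for emb, pos in zip(embedded, pos_embed)]

print(len(patches), len(patches[0]), len(tokens[0]))
```

The addition happens only after the projection, so the 768-value flattened patch never meets the 512-value position vector directly.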

@salmareang7458 1 month ago

I am a bit confused: given one input image, how are Q, K, and V computed?
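
For readers with the same question, here is a minimal sketch of standard single-head scaled dot-product attention. In that formulation, Q, K, and V are three learned linear projections of the same token matrix (the patch embeddings with position information added). The sizes, the random weights, and the helper names `rand_matrix`, `matmul`, and `softmax` are assumptions for illustration only, not the code from the video.

```python
import math
import random

random.seed(0)

NUM_TOKENS = 4   # number of patch tokens (assumed)
EMBED_DIM = 8    # kept small for readability (assumed)


def rand_matrix(rows, cols):
    """Random placeholder for a learned matrix or for token embeddings."""
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


tokens = rand_matrix(NUM_TOKENS, EMBED_DIM)   # one row per image patch

# Three separate learned projection matrices (random stand-ins here).
w_q = rand_matrix(EMBED_DIM, EMBED_DIM)
w_k = rand_matrix(EMBED_DIM, EMBED_DIM)
w_v = rand_matrix(EMBED_DIM, EMBED_DIM)

# Q, K, and V all come from the same tokens, through different weights.
queries = matmul(tokens, w_q)
keys = matmul(tokens, w_k)
values = matmul(tokens, w_v)

# Each query is compared with every key; the scaled scores become weights.
scale = math.sqrt(EMBED_DIM)
scores = [[sum(q * k for q, k in zip(q_row, k_row)) / scale
           for k_row in keys]
          for q_row in queries]
weights = [softmax(row) for row in scores]

# Each output token is a weighted mix of the value vectors.
attended = matmul(weights, values)
```

The keys act as the addresses that queries are matched against, while the values carry the content that gets mixed into each output token.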

@jynpogger 15 days ago

God Please Protect My Teacher at all costs

@CodeWithAarohi 14 days ago

Thank you so much for your kind words and blessings! 😊🙏

@ramchandhablani9834 7 days ago

Ma'am, your video is very good. I have two questions. First, if there are 2 hidden layers, will there be three matrices, say W1, W2 and W3, for the linear projection? Second, to train these weights and biases, we need target vectors corresponding to each input vector. Where do we get those target vectors from?

@CodeWithAarohi 2 days ago

W1: for projecting image patches into embeddings. W2: for the query, key, and value projections in the self-attention layer. W3: for the feed-forward layer after attention. The target vectors come from the labeled dataset, where each input image has a corresponding label (for classification tasks).
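
To make the point about target vectors concrete, here is a small sketch of how a class label typically serves as the target in classification. It assumes the usual softmax cross-entropy loss, which this thread does not name explicitly. The logits, the label, and the function names are hypothetical values chosen for illustration.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Loss for one image: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def logit_gradient(logits, label):
    """Gradient of the loss w.r.t. the logits: softmax minus one-hot target."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


# Hypothetical output of the classification head for one image, 3 classes.
logits = [2.0, 0.5, -1.0]
label = 0  # the dataset label; its one-hot form [1, 0, 0] is the target vector

loss = cross_entropy(logits, label)
grad = logit_gradient(logits, label)
```

Only the final class scores are compared with the label. Backpropagation then carries this gradient back through the feed-forward, attention, and patch-projection weights, so none of those layers needs a target vector of its own.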

@soravsingla6574 15 days ago

Perfect

@CodeWithAarohi 14 days ago

Thanks!

@Mulugeta-c5q 1 month ago

Thank you for your good work. Can you make a video on the ViTPose code too?

@CodeWithAarohi 1 month ago

@@Mulugeta-c5q Sure

@aryarushipathak5039 1 month ago

Hello ma'am, can we use ViT and CNN to identify emotions from faces? CNN for feature extraction and MTCNN for emotion labeling.

@nursami7842 27 days ago

Hello ma'am, can you explain the transformer encoder in more detail, such as normalization, multi-head attention, softmax, and the MLP? The video doesn't explain these in detail. Can you cover them in the next video?

@nursami7842 27 days ago

Or is there a reference that explains it in detail?

@CodeWithAarohi 26 days ago

Will try to cover in another video.

@sreenalakhani4985 17 days ago

Please make a video on the Video Vision Transformer as well.

@CodeWithAarohi 17 days ago

Sure!

@satvik4225 1 month ago

Can you explain diffusion models next?

@CodeWithAarohi 1 month ago

Noted!

@madhavanu6980 1 month ago

Ma'am, please do a video on TRS (remote sensing transformers). Please, ma'am, it's my humble request. I completely understand ViT, but I still can't understand TRS. Please, ma'am.

@CollegeOnline 22 days ago

Ma'am, please, please, please create a video on the Gated Vision Transformer. I am trying to use it in my research paper, but I am not able to find any literature on GVT. Ma'am, if you have any links on GVIT, kindly share them.

@CodeWithAarohi 21 days ago

Hi, I need time to make this video because I have never used this model before, and I have to read the paper and understand it in order to create a video.

@fatima-arbab 1 month ago

Ma'am, please make a video on copy-move forgery detection in video using machine learning, with a YOLO model and the CASIA dataset.

@shivamsood4566 17 days ago

Ma'am, please also make one on Stable Diffusion. Please, ma'am 👏

@CodeWithAarohi 16 days ago

Noted!

@adityanjsg99 1 month ago

Aarohi ji, would it be possible for you to build a model that is as good as GPT, though on limited data and scale?

@CodeWithAarohi 1 month ago

@@adityanjsg99 I am not sure 🤔

@rahulhanot4481 1 month ago

When will we get more videos on this topic?

@CodeWithAarohi 1 month ago

@@rahulhanot4481 kzbin.info/aero/PLv8Cp2NvcY8AzNCATbDWMr8vqbJBYbxFW&si=-488A3JJRBAVtEdh

@AB099 1 month ago

Next tutorial: EasyOCR text detection, please.

@CodeWithAarohi 1 month ago

Noted!

@khalidamiralam6469 1 month ago

Hello ma'am, I am still waiting for your video on video generation.

@CodeWithAarohi 1 month ago

Will upload soon.

@danigunawan6807 1 month ago

Could you please share the code you explained? Thanks.

@chinnaiahkotadi4702 26 days ago

Thank you for your effort, ma'am. But 1. you did not explain how the position values are obtained; 2. you did not explain how the neural network works with the help of ReLU activation functions; 3. when building the relationships between query, key, and value, what is the role of the key?

@CodeWithAarohi 26 days ago

Noted! Will try to cover in another video.

@sanathspai3210 1 month ago

Hi Aarohi, it was a good session. One suggestion: could you please make a video on all the YOLO versions, from v1 to v11? Many are waiting for it, and it would be very beneficial.

@CodeWithAarohi 1 month ago

Hi, glad my video is helpful! Great suggestion, I will surely make a video on all YOLO models.

@sanathspai3210 1 month ago

@ Please try to tag me once you are done. I am very much in need of it and of the way you explain it.

@CodeWithAarohi 1 month ago

Sure