This video provides a clear, step-by-step explanation of how to get from images to input features for Transformer encoders, which has proven hard to find anywhere else. Thank you.
@PrestonRahim 1 year ago
Super helpful. Was very lost on the process from image patch to embedded vector until I watched this.
@DrAIScience 7 months ago
Very, very, very nice explanation! I like learning the foundation/origin of the concepts from which models are derived.
@crapadopalese 1 year ago
10:46 - this is a mistake; convolution is not equivariant to scaling. If the bird is scaled, the output of the convolution will not simply be a scaled version of the original output; that would only be true if you also rescaled the filters.
@xXMaDGaMeR 1 year ago
amazing lecture, thank you sir!
@ailinhasanpour 1 year ago
Thanks for sharing, it was extremely helpful 💯
@OpenDataScienceCon 1 year ago
Thank you!
@sahil-vz8or 1 year ago
You said there are 196 patches for the ImageNet data. The number of patches depends on the input image size and the patch size. For example, if the input image is 400×400 and the patch size is 8×8, the number of patches will be (400×400)/(8×8) = 50×50 = 2500.
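For reference, that arithmetic generalizes directly. Here is a minimal sketch in Python (the function name and defaults are illustrative, not from the video) that computes the token count and the flattened patch length for any square image:

```python
# Number of patch tokens and flattened patch dimension for a ViT-style model,
# assuming a square image split into non-overlapping square patches.
def patch_info(image_size: int, patch_size: int, channels: int = 3):
    assert image_size % patch_size == 0, "image size must be divisible by patch size"
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2        # tokens fed to the encoder
    patch_dim = channels * patch_size ** 2     # length of each flattened patch
    return num_patches, patch_dim

print(patch_info(224, 16))  # (196, 768)  -- the standard 224x224 ViT setting
print(patch_info(400, 8))   # (2500, 192) -- the 400x400, 8x8 example above
```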
@rikki146 1 year ago
20:17 I think the encoder blocks are stacked in a parallel fashion rather than sequentially?
@SarangBanakhede 3 months ago
10:58 Scale Equivariance:
Definition: A function is scale equivariant if scaling (resizing) the input results in a corresponding scaling of the output.
Convolution in CNNs: Standard convolutions are not scale equivariant. If you resize an object in an image (e.g., make it larger or smaller), the CNN may not recognize it as the same object. Convolutional filters have fixed sizes, so they may fail to detect features that are significantly larger or smaller than the filter.
Example: If a CNN is trained to detect a small object using a specific filter size, it might struggle to detect the same object when it appears much larger in the image, because the filter cannot adjust to different scales.
Why is convolution not scale equivariant? The filters in a CNN have a fixed receptive field, meaning they look for patterns of a specific size. If the size of the pattern changes (e.g., due to scaling), the fixed-size filters may no longer detect the pattern effectively.
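Here is a small numerical check of that point (purely illustrative, not from the video): convolving a 2x upscaled image with the same fixed 3x3 filter and resizing the result back does not reproduce the original convolution output, whereas a pure translation of the input would simply translate the output.

```python
# Demonstrates that a fixed-size filter is not scale equivariant.
import numpy as np
from scipy.signal import convolve2d
from scipy.ndimage import zoom

rng = np.random.default_rng(0)
image = rng.random((64, 64))
edge_filter = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])  # fixed 3x3 receptive field

out_original = convolve2d(image, edge_filter, mode="same")

scaled_image = zoom(image, 2, order=1)             # 2x larger input
out_scaled = convolve2d(scaled_image, edge_filter, mode="same")
out_scaled_back = zoom(out_scaled, 0.5, order=1)   # resize back to 64x64

# The same 3x3 filter "sees" a different pattern at the new scale,
# so the two outputs differ substantially.
print(np.abs(out_original - out_scaled_back).mean())
```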
@scottkorman4953 1 year ago
What exactly is happening in the self-attention and MLP blocks of the encoder module? Could you describe it in a simple way?
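For what it's worth, here is a rough sketch of one encoder block (assuming the standard pre-norm ViT design; the class name and hyperparameters below are illustrative): self-attention lets every patch token exchange information with every other token, and the MLP then transforms each token independently, with residual connections around both.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 12, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):                    # x: (batch, num_tokens, dim)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)     # every token attends to all tokens
        x = x + attn_out                     # residual connection
        x = x + self.mlp(self.norm2(x))      # per-token MLP + residual
        return x

tokens = torch.randn(1, 197, 768)            # 196 patch tokens + 1 [CLS] token
print(EncoderBlock()(tokens).shape)          # torch.Size([1, 197, 768])
```

In the full model, several such blocks are applied one after another, each taking the previous block's output as input.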
@mohammedrakib3736 8 months ago
Fantastic video! Really loved the detailed step-by-step explanation.