Vision Transformer and its Applications

44,553 views

Open Data Science

1 day ago

Comments: 25
@jhjbm1959 · 1 year ago
This video provides a clear step-by-step explanation of how to get from images to input features for Transformer encoders, which has proven hard to find anywhere else. Thank you.
@PrestonRahim · 1 year ago
Super helpful. Was very lost on the process from image patch to embedded vector until I watched this.
@DrAIScience · 7 months ago
Very, very nice explanation! I like learning the foundation/origin of the concepts from which models are derived.
@crapadopalese · 1 year ago
10:46 - This is a mistake; the convolution is not equivariant to scaling. If the bird is scaled, the output of the convolution will not simply be a scaling of the original output; that would only be true if you also rescaled the filters.
@xXMaDGaMeR · 1 year ago
amazing lecture, thank you sir!
@ailinhasanpour · 1 year ago
Thanks for sharing, it was extremely helpful 💯
@OpenDataScienceCon · 1 year ago
Thank you!
@sahil-vz8or · 1 year ago
You said there are 196 patches for the ImageNet data. The number of patches depends on the input image size and the patch size. For example, if the input image is 400×400 and the patch size is 8×8, then the number of patches will be (400×400)/(8×8) = 50×50 = 2500.
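As a rough illustration of this arithmetic, here is a minimal NumPy sketch (not from the video) that counts non-overlapping patches and flattens an image into per-patch vectors, assuming the square, non-overlapping patches of the standard ViT setup:

```python
import numpy as np

def num_patches(img_h, img_w, patch):
    """Number of non-overlapping square patches covering the image."""
    assert img_h % patch == 0 and img_w % patch == 0, "image size must be divisible by patch size"
    return (img_h // patch) * (img_w // patch)

def patchify(img, patch):
    """Split an (H, W, C) image into flattened patch vectors of length patch*patch*C."""
    h, w, c = img.shape
    n_h, n_w = h // patch, w // patch
    patches = img.reshape(n_h, patch, n_w, patch, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(n_h * n_w, patch * patch * c)

print(num_patches(224, 224, 16))                      # 196 (the ViT-Base/16 case from the talk)
print(num_patches(400, 400, 8))                       # 2500 (the example in this comment)
print(patchify(np.zeros((224, 224, 3)), 16).shape)    # (196, 768)
```

For 224×224 ImageNet crops with 16×16 patches, this gives the 196 patches mentioned in the talk, each flattened to a 768-dimensional vector before the linear projection into the Transformer's embedding space.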
@rikki146 · 1 year ago
20:17 I think the encoder blocks are stacked in a parallel fashion rather than sequentially?
@SarangBanakhede · 3 months ago
10:58 Scale equivariance:
Definition: a function is scale equivariant if a scaling (resizing) of the input results in a corresponding scaling of the output.
Convolution in CNNs: standard convolutions are not scale equivariant. If you resize an object in an image (e.g., make it larger or smaller), the CNN may not recognize it as the same object: convolutional filters have fixed sizes, so they may fail to detect features that are significantly larger or smaller than the filter.
Example: if a CNN is trained to detect a small object using a specific filter size, it might struggle to detect the same object when it appears much larger in the image, because the filter cannot adjust to different scales.
Why is convolution not scale equivariant? The filters in a CNN have a fixed receptive field, meaning they look for patterns of a specific size. If the size of the pattern changes (e.g., due to scaling), the fixed-size filters may no longer detect the pattern effectively.
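A quick way to convince yourself of this is the small NumPy/SciPy check below (my own illustration, not from the video): it convolves an upscaled image with a fixed 3×3 filter and compares the result to upscaling the convolved image. If convolution were scale equivariant, the two paths would agree up to interpolation error; with a fixed filter they do not.

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((32, 32))    # toy single-channel image
kernel = rng.random((3, 3))     # fixed-size convolutional filter

# Path A: upscale the image by 2x, then convolve with the *same* fixed filter.
conv_of_scaled = convolve2d(zoom(image, 2, order=1), kernel, mode="same")

# Path B: convolve first, then upscale the response by 2x.
scaled_of_conv = zoom(convolve2d(image, kernel, mode="same"), 2, order=1)

# If convolution were scale equivariant, these would match (up to interpolation
# error). With a fixed filter, the discrepancy is large.
diff = np.abs(conv_of_scaled - scaled_of_conv).mean()
print(f"mean absolute difference between the two paths: {diff:.4f}")
```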
@scottkorman4953 · 1 year ago
What exactly is happening in the self-attention and MLP blocks of the encoder module? Could you describe it in a simple way?
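As a rough sketch (my own illustration, not the presenter's code): a standard pre-norm ViT encoder block applies multi-head self-attention, so every patch token can exchange information with every other token, and then a small two-layer MLP to each token independently, each sub-block wrapped in layer normalization and a residual connection. The dimensions below assume ViT-Base/16 (768-dimensional tokens, 12 heads).

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One pre-norm ViT encoder block: self-attention + MLP, each with a residual connection."""
    def __init__(self, dim=768, num_heads=12, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),   # expand
            nn.GELU(),                         # non-linearity
            nn.Linear(dim * mlp_ratio, dim),   # project back
        )

    def forward(self, x):                      # x: (batch, num_patches + 1, dim)
        # Self-attention: every token attends to every other token.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                       # residual connection
        # MLP: the same two-layer network applied to each token independently.
        x = x + self.mlp(self.norm2(x))        # residual connection
        return x

tokens = torch.randn(1, 197, 768)              # [CLS] token + 196 patch embeddings
print(EncoderBlock()(tokens).shape)            # torch.Size([1, 197, 768])
```

In the full model, L such blocks are applied sequentially, the output of one feeding the next (L = 12 for ViT-Base), which also answers the question at 20:17 above: the encoder blocks are stacked sequentially, not in parallel.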
@mohammedrakib3736 · 8 months ago
Fantastic video! Really loved the detailed step-by-step explanation.
@DrAIScience · 7 months ago
Do you have a video about BEiT or DINO?
@НиколайНовичков-е1э · 1 year ago
Thank you, sir
@PRASHANTKUMAR-ze6mj · 1 year ago
thanks for sharing
@DrAIScience · 7 months ago
Are you the channel owner??
@anirudhgangadhar6158 · 1 year ago
Great resource!
@muhammadshahzaibiqbal7658 · 2 years ago
Thanks for sharing.
@liangcheng9856 · 1 year ago
awesome
@hoangtrung.aiengineer · 1 year ago
Thank you for making such a great video
@capocianni1043 · 1 year ago
Thank you for this genuine knowledge.
@saimasideeq7254 · 1 year ago
Thank you, much clearer.
@improvement_developer8995 · 1 year ago
Tax evader 🤮
@improvement_developer8995 · 1 year ago
🤮