Visualizing the Self-Attention Head of the Last Layer in DINO ViT: A Unique Perspective on Vision AI

2,645 views

Discover AI

1 day ago

In a Colab notebook we code a visualization of the last layer of the Vision Transformer encoder stack and analyze the visual output of each of the 12 attention heads for a specific input image. This shows why a merely pre-trained ViT (even one pre-trained with the DINO method) cannot always succeed at a downstream image classification task: the fine-tuning of the ViT is simply missing, yet it is essential for better performance. A short sketch of the core visualization steps follows below the notebook credit.
Based on the Colab notebook by Niels Rogge, Hugging Face (all rights with him):
colab.research...
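For readers who want to reproduce the idea without the notebook, here is a minimal sketch of the core steps, assuming the Hugging Face Transformers library and the facebook/dino-vitb16 checkpoint; the example image URL and the plotting layout are illustrative choices, not taken from the original notebook:

```python
# Minimal sketch (assumptions noted above, not the original notebook):
# visualize how each attention head of the last DINO ViT encoder layer
# attends from the [CLS] token to the image patches.
import torch
import requests
import matplotlib.pyplot as plt
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

# Assumed checkpoint: DINO-pretrained ViT-B/16 from the Hugging Face Hub.
model_name = "facebook/dino-vitb16"
processor = ViTImageProcessor.from_pretrained(model_name)
model = ViTModel.from_pretrained(model_name, add_pooling_layer=False)
model.eval()

# Any test image works; this COCO image URL is just an illustration.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len).
last_layer_attn = outputs.attentions[-1][0]   # (12, 197, 197) for ViT-B/16 at 224x224
cls_attn = last_layer_attn[:, 0, 1:]          # [CLS] -> patch attention, (12, 196)
grid = int(cls_attn.shape[-1] ** 0.5)         # 14x14 patch grid

fig, axes = plt.subplots(2, 6, figsize=(18, 6))
for head, ax in enumerate(axes.flat):
    ax.imshow(cls_attn[head].reshape(grid, grid).numpy(), cmap="viridis")
    ax.set_title(f"head {head}")
    ax.axis("off")
plt.tight_layout()
plt.show()
```

Each panel shows where one attention head of the final encoder layer lets the [CLS] token look in the image; with DINO pre-training, several heads typically highlight the main object even though the model was never fine-tuned for classification.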
In one of my next videos, we will fine-tune a pre-trained Vision Transformer (ViT) from scratch for better image classification performance.
#ai
#vision
#technology

Comments: 3