In this video we explain Meta-Transformer, a unified framework for multimodal learning.
With Meta-Transformer, the same pre-trained transformer encoder can process information from 12 different modalities — twice the six modalities supported by similar prior work such as ImageBind by Meta AI.
We review the architecture of Meta-Transformer, which is composed of a data-to-sequence tokenizer, a unified multimodal model, and task-specific heads, and explain how Meta-Transformer is used to create models that solve end tasks across different modalities.
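The three-stage flow described above can be sketched in a few lines. This is a toy illustration only — the function names, dimensions, and weights are invented for clarity and are not the paper's actual code or API:

```python
import numpy as np

def tokenize_to_sequence(raw, num_tokens=4, embed_dim=8):
    # Data-to-sequence tokenizer (toy): map any modality's raw input
    # into a fixed-size sequence of token embeddings.
    flat = np.resize(np.asarray(raw, dtype=float).ravel(), num_tokens * embed_dim)
    return flat.reshape(num_tokens, embed_dim)

def shared_encoder(tokens):
    # Stand-in for the frozen pre-trained transformer encoder:
    # the same weights are applied regardless of input modality.
    return np.tanh(tokens)

def task_head(features, num_classes=3):
    # Task-specific head (toy): pool the sequence, project to logits.
    pooled = features.mean(axis=0)
    w = np.ones((features.shape[1], num_classes))  # illustrative fixed weights
    return pooled @ w

# The same encoder handles inputs of different shapes/modalities:
image = np.random.rand(2, 2, 3)   # toy "image"
audio = np.random.rand(16)        # toy "audio"
for x in (image, audio):
    logits = task_head(shared_encoder(tokenize_to_sequence(x)))
```

The key design point is that only the tokenizer is modality-specific; the encoder in the middle is shared and kept frozen.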
Next, we dive deeper into the pre-training process of the unified multimodal model, which is based on the LAION-2B dataset and trained using a contrastive learning approach.
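To make the contrastive objective concrete, here is a minimal NumPy sketch of a CLIP-style symmetric contrastive loss over paired embeddings (matching pairs on the diagonal). This is a generic illustration of the technique, not the paper's exact training code:

```python
import numpy as np

def contrastive_loss(a, b, temperature=0.07):
    # L2-normalize both sets of embeddings so dot products are cosine similarities.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Pairwise similarity matrix; matching pairs lie on the diagonal.
    logits = a @ b.T / temperature

    def xent(l):
        # Softmax cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(np.diag(p)).mean()

    # Average the loss over both matching directions (a->b and b->a).
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss pulls embeddings of matching pairs together and pushes non-matching pairs apart, which is how the shared encoder learns a modality-agnostic representation space.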
We finish by reviewing some of the results presented in the paper.
Blog post - aipapersacademy.com/meta-tran...
Meta-Transformer paper page - arxiv.org/abs/2307.10802
ImageBind Video - • ImageBind from Meta AI...
👍 Please like & subscribe if you enjoy this content
----------------------------------------------------------------------------------
Support us - paypal.me/aipapersacademy
----------------------------------------------------------------------------------
Chapters:
0:00 Introducing Meta-Transformer
0:55 Meta-Transformer Architecture
3:10 Pre-training
4:46 Results