OpenAI CLIP model explained

5,045 views

Machine Learning Studio

1 day ago

Comments: 11
@AI_For_Scientists 2 months ago
Great video series on ViT and its derivatives; I watched all of it. Thank you very much for sharing.
@PyMLstudio 2 months ago
Glad you enjoyed it.
@klaverhenrik 1 month ago
Your videos are amazing! Clear and well-structured. Are the slides available anywhere?
@PyMLstudio 1 month ago
Thanks, I'm glad you found the videos useful! Sure, I'm uploading the slides to GitHub; you can find the PDF slides at github.com/PyML-studio/mlstudio/tree/main/Slides
@SebastianRaschka 5 months ago
Very nice video! I can also imagine that predicting the caption text exactly isn't only more difficult, but would also be more likely to result in (more) overfitting if it were learned this way. At 5:43, the pair-wise similarities: they are basically like cross-attention scores?
@PyMLstudio 5 months ago
Yes, in a way it's analogous to cross-attention: taking the dot product between the features from the text encoder and the image encoder. This dot-product similarity is used as the final output of the model to determine whether an image and a text caption are related or not. Good question, thanks for the comment!
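(The dot-product similarity described above can be sketched as follows. This is a minimal illustrative example, not the actual CLIP implementation: the encoder outputs are stand-ins generated with random numbers, and the batch size and embedding dimension are made up.)

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 4, 8  # hypothetical batch size and embedding dimension

# Stand-ins for the image-encoder and text-encoder outputs
image_features = rng.normal(size=(batch, dim))
text_features = rng.normal(size=(batch, dim))

# L2-normalize so the dot product equals cosine similarity
image_features /= np.linalg.norm(image_features, axis=1, keepdims=True)
text_features /= np.linalg.norm(text_features, axis=1, keepdims=True)

# Pair-wise similarity matrix: entry [i, j] scores image i against caption j.
# For a matched batch, the diagonal holds the true (image, caption) pairs.
similarity = image_features @ text_features.T
print(similarity.shape)  # (4, 4)
```

During contrastive training, each row and column of this matrix is pushed (via a softmax with a learned temperature) to put its highest score on the diagonal.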
@fouziaanjums6475 5 months ago
Please cover the FasterViT model too...
@PyMLstudio 5 months ago
Absolutely, I'll cover that. I have a few other topics lined up, then I'll get to FasterViT. Thanks for the suggestion!
@fouziaanjums6475 10 days ago
@PyMLstudio Please make a video on the above topic soon...
@randomstuff39280 3 months ago
Thank you for explaining! Very clear! But I'm wondering: how do you know the WiT dataset is based on 50,000 queries and 20,000 pairs for each query? I can't find it in the paper.
@PyMLstudio 3 months ago
Thanks for the comment! Please see page 3, section 2.2, "Creating a sufficiently large dataset". But it's 500,000 queries, balanced with up to 20,000 (image, text) pairs per query.