OpenAI's CLIP for Zero Shot Image Classification

13,623 views

James Briggs


Comments: 24
@satishgoda 7 months ago
Thank you so much for explaining it in as simple terms as possible and CLIPping all the nerdy details.
@Velvet-Sunshine. 2 years ago
I found your video spellbinding and informative. I spent years learning methods of how to speak. It's an art. In a nutshell: experiment with microphones, they vary greatly. One in fifty is good enough to have a full range from 20 cycles to 18,000 cycles, and it needs to produce this on its own, straight-out flat, with no graphic equalizer. Hold the microphone 1/2 inch below your chin and 1/2 inch towards your throat, positioning the microphone under your lower jaw. Next, align and direct the microphone towards your throat. This will produce an incredible up-close-and-personal effect. The vocal cords in your throat are where the bass is produced and the sound from your mouth is where the highs are produced, but these alone produce an undesirable voice recording; the throat is what is needed to equalize the desired effect. Following this procedure is vital. I realize this would be difficult for a video if you are in the image. But the sound of your voice will captivate your audience when making recordings where you're not in the image. Some people use a clip-on microphone attached to their shirt. This is a complete failure. If you experiment, you may find your audience doesn't mind you holding the microphone in one hand under the chin as described above. But it's killer audio for your audience. Advice that took me years to master. Special note: when looking for a microphone, go to a guitar store where they sell the professional equipment, and make them plug the several microphones you are considering into an amp to test the performance of each. I can tell you microphones that cost over $1,000.00 are usually junk. Also, I prefer dynamic microphones made with coils and magnets, like they made in the 1970s, over microphones that are made with a computer chip. Good luck if you want to sound fabulous.
@ahmedwaly9073 6 months ago
Amazing explanation
@chyldstudios 2 years ago
Stable Diffusion will be integrating OpenAI's CLIP model into their architecture to improve the generation of novel images.
@UmarFaruk-f8t 8 days ago
Are you suggesting the image encoder has never seen any cassette player during its training and still it has formed an understanding of it in the latent space? I would disagree with this.
@SinanAkkoyun 1 year ago
Just a rather basic question: when using cosine similarity, and/or normalizing and then comparing dot products, doesn't this reduce information? My question is why the length of the vector doesn't play any role when looking for similarities.
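A minimal sketch of the point behind this question, using made-up three-dimensional vectors rather than real CLIP embeddings (the names `image_vec` and `text_vec` are hypothetical). It shows that cosine similarity is the dot product of the L2-normalized vectors, and that rescaling a vector changes the raw dot product but not the cosine similarity. So yes, vector length is discarded; only direction is compared.

```python
import math

def dot(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean (L2) length of a vector.
    return math.sqrt(dot(a, a))

def normalize(a):
    # Scale a vector to unit length, keeping its direction.
    n = norm(a)
    return [x / n for x in a]

def cosine_similarity(a, b):
    # Dot product divided by the product of the two lengths.
    return dot(a, b) / (norm(a) * norm(b))

# Hypothetical toy embeddings, not real model outputs.
image_vec = [0.2, 0.9, 0.4]
text_vec = [0.1, 0.8, 0.5]
scaled_text_vec = [10 * x for x in text_vec]  # same direction, 10x longer

sim = cosine_similarity(image_vec, text_vec)
sim_scaled = cosine_similarity(image_vec, scaled_text_vec)
sim_normalized_dot = dot(normalize(image_vec), normalize(text_vec))

raw_dot = dot(image_vec, text_vec)
raw_dot_scaled = dot(image_vec, scaled_text_vec)

print("cosine:", sim, "cosine after scaling:", sim_scaled)
print("dot of normalized vectors:", sim_normalized_dot)
print("raw dot:", raw_dot, "raw dot after scaling:", raw_dot_scaled)
```

The raw dot product grows tenfold with the longer vector while the cosine similarity stays the same, which is the information loss the question describes.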
@anukulkumarsingh220 1 year ago
What if I have multiple captions for a single image in my dataset? Should I combine them into a single string? Or can I associate multiple captions with the same image?
@fr3fou 11 months ago
In the example code for the `openai/clip-vit-base-patch32` model on Hugging Face, the logits output is converted into probabilities using softmax. In this video / article we don't do that. Why is that?
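A small sketch related to this question, using invented similarity scores rather than real model output (the labels and numbers below are hypothetical). Softmax is a monotonic transformation, so the label with the highest raw score is also the label with the highest probability. Whether that is the reason the video skips softmax is an assumption; the sketch only shows that the predicted class does not change.

```python
import math

def softmax(scores):
    # Subtract the maximum first for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest value.
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical class prompts and image-text similarity scores.
labels = ["a photo of a cassette player", "a photo of a dog", "a photo of a church"]
scores = [24.7, 18.2, 21.3]

probs = softmax(scores)

print("top label from raw scores:", labels[argmax(scores)])
print("top label from softmax:   ", labels[argmax(probs)])
print("probabilities sum to:", sum(probs))
```

Softmax is still useful when the scores need to be read as a probability distribution over the candidate labels, for example to apply a confidence threshold.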
@celestial_x 1 year ago
Once again, I am being referred to your videos; last time it was the Bag of Visual Words video. You are pretty famous among professors in India (not just professors, professors of IITs).
@jamesbriggs 1 year ago
Wow, thanks that's awesome!
@sailfromsurigao 1 year ago
Do you have material for finetuning CLIP to another dataset?
@helloansuman 1 year ago
Amazing video.
@Sonalikohli-s1t 8 months ago
Superb video, very informative. But I want to use zero-shot learning on a numeric dataset, meaning no text or image data. I want to train a zero-shot learning model with this type of dataset. Can you please help me with this task?
@mvrdara 2 years ago
Great explanation! We still need fine-tuning for novel datasets, right? Zero-shot learning can't fully eliminate fine-tuning and transfer learning?
@jamesbriggs 2 years ago
Yes, exactly. This works well for more generic use cases, but not all. For example, I have seen fine-tuning required for fashion items and satellite imagery.
@surajitchakraborty1903 2 years ago
Hi, the Pinecone article link does not seem to work. Are you able to provide the correct link?
@jamesbriggs 2 years ago
Oops I fixed it, the correct link is www.pinecone.io/learn/zero-shot-image-classification-clip/
@henkhbit5748 2 years ago
Impressive. Can it be used for face recognition, or for non-face images only?
@jamesbriggs 2 years ago
for specific faces it would need more fine-tuning, but otherwise I believe it should work - I'm working on a video now covering CLIP for object detection that should be helpful
@riyaml5332 1 year ago
Hi, this video was outstanding and very informative. I have a lot of images, each of which represents a separate class, and I don't know how to apply this code to them. It would mean a lot to me if you could assist in this matter.
@cloudshoring 2 years ago
Awesome!
@MogensBrun 3 months ago
Excellent description. I have thousands of images of design objects on a Mac FileMaker server, which can connect to gpt-4o or a similar AI model. I would be interested to hear your opinion on analysing these images according to a JSON file containing a few hundred design taxonomies (category name and description). You are welcome to contact me directly.
@shaheerzaman620 2 years ago
super cool!
@jamesbriggs 2 years ago
Thanks Shaheer!