OpenAI's CLIP for Zero Shot Image Classification

Views: 13,623

James Briggs

1 day ago

Comments: 24

@satishgoda 7 months ago

Thank you so much for explaining it in as simple terms as possible and CLIPping all the nerdy details.

@Velvet-Sunshine. 2 years ago

I found your video spellbinding and informative. I spent years learning methods of how to speak. It's an art. In a nutshell: experiment with microphones, they vary greatly. One in fifty is good enough to have a full range from 20 cycles to 18,000 cycles, and it needs to produce this on its own, straight-out flat, with no graphic equalizer. Hold the microphone 1/2 inch below your chin and 1/2 inch towards your throat, positioning the microphone under your lower jaw. Next, align and direct the microphone towards your throat. This will produce an incredible up-close-and-personal effect. The vocal cords in your throat are where the bass is produced and your mouth is where the highs are produced, but the mouth alone produces an undesirable voice recording; the throat is what is needed to balance the sound. Following this procedure is vital. I realize this would be difficult for a video if you are in the image, but the sound of your voice will captivate your audience when you make recordings where you're not in the image. Some people use a clip-on microphone attached to their shirt. This is a complete failure. If you experiment, you may find your audience doesn't mind you holding the microphone in one hand under the chin as described above, and it's killer audio for your audience. This is advice that took me years to master. Special note: when looking for a microphone, go to a guitar store where they sell professional equipment, and make them plug the microphones you are considering into an amp to test the performance of each. I can tell you that microphones that cost over $1,000.00 are usually junk. Also, I prefer dynamic microphones made with coils and magnets, like they made in the 1970s, over microphones that are made with a computer chip. Good luck if you want to sound fabulous.

@ahmedwaly9073 6 months ago

Amazing explanation

@chyldstudios 2 years ago

Stable Diffusion will be integrating OpenAI's CLIP model into their architecture to improve the generation of novel images.

@UmarFaruk-f8t 8 days ago

Are you suggesting the image encoder has never seen any cassette player during its training and still has formed an understanding of it in the latent space? I would disagree with this.

@SinanAkkoyun 1 year ago

Just a rather basic question: when using cosine similarity, and/or normalizing and then comparing dot products, doesn't this reduce information? My question is why the length of the vector doesn't play any role when looking for similarities.
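
On the question above: cosine similarity does discard vector length by design and keeps only direction. CLIP's contrastive objective compares L2-normalized embeddings, so magnitude is not what the model is trained to use for matching. A minimal pure-Python sketch, using made-up vectors rather than real CLIP outputs, showing that scaling a vector leaves its cosine similarity unchanged and that the dot product of normalized vectors equals the cosine similarity:

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit length, keeping its direction."""
    n = norm(a)
    return [x / n for x in a]

def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    return dot(a, b) / (norm(a) * norm(b))

# Hypothetical embeddings, chosen only for illustration.
a = [1.0, 2.0, 3.0]
b = [2.0, 0.5, 1.0]
scaled_a = [10.0 * x for x in a]  # same direction as a, ten times longer

cos_ab = cosine_similarity(a, b)
cos_scaled = cosine_similarity(scaled_a, b)       # length change has no effect
dot_normalized = dot(normalize(a), normalize(b))  # same value as cos_ab

print(cos_ab, cos_scaled, dot_normalized)
```

The raw dot product of `scaled_a` and `b` would be ten times larger than that of `a` and `b`, which is why normalizing first (or using cosine similarity directly) is the usual choice when only the direction of the embedding is meant to carry meaning.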

@anukulkumarsingh220 1 year ago

What if I have multiple captions for a single image in my dataset? Should I combine them into a single string, or can I associate multiple captions with the same image?

@fr3fou 11 months ago

In the example code for the `openai/clip-vit-base-patch32` model on Hugging Face, the logits output is converted into probabilities using softmax. In this video / article we don't do that. Why is that?
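
One possible reason, not confirmed in the video: softmax is monotonic, so it rescales the scores into probabilities without changing their order. If only the best-matching label is needed, taking the argmax over the raw logits gives the same answer. A small sketch with hypothetical logit values (not taken from any real model run):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical image-to-text similarity logits for three candidate labels.
logits = [24.5, 19.2, 21.7]
probs = softmax(logits)

best_by_logit = argmax(logits)
best_by_prob = argmax(probs)

print(best_by_logit, best_by_prob, probs)
```

Softmax is still useful when a calibrated-looking confidence per label is wanted, for example to threshold uncertain predictions.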

@celestial_x 1 year ago

Once again, I am being referred to your videos. Last time it was the Bag of Visual Words video. You are pretty famous among professors in India (not just any professors, professors at IITs).

@jamesbriggs 1 year ago

Wow, thanks that's awesome!

@sailfromsurigao 1 year ago

Do you have material for finetuning CLIP to another dataset?

@helloansuman 1 year ago

Amazing video.

@Sonalikohli-s1t 8 months ago

Superb video, so informative. But I want to use zero-shot learning on a numeric dataset, meaning not a text or image dataset. I want to train a zero-shot learning model on this type of dataset. Can you please help me with this task?

@mvrdara 2 years ago

Great explanation! We still need fine-tuning for novel datasets, right? Zero-shot learning can't fully eliminate fine-tuning and transfer learning?

@jamesbriggs 2 years ago

yes exactly - this works well for more generic use-cases, but not all - for example I have seen fine-tuning required for fashion items and satellite imagery

@surajitchakraborty1903 2 years ago

Hi, the Pinecone article link does not seem to work. Are you able to provide the correct link?

@jamesbriggs 2 years ago

Oops I fixed it, the correct link is www.pinecone.io/learn/zero-shot-image-classification-clip/

@henkhbit5748 2 years ago

Impressive. Can it be used for face recognition, or for non-face images only?

@jamesbriggs 2 years ago

for specific faces it would need more fine-tuning, but otherwise I believe it should work - I'm working on a video now covering CLIP for object detection that should be helpful

@riyaml5332 1 year ago

Hi, this video was outstanding and very informative. I have a lot of images, each of which represents a separate class, and I don't know how to apply this code to them. It would mean a lot to me if you could assist with this.

@cloudshoring 2 years ago

Awesome!

@MogensBrun 3 months ago

Excellent description. I have thousands of images of design objects on a Mac FileMaker server, which can connect to gpt-4o or a similar AI model. I would be interested to hear your opinion on analysing these images against a JSON file containing a few hundred design taxonomies (category name and description). You are welcome to contact me directly.