Mastering Google's VLM PaliGemma: Tips And Tricks For Success and Fine Tuning

10,579 views

Sam Witteveen

1 day ago

Comments: 20
@amandamate9117 4 months ago
Excellent video, can't wait for more visual model examples, especially with ScreenAI for agents that browse the web.
@OumarDicko-c5i 4 months ago
Thank you for your video.
@paulmiller591 4 months ago
This is an exciting sub-field. We have a lot of clients making observations, so we're keen to try this. Happy travels, Sam.
@FirstArtChannel 4 months ago
Inference time and model size still seem reasonably longer/larger than for a multimodal LLM such as LLaVA, or am I wrong?
@samwitteveenai 4 months ago
Honestly, it's been a while since I played with LLaVA, and I have mostly just used it on Ollama, so I'm not sure how it compares. Phi3-Vision is also worth checking out. I may make a video on that as well.
@sundarrajendiran2722 1 month ago
Can we upload multiple images in the demo and ask questions whose answer is in any one of the images?
@SenderyLutson 4 months ago
I think the Aria dataset from Meta is also open.
@samwitteveenai 4 months ago
Interesting dataset. I didn't know about this. Thanks.
@miguelalba2106 3 months ago
Do you know how complete the dataset needs to be for fine-tuning? I have lots of image-text pairs of clothes, but some have more detail than others, so I guess the model will be confused during training. E.g., there are thousands of images of dresses labeled with only the color, and thousands labeled with color plus other details.
@SonGoku-pc7jl 4 months ago
Thanks, we will see Phi-3 with vision to compare :)
@AngusLou 4 months ago
Is it possible to make the whole thing local?
@ricardocosta9336 4 months ago
Ty my dude
@willjohnston8216 4 months ago
Do you know if they are going to release a model for real-time video sentiment analysis? I thought there was a demo of that by either Google or OpenAI?
@samwitteveenai 4 months ago
Not sure, but you can do some of this already with Gemini, just not in real time (publicly at least).
@SenderyLutson 4 months ago
How much VRAM does this model consume while running? And the Q4 version?
@samwitteveenai 4 months ago
The inference was running on a T4, so it is manageable. The fine-tuning (FT) was on an A100.
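For a rough sense of the numbers behind this answer, here is a back-of-the-envelope sketch of weights-only memory at different precisions. It assumes PaliGemma has roughly 3 billion parameters (the thread does not state the size) and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Weights-only memory estimate. This is NOT a measurement of PaliGemma;
# it ignores activations, KV cache, and framework overhead.
PARAM_COUNT = 3e9  # assumption: roughly 3B parameters, not stated in the thread

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,  # rough stand-in for a Q4-style quantization
}


def weight_memory_gib(param_count: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return param_count * bytes_per_param / 1024**3


estimates = {
    name: weight_memory_gib(PARAM_COUNT, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gib in estimates.items():
    print(f"{name:>8}: ~{gib:.1f} GiB")
```

Under these assumptions the half-precision weights come to well under the 16 GB of a T4, which is consistent with inference being manageable there; a 4-bit version would need roughly a quarter of that for weights.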
@unclecode 4 months ago
Fascinating. I wonder if there is any example of fine-tuning for segmentation. If so, the way we collate the data should be different. I have one question about the code at 15:30 in the video. I noticed a part that splits the dataset into train and test, but after the split it says `train_ds = split_ds["test"]`. Shouldn't that be "train"? I think that might be a mistake. What do you think? Very interesting content, especially if the model has the general knowledge to get into a game like your McDonald's example. This definitely has great applications in the medical and education fields as well. Thank you for the content.
@samwitteveenai 4 months ago
Just look at the output from the model when you do segmentation and copy that. Yes, you will need to update the collate function. The "test" part is correct because it is just setting it to train on a very small number of examples. In a real training run, yes, use the 'train' split, which is 95% of the data, as opposed to the 5% in the test split.
@unclecode 4 months ago
@samwitteveenai Oh ok, that was just for the video demo, thx for the clarification 👍
@unclecode 4 months ago
@samwitteveenai Thx, I get it now, the "test" split is just for the demo in this Colab. Although it would've been clearer if they had used a subset of, say, 100 rows from the train split. I experimented a bit, and the model is super friendly to fine-tuning. Whatever they did, it made this model really easy to tune. We're in a time where "tune-friendly" actually makes sense.
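The split discussed in this thread can be sketched in plain Python. This is a stand-in, not the Colab's code: the `train_test_split` helper, the 2,000 placeholder records, and the variable names other than `split_ds` are hypothetical, and the helper only mimics the shape of a split that returns "train" and "test" parts.

```python
import random


def train_test_split(rows, test_size=0.05, seed=0):
    """Shuffle rows and return a dict with 'train' and 'test' lists.

    Hypothetical helper that mimics the shape of a dataset split;
    it is not the function used in the Colab.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return {"train": shuffled[n_test:], "test": shuffled[:n_test]}


# Placeholder records standing in for image/caption pairs.
rows = [{"image_id": i, "caption": f"caption {i}"} for i in range(2000)]
split_ds = train_test_split(rows, test_size=0.05)

# Demo shortcut: train on the small 5% split so the run finishes quickly.
demo_train_ds = split_ds["test"]

# Real training run: use the 95% split.
full_train_ds = split_ds["train"]

# Alternative raised in the thread: a fixed 100-row subset of the train split.
small_train_ds = split_ds["train"][:100]

print(len(demo_train_ds), len(full_train_ds), len(small_train_ds))
```

Taking the small subset from the train split keeps the held-out 5% untouched for evaluation, which is the main reason to prefer it over training on "test" outside of a quick demo.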