PaliGemma by Google: Train Model on Custom Detection Dataset

12,695 views

Roboflow

1 day ago

Comments: 57
@rairorr · 8 months ago
The evaluation technique of visualizing the learning rate is something I’ve never seen before. Beautiful.
@Roboflow · 8 months ago
I’m super glad you liked it. I’ve invested a bit of time to make it look good.
@abdshomad · 8 months ago
Always the first to teach others all computer vision tools ... Thank you ... 🎉
@Roboflow · 8 months ago
You are always first to comment! Thank you… 🙏🏻
@lucasbeyer2985 · 8 months ago
Great dive, I'm happy to see you figured this all out without us documenting it well yet! For the issue about bad performance on card detection, I think most of the reasons you listed contribute, except I don't think a larger model is needed. 1) It suffers from the same issues as pix2seq (and all max-likelihood detection models) of being "conservative" and needing either lots of tricks/augmentations (see pix2seq) to fix that, or RL-tuning (see the "Tuning computer vision models with task rewards" paper). 2) Order should be fine if trained for long enough. 3) Probably attention-only fine-tuning with SGD (what's done in the colab) is too weak for "harder" tasks like this (not sure about this one).
@Roboflow · 8 months ago
Thanks a lot! I have almost no intuition with this category of models, so any ideas and suggestions are super helpful. I intend to spend more time battling this issue. Going to start here: - arxiv.org/abs/2302.08242 - arxiv.org/abs/2109.10852
@Roboflow · 8 months ago
Also if you need any help documenting it I’m happy to help ;)
@lucasbeyer2985 · 8 months ago
@@Roboflow yup, those two are good reads to start. Probably pix2seq first, as it introduces the general idea but also lots of hacks to make it work OK, and then RL-tuning second, because it then shows how to skip all of the tricks in a nice way (with RL), but the writing assumes you kinda know pix2seq already.
@RipNonoRasta · 6 months ago
amazing work!
@Roboflow · 6 months ago
Thank you!
@gokayfem · 8 months ago
excellent tutorial!
@trezero · 8 months ago
I'd love to see your take on Phi 3 Vision with Grounding Dino.
@Roboflow · 8 months ago
I have not tried Phi 3 yet
@Pingu_astrocat21 · 8 months ago
Thank you for this :)
@lovol2 · 8 months ago
Fantastic. Thank you.
@codeandrobotid2466 · 8 months ago
Good job, bro!
@AashishMtho · 5 months ago
Video on inference please!
@uttamdwivedi7709 · 7 months ago
Great tutorial @roboflow !!! Is it possible to train the same model for VQA as well as object detection? Can you provide an example of what the JSONL file should look like in such cases?
@Roboflow · 7 months ago
We will soon add support for VQA datasets in Roboflow. I plan to roll out tutorials covering this topic soon.
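For reference, here is a minimal sketch of what a mixed detection + VQA JSONL might look like, assuming the "image" / "prefix" / "suffix" keys used by the PaliGemma fine-tuning Colab; the file names, class names, question, and location-token values below are made-up placeholders, not data from the video.

```python
# A hedged sketch of mixed detection + VQA JSONL lines (assumption: the
# "image"/"prefix"/"suffix" format from the PaliGemma fine-tuning Colab;
# all file names, classes, and token values are placeholders).
import json

examples = [
    {   # detection: suffix encodes each box as four <locYYYY> tokens
        # (y_min, x_min, y_max, x_max, binned to 0-1023) plus the class name,
        # with multiple objects separated by " ; "
        "image": "card_001.jpg",
        "prefix": "detect 10 of clubs ; queen of hearts",
        "suffix": "<loc0256><loc0128><loc0768><loc0512> 10 of clubs ; "
                  "<loc0300><loc0600><loc0800><loc0950> queen of hearts",
    },
    {   # VQA: plain question in the prefix, free-text answer in the suffix
        "image": "card_001.jpg",
        "prefix": "answer en how many cards are visible?",
        "suffix": "2",
    },
]

with open("dataset.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```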
@유영재-c9c · 4 months ago
I think the LLaVA families we are familiar with are not built specifically for detection or segmentation tasks, is that correct?
@MABatin-hd1mt · 7 months ago
As always, another great video. However, do you think few-shot detection is possible with PaliGemma, where I can fine-tune the model with as little as a single image rather than a large dataset like RF100?
@kaifshaheem7546 · 4 months ago
Once the model is saved locally in .npz format, how can it be loaded to make inferences on different kinds of input?
@techtb2923 · 1 month ago
Did you find the answer?
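For anyone stuck here, one possible approach is sketched below. It assumes the checkpoint was written with np.savez as a flat {path: array} mapping, as in the fine-tuning Colab; the file name and the "/"-separated path convention are assumptions, and the rebuilt params tree would then be passed to the same predict/decode helpers (e.g. make_predictions) defined in that notebook.

```python
# A minimal sketch of loading a fine-tuned .npz checkpoint (assumption:
# a flat {path: array} mapping saved with np.savez, "/"-separated paths;
# the file name is a placeholder).
import numpy as np
import jax.numpy as jnp

flat = dict(np.load("paligemma_finetuned.npz"))  # {"img/...": arr, "llm/...": arr, ...}

# Rebuild the nested params tree the notebook's prediction helpers expect.
params = {}
for path, value in flat.items():
    node = params
    *parents, leaf = path.split("/")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = jnp.asarray(value)

# `params` can then be fed to the same decoding code used during training
# (e.g. the make_predictions helper from the Colab).
```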
@marcc0183 · 8 months ago
Thank you for this. One question: I've been researching for a while and I haven't found a way. I want to train a model that has information about my trips, meal plans with friends, etc. so I can ask it questions. I don't know whether it is best to use vector embeddings of the images, or to use vision models to convert the images to text and then, together with the metadata of the photos (from the Google Photos API), build a RAG over that text. What do you recommend: vector embeddings with images or RAG with text? All the best
@hegalzhang1457 · 6 months ago
Great work! Do you have example code for fine-tuning an OCR task?
@Roboflow · 6 months ago
Not yet. But I plan to play with fine-tuning for OCR and VQA tasks.
@mahirafser3736 · 5 months ago
I am getting 0 mAP and a blank confusion matrix on a medical image dataset (tooth endodontics). Can you tell me the reasons @Roboflow?
@thesuriya_3 · 7 months ago
Can you also do VQA 😊 using PyTorch code?
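Since PaliGemma is also published as a Hugging Face checkpoint, a minimal PyTorch sketch of VQA inference might look like the following; the checkpoint id, image path, and question are assumptions, and this is not the JAX fine-tuning code shown in the video.

```python
# A minimal PyTorch/Transformers sketch for VQA inference (assumption:
# the google/paligemma-3b-mix-224 checkpoint; image path and question
# are placeholders).
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("card.jpg")
prompt = "answer en what suit is the card on the left?"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# generate() returns prompt + answer tokens; strip the prompt part.
answer = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(answer)
```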
@haricharan4474 · 6 months ago
When I run the last step in the notebook I get an error saying "An error occured when getting the model upload URL: 404 Client Error: Not Found for url". How do I overcome this?
@BingxinYang-j7t · 7 months ago
If I want to fine-tune the PaliGemma model for the segmentation task, what dataset should I prepare?
@Roboflow · 7 months ago
We don't have a written tutorial, but I made a Google Colab some time ago. Would that be enough?
@BingxinYang-j7t · 7 months ago
@@Roboflow Sorry, I cannot find the Google Colab about fine-tuning the PaliGemma model for the segmentation task. Would you please give me the web address? Thank you very much.
@JingyunYang-i1r · 4 months ago
@@BingxinYang-j7t Have you found it yet? Thanks!
@BingxinYang-j7t · 4 months ago
@@JingyunYang-i1r No, I have not found it.😮‍💨
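For readers of this thread, a hedged sketch of what a segmentation training example might look like in the same JSONL format: the suffix is assumed to hold four <locYYYY> box tokens followed by sixteen <segYYY> mask tokens and the class name; the file name, class, and all token values below are made-up placeholders.

```python
# A hedged sketch of a segmentation example in the same JSONL format
# (assumption: suffix = 4 <locYYYY> box tokens + 16 <segYYY> mask tokens
# + class name; every value here is a placeholder).
import json

segmentation_example = {
    "image": "card_007.jpg",
    "prefix": "segment queen of hearts",
    "suffix": (
        "<loc0210><loc0101><loc0771><loc0489>"
        + "".join(f"<seg{i:03d}>" for i in [12, 97, 3, 45, 101, 8, 63, 77,
                                            19, 120, 5, 88, 54, 9, 111, 30])
        + " queen of hearts"
    ),
}

with open("segmentation.jsonl", "a") as f:
    f.write(json.dumps(segmentation_example) + "\n")
```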
@safiraghulam1862 · 7 months ago
Hi, can I fine-tune the model on a medical dataset? Currently, the model is not performing well on this data, and the results indicate that it is not suitable for medical data out-of-the-box. If I fine-tune the model on my dataset, which consists of approximately 200 to 300 images, will it work better? Additionally, is it possible to quantize this model to reduce its size from 3B to something smaller without significantly compromising its performance? Thank you.
@Roboflow · 7 months ago
It is possible to fine-tune on medical images. I've done that for several use cases, like tumor detection.
@safiraghulam1862 · 7 months ago
@@Roboflow okay
@toobasheikh106 · 8 months ago
Hi, I fine-tuned the model on a small medical object detection dataset with just 223 training examples, and after training for more than 20 epochs it starts giving random predictions (classes that are not in the dataset and even some spam classes). Could you please highlight what the reason could be?
@Roboflow · 8 months ago
Can you send me the link to your dataset?
@toobasheikh106 · 8 months ago
@@Roboflow sorry it's a private dataset for organoid cell detection and has 4 classes
@toobasheikh106 · 8 months ago
It gives me random predictions like this: metaphase; talaga; ExecuteAsync; tooth; JvmSlicer#
@suphotnarapong355 · 8 months ago
Thank you
@AkramKhanyt · 8 months ago
How did you deploy the model? Also, please share code for how to install the dependencies (supervision, PaliGemma) ...
@Roboflow · 8 months ago
For now I deployed it simply by cloning this HF space: huggingface.co/spaces/big-vision/paligemma. But we are wrapping up work on the github.com/roboflow/inference integration.
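If it helps anyone following along, duplicating that Space under your own account can also be done programmatically; a small sketch using huggingface_hub is below. The target Space name is a hypothetical placeholder, and you would still need to swap your fine-tuned weights into the duplicated Space.

```python
# A small sketch: duplicate the referenced Space to your own account
# (assumption: you are logged in via `huggingface-cli login`; the target
# Space name is a placeholder).
from huggingface_hub import duplicate_space

repo = duplicate_space(
    from_id="big-vision/paligemma",
    to_id="your-username/paligemma-cards",  # hypothetical target
    private=True,
)
print(repo)  # URL of the duplicated Space
```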
@IceStormSerenade · 7 months ago
@@Roboflow Hi, can you explain how the deployment works, please? I haven't used Gradio before so I am having difficulty with it...
@president2 · 8 months ago
What about Liquid AI? I saw a video from years ago. Any updates on this process?
@ttkrpink · 8 months ago
Great tutorial. I followed the tutorial online and everything worked great. When I tried it on a local machine, I got the following error on the line: for image, _, caption in make_predictions(validation_data_iterator(), num_examples=4, batch_size=4): ValueError: Sharding GSPMDSharding({devices=[3]
@Roboflow · 8 months ago
Do you have multiple GPUs locally?
@ttkrpink · 8 months ago
@@Roboflow yes, I do have 3 GPUs
@Roboflow · 8 months ago
@@ttkrpink please try using a `batch_size` value that is divisible by 3: 3, 6, 9, 12, 15, something like this. It looks like it tries to split your data equally across all the GPUs, but you set batch_size=4.
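A tiny sketch of that idea, for anyone hitting the same sharding error: query JAX for the local device count and round the batch size up to a multiple of it (variable names are placeholders).

```python
# Round the batch size up to a multiple of the local device count so JAX
# can shard each batch evenly across the available GPUs.
import jax

num_devices = jax.local_device_count()   # e.g. 3 on the machine above
batch_size = 4                           # the value that triggered the error

if batch_size % num_devices != 0:
    batch_size = ((batch_size + num_devices - 1) // num_devices) * num_devices

print(f"Using batch_size={batch_size} across {num_devices} device(s)")
```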
@shubhamkumarjha1868 · 3 months ago
What is the best model for training with multiple bounding boxes?
@Roboflow · 3 months ago
If we talk about VLMs, that’s probably Florence-2
@shubhamkumarjha1868 · 3 months ago
@@Roboflow Thanks for the reply. 😀
@darklord96423 · 8 months ago
Good job, bro, but don't you think that, leaving aside answering questions about the photo, YOLO is better than PaliGemma? I mean in detection.
@Roboflow · 8 months ago
YOLO is definitely a better detection model, so if you don’t care about other capabilities, you’ll probably be better off staying with YOLO.
@유영재-c9c · 4 months ago
JAX is difficult for me. Hugging Face is right for me 😅