The evaluation technique of visualizing the learning rate is something I’ve never seen before. Beautiful.
@Roboflow · 8 months ago
I’m super glad you liked it. I’ve invested a bit of time to make it look good.
@abdshomad · 8 months ago
Always the first to teach others all the computer vision tools... Thank you... 🎉
@Roboflow · 8 months ago
You are always first to comment! Thank you… 🙏🏻
@lucasbeyer2985 · 8 months ago
Great dive, I'm happy to see you figured this all out without us documenting it well yet! For the issue of bad performance on card detection, I think most of the reasons you listed contribute, except I don't think a larger model is needed. 1) It suffers from the same issue as pix2seq (and all max-likelihood detection models) of being "conservative" and needs either lots of tricks/augmentations (see pix2seq) to fix that, or RL-tuning (see the "Tuning computer vision models with task rewards" paper). 2) Order should be fine if trained for long enough. 3) Attention-only fine-tuning with SGD (what's done in the colab) is probably too weak for "harder" tasks like this. (not sure about this one)
@Roboflow · 8 months ago
Thanks a lot! I have almost no intuition with this category of models, so any ideas and suggestions are super helpful. I intend to spend more time battling this issue. Going to start here:
- arxiv.org/abs/2302.08242
- arxiv.org/abs/2109.10852
@Roboflow · 8 months ago
Also, if you need any help documenting it, I'm happy to help ;)
@lucasbeyer2985 · 8 months ago
@@Roboflow Yup, those two are good reads to start. Probably pix2seq first, as it introduces the general idea but also lots of hacks to make it work OK, and then RL-tuning second, because it shows how to skip all of the tricks in a nice way (with RL), but its writing assumes you kinda know pix2seq already.
@RipNonoRasta · 6 months ago
amazing work!
@Roboflow · 6 months ago
Thank you!
@gokayfem · 8 months ago
excellent tutorial!
@trezero · 8 months ago
I'd love to see your take on Phi 3 Vision with Grounding Dino.
@Roboflow · 8 months ago
I have not tried Phi 3 yet.
@Pingu_astrocat21 · 8 months ago
Thank you for this :)
@lovol2 · 8 months ago
Fantastic. Thank you.
@codeandrobotid2466 · 8 months ago
Good job, bro!
@AashishMtho · 5 months ago
Video on inference please!
@uttamdwivedi7709 · 7 months ago
Great tutorial @roboflow!!! Is it possible to train the same model for VQA as well as object detection? Can you provide any example of how the JSONL file should look in such cases?
@Roboflow · 7 months ago
We will soon add support for VQA datasets in Roboflow, and I plan to roll out tutorials covering this topic.
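In the meantime, here is a rough sketch of what a combined JSONL could look like. The image/prefix/suffix schema follows the convention the fine-tuning notebook uses; the filenames, loc token values, and classes below are made up for illustration:

```python
import json

# Hypothetical rows: every line shares the same image/prefix/suffix schema,
# so detection and VQA examples can live in one JSONL file.
rows = [
    # Detection: "detect <class>" prefix; four <locNNNN> tokens plus the
    # class name in the suffix.
    {"image": "cards_001.jpg",
     "prefix": "detect card",
     "suffix": "<loc0123><loc0456><loc0789><loc0987> card"},
    # VQA: the question goes in the prefix, the free-text answer in the suffix.
    {"image": "cards_001.jpg",
     "prefix": "how many cards are on the table?",
     "suffix": "3"},
]

# JSON Lines: one JSON object per line, no enclosing array.
with open("_annotations.train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```

Mixing both row types in one file lets a single fine-tuning run cover both tasks, since the model only ever sees prefix-to-suffix pairs.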
@유영재-c9c · 4 months ago
I think the LLaVA families we are familiar with are not designed specifically for detection or segmentation tasks, is that correct?
@MABatin-hd1mt · 7 months ago
As always, another great video. However, do you think few-shot detection is possible with PaliGemma, where I can fine-tune the model with as little as a single image rather than a large dataset like RF100?
@kaifshaheem7546 · 4 months ago
Once the model is saved locally in .npz format, how can it be loaded to run inference on different kinds of input?
@techtb2923 · 1 month ago
Did you find the answer?
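For anyone else stuck here, a minimal sketch of the reload step, assuming the .npz export is a flat archive of named arrays (the checkpoint filename and parameter name below are hypothetical; the demo writes its own tiny stand-in file so it runs as-is):

```python
import numpy as np

# Write a tiny stand-in "checkpoint" so the example is self-contained.
np.savez("paligemma_demo.npz", **{"img/embedding/kernel": np.zeros((2, 2))})

# Reload: np.load exposes every saved array by its name via .files.
ckpt = np.load("paligemma_demo.npz")
params = {name: ckpt[name] for name in ckpt.files}
print(sorted(params))  # ['img/embedding/kernel']
```

From there, the params dict can be plugged back into the same model definition used during training before calling its prediction function.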
@marcc0183 · 8 months ago
Thank you for this. One question: I've been researching for a while and I haven't found a way. I want to train a model that has information about my trips, meal plans with friends, etc., so I can ask it questions. I don't know whether it's best to use vector embeddings of the images, or to use vision models to convert the images to text and then, from this text together with the photos' metadata (via the Google Photos API), build a RAG over a model that holds this information as text. What do you recommend: vector embeddings with images, or RAG with text? All the best
@hegalzhang1457 · 6 months ago
Great work! Do you have example code for fine-tuning an OCR task?
@Roboflow · 6 months ago
Not yet, but I plan to play with fine-tuning for OCR and VQA tasks.
@mahirafser3736 · 5 months ago
I am getting 0 mAP and a blank confusion matrix on a medical image dataset (tooth endodontics). Can you tell me the reasons, @Roboflow?
@thesuriya_3 · 7 months ago
Can you also do VQA 😊 using PyTorch code?
@haricharan4474 · 6 months ago
When I run the last step in the notebook, I get an error saying "An error occured when getting the model upload URL: 404 Client Error: Not Found for url". How can I overcome this?
@BingxinYang-j7t · 7 months ago
If I want to fine-tune the PaliGemma model for the segmentation task, what dataset should I prepare?
@Roboflow · 7 months ago
We don't have a written tutorial, but I made a Google Colab some time ago. Would that be enough?
@BingxinYang-j7t · 7 months ago
@@Roboflow Sorry, I cannot find the Google Colab about fine-tuning the PaliGemma model for the segmentation task. Would you please give me the web address? Thank you very much.
@JingyunYang-i1r · 4 months ago
@@BingxinYang-j7t Have you found it yet? Thanks!
@BingxinYang-j7t · 4 months ago
@@JingyunYang-i1r No, I have not found it. 😮💨
@safiraghulam1862 · 7 months ago
Hi, can I fine-tune the model on a medical dataset? Currently, the model is not performing well on this data, and the results indicate that it is not suitable for medical data out of the box. If I fine-tune the model on my dataset, which consists of approximately 200 to 300 images, will it work better? Additionally, is it possible to quantize this model to reduce its size from 3B to something smaller without significantly compromising its performance? Thank you.
@Roboflow · 7 months ago
It is possible to fine-tune on medical images; I've done that for several use cases, like tumor detection.
@safiraghulam1862 · 7 months ago
@@Roboflow okay
@toobasheikh106 · 8 months ago
Hi, I fine-tuned the model on a small medical object detection dataset with just 223 training examples, and after training for more than 20 epochs it starts giving random predictions (classes that are not in the dataset and even some spam classes). Could you please highlight what the reason might be?
@Roboflow · 8 months ago
Can you send me the link to your dataset?
@toobasheikh106 · 8 months ago
@@Roboflow Sorry, it's a private dataset for organoid cell detection and has 4 classes.
@toobasheikh106 · 8 months ago
It gives me random predictions like this: metaphase; talaga; ExecuteAsync; tooth; JvmSlicer#
@suphotnarapong355 · 8 months ago
Thank you
@AkramKhanyt · 8 months ago
How did you deploy the model? Also, could you share code for installing the dependencies (supervision, PaliGemma)?
@Roboflow · 8 months ago
For now I deployed it simply by cloning this HF space: huggingface.co/spaces/big-vision/paligemma. But we are wrapping up the work on the github.com/roboflow/inference integration.
@IceStormSerenade · 7 months ago
@@Roboflow Hi, can you explain how the deployment works, please? I haven't used Gradio before, so I'm having difficulty with it...
@president2 · 8 months ago
What about Liquid AI? I saw a video from years ago. Any updates on this process?
@ttkrpink · 8 months ago
Great tutorial. I followed the tutorial online and everything worked great. When I tried it on a local machine, I got the following error on this line:
for image, _, caption in make_predictions(validation_data_iterator(), num_examples=4, batch_size=4):
ValueError: Sharding GSPMDSharding({devices=[3]
@Roboflow · 8 months ago
Do you have multiple GPUs locally?
@ttkrpink · 8 months ago
@@Roboflow Yes, I do have 3 GPUs.
@Roboflow · 8 months ago
@@ttkrpink Please try using a `batch_size` value that is divisible by 3: 3, 6, 9, 12, 15, something like that. It looks like it tries to split your data equally across all the GPUs, but you set batch_size=4.
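A tiny helper (not from the notebook) illustrating the rule: round the batch size up to the nearest multiple of the device count so each GPU gets an equal shard.

```python
def round_up_to_multiple(batch_size: int, num_devices: int) -> int:
    """Smallest multiple of num_devices that is >= batch_size."""
    return ((batch_size + num_devices - 1) // num_devices) * num_devices

# With 3 local GPUs, batch_size=4 cannot be sharded evenly; bump it to 6.
print(round_up_to_multiple(4, 3))  # 6
```

In a real run you would take num_devices from jax.device_count() rather than hard-coding 3.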
@shubhamkumarjha1868 · 3 months ago
What is the best model for training multiple bounding boxes?
@Roboflow · 3 months ago
If we talk about VLMs, that's probably Florence-2.
@shubhamkumarjha1868 · 3 months ago
@@Roboflow Thanks for the reply. 😀
@darklord96423 · 8 months ago
Good job, bro, but don't you think that, leaving aside the question answering about photos, YOLO is better than PaliGemma? I mean in detection.
@Roboflow · 8 months ago
YOLO is definitely a better detection model, so if you don’t care about other capabilities, you’ll probably be better off staying with YOLO.
@유영재-c9c · 4 months ago
JAX is difficult for me. Hugging Face is right for me 😅