Views: 7,731
Learn how to fine-tune PaliGemma, Google's open-source vision-language model, for custom object detection. This step-by-step tutorial walks you through modifying Google's notebook to train PaliGemma on your own dataset. We'll use the handwritten digits and math operations dataset from RF100, explore the JSONL annotation format, and demonstrate how to deploy your fine-tuned model for real-world inference. You'll also see what PaliGemma can do for image captioning, visual question answering (VQA), and object detection, and learn how to work around its limitations.
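For readers who want a feel for the JSONL format before watching, here is a minimal Python sketch of what one detection record might look like and how it could be decoded. It rests on assumptions not stated in this description: that each line holds `image`, `prefix`, and `suffix` keys, that boxes are written as four `<locNNNN>` tokens in `y_min, x_min, y_max, x_max` order binned to 1024 steps, and that multiple detections are joined with ` ; `. The file name, class labels, and the `parse_detections` helper are hypothetical.

```python
import json
import re

# Assumed token layout: <loc y_min><loc x_min><loc y_max><loc x_max> label
# Each coordinate is assumed to be binned into 1024 steps (0000..1023).
LOC_PATTERN = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(suffix, image_width, image_height):
    """Hypothetical helper: decode a detection suffix into pixel-space boxes."""
    boxes = []
    for y_min, x_min, y_max, x_max, label in LOC_PATTERN.findall(suffix):
        boxes.append({
            "label": label.strip(),
            "x_min": int(x_min) / 1024 * image_width,
            "y_min": int(y_min) / 1024 * image_height,
            "x_max": int(x_max) / 1024 * image_width,
            "y_max": int(y_max) / 1024 * image_height,
        })
    return boxes


# One illustrative JSONL line (file name and labels are made up).
record_line = json.dumps({
    "image": "example.jpg",
    "prefix": "detect 7 ; plus",
    "suffix": "<loc0256><loc0512><loc0768><loc1023> 7 ; "
              "<loc0000><loc0000><loc0512><loc0256> plus",
})

record = json.loads(record_line)
detections = parse_detections(record["suffix"], image_width=1024, image_height=512)
for detection in detections:
    print(detection)
```

A full dataset file would simply contain one such JSON object per line; check the notebook linked below for the exact keys and token conventions it expects.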
Chapters:
- 00:00 PaliGemma Capabilities
- 02:03 Environment Setup
- 05:25 Dataset Format
- 09:07 Downloading Pre-trained Model
- 11:27 Loading Dataset
- 13:45 Training and Evaluating the Model
- 15:19 Deploying the Model
- 17:37 Important Considerations
- 20:02 Outro
Resources:
- Roboflow: roboflow.com
- 🔴 Community Session June 6th, 2024 at 08:00 AM PST / 11:00 AM EST / 05:00 PM CET: roboflow.stream
- ⭐ Notebooks GitHub: github.com/roboflow/notebooks
- ⭐ Supervision GitHub: github.com/roboflow/supervision
- 📓 PaliGemma notebook: colab.research.google.com/git...
- 🗞 Gemma arXiv paper: arxiv.org/pdf/2403.08295
- 🗞 SigLIP arXiv paper: arxiv.org/pdf/2303.15343
- 🗞 PaliGemma blog post: blog.roboflow.com/how-to-fine...
- 🔗 RF100: www.rf100.org
- 🔗 PaliGemma model card: www.kaggle.com/models/google/...
- 🔗 PaliGemma fine-tuned checkpoints: huggingface.co/collections/go...
- 🔗 PaliGemma HF Space: huggingface.co/spaces/big-vis...
Stay updated with the projects I'm working on at github.com/roboflow and github.com/SkalskiP! ⭐