Multimodal RAG with Qwen-2 and ColPali: Ask Questions from Images 🔥

Рет қаралды 3,096

19 күн бұрын

In this tutorial, I demonstrate how to use Qwen-2-VL-7B Instruct and ColPali for building a multimodal RAG engine. You'll learn how to process a PDF containing images and ask questions about those images. I also walk you through the indexing process using ColPali, making document retrieval easy and efficient. All the coding is done in Colab for ease of use. 😊
Don't forget to like, comment, and subscribe for more tutorials! 🔥📚
GitHub: github.com/AIAnytime/MultiModal-RAG-using-Qwen-2-VL-and-Colpali
Colpali GitHub: github.com/illuin-tech/colpali?tab=readme-ov-file
Byalidi GitHub: github.com/AnswerDotAI/byaldi
Qwen2 VL: huggingface.co/Qwen/Qwen2-VL-7B-Instruct
Join this channel to get access to perks:
kzbin.info/door/-zVytOQB62OwMhKRi0TDvgjoin
To further support the channel, you can contribute via the following methods:
Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
UPI: sonu1000raw@ybl
#qwen2vl #multimodal #rag #ai

Пікірлер: 11

@gerhardheinzerling9880 15 күн бұрын

Thank you so much for the video. Just great! We have got PDFs with vector graphics in it. So we can just simple get the images from the PDF. Any idea?

@samketola919 18 күн бұрын

How can we extract images along with their figure captions from a PDF?

@mahajanvinod97 18 күн бұрын

I’m encountering an issue where, when I ask a question, the system immediately searches the document for a solution. How can I prevent this? I want the LLM to first fully understand the problem before searching for an answer in the document. Could you please help me with this?