ColPali: Vision-Based RAG System For Complex Documents

Views: 18,629

Prompt Engineering

A day ago

Comments: 38
@manishsharma2211 A month ago
One thing I like about this man is that he gives some background on each line, framework, and library used, so people become aware of the nuanced interactions between the projects and the researchers involved. Love that.
@israelazarkovitch5852 2 months ago
ColPali is an excellent technique for English documents. With non-English documents, retrieval doesn't work well, because ColPali uses the PaliGemma model, a relatively small model trained mostly on an English dataset.
@engineerprompt 2 months ago
Good point, but I think you can fine-tune the vision model for other languages. Qwen is probably a good option there as well. I will see if there are any resources available and will share.
@vardhan254 2 months ago
Qwen2-VL is good for Indic languages, at least from what I have tested.
@saranepalashok 2 months ago
Excellent. Exactly what I was looking for. A "fine-tuning" episode of such a VBRAG pipeline would be a great followup episode.
@engineerprompt 2 months ago
Good idea, will look into it.
@frag_it 2 months ago
Can you make an end-to-end project where, instead of an index, we send the embeddings to a vector store like ChromaDB or Pinecone? That would be amazing.
@darpnpro A month ago
Thank you for sharing this!
@manishsharma2211 A month ago
EXCELLENT VIDEO - THANK YOU
@kai_s1985 2 months ago
Great work! Thanks! I wonder how it compares to vanilla RAG for text PDFs in terms of accuracy? Vanilla RAG suffers when the answer to the user's question needs to be synthesized from different parts of the text. GraphRAG is good for those cases, but it is slow and expensive. Can this handle complex questions like those?
@LTBLTBLTBLTB 2 months ago
I tried this technique with gemini-1.5-flash-exp-0827 and it works fine.
@wdonno 2 months ago
How do you "chunk" or parse sections out of longer documents? And what if we want to create a knowledge graph? The final analysis is done by an LLM, so we still have context-length issues, especially for local implementations. Can you extract the text itself for further processing?
@shobhitbishop 2 months ago
Will this work properly on PDFs containing detailed tabular information? And on hand-drawn images?
@BACA01 2 months ago
Very good content, thank you.
@engineerprompt 2 months ago
Glad you liked it!
@loicbaconnier9150 2 months ago
Why do we need a large-VRAM GPU? Is it for ColPali or for the VLM?
@mrrohitjadhav470 A month ago
SAME QUESTION FROM ME
@IdPreferNot1 2 months ago
Cool find in Claudette
@saeeds851 13 days ago
Would this approach work well for summarization with Qwen2-VL 7B running locally, for technical papers with diagrams? Thank you.
@engineerprompt 12 days ago
Yes, check out the localgpt-vision project, which implements an end-to-end vision-based RAG. kzbin.info/www/bejne/j4HWZZh9edV8j5Y
@Yes-lm9dq 2 months ago
Do you think one could use this and convert a pdf into a text file which can be used to generate a knowledge graph using Microsoft's GraphRAG?
@tecnom7133 2 months ago
Thanks
@tecnom7133 2 months ago
Maybe if you pass an image URL instead of the image bytes, you will consume fewer input tokens and so lower the cost?
@absar66 2 months ago
Many thanks for this great video. I have a set of scanned pages saved as a PDF. Will this work? Thanks.
@engineerprompt 2 months ago
Yes, I think this approach will work on scanned pages as well.
@diego.castronuovo A month ago
What if the information needed to answer a question spans two consecutive pages, and only the first one is retrieved because the second contains only the continuation of the first? This is a real problem.
@engineerprompt A month ago
You can ask to retrieve multiple images/pages or can append the neighboring pages to your context. To make all this simple, I have put together an OSS, video coming soon: github.com/PromtEngineer/localGPT-Vision
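The neighboring-pages suggestion in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from localGPT-Vision: the function name `expand_with_neighbors`, the zero-based page indices, and the `window` parameter are all hypothetical.

```python
def expand_with_neighbors(retrieved_pages, num_pages, window=1):
    """Return the retrieved page indices plus up to `window` neighbors on each side.

    Hypothetical helper: assumes zero-based page indices and a document
    with `num_pages` pages. Indices are clamped to the document bounds.
    """
    expanded = set()
    for page in retrieved_pages:
        start = max(0, page - window)
        end = min(num_pages - 1, page + window)
        expanded.update(range(start, end + 1))
    return sorted(expanded)


# Example: the retriever returned only page 4 of a 10-page document.
pages = expand_with_neighbors([4], num_pages=10, window=1)
print(pages)
```

The page images at the returned indices would then be passed to the vision model together, so an answer that continues onto the next page is still in context. The trade-off is more input tokens per query.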
@amortalbeing 2 months ago
Thanks
@goran-ai 2 months ago
What is the best way to contact you for consulting with our dev company?
@RedCloudServices 2 months ago
I wonder if a VBRAG could perform math calculations extracted from an image table? 🤔 I suppose if the results are accurate they could then be passed to another agent capable of calculations on the result?
@engineerprompt 2 months ago
Math might be a little hard, but I think it's worth trying.
@user-uk9ls 2 months ago
Does this not run on a local Nvidia RTX 4x 16 RAM GPU?
@engineerprompt 2 months ago
I think that will be able to run the pipeline.
@neatpaul 2 months ago
But this works only for PDF. What about DOCX, PPTX, and EPUB files? I want to work multimodally on those files too.
@fra4897 2 months ago
It works with anything that can be converted to an image, so everything.
@ayanshproplayer5559 2 months ago
Does it work offline?
@engineerprompt 2 months ago
If you watch the video, you will know the answer :)
@kareemyoussef2304 2 months ago
None of these solutions are open source, even in your other videos. I think the video you have that uses marker is the only one.