Multi-modal RAG: Chat with Docs containing Images

15,558 views

Prompt Engineering

24 days ago

Learn how to build a multimodal RAG system using the CLIP model.
LINKS:
Notebook: tinyurl.com/pfc64874
Flow charts in the paper:
tinyurl.com/4pp78xuf
tinyurl.com/5yeww5py
tinyurl.com/4un6y6x5
tinyurl.com/2jkbb3ma
💻 RAG Beyond Basics Course:
prompt-s-site.thinkific.com/courses/rag
Let's Connect:
🦾 Discord: discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: www.patreon.com/PromptEngineering
💼Consulting: calendly.com/engineerprompt/consulting-call
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Sign up for the localGPT newsletter:
tally.so/r/3y9bb0
00:00 Introduction to Multimodal RAG Systems
01:24 First Approach: Unified Vector Space
02:23 Second Approach: Grounding Modalities to Text
03:57 Third Approach: Separate Vector Stores
06:26 Code Implementation: Setting Up
09:05 Code Implementation: Downloading Data
11:13 Code Implementation: Creating Vector Stores
14:00 Querying the Vector Store
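
A minimal sketch of the third approach listed in the chapters above (separate vector stores), offered as an illustration and not as the notebook's actual code. The `VectorStore` class, the `retrieve` helper, the three-dimensional toy vectors, and the payload names are all hypothetical. In a real system the vectors would come from embedding models (for example CLIP for images) and the stores would live in a vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory store: a list of (vector, payload) pairs."""

    def __init__(self):
        self.items = []

    def add(self, vector, payload):
        self.items.append((vector, payload))

    def search(self, query, k=1):
        # Score every stored vector against the query, best match first.
        scored = [(cosine(query, vec), payload) for vec, payload in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# One store per modality. The vectors are made-up stand-ins for embeddings.
text_store = VectorStore()
text_store.add([0.9, 0.1, 0.0], {"content": "chunk about batteries"})
text_store.add([0.0, 0.2, 0.9], {"content": "chunk about architecture"})

image_store = VectorStore()
image_store.add([0.8, 0.2, 0.1], {"path": "battery_diagram.png"})
image_store.add([0.1, 0.1, 0.9], {"path": "building_photo.jpg"})


def retrieve(text_query_vec, image_query_vec, k=1):
    """Query both stores separately and return the payloads side by side."""
    return {
        "texts": [p for _, p in text_store.search(text_query_vec, k)],
        "images": [p for _, p in image_store.search(image_query_vec, k)],
    }


results = retrieve([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], k=1)
```

`retrieve` takes two query vectors because the text store and the image store may be built with different embedding models, so the same question would be embedded once per store. The retrieved text chunks and images would then be passed together to a multimodal LLM.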
All Interesting Videos:
Everything LangChain: kzbin.info/aero/PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr
Everything LLM: kzbin.info/aero/PLVEEucA9MYhNF5-zeb4Iw2Nl1OKTH-Txw
Everything Midjourney: kzbin.info/aero/PLVEEucA9MYhMdrdHZtFeEebl20LPkaSmw
AI Image Generation: kzbin.info/aero/PLVEEucA9MYhPVgYazU5hx6emMXtargd4z

Comments: 36
@engineerprompt 24 days ago
If you want to learn RAG Beyond Basics, check out this course: prompt-s-site.thinkific.com/courses/rag
@AI-Teamone 22 days ago
Such insightful information. Eagerly waiting for more multimodal approaches.
@ilaydelrey3122 23 days ago
A nice open-source, self-hosted version would be great.
@RolandoLopezNieto 24 days ago
Lots of good info, thanks
@tasfiulhedayet 23 days ago
We need more videos on this topic
@aerotheory 23 days ago
Keep going with this approach, it is something I have been struggling with.
@waju3234 22 days ago
Me too. In my case, the answer is normally hidden in the data, the context, and the images.
@ArdeniusYT 24 days ago
Hi, your videos are very helpful, thank you.
@engineerprompt 24 days ago
Glad you like them!
@ai-touch9 23 days ago
I appreciate your effort. Please create one on fine-tuning the model for efficient retrieval, if possible with LangChain.
@mohsenghafari7652 24 days ago
Great job! Thanks
@engineerprompt 24 days ago
thanks :)
@legendchdou9578 23 days ago
Very nice video, but it would be very cool if you could do it with an open-source embedding model. Thank you for the video.
@vinayakaholla 24 days ago
Can you please dive deeper into why Qdrant was used, and what limitations other vector DBs have in storing both text and image embeddings? Thanks.
@engineerprompt 24 days ago
Will see if I can create a video on it.
@BACA01 24 days ago
Thanks, your videos are very helpful. I have several gigabytes of PDF ebooks that I would like to process with RAG. Which approach do you think would be best: this one or GraphRAG? I'm only looking at local models, as the costs would otherwise be very high. What if I first convert all PDF pages into images, process them with a local model like Phi-3 Vision, and then run GraphRAG on the result? Would that work?
@garfield584 22 days ago
Thanks
@Techn0man1ac 19 days ago
What about doing the same, but using Llama 3 or a smaller local LLM?
@ignaciopincheira23 24 days ago
It is essential to preprocess the documents thoroughly before entering them into the RAG system. This involves extracting the text, tables, and images, and processing the images through a vision module. It is also crucial to maintain content coherence by ensuring that references to tables and images are correctly preserved in the text. Only after this processing should the documents be passed to an LLM.
@engineerprompt 24 days ago
agree!
@jtjames79 22 days ago
That's a lot of work. Can an AI do this?
@engineerprompt 22 days ago
@@jtjames79 Yup :)
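
The preprocessing steps described in @ignaciopincheira23's comment above could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the video: `Element`, `describe_image`, and `preprocess` are hypothetical names, `describe_image` is only a stub standing in for a real vision model call, and the reference matching is a naive substring check.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One extracted piece of a document (hypothetical structure)."""
    kind: str         # "text", "table" or "image"
    content: str      # raw text, table markup, or an image file path
    ref_id: str = ""  # label used in the document, e.g. "Figure 1"


def describe_image(path):
    """Stub for a vision module; a real system would call a vision model."""
    return f"[description of {path}]"


def preprocess(elements):
    """Turn extracted elements into chunks while preserving cross-references."""
    ref_ids = [el.ref_id for el in elements if el.ref_id]
    chunks = []
    for el in elements:
        # Images are replaced by a textual description from the vision stub.
        text = describe_image(el.content) if el.kind == "image" else el.content
        chunks.append({
            "kind": el.kind,
            "ref_id": el.ref_id,
            "text": text,
            # Naive substring match; "Figure 1" would also match "Figure 10".
            "mentions": [r for r in ref_ids if r != el.ref_id and r in el.content],
        })
    return chunks


doc = [
    Element("text", "Sales rose sharply, as shown in Figure 1 and Table 1."),
    Element("image", "fig1.png", "Figure 1"),
    Element("table", "year,sales\n2023,10", "Table 1"),
]
chunks = preprocess(doc)
```

Each chunk keeps a `mentions` list, so a text chunk that cites a figure or table can be followed to the matching element after retrieval.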
@JNET_Reloaded 24 days ago
Where's the code used?
@BarryMarkGee 6 days ago
Out of interest, what is the application called that you used to illustrate the flows (2:53 in the video)? Thanks.
@engineerprompt 6 days ago
I am using Mermaid code for this.
@BarryMarkGee 6 days ago
@@engineerprompt thanks. Great video btw 👍🏻
@amanharis1845 24 days ago
Can we do this using LangChain?
@engineerprompt 24 days ago
Yes, will be creating a video on it.
@codelucky 21 days ago
Is it better than GraphRAG? How does the output quality compare to it?
@engineerprompt 20 days ago
You could potentially create a GraphRAG on top of it.
@RickySupriyadi 24 days ago
I expect image generation will become another kind of breed... image gen based on image understanding based on facts
@redbaron3555 23 days ago
This approach is not good enough to add value. The pictures and text need to be referenced and linked in both vector stores to create better similarities.
@engineerprompt 20 days ago
watch my latest video :)
@arifmp3284 6 days ago
U have any work?
@Know_Ur_World 2 days ago
@@engineerprompt Which video?
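
One possible way to link pictures and text across both stores, as @redbaron3555 suggests above, is to store shared identifiers in each record. The sketch below is only an illustration and is not taken from the video or its follow-up: the record layout, `link_by_page`, and `expand_hits` are hypothetical, and linking by page number is an assumed heuristic.

```python
def link_by_page(text_records, image_records):
    """Add bidirectional links between texts and images on the same page."""
    for t_id, text in text_records.items():
        for i_id, image in image_records.items():
            if text["page"] == image["page"]:
                text.setdefault("linked_images", []).append(i_id)
                image.setdefault("linked_texts", []).append(t_id)


def expand_hits(text_hit_ids, text_records, image_records):
    """Given retrieved text ids, return the paths of the images linked to them."""
    paths = []
    for t_id in text_hit_ids:
        for i_id in text_records[t_id].get("linked_images", []):
            path = image_records[i_id]["path"]
            if path not in paths:  # avoid returning the same image twice
                paths.append(path)
    return paths


# Hypothetical payloads as they might be stored next to each embedding.
text_records = {
    "t1": {"content": "chunk from page 3", "page": 3},
    "t2": {"content": "chunk from page 7", "page": 7},
}
image_records = {
    "i1": {"path": "page3_chart.png", "page": 3},
    "i2": {"path": "page9_photo.png", "page": 9},
}

link_by_page(text_records, image_records)
linked = expand_hits(["t1", "t2"], text_records, image_records)
```

With links like these, a hit in the text store can pull in the images that appeared alongside it, even when the image store would not have ranked them highly on its own.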