Idk, I just finally found the most understandable AI explanation content. Thank you, Alejandro
@alejandro_ao · a month ago
glad to hear this :)
@argl1995 · a month ago
@@alejandro_ao I want to create a multi-LLM chatbot for telecommunications. Is there a way to connect with you apart from KZbin so that I can share the problem statement with you?
@ZevUhuru · 29 days ago
Bro, I literally came back to get your old video on PDFs and you already have an update. Thank you!
@onkie.ponkie · a month ago
I was about to learn from the previous video, but you, brother, just bring more gold.
@alejandro_ao · a month ago
you’re the best
@whouchekin · a month ago
the best touch is when you add the front-end. good job
@alejandro_ao · a month ago
hey! i'll add a ui for this in a coming tutorial 🤓
@nirmesh44 · 20 days ago
Best Explanation ever.
@Diego_UG · a month ago
What do you recommend for automating the conversion of a PDF of images (images of text) to text? The problem is that traditional OCR does not always do the job well, but ChatGPT can handle difficult images.
@uknowme4_5 · 46 minutes ago
when will the frontend part come? Super excited for this part
@muhammadadilnaeem · a month ago
Amazing tutorial
@jaimeperezpazo · a month ago
Excellent!!!! Thank you Alejandro
@ahmadsawal2956 · 26 days ago
thanks for the great content. how can we modify this to use a local LLM via Ollama, e.g. Llama 3.2 and Llama 3.2 Vision?
@ronnie333333 · a month ago
Thank you for the video. Just curious, how do we go about persisting the multi-vector database? What data stores are available that cater to such requirements? Also, how do we go about getting an image as input from the user, so the language model can relate it to the documents and predict an answer?
@maryamesmaeili7365 · 14 days ago
Thanks for this beautiful and comprehensive presentation. I do have one question about security. I would like to use Mistral instead of OpenAI and also run the code locally with Ollama, so I would like to know your opinion about data security. Will it remain secure? The PDF files I'm going to use as input are confidential. Thanks in advance for your response.
@MrAhsan99 · 13 days ago
Well, if you're running the model (LLaMA, Mistral, or Qwen) locally, you don't need to worry. It's safe unless someone hacks your PC and steals all your data. 😛
@alejandro_ao · 4 days ago
hey there. sure thing. if you are running these models locally, then there is absolutely no data leaving your machine, so no need to worry about data leakage. that being said, there are two moments where the data from your files could leave your computer:
- when you call your LLM. if you are using a local LLM with Ollama, no need to worry. just make sure to use a multimodal model, such as LLaVA, so you get the images, or stick to a text-to-text model if your pdf does not require multimodality.
- when you are parsing your document. in this example, i used Unstructured's open source library locally. if you do the same, your data never leaves your computer. but you can also use their serverless api; if you do that, your data would be transiting through their servers.
that being said, neither Unstructured nor OpenAI/Anthropic use the data you send them to train their models. but i understand if you still don't want your confidential data transiting through external services. a fully local setup is sketched below.
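this is just a rough sketch of what that local pipeline could look like, assuming you have Ollama running with the `llava` model pulled, plus the `unstructured[all-docs]` and `langchain-ollama` packages installed (the file name and prompt are placeholders):

```python
from unstructured.partition.pdf import partition_pdf   # open source, runs locally
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage

# parse the PDF locally: nothing leaves your machine
chunks = partition_pdf(
    filename="confidential.pdf",
    strategy="hi_res",                    # local layout detection + OCR
    infer_table_structure=True,
    extract_image_block_types=["Image"],
    extract_image_block_to_payload=True,  # images returned as base64 in metadata
    chunking_strategy="by_title",
)

# summarize images with a local multimodal model served by Ollama
llm = ChatOllama(model="llava", temperature=0)

def summarize_image(image_b64: str) -> str:
    msg = HumanMessage(content=[
        {"type": "text", "text": "Describe this image in detail for retrieval."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ])
    return llm.invoke([msg]).content
```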
@SidewaysCat · a month ago
Hey dude, what are you using to screen record? The mouse sizing and movement look super smooth. I'd like to create a similar style when giving tutorials.
@alejandro_ao · a month ago
hey there, that's the Screen Studio app for mac, developed by the awesome Adam Pietrasiak @pie6k. check it out :)
@ChristopherFoster-McBride · 25 days ago
Could you create a repo for running this on Windows? Great video btw!
@Pman-i3c · a month ago
Very nice, is it possible to do this with a local LLM, like an Ollama model?
@alejandro_ao · a month ago
Yes, absolutely. just use the LangChain Ollama integration and change the line of code where i use ChatOpenAI or ChatGroq. be sure to select a multimodal model when dealing with images though.
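roughly something like this (a minimal sketch, assuming the `langchain-ollama` package; the model names are just examples, pull whichever ones you prefer):

```python
from langchain_ollama import ChatOllama

# drop-in replacements for ChatOpenAI(...) / ChatGroq(...)
text_llm = ChatOllama(model="llama3.1", temperature=0)   # text summaries / final answers
vision_llm = ChatOllama(model="llava", temperature=0)    # multimodal, for image summaries

# the rest of the chain stays the same, e.g.:
# chain = prompt | vision_llm | StrOutputParser()
```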
@saivikas96 · 21 days ago
Hi bro, thanks for the video. I have implemented the same pipeline using a local LLM with Ollama, and I used the LLaVA model as the multimodal model. But whenever I ask questions, it is not able to retrieve images. I am stuck at this point.
@maryamesmaeili7365 · 7 days ago
Hi, I saw your comment and I am trying to do the same thing, and I am stuck with ChromaDB: my kernel keeps dying. Then I found that ChromaDB works well with OpenAI models. Do you use another vector database? If yes, which one? Thank you in advance.
@duanxn · a month ago
Great tutorial, very detailed. Just one question: is there any option to link the text chunk that describes the image, as context for the image, to create a more accurate summary of the image?
@alejandro_ao · a month ago
beautiful question. totally. as you can see, the image is actually one of the `orig_elements` inside a `CompositeElement`. and the `CompositeElement` object has a property called `text`, which contains the raw text of the entire chunk. this means that instead of just extracting the image alone like i did here, you can extract the image alongside the text in its parent `CompositeElement` and send that along with the image when generating the summary. great idea 💪
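if you want to try it, the idea is roughly this (a sketch that assumes you partitioned with `extract_image_block_to_payload=True` and a chunking strategy like in the video; the variable names are placeholders):

```python
from unstructured.documents.elements import CompositeElement, Image

images_with_context = []
for chunk in chunks:
    if isinstance(chunk, CompositeElement):
        for el in chunk.metadata.orig_elements or []:
            if isinstance(el, Image):
                # pair the base64 image with the raw text of its parent chunk
                images_with_context.append({
                    "image_b64": el.metadata.image_base64,
                    "context": chunk.text,
                })

# when summarizing, send both the image and its surrounding text, e.g.
# prompt_text = f"Describe this image. Surrounding text for context:\n{item['context']}"
```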
@venualli3917 · a day ago
I have been trying to execute the code, but there are a lot of dependency conflicts. If possible, could you please provide the list of dependencies that you downloaded?
@GowthamRaghavanR · a month ago
Good one!! Did you look at any open-source alternatives, like Markers?
@julianomoraisbarbosa · a month ago
#til thanks for your video. is it possible to use CrewAI in the same example?
@davidbaur300990 · 21 days ago
The functionality you showed at the end to visualize the PDF page and the retrieved chunk seems super useful for citation display. Since I couldn't find the implementation in your Colab, is there a way to find it somewhere else? Amazing content!
@akagi937 · 7 days ago
when is the frontend part coming? it's been really fun following along so far
@alejandro_ao · 4 days ago
that would be a great addition. i'm working on that one!
@suksukram1234 · 10 days ago
Good job!
@alexramos587 · a month ago
Nice
@AjayKumar-p2f1y · 15 days ago
Hi, it's a great video. Can you help me with how to install these dependencies on a Windows machine?
@blakchos · a month ago
any idea how to install Poppler, Tesseract, and libmagic on a Windows machine?
@texasfossilguy · 15 days ago
Why did you decide to use Groq?
@alejandro_ao · 4 days ago
just because they have a very generous free tier and pretty good models. that turns out useful for people watching who don't want to enter their credit card details to use these LLMs
@MrAhsan99 · 14 days ago
how can I increase the accuracy of this RAG system?
@alejandro_ao · 4 days ago
several improvement ideas. here are 3:
- you can retrieve the images on the fly only when a retrieved chunk contains images, instead of indexing them separately.
- you can keep the images indexed separately, but instead of sending the single image to the LLM, you can retrieve its parent chunk and send the entire chunk along with the image for better context.
- you can add a persistent document store so you don't have to re-index the whole thing every time (rough sketch of this one below).
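for the persistent option, something like this could work (a sketch, not the exact setup from the video: it assumes the `langchain-chroma` and `langchain-openai` packages and the `create_kv_docstore` helper from `langchain.storage`, which may vary by version; the paths are placeholders):

```python
from langchain.storage import LocalFileStore, create_kv_docstore
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# embeddings persisted on disk instead of in memory
vectorstore = Chroma(
    collection_name="multimodal_rag",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# parent documents (text, tables, images) persisted on disk as well
docstore = create_kv_docstore(LocalFileStore("./docstore"))

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    id_key="doc_id",
)
```
with this, you only index the documents once and everything survives notebook restarts.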
@AkashL-y9q · a month ago
Hi bro, can you create a video on multimodal RAG: chatting with video visuals and dialogue?
@alejandro_ao · a month ago
this sounds cool! i’ll make a video about it!
@AkashL-y9q · a month ago
Thanks @@alejandro_ao
@champln · 13 days ago
Can I connect to a SQL database source?
@alejandro_ao · 4 days ago
sure thing. that would be more like structured retrieval rather than unstructured. you can check out this video: kzbin.info/www/bejne/b5TGnWSVjNplarMsi=kSu-QkMInzq98u-u
@karansingh-ce8yy · a month ago
what about mathematical equations?
@alejandro_ao · a month ago
in this example, i embedded them with the rest of the text. if you want to process them separately, you can always extract them from the `CompositeElement` like i did here with the images. then you can have an LLM explain the equation and vectorize that explanation (like we did with the descriptions of the images). in my case, i just kept them with the rest of the text; i feel like that gives the LLM enough context.
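if you do want to treat equations separately, the extraction part could look roughly like this (a sketch that assumes your partitioning emits `Formula` elements, which unstructured does for some strategies; the summarization chain is the same pattern as for the images):

```python
from unstructured.documents.elements import CompositeElement, Formula

equations = []
for chunk in chunks:
    if isinstance(chunk, CompositeElement):
        for el in chunk.metadata.orig_elements or []:
            if isinstance(el, Formula):
                equations.append(el.text)

# then have an LLM explain each equation and embed that explanation,
# just like the image summaries:
# explanation = summarize_chain.invoke({"element": eq})  # e.g. prompt | llm | StrOutputParser()
```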
@karansingh-ce8yy · a month ago
@@alejandro_ao thanks for the context, i was stuck on this for a week