Build your own RAG based LLM Application (Completely Offline!): AI for your documents

7,826 views

Yankee Maharjan

A day ago

Comments

@hurricanos13 7 days ago

This is by far the best and most concise RAG tutorial available online.

@starX7995 18 days ago

Bro, this is the only tutorial that has helped me so far, out of all the YouTube videos I have watched on RAG-based applications. Love you from India, bro.

@yankeem 18 days ago

Thank you for your kind words! Glad it helped! 🚀

@modest_supreme 14 days ago

Great tutorial, thank you for going in depth and showing me these tools!

@programmingholic a month ago

Thanks bro for sharing, it's helpful.

@fancypetsulove 3 days ago

Please talk more slowly so it is easier to follow. Thanks for this tutorial.

@JellosKanellos 20 days ago

Thanks for the awesome video! I am setting up an LLM/RAG project where I want the LLM to analyze log files. I noticed that the upsert actions into chromadb take a long time, even with relatively small log files. Inserting around 8000 chunks can easily take up to 10 minutes. I believe the lack of concurrency in the underlying SQLite architecture is the problem. Are there any alternatives to chromadb that I can use to solve this problem? I still want to run everything locally. Thanks in advance.

@yankeem 20 days ago

Hey Jellos, this might be worth a read: docs.trychroma.com/deployment/performance#insert-throughput Apart from this, try using a vector database with an async client, like Qdrant. Also note that the upsert might be taking longer because of the embedding model. If you switch to a different model, that might help as well. Let me know what works for you!

@JellosKanellos 16 days ago

@yankeem Thanks for the suggestions. I tested Qdrant today. It is about twice as fast. I am tuning the configuration to make it faster still.

@yankeem 16 days ago

Awesome!
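
For anyone hitting the same slow upserts: one thing that may be worth trying, whichever vector store you use, is sending the chunks in fixed-size batches instead of one huge call or one call per chunk. Below is a minimal sketch of that idea. The names `batched` and `upsert_in_batches` and the default batch size of 500 are made up for illustration, and `upsert` stands for any callable that accepts `ids` and `documents` keyword arguments (for example a collection's upsert method). Whether it helps depends on where the time actually goes; if the embedding model is the bottleneck, batching the store calls alone will not change much.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(
    upsert: Callable[..., None],
    ids: Sequence[str],
    documents: Sequence[str],
    batch_size: int = 500,  # illustrative default, tune for your setup
) -> int:
    """Call `upsert(ids=..., documents=...)` once per batch.

    `upsert` is a stand-in (assumption) for whatever your vector store exposes.
    Returns the number of calls that were made.
    """
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    calls = 0
    for id_batch, doc_batch in zip(batched(ids, batch_size), batched(documents, batch_size)):
        upsert(ids=list(id_batch), documents=list(doc_batch))
        calls += 1
    return calls


if __name__ == "__main__":
    # Fake upsert that only records batch sizes, so the sketch runs without a database.
    sizes = []
    chunk_ids = [f"chunk-{i}" for i in range(8)]
    chunk_texts = [f"log line {i}" for i in range(8)]
    upsert_in_batches(lambda ids, documents: sizes.append(len(ids)), chunk_ids, chunk_texts, batch_size=3)
    print(sizes)
```

Timing a few batches with and without the embedding step is a quick way to see which part is slow before swapping databases.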

@EverydayKarma a month ago

Let's goo 🥳

@akki_the_tecki 24 days ago

Is this the BASIC RAG or the ADVANCED one? I need to develop an ENTERPRISE RAG for at least 1000 PDF documents. Can I get help from you, my brother?

@themax2go 23 days ago

What's the issue here? If you can "rag" 1 PDF, you can "rag" 1k PDFs, though you might need to use a different vector store. Search for vector DB comparisons, there are a few videos. Also, what does "advanced one" mean? Do you mean Microsoft's GraphRAG? Depending on the "theme" of the docs you might not need GraphRAG. It's not that straightforward (yet).

@akki_the_tecki 23 days ago

@themax2go Suggest some videos to me, brother. My RAG pipeline is very basic, without reranking and all that. I want a robust one.

@yankeem 22 days ago

Please watch the full video. If you want to perform a semantic search on the vector DB and let the LLM generate the answer, the video up to the chapter at 15:58 covers it. If you need to rank the retrieved documents by relevance, the section from 20:30 is the relevant one.

@akki_the_tecki 22 days ago

@yankeem I need it for multiple files (PDFs), and there should be separate UIs for ingesting (ingest.py) and generation (app.py).

@themax2go 19 days ago

You're too vague. What is "advanced RAG"? Please explain.

@girishcoorgi4224 20 days ago

I tried to replicate the code by setting up the environment on a Mac. I am getting this error when clicking the process button: "ConnectError: [Errno 61] Connection refused in upsert." On a Windows machine I am getting this error: "error PermissionError: [WinError 32] The process cannot access the file because it is being used by another process:" Please help me fix this issue.

@yankeem 20 days ago

Can you try the dependencies from the demo repo? github.com/yankeexe/llm-rag-with-reranker-demo

@Ramakishh a month ago

If we have images in the PDF, will it process them without error?

@yankeem a month ago

No, it will just extract the text by default. For a multimodal RAG application, we have to extract and process the images separately.

@slayofficial1136 6 days ago

Bro, it's very easy with PDF data. Try doing it with CSV files and Excel data.

@yankeem 6 days ago

All the steps are the same for CSV and Excel. For CSV, use this loader instead of the PDFLoader shown in the video: python.langchain.com/docs/how_to/document_loader_csv/ For Excel, use a supported loader like this one: python.langchain.com/docs/how_to/document_loader_office_file/ Or convert the file to Markdown or another format and use the relevant loader. You can learn more here: python.langchain.com/docs/how_to/#document-loaders And here: python.langchain.com/docs/how_to/#text-splitters
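
To make the CSV case a bit more concrete: row-oriented loaders generally turn each row into its own document, with the column names kept next to the values so the embedding model sees some context. The snippet below is only a standard-library sketch of that idea, not LangChain's implementation. The function name `csv_rows_to_documents`, the dictionary layout, and the sample data are all assumptions for illustration.

```python
import csv
import io
from typing import Dict, List


def csv_rows_to_documents(csv_text: str, source: str = "data.csv") -> List[Dict]:
    """Turn each CSV row into one document made of "column: value" lines.

    The output shape (page_content + metadata) is a hypothetical layout chosen
    for this sketch; adapt it to whatever your pipeline expects.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        # Keep the header next to each value so a row is readable on its own.
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"source": source, "row": row_number}}
        )
    return documents


if __name__ == "__main__":
    sample = "name,price\nwidget,10\ngadget,25\n"  # made-up sample data
    for doc in csv_rows_to_documents(sample):
        print(doc)
```

From there the documents can go through the same splitting, embedding, and upsert steps as the PDF text.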

@mohammad-xy9ow a month ago

Brother, can I get a job after learning this?

@yankeem 22 days ago

You can try learning it and putting it on your CV. Best of luck.

@docuong9569 a month ago

error PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\cuongdd\\AppData\\Local\\Temp\\tmprjpx428b.pdf'

@akki_the_tecki 22 days ago

@docuong9569 Yes, same error. Have you managed to get past it?

@yankeem 20 days ago

Is this resolved?

@akki_the_tecki 20 days ago

@yankeem We should run it in an Ubuntu terminal! Then there are no errors.

@docuong9569 20 days ago

@yankeem No, please show me how to fix the error. I am running it on Windows 10.

@yankeem 20 days ago

@docuong9569 I don't have a Windows machine to test this. Can you DM me the stack trace on LinkedIn?
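
A possible cause worth checking for the WinError 32 above (a guess, since the full stack trace is not in this thread): on Windows, a temporary file cannot be deleted, and often cannot be reopened, while a handle to it is still open. If the app writes the uploaded PDF to a temp file and removes it before every handle is closed, this error would show up on Windows but not on Linux or macOS. Here is a minimal sketch of a pattern that avoids it. `with_temp_pdf` and `process` are hypothetical names, and `process` stands in for whatever loads the PDF from a path.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_pdf(data: bytes, process: Callable[[str], T]) -> T:
    """Write `data` to a temp .pdf, close it, run `process(path)`, then delete it.

    `process` is a placeholder (assumption) for the PDF loading step. It must
    finish reading and close the file before it returns, otherwise the delete
    below can still fail on Windows.
    """
    # delete=False: closing the handle must not remove the file yet.
    temp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        temp.write(data)
    finally:
        # Release our handle before anything else opens the path.
        temp.close()
    try:
        return process(temp.name)
    finally:
        # Safe to remove now that no handle from this function is open.
        os.unlink(temp.name)


if __name__ == "__main__":
    def count_bytes(path: str) -> int:
        with open(path, "rb") as handle:
            return len(handle.read())

    print(with_temp_pdf(b"%PDF-1.4 placeholder bytes", count_bytes))
```

If the error persists with this pattern, the loader itself may be holding the file open, and the stack trace should show which call does it.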

@MicheleSblendorio-n6v 29 days ago

Even though I cloned the repo, it gives me an error from chromadb when I process a document. How can I solve it? Many thanks! if isinstance(target[0], (int, float)) and not isinstance(target[0], bool): ~~~~~~^^^ IndexError: list index out of range in upsert. 2024-11-26 10:13:23.939 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_

@yankeem 28 days ago

Resolving in DM.

@akki_the_tecki 22 days ago

@yankeem Where is your DM? How can I contact you?

@akki_the_tecki 22 days ago

@MicheleSblendorio-n6v I am also getting the same error.

@MicheleSblendorio-n6v 22 days ago

@akki_the_tecki I haven't solved it yet. I'll have to debug the code.

@akki_the_tecki 22 days ago

@MicheleSblendorio-n6v Please tell me how if you manage to fix it!