Consider these actionable insights from the video:
1. Understand the power of context in search queries and how it improves accuracy in Retrieval Augmented Generation (RAG).
2. Experiment with different chunking strategies for your data when building your RAG system.
3. Explore embedding models such as Gemini and Voyage for transforming text into numerical representations.
4. Combine embedding models with BM25, a lexical ranking function, to improve ranking and retrieval.
5. Implement contextual retrieval by adding context to data chunks using Large Language Models (LLMs).
6. Weigh the costs and benefits of contextual retrieval, considering factors like processing cost and latency.
7. Optimize your RAG system by experimenting with reranking during inference to fine-tune retrieval results.
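As a rough illustration of item 5, here is a minimal sketch of contextual retrieval: each chunk is sent to an LLM together with the full document, and the short context the model returns is prepended to the chunk before embedding and BM25 indexing. It assumes an OpenAI-compatible client; the model name and prompt wording are illustrative, not the exact setup from the video.

```python
# Minimal sketch: prepend LLM-generated context to each chunk before indexing.
# Assumes an OpenAI-compatible client; model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CONTEXT_PROMPT = """<document>
{document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Give a short, succinct context to situate this chunk within the overall document
for the purposes of improving search retrieval of the chunk. Answer only with the context."""


def contextualize_chunk(document: str, chunk: str) -> str:
    """Return the chunk with a short LLM-generated context prepended."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        messages=[{"role": "user",
                   "content": CONTEXT_PROMPT.format(document=document, chunk=chunk)}],
    )
    context = response.choices[0].message.content.strip()
    return f"{context}\n\n{chunk}"


# The contextualized chunks are then embedded and BM25-indexed as usual.
```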
@amortalbeing (a month ago)
So the LLM is the Achilles' heel of the whole process: if it messes up the context, everything goes south immediately, but if it works well by default, it enhances the final results.
@PeterDrewSEO (a month ago)
Mate, I've been trying to understand RAG for ages, non-coder here obviously, but your explanation was brilliant. Thank you.
@int8float64 (a month ago)
As you said, it's really costly, like graph vector DBs, and high maintenance. A classic (sparse + dense) retriever plus a reranker should simply do a good job, also considering that most of the new SOTA models have larger context windows.
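For what it's worth, the hybrid setup this comment describes can be put together in a few lines; the sketch below fuses BM25 and dense rankings with reciprocal rank fusion. It assumes the rank_bm25 and sentence-transformers packages; the corpus, encoder choice, and fusion constant are placeholders.

```python
# Minimal sketch of a hybrid (sparse + dense) retriever with rank fusion.
# Assumes rank_bm25 and sentence-transformers are installed; values are illustrative.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

chunks = ["...chunk 1...", "...chunk 2...", "...chunk 3..."]  # hypothetical corpus

bm25 = BM25Okapi([c.split() for c in chunks])          # sparse (lexical) index
encoder = SentenceTransformer("all-MiniLM-L6-v2")      # dense encoder
chunk_vecs = encoder.encode(chunks, normalize_embeddings=True)


def hybrid_search(query: str, k: int = 3, rrf_k: int = 60) -> list[str]:
    """Fuse BM25 and dense rankings with reciprocal rank fusion (RRF)."""
    sparse_rank = np.argsort(-bm25.get_scores(query.split()))
    query_vec = encoder.encode(query, normalize_embeddings=True)
    dense_rank = np.argsort(-(chunk_vecs @ query_vec))
    scores: dict[int, float] = {}
    for rank_list in (sparse_rank, dense_rank):
        for rank, idx in enumerate(rank_list):
            scores[int(idx)] = scores.get(int(idx), 0.0) + 1.0 / (rrf_k + rank + 1)
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [chunks[i] for i in top]
```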
@henkhbit5748 (a month ago)
Thanks for the update. 👍 We see a lot of different techniques to improve RAG, but the additional quality improvements are not that big, while the costs are much higher (more tokens) and inference time also goes up... Agreed that for most use cases it's not worth the effort and money.
@kenchang3456 (a month ago)
This is really interesting and I think, intuitively, it will help me with my project. Thank you very much.
@dr.mikeybee (a month ago)
You can create the contextual tag locally using Ollama.
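A minimal sketch of that idea, assuming the ollama Python package and a locally pulled model (the model name is illustrative, and the response field access may differ slightly between library versions):

```python
# Minimal sketch of generating a contextual tag locally with Ollama.
import ollama


def local_context(document: str, chunk: str) -> str:
    """Ask a local model for a one-to-two sentence context for the chunk."""
    prompt = (
        f"<document>\n{document}\n</document>\n\n"
        f"<chunk>\n{chunk}\n</chunk>\n\n"
        "Write one or two sentences situating this chunk within the document, "
        "to improve search retrieval. Answer with the context only."
    )
    response = ollama.generate(model="llama3.1", prompt=prompt)  # model name is illustrative
    return response["response"].strip()
```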
@wylhias (a month ago)
I've been working on something quite similar over the last few months for a corpus of documents that are in a tree hierarchy to increase accuracy. Seems it was not a bad idea after all 😁
@ysy69 (a month ago)
Excellent video and insights!
@1littlecoder (a month ago)
Glad you enjoyed it!
@ROKIBULHASANTANZIM (a month ago)
I was really caught off guard when you said '....large human being' 😂😂
@1littlecoder (a month ago)
I just rewatched it 🤣
@1voice4all (a month ago)
Unfortunately, large humans are extinct! [or maybe left planet Earth.]
@shobhitsadwal6081 (a month ago)
🤣🤣🤣🤣🤣🤣
@SleepeJobs (a month ago)
Thank you for the insights and the simple explanation.
@laviray5447 (a month ago)
Honestly, that few percent improvement is not worth it for most cases...
@MhemanthRachaboyina (7 days ago)
Great video
@1littlecoder (7 days ago)
@MhemanthRachaboyina Thank you
@arashputata (a month ago)
Is it really worth all the noise and having a new name for it? This is an idea that many developers have already been using. Anyone who thinks about it a little naturally realizes that adding a short description of what the chunk is about in relation to the rest of the document would help :D Myself and many others have been doing it for very obvious reasons... I just didn't know I had to give it a name and publish it as a technique. This LLM hype taught me one thing: put a name on any trivial idea and you are now an inventor.
@1littlecoder (a month ago)
Honestly, that's one thing I've actually mentioned in the video: it depends on whether such improvements are something you need.
@laviray5447 (a month ago)
Yes, actually there are many more techniques like this which offer a similar percentage of improvement, and none of them are worth it. Basic RAG is still enough for now.
@RajaSekharaReddyKaluri (a month ago)
Thank you! Feeding in the whole document text to add a few lines of context for each chunk seems way too much for too little benefit. Instead we would need a better embedding model to enhance retrieval without any of the overhead. And companies will be interested in chunking, embedding, and indexing proprietary documents only once in their lifetime; they can't reindex the whole archive every time a new improvement is released.
@henno6207 (a month ago)
It would be great if they could just build this into their platform, like OpenAI has with their agents.
@DCinzi (a month ago)
Wait, would it not be more efficient for the LLM, rather than creating a context, to use that compute to create a new chunk that puts together two previous chunks (e.g. chunk 1 + chunk x) based on context? Rather than going down the route of "let's try to help the LLM find the right chunk for the user request by maximizing attention to that one particular chunk", go down the route of "let's try to help the LLM [..] by maximizing the probability of finding the right node in a net of higher-percentage possibilities"?
@phanindraparashar8930 (a month ago)
I was experimenting with this and it's really amazing. But it's a very simple approach 😅😅
@1littlecoder (a month ago)
The beauty is how simple it is :D
@phanindraparashar8930 (a month ago)
@1littlecoder Keeping it simple always works
@souvickdas5564 (a month ago)
How do you generate the context for chunks without giving the LLM sufficient information about the chunk? How are they getting the information about the revenue in that example?
@1littlecoder (a month ago)
That is from the entire document.
@souvickdas5564 (a month ago)
@1littlecoder Then it will be very costly, as the entire document is being fed into the LLM. And what about the LLM's token limit if I have a significantly large document?
@randomlettersqzkebkw (a month ago)
@souvickdas5564 This technique is golden for locally run LLMs. It's free.
@akshaya626 (a month ago)
I have the same doubt. Please let us know if there's clarity.
@afolamitimothy8819 (a month ago)
Thanks 😅
@tripandplan (a month ago)
To generate context, do we need to pass all the documents? How will we address the token limit?
@limjuroy7078 (a month ago)
I think the reason why Anthropic introduced this technique is because they have CACHING!!!
@1littlecoder (a month ago)
Easy upsell 👀
@limjuroy7078 (a month ago)
@1littlecoder As far as I know, if you use the prompt caching feature to store all your documents, such as your company documents, it would greatly reduce the cost, particularly input token consumption, since the {{WHOLE DOCUMENT}} is retrieved from the cache. Am I right?
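For illustration, here is a rough sketch of how that caching idea could look with the Anthropic SDK: the whole document sits in a system block marked for caching, so repeated per-chunk context calls reuse the cached prefix instead of paying full input cost each time. The model name and prompt wording are assumptions, not the exact setup from the video.

```python
# Minimal sketch: cache the whole document once, then contextualize chunks cheaply.
# Assumes the anthropic Python SDK with prompt caching available; model name is illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def context_for_chunk(document: str, chunk: str) -> str:
    response = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=200,
        system=[
            {
                "type": "text",
                "text": f"<document>\n{document}\n</document>",
                "cache_control": {"type": "ephemeral"},  # document prefix is cached across calls
            }
        ],
        messages=[
            {
                "role": "user",
                "content": (
                    f"<chunk>\n{chunk}\n</chunk>\n\n"
                    "Give a short context situating this chunk within the document above. "
                    "Answer only with the context."
                ),
            }
        ],
    )
    return response.content[0].text.strip()
```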
@Praveenppk2255 (a month ago)
Is it something similar to what Google calls context caching?
@1littlecoder (a month ago)
No, context caching is basically on top of it. Thanks for the reminder. I should probably make a separate video on it.
@Praveenppk2255 (a month ago)
@1littlecoder Oh nice, perfect.
@KevinKreger (a month ago)
Smart chunks 🎉
@1littlecoder (a month ago)
Someone's going to steal this name for a new RAG technique :)
@ByteBop911 (a month ago)
Isn't it an agentic chunking strategy?
@shreyassrinivasa5983 (a month ago)
Have been doing this for a long time, and much more.
@1voice4all (a month ago)
They could have used something similar to LLMLingua on each chunk and then passed it to a smaller model for deriving context, since it is a very specific use and does not demand a huge model. This way cost can be controlled and quality can be enhanced. Also, they could add a model router rather than using a predefined model; this router could choose the model based on the information the corpus has. There are many patterns that can enhance this RAG pipeline. This just seems very lazy.
@truliapro7112 (a month ago)
Your content is really good, but I've noticed that you tend to speak very quickly, almost as if you're holding your breath. Is there a reason for this? I feel that a slower, calmer pace would make the information easier to absorb and more enjoyable to follow. It sometimes feels like you're rushing, and I believe a more relaxed delivery would enhance your already great work. Please understand this is meant as constructive feedback, not a criticism. I'm just offering a suggestion to help make your content even better.
@1littlecoder (a month ago)
Thank you for the feedback. I understand. I have a tendency to speak very fast, so typically I have to slow down. I'll try to do that more diligently.
@Macorelppa (a month ago)
This is the guy who called o1 preview overhyped. 🤭
@1littlecoder (a month ago)
Did I?
@dhanush.priyan (a month ago)
He never said that. He said o1 is just glorified chain of thought, and that's actually true.
@MichealScott24 (a month ago)
❤🫡
@phanindraparashar8930 (a month ago)
I tried another stupidly simple approach: create a QA dataset with an LLM, find the nearest question, and provide its answer. Surprisingly, it also works really well 😅😅😅
@1littlecoder (a month ago)
Here you go. You just invented a new RAG technique 😉
@arashputata (a month ago)
This is actually surprisingly good for RAG on expert/narrow domains! I did the same thing for a bot on web accessibility rules, and it worked perfectly.
@phanindraparashar8930 (a month ago)
@arashputata Which method?
@phanindraparashar8930Ай бұрын
@@1littlecoder also u can later use the data to fine-tune 😅😅
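A minimal sketch of the QA-pair idea from this thread: embed the LLM-generated questions, and at query time return the answer stored with the nearest question. It assumes sentence-transformers; the QA pairs shown are placeholders, not real data.

```python
# Minimal sketch: nearest-question retrieval over an LLM-generated QA dataset.
# Assumes sentence-transformers is installed; QA pairs and model name are placeholders.
from sentence_transformers import SentenceTransformer, util

qa_pairs = [
    ("What was the company's Q2 revenue?", "Revenue grew 3% over the previous quarter."),
    ("Who is the current CEO?", "Jane Doe has been CEO since 2021."),
]  # in practice, generated by an LLM from each chunk

encoder = SentenceTransformer("all-MiniLM-L6-v2")
question_vecs = encoder.encode([q for q, _ in qa_pairs], normalize_embeddings=True)


def answer(user_query: str) -> str:
    """Return the answer paired with the stored question nearest to the user query."""
    query_vec = encoder.encode(user_query, normalize_embeddings=True)
    scores = util.cos_sim(query_vec, question_vecs)[0]  # similarity to every stored question
    best = int(scores.argmax())
    return qa_pairs[best][1]


print(answer("How much did revenue grow last quarter?"))
```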