Consider these actionable insights from the video:
1. Understand the power of context in search queries and how it improves accuracy in Retrieval Augmented Generation (RAG).
2. Experiment with different chunking strategies for your data when building your RAG system.
3. Explore embedding models such as Gemini and Voyage for transforming text into numerical representations.
4. Combine embedding models with BM25, a lexical ranking function, to improve ranking and retrieval.
5. Implement contextual retrieval by adding context to data chunks using Large Language Models (LLMs).
6. Weigh the costs and benefits of contextual retrieval, considering factors like processing cost and latency.
7. Optimize your RAG system by experimenting with reranking during inference to fine-tune retrieval results.
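As a rough illustration of item 5, here is a minimal sketch of contextual retrieval: each chunk is sent to an LLM together with the full document, and the short context the model returns is prepended to the chunk before embedding and BM25 indexing. It assumes an OpenAI-compatible client; the model name and prompt wording are illustrative, not the exact setup from the video.

```python
# Minimal sketch: prepend LLM-generated context to each chunk before indexing.
# Assumes an OpenAI-compatible client; model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CONTEXT_PROMPT = """<document>
{document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Give a short, succinct context to situate this chunk within the overall document
for the purposes of improving search retrieval of the chunk. Answer only with the context."""


def contextualize_chunk(document: str, chunk: str) -> str:
    """Return the chunk with a short LLM-generated context prepended."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        messages=[{"role": "user",
                   "content": CONTEXT_PROMPT.format(document=document, chunk=chunk)}],
    )
    context = response.choices[0].message.content.strip()
    return f"{context}\n\n{chunk}"


# The contextualized chunks are then embedded and BM25-indexed as usual.
```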
@amortalbeing (a month ago)
So the LLM is the Achilles' heel of the whole process: if it messes up the context, everything goes south immediately, but if it works well by default, it enhances the final results.
@PeterDrewSEO (a month ago)
Mate, I've been trying to understand RAG for ages, non-coder here obviously, but your explanation was brilliant. Thank you.
@int8float64 (a month ago)
As you said, it's really costly, like graph vector DBs, and high maintenance. A classic (sparse + dense) retriever plus a reranker should simply do a good job, also considering that most of the new SOTA models have larger context windows.
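For what it's worth, the hybrid setup this comment describes can be put together in a few lines; the sketch below fuses BM25 and dense rankings with reciprocal rank fusion. It assumes the rank_bm25 and sentence-transformers packages; the corpus, encoder choice, and fusion constant are placeholders.

```python
# Minimal sketch of a hybrid (sparse + dense) retriever with rank fusion.
# Assumes rank_bm25 and sentence-transformers are installed; values are illustrative.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

chunks = ["...chunk 1...", "...chunk 2...", "...chunk 3..."]  # hypothetical corpus

bm25 = BM25Okapi([c.split() for c in chunks])          # sparse (lexical) index
encoder = SentenceTransformer("all-MiniLM-L6-v2")      # dense encoder
chunk_vecs = encoder.encode(chunks, normalize_embeddings=True)


def hybrid_search(query: str, k: int = 3, rrf_k: int = 60) -> list[str]:
    """Fuse BM25 and dense rankings with reciprocal rank fusion (RRF)."""
    sparse_rank = np.argsort(-bm25.get_scores(query.split()))
    query_vec = encoder.encode(query, normalize_embeddings=True)
    dense_rank = np.argsort(-(chunk_vecs @ query_vec))
    scores: dict[int, float] = {}
    for rank_list in (sparse_rank, dense_rank):
        for rank, idx in enumerate(rank_list):
            scores[int(idx)] = scores.get(int(idx), 0.0) + 1.0 / (rrf_k + rank + 1)
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [chunks[i] for i in top]
```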
@henkhbit5748 (a month ago)
Thanks for the update. 👍 We see a lot of different techniques to improve RAG, but the additional quality improvements are not that big, while the costs are much higher (more tokens) and inference time also goes up... Agreed that for most use cases it's not worth the effort and money.
@kenchang3456 (a month ago)
This is really interesting and I think, intuitively, it will help me with my project. Thank you very much.
@dr.mikeybee (a month ago)
You can create the contextual tag locally using Ollama.
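A minimal sketch of that idea, assuming the ollama Python package and a locally pulled model (the model name is illustrative, and the response field access may differ slightly between library versions):

```python
# Minimal sketch of generating a contextual tag locally with Ollama.
import ollama


def local_context(document: str, chunk: str) -> str:
    """Ask a local model for a one-to-two sentence context for the chunk."""
    prompt = (
        f"<document>\n{document}\n</document>\n\n"
        f"<chunk>\n{chunk}\n</chunk>\n\n"
        "Write one or two sentences situating this chunk within the document, "
        "to improve search retrieval. Answer with the context only."
    )
    response = ollama.generate(model="llama3.1", prompt=prompt)  # model name is illustrative
    return response["response"].strip()
```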
@wylhias (a month ago)
I've been working on something quite similar over the last few months for a corpus of documents that are in a tree hierarchy to increase accuracy. Seems it was not a bad idea after all 😁
@ysy69 (a month ago)
Excellent video and insights!
@1littlecoder (a month ago)
Glad you enjoyed it!
@ROKIBULHASANTANZIM (a month ago)
I was really caught off guard when you said '....large human being' 😂😂
@1littlecoder (a month ago)
I just rewatched it 🤣
@1voice4all (a month ago)
Unfortunately, large humans are extinct! [or maybe left planet Earth.]
@shobhitsadwal6081 (a month ago)
🤣🤣🤣🤣🤣🤣
@SleepeJobs (a month ago)
Thank you for the insights and the simple explanation.
@laviray5447 (a month ago)
Honestly, that few percent improvement is not worth it for most cases...
@MhemanthRachaboyina (7 days ago)
Great video
@1littlecoder (7 days ago)
@MhemanthRachaboyina Thank you
@arashputata (a month ago)
Is it really worth all the noise and having a new name for it? This is an idea that many developers have already been using. Anyone who thinks about it a little naturally realizes that adding a short description of what the chunk is about in relation to the rest of the document would help :D Myself and many others have been doing it for very obvious reasons... I just didn't know I had to give it a name and publish it as a technique. This LLM hype taught me one thing: put a name on any trivial idea and you are now an inventor.
@1littlecoder (a month ago)
Honestly, that's one thing I've actually mentioned in the video: it depends on whether such improvements are something you need.
@laviray5447 (a month ago)
Yes, actually there are many more techniques like this which offer a similar percentage of improvement, and none of them are worth it. Basic RAG is still enough for now.
@RajaSekharaReddyKaluri (a month ago)
Thank you! Feeding in the whole document text to add a few lines of context for each chunk seems way too much for too little benefit. Instead we would need a better embedding model to enhance retrieval without any of the overhead. And companies will be interested in chunking, embedding, and indexing proprietary documents only once in their lifetime; they can't reindex the whole archive every time a new improvement is released.
@henno6207 (a month ago)
It would be great if they could just build this into their platform, like OpenAI has with their agents.
@DCinzi (a month ago)
Wait, would it not be more efficient for the LLM, rather than creating a context, to use that compute to create a new chunk that puts together two previous chunks (e.g. chunk 1 + chunk x) based on context? Rather than going down the route of "let's try to help the LLM find the right chunk for the user request by maximizing attention to that one particular chunk", go down the route of "let's try to help the LLM [..] by maximizing the probability of finding the right node in a net of higher-percentage possibilities"?
@phanindraparashar8930 (a month ago)
I was experimenting with this and it's really amazing. But it's a very simple approach 😅😅
@1littlecoder (a month ago)
The beauty is how simple it is :D
@phanindraparashar8930 (a month ago)
@1littlecoder Keeping it simple always works
@souvickdas5564 (a month ago)
How do you generate the context for chunks without giving the LLM sufficient information about the chunk? How are they getting the information about the revenue in that example?
@1littlecoder (a month ago)
That is from the entire document.
@souvickdas5564 (a month ago)
@1littlecoder Then it will be very costly, as the entire document is being fed into the LLM. And what about the LLM's token limit if I have a significantly large document?
@randomlettersqzkebkw (a month ago)
@souvickdas5564 This technique is golden for locally run LLMs. It's free.
@akshaya626 (a month ago)
I have the same doubt. Please let us know if there's clarity.
@afolamitimothy8819 (a month ago)
Thanks 😅
@tripandplan (a month ago)
To generate context, do we need to pass all the documents? How will we address the token limit?
@limjuroy7078 (a month ago)
I think the reason why Anthropic introduced this technique is because they have CACHING!!!
@1littlecoder (a month ago)
Easy upsell 👀
@limjuroy7078 (a month ago)
@1littlecoder As far as I know, if you use the prompt caching feature to store all your documents, such as your company documents, it would greatly reduce the cost, particularly input token consumption, since the {{WHOLE DOCUMENT}} is retrieved from the cache. Am I right?
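For illustration, here is a rough sketch of how that caching idea could look with the Anthropic SDK: the whole document sits in a system block marked for caching, so repeated per-chunk context calls reuse the cached prefix instead of paying full input cost each time. The model name and prompt wording are assumptions, not the exact setup from the video.

```python
# Minimal sketch: cache the whole document once, then contextualize chunks cheaply.
# Assumes the anthropic Python SDK with prompt caching available; model name is illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def context_for_chunk(document: str, chunk: str) -> str:
    response = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=200,
        system=[
            {
                "type": "text",
                "text": f"<document>\n{document}\n</document>",
                "cache_control": {"type": "ephemeral"},  # document prefix is cached across calls
            }
        ],
        messages=[
            {
                "role": "user",
                "content": (
                    f"<chunk>\n{chunk}\n</chunk>\n\n"
                    "Give a short context situating this chunk within the document above. "
                    "Answer only with the context."
                ),
            }
        ],
    )
    return response.content[0].text.strip()
```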
@Praveenppk2255 (a month ago)
Is it something similar to what Google calls context caching?
@1littlecoder (a month ago)
No, context caching is basically on top of it. Thanks for the reminder. I should probably make a separate video on it.
@Praveenppk2255 (a month ago)
@1littlecoder Oh nice, perfect.
@KevinKreger (a month ago)
Smart chunks 🎉
@1littlecoder (a month ago)
Someone's going to steal this name for a new RAG technique :)
@ByteBop911 (a month ago)
Isn't it an agentic chunking strategy?
@shreyassrinivasa5983 (a month ago)
Have been doing this for a long time, and much more.
@1voice4all (a month ago)
They could have used something similar to LLMLingua on each chunk and then passed it to a smaller model for deriving context, since it is a very specific use and does not demand a huge model. This way cost can be controlled and quality can be enhanced. Also, they could add a model router rather than using a predefined model; this router could choose the model based on the information the corpus has. There are many patterns that can enhance this RAG pipeline. This just seems very lazy.
@truliapro7112 (a month ago)
Your content is really good, but I've noticed that you tend to speak very quickly, almost as if you're holding your breath. Is there a reason for this? I feel that a slower, calmer pace would make the information easier to absorb and more enjoyable to follow. It sometimes feels like you're rushing, and I believe a more relaxed delivery would enhance your already great work. Please understand this is meant as constructive feedback, not a criticism. I'm just offering a suggestion to help make your content even better.
@1littlecoder (a month ago)
Thank you for the feedback. I understand. I have a tendency to speak very fast, so typically I have to slow down. I'll try to do that more diligently.
@Macorelppa (a month ago)
This is the guy who called o1 preview overhyped. 🤭
@1littlecoder (a month ago)
Did I?
@dhanush.priyan (a month ago)
He never said that. He said o1 is just glorified chain of thought, and that's actually true.
@MichealScott24 (a month ago)
❤🫡
@phanindraparashar8930 (a month ago)
I tried another stupidly simple approach: create a QA dataset with an LLM, find the nearest question, and provide its answer. Surprisingly, it also works really well 😅😅😅
@1littlecoder (a month ago)
Here you go. You just invented a new RAG technique 😉
@arashputata (a month ago)
This is actually surprisingly good for RAG on expert/narrow domains! I did the same thing for a bot on web accessibility rules, and it worked perfectly.
@phanindraparashar8930 (a month ago)
@arashputata Which method?
@phanindraparashar8930Ай бұрын
@@1littlecoder also u can later use the data to fine-tune 😅😅
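A minimal sketch of the QA-pair idea from this thread: embed the LLM-generated questions, and at query time return the answer stored with the nearest question. It assumes sentence-transformers; the QA pairs shown are placeholders, not real data.

```python
# Minimal sketch: nearest-question retrieval over an LLM-generated QA dataset.
# Assumes sentence-transformers is installed; QA pairs and model name are placeholders.
from sentence_transformers import SentenceTransformer, util

qa_pairs = [
    ("What was the company's Q2 revenue?", "Revenue grew 3% over the previous quarter."),
    ("Who is the current CEO?", "Jane Doe has been CEO since 2021."),
]  # in practice, generated by an LLM from each chunk

encoder = SentenceTransformer("all-MiniLM-L6-v2")
question_vecs = encoder.encode([q for q, _ in qa_pairs], normalize_embeddings=True)


def answer(user_query: str) -> str:
    """Return the answer paired with the stored question nearest to the user query."""
    query_vec = encoder.encode(user_query, normalize_embeddings=True)
    scores = util.cos_sim(query_vec, question_vecs)[0]  # similarity to every stored question
    best = int(scores.argmax())
    return qa_pairs[best][1]


print(answer("How much did revenue grow last quarter?"))
```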