Contextual RAG is stupidly brilliant!

16,820 views

1littlecoder

1 day ago

Comments: 57
@epokaixyz 1 month ago
Consider these actionable insights from the video (a code sketch of points 3-5 follows below):
1. Understand the power of context in search queries and how it enhances accuracy in Retrieval Augmented Generation (RAG).
2. Experiment with different chunking strategies for your data when building your RAG system.
3. Explore embedding models like Gemini and Voyage for transforming text into numerical representations.
4. Combine embedding models with BM25, a ranking function, to improve ranking and retrieval.
5. Implement contextual retrieval by adding context to data chunks using Large Language Models (LLMs).
6. Analyze the costs and benefits of contextual retrieval, considering factors like processing power and latency.
7. Optimize your RAG system by experimenting with reranking during inference to fine-tune retrieval results.
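For illustration, here is a minimal Python sketch of points 3-5, not the exact pipeline from the video: `call_llm` is a placeholder for any chat-completion client, the prompt paraphrases the one in Anthropic's contextual retrieval post, and BM25 comes from the `rank_bm25` package.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Give a short, succinct context situating this chunk within the overall
document, to improve search retrieval of the chunk. Answer only with
the succinct context and nothing else."""

def call_llm(prompt: str) -> str:
    """Placeholder: plug in OpenAI, Anthropic, Ollama, etc."""
    raise NotImplementedError

def contextualize(document: str, chunks: list[str]) -> list[str]:
    # Prepend an LLM-generated one-liner to every chunk before indexing.
    return [
        call_llm(CONTEXT_PROMPT.format(document=document, chunk=c)) + "\n\n" + c
        for c in chunks
    ]

def build_bm25(contextual_chunks: list[str]) -> BM25Okapi:
    # BM25 is indexed over the contextualized text too, so the added
    # context boosts exact keyword matching as well as embeddings.
    return BM25Okapi([c.lower().split() for c in contextual_chunks])
```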
@amortalbeing 1 month ago
So the LLM is the Achilles' heel of the whole process: if it messes up the context, everything goes south immediately! But if it works well by default, it will enhance the final results.
@PeterDrewSEO 1 month ago
Mate, I've been trying to understand RAG for ages, non-coder here obviously, but your explanation was brilliant. Thank you.
@int8float64 1 month ago
As you said, it's really costly, like graph vector DBs, and high maintenance. A classic (sparse + dense retriever) + reranker setup should simply do a good job, also considering most of the new SOTA models have larger context windows.
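The baseline this comment describes can be as small as a rank-fusion step. Below is a sketch combining per-retriever rankings with reciprocal rank fusion (function and variable names are illustrative), leaving the reranker as an optional last stage.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several rankings (lists of chunk ids, best first) into one."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# fused = reciprocal_rank_fusion([bm25_top_ids, dense_top_ids])[:20]
# final = rerank(query, fused)  # optional cross-encoder reranking stage
```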
@henkhbit5748 1 month ago
Thanks for the update. 👍 We see a lot of different techniques to improve RAG, but the additional quality improvements are not that big, the costs are much higher (more tokens), and the inference time goes up... Agreed that for most use cases it's not worth the effort and money.
@kenchang3456 1 month ago
This is really interesting and I think, intuitively, it will help me with my project. Thank you very much.
@dr.mikeybee 1 month ago
You can create the contextual tag locally using Ollama.
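A minimal sketch of that local setup, assuming the `ollama` Python client (pip install ollama) and a locally pulled model; the model name here is illustrative.

```python
import ollama

def contextual_tag(document: str, chunk: str, model: str = "llama3.1") -> str:
    # Ask a local model to situate the chunk within the full document.
    response = ollama.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": (f"<document>\n{document}\n</document>\n"
                        f"Situate this chunk within the document above in 1-2 "
                        f"sentences, for search retrieval. Answer with the "
                        f"context only.\n<chunk>\n{chunk}\n</chunk>"),
        }],
    )
    return response["message"]["content"]
```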
@wylhias 1 month ago
I've been working on something quite similar over the last few months for a corpus of documents that are in a tree hierarchy to increase accuracy. Seems it was not a bad idea after all 😁
@ysy69 1 month ago
Excellent video and insights!
@1littlecoder 1 month ago
Glad you enjoyed it!
@ROKIBULHASANTANZIM 1 month ago
I was really caught off guard when you said '...large human being' 😂😂
@1littlecoder 1 month ago
I just rewatched it 🤣
@1voice4all 1 month ago
Unfortunately, large humans are extinct! [or maybe left planet Earth.]
@shobhitsadwal6081 1 month ago
🤣🤣🤣🤣🤣🤣
@SleepeJobs 1 month ago
Thank you for such insights and such a simple explanation.
@laviray5447 1 month ago
Honestly, that few percent improvement is not worth it for most cases...
@MhemanthRachaboyina 7 days ago
Great Video
@1littlecoder 7 days ago
@MhemanthRachaboyina Thank you!
@arashputata 1 month ago
Is it really worth all the noise, and giving it a new name, and all? This is an idea that many developers have already been using. I mean, anyone who thinks about it a little naturally realizes that adding a little description of what the chunk is about, in relation to the rest of the document, would help :D Myself and many others have been doing it for very obvious reasons... I just didn't know I had to give it a name and publish it as a technique... This LLM BS taught me one thing: put a name on any trivial idea and you are now an inventor.
@1littlecoder 1 month ago
Honestly, that's one thing I actually mentioned in the video: whether such improvements are something you need.
@laviray5447 1 month ago
Yes, actually there are many more techniques like this which offer a similar percentage improvement, and none of them are worth it. Basic RAG is still enough for now.
@RajaSekharaReddyKaluri 1 month ago
Thank you! Feeding in the whole document text to add a few lines of context for each chunk seems way too much for too little benefit. Instead we would need a better embedding model that enhances retrieval without any of the overhead. And companies will be interested in chunking, embedding, and indexing proprietary documents only once; they can't reindex the whole archive every time a new improvement is released.
@henno6207 1 month ago
It would be great if they could just build this into their platform, like OpenAI has with their agents.
@DCinzi 1 month ago
Wait, wouldn't it be more efficient for the LLM, rather than creating a context, to use that compute to create a new chunk that puts together two previous chunks (e.g. chunk 1 + chunk x) based on context? Rather than going down the route of "let's try to help the LLM find the right chunk for the user request by maximizing attention to that one particular chunk", go down the route of "let's try to help the LLM [..] by maximizing the probability of finding the right node in a net of higher-percentage possibilities"?
@phanindraparashar8930 1 month ago
I was experimenting with this and it's really amazing. But such a simple approach 😅😅
@1littlecoder 1 month ago
The beauty is how simple it is :D
@phanindraparashar8930 1 month ago
@1littlecoder Keeping it simple always works.
@souvickdas5564 1 month ago
How do you generate that context for chunks without giving the LLM sufficient information about the chunk? How are they getting the information about the revenue in that example?
@1littlecoder 1 month ago
That is from the entire document.
@souvickdas5564 1 month ago
@1littlecoder Then it will be very costly, as the entire document is being fed into the LLM. And what about the LLM's token limit if I have a significantly large document?
@randomlettersqzkebkw 1 month ago
@souvickdas5564 This technique is golden for locally run LLMs. It's free.
@akshaya626 1 month ago
I have the same doubt. Please let us know if there's any clarity on this.
@afolamitimothy8819 1 month ago
Thanks 😅
@tripandplan 1 month ago
To generate context, do we need to pass all the documents? How will we address the token limit?
@limjuroy7078 1 month ago
I think the reason Anthropic introduced this technique is because they have CACHING!!!
@1littlecoder 1 month ago
easy upsell 👀
@limjuroy7078 1 month ago
@1littlecoder As far as I know, if you use the prompt caching feature to store all your documents, such as your company documents, it would greatly reduce the cost, particularly the input token consumption, as the {{WHOLE DOCUMENT}} is retrieved from the cache. Am I right?
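A sketch of the setup this comment describes, using Anthropic's documented prompt caching: the full document is marked with `cache_control`, so repeated per-chunk calls reuse the cached prefix instead of paying the full input-token price each time. The model choice is illustrative, and caching was still a beta feature when the video came out.

```python
import anthropic

client = anthropic.Anthropic()

def contextualize_chunk(document: str, chunk: str) -> str:
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # illustrative model choice
        max_tokens=150,
        system=[{
            "type": "text",
            "text": f"<document>\n{document}\n</document>",
            # The large document prefix is cached and reused across calls.
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{
            "role": "user",
            "content": (f"Situate this chunk within the document above in "
                        f"1-2 sentences:\n<chunk>\n{chunk}\n</chunk>"),
        }],
    )
    return response.content[0].text
```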
@Praveenppk2255 1 month ago
Is it something similar to what Google calls context caching?
@1littlecoder 1 month ago
No, context caching basically sits on top of this. Thanks for the reminder, I should probably make a separate video on it.
@Praveenppk2255 1 month ago
@1littlecoder Oh nice, perfect.
@KevinKreger 1 month ago
Smart chunks 🎉
@1littlecoder 1 month ago
Someone's going to steal this name for a new RAG technique :)
@ByteBop911 1 month ago
Isn't this an agentic chunking strategy?
@shreyassrinivasa5983 1 month ago
Have been doing this for a long time, and much more.
@1voice4all 1 month ago
They could have used something similar to LLMLingua on each chunk, then passed it to a smaller model for deriving context, since it's a very specific task and doesn't demand a huge model. This way cost can be controlled and quality enhanced. Also, they could add a model router rather than using a predefined model; the router can choose the model based on the information the corpus has. There are many patterns that could enhance this RAG pipeline. This just seems very lazy.
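A sketch of the compress-then-contextualize idea, assuming the `llmlingua` package's `PromptCompressor` interface (pip install llmlingua); exact constructor and argument names may differ across versions, and the model-router piece is left out.

```python
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # defaults to a small local language model

def compressed_context_prompt(document: str, chunk: str, rate: float = 0.3) -> str:
    # Shrink the document before handing it to a small context-writer model.
    result = compressor.compress_prompt(document, rate=rate)
    return (f"<document>\n{result['compressed_prompt']}\n</document>\n"
            f"Situate this chunk within the document above in 1-2 "
            f"sentences:\n<chunk>\n{chunk}\n</chunk>")
```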
@truliapro7112 1 month ago
Your content is really good, but I've noticed that you tend to speak very quickly, almost as if you're holding your breath. Is there a reason for this? I feel that a slower, calmer pace would make the information easier to absorb and more enjoyable to follow. It sometimes feels like you're rushing, and I believe a more relaxed delivery would enhance your already great work. Please understand this is meant as constructive feedback, not a criticism. I'm just offering a suggestion to help make your content even better.
@1littlecoder 1 month ago
Thank you for the feedback. I understand. I naturally speak very fast, so I typically have to slow down. I'll try to do that more diligently.
@Macorelppa 1 month ago
This is the guy who called o1-preview overhyped. 🤭
@1littlecoder 1 month ago
Did I?
@dhanush.priyan 1 month ago
He never said that. He said o1 is just glorified chain of thought, and that's actually true.
@MichealScott24 1 month ago
❤🫡
@phanindraparashar8930 1 month ago
I tried another stupidly simple approach: create a QA dataset with an LLM, find the nearest question, and provide its answer. Surprisingly, it also works really great 😅😅😅
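A sketch of this QA-pairs trick, assuming `sentence-transformers` for embeddings; the pairs would normally be generated offline from your corpus by an LLM, and the example pair here is illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

qa_pairs = [  # normally generated from your corpus by an LLM
    ("What does contextual retrieval add to each chunk?",
     "A short LLM-written description situating the chunk in its document."),
]
question_vecs = model.encode([q for q, _ in qa_pairs], normalize_embeddings=True)

def answer(query: str) -> str:
    # Nearest question wins; with normalized vectors, dot product = cosine.
    qv = model.encode([query], normalize_embeddings=True)[0]
    return qa_pairs[int(np.argmax(question_vecs @ qv))][1]
```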
@1littlecoder 1 month ago
Here you go. You just invented a new RAG technique 😉
@arashputata 1 month ago
This is actually surprisingly good for RAG on expert/narrow domains! I did the same thing for a bot on web accessibility rules, and it worked perfect AF.
@phanindraparashar8930 1 month ago
@phanindraparashar8930 Which method?
@phanindraparashar8930 1 month ago
@1littlecoder Also, you can later use the data to fine-tune 😅😅
@kontrakamkam7148 1 month ago
Yeah, that's my not-so-secret weapon too 😂