Advanced RAG 06 - RAG Fusion

22,793 views

Comments: 52

@deepaksingh9318 9 months ago

Very nicely explained, Sam. I watched all 6 methods, and one thing I really liked about your videos is that they are very crisp and to the point, with a code part in every one.

@samwitteveenai 8 months ago

Thanks. I try to keep these one-idea vids to the point without rambling on. lol, not always successfully, I am the first to admit.

@gundamdhinesh5379 6 months ago

@samwitteveenai Yes, please continue with the same approach. Your pace is on point.

@RameshBaburbabu 1 year ago

🎯 Key Takeaways for quick navigation:

00:00 🧠 *Overview of RAG Fusion*
- RAG Fusion aims to bridge the gap between explicit user queries and their intended questions.
- The technique involves query duplication with a twist, generating multiple queries from a single user input.
- Results from these queries are ranked using reciprocal rank fusion, and the outputs are used as context for generative output.

03:20 📊 *Example of RAG Fusion in Action*
- RAG Fusion is particularly useful for vague user queries where a wide variety of data is desired.
- The example demonstrates rewriting user input to address different angles of a topic.
- The technique allows for a more comprehensive exploration of a subject based on varied perspectives.

06:49 🖥️ *Implementing RAG Fusion with LangChain*
- The video introduces implementing RAG Fusion using LangChain in a notebook with the PaLM-2 model.
- Steps involve setting up a retriever, creating a chat chain with a prompt, and demonstrating the RAG Fusion process.
- LangChain expressions are used to build a chain for generating multiple search queries and fusing the results.

09:46 🔄 *Debugging and Query Generation in RAG Fusion*
- LangChain debugging is showcased to understand the generation and ranking of multiple search queries.
- The process involves generating related queries, ranking them, retrieving context, and applying reciprocal rank fusion.
- The resulting queries are filtered, and the final response is generated by combining the best results with the original query.

Made with HARPA AI
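The reciprocal rank fusion step mentioned in the takeaways above can be sketched in plain Python. This is a minimal illustration and not the code from the video: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly used default) are all assumptions made for the example. Each inner list stands for the ranked results retrieved for one generated query.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list
    that contains it, with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval results for three generated queries.
results_per_query = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]

fused = reciprocal_rank_fusion(results_per_query)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that rank well across several query variants (here `doc_b`) rise to the top, while a document returned by only one query (`doc_d`) ends up last. The top few fused documents would then be passed as context to the LLM along with the original question.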

@jayhu6075 1 year ago

What a great explanation of RAG. Hopefully there will be a follow-up about playing around with LangChain expressions and RAG stuff. Many thanks.

@marekjkos 1 year ago

Excellent content, concise and straightforward, with enough feel (context) for the topic.

@NicolasEmbleton 1 year ago

Saw this earlier this week. Seems super interesting. Gotta get to try it.

@seththunder2077 1 year ago

Hey Sam! I have a couple of questions I'd like to ask, but first: you tend to mention "if you were to use this in production", and I was wondering if you could do a video on how we can start and what to consider. It could be something as simple as a production-level chatbot over a single PDF. The idea is that a lot of us are not experts or are still beginners like myself (I'm still in university), but I'd like to get my hands dirty on production-level projects in Gen AI. It would definitely teach us how to use things like threads, async, storing chat sessions for your users, which cloud provider to pick and on what basis, etc.

As for my questions:
1) In the directory_loader, why didn't you set multi_threading = True? Wouldn't that make it faster?
2) In the directory_loader again, wouldn't it be better to set the cls_loader for your txt files or whatever file type you are working on?
3) Why did you use invoke and not run/apply/batch? When do you consider using those instead?
4) Normally we persist the vector DB (such as Chroma) locally, but at production level we'd normally store the files in the cloud. Is there a way to persist it in the cloud to save costs if I'm using OpenAI embeddings, for example?
5) Could you teach us how to choose and evaluate chunk sizes and overlap? For instance, let's say I used specific values and started running my code to see the output. How do you evaluate that it's good enough? Do you run 100 examples and evaluate the responses yourself, or is there some other method that would tell you "this is a good chunk size and overlap for our use case"?

I tend to watch a lot of YouTubers, but no one really comes close to the content you deliver and the way you deliver it, which is why I'm asking all these specific, detailed questions. Lastly, thank you so much for these videos; the content you're doing for us is extremely valuable. It may not seem like much to you as an expert, but for newcomers it's marvelous.

@samwitteveenai 1 year ago

Lots of good questions here. I will make a video going into some of them more. On your early questions, mostly I was just getting something basic up to show RAG Fusion. On things like multi_threading etc., yes, you are totally right. I modified this from a notebook I created to show using Google's whole stack (PaLM 2 LLM, PaLM Embeddings, and Vertex AI Search as the vector store). The changes you propose make a lot of sense. For cheap vector stores in the cloud, check out LanceDB. I will try to make a video on it at some point; I've just been way too busy with work lately.

@yazanrisheh5127 1 year ago

@samwitteveenai Hello Sam. Can you please make a video that combines a lot of the features you have been showing us so far? For example, something like talking to multiple document types (PDF, txt, and CSV) using directory_loader and how you would split these different file types, a chain like the conversational retrieval chain that has both memory and a prompt, constitutional AI to ensure no harmful answers, fine-tuning for a specific use case, etc., all in one video to show us how to connect these components. Personally, I understand every separate video you're doing, but when I try connecting some of those pieces I sometimes feel lost and get errors that Idk how to fix and can't find a solution for in GitHub issues or from LLMs like Perplexity, Bing, etc.

@wonderplatform 6 months ago

what's the latency?

@micbab-vg2mu 1 year ago

Very interesting - Thank you for sharing:)

@JJaitley 7 months ago

@Sam How can we handle latency in this case?

@Schaelpy 1 year ago

Thank you very much for making a video about it!

@ShivamSharma-ig9vp 3 months ago

Hi Sam, great explanation. Can you please make a suggestion for the query below? I am building a RAG application with LangChain, and I just want it to stick to the documents it has. I am using a ReAct agent, but it is still searching the web for answers. How can I stop that? I have also given it a prompt telling it not to look for information beyond the documents.

@waneyvin 1 year ago

Hello Sam, since we have RAG, is Tree-of-Thought no longer necessary? Are they different approaches? I've never seen a ToT implementation on YouTube.

@yahyahussein-p5o 7 months ago

How can I add memory with RAG Fusion?

@Eamo-21 1 year ago

Another great video! Do you have any videos on strategies for producing long-form articles with LLMs? Getting over the 4k token output limit in a clever way?

@caiyu538 1 year ago

Great lectures.

@souvickdas5564 9 months ago

I have a very generic question about evaluation of the RAG system. How can we evaluate the responses generated by the RAG system?

@sumanthyadav6974 1 year ago

Is there any way I can generate or suggest multiple search queries based on the original search query along with my response? Please let me know.

@ahmadh9381 7 months ago

Can you make a video on how to add memory when using RAG Fusion?

@joffreylemery6414 1 year ago

Hello Sam! I have two questions:
- First, are you open to consulting on a project at my firm? We may need you for a RAG subject, and we are open to paying for consulting (mostly on the following question).
- Second: we can see RAG as a narrowing tool (we send lots of documents, and we extract smaller answers according to the query). But say I want to create a vector DB with a specific kind of document, for example my company's contracts, and my goal is to create an agent able to reproduce a contract according to a specification. Is RAG useful for that? We don't want to go into fine-tuning.
Let us know!

@szekaatti8394 1 year ago

Great video as always. I feel like, instead of prompting an LLM to rewrite the query, the vector store embeddings should be descriptive enough to map to the same place as the rephrased query would. (E.g. when you query it with "Tell me about Universal Studios Singapore", the vector encoding should map to a 'place' where the timing/pricing answers are also stored.) Do you think something like this would be possible by fine-tuning an embedding model?

@saivamsi441 10 months ago

This is great. But I wanted to know: instead of retrieving vector embeddings for each question, how about we combine the questions and pass them to the vector DB? Example: "tell me about OpenAI". The user intends to ask: what is OpenAI, its advantages and disadvantages, use cases, and the domains it can be used in. Instead of creating multiple search queries, can we just combine the search queries into one generated query (sort of like the DSPy framework using Bayesiansignature)? Also, if we use re-ranking and get the top_K values, it would be great to try it and understand whether the results are the same or different!

@sumanthyadav6974 1 year ago

Hey Sam! When we generate multiple search queries based on the original query, it generates those search queries outside of the domain. Shouldn't it ideally generate them based on the vector DB? Is there any way to make sure we stay within the domain?

@MachineLearningZuu 1 year ago

Hi Sam, thanks for the wonderful content. I have a small question. Here you merged all the documents into one single string (raw_text) and proceeded from there instead of working at the document level. Any specific reason for that?

@saranyag165 1 year ago

Very informative video!

@billykotsos4642 1 year ago

neat idea

@TheAmit4sun 1 year ago

So it's a Multi Query Retrieval concept

@aifarmerokay 1 year ago

This is FLARE with the direct method

@Pure_Science_and_Technology 1 year ago

This seems inspired by the HyDE method

@hqcart1 1 year ago

OpenAI killed RAG with the Assistant

@christosmelissourgos2757 1 year ago

After integrating the Assistant into my app the day it came out, I have come to the conclusion that assistants can become quite expensive really quickly because of the tokens compounding call after call. So I came back to RAG.
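The "token compounding" effect described in this comment can be illustrated with simple arithmetic. This is only a rough sketch under stated assumptions: it assumes every call resends the full conversation so far, that each turn adds a fixed number of tokens, and the function name `cumulative_input_tokens` and the numbers are hypothetical. It does not describe how OpenAI actually bills the Assistants API.

```python
def cumulative_input_tokens(turns, tokens_per_turn, resend_history=True):
    """Total input tokens sent over a conversation of `turns` calls.

    Assumption: with resend_history=True each call includes all previous
    turns, so the per-call input grows linearly and the total quadratically.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn
        # Either the whole history is sent again, or only the new turn.
        total += history if resend_history else tokens_per_turn
    return total


# Hypothetical numbers: 10 calls, 500 tokens added per call.
with_history = cumulative_input_tokens(10, 500, resend_history=True)
without_history = cumulative_input_tokens(10, 500, resend_history=False)
print(with_history, without_history)
```

Under these assumptions the conversation that resends its history costs 27,500 input tokens against 5,000 for fixed-size calls, which is the kind of growth a RAG pipeline with a bounded retrieved context avoids.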

@choiswimmer 1 year ago

Until you try it at scale in production, you don't really know. There's a reason no one has come out saying they've used some of these features in production yet.

@SijohnMathew 1 year ago

It’s freaking expensive dude. So RAG is here to stay.

@hqcart1 1 year ago

@choiswimmer I am already using it to categorize over 3 million items; it cost me around $30.

@samwitteveenai 1 year ago

The Assistant is just OpenAI's implementation of RAG + Tools. The challenge is that you can't tweak how their RAG works; you have to go with their settings, and even they showed on Demo Day that different cases require different kinds of RAG (e.g. they showed one case where HyDE doesn't work at all and then another where it worked).