Advanced RAG 06 - RAG Fusion

21,153 views

Sam Witteveen

1 day ago

Comments: 52
@RameshBaburbabu 10 months ago
🎯 Key Takeaways for quick navigation:

00:00 🧠 Overview of RAG Fusion
- RAG Fusion aims to bridge the gap between explicit user queries and their intended questions.
- The technique involves query duplication with a twist: generating multiple queries from a single user input.
- Results from these queries are ranked using reciprocal rank fusion, and the outputs are used as context for generative output.

03:20 📊 Example of RAG Fusion in Action
- RAG Fusion is particularly useful for vague user queries where a wide variety of data is desired.
- The example demonstrates rewriting user input to address different angles of a topic.
- The technique allows for a more comprehensive exploration of a subject based on varied perspectives.

06:49 🖥️ Implementing RAG Fusion with LangChain
- The video introduces implementing RAG Fusion using LangChain in a notebook with the PaLM 2 model.
- Steps involve setting up a retriever, creating a chat chain with a prompt, and demonstrating the RAG Fusion process.
- LangChain expressions are used to build a chain for generating multiple search queries and fusing the results.

09:46 🔄 Debugging and Query Generation in RAG Fusion
- LangChain debugging is showcased to understand the generation and ranking of multiple search queries.
- The process involves generating related queries, ranking them, retrieving context, and applying reciprocal rank fusion.
- The resulting queries are filtered, and the final response is generated by combining the best results with the original query.

Made with HARPA AI
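The reciprocal rank fusion step in the takeaways above can be sketched in plain Python (a minimal sketch; the document IDs and the conventional smoothing constant k=60 are illustrative assumptions, not taken from the video):

```python
# Reciprocal rank fusion (RRF): fuse several ranked result lists into one.
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Score each document as the sum of 1 / (k + rank) over every
    list it appears in (rank is 1-based), then sort by fused score."""
    scores = {}
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Three generated queries each return a ranked list of document IDs;
# "d2" ranks highly in all three lists, so it tops the fused ranking.
fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
])
print(fused[0][0])  # -> d2
```

In the full pipeline, each inner list would come from a vector-store search for one of the LLM-generated queries, and the top fused documents would become the context for the final answer.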
@waneyvin 10 months ago
Hello Sam, since we have RAG, is Tree-of-Thought no longer necessary, or are they different approaches? I've never seen a ToT implementation on YouTube.
@seththunder2077 10 months ago
Hey Sam! I have a couple of questions I'd like to ask, but first: you tend to mention "if you were to use this in production", and I was wondering if you could do a video on how we can start and what to consider. It could be something as simple as a production-level chatbot with a single PDF, but the idea is that a lot of us are not experts/still beginners like myself (I'm still in university), and I'd like to get my hands dirty on production-level projects in Gen AI. It would definitely teach us how to use things like threads, async, storing chat sessions for your users, which cloud provider to pick and based on what, etc.
As for my questions:
1) In the directory_loader, why didn't you set multi_threading = True? Wouldn't that make it faster?
2) In the directory_loader again, wouldn't it be better to set the cls_loader for your txt files or whichever file type you are working on?
3) Why did you use invoke and not run/apply/batch? When do you consider using those instead?
4) Normally we persist the vector DB (such as Chroma) locally, but for production, where we'd normally store the file in the cloud, is there a way to persist it in the cloud to save costs if I'm using OpenAI embeddings, for example?
5) Could you teach us how to choose and evaluate chunk sizes/overlap? For instance, let's say I used specific values and started running my code to see the output. How do you evaluate that it's good enough? Do you run 100 examples and evaluate the responses yourself, or is there some other method that would tell you "this is the right chunk size and overlap for our use case"?
I tend to watch a lot of YouTubers, but no one really comes close to the content you deliver and the way you deliver it, hence why I'm asking all these specific, detailed questions. Lastly, thank you so much for these videos; the content you're making for us is extremely valuable. It may not seem like much to you as an expert, but for newcomers, it's marvelous.
@samwitteveenai 10 months ago
Lots of good questions here. I will make a video going into some of them in more depth. As for your early questions: mostly I was just getting something basic up to show RAG Fusion. On things like multi_threading etc., yes, you are totally right. I modified this from a notebook I created to show using all of Google's stack (PaLM 2 LLM, PaLM Embeddings, and Vertex AI Search as the vector store). The changes you propose make a lot of sense. For cheap vector stores in the cloud, check out LanceDB. I will try to make a video for it at some point; I've just been way too busy with work lately.
@yazanrisheh5127 10 months ago
@samwitteveenai Hello Sam. Can you please make a video that combines a lot of the features you have been showing us so far? For example, something like talking to multiple document types (PDF, TXT, and CSV) using directory_loader and how you would split these different file types, a chain like ConversationalRetrievalChain that has both memory and a prompt, Constitutional AI to ensure no harmful answers, fine-tuning for a specific use case, etc. All of those in one video, to show us how to connect all these components. I personally understand literally every separate video you're doing, but when I try connecting some of them, I sometimes feel lost and get errors that I don't know how to fix and couldn't find a solution for on GitHub issues or from LLMs like Perplexity, Bing, etc.
@ShivamSharma-ig9vp 12 days ago
Hi Sam, great explanation. Can you please advise on the query below? I am building a RAG application with LangChain, and I just want it to stick to the documents it has. I am using a ReAct agent, but it is still searching the web for answers. How can I stop that? I have also prompted it not to look for information outside the documents.
@wonderplatform 3 months ago
What's the latency?
@yahyahussein-p5o 4 months ago
How can I add memory with RAG Fusion?
@ahmadh9381 4 months ago
Can you make a video on how to add memory when using RAG Fusion?
@J3SIM-38 8 months ago
MetaCrawler did the same thing circa 1995: en.wikipedia.org/wiki/MetaCrawler. History rhymes.
@JJaitley 4 months ago
@Sam How can we handle latency in this case?
@jayhu6075 10 months ago
What a great explanation of RAG. Hopefully a follow-up about playing around with LangChain expressions and RAG stuff. Many thanks.
@micbab-vg2mu 10 months ago
Very interesting - Thank you for sharing:)
@marekjkos 9 months ago
Excellent content, concise and straightforward, with enough feel (context) for the topic.
@saivamsi441 7 months ago
This is great. But instead of retrieving vector embeddings for each question, how about combining the questions and passing them to the vector DB as one? Example: "Tell me about OpenAI", where the user intends to ask: what is OpenAI, its advantages and disadvantages, use cases, and the domains it can be used in. Instead of creating multiple search queries, can we just combine the search queries into one generated query (sort of like the DSPy framework using a Bayesian signature)? Also, if we use re-ranking and take the top-k values, it would be great to try it and see whether the result is the same or different!
@souvickdas5564 6 months ago
I have a very generic question about evaluation: how can we evaluate the responses generated by a RAG system?
@deepaksingh9318 6 months ago
Very nicely explained, Sam. I watched all six methods, and one thing I really liked about your videos is that they are all very crisp, to the point, and come with the code.
@samwitteveenai 6 months ago
Thanks. I try to keep these one-idea vids to the point without rambling on. lol, not always successfully, I am the first to admit.
@gundamdhinesh5379 4 months ago
@samwitteveenai Yes, please continue with the same approach. Your pace is on point.
@joffreylemery6414 10 months ago
Hello Sam! I have two questions:
- First, are you open to doing consulting for a project at my firm? We could use you for a RAG subject, and we are open to paying for consulting (mostly on the following question).
- Second, we can see RAG as a narrowing tool (we send lots of documents, and we can extract smaller answers according to the query). But what if I want to create a vector DB with a specific kind of document, let's say my company's contracts, and my goal is to create an agent able to reproduce a contract according to a specification? Is RAG useful for that? We don't want to go into fine-tuning.
Let us know!
@szekaatti8394 10 months ago
Great video as always. I feel like instead of prompting an LLM to rewrite the query, the vector-store embeddings should be descriptive enough to map to the same place as the rephrased query would (e.g., when you query it with "Tell me about Universal Studios Singapore", the vector encoding should map to a 'place' where the timing/pricing answers are also stored). Do you think something like this would be possible by fine-tuning an embedding model?
@Eamo-21 9 months ago
Another great video! Have you any videos on strategies for producing long-form articles with LLMs, i.e., getting over the 4k-token output limit in a clever way?
@sumanthyadav6974 10 months ago
Hey Sam! When we generate multiple search queries based on the original query, it generates those queries outside of the domain. Shouldn't it ideally generate them based on the vector DB? Is there any way to make sure we stay within the domain?
@caiyu538 10 months ago
Great lectures.
@sumanthyadav6974 10 months ago
Is there any way I can generate or suggest multiple search queries based on the original search query along with my response? Please let me know.
@MachineLearningZuu 10 months ago
Hi Sam, thanks for the wonderful content. I have a small question: here you merged all the documents into one single string (raw_text) and proceeded, instead of dealing with things at the document level. Any specific reason for that?
@archiee1337 1 month ago
Great stuff, thank you.
@TheAmit4sun 10 months ago
So it's a multi-query retrieval concept.
@Schaelpy 9 months ago
Thank you very much for making a video about it!
@akashaia 10 months ago
This is FLARE with the direct method.
@NicolasEmbleton 10 months ago
Saw this earlier this week. Seems super interesting. Gotta get to try it.
@Canna_Science_and_Technology 10 months ago
This seems inspired by the HyDE method.
@saranyag165 10 months ago
Very informative video!
@billykotsos4642 10 months ago
Neat idea.
@hqcart1 10 months ago
OpenAI killed RAG with the Assistant.
@christosmelissourgos2757 10 months ago
After integrating the Assistant into my app the day it came out, I have come to the conclusion that Assistants can become quite expensive, really quickly, because of the tokens compounding call after call. So I came back to RAG.
@choiswimmer 10 months ago
Until you try it at scale in production, you don't really know. There's a reason no one has come out saying they've used some of these features in production yet.
@SijohnMathew 10 months ago
It’s freaking expensive dude. So RAG is here to stay.
@hqcart1 10 months ago
@choiswimmer I am already using it to categorize over 3 million items; it cost me around $30.
@samwitteveenai 10 months ago
The Assistant is just OpenAI's implementation of RAG + Tools. The challenge is you can't tweak how their RAG works; you have to go with their settings, and even they showed on Demo Day that different cases require different kinds of RAG (e.g., they showed one case where HyDE doesn't work at all and another where it worked).