Session 7: RAG Evaluation with RAGAS and How to Improve Retrieval

Views: 22,325

Days ago

Comments: 41

@AI-Makerspace 11 months ago

Colab Notebook: colab.research.google.com/drive/1TZo2sgf1YFzI4_U-tGppg_ylHAR3MXF_?usp=sharing Slides: canva.com/design/DAF13fk63Ps/oKNCJf_Oez21fkf0KRW9eA/edit?DAF13fk63Ps&

@someshfengade9623 8 months ago

The slides link is not valid?

@AI-Makerspace 8 months ago

@someshfengade9623 it looks like the permissions were set to "anyone can edit" and someone went ahead and did that! We've restored the previous version and it should work now!

@enceladus96 6 months ago

Incredibly informative, and not clickbait like other channels. A real 37 minutes' worth of knowledge. Thank you 🙌

@lespaceman 8 months ago

Great presentation guys, full of valuable knowledge 🎉

@孙姣姣 2 months ago

That's very helpful to me. Thank you

@AI-Makerspace 2 months ago

Love to hear it! For even deeper dives on RAGAS, check out our videos on RAG Assessment for LangChain RAG (kzbin.infoAnr1br0lLz8?si=Lf8cmhSUw3u0IpMD) and Synthetic Data Generation (SDG) (kzbin.infoY7V1TTdEWn8?si=-fTs08wrKGYattkA)!

@mansoorbaig9232 6 months ago

Great job guys. 👏

@AI-Makerspace 6 months ago

Thanks Mansoor!

@Adityasharma-z1r 5 months ago

This is a really great explanation. I have one question: let's say I want to improve performance by focusing on Faithfulness or Answer Relevance. Which RAG optimization techniques should I follow to increase Faithfulness, and which techniques can improve Relevance, Precision, etc.?

@AI-Makerspace 5 months ago

The answer is, unfortunately, it depends! The whole system needs to work together (from data quality, to retrieval quality, to model performance, to prompting), and it needs to work for your use case. What is the best metric to use for your use case? That also depends. It all comes down to metrics-driven development: docs.ragas.io/en/stable/concepts/metrics_driven.html , but you need to decide which direction to drive! There are some simple things to do after you set up RAG like reranking, but for any given use case the details really matter with regards to what steps you should take.

@bdoriandasilva 4 months ago

great video, thanks a lot!

@marnow88 3 months ago

Great video! How can I use RAGAS with Azure OpenAI flavour?

@AI-Makerspace 3 months ago

You can use the Azure OpenAI connectors for LangChain as your Critic and Generator!

@farhangnorouzi484 4 months ago

Would you share the link to the notebook please??

@AI-Makerspace 4 months ago

In the pinned comment! colab.research.google.com/drive/1TZo2sgf1YFzI4_U-tGppg_ylHAR3MXF_?usp=sharing

@andybrown8438 10 months ago

Thanks for the great video. When did context relevance get broken out into context precision and context recall? The RAGAs paper of 26 September 2023 still refers only to relevance and I'd find it useful to have a source to explain why it was broken into two components. Intuitively it makes sense though.

@AI-Makerspace 10 months ago

Hey @andybrown8438 we're planning another event soon on RAG eval, and are in contact with the RAGAS creators - we'll ask them!
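For readers who want to see the two components side by side, here is a minimal sketch, assuming the per-chunk and per-statement relevance labels are already known. In RAGAS those labels come from an LLM judge; here they are supplied by hand. The function names are hypothetical and this is not the library's implementation. Context precision is taken as the average of precision@k over the ranks that hold a relevant chunk, and context recall as the share of ground-truth statements the retrieved context supports.

```python
from typing import Sequence


def context_precision(relevance: Sequence[bool]) -> float:
    """Average precision@k over the ranks k that hold a relevant chunk.

    relevance[i] is True when the chunk at rank i + 1 is relevant.
    Labels are assumed to be given; RAGAS derives them with an LLM judge.
    """
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant rank
    return total / hits if hits else 0.0


def context_recall(attributable: Sequence[bool]) -> float:
    """Share of ground-truth statements supported by the retrieved context.

    attributable[i] is True when statement i of the ground truth can be
    traced back to the retrieved context.
    """
    if not attributable:
        return 0.0
    return sum(attributable) / len(attributable)


# Relevant chunks at ranks 1 and 3; rank 2 is noise.
print(context_precision([True, False, True]))
# Three of four ground-truth statements are covered by the context.
print(context_recall([True, True, False, True]))
```

Precision rewards putting relevant chunks near the top of the ranking, while recall asks whether everything needed for the reference answer was retrieved at all, which is one way to read why a single "relevance" score was split in two.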

@supergaulig 5 months ago

Good video, but one question: why did you choose to create the test set step by step yourself rather than use the TestSetGenerator provided by Ragas? Was it not available back then?

@AI-Makerspace 5 months ago

That's right! They had just rolled it out when we had them on for this more recent event: kzbin.infoAnr1br0lLz8?si=_wIYqsL4vcVM5QDq

@kamalyadav4259 8 months ago

Hi Chris, I have a use case for text-to-SQL with RAG using LangChain. Is there any example or guide for evaluating the SQL result? Are the metrics the same as for regular text RAG? Thanks in advance

@AI-Makerspace 8 months ago

The E2E metrics would likely be the same - and you could create a dataset that lets you compare the intermediate results as well, the same as you saw here.

@RaviPrakash-dz9fm 5 months ago

Can anyone tell me how ragas actually calculates these numbers? I get how you'd do it manually, but what do the algorithms or functions look like? For example, how does it measure faithfulness?

@AI-Makerspace 5 months ago

Hey Ravi great question! We go a bit deeper into this in our more recent event with the creators! kzbin.infoAnr1br0lLz8?si=UG6vRnSY9oVtAuAT We'd recommend reading through the docs and digging into the source to go EVEN deeper! e.g., docs.ragas.io/en/stable/concepts/metrics/faithfulness.html
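As a rough illustration of the idea behind faithfulness, here is a minimal sketch, assuming the answer has already been broken into individual claims. The score is the number of claims supported by the retrieved context divided by the total number of claims. In RAGAS both the claim extraction and the support check are done by an LLM; the `naive_judge` below is a hypothetical substring-match stand-in so the example runs without any model, and none of these names come from the library.

```python
from typing import Callable, List


def faithfulness(
    claims: List[str],
    context: str,
    is_supported: Callable[[str, str], bool],
) -> float:
    """Fraction of answer claims that the context supports."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)


def naive_judge(claim: str, context: str) -> bool:
    # Toy stand-in for an LLM verdict (assumption): a claim counts as
    # supported only if it appears verbatim in the context, ignoring case.
    return claim.lower() in context.lower()


context = "Einstein was born in Germany on 14 March 1879."
claims = [
    "Einstein was born in Germany",        # supported by the context
    "Einstein was born on 20 March 1879",  # not supported
]
print(faithfulness(claims, context, naive_judge))  # 0.5
```

Swapping `naive_judge` for a function that asks a model "can this claim be inferred from this context?" gets closer to what the docs describe, at the cost of one or more LLM calls per answer.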

@cynogriffin6678 9 months ago

Hi Chris, very informative video. Can you please tell me how I can generate a test set using Azure in RAGAS?

@AI-Makerspace 9 months ago

You'd want to use a LangChain adapter for Azure - you can then use that to create the test set.

@farhangnorouzi484 4 months ago

Thanks for sharing. I’m looking for a github link to its repo, if possible

@AI-Makerspace 4 months ago

Best place to go for that is straight to the source! github.com/explodinggradients/ragas

@yerson557 1 month ago

Where does ground truth come from? Is this a human-annotated property? I understand the ground truth in RAGAS refers to the correct answer to the question. It's typically used for the context_recall metric. But how do we get this? Human in the loop? LLM generated? More documents from the retrieval? Thank you!

@AI-Makerspace 1 month ago

"Ground Truth" can come from any of these sources! Of course, getting it straight from the people who perform whatever tasks you're automating is the right idea, but this can be very expensive. In the case of RAGAS the "Ground Truth" is represented by the output you get when you provide [question, retrieved context] pairs as input to a generator. That is, we are not actually using a RAG system, but passing "correct" [question, context] pairs as input. These are "correct" because they were synthetically generated and are known to be correct; see Synthetic Test Data Generation: docs.ragas.io/en/stable/concepts/testset_generation.html Note that Ground Truth is different from "Answer" because "Answer" actually uses the RAG application that you're building, while "Ground Truth" passes [question, context] pairs in directly.
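To make the difference between "Ground Truth" and "Answer" concrete, here is a minimal sketch under stated assumptions: `toy_generate` and `toy_retrieve` are hypothetical stand-ins for an LLM call and for your retriever, and the helper names are invented for illustration. The only point is the data flow: ground truth feeds the generator the known-correct context directly, while the answer goes through whatever the retriever actually returns.

```python
from typing import Callable, List

Generator = Callable[[str, List[str]], str]
Retriever = Callable[[str], List[str]]


def make_ground_truth(
    question: str, source_context: List[str], generate: Generator
) -> str:
    """Generator output for the known-correct [question, context] pair."""
    return generate(question, source_context)


def make_answer(question: str, retrieve: Retriever, generate: Generator) -> str:
    """Generator output when the context comes from the RAG retriever."""
    return generate(question, retrieve(question))


def toy_generate(question: str, context: List[str]) -> str:
    # Hypothetical stand-in for an LLM call: echoes the first context chunk.
    return context[0] if context else "I don't know."


def toy_retrieve(question: str) -> List[str]:
    # Hypothetical retriever that returns the wrong chunk for this question.
    return ["RAGAS reports a faithfulness metric."]


question = "What does context recall measure?"
source_context = [
    "Context recall measures how much of the ground truth "
    "the retrieved context covers."
]

ground_truth = make_ground_truth(question, source_context, toy_generate)
answer = make_answer(question, toy_retrieve, toy_generate)
print(ground_truth)
print(answer)
```

Because the two outputs differ only in where the context came from, comparing them isolates how much the retrieval step is helping or hurting.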

@micbab-vg2mu 11 months ago

thank you:)

@HosselBossel 10 months ago

Chris I love your explanations and notebooks! But you shouldn't be singing while Greg is talking at 16:49

@AI-Makerspace 10 months ago

😆

@AdamPippert 7 months ago

Why did nobody laugh at Greg’s durag joke?

@AI-Makerspace 7 months ago

😆🤣

@privacytest9126 3 months ago

Ground truth generated by GPT-4? Not even remotely useful for local RAG! In fact, ground truth presupposes you know the question, not really typical of real world user interactions.

@AI-Makerspace 3 months ago

Thanks privacytest! This is an astute point - ground truth data is always better when it's generated by humans, but alas, it's so rare to find golden datasets generated that way out in the wild. The industry needs a path to eval and RAGAS was like "here's one!" ... moreover, the synthetic test data generation technique is quickly becoming more of an industry standard all the time. Check out next week's event to learn more and bring your questions live! bit.ly/data4enterprise