Session 7: RAG Evaluation with RAGAS and How to Improve Retrieval

19,162 views

AI Makerspace

A day ago

What you'll learn this session:
- How and why to evaluate RAG systems using best-practice open-source tooling
- RAG Assessment with RAGAS, including Context Precision, Context Recall, Answer Relevancy, and Faithfulness
- How to improve RAG system outputs using advanced retrieval
Speakers:
Dr. Greg Loughnane, Founder & CEO AI Makerspace.
/ greglough
Chris Alexiuk, CTO AI Makerspace.
/ csalexiuk
Apply for one of our AI Engineering Courses today!
www.aimakerspa...

Comments: 39
@AI-Makerspace 10 months ago
Colab Notebook: colab.research.google.com/drive/1TZo2sgf1YFzI4_U-tGppg_ylHAR3MXF_?usp=sharing Slides: canva.com/design/DAF13fk63Ps/oKNCJf_Oez21fkf0KRW9eA/edit?DAF13fk63Ps&
@someshfengade9623 6 months ago
The slides link is not valid?
@AI-Makerspace 6 months ago
@someshfengade9623 it looks like the permissions were set to "anyone can edit" and someone went ahead and did that! We've restored the previous version and it should work now!
@cynogriffin6678 7 months ago
Hi Chris, very informative video. Can you please tell me how I can generate a test set using Azure in RAGAS?
@AI-Makerspace 7 months ago
You'd want to use a LangChain adapter for Azure, and then use that to create the test set.
@farhangnorouzi484 2 months ago
Thanks for sharing. I’m looking for a GitHub link to its repo, if possible.
@AI-Makerspace 2 months ago
Best place to go for that is straight to the source! github.com/explodinggradients/ragas
@AdamPippert 5 months ago
Why did nobody laugh at Greg’s durag joke?
@AI-Makerspace 5 months ago
😆🤣
@privacytest9126 2 months ago
Ground truth generated by GPT-4? Not even remotely useful for local RAG! In fact, ground truth presupposes you know the question, not really typical of real world user interactions.
@AI-Makerspace 2 months ago
Thanks privacytest! This is an astute point - ground truth data is always better when it's generated by humans, but alas, it's so rare to find golden datasets generated that way out in the wild. The industry needs a path to eval and RAGAS was like "here's one!" ... moreover, the synthetic test data generation technique is quickly becoming more of an industry standard all the time. Check out next week's event to learn more and bring your questions live! bit.ly/data4enterprise
@RaviPrakash-dz9fm 3 months ago
Can anyone tell me how RAGAS actually calculates these numbers? Like manually I get it, but what do the algorithms or functions look like? Like how does it measure faithfulness?
@AI-Makerspace 3 months ago
Hey Ravi, great question! We go a bit deeper into this in our more recent event with the creators! kzbin.infoAnr1br0lLz8?si=UG6vRnSY9oVtAuAT We'd recommend reading through the docs and digging into the source to go EVEN deeper! e.g., docs.ragas.io/en/stable/concepts/metrics/faithfulness.html
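To make the faithfulness question above concrete, here is a minimal sketch of the ratio the RAGAS docs describe: the number of claims in the generated answer that the retrieved context supports, divided by the total number of claims. In the library, an LLM extracts the claims and judges each one. In this sketch the claims are hand-written and `naive_is_supported` is a hypothetical substring check standing in for that LLM judge. None of these names come from the RAGAS codebase.

```python
from typing import Callable, List


def faithfulness_score(
    claims: List[str],
    context: str,
    is_supported: Callable[[str, str], bool],
) -> float:
    """Fraction of answer claims that the retrieved context supports."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)


def naive_is_supported(claim: str, context: str) -> bool:
    # Hypothetical stand-in for the LLM judge (an assumption for this sketch):
    # a claim counts as supported only if it appears verbatim in the context.
    return claim.lower() in context.lower()


# Illustrative data, not taken from the session notebook.
context = "Einstein was born in Germany on 14 March 1879."
claims = [
    "Einstein was born in Germany",        # appears in the context
    "Einstein was born on 20 March 1879",  # does not appear in the context
]

score = faithfulness_score(claims, context, naive_is_supported)
print(f"faithfulness = {score:.2f}")
```

Swapping `naive_is_supported` for a call to an LLM judge keeps the arithmetic the same. Only the way each claim is verified changes.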
@Adityasharma-z1r 3 months ago
This is a really great explanation. I have one question: let's say I want to improve performance by focusing on Faithfulness or Answer Relevance. Which RAG optimization techniques should I follow to increase Faithfulness, and which techniques can improve Relevance or Precision, etc.?
@AI-Makerspace 3 months ago
The answer is, unfortunately, it depends! The whole system needs to work together (from data quality, to retrieval quality, to model performance, to prompting), and it needs to work for your use case. What is the best metric to use for your use case? That also depends. It all comes down to metrics-driven development: docs.ragas.io/en/stable/concepts/metrics_driven.html, but you need to decide which direction to drive! There are some simple things to do after you set up RAG, like reranking, but for any given use case the details really matter with regard to what steps you should take.
@supergaulig 3 months ago
Good video, but one question: Why did you choose to create the testset step-by-step yourself and not use the provided TestSetGenerator from Ragas? Was it not available back then?
@AI-Makerspace 3 months ago
That's right! They had just rolled it out when we had them on for this more recent event: kzbin.infoAnr1br0lLz8?si=_wIYqsL4vcVM5QDq
@farhangnorouzi484 2 months ago
Would you share the link to the notebook please??
@AI-Makerspace 2 months ago
In the pinned comment! colab.research.google.com/drive/1TZo2sgf1YFzI4_U-tGppg_ylHAR3MXF_?usp=sharing
@孙姣姣 8 days ago
That's very helpful to me. Thank you
@AI-Makerspace 8 days ago
Love to hear it! For even deeper dives on RAGAS, check out our videos on RAG Assessment for LangChain RAG (kzbin.infoAnr1br0lLz8?si=Lf8cmhSUw3u0IpMD) and Synthetic Data Generation (SDG) (kzbin.infoY7V1TTdEWn8?si=-fTs08wrKGYattkA)!
@marnow88 1 month ago
Great video! How can I use RAGAS with Azure OpenAI flavour?
@AI-Makerspace 1 month ago
You can use the Azure OpenAI connectors for LangChain as your Critic and Generator!
@enceladus96 4 months ago
incredibly informative, not like clickbait or anything like other channels. real 37mins worth of knowledge. Thank you 🙌
@HosselBossel 9 months ago
Chris I love your explanations and notebooks! But you shouldn't be singing while Greg is talking at 16:49
@AI-Makerspace 8 months ago
😆
@kamalyadav4259 6 months ago
Hi Chris, I have a use case for text-to-SQL with RAG using LangChain. Is there any example or guide to evaluate the SQL result? Is the metric the same as regular text RAG? Thanks in advance.
@AI-Makerspace 6 months ago
The E2E metrics would likely be the same - and you could create a dataset that lets you compare the intermediate results as well, the same as you saw here.
@andybrown8438 9 months ago
Thanks for the great video. When did context relevance get broken out into context precision and context recall? The RAGAs paper of 26 September 2023 still refers only to relevance and I'd find it useful to have a source to explain why it was broken into two components. Intuitively it makes sense though.
@AI-Makerspace 8 months ago
Hey @andybrown8438 we're planning another event soon on RAG eval, and are in contact with the RAGAS creators - we'll ask them!
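As a rough illustration of why splitting context relevance into two numbers makes sense, the sketch below computes both from hand-labelled flags. It assumes the definitions given in the RAGAS docs: context precision averages precision@k over the ranks that hold a relevant chunk (so it rewards ranking relevant chunks first), and context recall is the fraction of ground-truth statements that can be attributed to the retrieved context. In the library an LLM produces the flags. Here they are hard-coded, and the function names are hypothetical.

```python
from typing import List


def context_precision(relevance: List[bool]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant chunk."""
    hits = 0
    total = 0.0
    for k, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision@k, counted only at relevant ranks
    return total / hits if hits else 0.0


def context_recall(attributed: List[bool]) -> float:
    """Fraction of ground-truth statements attributable to the context."""
    return sum(attributed) / len(attributed) if attributed else 0.0


# Hand-labelled flags (assumptions for illustration only).
# Retrieved chunks in rank order: relevant, irrelevant, relevant.
precision = context_precision([True, False, True])
# Four ground-truth statements, three of them found in the retrieved context.
recall = context_recall([True, True, False, True])

print(f"context precision = {precision:.3f}")
print(f"context recall    = {recall:.3f}")
```

The two numbers can move independently: retrieving more chunks tends to raise recall while pushing irrelevant chunks into the ranking, which lowers precision.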
@mansoorbaig9232 5 months ago
Great job guys. 👏
@AI-Makerspace 5 months ago
Thanks Mansoor!
@bdoriandasilva 2 months ago
great video, thanks a lot!
@micbab-vg2mu 9 months ago
thank you:)
@lespaceman 6 months ago
Great presentation guys, full of valuable knowledge 🎉
@nirash8018 6 months ago
Dude you're over 30 years old. Take the cap off if you want to be taken seriously
@AI-Makerspace 6 months ago
Thanks for the tip @nirash! The h/t, that is. Cheers!
@nirash8018 6 months ago
@AI-Makerspace You're welcome bro. Carry that bald head with pride
@AI-Makerspace 6 months ago
@nirash8018 ✊