Abigail Haddad - Automating Tests for your RAG Chatbot or Other Generative Tool

Lander Analytics

Automating Tests for your RAG Chatbot or Other Generative Tool by Abigail Haddad
Visit rstats.ai for information on upcoming conferences.
Abstract: Building a Retrieval Augmented Generation (RAG) chatbot that answers questions about a specific set of documents is straightforward. But how do you tell whether it's working? Automated evaluation of generative tools for specific use cases is tricky, but it's also important if you want to easily compare performance across different underlying LLMs, system prompts, temperatures, or other parameters, or simply make sure you're not breaking something when you push your code. In this talk, I'll discuss why this kind of evaluation is challenging and review a few options for the kinds of assessments you can create, including using an LLM to evaluate your LLM-based tool. We'll then look at several ways to write automated LLM-led evaluations, including with a library that lets you create complex grading rubrics for your tests easily and with very little code.
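The abstract mentions using an LLM to grade an LLM-based tool against a rubric. The following is a minimal, hypothetical sketch of that idea in Python. It is not the speaker's code, and it is not the API of the library she refers to (the description does not name it). The names `Criterion`, `build_grading_prompt`, `grade`, and `fake_judge` are illustrative assumptions. The grader model is injected as a plain function so the example runs offline; in practice you would wrap a real LLM call there.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical: a "judge" takes a prompt string and returns the grader LLM's raw reply.
# A real implementation would call an LLM API; here it is injected so tests run offline.
Judge = Callable[[str], str]


@dataclass
class Criterion:
    name: str
    description: str  # plain-language check the grader should apply


def build_grading_prompt(question: str, answer: str, criteria: list[Criterion]) -> str:
    """Assemble a rubric prompt asking the grader for a JSON object of true/false verdicts."""
    lines = [
        "You are grading a chatbot's answer.",
        f"Question: {question}",
        f"Answer: {answer}",
        "Reply with a JSON object mapping each criterion name to true or false.",
    ]
    for c in criteria:
        lines.append(f"- {c.name}: {c.description}")
    return "\n".join(lines)


def grade(question: str, answer: str, criteria: list[Criterion], judge: Judge) -> dict[str, bool]:
    """Ask the judge to apply the rubric and return one pass/fail verdict per criterion."""
    verdicts = json.loads(judge(build_grading_prompt(question, answer, criteria)))
    # Only a literal JSON true counts as a pass; omitted or malformed verdicts fail.
    return {c.name: verdicts.get(c.name) is True for c in criteria}


# Illustrative usage with a stand-in judge (hypothetical; no real LLM is called).
def fake_judge(prompt: str) -> str:
    return json.dumps({"answers_question": True, "cites_source": False})


criteria = [
    Criterion("answers_question", "The answer directly addresses the question."),
    Criterion("cites_source", "The answer names the document it drew from."),
]
results = grade("How many days of leave do employees get?",
                "Employees get 20 days of leave.", criteria, fake_judge)
print(results)  # {'answers_question': True, 'cites_source': False}
```

In an automated test suite, you might assert `all(results.values())` for each question in a fixed set, then rerun that set whenever you change the underlying model, system prompt, or temperature.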
Bio: Abigail Haddad is a data scientist who is working on automating LLM evaluations. Previously, she worked on research and data science for the Department of Defense, including at the RAND Corporation and as a Department of the Army civilian. Her hobbies include analyzing federal job listings and co-organizing Data Science DC. She blogs at The Present of Coding.
Twitter: @abbystat
Presented at the 2024 New York R Conference (May 16, 2024)
Hosted by Lander Analytics (landeranalytics.com)