Abigail Haddad - Automating Tests for your RAG Chatbot or Other Generative Tool

40 views

Lander Analytics

1 month ago

Automating Tests for your RAG Chatbot or Other Generative Tool by Abigail Haddad
Visit rstats.ai for information on upcoming conferences.
Abstract: Building a Retrieval Augmented Generation (RAG) chatbot that answers questions about a specific set of documents is straightforward. But how do you tell if it's working? Automated evaluation of generative tools for specific use cases is tricky, but it's also important if you want to easily compare performance using different underlying LLMs, system prompts, temperatures, or other parameters -- or just make sure you're not breaking something when you push your code. In this talk, I'll discuss why this kind of evaluation is challenging and review a few options for the kinds of assessments you can create, including using an LLM to evaluate your LLM-based tool. We'll then look at several ways to write automated LLM-led evaluations, including with a library that allows you to easily and with very little coding create complex grading rubrics for your tests.
Bio: Abigail Haddad is a data scientist who is working on automating LLM evaluations. Previously, she worked on research and data science for the Department of Defense, including at the RAND Corporation and as a Department of the Army civilian. Her hobbies include analyzing federal job listings and co-organizing Data Science DC. She blogs at The Present of Coding.
Twitter: @abbystat
Presented at the 2024 New York R Conference (May 16, 2024)
Hosted by Lander Analytics (landeranalytics.com)
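
As a rough illustration of the LLM-led evaluation idea the abstract mentions, the sketch below grades one chatbot response against a small rubric of yes/no questions. Everything in it is an assumption made for illustration: `RubricItem`, `build_judge_prompt`, `evaluate`, and `stub_judge` are hypothetical names, not the library or code shown in the talk. The judging model is passed in as a plain function, and a stub stands in for a real LLM call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical and for illustration only; they are not
# taken from the talk or from any specific evaluation library.


@dataclass
class RubricItem:
    name: str      # short label used as the key in the results
    question: str  # yes/no question the judge answers about the response


def build_judge_prompt(item: RubricItem, user_question: str, response: str) -> str:
    """Format a single rubric check as a prompt for the judging model."""
    return (
        "You are grading a chatbot response.\n"
        f"User question: {user_question}\n"
        f"Chatbot response: {response}\n"
        f"Rubric question: {item.question}\n"
        "Answer with exactly YES or NO."
    )


def evaluate(
    rubric: list[RubricItem],
    user_question: str,
    response: str,
    judge: Callable[[str], str],
) -> dict[str, bool]:
    """Ask the judge each rubric question and record pass/fail per item."""
    results = {}
    for item in rubric:
        verdict = judge(build_judge_prompt(item, user_question, response))
        results[item.name] = verdict.strip().upper().startswith("YES")
    return results


def stub_judge(prompt: str) -> str:
    """Offline stand-in for a real LLM call (an assumption of this sketch).

    It answers YES only when the chatbot response line mentions a page,
    which is a crude proxy for "cites a source".
    """
    response_line = next(
        line for line in prompt.splitlines() if line.startswith("Chatbot response:")
    )
    return "YES" if "page" in response_line.lower() else "NO"


rubric = [RubricItem("cites_source", "Does the response cite a source document?")]

with_citation = evaluate(
    rubric, "How many leave days do I get?", "You get 15 days (see page 4).", stub_judge
)
without_citation = evaluate(
    rubric, "How many leave days do I get?", "You get 15 days.", stub_judge
)
print(with_citation, without_citation)
```

In practice `stub_judge` would be replaced by a function that sends the prompt to whichever LLM serves as the grader. Because `evaluate` returns plain booleans, it can be wrapped in ordinary unit tests and rerun whenever the underlying model, system prompt, or temperature changes.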
