Deep Dive into LLM Evaluation with Weights & Biases

17,127 views

9 months ago

In the dynamic world of Large Language Models (LLMs), we can now build smart systems on top of our own data. As with any other piece of automation software, it's essential to take the time to evaluate these LLM systems. In this webinar, we'll dive into how to evaluate them effectively, with a particular focus on Retrieval Augmented Generation (RAG) systems.
We'll start by discussing the 'eye-balling' technique and why Weights & Biases Prompts stands out as the first great tool in this area. Then we'll move on to supervised evaluation, explaining why it's worth considering and pointing out some of its limitations.
To wrap things up, we'll look at how LLMs can be used to evaluate themselves: generating their own evaluation datasets, scoring outputs with standard metrics such as SQuAD (exact match and F1) or BLEU, and even evaluating retrieval systems.
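The description doesn't say how these metrics are computed in the workshop. As a minimal sketch, assuming SQuAD-style answer normalization (lowercase, strip punctuation and the articles a/an/the), the snippet below computes exact match and token-level F1 for generated answers. It also computes hit rate and mean reciprocal rank (MRR) for a retriever, assuming each question has exactly one relevant document ID. Those two retrieval metrics are common choices picked for illustration, not necessarily the ones the presenters used. BLEU is omitted because it needs n-gram counting and a brevity penalty, normally taken from a library.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def hit_rate_and_mrr(retrieved: list[list[str]], relevant: list[str], k: int = 5) -> tuple[float, float]:
    """Hit rate@k and MRR@k, assuming one relevant doc ID per query."""
    hits, reciprocal_rank_sum = 0, 0.0
    for docs, gold in zip(retrieved, relevant):
        top_k = docs[:k]
        if gold in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(gold) + 1)  # ranks are 1-based
    n = len(relevant)
    return hits / n, reciprocal_rank_sum / n


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower!", "eiffel tower"))  # 1.0
    print(round(token_f1("Paris, France", "Paris"), 3))      # 0.667
    print(hit_rate_and_mrr([["d3", "d1", "d7"], ["d2", "d9"]], ["d1", "d4"], k=3))  # (0.5, 0.25)
```

In an LLM-as-evaluator setup, the reference answers and relevant document IDs would come from the LLM-generated evaluation dataset rather than from human labels.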
On top of all that, we'll also touch on how W&B Sweeps, an excellent tool for hyperparameter optimization, can be used to find the right balance between maximizing accuracy and minimizing cost. The session will end with a Q&A with the presenters.
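The description doesn't show a sweep definition, so here is a minimal sketch of what one might look like for a RAG pipeline. A W&B sweep configuration is a plain dict with `method`, `metric`, and `parameters` keys. The parameter names (`chunk_size`, `top_k`, `llm_model`), the model placeholders, and the combined `accuracy_minus_cost` objective are all hypothetical. A sweep optimizes a single logged metric, so folding cost into one score is just one assumed way to trade accuracy against spend.

```python
# Hypothetical W&B sweep configuration for tuning a RAG pipeline.
# The parameter and metric names are illustrative; they must match
# whatever your own evaluation script reads from wandb.config and logs.
sweep_config = {
    "method": "bayes",  # Bayesian search; "grid" and "random" are also supported
    "metric": {"name": "accuracy_minus_cost", "goal": "maximize"},
    "parameters": {
        "chunk_size": {"values": [256, 512, 1024]},
        "top_k": {"values": [2, 4, 8]},
        "llm_model": {"values": ["small-model", "large-model"]},  # placeholders
    },
}


def accuracy_minus_cost(accuracy: float, cost_usd: float, cost_weight: float = 0.1) -> float:
    """Hypothetical single objective: accuracy penalized by spend.

    cost_weight is an assumed exchange rate between accuracy and dollars.
    """
    return accuracy - cost_weight * cost_usd


# Launching the sweep (requires `pip install wandb` and a logged-in account):
#   import wandb
#   sweep_id = wandb.sweep(sweep_config, project="rag-eval")
#   wandb.agent(sweep_id, function=run_eval, count=20)
# where run_eval is your own function that calls wandb.init(), reads
# wandb.config.chunk_size etc., runs the evaluation, and logs
# {"accuracy_minus_cost": accuracy_minus_cost(acc, cost)}.
```

An alternative that avoids choosing `cost_weight` up front is to maximize accuracy alone, log cost as a separate metric, and compare runs side by side afterwards.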
This workshop builds on the foundational material of DeepLearning.AI's course on Evaluating & Debugging Generative AI, built in collaboration with the Weights & Biases team. Everything covered in the workshop is presented as a continuation of that course.
Event Agenda
40-minute Workshop
10-minute Q&A: Answering questions from the audience.
About the Speakers:
Morgan McGuire - Growth Director at Weights & Biases
Morgan leads the Growth ML team and is an ML Engineer at Weights & Biases. He has a background in NLP and previously worked on the Safety team at Facebook, where he helped classify and flag potentially high-severity content for removal.
Ayush Thakur - Machine Learning Engineer at Weights & Biases
Ayush is an ML Engineer at Weights & Biases and a Google Developer Expert in Machine Learning (TensorFlow). He is interested in everything computer vision and representation learning. For the past 7 months he has been working with LLMs and has covered RLHF as well as the how and what of building LLM-based systems.
Carey Phelps - Founding Product Manager at Weights & Biases
Carey is the founding Product Manager at Weights & Biases. She studied computer science at Stanford and went on to found Carta Healthcare before joining Weights & Biases.

Comments: 6

@philipegger4599 9 months ago

It seems that on slide 32, "Human Eval" and "User Testing" have inexplicably switched places.

@ayushthakur736 9 months ago

Thanks for catching. Corrected.

@420_gunna 1 month ago

I get that content marketing is always going to be thinly veiled product marketing, but this was a little too on the nose, and there wasn't enough meat otherwise.

@Gus-AI-World 9 months ago

This presentation is a disaster. At 10 minutes, Ayush starts a business presentation. Then nothing can be seen because the font is so small. I don't know who your target audience for this is. Then Ayush suddenly jumps to a wandb screen for LLMs that somehow "just works".