Evaluation for Large Language Models and Generative AI

Evaluation for Large Language Models and Generative AI - A Deep Dive

6,972 views

Rajistics - data science, AI, and machine learning

A day ago

Notebooks and additional resources: github.com/rajshah4/LLM-Evalu...
0:00 Overview
1:30 Evaluation is broken
4:00 Reliability issues with HELM and Hugging Face Leaderboard
6:32 Evaluation before generative AI
9:33 Framework for evaluating generative AI
14:01 Reviewing 8 evaluation approaches
15:22 Exact matching approach
29:12 Similarity approach
32:37 Functional Correctness
35:50 Evaluation Benchmarks
45:12 Human Evaluation
49:03 Human Comparison/Arena
52:00 Model-Based Evaluation
1:02:22 Red Teaming
1:06:11 Operational Issues in Evaluation
1:09:10 RAG Case Study
━━━━━━━━━━━━━━━━━━━━━━━━━
★ Rajistics Social Media »
● Home Page: www.rajivshah.com
● LinkedIn: / rajistics
━━━━━━━━━━━━━━━━━━━━━━━━━

Comments: 11

@alishafique3 a month ago

I have consulted many blogs, but this LLM evaluation video is the best so far. Thank you so much.

@jacehua7334 6 months ago

this is great been waiting for this

@twist8250 6 months ago

Super informative with great research and presentation!

@a-moralphilosopher3525 2 months ago

Thank you for this! very helpful!

@GauravKumar-ud7zf a month ago

Great content. I watched from start to end. Just wondering which software or service you used to create this video, with your video on the slide.

@JOHNSMITH-ve3rq 6 months ago

Wow this channel is amazing!!!!

@MannyBernabe 6 months ago

awesome! thx!

@AjayJetty 5 months ago

Love this, Rajiv. Do you think evaluations will become industry-specific, so that people can use out-of-the-box evaluation frameworks to automate them?

@Rajistics 5 months ago

Yes. Right now a lot of evaluations are aligned with traditional academic topics, but I fully expect more industry-specific evaluations to emerge.

@felixhuthmacher6784 6 months ago

Great video Rajiv, even though it was over 1h, I was glued to the screen the entire time. :) For anyone who is interested in learning more about how to approach this on AWS, a few weeks back I put together this video kzbin.info/www/bejne/fobYgGybf8eCis0 which also includes a notebook on how to get started quickly.

@Rajistics 6 months ago

Thanks Felix, I added the video and notebook here: github.com/rajshah4/LLM-Evaluation