Evaluation for Large Language Models and Generative AI - A Deep Dive

6,972 views

Rajistics - data science, AI, and machine learning



Notebooks and additional resources: github.com/rajshah4/LLM-Evalu...
0:00 Overview
1:30 Evaluation is broken
4:00 Reliability issues with HELM and Hugging Face Leaderboard
6:32 Evaluation before generative AI
9:33 Framework for evaluating generative AI
14:01 Reviewing 8 evaluation approaches
15:22 Exact matching approach
29:12 Similarity approach
32:37 Functional Correctness
35:50 Evaluation Benchmarks
45:12 Human Evaluation
49:03 Human Comparison/Arena
52:00 Model-Based Evaluation
1:02:22 Red Teaming
1:06:11 Operational Issues in Evaluation
1:09:10 RAG Case Study
━━━━━━━━━━━━━━━━━━━━━━━━━
★ Rajistics Social Media »
● Home Page: www.rajivshah.com
● LinkedIn: / rajistics
━━━━━━━━━━━━━━━━━━━━━━━━━

Comments: 11
@alishafique3 1 month ago
I have consulted many blogs, but this LLM evaluation video is the best so far. Thank you so much
@jacehua7334 6 months ago
this is great been waiting for this
@twist8250 6 months ago
Super informative with great research and presentation!
@a-moralphilosopher3525 2 months ago
Thank you for this! very helpful!
@GauravKumar-ud7zf 1 month ago
Great content. I watched from start to end. Just wondering which software or service you used to create this video, with your video on the slide.
@JOHNSMITH-ve3rq 6 months ago
Wow this channel is amazing!!!!
@MannyBernabe 6 months ago
awesome! thx!
@AjayJetty 5 months ago
Love this Rajiv. Do you think the evaluations will become industry specific, so that people can use out-of-the-box evaluation frameworks to automate evaluations?
@Rajistics 5 months ago
Yes, right now a lot of evaluations are aligned with traditional academic topics, but I fully expect more industry-specific evaluations will emerge
@felixhuthmacher6784 6 months ago
Great video Rajiv, even though it was over 1h, I was glued to the screen the entire time. :) For anyone who is interested in learning more about how to approach this on AWS, a few weeks back I put together this video kzbin.info/www/bejne/fobYgGybf8eCis0 which also includes a notebook on how to get started quickly.
@Rajistics 6 months ago
Thanks Felix, I added the video and notebook here: github.com/rajshah4/LLM-Evaluation