What Language Model To Choose For Your Project? 🤔 LLM Evaluation

  440 views

Analytics Camp

Days ago

#llm #huggingface #gpt4 #ai
With more than 490,000 language models uploaded to the Hugging Face model repositories, how do you find the best language model for your personal or business projects? I have spent two weeks searching for the best models so you don’t have to.
In this video, you will get to know all the details about the Hugging Face LLM Leaderboard: how it evaluates models objectively, its criteria and ratings, the top models in each category, and the performance of popular models such as GPT-3, GPT-4, BERT, RoBERTa, and many more.
You will also get to know all six evaluation benchmarks in this leaderboard: MMLU, ARC, HellaSwag, TruthfulQA, Winogrande, and GSM8K. And of course, I’ll let you know about a platform where you can evaluate these models on your own!
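If you want to compare a few candidates on those six benchmarks yourself, here is a minimal sketch. It assumes each model is summarized by an unweighted mean of its six scores, which is a simplification I am choosing for illustration and not necessarily how you should weigh the benchmarks for your project. The model names and numbers are hypothetical placeholders, not real leaderboard results.

```python
from statistics import mean

# The six benchmarks named in the description above.
BENCHMARKS = ("MMLU", "ARC", "HellaSwag", "TruthfulQA", "Winogrande", "GSM8K")


def average_score(scores):
    """Return the unweighted mean of the six benchmark scores (an assumption)."""
    missing = [name for name in BENCHMARKS if name not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    return mean(scores[name] for name in BENCHMARKS)


def rank_models(results):
    """Return model names sorted from highest to lowest average score."""
    return sorted(results, key=lambda name: average_score(results[name]), reverse=True)


# Hypothetical placeholder scores, NOT real leaderboard numbers.
results = {
    "model-a": {"MMLU": 60.0, "ARC": 55.0, "HellaSwag": 80.0,
                "TruthfulQA": 45.0, "Winogrande": 75.0, "GSM8K": 30.0},
    "model-b": {"MMLU": 70.0, "ARC": 60.0, "HellaSwag": 85.0,
                "TruthfulQA": 50.0, "Winogrande": 78.0, "GSM8K": 20.0},
}

for name in rank_models(results):
    print(name, round(average_score(results[name]), 2))
```

If one skill matters more for your project (for example, math word problems in GSM8K), you could replace the plain mean with a weighted one.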
Stick around for more videos on LLM, Natural Language Processing (NLP), Generative AI, fun coding and machine learning projects, and follow Analytics Camp on Twitter (X): / analyticscamp
Don’t forget to subscribe and watch these related videos:
Is Mamba destroying Transformers For Good? Language Models in AI
• Is Mamba Destroying Tr...
Transformer Language Models Simplified in JUST 3 MINUTES!
• Transformer Language M...
Mamba Language Model Simplified In JUST 5 MINUTES!
• Mamba Language Model S...
This Is How EXACTLY Language Models Work In AI-- NO Background Needed:
• This is how EXACTLY La...
Zeno AI Evaluation platform
zenoml.com/
www.youtube.com/@analyticsCam...
Key Terms and Concepts:
00:00 Intro
00:28 Hugging Face LLM Leaderboard
01:44 MMLU (Measuring Massive Multitask Language Understanding)
02:08 Hendrycks Tests in MMLU
02:36 Test of Moral Scenarios
03:55 EleutherAI
04:04 Eleuther AI Language Model Evaluation Harness
04:17 AI2 Reasoning Challenge (ARC)
04:30 TruthfulQA tests
04:57 Humans VS LLM scores
05:19 GPT-3 answers to TruthfulQA test (#gpt3)
06:06 HellaSwag tests
06:58 Sample test from HellaSwag
08:31 GPT-4 results of HellaSwag tests (#gpt4)
08:41 RoBERTa, BERT (#googleai), and GPT base model results
09:06 Winogrande test
09:54 GSM8K test
11:03 Results for deciding the best LLM
12:16 Best language models for Question Answering projects
12:48 Zeno AI Evaluation platform

Comments: 6
@Researcher100 3 months ago
I see what you did with the GPT answers 😏 And the humans vs models thing with Khabib vs McGregor was super dope 😂😂😂
@analyticsCamp 3 months ago
Thanks for watching!
@facundohannoch3675 2 months ago
Thank you!! Could not find much information about how to compare LLMs, and your video was really helpful!
@analyticsCamp 2 months ago
Glad it was helpful! Let me know if you'd like me to cover any topic :)
@optiondrone5468 3 months ago
I'm new to ML, and when it comes to model selection, I always had questions about which metrics are important to consider. I like what Hugging Face did in their leaderboard, and I also liked your explanation. Thanks for sharing it with us.
@analyticsCamp 3 months ago
Glad it was helpful!