[Webinar] LLMs for Evaluating LLMs

12,055 views

Arthur

1 day ago

In this webinar, Arthur ML engineers Max Cembalest and Rowan Cheung shared best practices and lessons learned from using LLMs to evaluate other LLMs.
They covered:
• Evolving Evaluation: LLMs require new evaluation methods to determine which models are best suited for which purposes.
• LLMs as Evaluators: LLMs can be used to assess other LLMs' outputs, leveraging their contextual understanding to approximate human judgment (a minimal sketch follows this list).
• Biases and Risks: Understanding biases in LLM responses when judging other models is essential to ensure fair evaluations.
• Relevance and Context: LLMs can generate test datasets that better reflect real-world context, making it easier to assess how well a model fits a given use case.
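For a concrete picture of the LLM-as-judge pattern described above, here is a minimal sketch in Python. It is illustrative only and not the Arthur Bench API; the prompt wording, the 1-to-5 scoring scale, and the call_llm helper are assumptions made for this example.

# Minimal LLM-as-judge sketch (illustrative; not the Arthur Bench API).
# `call_llm` is a hypothetical stand-in for whatever model client you use.

JUDGE_PROMPT = """You are grading a model's answer to a question.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Score the candidate from 1 (poor) to 5 (excellent) for correctness and relevance.
Reply with only the integer score."""


def call_llm(prompt: str) -> str:
    """Hypothetical helper: replace with a call to your LLM provider of choice."""
    raise NotImplementedError


def judge_answer(question: str, reference: str, candidate: str) -> int:
    """Ask the judge model to score a candidate answer against a reference."""
    prompt = JUDGE_PROMPT.format(question=question, reference=reference, candidate=candidate)
    reply = call_llm(prompt).strip()
    return int(reply)  # A real harness would parse and validate more defensively.


def evaluate(test_cases: list[dict]) -> float:
    """Average judge score over a test set of {question, reference, candidate} records."""
    scores = [judge_answer(c["question"], c["reference"], c["candidate"]) for c in test_cases]
    return sum(scores) / len(scores)

Real harnesses typically add safeguards the webinar touches on under "Biases and Risks", such as randomizing answer order and averaging over multiple judge calls to reduce position and self-preference bias.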
More links you might find useful:
• Learn more about Arthur Bench, our LLM evaluation product → www.arthur.ai/...
• Check out the Arthur Bench GitHub → github.com/art...
• Join us on Discord → / discord
--
About Arthur:
Arthur is the AI performance company. Our platform monitors, measures, and improves machine learning models to deliver better results. We help data scientists, product owners, and business leaders accelerate model operations and optimize for accuracy, explainability, and fairness.
Arthur’s research-led approach to product development drives exclusive capabilities in LLMs, computer vision, NLP, bias mitigation, and other critical areas. We’re on a mission to make AI work for everyone, and we are deeply passionate about building ML technology to drive responsible business results.
Learn more about Arthur → bit.ly/3KA31Vh
Follow us on Twitter → / itsarthurai
Follow us on LinkedIn → / arthurai
Sign up for our newsletter → www.arthur.ai/...

Comments: 2
@vincentkaranja7062 · 1 year ago
Fantastic presentation, Max and Rowan! The depth of your analysis and the clarity with which you presented the complexities of evaluating LLMs is truly commendable. It's evident that a lot of thought and effort went into this research. I'm particularly intrigued by your approach to using LLMs as evaluators. It opens up a plethora of possibilities but also brings forth some ethical considerations. How do you account for systemic biases in evaluation metrics when using LLMs as evaluators? Given that traditional metrics might not capture the fairness aspect adequately, have you considered incorporating fairness metrics or mitigation methods in your evaluation process?
@ohmkaark · 7 months ago
I was looking for a good summary of LLM evaluation metrics, and I see a lot of them captured well here.