How to Construct Domain Specific LLM Evaluation Systems: Hamel Husain and Emil Sedgh

8,381 views

AI Engineer

A day ago

Many failed AI products share a common root cause: a failure to create robust evaluation systems. Evaluation systems allow you to improve your AI quickly in a systematic way and unlock superpowers like the ability to curate data for fine-tuning. However, many practitioners struggle with how to construct evaluation systems that are specific to their problems.
In this talk, we will walk through a detailed example of how to construct domain-specific evaluation systems.
Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.enginee... & join us at the AI Engineer World's Fair in 2025! Get your tickets today at ai.engineer/2025
About Hamel
Hamel Husain started working with language models five years ago when he led the team that created CodeSearchNet, a precursor to GitHub Copilot. Since then, he has seen many successful and unsuccessful approaches to building LLM products. Hamel is also an active open source maintainer and contributor to a wide range of ML/AI projects. Hamel is currently an independent consultant.
About Emil
Emil is CTO at Rechat, where he leads the development of Lucy, an AI personal assistant designed to support real estate agents.

Comments: 4
@mr_abims 22 days ago
This really came at the right time for me. Thank you so much
@hosseinderakhshan8632 4 months ago
Can't say enough how proud I am of you!
@maxjesch 4 months ago
Super relevant content! Thanks for sharing!
@yoavtamir7707 2 months ago
Great talk