How to Construct Domain Specific LLM Evaluation Systems: Hamel Husain and Emil Sedgh

Views: 8,381

AI Engineer

days ago

Many failed AI products share a common root cause: a failure to create robust evaluation systems. Evaluation systems allow you to improve your AI quickly in a systematic way and unlock superpowers like the ability to curate data for fine-tuning. However, many practitioners struggle with how to construct evaluation systems that are specific to their problems.
In this talk, we will walk through a detailed example of how to construct domain-specific evaluation systems.
Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.enginee... and join us at the AI Engineer World's Fair in 2025! Get your tickets today at ai.engineer/2025
About Hamel
Hamel Husain started working with language models five years ago when he led the team that created CodeSearchNet, a precursor to GitHub Copilot. Since then, he has seen many successful and unsuccessful approaches to building LLM products. Hamel is also an active open source maintainer of and contributor to a wide range of ML/AI projects. He is currently an independent consultant.
About Emil
Emil is CTO at Rechat, where he leads the development of Lucy, an AI personal assistant designed to support real estate agents.

Comments: 4

@mr_abims 22 days ago

This really came at the right time for me. Thank you so much

@hosseinderakhshan8632 4 months ago

Can't say enough how much I am proud of you!

@maxjesch 4 months ago

super relevant content! Thanks for sharing!

@yoavtamir7707 2 months ago

Great talk
