OpenAI recently released GPT-4o, which promises significant improvements in latency and cost. Many users may wonder how to evaluate the effects of upgrading their app to GPT-4o. For example, what latency improvement should I expect, and are there any material differences in app performance when I switch to the new GPT-4o model?
Decisions like this often hinge on quality evaluations. Here, we show the process of evaluating GPT-4o on an example RAG app with a 20-question eval set related to the LangChain documentation. We show how regression testing in the LangSmith UI lets you quickly pinpoint examples where GPT-4o shows improvements or regressions relative to your current app.
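For reference, a minimal sketch of what such a comparison might look like with the LangSmith SDK. The dataset name, the chain builder, and the model pair below are placeholders (the actual app and dataset are in the cookbook and public dataset linked below):

```python
from langsmith import evaluate
from langchain_openai import ChatOpenAI

# Hypothetical dataset name -- substitute your own LangSmith eval set.
DATASET = "langchain-docs-qa"

def build_rag_chain(model: str):
    """Return a callable that answers a question with the given OpenAI model.
    A real RAG app would retrieve documents here and add them to the prompt."""
    llm = ChatOpenAI(model=model, temperature=0)
    return lambda inputs: {"answer": llm.invoke(inputs["question"]).content}

# Run one experiment per model; the LangSmith regression view then lets you
# compare the two experiments example-by-example to spot improvements or regressions.
for model in ["gpt-4-turbo", "gpt-4o"]:
    evaluate(
        build_rag_chain(model),
        data=DATASET,
        experiment_prefix=f"rag-{model}",
    )
```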
GPT-4o docs:
openai.com/ind...
LangSmith regression testing UI docs:
docs.smith.lan...
RAG evaluation docs:
docs.smith.lan...
Public dataset referenced in the video:
smith.langchai...
Cookbook referenced in the video:
github.com/lan...