How to evaluate upgrading your app to GPT-4o | LangSmith Evaluations - Part 18

9,813 views

LangChain

A day ago

OpenAI recently released GPT-4o, which it reports offers significant improvements in latency and cost. Many users may wonder how to evaluate the effects of upgrading their app to GPT-4o. For example, how much lower latency can users expect, and are there any material differences in app performance after switching to the new model?
Decisions like this are often limited by quality evaluations! Here, we show the process of evaluating GPT-4o on an example RAG app with a 20-question eval set related to LangChain documentation. We show how regression testing in the LangSmith UI lets you quickly pinpoint examples where GPT-4o shows improvements or regressions relative to your current app.
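The per-example comparison behind regression testing can be sketched in plain Python. This is a minimal illustration, not the LangSmith API: the `Run` record, the `compare_experiments` helper, and the toy scores and latencies are all hypothetical. It assumes each experiment (current model vs. GPT-4o) has been reduced to one graded result per dataset example ID.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    """One graded result for a single dataset example (hypothetical record)."""
    example_id: str
    score: float      # grader output, e.g. 1.0 = correct, 0.0 = incorrect
    latency_s: float  # end-to-end latency in seconds


def compare_experiments(baseline, candidate, tol=0.0):
    """Pair runs by example ID and flag per-example improvements/regressions."""
    base = {r.example_id: r for r in baseline}
    cand = {r.example_id: r for r in candidate}
    shared = sorted(base.keys() & cand.keys())
    if not shared:
        raise ValueError("experiments share no example IDs")
    # An example counts as changed only if its score moves by more than tol.
    improved = [i for i in shared if cand[i].score - base[i].score > tol]
    regressed = [i for i in shared if base[i].score - cand[i].score > tol]
    # Ratio of median latencies: > 1.0 means the candidate is faster.
    speedup = (median(base[i].latency_s for i in shared)
               / median(cand[i].latency_s for i in shared))
    return {"improved": improved, "regressed": regressed, "median_speedup": speedup}


# Toy values for illustration only; these are not results from the video.
baseline = [Run("q1", 1.0, 4.0), Run("q2", 0.0, 3.0), Run("q3", 1.0, 5.0)]
candidate = [Run("q1", 1.0, 2.0), Run("q2", 1.0, 1.5), Run("q3", 0.0, 2.5)]
print(compare_experiments(baseline, candidate))
# → {'improved': ['q2'], 'regressed': ['q3'], 'median_speedup': 2.0}
```

Pairing by example ID is what makes this a regression test rather than a comparison of aggregate scores: two experiments can have the same overall accuracy while fixing and breaking different questions. The median latency is used because it is less sensitive to a single slow call than the mean, and `tol` lets you ignore small score changes when the grader returns continuous values.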
GPT-4o docs:
openai.com/ind...
LangSmith regression testing UI docs:
docs.smith.lan...
RAG evaluation docs:
docs.smith.lan...
Public dataset referenced in the video:
smith.langchai...
Cookbook referenced in the video:
github.com/lan...

Comments: 13
@luisguillermopardo7792 · 4 months ago
I have a script that uses pandas agents and tools to answer questions. I updated to gpt-4o, and the model has many issues using the tools compared with gpt-4. Do you know if we need any extra settings or something?
@sounakroy1933 · 4 months ago
For EDA?
@AI_by_AI_007 · 4 months ago
Google's team does AlphaFold and changes the world, and Sam gives us NSFW tools….
@ibbobud · 4 months ago
Quick and to the point! Love the eval!
@millingabani · 4 months ago
You guys are awesome!
@learnbydoing6010 · 4 months ago
So fast. 🎉 Thank you.
@calvin_banks_music · 4 months ago
Did you make the graphic at 1:48 programmatically, or did you import it as an image from a different tool?
@MaybeTogether · 4 months ago
Thank you. I instinctively started googling, because answer accuracy / answer quality is more significant to me.
@octaviusp · 4 months ago
Ahhaa, very fast reaction! Great job.
@_arkadij · 4 months ago
Working fast, cool.
@ClarkNewlove · 4 months ago
Nice. Thanks for sharing!
@Nairb932 · 4 months ago
Keep up the great work
@BenitoMartin-dk7lj · 4 months ago
Amazing!