OpenAI recently released GPT-4o, which promises significant improvements in latency and cost. Many users may wonder how to evaluate the effects of upgrading their app to GPT-4o. For example, what latency improvement should I expect, and are there any material differences in app performance when I switch to the new GPT-4o model?
Decisions like this often hinge on quality evaluations. Here, we show the process of evaluating GPT-4o on an example RAG app with a 20-question eval set related to the LangChain documentation. We show how regression testing in the LangSmith UI lets you quickly pinpoint examples where GPT-4o shows improvements or regressions relative to your current app.
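For reference, a minimal sketch of what such a comparison might look like with the LangSmith SDK. The dataset name, the chain builder, and the model pair below are placeholders (the actual app and dataset are in the cookbook and public dataset linked below):

```python
from langsmith import evaluate
from langchain_openai import ChatOpenAI

# Hypothetical dataset name -- substitute your own LangSmith eval set.
DATASET = "langchain-docs-qa"

def build_rag_chain(model: str):
    """Return a callable that answers a question with the given OpenAI model.
    A real RAG app would retrieve documents here and add them to the prompt."""
    llm = ChatOpenAI(model=model, temperature=0)
    return lambda inputs: {"answer": llm.invoke(inputs["question"]).content}

# Run one experiment per model; the LangSmith regression view then lets you
# compare the two experiments example-by-example to spot improvements or regressions.
for model in ["gpt-4-turbo", "gpt-4o"]:
    evaluate(
        build_rag_chain(model),
        data=DATASET,
        experiment_prefix=f"rag-{model}",
    )
```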
GPT-4o docs:
openai.com/ind...
LangSmith regression testing UI docs:
docs.smith.lan...
RAG evaluation docs:
docs.smith.lan...
Public dataset referenced in the video:
smith.langchai...
Cookbook referenced in the video:
github.com/lan...