Regression Testing | LangSmith Evaluations - Part 15

  3,227 views

LangChain

Days ago

Evaluations can accelerate LLM app development, but it can be challenging to get started. We've kicked off a new video series focused on evaluations in LangSmith.
With the rapid pace of AI, developers often face a paradox of choice: how to choose the right prompt, or how to trade off LLM quality against cost. Evaluations can accelerate development by providing a structured process for making these decisions. Because we've heard that getting started is hard, this series of short videos explains how to perform evaluations using LangSmith.
This video focuses on Regression Testing, which lets users highlight the specific examples in an evaluation set that improved or regressed across a set of experiments.
Blog: blog.langchain.dev/regression...
LangSmith: smith.langchain.com/
Documentation: docs.smith.langchain.com/eval...
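To make the idea concrete, here is a minimal sketch of the per-example comparison that regression testing relies on. It does not use the LangSmith SDK: it assumes you already have a score per example ID for a baseline experiment and a candidate experiment, and the function name `compare_experiments`, the example IDs, and the scores are all hypothetical.

```python
from typing import Dict, List


def compare_experiments(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> Dict[str, List[str]]:
    """Group shared example IDs by whether the candidate improved, regressed, or stayed the same.

    Hypothetical helper for illustration; not part of any LangSmith API.
    """
    result: Dict[str, List[str]] = {"improved": [], "regressed": [], "unchanged": []}
    # Only examples scored in both experiments can be compared.
    for example_id in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[example_id] - baseline[example_id]
        if delta > tolerance:
            result["improved"].append(example_id)
        elif delta < -tolerance:
            result["regressed"].append(example_id)
        else:
            # Changes within the tolerance are treated as noise.
            result["unchanged"].append(example_id)
    return result


# Hypothetical per-example correctness scores (1.0 = correct, 0.0 = incorrect).
baseline_scores = {"ex-1": 1.0, "ex-2": 0.0, "ex-3": 1.0, "ex-4": 1.0}
candidate_scores = {"ex-1": 1.0, "ex-2": 1.0, "ex-3": 0.0, "ex-4": 1.0}

report = compare_experiments(baseline_scores, candidate_scores)
for status, ids in report.items():
    print(f"{status}: {ids}")
```

Here `ex-2` is flagged as improved and `ex-3` as regressed, even though both experiments have the same overall accuracy, which is exactly the kind of change an aggregate score hides. With more than two experiments, the same comparison can be repeated for each experiment against a chosen baseline.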

Comments: 3
@MattJonesYT 2 months ago
This is extremely useful, especially for agent systems whose rules have been written in a way that overfits a particular LLM. I find CrewAI often has that problem: it works well with the LLM it was written for, but produces nonsense with a different LLM.
@MattJonesYT 2 months ago
An extension of this idea would be running regression tests on the prompt system of an agent as a whole, to see how well it adapts to other LLMs. Build a matrix of how its prompts perform on the original LLM versus new, out-of-sample LLMs. If the system immediately breaks on new LLMs, it is probably overfit; you can then have an AI rewrite those prompts to be simpler, producing a system that is more robust across different LLMs.
@UtopIA-IAparaDevs 2 months ago
Thank you