That's a pretty good list! Refining on your point about input handling: you need verbose error messages on your tools. Really verbose, more like documentation than error text. Done right, your agents can try, try again, and succeed where they would otherwise have failed.
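A minimal sketch of that idea, with a hypothetical `search_orders` tool whose error path reads like documentation rather than a terse message (all names here are invented for illustration):

```python
def search_orders(customer_id: str, status: str = "any") -> str:
    """Hypothetical tool: look up orders for a customer."""
    valid_statuses = {"any", "open", "shipped", "cancelled"}
    if status not in valid_statuses:
        # Documentation-style error: tell the agent exactly how to retry.
        return (
            f"ERROR: status={status!r} is not valid. "
            f"Valid values are: {sorted(valid_statuses)}. "
            "Example of a correct call: "
            "search_orders(customer_id='C123', status='open'). "
            "Retry with one of the valid statuses."
        )
    return f"Orders for {customer_id} with status {status}: []"
```

An agent that receives the error string above has everything it needs to correct its next call, instead of just seeing "invalid status" and failing.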
@samwitteveenai 2 months ago
Yes good point.
@themax2go A month ago
Very good and important point
@freeideas 2 months ago
Found this channel because I have built an agent to operate my PC/Mac for me. Even though I love writing software, I HATE operating computers and cellphones. Why? I don't know; it's just how I am built. But I have had to learn the lessons of this video the hard way: 1) you need the right set of tools that your LLM understands; 2) you have to use the LLM to scrub both the input and the output of those tools [e.g. I found I can't go directly from "click the subscribe button you now see" to a tool call; I had to break this into two steps: a) what tool can do this, and b) call that tool]; and 3) you need some kind of oversight or management function to make sure the work is going in the right direction, with the ability to re-plan. My current implementation is somewhat lacking in #3, so if everything goes as expected, it works great; otherwise it ends up in endless loops of despair. Anyway, thanks, Sam, for helping me crystallize these lessons in my mind. I will be refurbishing my bot soon.
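The two-step pattern in lesson 2, plus a crude version of the #3 oversight function, might look roughly like this sketch (the `llm` callable and both tools are stand-ins, not any real API):

```python
# Sketch of the two-step tool-selection pattern with a retry cap as a
# crude oversight guard. `llm` is a placeholder for any chat-completion call.
TOOLS = {
    "click": lambda target: f"clicked {target}",       # hypothetical tools
    "type_text": lambda text: f"typed {text!r}",
}

def run_step(llm, instruction: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        # Step (a): ask only which tool applies, not how to call it yet.
        tool_name = llm(f"Which tool ({list(TOOLS)}) handles: {instruction}? "
                        "Answer with the tool name only.").strip()
        if tool_name in TOOLS:
            # Step (b): a second call works out the argument for that tool.
            arg = llm(f"Give the single argument for {tool_name} to: {instruction}")
            return TOOLS[tool_name](arg)
    # Oversight: give up instead of looping forever.
    return "ERROR: could not map instruction to a tool; re-plan needed"
```

Splitting "pick the tool" from "fill in the arguments" gives the model two small decisions instead of one large one, and the retry cap is the simplest possible escape from the "endless loops of despair" case.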
@AI_Escaped 2 months ago
Awesome video! I actually just came to the realization today that I need to name my tools and descriptions pretty much exactly like you described. Lots of great info, thank you!
@kepenge 2 months ago
It would be nice if you could demo a custom tool using all the important aspects that you have mentioned, and show what the results of applying or not applying them would be.
@jayhu6075 2 months ago
Been away for a while. It’s nice to be back on your channel and to watch this informative tutorial. Many thanks.
@superfliping 2 months ago
Thank you for these wonderful configurations for learning tools. Local tools for automation, with a laptop and cellphone interface and local LLM memory for my conversations and code building, is what I'm building.
@MeinDeutschkurs 2 months ago
16:27 Yes, indeed! Tools are very important. I'm just not convinced that the framework approach is the best (and only) one.
@samwitteveenai 2 months ago
I am not using LangChain for anything major in production. The framework approach is useful for showing people things rather than writing them from scratch, and I don't want to go giving out prod code at this stage. The tool concepts, though, are pretty much the same no matter what you choose to use.
@MeinDeutschkurs 2 months ago
@@samwitteveenai Good to know, and good that you mention it. Over time, I got the impression that these frameworks are the be-all and end-all.
@AI_Escaped 2 months ago
Even a non-framework approach is a framework :)
@MeinDeutschkurs 2 months ago
@@AI_Escaped I wrote "these frameworks". And now, do you feel better?
@nedkelly3610 2 months ago
Excellent! All the agent verification I've been thinking of but haven't had the time to write.
@OscarDuys 2 months ago
How does the agent fare when you give it access to lots of tools? I would assume that it increases the number of errors/hallucinations that occur, but how quick is this drop-off? Essentially I'm asking whether you've given it access to all 48 of your tools at the same time.
@AI_Escaped 2 months ago
OpenAI Assistants can hold up to 128 function-call tools, but I hear they can get confused when they have many tools, though I assume that depends on a lot of things. I was just thinking of storing tools as Python files locally in a database instead of defining them with the actual agent. I really don't see much of a difference, and you can easily add a lot more metadata for choosing the correct tool. Or maybe a combination: a main tool repo where agents can pull the tools they need from. Lots to figure out. Oh, and my other option: every tool is an agent.
@samwitteveenai 2 months ago
I generally don't give the agent access to all of the tools. It doesn't really make sense, as the tools vary in their uses a lot (e.g. social media and other things just don't end up in the same agent). That said, I do know people who are building agents that have hundreds of tools, if not more, and I think the Gorilla paper was testing on thousands of tools. My personal belief is to give the agents just enough to do the job that you want them to do, and don't try to confuse them. Try to constrain the agents as much as possible to get good results. This also makes it much easier to build tests and evals.
@samwitteveenai 2 months ago
Another approach that I like doing is cascading tools. Like a cascading classifier from ML, where you have one classifier that puts things into categories and another one that determines the low-level category. You can do the same kind of thing with agents, where the agent decides that it needs a Reddit tool and then it has another function call to work out exactly what Reddit tool and stuff like that. That's a very simplified example of this.
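That cascading idea can be sketched in a few lines, assuming a hypothetical `llm` chat call and invented category/tool names:

```python
# Hedged sketch of cascading tool selection: first pick a coarse category,
# then pick the concrete tool within only that category, like a cascading
# classifier. `llm` is a placeholder for any chat-completion call.
TOOL_TREE = {
    "reddit": {"reddit_search": ..., "reddit_post": ...},   # hypothetical
    "email":  {"send_email": ..., "read_inbox": ...},
}

def pick_tool(llm, task: str) -> str:
    # Stage 1: coarse category decision.
    category = llm(f"Which category ({list(TOOL_TREE)}) fits: {task}? "
                   "Answer with the category only.").strip()
    # Stage 2: fine-grained choice among only that category's tools.
    tool = llm(f"Which tool ({list(TOOL_TREE[category])}) fits: {task}? "
               "Answer with the tool name only.").strip()
    return tool
```

Each call shows the model only a short list, which keeps the per-decision choice space small even when the total tool count is large.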
@szekaatti8394 2 months ago
Are you writing evals (evaluation code) for each of those tools separately? Also, you mentioned that you are using LangChain in some of the tools; do you still find it useful? I find myself struggling more and more with these abstractions, and most of the time I'm just using simple things like Instructor and building "the framework" around it myself (things like retries on bad outputs, graph-based routing, etc.).
@MeinDeutschkurs 2 months ago
I can hear you! I feel the same! How much terminology do I need to study so that I'm able to do this or that? Believe it or not, I can do lots with Python, and I don't know what to call it. 😂
@samwitteveenai 2 months ago
Yes, this is a totally valid point, and I largely agree. I tend to use LangGraph for prototyping and then streamline anything I want to put into production, etc. Instructor is a cool lib, and I have used it for a few things.
@tauraik 2 months ago
Thanks, this was really good.
@George-ew4io 2 months ago
Thanks for the video. I see you using LangGraph a lot; do you recommend it for building production-ready agents?
@samwitteveenai 2 months ago
Most of the time I prototype things in LangGraph, and I think LangGraph is good for explaining things to people and for teaching, but for production I generally go to custom code. I do have some things that run as LangGraph microservices that do small things, etc.
@alx8439 2 months ago
What is your favorite framework? I'm seeing "Agency Swarm" getting some good traction. Before that, MS AutoGen was very promising, but with its limited toolset it was quite dumb. They have released a new version recently; it's worth checking. And Langflow / n8n also look nice.
@samwitteveenai 2 months ago
I haven't really had a proper play with Agency Swarm, so I'm not sure that much about it. I do like AutoGen; I think it's got some things going for it. I do like LangGraph for some things. Langflow and n8n are really low-code or no-code kinds of tools.
@GeorgeDekker 2 months ago
Thanks Sam. Any pointers on having agents generate tools?
@samwitteveenai 2 months ago
The thing I would say is you generally don't want an agent to write its own tools on the fly. You can certainly use things like Cursor and various code-generation tools to create the tools and then use them in your agent, but I'm really reluctant to let the agent do that in real time. They tend to be too unrestricted and just end up wasting lots of tokens and going into loops of repeating themselves.
@GeorgeDekker 2 months ago
@samwitteveenai Thank you for your comment. I wasn't considering on-the-fly generation. I was thinking of something like a CrewAI crew that is completely focused on building CrewAI tools, following the process of [research, design, build, test, improve], with a huge emphasis on utilizing existing tools (like read/write files, web search, etc.) and very strict, small tasks/agent role definitions. And yes, I'm considering writing the whole thing with Cursor. Do you think this is doable? Any suggestions to assist the creation of good tests?
@AI_Escaped 2 months ago
@@samwitteveenai Very true, but in some instances they can work OK. For example, I have a psql agent that can manage databases using psql commands and SQLAlchemy with an InMemoryExecution tool, and it actually works pretty well after some initial training. But it is true, they can get confused on occasion and get stuck in loops if they try something they've never tried before. Overall, though, for general tasks it's not bad; it all depends on the use case. I can just say, for example, "create a new relational database for this or that with all these tables and fields and fill it with sample data", and it can pull it off no problem. Otherwise, I would have to make a tool for every possible action on a database. In production, though, I would have no choice but to make a tool for each task.
@dawid_dahl 2 months ago
So writing some functions is good when programming. Thanks!
@foxusmusicus2929 2 months ago
Which tool do you use to generate these great videos? It looks AI-generated but with great quality. I love it.
@freeideas 2 months ago
OK, sorry for so many messages, but I have a question/experience: I eventually had to make a prompt loop where I say something like, "translate this giant blob of English language into this specific JSON format", because sometimes the kind of JSON I need is not exactly what a function-call schema returns. The loop reads the JSON returned, finds errors if any, then re-prompts with the original inputs plus the first output and the errors. The loop parses the JSON out of the response (basically just finds the first and last braces or brackets). This sounds clumsy, but it works so well that I am thinking of not using proper function tool calls anymore. The function tool calls are idiosyncratic and differ quite a bit from one LLM to another, but the JSON I get from this loop is perfect every time and rarely has to be re-prompted. Wondering if anyone can tell me that I am being stupid or naive here. :)
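The loop described above might look roughly like this sketch; the brace-finding and error re-prompting follow the comment, and the `llm` callable is a stand-in for any completion API:

```python
import json

def extract_json(text: str):
    """Pull the first {...} or [...] span out of an LLM reply."""
    start = min((i for i in (text.find("{"), text.find("[")) if i != -1),
                default=-1)
    end = max(text.rfind("}"), text.rfind("]"))
    if start == -1 or end <= start:
        raise ValueError("no JSON found")
    return json.loads(text[start:end + 1])

def json_loop(llm, source: str, schema_hint: str, max_tries: int = 3):
    prompt = f"Translate this into JSON matching {schema_hint}:\n{source}"
    for _ in range(max_tries):
        reply = llm(prompt)
        try:
            return extract_json(reply)
        except ValueError as err:
            # Re-prompt with the original input, the bad output, and the error.
            prompt = (f"Translate this into JSON matching {schema_hint}:\n{source}\n"
                      f"Your previous answer was:\n{reply}\nError: {err}\nFix it.")
    raise RuntimeError("could not get valid JSON")
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, the single `except` clause catches both the "no braces found" case and malformed JSON.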
@s4rel0ck 2 months ago
I have a sense that LangChain and "agents" are simply an anthropomorphized solution in search of a problem. You argue that a custom tool isn't just an API call, but "agents" are merely sequenced, conditional, or looping LLM prompts, possibly with some function calling: essentially just API calls. With the Cursor IDE, you can write functions to call new API endpoints in 1-2 requests, and then you have OpenAI's o1 model with built-in chain-of-thought and planning capabilities. This raises the question of why you need an agentic approach or LangChain at all.
@AI_Escaped 2 months ago
Agents are basically just smart programs
@samwitteveenai 2 months ago
For me, agents are programs where you have the LLM make specific decisions that don't just work from conditional logic like normal programming. Regarding LangChain or any framework for these kinds of things, I think it just often makes it easier for a lot of people to see what's going on at a high level. Though I do get very frustrated (with LC) at some of the low-level stuff that's going on, where it's being overly complicated, etc.
@waneyvin 2 months ago
Can you recommend some planning tools?
@samwitteveenai 2 months ago
This really depends on your use case and your definition of planning. LLMs are not great at formal planning, but they are very creative at coming up with proposals that you can then run through a checker to validate.
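The propose-then-check pattern can be sketched like this, with an invented action whitelist standing in for a real validator and `llm` as a placeholder chat call:

```python
# Hedged sketch of "LLM proposes, checker validates": the model drafts a
# plan as 'action: detail' lines, and plain code rejects unknown actions,
# feeding the failures back into the next prompt. All names hypothetical.
ALLOWED_ACTIONS = {"search", "open", "summarize"}

def check_plan(steps: list) -> list:
    """Return the problems found; an empty list means the plan passes."""
    return [s for s in steps if s.split(":")[0] not in ALLOWED_ACTIONS]

def plan(llm, goal: str, max_tries: int = 3) -> list:
    feedback = ""
    for _ in range(max_tries):
        raw = llm(f"Plan steps as 'action: detail' lines for: {goal}{feedback}")
        steps = [line.strip() for line in raw.splitlines() if line.strip()]
        bad = check_plan(steps)
        if not bad:
            return steps
        feedback = f"\nThese steps were invalid: {bad}"
    raise RuntimeError("no valid plan")
```

The checker is deterministic code, so it never hallucinates; the LLM supplies the creativity and the checker supplies the rigor.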
@waneyvin 2 months ago
@@samwitteveenai Thanks a lot! I'm just looking for some general planning capabilities to interface with an open-ended environment; it might then have a chance to make agents autonomous.
@nufh 2 months ago
LLMs tend to generate answers even when they don't have the right information. When retrieving data with RAG, if the question is unrelated, they sometimes hallucinate responses. I've seen people build tools to filter and fact-check, but with multiple agents running, they quickly eat up my credits like there's no tomorrow.
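One cheap alternative to a fleet of fact-checking agents is a single relevance gate before the expensive generation call. This sketch uses naive word overlap as a stand-in for a small-model yes/no check (all names and thresholds are illustrative):

```python
# Refuse to answer when the retrieved chunks don't plausibly relate to the
# question, instead of letting the model invent an answer. The overlap
# heuristic here is a placeholder for a cheap small-model relevance check.
def is_relevant(question: str, chunks: list, min_overlap: int = 2) -> bool:
    q_words = set(question.lower().split())
    return any(len(q_words & set(c.lower().split())) >= min_overlap
               for c in chunks)

def answer(llm, question: str, chunks: list) -> str:
    if not is_relevant(question, chunks):
        # Refusal costs zero LLM calls.
        return "I don't have information on that in the indexed documents."
    return llm(f"Answer from context only:\n{chunks}\n\nQ: {question}")
```

The point is economic: one deterministic (or one tiny-model) check per query is far cheaper than routing every question through multiple verification agents.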
@seththunder2077 2 months ago
Hey Sam, just curious about **kwargs: why didn't you 1) include *args, or 2) just use *args instead of **kwargs?
@samwitteveenai 2 months ago
Yes, you are right; I could probably have just used *args here. I tend to use keyword arguments a lot, and often the LLMs won't necessarily return what they need in the correct order, so I find keyword arguments work better there.
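A small illustration of why keyword arguments tolerate the model's ordering while positional ones don't (tool name and values are hypothetical):

```python
# Models typically return tool arguments as named JSON fields, and names
# survive reordering where positions don't. With *args, a reordered reply
# would silently bind the wrong values to the wrong parameters.
def book_flight(**kwargs) -> str:
    # Order-independent: the model can emit the fields in any order.
    return f"{kwargs['origin']} -> {kwargs['destination']} on {kwargs['date']}"

# A model-produced arguments dict, in an arbitrary field order:
llm_args = {"date": "2024-06-01", "destination": "SFO", "origin": "LHR"}
result = book_flight(**llm_args)
```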
@MeinDeutschkurs 2 months ago
I cannot hear it anymore... framework here, framework there. What about tool usage without any framework? What about alternative approaches? What about an orchestrator that simply writes a Python script over given functions to return the first layer? direct_reply(), or whatever tool/workflow I want the LLM to use to generate whatever. Sam, I'm so frustrated because of all these so-called frameworks. Mistral-Nemo is able to write instructed code and to avoid out-of-scope code.
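For what it's worth, a framework-free tool loop can be quite small. This sketch (with a hypothetical `llm` callable and the commenter's `direct_reply()` as the only registered tool) just asks the model for a JSON tool call and dispatches it with plain Python:

```python
import json

# Minimal framework-free tool use: the model emits a JSON tool call and
# plain Python dispatches it. No agent framework involved.
def direct_reply(text: str) -> str:          # the fallback "tool"
    return text

TOOLS = {"direct_reply": direct_reply}        # register plain functions

def run(llm, user_msg: str) -> str:
    raw = llm(f'Reply as JSON {{"tool": ..., "args": {{...}}}} for: {user_msg}. '
              f"Available tools: {list(TOOLS)}")
    call = json.loads(raw)
    return TOOLS[call["tool"]](**call["args"])
```

In practice you would add JSON-repair retries and argument validation, but the core dispatch really is just a dict lookup and a `**kwargs` call.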
@micbab-vg2mu 2 months ago
thanks :)
@pensiveintrovert4318 2 months ago
I am not aware of any agent system that is reliable enough to trust.
@samwitteveenai 2 months ago
It really depends on the task and the industry, but I agree with you that most of the agents people are building out there only work about 70% of the time. This is what I saw when I was in San Francisco talking to different startups, etc. It really comes down to how good your evals and your tests are for making sure that things work, and whether you can build verifiers and checkers.
@pensiveintrovert4318 2 months ago
@@samwitteveenai What I have noticed, trying to develop with some of these systems, is that the initial work progresses faster than normal, but once you get to a bit of complexity, they start breaking stuff that you did before, and it becomes impossibly hard to incrementally improve what you already have.
@fontenbleau 2 months ago
The only point of AI agents was to make money... and this story may be closed altogether when that American gets 16 years in prison for farming $12 million on Spotify with agents. Moreover, this case, which will be resolved after the elections, makes AI-generated money (not received from humans, and that's the main thing the prosecution aims to prove) a toxic asset that will be impossible to legalize in the USA, and next in the EU, outside of corporate bank accounts. Yes, agents will still be around after this, but scraping the web is not interesting to anyone, so this part of the industry will be very small. I see fewer news stories about agents now; the emphasis is changing.