That's a pretty good list! Refining on your point about input handling: you need verbose error messages on your tools. Really verbose, more like documentation than error text. Done right, your agents can try, try again, and succeed where they would otherwise have failed.
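A minimal sketch of that idea, with a hypothetical `search_orders` tool whose error path reads like documentation rather than a terse message (all names here are invented for illustration):

```python
def search_orders(customer_id: str, status: str = "any") -> str:
    """Hypothetical tool: look up orders for a customer."""
    valid_statuses = {"any", "open", "shipped", "cancelled"}
    if status not in valid_statuses:
        # Documentation-style error: tell the agent exactly how to retry.
        return (
            f"ERROR: status={status!r} is not valid. "
            f"Valid values are: {sorted(valid_statuses)}. "
            "Example of a correct call: "
            "search_orders(customer_id='C123', status='open'). "
            "Retry with one of the valid statuses."
        )
    return f"Orders for {customer_id} with status {status}: []"
```

An agent that receives the error string above has everything it needs to correct its next call, instead of just seeing "invalid status" and failing.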
@samwitteveenai 2 months ago
Yes good point.
@themax2go A month ago
Very good and important point
@freeideas 2 months ago
Found this channel because I have built an agent to operate my PC/Mac for me. Even though I love writing software, I HATE operating computers and cellphones. Why? I don't know; it's just how I am built. But I have had to learn the lessons of this video the hard way: 1) you need the right set of tools that your LLM understands; 2) you have to use the LLM to scrub both the input and the output of those tools [e.g. I found I can't go directly from "click the subscribe button you now see" to a tool call; I had to break this into two steps: a) what tool can do this, and b) call that tool]; and 3) you need some kind of oversight or management function to make sure the work is going in the right direction, with the ability to re-plan. My current implementation is somewhat lacking in #3, so if everything goes as expected, it works great; otherwise it ends up in endless loops of despair. Anyway, thanks, Sam, for helping me crystallize these lessons in my mind. I will be refurbishing my bot soon.
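The two-step pattern in lesson 2, plus a crude version of the #3 oversight function, might look roughly like this sketch (the `llm` callable and both tools are stand-ins, not any real API):

```python
# Sketch of the two-step tool-selection pattern with a retry cap as a
# crude oversight guard. `llm` is a placeholder for any chat-completion call.
TOOLS = {
    "click": lambda target: f"clicked {target}",       # hypothetical tools
    "type_text": lambda text: f"typed {text!r}",
}

def run_step(llm, instruction: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        # Step (a): ask only which tool applies, not how to call it yet.
        tool_name = llm(f"Which tool ({list(TOOLS)}) handles: {instruction}? "
                        "Answer with the tool name only.").strip()
        if tool_name in TOOLS:
            # Step (b): a second call works out the argument for that tool.
            arg = llm(f"Give the single argument for {tool_name} to: {instruction}")
            return TOOLS[tool_name](arg)
    # Oversight: give up instead of looping forever.
    return "ERROR: could not map instruction to a tool; re-plan needed"
```

Splitting "pick the tool" from "fill in the arguments" gives the model two small decisions instead of one large one, and the retry cap is the simplest possible escape from the "endless loops of despair" case.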
@AI_Escaped 2 months ago
Awesome video! I actually just came to the realization today that I need to name my tools and descriptions pretty much exactly like you described. Lots of great info, thank you!
@kepenge 2 months ago
It would be nice if you could demo a custom tool using all the important aspects that you have mentioned, and show what the results of applying or not applying them would be.
@jayhu6075 2 months ago
Been away for a while. It’s nice to be back on your channel and to watch this informative tutorial. Many thanks.
@superfliping 2 months ago
Thank you for these wonderful configurations for learning tools. Local tools for automation, with a laptop and cellphone interface and local LLM memory for my conversations and code building, is what I'm building.
@MeinDeutschkurs 2 months ago
16:27 Yes, indeed! Tools are very important. I'm just not convinced that the framework approach is the best (and only) one.
@samwitteveenai 2 months ago
I am not using LangChain for anything major in production. The framework approach is useful for showing people things rather than writing them from scratch, and I don't want to go giving out prod code at this stage. The tool concepts, though, are pretty much the same no matter what you choose to use.
@MeinDeutschkurs 2 months ago
@@samwitteveenai Good to know, and good that you mention it. Over time, I got the impression that these frameworks are the be-all and end-all.
@AI_Escaped 2 months ago
Even a non-framework approach is a framework :)
@MeinDeutschkurs 2 months ago
@@AI_Escaped I wrote "these frameworks". And now, do you feel better?
@nedkelly3610 2 months ago
Excellent! All the agent verification I've been thinking of but haven't had the time to write.
@OscarDuys 2 months ago
How does the agent fare when you give it access to lots of tools? I would assume that it increases the number of errors/hallucinations that occur, but how quick is this drop-off? Essentially I'm asking whether you've given it access to all 48 of your tools at the same time.
@AI_Escaped 2 months ago
OpenAI Assistants can hold up to 128 function-call tools, but I hear they can get confused when they have many tools, though I assume that depends on a lot of things. I was just thinking of storing tools as Python files locally in a database instead of defining them with the actual agent. I really don't see much of a difference, and you can easily add a lot more metadata for choosing the correct tool. Or maybe a combination: a main tool repo where agents can pull the tools they need from. Lots to figure out. Oh, and my other option: every tool is an agent.
@samwitteveenai 2 months ago
I generally don't give the agent access to all of the tools. It doesn't really make sense, as the tools vary in their uses a lot (e.g. social media and other things just don't end up in the same agent). That said, I do know people who are building agents that have hundreds of tools, if not more, and I think the Gorilla paper was testing on thousands of tools. My personal belief is to give the agents just enough to do the job that you want them to do, and don't try to confuse them. Try to constrain the agents as much as possible to get good results. This also makes it much easier to build tests and evals.
@samwitteveenai 2 months ago
Another approach that I like doing is cascading tools. Like a cascading classifier from ML, where you have one classifier that puts things into categories and another one that determines the low-level category. You can do the same kind of thing with agents, where the agent decides that it needs a Reddit tool and then it has another function call to work out exactly what Reddit tool and stuff like that. That's a very simplified example of this.
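That cascading idea can be sketched in a few lines, assuming a hypothetical `llm` chat call and invented category/tool names:

```python
# Hedged sketch of cascading tool selection: first pick a coarse category,
# then pick the concrete tool within only that category, like a cascading
# classifier. `llm` is a placeholder for any chat-completion call.
TOOL_TREE = {
    "reddit": {"reddit_search": ..., "reddit_post": ...},   # hypothetical
    "email":  {"send_email": ..., "read_inbox": ...},
}

def pick_tool(llm, task: str) -> str:
    # Stage 1: coarse category decision.
    category = llm(f"Which category ({list(TOOL_TREE)}) fits: {task}? "
                   "Answer with the category only.").strip()
    # Stage 2: fine-grained choice among only that category's tools.
    tool = llm(f"Which tool ({list(TOOL_TREE[category])}) fits: {task}? "
               "Answer with the tool name only.").strip()
    return tool
```

Each call shows the model only a short list, which keeps the per-decision choice space small even when the total tool count is large.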
@szekaatti8394 2 months ago
Are you writing evals (evaluation code) for each of those tools separately? Also, you mentioned that you are using LangChain in some of the tools; do you still find it useful? I find myself struggling more and more with these abstractions, and most of the time I'm just using simple things like Instructor and building "the framework" around it myself (things like retries on bad outputs, graph-based routing, etc.).
@MeinDeutschkurs 2 months ago
I can hear you! I feel the same! How much terminology do I need to study so that I'm able to do this or that? Believe it or not, I can do lots with Python, and I don't know what to call it. 😂
@samwitteveenai 2 months ago
Yes, this is a totally valid point, and I largely agree. I tend to use LangGraph for prototyping and then streamline anything I want to put into production, etc. Instructor is a cool lib, and I have used it for a few things.
@tauraik 2 months ago
Thanks, this was really good.
@George-ew4io 2 months ago
Thanks for the video. I see you using LangGraph a lot; do you recommend it for building production-ready agents?
@samwitteveenai 2 months ago
Most of the time I prototype things in LangGraph, and I think LangGraph is good for explaining things to people and for teaching, but for production I generally go to custom code. I do have some things that run as LangGraph microservices that do small things, etc.
@alx8439 2 months ago
What is your favorite framework? I'm seeing "Agency Swarm" getting some good traction. Before that, MS AutoGen was very promising, but with its limited toolset it was quite dumb. They have released a new version recently; it's worth checking. And Langflow / n8n also look nice.
@samwitteveenai 2 months ago
I haven't really had a proper play with Agency Swarm, so I'm not sure that much about it. I do like AutoGen; I think it's got some things going for it. I do like LangGraph for some things. Langflow and n8n are really low-code or no-code kinds of tools.
@GeorgeDekker 2 months ago
Thanks Sam. Any pointers on having agents generate tools?
@samwitteveenai 2 months ago
The thing I would say is you generally don't want an agent to write its own tools on the fly. You can certainly use things like Cursor and various code-generation tools to create the tools and then use them in your agent, but I'm really reluctant to let the agent do that in real time. They tend to be too unrestricted and just end up wasting lots of tokens and going into loops of repeating themselves.
@GeorgeDekker 2 months ago
@samwitteveenai Thank you for your comment. I wasn't considering on-the-fly generation. I was thinking of something like a CrewAI crew that is completely focused on building CrewAI tools, following the process of [research, design, build, test, improve], with a huge emphasis on utilizing existing tools (like read/write files, web search, etc.) and very strict, small tasks/agent role definitions. And yes, I'm considering writing the whole thing with Cursor. Do you think this is doable? Any suggestions to assist the creation of good tests?
@AI_Escaped 2 months ago
@@samwitteveenai Very true, but in some instances they can work OK. For example, I have a psql agent that can manage databases using psql commands and SQLAlchemy with an InMemoryExecution tool, and it actually works pretty well after some initial training. But it is true, they can get confused on occasion and get stuck in loops if they try something they've never tried before. Overall, though, for general tasks it's not bad; it all depends on the use case. I can just say, for example, "create a new relational database for this or that with all these tables and fields and fill it with sample data", and it can pull it off no problem. Otherwise, I would have to make a tool for every possible action on a database. In production, though, I would have no choice but to make a tool for each task.
@dawid_dahl 2 months ago
So writing some functions is good when programming. Thanks!
@foxusmusicus2929 2 months ago
Which tool do you use to generate these great videos? It looks AI-generated but with great quality. I love it.
@freeideas 2 months ago
OK, sorry for so many messages, but I have a question/experience: I eventually had to make a prompt loop where I say something like, "translate this giant blob of English language into this specific JSON format", because sometimes the kind of JSON I need is not exactly what a function-call schema returns. The loop reads the JSON returned, finds errors if any, then re-prompts with the original inputs plus the first output and the errors. The loop parses the JSON out of the response (basically just finds the first and last braces or brackets). This sounds clumsy, but it works so well that I am thinking of not using proper function tool calls anymore. The function tool calls are idiosyncratic and differ quite a bit from one LLM to another, but the JSON I get from this loop is perfect every time and rarely has to be re-prompted. Wondering if anyone can tell me that I am being stupid or naive here. :)
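The loop described above might look roughly like this sketch; the brace-finding and error re-prompting follow the comment, and the `llm` callable is a stand-in for any completion API:

```python
import json

def extract_json(text: str):
    """Pull the first {...} or [...] span out of an LLM reply."""
    start = min((i for i in (text.find("{"), text.find("[")) if i != -1),
                default=-1)
    end = max(text.rfind("}"), text.rfind("]"))
    if start == -1 or end <= start:
        raise ValueError("no JSON found")
    return json.loads(text[start:end + 1])

def json_loop(llm, source: str, schema_hint: str, max_tries: int = 3):
    prompt = f"Translate this into JSON matching {schema_hint}:\n{source}"
    for _ in range(max_tries):
        reply = llm(prompt)
        try:
            return extract_json(reply)
        except ValueError as err:
            # Re-prompt with the original input, the bad output, and the error.
            prompt = (f"Translate this into JSON matching {schema_hint}:\n{source}\n"
                      f"Your previous answer was:\n{reply}\nError: {err}\nFix it.")
    raise RuntimeError("could not get valid JSON")
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, the single `except` clause catches both the "no braces found" case and malformed JSON.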
@s4rel0ck 2 months ago
I have a sense that LangChain and "agents" are simply an anthropomorphized solution in search of a problem. You argue that a custom tool isn't just an API call, but "agents" are merely sequenced, conditional, or looping LLM prompts, possibly with some function calling: essentially just API calls. With the Cursor IDE, you can write functions to call new API endpoints in 1-2 requests, and then you have OpenAI's o1 model with built-in chain-of-thought and planning capabilities. This raises the question of why you need an agentic approach or LangChain at all.
@AI_Escaped 2 months ago
Agents are basically just smart programs
@samwitteveenai 2 months ago
For me, agents are programs where you have the LLM make specific decisions that don't just work from conditional logic like normal programming. Regarding LangChain or any framework for these kinds of things, I think it just often makes it easier for a lot of people to see what's going on at a high level. Though I do get very frustrated (with LC) at some of the low-level stuff that's going on, where it's being overly complicated, etc.
@waneyvin 2 months ago
Can you recommend some planning tools?
@samwitteveenai 2 months ago
This really depends on your use case and your definition of planning. LLMs are not great at formal planning, but they are very creative at coming up with proposals that you can then run through a checker to validate.
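The propose-then-check pattern can be sketched like this, with an invented action whitelist standing in for a real validator and `llm` as a placeholder chat call:

```python
# Hedged sketch of "LLM proposes, checker validates": the model drafts a
# plan as 'action: detail' lines, and plain code rejects unknown actions,
# feeding the failures back into the next prompt. All names hypothetical.
ALLOWED_ACTIONS = {"search", "open", "summarize"}

def check_plan(steps: list) -> list:
    """Return the problems found; an empty list means the plan passes."""
    return [s for s in steps if s.split(":")[0] not in ALLOWED_ACTIONS]

def plan(llm, goal: str, max_tries: int = 3) -> list:
    feedback = ""
    for _ in range(max_tries):
        raw = llm(f"Plan steps as 'action: detail' lines for: {goal}{feedback}")
        steps = [line.strip() for line in raw.splitlines() if line.strip()]
        bad = check_plan(steps)
        if not bad:
            return steps
        feedback = f"\nThese steps were invalid: {bad}"
    raise RuntimeError("no valid plan")
```

The checker is deterministic code, so it never hallucinates; the LLM supplies the creativity and the checker supplies the rigor.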
@waneyvin 2 months ago
@@samwitteveenai Thanks a lot! I'm just looking for some general planning capabilities to interface with an open-ended environment; it might then have a chance to make agents autonomous.
@nufh 2 months ago
LLMs tend to generate answers even when they don't have the right information. When retrieving data with RAG, if the question is unrelated, they sometimes hallucinate responses. I've seen people build tools to filter and fact-check, but with multiple agents running, they quickly eat up my credits like there's no tomorrow.
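One cheap alternative to a fleet of fact-checking agents is a single relevance gate before the expensive generation call. This sketch uses naive word overlap as a stand-in for a small-model yes/no check (all names and thresholds are illustrative):

```python
# Refuse to answer when the retrieved chunks don't plausibly relate to the
# question, instead of letting the model invent an answer. The overlap
# heuristic here is a placeholder for a cheap small-model relevance check.
def is_relevant(question: str, chunks: list, min_overlap: int = 2) -> bool:
    q_words = set(question.lower().split())
    return any(len(q_words & set(c.lower().split())) >= min_overlap
               for c in chunks)

def answer(llm, question: str, chunks: list) -> str:
    if not is_relevant(question, chunks):
        # Refusal costs zero LLM calls.
        return "I don't have information on that in the indexed documents."
    return llm(f"Answer from context only:\n{chunks}\n\nQ: {question}")
```

The point is economic: one deterministic (or one tiny-model) check per query is far cheaper than routing every question through multiple verification agents.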
@seththunder2077 2 months ago
Hey Sam, just curious about **kwargs: why didn't you 1) include *args, or 2) just use *args instead of **kwargs?
@samwitteveenai 2 months ago
Yes, you are right; I could probably have just used *args here. I tend to use keyword arguments a lot, and often the LLMs won't necessarily return what they need in the correct order, so I find keyword arguments work better there.
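A small illustration of why keyword arguments tolerate the model's ordering while positional ones don't (tool name and values are hypothetical):

```python
# Models typically return tool arguments as named JSON fields, and names
# survive reordering where positions don't. With *args, a reordered reply
# would silently bind the wrong values to the wrong parameters.
def book_flight(**kwargs) -> str:
    # Order-independent: the model can emit the fields in any order.
    return f"{kwargs['origin']} -> {kwargs['destination']} on {kwargs['date']}"

# A model-produced arguments dict, in an arbitrary field order:
llm_args = {"date": "2024-06-01", "destination": "SFO", "origin": "LHR"}
result = book_flight(**llm_args)
```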
@MeinDeutschkurs 2 months ago
I cannot hear it anymore... framework here, framework there. What about tool usage without any framework? What about alternative approaches? What about an orchestrator that simply writes a Python script over given functions to return the first layer? direct_reply(), or whatever tool/workflow I want the LLM to use to generate whatever. Sam, I'm so frustrated because of all these so-called frameworks. Mistral-Nemo is able to write instructed code and to avoid out-of-scope code.
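For what it's worth, a framework-free tool loop can be quite small. This sketch (with a hypothetical `llm` callable and the commenter's `direct_reply()` as the only registered tool) just asks the model for a JSON tool call and dispatches it with plain Python:

```python
import json

# Minimal framework-free tool use: the model emits a JSON tool call and
# plain Python dispatches it. No agent framework involved.
def direct_reply(text: str) -> str:          # the fallback "tool"
    return text

TOOLS = {"direct_reply": direct_reply}        # register plain functions

def run(llm, user_msg: str) -> str:
    raw = llm(f'Reply as JSON {{"tool": ..., "args": {{...}}}} for: {user_msg}. '
              f"Available tools: {list(TOOLS)}")
    call = json.loads(raw)
    return TOOLS[call["tool"]](**call["args"])
```

In practice you would add JSON-repair retries and argument validation, but the core dispatch really is just a dict lookup and a `**kwargs` call.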
@micbab-vg2mu 2 months ago
thanks :)
@pensiveintrovert4318 2 months ago
I am not aware of any agent system that is reliable enough to trust.
@samwitteveenai 2 months ago
It really depends on the task and the industry, but I agree with you that most of the agents people are building out there only work about 70% of the time. This is what I saw when I was in San Francisco talking to different startups, etc. It really comes down to how good your evals and your tests are for making sure that things work, and whether you can build verifiers and checkers.
@pensiveintrovert4318 2 months ago
@@samwitteveenai What I have noticed, trying to develop with some of these systems, is that the initial work progresses faster than normal, but once you get to a bit of complexity, they start breaking stuff that you did before, and it becomes impossibly hard to incrementally improve what you already have.
@fontenbleau 2 months ago
The only point of AI agents was to make money... and this story may be closed altogether when that American gets 16 years in prison for farming $12 million on Spotify with agents. Moreover, this case, which will be resolved after the elections, makes AI-generated money (not received from humans, and that's the main thing the prosecution aims to prove) a toxic asset that will be impossible to legalize in the USA, and next in the EU, outside of corporate bank accounts. Yes, agents will still be around after this, but scraping the web is not interesting to anyone, so this part of the industry will be very small. I see fewer news stories about agents now; the emphasis is changing.