Was I Wrong About AI Agents? | INSANE OpenAI-o1 Planning Capabilities

19,505 views

a day ago

Comments: 53

@mrpocock a month ago

It's just beginning to get useful. It would be neat to see an agent community that automates a software development cycle, starting with documentation, then acceptance tests, specification/contract tests, implementation, test verification and debugging. At the moment all the demos of LLMs making code I see are basically just asking it to write snake, and seeing if it does it, or when it doesn't, if it can fix the code given the compiler/runtime error. But I've found with my own projects that if you invest heavily in the textual and programmatic documentation up-front, they are much better at generating code that actually works.

@Music-mx4tt a month ago

Do you see that happening? I definitely do.

@40bombala a month ago

Wow, this is so good. I don't think structured output or JSON mode is available for o1 yet though. Will be even more powerful once those and function calls are available.

@janweber1699 a month ago

wow, that was more impressive than expected, kinda crazy. the scaling laws seem even more absurd now. great video, what a cool dude

@wildfotoz a month ago

I apologize if you've covered this in a previous video, but what if you do a video where you have an agent that will create a web application that talks to a database? Will it actually create a new database or tables in an existing database? Will it create the different pages or forms in the app and give you a menu system? Another idea: have it search different websites for new TV shows or movies that are coming out and have it show you only the stuff that matches your tastes. Stuff like this would be a more real-world use case for agents.

@technolus5742 a month ago

AI sailing the 7 seas 😏

@pin65371 a month ago

One project that could be interesting to work on would be to think of a bunch of tasks that you do pretty often, use Cursor to make a bunch of scripts for you, and build almost like a home page with all the scripts and a front end for them. So let's say you want to do transcripts for your videos: you could just have a drag and drop where you drag the video file on and it creates a transcript. You could also add a little chat for each tool, so if you want to add custom instructions you could do that as well. With how quick and cheap it is to do stuff like this now, you could quickly build up a lot of tools that are useful.

@tomaszzielinski4521 a month ago

For very complex tasks I'd try a manager agent in the middle, which tries to follow the plan and talks to tool agents as needed, but also allows some flexibility, handling blocking issues or even reporting failure if the task is not possible.
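The manager-in-the-middle idea above could look roughly like the sketch below. Everything in it is an assumption for illustration: `run_manager`, `ManagerResult`, and the shape of the plan (a list of tool name and instruction pairs) are made-up names, not anything from the video's code. Tool agents are plain callables here; in a real system each would wrap an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a tool agent takes an instruction and returns its output.
ToolAgent = Callable[[str], str]


@dataclass
class ManagerResult:
    succeeded: bool
    outputs: List[str] = field(default_factory=list)
    failure_reason: str = ""


def run_manager(plan: List[Tuple[str, str]],
                tools: Dict[str, ToolAgent],
                max_retries: int = 1) -> ManagerResult:
    """Follow the plan step by step, retrying a blocked step before giving up."""
    outputs: List[str] = []
    for tool_name, instruction in plan:
        tool = tools.get(tool_name)
        if tool is None:
            # The plan asks for a tool agent that does not exist: report failure.
            return ManagerResult(False, outputs, f"no tool agent named {tool_name!r}")
        last_error = ""
        for _attempt in range(max_retries + 1):
            try:
                outputs.append(tool(instruction))
                break  # step done, move on to the next plan entry
            except Exception as exc:  # a blocking issue raised by the tool agent
                last_error = str(exc)
        else:
            # Every attempt failed, so the task is reported as not possible.
            return ManagerResult(False, outputs, f"step {tool_name!r} blocked: {last_error}")
    return ManagerResult(True, outputs)
```

The retry loop is the "flexibility" part: a step that fails once gets another attempt, and only a step that keeps failing turns into a reported failure, with the partial outputs kept for inspection.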

@Leto2ndAtreides a month ago

You're making me feel bad about not burning $1,000 on the OpenAI API.

@wurstelei1356 a month ago

Did the requests for this video really cost $1000? Sounds expensive. One could buy a decent GPU and run agents forever locally for that money.

@gramnegrod a month ago

OpenAI is releasing to Tier 4 now. OpenAI's Tier 4 for API usage has specific requirements and benefits: • Requirements: You must have a payment history of at least 14 days and have spent at least $250 on the API.

@eirikgg a month ago

@gramnegrod But you could also use it via OpenRouter. I did, but so far I haven't seen much improvement over Claude 3.5 in my tests.

@thedinoman7828 a month ago

just use openrouter

@mrpro7737 a month ago

WOW, you did a great job with your prompting method. Can you make a web search agent that:
- chats with the user about a problem
- then generates search queries for Google search
- then exports the first 10 websites from every search query
- scrapes all the websites and analyses the data with a web scraping analyst agent
- then saves this data in vectors
- gives the saved data to an agent along with the main prompt + system prompt, which makes it generate an HTML page of an article that has all the solutions for the problem along with sources and images
It's really a cool project, I am working on it 😅. I am challenging myself to complete it in 1 month.
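The steps in the comment above can be sketched as a plain pipeline. This is only a skeleton under stated assumptions: `research_pipeline` and its four callable parameters are hypothetical names, each stage is injected so nothing here calls a real search or scraping API, and the vector storage step is left out for brevity (pages are kept in an ordinary dict instead).

```python
from typing import Callable, Dict, List


def research_pipeline(problem: str,
                      make_queries: Callable[[str], List[str]],
                      search: Callable[[str], List[str]],
                      scrape: Callable[[str], str],
                      write_article: Callable[[str, Dict[str, str]], str],
                      results_per_query: int = 10) -> str:
    """Turn a problem statement into an article built from scraped pages."""
    pages: Dict[str, str] = {}
    for query in make_queries(problem):
        # Keep only the first N result URLs for each generated query.
        for url in search(query)[:results_per_query]:
            if url not in pages:  # skip URLs that were already scraped
                pages[url] = scrape(url)
    # The writer stage receives the problem plus every page, keyed by source URL.
    return write_article(problem, pages)
```

In practice `make_queries` and `write_article` would be LLM-backed agents and `search` and `scrape` would wrap whatever search and scraping tools are available; passing them in as parameters keeps each stage easy to swap or test on its own.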

@idea_list a month ago

Quite excited to see this, thanks a bunch for pioneering and sharing, it saves others a lot of time :) You probably should've explained how crucial the master agent role is in more detail though. Half of the comments are from those who don't understand the importance of the o1-preview output in your system.

@gramnegrod a month ago

Awesome video! THX for all the creative applications! I think getting agents to program and deploy browser actions, like through Selenium, would really open up many valid use cases, basically limitless. I'm talking about having o1 produce workflows similar to what Mullion is trying to do, but instead integrating a library like Skyvern into your agent tools. It is probably a bit grandiose, but it might work.

@wicktorinox6942 a month ago

It would be great to see a demo that doesn't start with "do a basic...", because this is where my struggles start.

@TheAIVarietyShow a month ago

For the snow, 16F would be snow weather. Not sure what 16C is off the top of my head. Maybe that's why the Bart pic didn't come out as expected
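For reference, the two readings in the comment above are far apart. A minimal conversion sketch, using the standard Fahrenheit/Celsius formulas (the function names are just illustrative):

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


print(round(fahrenheit_to_celsius(16), 1))  # -8.9, well below freezing
print(round(celsius_to_fahrenheit(16), 1))  # 60.8, a mild day
```

So 16F is about -8.9C, while 16C is about 60.8F, which is not snow weather.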

@johannesseikowsky8197 13 days ago

hey! I just became a channel member. where can I find the github for this project?

@MarkAlmeida-Cardy a month ago

Great content! Any chance we can have access to the code you used in the video?

@TheTruthIsGonnaHurt a month ago

*How long did it take to do the planning?* Kind of a big step in testing capabilities and it wasn't discussed.

@raymond_luxury_yacht a month ago

I'm gonna hook up to Sonnet and tell it to just spit out the code via API cos I'm fed up copying and pasting to VS Code. It just works!

@alizaman239 a month ago

How is this different from having lambda functions rather than agents?

@micbab-vg2mu a month ago

very interesting - thank you for sharing:)

@wurstelei1356 a month ago

I wonder if the new LLaMa (3.2) models are capable of doing similar things. The smaller models seem better than those from the 3.1 version.

@silentage6310 a month ago

Also interesting. At least the model should be able to call functions. Additional training is needed, or make the function call with a different syntax. There are also studies that say that JSON is not the best IO format for LLMs.

@wurstelei1356 a month ago

@silentage6310 LLaMa is capable of function calling. I think AllAboutAI made a video on it half a year ago. Not sure which LLaMa version it was.

@SebKrogh a month ago

Could this be done in something like flowise or similar? Just thinking if there are any glaring limitations of the low-code setups?

@aaronag7876 a month ago

Is all AI agent code written in Python? Or are there other languages that can be used, like JavaScript?

@nessrinetrabelsi8581 a month ago

Can you just ask it to do one task with no details like you did? Like generate a .md which contains 3 images of the weather of next 3 days in SF?

@nufh a month ago

Can I get this code via AI Rookie?

@egoincarnate a month ago

Is source for this available? I don't see it on the linked github...

@AyushBhatt-g7q a month ago

same, i was looking for the files so that i can understand them but they aren't there.

@VaibhavShewale a month ago

you guys need to be a paid user of his community to check out the code

@fredericherrera a month ago

very good. reverse engineering pub trivia questions

@hqcart1 a month ago

If you really think about it, o1 did not do anything. You had already broken the task into the 15 steps, and you could have just sent these 15 steps to o1 to do everything, except o1 can't generate images (for now), so I do not see what the benefit is...

@idea_list a month ago

It's not that. You should just try to create such a sequence of agent instructions with anything else. You'll immediately understand the difference. And the thing is, with a moderately complicated agentic system the user shouldn't even be aware of what agents there are and what functions they have. The user should just give a task, and the system's main brain (o1 in this case) should do all the planning. No other model could do that before: there were lots of tiny (or not so tiny) inaccuracies, and while trying to get rid of them via prompting or structural adjustments you'd generate a thousand more. It was just not viable.

@dogonbb a month ago

I understand what you mean. The "plan" still comes from him. But think about what OpenAI could possibly do. He did this alone already.

@xspydazx a month ago

I don't think you can be wrong, my friend! You just realise it's using a graph and a router with an intent detector, plus the guardrails! The model has not changed! They did say they fine-tuned the model on the step-by-step! You have produced these works yourself (you just need a graph). This is the best way to create an agentic system. You can also use open router (this can be an agent in the chain to detect which route to pick, i.e. the routes can be graphs). Each node can be an agent in the graph with its own specific tools! Graphs are recursive until the problem or step is solved! It may take a long time, but it is heavily directed as well as being quite intelligent, with latitude for the agents to create and perform! So now we can create many types of graph for various types of task, and the router picks which path to take! Mastermind! It can be the fat controller!
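The router-plus-graphs layout described above might be sketched like this. It is a toy under stated assumptions: `detect_intent`, `run_graph`, `route`, and the two example graphs are hypothetical names invented here, the intent detector is a keyword check where a real system would likely ask an LLM, and each node is a plain function standing in for an agent with its own tools.

```python
from typing import Callable, Dict, Optional, Tuple

# A node takes the current state and returns (new_state, next_node_name or None).
Node = Callable[[str], Tuple[str, Optional[str]]]
Graph = Dict[str, Node]


def detect_intent(task: str) -> str:
    """Toy keyword intent detector (assumption: an LLM would do this in practice)."""
    return "image" if "image" in task.lower() else "text"


def run_graph(graph: Graph, start: str, state: str, max_steps: int = 10) -> str:
    """Walk the graph from `start` until a node returns no successor."""
    node_name: Optional[str] = start
    steps = 0
    while node_name is not None:
        if steps >= max_steps:
            # Guardrail against a graph that keeps looping without finishing.
            raise RuntimeError("graph did not finish within max_steps")
        state, node_name = graph[node_name](state)
        steps += 1
    return state


TEXT_GRAPH: Graph = {
    "draft": lambda s: (s + " -> draft", "review"),
    "review": lambda s: (s + " -> review", None),
}
IMAGE_GRAPH: Graph = {
    "prompt": lambda s: (s + " -> prompt", "render"),
    "render": lambda s: (s + " -> render", None),
}
# The router maps each intent to a start node and the graph to run.
ROUTES: Dict[str, Tuple[str, Graph]] = {
    "text": ("draft", TEXT_GRAPH),
    "image": ("prompt", IMAGE_GRAPH),
}


def route(task: str) -> str:
    start, graph = ROUTES[detect_intent(task)]
    return run_graph(graph, start, task)
```

A node can point back to an earlier node to repeat a step until it is solved; the `max_steps` limit is the guardrail that stops such a loop from running forever.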

@iPloox a month ago

1 file. Now make it pull any open source repository and fix a bug issue on that repo. If you don't give it steps, it always fails.

@lystic9392 a month ago

What kind of costs are to be expected with things like this?

@pnwadventures2955 a month ago

Between $1 and $25.000, or more. You could also do it for free.

@tomaszzielinski4521 a month ago

Need to go and try, but still, we're talking about cents here. Unless you get an agent stuck in a loop, it won't generate enough tokens to be a concern.

@lystic9392 a month ago

@tomaszzielinski4521 Really? Hmm... That makes things more interesting.

@aiamfree a month ago

I have a feeling the stream of the answer at the end of o1 is faux-stream

@TomlinsonHume-h6m a month ago

Williamson Loaf

@fmbetterforms5900 a month ago

These are not agents they are functions.

@VaibhavShewale a month ago

Wow, damn! You're telling us for free? Not asking us to join as paid users to learn this? Either this is not that good or you just wanted to show goodwill? Well, seeing your full video after so much time feels good!

@finalfan321 a month ago

so many use cases

@Rh22-c9l a month ago

Holy s....... I'm learning to program... why?

@HenningKilset76 a month ago

Did you notice the part where he was looking at the code and validating that it was generated correctly? In real life and with more complex problems - that's how we do our job, still, with AI present everywhere. Because the AI is wrong *a lot*. Still. AI is also usually still pretty shit about connecting multiple parts of a system together when the system is complex.