It's just beginning to get useful. It would be neat to see an agent community that automates a software development cycle, starting with documentation, then acceptance tests, specification/contract tests, implementation, test verification, and debugging. At the moment, all the demos I see of LLMs writing code are basically just asking it to write Snake and seeing if it does, or, when it doesn't, whether it can fix the code given the compiler/runtime error. But I've found with my own projects that if you invest heavily in the textual and programmatic documentation up front, they are much better at generating code that actually works.
@Music-mx4tt 1 month ago
Do you see that happening? I definitely do.
@40bombala 1 month ago
Wow, this is so good. I don't think structured output or JSON mode is available for o1 yet though. Will be even more powerful once those and function calls are available.
@janweber1699 1 month ago
Wow, that was more impressive than expected. Kind of crazy; the scaling laws seem even more absurd now. Great video, what a cool dude.
@wildfotoz 1 month ago
I apologize if you've covered this in a previous video, but what if you did a video where an agent creates a web application that talks to a database? Will it actually create a new database, or tables in an existing database? Will it create the different pages or forms in the app and give you a menu system? Another idea: have it search different websites for new TV shows or movies that are coming out and show you only the stuff that matches your tastes. Stuff like this would be a more real-world use case for agents.
@technolus5742 1 month ago
AI sailing the 7 seas 😏
@pin65371 1 month ago
One project that could be interesting to work on: think of a bunch of tasks that you do pretty often, use Cursor to make a bunch of scripts for them, and build something like a home page with all the scripts and a front end for them. So let's say you want to do transcripts for your videos: you could have a drag-and-drop area where you drop the video file and it creates a transcript. You could also add a little chat for each tool, so if you want to add custom instructions you could do that as well. With how quick and cheap it is to do stuff like this now, you could quickly build up a lot of useful tools.
@tomaszzielinski4521 1 month ago
For very complex tasks I'd try a manager agent in the middle, which tries to follow the plan and talks to tool agents as needed, but also allows some flexibility, handling blocking issues or even reporting failure if the task is not possible.
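A minimal sketch of the manager-in-the-middle idea from the comment above, assuming the plan already exists as a list of steps and each tool agent is just a Python callable. The names `Step`, `TaskFailed`, and `run_manager` are hypothetical, not taken from the video's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str     # name of the tool agent to call (hypothetical naming)
    payload: str  # input handed to that tool agent


class TaskFailed(Exception):
    """Raised when the manager decides the task cannot be completed."""


def run_manager(
    plan: List[Step],
    tools: Dict[str, Callable[[str], str]],
    max_retries: int = 2,
) -> List[str]:
    """Follow the plan step by step, retrying a blocked step before giving up."""
    results: List[str] = []
    for step in plan:
        tool = tools.get(step.tool)
        if tool is None:
            # The plan asks for a tool agent that does not exist: report failure.
            raise TaskFailed(f"no tool agent named {step.tool!r}")
        last_error = None
        for _ in range(max_retries + 1):
            try:
                results.append(tool(step.payload))
                break  # step succeeded, move on to the next one
            except Exception as err:  # a blocking issue inside the tool agent
                last_error = err
        else:
            # Every attempt failed, so the manager reports failure.
            raise TaskFailed(f"step {step.tool!r} failed: {last_error}")
    return results
```

A real manager would probably ask the planning model to revise the plan instead of only retrying; the retry loop here is a stand-in for that flexibility.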
@Leto2ndAtreides 1 month ago
You're making me feel bad about not burning $1,000 on the OpenAI API.
@wurstelei1356 1 month ago
Did the requests for this video really cost $1000? Sounds expensive. One could buy a decent GPU and run agents locally forever for that money.
@gramnegrod 1 month ago
OpenAI is releasing to Tier 4 now. OpenAI's Tier 4 for API usage has specific requirements and benefits: • Requirements: You must have a payment history of at least 14 days and have spent at least $250 on the API.
@eirikgg 1 month ago
@gramnegrod But you could also use it via OpenRouter. I did, but so far I haven't seen much improvement over Claude 3.5 in my tests.
@thedinoman7828 1 month ago
just use openrouter
@mrpro7737 1 month ago
WOW, you did a great job with your prompting method. Can you make a web search agent that:
- chats with the user about a problem
- then generates search queries for Google search
- then exports the first 10 websites from every search query
- scrapes all the websites and analyses the data with a web-scraping analyst agent
- then saves this data as vectors
- gives the saved data to an agent, along with the main prompt + system prompt, that makes it generate an HTML page of an article with all the solutions to the problem, along with sources and images
It's a really cool project. I am working on it 😅 and I am challenging myself to complete it in 1 month.
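The pipeline in the comment above could be wired together roughly like this. It is only a sketch: every stage is passed in as a callable, so nothing here touches Google, a scraper, or a vector store, and the function and parameter names are hypothetical. The analysis and vector-storage stages are left out for brevity; the scraped pages go straight to the writing agent.

```python
from typing import Callable, Dict, List


def research_pipeline(
    problem: str,
    make_queries: Callable[[str], List[str]],           # e.g. an LLM call (assumed)
    search: Callable[[str], List[str]],                 # returns result URLs (assumed)
    scrape: Callable[[str], str],                       # returns page text (assumed)
    write_article: Callable[[str, Dict[str, str]], str],
    results_per_query: int = 10,
) -> str:
    """Turn a problem statement into an HTML article with sources."""
    pages: Dict[str, str] = {}
    for query in make_queries(problem):
        # Keep only the first N results of each query, as in the comment.
        for url in search(query)[:results_per_query]:
            if url not in pages:  # do not scrape the same site twice
                pages[url] = scrape(url)
    # The writer receives the problem plus every scraped page keyed by its
    # source URL, so it can cite sources in the generated HTML.
    return write_article(problem, pages)
```

Keeping each stage behind a plain function makes it easy to test the flow with fakes before plugging in real search and scraping.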
@idea_list 1 month ago
Quite excited to see this. Thanks a bunch for pioneering and sharing, it saves others a lot of time :) You probably should've explained how crucial the master agent role is in more detail, though. Half of the comments are from people who don't understand the importance of the o1-preview output in your system.
@gramnegrod 1 month ago
Awesome video! THX for all the creative applications! I think getting agents to program and deploy browser actions, for example through Selenium, would really open up many valid use cases, basically limitless. I'm talking about having o1 produce workflows similar to what Mullion is trying to do, but instead integrating a library like Skyvern into your agent tools. It is probably a bit grandiose, but it might work.
@wicktorinox6942 1 month ago
It would be great to see a demo that doesn't start with "do a basic ...", because that is where my struggles start.
@TheAIVarietyShow 1 month ago
For the snow, 16F would be snow weather. Not sure what 16C is off the top of my head. Maybe that's why the Bart pic didn't come out as expected
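For reference, the conversion the comment above is unsure about is a one-liner. A quick sketch using the standard Celsius/Fahrenheit formulas: 16 °C is about 60.8 °F (mild, not snow weather), while 16 °F is about -8.9 °C.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


print(round(c_to_f(16), 1))  # 60.8
print(round(f_to_c(16), 1))  # -8.9
```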
@johannesseikowsky8197 13 days ago
hey! I just became a channel member. where can I find the github for this project?
@MarkAlmeida-Cardy 1 month ago
Great content! Any chance we can have access to the code you used in the video?
@TheTruthIsGonnaHurt 1 month ago
*How long did it take to do the planning?* Kind of a big step in testing capabilities and it wasn't discussed.
@raymond_luxury_yacht 1 month ago
I'm gonna hook up to Sonnet and tell it to just spit out the code via API, cos I'm fed up with copying and pasting into VS Code. It just works!
@alizaman239 1 month ago
How is this any different from having lambda functions rather than agents?
@micbab-vg2mu 1 month ago
very interesting - thank you for sharing:)
@wurstelei1356 1 month ago
I wonder if the new LLaMa (3.2) models are capable of doing similar things. The smaller models seem better than those from version 3.1.
@silentage6310 1 month ago
Also interesting. At the very least the model should be able to call functions; otherwise additional training is needed, or the function call has to be made with a different syntax. There are also studies that say JSON is not the best I/O format for LLMs.
@wurstelei1356 1 month ago
@silentage6310 LLaMa is capable of function calling. I think AllAboutAI made a video on it half a year ago. Not sure which LLaMa version it was.
@SebKrogh 1 month ago
Could this be done in something like Flowise or similar? Just wondering if there are any glaring limitations of the low-code setups.
@aaronag7876 1 month ago
Is all AI agent code written in Python, or are there other languages that can be used, like JavaScript?
@nessrinetrabelsi8581 1 month ago
Can you just ask it to do one task with no details like you did? Like generate a .md which contains 3 images of the weather of next 3 days in SF?
@nufh 1 month ago
Can I get this code via AI Rookie?
@egoincarnate 1 month ago
Is the source for this available? I don't see it on the linked GitHub...
@AyushBhatt-g7q 1 month ago
same, i was looking for the files so that i can understand them but they aren't there.
@VaibhavShewale 1 month ago
you guys need to be a paid user of his community to check out the code
@fredericherrera 1 month ago
very good. reverse engineering pub trivia questions
@hqcart1 1 month ago
If you really think about it, o1 did not do anything. You had already broken the task into the 15 steps, and you could have just sent those 15 steps to o1 to do everything, except that o1 can't generate images (for now), so I do not see what the benefit is...
@idea_list 1 month ago
It's not that. You should just try to create such a sequence of agent instructions with anything else; you'll immediately understand the difference. And the thing is, with a moderately complicated agentic system the user shouldn't even be aware of which agents exist and what functions they have. The user should just state a task, and the system's main brain (o1 in this case) should do all the planning. No other model could do that before: there were lots of tiny (or not so tiny) inaccuracies, and while trying to get rid of them via prompting or structural adjustments you'd generate a thousand more. It was just not viable.
@dogonbb 1 month ago
I understand what you mean. The "plan" still comes from him. But think about what OpenAI could possibly do, when he already did this alone.
@xspydazx 1 month ago
I don't think you're wrong, my friend! You've just realised it's using a graph and a router with an intent detector, plus the guardrails. The model has not changed! They did say they fine-tuned the model on step-by-step reasoning. You have produced these works yourself (you just need a graph). This is the best way to create an agentic system. You can also use an open router (this can be an agent in the chain that detects which route to pick, i.e. the routes can be graphs). Each node in the graph can be an agent with its own specific tools. Graphs are recursive until the problem or step is solved. It may take a long time, but it is heavily directed as well as quite intelligent, with latitude for the agents to create and perform. So now we can create many types of graph for various types of task, and a router to pick which path to take. The mastermind can be the Fat Controller!
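The router-plus-graphs layout described in the comment above can be sketched in a few lines. Everything here is an assumption for illustration: the keyword-based `detect_intent` stands in for a real intent-detection agent, and each node is a plain function on a state dict rather than an LLM-backed agent with tools.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def run_graph(nodes: List[Node], state: State, max_rounds: int = 5) -> State:
    """Run the nodes in order, repeating until a node marks the state as done."""
    for _ in range(max_rounds):
        for node in nodes:
            state = node(state)
        if state.get("done"):
            return state
    # Guard against a graph that loops forever without solving the step.
    raise RuntimeError("graph did not finish within max_rounds")


def detect_intent(task: str) -> str:
    """Toy intent detector (hypothetical): a real one would be a model call."""
    return "weather" if "weather" in task.lower() else "general"


def route(task: str, graphs: Dict[str, List[Node]]) -> State:
    """Pick the graph that matches the task's intent and run it."""
    return run_graph(graphs[detect_intent(task)], {"task": task})
```

The `max_rounds` cap reflects the comment's caveat that a recursive graph "may take a long time": without it, a step that never resolves would loop indefinitely.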
@iPloox 1 month ago
That's 1 file. Now make it pull any open-source repository and fix a bug issue on that repo. If you don't give it steps, it always fails.
@lystic9392 1 month ago
What kind of costs are to be expected with things like this?
@pnwadventures2955 1 month ago
Between $1 and $25,000, or more. You could also do it for free.
@tomaszzielinski4521 1 month ago
Need to go and try, but still, we're talking about cents here. Unless you get an agent stuck in a loop, it won't generate enough tokens to be a concern.
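A rough way to check the "cents, unless it loops" claim above is to estimate cost from token counts and put a hard cap on spending. This is a sketch only: the per-million-token prices are parameters you must fill in from your provider's price list, and `estimate_cost` and `Budget` are hypothetical helpers, not part of any SDK.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,   # price per million input tokens (look it up)
    usd_per_m_output: float,  # price per million output tokens (look it up)
) -> float:
    """Estimate the cost of one request in US dollars."""
    return (
        input_tokens / 1_000_000 * usd_per_m_input
        + output_tokens / 1_000_000 * usd_per_m_output
    )


class Budget:
    """Stops an agent run once the accumulated spend passes a limit."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, usd: float) -> None:
        self.spent_usd += usd
        if self.spent_usd > self.limit_usd:
            # An agent stuck in a loop ends up here instead of running on.
            raise RuntimeError(f"budget of ${self.limit_usd:.2f} exceeded")
```

Calling `budget.charge(estimate_cost(...))` after every model call turns a runaway loop into an early, cheap failure.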
@lystic9392 1 month ago
@tomaszzielinski4521 Really? Hmm... That makes things more interesting.
@aiamfree 1 month ago
I have a feeling the stream of the answer at the end of o1 is faux-stream
@TomlinsonHume-h6m 1 month ago
Williamson Loaf
@fmbetterforms5900 1 month ago
These are not agents; they are functions.
@VaibhavShewale 1 month ago
Wow, damn! You're telling us this for free? Not asking us to join your paid community to learn it? Either this is not that good, or you just wanted to show goodwill?? Well, watching one of your full videos after so much time feels good!
@finalfan321 1 month ago
so many use cases
@Rh22-c9l 1 month ago
Holy s....... im learning to program ... why ?
@HenningKilset76 1 month ago
Did you notice the part where he was looking at the code and validating that it was generated correctly? In real life and with more complex problems - that's how we do our job, still, with AI present everywhere. Because the AI is wrong *a lot*. Still. AI is also usually still pretty shit about connecting multiple parts of a system together when the system is complex.