Realtime API with Tool Chaining. ADA is BACK. o1 assistant FILE AI Agents

Views: 44,239

A day ago

Comments: 224

@slamb2k a month ago

Dan, I'm a 47 year old software engineer working at Microsoft. I often think about all of the next gen devs coming up in a world where not having access to sick Gen AI tools like this is unheard of. I find it amusing when muscle memory has me white knuckling it in a search engine to get answers when the default approach should have been an assistant or an LLM. Anywayz, this is just a big thanks for putting out solid content that has inspired an old dog to keep focusing on new tricks. Don't change Brother!!

@radekrousek4688 a month ago

That's the same as at school, when they don't want you to use the calculator for tasks that are too simple.

@AllenHorn0507 a month ago

Oh my God, I can’t believe this. I am a blind professional, and this would help me so much!

@requestfx5585 a month ago

Bro is so professional, he could write a comment without having to see. This technology is awesome, but it's just a combination of what has already been possible, put together professionally by OpenAI. Combining voice models with powerful LLMs and function calling, nothing is new here; the only new thing is that it has been done so well and so fast.

@AllenHorn0507 a month ago

@@requestfx5585 do you think that’s funny? Why would you make such a shitty comment?

@AllenHorn0507 a month ago

@@requestfx5585 I realize that I only represent 3% of the world’s population, but I am still a person with feelings.

@AllenHorn0507 a month ago

@@requestfx5585 why would you say something so nasty do you think it’s funny that I’m blind?

@AllenHorn0507 a month ago

@@requestfx5585 I’m not sure why my comments keep getting deleted, but your fucking comment was disgusting and not funny, asshole.

@user-pt1kj5uw3b a month ago

It's cool seeing someone who really gets where these things are going and also what you can do with them right now. I genuinely think there will be people within the next 5 years who will have super intelligent AI directly accessible in their brain, at least if everything goes right. Which I feel insane for typing but it truly doesn't seem impossible. Thanks for releasing this too.

@mrd6869 a month ago

I'm doing something similar but building AI powered cybersecurity applications instead. And you're right, this whole thing is taking off. We laughed at the Star Trek scene where Scotty, the engineer, tried to speak to the computer to build software. They thought he was crazy... he wasn't crazy💯

@faturismee a month ago

hi! im interested in cybersecurity with ai, can you explain more about your project? :)

@mrd6869 a month ago

@@faturismee Red Team exercises. AI systems being weaponized for cyberwarfare. This will be a thing VERY soon and it's not gonna be pretty lmao

@d.s.ramirez6178 a month ago

Just wanted to say I'm so impressed by your efforts here. I'm reluctant to chime in with a comment because I'm not a coder. I'm an artist, but my second identity is as a nerd. I'm completely devoted to science and people of the highest intellect producing the innovations of the future. I just never learned how to code. As an art project, I'm trying to create the foundational characteristics for an AI which will have a compassionate, ethical personality. The concept is much deeper than this and I've been developing it for 12 years on the theoretical level. Seeing a video like this makes me wish I was surrounded by Silicon Valley people in the hopes that I would find one talented person who could help me bridge the gap of the technology. But I'm holding out hope that it could still happen if I pursue it. Anyway, I just wanted to say that I can tell that this is the leading edge of the AI landscape. This makes it worth me sifting through all the cheesy hype videos in order to find it. 💯

@kennethbeal a month ago

I have a friend you might want to collaborate with. I'll send him your comment.

@jimsandwick2372 a month ago

Thank you. I totally agree with you. I felt after Whisper came along the Speech-to-Text and now Speech-to-Speech combined with these new tools, future models and reduced costs will be such a game changer. I am still surprised it's rarely mentioned and how it will super-charge productivity and fundamentally change the way we code and interact with our devices. I think once it's combined with all of your data (and external data of course) in an intelligent way the creative process will be mind blowing compared to how we work now. Keep the videos up. Thanks

@gabrielketzer7084 a month ago

It’s truly great that you release the code for this for the masses

@JazevoAudiosurf a month ago

The issue currently is that you have to program the calls etc. If it would just write the code for them on the fly, it would be much more automated. This is generally the issue: we don't want to hardcode anything in the future; it should know what to do. Also, as well as it works, it's still probabilistic. We need some sort of classification model, one that is much less probabilistic, to check the answers/outputs for correctness.

@georgestander2682 a month ago

Given the user's request, it should create the functions, tools and even UI... that's what Karpathy is on about right now.

@lechugathedoood3595 a month ago

I’m new to coding, but this gave me such a great idea of how I can combine this with my ceramics hobby. Before this video ai was a hot/cold thing for me, but now I see how I can do something cool with it! Thanks

@eintyp4389 a month ago

What's your experience with a more open-ended toolbox for the agent? Like having a database in Supabase with functions and agent workflows that can be semantically searched and used. This way you don't have to provide a long list of available tools to the agent, and adding new tools or workflows, or even letting agents create, test and then add them for reuse, would be easier. Like what they did with that Minecraft agent (Voyager was its name?). Or does this fail, and if so, where and why?

@indydevdan a month ago

This is a great pattern BUT I've been steering clear of investing time in tool selection systems because I view it as directly competing with OpenAI's advancements. In order to build useful AI Agents (which is a target of OpenAI. I think we'll see this a lot more in 2025) you need reliable tool calling, at scale. We saw them take a great stab at this with structured outputs + tool calling. So although I like your approach I'm holding off on systematizing tool selection until it's clear that OpenAI, Anthropic, Google (big 3) won't invest here.

@User-actSpacing a month ago

Python is slow AF and still this demo worked extremely well! What a time to be alive!!

@mrpocock a month ago

I think python or js is fine for this sort of thing where you're mainly plumbing services and compiled libraries. Particularly now that they have async built in. But I am tempted to rewrite the agent router and configuration in rust ;)

@Billy4321able a month ago

The vast majority of the processing is being done in the cloud. He's just piping in to a bunch of different APIs. You could probably run the whole thing on a calculator.

@SamSargent-kh7gl a month ago

it's all blocking IO so makes no difference

@mrpocock a month ago

@@SamSargent-kh7gl it makes no difference until it does. I had one of the llms cpu bound because it was preprocessing and tokenising the text on the cpu in python. If the python or js is only routing, then it is fine. As soon as it steps into a hot loop, it matters a lot.

@BizInNews a month ago

After listening to you, I started completely rethinking the system I am developing🎉

@jonatasscdc a month ago

I think I know the answer but did you make this to be opensource? If yes, where is it? Another question, does this work with openrouter API key? Also... I can't believe you have only 20k subs, your channel is so great that I swear, I wait the whole week for your contents, and when they arrive, a beautiful sensation of joy kicks in. Thanks for everything, bro! Huge fan here! Waiting to spend my monthly wage on your courses!

@indydevdan a month ago

Link in description and thank you 🙏. This is built on OpenAI tech so no openrouter access afaik. The engineers/builders that need to know about this channel will find it. AI Coding Course in progress. I'm working hard to make sure it earns you everything you spend on it back and more.

@goforit5 a month ago

Excellent video as usual. What other developers like you are out there? I’m learning a lot as a new developer from your projects. Thanks

@Vitruvian2086 a month ago

I love the breakdown and inner workings (under the hood) great job, very educational

@fackler 23 days ago

Great video Dan. It's crazy how we went from "never let it get on the internet" to "go ahead and write to the file system". Everybody about to have huge bite outta the tree of the knowledge of good and evil. Here's to hoping it's delicious!

@eddited7543 a month ago

Mate, I just wanted to start building EXACTLY this! As if you've read my mind^^ Thank you so much!

@DevPythonUnity a month ago

I have a better and way, way cheaper solution.

@braysher a month ago

As someone who’s dyslexic I’m soooooo excited by this. Looking forward to your next videos :)

@gr8tbigtreehugger a month ago

Really love your example and passion!! Amazing stuff! I have been building my own real time speech-to-speech system, all the STT and TTS is local, works really well. And, free!

@indydevdan a month ago

Care to share your stack? This is on my project hit list.

@GiomPanot a month ago

Excellent work, a year ago I was able to create coaches with voice it was a bit slow, but now with the ability to do tasks it is super. Got a couple of ideas with that. What you do is really inspiring thank you. If you could share a simple tuto with your code to play that would be awesome. (for dummies). I am not a dev but can do some python and run it locally. :)

@ivanvalentini9349 a month ago

This is the future. Really Nice Projects. BTW I really love your VS Code color schema, very relaxing. Does anyone know what it's called?

@stefanosantini9039 a month ago

This video is fantastic! The demo and your final talk is very stimulating, thanks a lot for sharing !!❤

@senju2024 a month ago

First time I've seen a video of yours. Not sure how you slipped under my AI radar as I thought I was at the forefront of AI. Liked and Subscribed.

@Mnogarithm 25 days ago

What are some of your other fav channels? Trying to get some content from those at the forefront as well.

@MsDarksloth a month ago

Thanks for this video and your POC project! really Epic stuff. I built a RAG prototype using Ollama and Qdrant and just updated your project to have a function call to get the related vectors from Qdrant and then have the advanced voice mode tell me about them and it works flawlessly... mind racing with all the ideas of how to integrate this into our products! Appreciate the effort to share this with the community 🔥

@zachisparanoid 12 days ago

absolutely blown away. fantastic work man.

@clarencejones4717 a month ago

I am just high as a kite.

@Iightbeing a month ago

On my way

@TravisChalmers a month ago

Ngl

@stefanm7058 a month ago

This is pure GOLD. Thanks and keep up the good work!

@SirajFlorida a month ago

I am so with you! This is exactly what we've been waiting for. I haven't been able to leave my computer for the last two days.

@rasmusfoy a month ago

Another Awesome video! Making sure to comment to get you algorithm points.

@NeuralDev a month ago

This is absolutely insane, all the use cases possible. I love it!!! It would be really interesting to use this type of assistant to supervise and correct other agents / AI tools for optimal results. Like using Ada to review code generated with Cursor + Claude, recommend improvements in real time, then having Claude execute. In my opinion we will quickly go from agents to swarms of agents for optimal results.

@jhnsntmthy a month ago

Using Ada to invoke Aider would be relatively simple to add now. We need to build in a way where you can define the path to projects on your system and then do just this. But Aider already has voice commanding built in, and you don't REALLY need the realtime sync nature of this to do what you want.

@ThatNerdChris a month ago

Add a command for ada to wait until you say "over" walkie talkie style and you can have time to pause and think when prompting?
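
The walkie-talkie idea above could be sketched roughly as follows. This is a minimal sketch, not code from Ada: it assumes the speech arrives as a list of transcript chunks, and the function name `collect_until_over` and the stop word handling are made up for illustration.

```python
def collect_until_over(chunks, stop_word="over"):
    """Buffer transcript chunks until the speaker ends a chunk with the stop word.

    Returns (prompt, finished). `finished` is False if the stop word never arrived,
    in which case the caller should keep listening instead of sending the prompt.
    """
    buffered = []
    for chunk in chunks:
        words = chunk.strip().split()
        # Only treat the stop word as a terminator when it is the last word spoken.
        if words and words[-1].strip(".,!?").lower() == stop_word:
            buffered.extend(words[:-1])
            return " ".join(buffered), True
        buffered.extend(words)
    return " ".join(buffered), False


print(collect_until_over(["open the", "browser please. Over."]))
```

Checking only the final word of a chunk means a sentence such as "start over from the top" does not end the prompt early.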

@BabbleBot-ps4fr a month ago

I was about to do the same thing with llama 3.2, so this is super amazing for me

@psychurch a month ago

Sweet thanks for sharing. What tool did you use to record your screen? The cursor highlight is spot on

@terminally_lazy a month ago

Excellent! Nice work! Realtime API costs add up, are you able to mitigate this somehow?

@AISlopForHumans a month ago

I am doing the same thing but they can be prohibitively expensive for anything more than a hobby project

@6lack5ushi a month ago

Do it yourself!!!! 15 $ for 10 mins is not scalable

@AI_Escaped a month ago

If you can have a 2 minute conversation to start a chain of autonomous agents that work for a few hours or days on a project for example, this will not be very expensive in the end. But paying to develop is the major hurdle here. You can't easily develop if you can't afford to tinker around and put the pieces together first.

@indydevdan a month ago

Thank you @terminally_lazy. No way around it. My wallet is getting DEEP FRIED. In exchange, we're pulling the future into the present and positioning ahead of the curve. Worth. Also, great call out by @AI_Escaped, this will save you and I hours after we establish great patterns. Lower prices are great but there's nothing more valuable than your time.

@6lack5ushi a month ago

By do it yourself I meant you can put together most of the live api without touching it and a lot of recursive 4o mini calls. 4o mini pricing is where products are built for mass consumption. Just make a compiler! That runs functions using natural language….

@RickeyBowers a month ago

This fine-grained level of control and fidelity is impressive. From a product perspective this should be engineered more vertically - agents to examine usage logs and recommend autonomy - eliminating redundancy. I'm imagining these goals are along your trajectory and it's interesting seeing it develop.

@93cutty a month ago

I've been waiting to see something on the new openai stuff from you. Gotta head into work and listen!

@bryanoakley-wiggins5885 a month ago

really good overview, and just tried your code - works great, and really whets the appetite. time to go exploring! thanks for sharing!

@saabirmohamed636 a month ago

Hi, did you see the groq xrx examples ? this could be made to use groq inference maybe

@ninjuhdelic a month ago

Man I wish I was this good. So grateful others are. Thanks for the sick demo

@acs2777 a month ago

Now combine this with the meta AI sunglasses and doing this with it and seeing the result while you are moving around to other places 😎

@DinoByteSize a month ago

Glasses or sunglasses? 🤪

@acs2777 a month ago

@@DinoByteSize haha

@crisgath3512 a month ago

Dan, this is the best Python implementation of this Realtime API I have seen yet, better than Azure's even. Thanks for this and I smashed that subscribe button. Legendary stuff.

@flyingbird3707 a month ago

I just tried it in my VS Code. It's only taking one request or prompt, and it's not opening whatever I ask for; it's just providing links. How can I approach this?

@JerryN88 a month ago

Hi, I'm not an engineer or developer. Just began my AI programming journey. I believe you are doing amazing things here that I don't see from other creators. Greatly appreciate your video. Is this code actually building ADA or just an implementation of the new Realtime API? Like I said, I'm new and the README isn't exactly clear to me.

@HeilmanCheman-s9m 14 days ago

Hey dan, can we create an ai agent to play slots machines and predict the outcome with precision?

@Alex_1729 a month ago

Incredible. Would you mind suggesting a good framework for developing and using agents? I'm just getting into all this so quite new to agentic AI. Looking at Crew AI and Langgraph

@LevDiken a month ago

Nice demo, Dan. Thoroughly enjoyed watching. We need RTAPI to come down in price about 300x then I think we see it embedded everywhere. I would have it run constantly for myself like an ambient buddy.

@coma13794 a month ago

Function calling, with a low latency speech to speech model that can determine intent is huge. Having this all running locally should be the long term direction, but this is an epic start. Nice work.

@jhnsntmthy a month ago

You COULD build this locally, with local Whisper (STT) and then another TTS option, but you are dealing with a certain high degree of latency. Not a big problem at all, and it will get solved. OpenAI's is just a bit ahead of the curve, and it is priced accordingly

@SpragginsDesigns a month ago

Can someone please help me understand how the tool calling works for any LLM? I see it in the Anthropic Docs, but it seems to work with any model, right? It's the only part of the AI API I don't understand yet well.
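
For the question above, the general tool-calling pattern can be sketched without any vendor SDK. This is a simplified illustration, not Anthropic's or OpenAI's actual API: the tool `get_current_time`, the `TOOL_SPECS` layout and the JSON shape of `model_output` are all assumptions made for the example. The idea is that the application describes its tools to the model, the model replies with a structured request instead of prose, and the application runs the matching function and sends the result back.

```python
import json


def get_current_time(timezone: str) -> str:
    # Hypothetical tool: a real one would look up the actual time.
    return f"12:00 in {timezone}"


# 1. The application describes each tool to the model (name, purpose, parameters).
TOOL_SPECS = [{
    "name": "get_current_time",
    "description": "Return the current time in a timezone.",
    "parameters": {
        "type": "object",
        "properties": {"timezone": {"type": "string"}},
        "required": ["timezone"],
    },
}]

# 2. The application keeps a lookup from tool name to the real function.
TOOL_REGISTRY = {"get_current_time": get_current_time}


def dispatch_tool_call(raw_call: str) -> str:
    """Run the tool the model asked for and return a JSON result to send back."""
    call = json.loads(raw_call)
    func = TOOL_REGISTRY.get(call["name"])
    if func is None:
        return json.dumps({"ok": False, "error": f"unknown tool: {call['name']}"})
    result = func(**call["arguments"])
    return json.dumps({"ok": True, "result": result})


# 3. Instead of answering in prose, the model emits a structured request like this.
model_output = '{"name": "get_current_time", "arguments": {"timezone": "UTC"}}'
print(dispatch_tool_call(model_output))
```

The model never executes anything itself; it only picks a tool name and arguments, which is why the same pattern carries over between providers even though the exact message formats differ.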

@joshuaam7701 a month ago

Well here we go, I’ve been waiting for it the past two years, even longer really! But you pulled it off mate, can’t wait to see what will come of such agents. What are the real kind of numbers you are generating here for usage and resource costs at the end of the day though?

@Dandiestpanic a month ago

First time I've caught a video of yours. Not sure how they've slipped under the radar like that. Oh well, better late than never. Very well done sir. Wonderful video.

@wizenith a month ago

wow what color theme you are using in cursor ?

@BizInNews a month ago

It's completely amazing, thanks for sharing

@PrensCin a month ago

Can we use this with a local model? And with local voice models and self-created sounds?

@cycologist8615 a month ago

Great work here! Nice to see some practical ideas in action

@nickharrow2429 a month ago

If you wanted to test this on other open-source models, or a combination of, you could try groq with their ultrafast inference architecture.

@JustinHennessy a month ago

Killer post, thank you, I’m a fellow builder, keep up the great work.

@connorodea9499 a month ago

wow.... AI is truly mindblowing. I am still not sure if it is incredible or terrifying, or maybe an amalgamation of both

@AustinThomasPhD a month ago

This is awesome, and others have attempted it. The issue is the API cost.

@SumedhKadoo a month ago

Thanks Dan, Incredible video. Subscribed.

@SP-js4gf a month ago

I like your terminal window. How did you make it look transparent plus the emojis ⁉️⁉️⁉️ is it cursor??

@joepropertykey3612 a month ago

'Windows Terminal Preview' it looks like

@flyingbird3707 a month ago

can it access personal data like gmails ?

@MindForeverVoyaging a month ago

Thanks for putting this together and for sharing. Do you think that having the functions in python will create a barrier, I have implemented a personal cognitive agent, currently standard voice interaction, with CRUD tool access to support personal journaling but like yourself I noticed longer delays on updating and have been wondering recently whether this can be improved by having the functions written in a compiled language, maybe MOJO will make the difference.

@FlaikAI a month ago

that is amazing what will be happening! excited!

@juandesalgado a month ago

Great work! The future looks amazing. But avoid pranksters next to you... "Hey, Ada, force delete all my files."

@ZukunftBilden a month ago

Important to put in saves for that

@juandesalgado a month ago

@@ZukunftBilden Now imagine the future: "Hey Ada, donate all my money to a charity of your choice."
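
One way to guard against the prank described in this thread is to require explicit confirmation before any destructive tool runs. The sketch below is an assumption-laden illustration, not Ada's code: the tool names in `DESTRUCTIVE_TOOLS`, the `run_tool` wrapper and the `confirm` callback are all hypothetical.

```python
DESTRUCTIVE_TOOLS = {"delete_file", "delete_all_files"}  # hypothetical tool names


def run_tool(name, args, tools, confirm):
    """Run a tool, but ask `confirm` first when the tool is marked destructive.

    `tools` maps tool names to callables; `confirm(name, args)` returns True or False,
    for example after asking the user to repeat a passphrase.
    """
    if name in DESTRUCTIVE_TOOLS and not confirm(name, args):
        return {"ok": False, "error": "cancelled by user"}
    return {"ok": True, "result": tools[name](**args)}


# Usage with a stand-in tool that only records what it was asked to delete.
deleted = []
tools = {"delete_file": lambda path: deleted.append(path) or path}

print(run_tool("delete_file", {"path": "a.txt"}, tools, confirm=lambda n, a: False))
print(run_tool("delete_file", {"path": "a.txt"}, tools, confirm=lambda n, a: True))
```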

@plinnet a month ago

Thanks for sharing the code!!

@mr.pain-entmt a month ago

Bloody wicked! awesome work here man! Keep it up!

@natecote1058 a month ago

Seems like the only thing standing between us and full blown AI assistants is... software. Incredible.

@jhnsntmthy a month ago

Soon that will be obsolete as well...

@ScottzPlaylists a month ago

👍 👍 Great Work, Subscribed 👍👍 It would be very interesting to see you build the Next Best Version of this, using all open source and compare ❗❗ ❗ ❗

@indydevdan a month ago

Thank you - glad to have you on the journey. Open Source is a LOT harder to make this performant but we'll definitely take a crack at this in Q4 or 2025.

@ScottzPlaylists a month ago

@@indydevdan 2025❗❗ ❗ You plan videos that far ahead❓❓ ❓ I'll still be watching though.. and programming more , instead of 90% learning mode.

@victorreppeto7050 11 days ago

My first use case is pretty big. I am building my own legal argument for a lawsuit. I am trying to avoid going to court without an attorney. Pretending to myself that I am going to do just that provides a frame of reference for building a presentation for attorneys who are interested in taking this case.

@justinduveen3815 a month ago

Impressive and creative!! Thanks for sharing!!

@andrewwalker8985 a month ago

Awesome work - loved it

@GospelProgressionsUniversity a month ago

I got contact high. This is nucking futs😮

@mauihi a month ago

Can you make a video on how you created this step by step?

@salahsalem4348 a month ago

Is it possible to use Blender or any program through voice commands only?

@danacarvey a month ago

How long do you think I'll need to wait til I can tell my computer to do my Houdini work?

@ronoc990 a month ago

Really cool video, anyone know how much he spent on tokens?

@kubasmide223 2 days ago

Dan you are the best!

@hskdjs a month ago

I tried the Realtime API and it cost me almost 2 dollars for 2 minutes (way more than the $0.06 - $0.24 per minute the official post says). Not going to use it any time soon until it becomes very cheap.
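
To put the figures in this comment side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted by the commenter (roughly 2 dollars for 2 minutes, against a quoted $0.06 to $0.24 per minute) and treats "almost 2 dollars" as exactly 2.00, so the ratios are approximate.

```python
def per_minute(total_usd: float, minutes: float) -> float:
    """Average cost per minute of a session."""
    return total_usd / minutes


observed = per_minute(2.0, 2.0)          # "almost 2 dollars for 2 minutes"
quoted_low, quoted_high = 0.06, 0.24     # per-minute range quoted in the comment

print(f"observed: about ${observed:.2f} per minute")
print(f"that is about {observed / quoted_high:.1f}x the top of the quoted range")
print(f"and about {observed / quoted_low:.1f}x the bottom of it")
```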

@aiplaygrounds a month ago

Bro that was quick. Great work ❤

@abteenz 18 days ago

Can this be done with a local model?

@vastvitamins1966 a month ago

Amazing project thanks for sharing

@IslandDave007 a month ago

Amazing! Now what if we build out an extensive list of stock technical analysis and plotting functions in python using yfinance for your agent to use, and then combined with your file functions and current date, you could direct it to perform all kinds of stock research tasks and save those outputs for future comparison, etc. 💲📈📊

@LibertyRecordsFree a month ago

Love it! Just great! Want to work with that

@raynangle1 a month ago

Brilliant......thank you....

@uhtexercises a month ago

Yessss. He's done it again!

@michelwesly a month ago

Thanks for another excellent video!

@piemasta93 28 days ago

Wait, how can you run this locally? Wouldn't that mean you don't need any type of internet access? How would that work?

@bojames7841 a month ago

This is amazing 🎉

@frankieownshell4052 a month ago

Finally something interesting ty for good video!

@radekrousek4688 a month ago

thx for changin my life mate, appreciate :))

@iGuide_net a month ago

mind blown😮

@jds859 a month ago

Is this a huge jump? It’s same as we have been doing but using verbal? So just super sloppy.

@AI_Escaped a month ago

Async threading would be a beast. The only problem is you still have to confirm the tool was successful. If it wasn't, that can mess a lot of shit up for other operations that depend on what happens while those tools run. The solution, I guess, is structured output and making sure tools don't have errors, at least from anything they can control.

@KCM25NJL a month ago

Async operations would be fine as long as you know which workflows you can use them with in an open-ended manner. Even if you can't quite imagine which ones can, just ask o1 mini to help you brainstorm it. Structured output will likely always be a necessity. What I would really like to see, however, is a library of open-source and standardised function calls that can be included in your project as both a RAG solution to assist LLMs when building out new apps, and an import for making the function calls available to those new apps.

@AI_Escaped a month ago

@@KCM25NJL Agreed Async would be fine in some cases that are open ended. I'll have to ask o1. I would love to see a standardized open source library and I'm sure we'll get there eventually if AI doesn't make a library irrelevant by that time. I would assume at some point, everything will be done dynamically. Or maybe dynamically for a time until all the most efficient methods are cached, then it's pretty much a standard library anyway :) It's crazy to think about.
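
The concern in this thread (running tools concurrently while still confirming each one succeeded) could be approached along these lines. This is a minimal sketch using Python's standard `asyncio`, not code from the project: the tools `write_file` and `failing_tool` and the result dictionary shape are hypothetical.

```python
import asyncio


async def run_tool(name, coro_func, *args):
    """Run one tool and always return a structured result, even on failure."""
    try:
        return {"tool": name, "ok": True, "result": await coro_func(*args)}
    except Exception as exc:
        return {"tool": name, "ok": False, "error": str(exc)}


async def write_file(path):      # hypothetical tool that succeeds
    await asyncio.sleep(0)
    return f"wrote {path}"


async def failing_tool():        # hypothetical tool that always fails
    raise RuntimeError("disk full")


async def main():
    # Both tools run concurrently; gather returns results in the order given.
    results = await asyncio.gather(
        run_tool("write_file", write_file, "notes.txt"),
        run_tool("failing_tool", failing_tool),
    )
    failed = [r["tool"] for r in results if not r["ok"]]
    return results, failed


results, failed = asyncio.run(main())
print(failed)
```

Because every tool reports `ok` explicitly, a dependent step can check the list of failures before it starts rather than discovering the problem halfway through.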

@JariVasell a month ago

Wow! Superb video! 💪🏻

@ameet2000 a month ago

Amazing work, thx for sharing

@gregsLyrics a month ago

WOW! IDD, you rock.

@ShankatsuForte a month ago

Would a loop that runs every minute, and provides a simple rng chance, to every so often prompt o1-mini to "have a random thought related to this conversation, then have the voice model verbalize" work the way I think it would? I imagine it could give a certain spark at the cost of some token burn
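
The scheduling half of this idea is easy to sketch; whether the resulting "random thoughts" feel natural is a separate question. The snippet below only decides on which one-minute ticks a thought would be requested. The function name `pick_thought_ticks` and the 10% chance are assumptions, and the call to the model is left out entirely.

```python
import random


def pick_thought_ticks(ticks: int, chance: float, rng: random.Random) -> list:
    """Return the tick indexes (e.g. minutes) on which a random thought would fire."""
    return [t for t in range(ticks) if rng.random() < chance]


# Over one simulated hour, roll a 10% chance once per minute.
fired = pick_thought_ticks(60, 0.10, random.Random())
print(f"would prompt the model on {len(fired)} of 60 minutes")
```

Token burn scales with the chance: at 10% per minute the expected number of extra prompts is about six per hour.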

@AILiteracy-f1r a month ago

Idea: LLMs sometimes struggle on simple tasks that coding has already solved. Counting the Rs in strawberry, for example, can be done by the AI creating code for it, rather than having to run that question through its own banks. High-level LLMs >control> low-level LLMs/neural networks >control> non-AI scripts. Most tasks would filter down and up this chain, possibly multiple times per prompt.

@r.m8146 a month ago

o1 can count letters. not a problem anymore

@AILiteracy-f1r a month ago

@@r.m8146 Yes but it costs a lot of energy to do that task using o1 when a simple script would do. It's like using a tank to open a can of beans.
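
The "simple script" in this exchange really is short. As an illustration (the function name `count_letter` is made up here), this is the kind of non-AI code a model could call as a tool instead of reasoning its way to the answer:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```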