Dan, I'm a 47-year-old software engineer working at Microsoft. I often think about all of the next-gen devs coming up in a world where not having access to sick gen AI tools like this is unheard of. I find it amusing when muscle memory has me white-knuckling it in a search engine to get answers when the default approach should have been an assistant or an LLM. Anyway, this is just a big thanks for putting out solid content that has inspired an old dog to keep focusing on new tricks. Don't change, brother!!
@radekrousek4688 a month ago
That's the same as at school, when they don't want you to use a calculator for tasks that are too simple.
@AllenHorn0507 a month ago
Oh my God, I can’t believe this. I am a blind professional, and this would help me so much!
@requestfx5585 a month ago
Bro is so professional, he could write a comment without having to see. This technology is awesome, but it's just a combination of what has already been possible, put together professionally by OpenAI. Combining voice models with powerful LLMs and function calling: nothing is new here. The only new thing is that it has been done so well and so fast.
@AllenHorn0507 a month ago
@@requestfx5585 Do you think that’s funny? Why would you make such a shitty comment?
@AllenHorn0507 a month ago
@@requestfx5585 I realize that I only represent 3% of the world’s population, but I am still a person with feelings.
@AllenHorn0507 a month ago
@@requestfx5585 Why would you say something so nasty? Do you think it’s funny that I’m blind?
@AllenHorn0507 a month ago
@@requestfx5585 I’m not sure why my comments keep getting deleted, but your fucking comment was disgusting and not funny, asshole.
@user-pt1kj5uw3b a month ago
It's cool seeing someone who really gets where these things are going and also what you can do with them right now. I genuinely think there will be people within the next 5 years who will have superintelligent AI directly accessible in their brain, at least if everything goes right. Which I feel insane for typing, but it truly doesn't seem impossible. Thanks for releasing this too.
@mrd6869 a month ago
I'm doing something similar, but building AI-powered cybersecurity applications instead. And you're right, this whole thing is taking off. We laughed at the Star Trek scene where Scotty, the engineer, tried to speak to the computer to build software. They thought he was crazy... he wasn't crazy💯
@faturismee a month ago
Hi! I'm interested in cybersecurity with AI. Can you explain more about your project? :)
@mrd6869 a month ago
@@faturismee Red Team exercises. AI systems being weaponized for cyberwarfare. This will be a thing VERY soon and it's not gonna be pretty lmao
@d.s.ramirez6178 a month ago
Just wanted to say I'm so impressed by your efforts here. I'm reluctant to chime in with a comment because I'm not a coder. I'm an artist, but my second identity is as a nerd. I'm completely devoted to science and people of the highest intellect producing the innovations of the future. I just never learned how to code. As an art project, I'm trying to create the foundational characteristics for an AI which will have a compassionate, ethical personality. The concept is much deeper than this and I've been developing it for 12 years on the theoretical level. Seeing a video like this makes me wish I was surrounded by Silicon Valley people in the hopes that I would find one talented person who could help me bridge the gap of the technology. But I'm holding out hope that it could still happen if I pursue it. Anyway, I just wanted to say that I can tell that this is the leading edge of the AI landscape. This makes it worth me sifting through all the cheesy hype videos in order to find it. 💯
@kennethbeal a month ago
I have a friend you might want to collaborate with. I'll send him your comment.
@jimsandwick2372 a month ago
Thank you. I totally agree with you. I felt that after Whisper came along, speech-to-text, and now speech-to-speech, combined with these new tools, future models and reduced costs would be such a game changer. I am still surprised it's rarely mentioned, given how it will supercharge productivity and fundamentally change the way we code and interact with our devices. I think once it's combined with all of your data (and external data, of course) in an intelligent way, the creative process will be mind-blowing compared to how we work now. Keep the videos up. Thanks
@gabrielketzer7084 a month ago
It’s truly great that you release the code for this for the masses
@JazevoAudiosurf a month ago
The issue currently is that you have to program the calls etc. If it would just write the code for them on the fly, it would be much more automated. This is generally the issue: we don't want to hardcode anything in the future; it should know what to do. Also, as well as it works, it's still probabilistic. We need some sort of classification model, one that is much less probabilistic, to check the answers/outputs for correctness.
@georgestander2682 a month ago
Given the user's request, it should create the functions, tools and even UI... that's what Karpathy is on about right now.
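One way to read the "less probabilistic check" idea in this thread is a plain deterministic validator that inspects a model-proposed tool call before anything runs. This is only a sketch under assumptions: the tool names, the `TOOL_SCHEMAS` table and the JSON shape (`name` plus `arguments`) are hypothetical and not taken from the video's code.

```python
import json

# Hypothetical allow-list: tool name -> required argument names and types.
TOOL_SCHEMAS = {
    "open_browser": {"url": str},
    "create_file": {"path": str, "content": str},
}

def validate_tool_call(raw: str) -> tuple[bool, str]:
    """Deterministically check a model-proposed tool call given as JSON text."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return False, "call must be a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        return False, f"unknown tool: {name!r}"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False, "arguments must be an object"
    schema = TOOL_SCHEMAS[name]
    for key, expected in schema.items():
        if key not in args:
            return False, f"missing argument: {key}"
        if not isinstance(args[key], expected):
            return False, f"argument {key} must be {expected.__name__}"
    extra = set(args) - set(schema)
    if extra:
        return False, f"unexpected arguments: {sorted(extra)}"
    return True, "ok"
```

A check like this cannot say whether the call is the right one for the user's intent; it only guarantees that whatever gets executed matches a shape the developer approved.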
@lechugathedoood3595 a month ago
I’m new to coding, but this gave me such a great idea of how I can combine this with my ceramics hobby. Before this video, AI was a hot/cold thing for me, but now I see how I can do something cool with it! Thanks
@eintyp4389 a month ago
What's your experience with a more open-ended toolbox for the agent? Like having a database in Supabase with functions and agent workflows that can be semantically searched and used. This way you don't have to provide a long list of available tools to the agent, and adding new tools or workflows, or even letting agents create, test and then add them for reuse, would be easier. Like what they did with that Minecraft agent; Voyager was its name? Or does this fail, and if so, where and why?
@indydevdan a month ago
This is a great pattern BUT I've been steering clear of investing time in tool selection systems because I view it as directly competing with OpenAI's advancements. In order to build useful AI Agents (which is a target of OpenAI. I think we'll see this a lot more in 2025) you need reliable tool calling, at scale. We saw them take a great stab at this with structured outputs + tool calling. So although I like your approach I'm holding off on systematizing tool selection until it's clear that OpenAI, Anthropic, Google (big 3) won't invest here.
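For readers wondering what the searchable toolbox discussed in this thread might look like, here is a minimal sketch. It swaps in simple word overlap (Jaccard similarity) for real semantic search, and keeps the registry in memory instead of a database; `Tool`, `REGISTRY` and `search_tools` are hypothetical names, and nothing here reflects how the video's project selects tools.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    description: str

# Hypothetical registry; in the commenter's idea this would live in a database.
REGISTRY = [
    Tool("open_browser", "open a url in the web browser"),
    Tool("create_file", "create a new file with generated content"),
    Tool("delete_file", "delete a file from disk"),
    Tool("run_sql", "run a sql query against the database"),
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())

def search_tools(query: str, k: int = 2) -> list[Tool]:
    """Rank tools by word overlap (Jaccard) with the query; drop zero scores."""
    q = tokenize(query)
    scored = []
    for tool in REGISTRY:
        words = tokenize(tool.description)
        union = q | words
        score = len(q & words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:k]]

if __name__ == "__main__":
    # Only the top-k matches would be shown to the model as available tools.
    print([tool.name for tool in search_tools("delete a file")])
```

The point of the pattern is that the agent only sees the few tools that match the request, so the full list never has to fit in the prompt. Replacing the overlap score with embedding similarity would be the natural next step.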
@User-actSpacing a month ago
Python is slow AF and still this demo worked extremely well! What a time to be alive!!
@mrpocock a month ago
I think python or js is fine for this sort of thing where you're mainly plumbing services and compiled libraries. Particularly now that they have async built in. But I am tempted to rewrite the agent router and configuration in rust ;)
@Billy4321able a month ago
The vast majority of the processing is being done in the cloud. He's just piping in to a bunch of different APIs. You could probably run the whole thing on a calculator.
@SamSargent-kh7gl a month ago
it's all blocking IO so makes no difference
@mrpocock a month ago
@@SamSargent-kh7gl It makes no difference until it does. I had one of the LLMs CPU-bound because it was preprocessing and tokenising the text on the CPU in Python. If the Python or JS is only routing, then it is fine. As soon as it steps into a hot loop, it matters a lot.
@BizInNews a month ago
After listening to you, I started completely rethinking the system I am developing🎉
@jonatasscdc a month ago
I think I know the answer, but did you make this open source? If yes, where is it? Another question: does this work with an OpenRouter API key? Also... I can't believe you have only 20k subs. Your channel is so great that, I swear, I wait the whole week for your content, and when it arrives, a beautiful sensation of joy kicks in. Thanks for everything, bro! Huge fan here! Waiting to spend my monthly wage on your courses!
@indydevdan a month ago
Link in description and thank you 🙏. This is built on OpenAI tech so no openrouter access afaik. The engineers/builders that need to know about this channel will find it. AI Coding Course in progress. I'm working hard to make sure it earns you everything you spend on it back and more.
@goforit5 a month ago
Excellent video as usual. What other developers like you are out there? I’m learning a lot as a new developer from your projects. Thanks
@Vitruvian2086 a month ago
I love the breakdown and inner workings (under the hood) great job, very educational
@fackler23 days ago
Great video Dan. It's crazy how we went from "never let it get on the internet" to "go ahead and write to the file system". Everybody's about to have a huge bite outta the tree of the knowledge of good and evil. Here's to hoping it's delicious!
@eddited7543 a month ago
Mate, I just wanted to start building EXACTLY this! As if you've read my mind^^ Thank you so much!
@DevPythonUnity a month ago
I have a better and way, way cheaper solution.
@braysher a month ago
As someone who’s dyslexic I’m soooooo excited by this. Looking forward to your next videos :)
@gr8tbigtreehugger a month ago
Really love your example and passion!! Amazing stuff! I have been building my own real time speech-to-speech system, all the STT and TTS is local, works really well. And, free!
@indydevdan a month ago
Care to share your stack? This is on my project hit list.
@GiomPanot a month ago
Excellent work. A year ago I was able to create coaches with voice; it was a bit slow, but now, with the ability to do tasks, it is super. Got a couple of ideas with that. What you do is really inspiring, thank you. If you could share a simple tutorial (for dummies) with your code to play with, that would be awesome. I am not a dev but can do some Python and run it locally. :)
@ivanvalentini9349 a month ago
This is the future. Really nice projects. BTW I really love your VS Code color scheme, very relaxing. Does anyone know what it's called?
@stefanosantini9039 a month ago
This video is fantastic! The demo and your final talk are very stimulating, thanks a lot for sharing!! ❤
@senju2024 a month ago
First time I've seen a video of yours. Not sure how you slipped under my AI radar as I thought I was at the forefront of AI. Liked and Subscribed.
@Mnogarithm25 days ago
What are some of your other fav channels? Trying to get some content from those at the forefront as well.
@MsDarksloth a month ago
Thanks for this video and your POC project! really Epic stuff. I built a RAG prototype using Ollama and Qdrant and just updated your project to have a function call to get the related vectors from Qdrant and then have the advanced voice mode tell me about them and it works flawlessly... mind racing with all the ideas of how to integrate this into our products! Appreciate the effort to share this with the community 🔥
@zachisparanoid12 days ago
absolutely blown away. fantastic work man.
@clarencejones4717 a month ago
I am just high as a kite.
@Iightbeing a month ago
On my way
@TravisChalmers a month ago
Ngl
@stefanm7058 a month ago
This is pure GOLD. Thanks and keep up the good work!
@SirajFlorida a month ago
I am so with you! This is exactly what we've been waiting for. I haven't been able to leave my computer for the last two days.
@rasmusfoy a month ago
Another Awesome video! Making sure to comment to get you algorithm points.
@NeuralDev a month ago
This is absolutely insane, all the possible use cases. I love it!!! It would be really interesting to use this type of assistant to supervise and correct other agents / AI tools for optimal results, like using Ada to review code generated with Cursor + Claude, recommend improvements in real time, then have Claude execute. In my opinion we will quickly go from agents to swarms of agents for optimal results.
@jhnsntmthy a month ago
Using Ada to invoke Aider would be relatively simple to add now. We need to build in a way to define the path to projects on your system and then do just this. But Aider already has voice commands built in, and you don't REALLY need the realtime sync nature of this to do what you want.
@ThatNerdChris a month ago
Add a command for ada to wait until you say "over" walkie talkie style and you can have time to pause and think when prompting?
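The walkie-talkie idea above could be prototyped as a small gate that buffers transcribed speech and only releases the prompt once the speaker ends with "over". A rough sketch, assuming transcript chunks arrive as plain strings; `OverGate` and `feed` are hypothetical names, not part of the video's code.

```python
import re
from typing import Optional

class OverGate:
    """Buffer transcript chunks until the speaker ends a turn with 'over'."""

    # Matches a trailing "over" with optional punctuation, case-insensitive.
    _END = re.compile(r"\bover[\s.!?,]*$", re.IGNORECASE)

    def __init__(self) -> None:
        self._chunks: list[str] = []

    def feed(self, chunk: str) -> Optional[str]:
        """Add a chunk; return the full prompt once 'over' is heard, else None."""
        self._chunks.append(chunk.strip())
        joined = " ".join(c for c in self._chunks if c)
        match = self._END.search(joined)
        if match is None:
            return None
        self._chunks.clear()
        # Drop the keyword itself and any separator left in front of it.
        return joined[: match.start()].rstrip(" ,.")

if __name__ == "__main__":
    gate = OverGate()
    print(gate.feed("Open the browser"))           # still waiting, so None
    print(gate.feed("and go to the docs, over."))  # full prompt is released
```

One known weakness of this sketch: a sentence that naturally ends in "over" (for example "start over") also triggers the gate, so a rarer keyword or a short pause check might work better in practice.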
@BabbleBot-ps4fr a month ago
I was about to do the same thing with llama 3.2, so this is super amazing for me
@psychurch a month ago
Sweet thanks for sharing. What tool did you use to record your screen? The cursor highlight is spot on
@terminally_lazy a month ago
Excellent! Nice work! Realtime API costs add up, are you able to mitigate this somehow?
@AISlopForHumans a month ago
I am doing the same thing, but it can be prohibitively expensive for anything more than a hobby project.
@6lack5ushi a month ago
Do it yourself!!!! $15 for 10 mins is not scalable
@AI_Escaped a month ago
If you can have a 2 minute conversation to start a chain of autonomous agents that work for a few hours or days on a project for example, this will not be very expensive in the end. But paying to develop is the major hurdle here. You can't easily develop if you can't afford to tinker around and put the pieces together first.
@indydevdan a month ago
Thank you @terminally_lazy. No way around it. My wallet is getting DEEP FRIED. In exchange, we're pulling the future into the present and positioning ahead of the curve. Worth. Also, great callout by @AI_Escaped: this will save you and me hours after we establish great patterns. Lower prices are great, but there's nothing more valuable than your time.
@6lack5ushi a month ago
By "do it yourself" I meant you can put together most of what the live API does without touching it, using a lot of recursive 4o mini calls. 4o mini pricing is where products are built for mass consumption. Just make a compiler that runs functions using natural language...
@RickeyBowers a month ago
This fine-grained level of control and fidelity is impressive. From a product perspective this should be engineered more vertically - agents to examine usage logs and recommend autonomy - eliminating redundancy. I'm imagining these goals are along your trajectory and it's interesting seeing it develop.
@93cutty a month ago
I've been waiting to see something on the new openai stuff from you. Gotta head into work and listen!
@bryanoakley-wiggins5885 a month ago
really good overview, and just tried your code - works great, and really whets the appetite. time to go exploring! thanks for sharing!
@saabirmohamed636 a month ago
Hi, did you see the groq xrx examples? Maybe this could be made to use groq inference.
@ninjuhdelic a month ago
Man, I wish I was this good. So grateful others are. Thanks for the sick demo
@acs2777 a month ago
Now combine this with the meta AI sunglasses and doing this with it and seeing the result while you are moving around to other places 😎
@DinoByteSize a month ago
Glasses or sunglasses? 🤪
@acs2777 a month ago
@@DinoByteSize haha
@crisgath3512 a month ago
Dan, this is the best Python implementation of this Realtime API I have seen yet, better than Azure's even. Thanks for this and I smashed that subscribe button. Legendary stuff.
@flyingbird3707 a month ago
I just tried it in my VS Code. It only takes one request or prompt, and it doesn't open whatever I ask for; it just provides links. How can I approach this?
@JerryN88 a month ago
Hi, I'm not an engineer or developer. Just began my AI programming journey. I believe you are doing amazing things here that I don't see from other creators. Greatly appreciate your video. Is this code actually building Ada, or is it just an implementation of the new Realtime API? Like I said, I'm new and the README isn't exactly clear to me.
@HeilmanCheman-s9m14 days ago
Hey Dan, can we create an AI agent to play slot machines and predict the outcome with precision?
@Alex_1729 a month ago
Incredible. Would you mind suggesting a good framework for developing and using agents? I'm just getting into all this so quite new to agentic AI. Looking at Crew AI and Langgraph
@LevDiken a month ago
Nice demo, Dan. Thoroughly enjoyed watching. We need RTAPI to come down in price about 300x then I think we see it embedded everywhere. I would have it run constantly for myself like an ambient buddy.
@coma13794 a month ago
Function calling, with a low latency speech to speech model that can determine intent is huge. Having this all running locally should be the long term direction, but this is an epic start. Nice work.
@jhnsntmthy a month ago
You COULD build this locally, with local Whisper (STT) and then another TTS option, but you are dealing with a certain high degree of latency. Not a big problem at all, and it will get solved. OpenAI's is just a bit ahead of the curve, and it is priced accordingly
@SpragginsDesigns a month ago
Can someone please help me understand how tool calling works for any LLM? I see it in the Anthropic docs, but it seems to work with any model, right? It's the only part of the AI API I don't understand well yet.
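For anyone with the same question, the general shape of tool calling can be sketched without any vendor SDK. The app describes its functions to the model, the model answers with a structured request (a function name plus JSON arguments), and the app itself runs the function and passes the result back. In the sketch below the model is replaced by a stub, and the field names (`tool_call`, `arguments`) are illustrative assumptions rather than any provider's exact wire format.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"

# 1. The app describes its tools to the model (name, purpose, JSON-schema args).
TOOLS = {
    "get_weather": {
        "function": get_weather,
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

def fake_model(user_text: str, tool_specs: list) -> dict:
    """Stand-in for an LLM: it never runs code, it only asks for a call."""
    return {"tool_call": {"name": "get_weather",
                          "arguments": json.dumps({"city": "Paris"})}}

def run_turn(user_text: str) -> str:
    specs = [{"name": name, "description": tool["description"],
              "parameters": tool["parameters"]} for name, tool in TOOLS.items()]
    # 2. The model replies with a structured request instead of plain text.
    reply = fake_model(user_text, specs)
    call = reply["tool_call"]
    # 3. The app, not the model, executes the matching function.
    args = json.loads(call["arguments"])
    result = TOOLS[call["name"]]["function"](**args)
    # 4. In a real loop the result goes back to the model for a final answer.
    return result

if __name__ == "__main__":
    print(run_turn("What's the weather in Paris?"))
```

This is why the pattern feels model-agnostic: any model that can reliably emit a name and JSON arguments can drive the same dispatch loop, and the differences between providers are mostly in how the request and result messages are formatted.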
@joshuaam7701 a month ago
Well, here we go. I’ve been waiting for this for the past two years, even longer really! But you pulled it off, mate. Can’t wait to see what will come of such agents. What are the real kind of numbers you are generating here for usage and resource costs at the end of the day, though?
@Dandiestpanic a month ago
First time I've caught a video of yours. Not sure how they've slipped under the radar like that. Oh well, better late than never. Very well done sir. Wonderful video.
@wizenith a month ago
Wow, what color theme are you using in Cursor?
@BizInNews a month ago
It's completely amazing, thanks for sharing
@PrensCin a month ago
Can we use this with a local model, and with local voice models that generate the audio themselves?
@cycologist8615 a month ago
Great work here! Nice to see some practical ideas in action
@nickharrow2429 a month ago
If you wanted to test this on other open-source models, or a combination of them, you could try groq with their ultrafast inference architecture.
@JustinHennessy a month ago
Killer post, thank you, I’m a fellow builder, keep up the great work.
@connorodea9499 a month ago
wow.... AI is truly mindblowing. I am still not sure if it is incredible or terrifying, or maybe an amalgamation of both
@AustinThomasPhD a month ago
This is awesome, and others have attempted it. The issue is the API cost.
@SumedhKadoo a month ago
Thanks Dan, Incredible video. Subscribed.
@SP-js4gf a month ago
I like your terminal window. How did you make it look transparent, plus the emojis ⁉️⁉️⁉️ Is it Cursor??
@joepropertykey3612 a month ago
'Windows Terminal Preview' it looks like
@flyingbird3707 a month ago
Can it access personal data like Gmail?
@MindForeverVoyaging a month ago
Thanks for putting this together and for sharing. Do you think that having the functions in Python will create a barrier? I have implemented a personal cognitive agent, currently with standard voice interaction, with CRUD tool access to support personal journaling. Like you, I noticed longer delays on updating, and I have been wondering recently whether this can be improved by having the functions written in a compiled language; maybe Mojo will make the difference.
@FlaikAI a month ago
It's amazing what will be happening! Excited!
@juandesalgado a month ago
Great work! The future looks amazing. But avoid pranksters next to you... "Hey, Ada, force delete all my files."
@ZukunftBilden a month ago
Important to put in safeguards for that
@juandesalgado a month ago
@@ZukunftBilden Now imagine the future: "Hey Ada, donate all my money to a charity of your choice."
@plinnet a month ago
Thanks for sharing the code!!
@mr.pain-entmt a month ago
Bloody wicked! awesome work here man! Keep it up!
@natecote1058 a month ago
Seems like the only thing standing between us and full blown AI assistants is... software. Incredible.
@jhnsntmthy a month ago
Soon that will be obsolete as well...
@ScottzPlaylists a month ago
👍 👍 Great Work, Subscribed 👍👍 It would be very interesting to see you build the Next Best Version of this, using all open source and compare ❗❗ ❗ ❗
@indydevdan a month ago
Thank you - glad to have you on the journey. Open Source is a LOT harder to make this performant but we'll definitely take a crack at this in Q4 or 2025.
@ScottzPlaylists a month ago
@@indydevdan 2025❗❗ ❗ You plan videos that far ahead❓❓ ❓ I'll still be watching though.. and programming more , instead of 90% learning mode.
@victorreppeto705011 days ago
My first use case is pretty big. I am building my own legal argument for a lawsuit. I am trying to avoid going to court without an attorney. Pretending to myself that I am going to do just that provides a frame of reference for building a presentation for attorneys who are interested in taking this case.
@justinduveen3815 a month ago
Impressive and creative!! Thanks for sharing!!
@andrewwalker8985 a month ago
Awesome work - loved it
@GospelProgressionsUniversity a month ago
I got contact high. This is nucking futs😮
@mauihi a month ago
Can you make a video on how you created this step by step?
@salahsalem4348 a month ago
Is it possible to use Blender or any program through voice commands only?
@danacarvey a month ago
How long do you think I'll need to wait til I can tell my computer to do my Houdini work?
@ronoc990 a month ago
Really cool video, anyone know how much he spent on tokens?
@kubasmide2232 days ago
Dan you are the best!
@hskdjs a month ago
I tried the Realtime API and it cost me almost 2 dollars for 2 minutes (way more than the $0.06 - $0.24 per minute the official post says). Not going to use it any time soon, until it becomes very cheap.
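To put the gap in that comment into numbers, here is the arithmetic as a tiny script. It treats "almost 2 dollars" as exactly $2.00, which is an assumption, so the ratios are rough upper estimates; the $0.06 to $0.24 range is the figure quoted in the comment itself.

```python
def per_minute(total_usd: float, minutes: float) -> float:
    """Average cost per minute of conversation."""
    return total_usd / minutes

# Assumption: "almost 2 dollars for 2 minutes" rounded up to $2.00.
observed = per_minute(2.00, 2.0)

# Range quoted in the comment as the official per-minute pricing.
quoted_low, quoted_high = 0.06, 0.24

ratio_vs_high = observed / quoted_high  # how far above the top of the range
ratio_vs_low = observed / quoted_low    # how far above the bottom of the range

if __name__ == "__main__":
    print(f"observed: ${observed:.2f}/min")
    print(f"{ratio_vs_high:.1f}x the high end, {ratio_vs_low:.1f}x the low end")
```

So the reported bill works out to about $1 per minute, roughly 4 times the top of the quoted range and roughly 17 times the bottom.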
@aiplaygrounds a month ago
Bro that was quick. Great work ❤
@abteenz18 days ago
Can this be done with a local model?
@vastvitamins1966 a month ago
Amazing project thanks for sharing
@IslandDave007 a month ago
Amazing! Now what if we build out an extensive list of stock technical analysis and plotting functions in python using yfinance for your agent to use, and then combined with your file functions and current date, you could direct it to perform all kinds of stock research tasks and save those outputs for future comparison, etc. 💲📈📊
@LibertyRecordsFree a month ago
Love it! Just great! Want to work with that
@raynangle1 a month ago
Brilliant......thank you....
@uhtexercises a month ago
Yessss. He's done it again!
@michelwesly a month ago
Thanks for another excellent video!
@piemasta9328 days ago
Wait, how can you run this locally? Wouldn't that mean you don't need any type of internet access? How would that work?
@bojames7841 a month ago
This is amazing 🎉
@frankieownshell4052 a month ago
Finally something interesting ty for good video!
@radekrousek4688 a month ago
thx for changin my life mate, appreciate :))
@iGuide_net a month ago
mind blown😮
@jds859 a month ago
Is this a huge jump? It’s the same as what we have been doing, but verbal? So just super sloppy.
@AI_Escaped a month ago
Async threading would be a beast. The only problem is you still have to confirm the tool was successful; if it's not, that can mess a lot of shit up for other operations that depend on what happens while those tools run. The solution, I guess, is structured output and making sure tools don't have errors, at least from anything they can control.
@KCM25NJL a month ago
Async operations would be fine as long as you know which workflows you can use them with in an open-ended manner. Even if you can't quite imagine which ones can, just ask o1-mini to help you brainstorm it. Structured output will likely always be a necessity. What I would really like to see, however, is a library of open-source, standardised function calls that can be included in your project both as a RAG solution to assist LLMs when building out new apps and as an import that makes the function calls available to those new apps.
@AI_Escaped a month ago
@@KCM25NJL Agreed, async would be fine in some cases that are open-ended. I'll have to ask o1. I would love to see a standardized open-source library, and I'm sure we'll get there eventually, if AI doesn't make a library irrelevant by that time. I would assume at some point everything will be done dynamically. Or maybe dynamically for a time, until all the most efficient methods are cached; then it's pretty much a standard library anyway :) It's crazy to think about.
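The concern in this thread, running tools concurrently while still confirming each one succeeded, can be sketched with plain `asyncio`. Each tool is wrapped so that an exception becomes a structured failure record instead of crashing the batch. The tool functions and the `ToolResult` type are hypothetical stand-ins, not the video project's code.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ToolResult:
    name: str
    ok: bool
    detail: str

async def run_tool(name: str, coro) -> ToolResult:
    """Await one tool and convert any exception into a structured failure."""
    try:
        value = await coro
        return ToolResult(name, True, str(value))
    except Exception as exc:
        return ToolResult(name, False, f"{type(exc).__name__}: {exc}")

async def write_file() -> str:
    """Hypothetical tool that succeeds."""
    await asyncio.sleep(0.01)
    return "wrote notes.txt"

async def open_page() -> str:
    """Hypothetical tool that fails."""
    await asyncio.sleep(0.01)
    raise TimeoutError("browser did not respond")

async def run_all() -> list[ToolResult]:
    # Tools run concurrently; results come back in the order they were passed.
    return await asyncio.gather(
        run_tool("write_file", write_file()),
        run_tool("open_page", open_page()),
    )

if __name__ == "__main__":
    for result in asyncio.run(run_all()):
        print(result)
```

Any step that depends on a tool can then check `ok` before continuing, which addresses the "confirm the tool was successful" worry without giving up concurrency.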
@JariVasell a month ago
Wow! Superb video! 💪🏻
@ameet2000 a month ago
Amazing work, thx for sharing
@gregsLyrics a month ago
WOW! IDD, you rock.
@ShankatsuForte a month ago
Would a loop that runs every minute, and provides a simple rng chance, to every so often prompt o1-mini to "have a random thought related to this conversation, then have the voice model verbalize" work the way I think it would? I imagine it could give a certain spark at the cost of some token burn
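The loop described above is easy to reason about before spending any tokens. The sketch below only models the decision of whether to interject on each tick; the actual sleep between ticks and the calls to a text model and a voice model are left out, and all function names are hypothetical.

```python
import random

def should_interject(rng: random.Random, chance: float) -> bool:
    """Return True with probability `chance` on a single tick."""
    return rng.random() < chance

def simulate(ticks: int, chance: float, seed: int = 0) -> int:
    """Count how many ticks would trigger a 'random thought'."""
    rng = random.Random(seed)
    return sum(should_interject(rng, chance) for _ in range(ticks))

def expected_per_hour(chance: float, interval_s: float = 60.0) -> float:
    """Average number of interjections per hour for a given tick interval."""
    return 3600.0 / interval_s * chance

if __name__ == "__main__":
    # One tick per minute with a 10% chance: about 6 paid model calls an hour.
    print(expected_per_hour(0.10))
    print(simulate(ticks=60, chance=0.10))
```

The expected count is what drives the token burn mentioned in the comment: halving the chance or doubling the interval halves the average number of paid calls.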
@AILiteracy-f1r a month ago
Idea: LLMs sometimes struggle with simple tasks that code has already solved. Counting the Rs in strawberry, for example, can be done by the AI creating code for it, rather than having to run that question through its own banks. High-level LLMs >control> low-level LLMs/neural networks >control> non-AI scripts. Most tasks would filter down and up this chain, possibly multiple times per prompt.
@r.m8146 a month ago
o1 can count letters. not a problem anymore
@AILiteracy-f1r a month ago
@@r.m8146 Yes but it costs a lot of energy to do that task using o1 when a simple script would do. It's like using a tank to open a can of beans.
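The "simple script instead of a tank" point from this thread can be shown in a few lines. The router below sends letter-counting questions to a plain function and everything else to the model. The regular expression and the `route` function are illustrative assumptions, and the model branch is just a placeholder string.

```python
import re

def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter in a word, ignoring case."""
    return word.lower().count(letter.lower())

# Matches questions such as "How many Rs are in strawberry?".
_PATTERN = re.compile(r"how many (\w)'?s? (?:are )?in (\w+)", re.IGNORECASE)

def route(question: str) -> str:
    """Answer with the cheap script when possible, else defer to the model."""
    match = _PATTERN.search(question)
    if match:
        letter, word = match.groups()
        return f"script: {count_letter(word, letter)}"
    return "llm: (forward the question to the model)"

if __name__ == "__main__":
    print(route("How many Rs are in strawberry?"))
    print(route("Summarise this file for me"))
```

In a tool-calling setup, `count_letter` would simply be registered as one more function the model can request, which is the "non-AI scripts" layer of the chain described above.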
@KonfliktProduktionz a month ago
How do we build this API ourselves????!!!
@muazashraf409 a month ago
I tried this code and told the assistant to open my chatopenai in my browser, and it is not opening it. I did this 5 or 6 times, dude.