Dan, I'm a 47 year old software engineer working at Microsoft. I often think about all of the next gen devs coming up in a world where not having access to sick Gen AI tools like this is unheard of. I find it amusing when muscle memory has me white knuckling it in a search engine to get answers when the default approach should have been an assistant or an LLM. Anywayz, this is just a big thanks for putting out solid content that has inspired an old dog to keep focusing on new tricks. Don't change Brother!!
@radekrousek46882 ай бұрын
That's the same as at school, when they don't want you to use a calculator for tasks that are too simple.
@AllenHorn05072 ай бұрын
Oh my God, I can’t believe this. I am a blind professional, and this would help me so much!
@requestfx5585 2 months ago
Bro is so professional, he could write a comment without having to see. This technology is awesome, but it's just a combination of what has already been possible, put together professionally by OpenAI. Combining voice models with powerful LLMs and function calling: nothing is new here. The only new thing is that it has been done so well and so fast.
@AllenHorn05072 ай бұрын
@@requestfx5585 do you think that’s funny? Why would you make such a shitty comment?
@AllenHorn05072 ай бұрын
@@requestfx5585 I realize that I only represent 3% of the world’s population, but I am still a person with feelings.
@AllenHorn05072 ай бұрын
@@requestfx5585 why would you say something so nasty do you think it’s funny that I’m blind?
@AllenHorn05072 ай бұрын
@@requestfx5585 I’m not sure why my comments keep getting deleted, but your fucking comment was disgusting and not funny, asshole.
@user-pt1kj5uw3b2 ай бұрын
It's cool seeing someone who really gets where these things are going and also what you can do with them right now. I genuinely think there will be people within the next 5 years who will have super intelligent AI directly accessible in their brain, at least if everything goes right. Which I feel insane for typing, but it truly doesn't seem impossible. Thanks for releasing this too.
@d.s.ramirez61782 ай бұрын
Just wanted to say I'm so impressed by your efforts here. I'm reluctant to chime in with a comment because I'm not a coder. I'm an artist, but my second identity is as a nerd. I'm completely devoted to science and people of the highest intellect producing the innovations of the future. I just never learned how to code. As an art project, I'm trying to create the foundational characteristics for an AI which will have a compassionate, ethical personality. The concept is much deeper than this and I've been developing it for 12 years on the theoretical level. Seeing a video like this makes me wish I was surrounded by Silicon Valley people in the hopes that I would find one talented person who could help me bridge the gap of the technology. But I'm holding out hope that it could still happen if I pursue it. Anyway, I just wanted to say that I can tell that this is the leading edge of the AI landscape. This makes it worth me sifting through all the cheesy hype videos in order to find it. 💯
@kennethbeal2 ай бұрын
I have a friend you might want to collaborate with. I'll send him your comment.
@mrd68692 ай бұрын
I'm doing something similar but building AI-powered cybersecurity applications instead. And you're right, this whole thing is taking off. We laughed at the Star Trek scene where Scotty, the engineer, tried to speak to the computer to build software. They thought he was crazy... he wasn't crazy 💯
@faturismee 2 months ago
hi! im interested in cybersecurity with ai, can you explain more about your project? :)
@mrd68692 ай бұрын
@@faturismee Red team exercises. AI systems being weaponized for cyberwarfare. This will be a thing VERY soon and it's not gonna be pretty lmao
@jimsandwick23722 ай бұрын
Thank you. I totally agree with you. After Whisper came along, I felt that speech-to-text, and now speech-to-speech, combined with these new tools, future models, and reduced costs, would be such a game changer. I am still surprised it's rarely mentioned, given how it will supercharge productivity and fundamentally change the way we code and interact with our devices. I think once it's combined with all of your data (and external data of course) in an intelligent way, the creative process will be mind-blowing compared to how we work now. Keep the videos up. Thanks
@gabrielketzer70842 ай бұрын
It’s truly great that you release the code for this for the masses
@rasmusfoy2 ай бұрын
Another Awesome video! Making sure to comment to get you algorithm points.
@zachisparanoidАй бұрын
absolutely blown away. fantastic work man.
@braysher2 ай бұрын
As someone who’s dyslexic I’m soooooo excited by this. Looking forward to your next videos :)
@facklerАй бұрын
Great video Dan. It's crazy how we went from "never let it get on the internet" to "go ahead and write to the file system". Everybody about to have huge bite outta the tree of the knowledge of good and evil. Here's to hoping it's delicious!
@lechugathedoood35952 ай бұрын
I’m new to coding, but this gave me such a great idea of how I can combine this with my ceramics hobby. Before this video ai was a hot/cold thing for me, but now I see how I can do something cool with it! Thanks
@senju20242 ай бұрын
First time I've seen a video of yours. Not sure how you slipped under my AI radar as I thought I was at the forefront of AI. Liked and Subscribed.
@MnogarithmАй бұрын
What are some of your other fav channels? Trying to get some content from those at the forefront as well.
@User-actSpacing2 ай бұрын
Python is slow AF and still this demo worked extremely well! What a time to be alive!!
@mrpocock2 ай бұрын
I think python or js is fine for this sort of thing where you're mainly plumbing services and compiled libraries. Particularly now that they have async built in. But I am tempted to rewrite the agent router and configuration in rust ;)
@Billy4321able2 ай бұрын
The vast majority of the processing is being done in the cloud. He's just piping in to a bunch of different APIs. You could probably run the whole thing on a calculator.
@SamSargent-kh7gl 2 months ago
it's all blocking IO so makes no difference
@mrpocock2 ай бұрын
@@SamSargent-kh7gl it makes no difference until it does. I had one of the llms cpu bound because it was preprocessing and tokenising the text on the cpu in python. If the python or js is only routing, then it is fine. As soon as it steps into a hot loop, it matters a lot.
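The routing case described in this thread, where the Python process mostly waits on remote services, can be sketched roughly as below. This is a minimal illustration, not the project's code: `call_service` and the three service names are hypothetical stand-ins, and `asyncio.sleep` simulates network latency.

```python
import asyncio
import time


async def call_service(name: str, delay: float) -> str:
    """Hypothetical stand-in for a network call (speech, LLM, or tool API)."""
    await asyncio.sleep(delay)  # while this waits, the event loop runs other tasks
    return f"{name}: done"


async def route_requests() -> list:
    # gather() runs the three awaits concurrently and returns results in argument order.
    return await asyncio.gather(
        call_service("transcribe", 0.2),
        call_service("llm", 0.2),
        call_service("tool", 0.2),
    )


def timed_run() -> tuple:
    """Run the router once and report (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(route_requests())
    return results, time.perf_counter() - start
```

Because the three waits overlap, the total time is close to the slowest call rather than the sum. A CPU-heavy step such as tokenising in a Python loop would not overlap this way, which is the "hot loop" caveat raised above.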
@stefanm70582 ай бұрын
This is pure GOLD.Thanks and keep up the good work!
@Vitruvian20862 ай бұрын
I love the breakdown and inner workings (under the hood) great job, very educational
@eddited75432 ай бұрын
Mate, I just wanted to start building EXACTLY this! As if you've read my mind^^ Thank you so much!
@DevPythonUnity2 ай бұрын
I have a better and way, way cheaper solution.
@BizInNews2 ай бұрын
After listening to you, I started completely rethinking the system I am developing 🎉
@93cutty2 ай бұрын
I've been waiting to see something on the new openai stuff from you. Gotta head into work and listen!
@stefanosantini90392 ай бұрын
This video is fantastic! The demo and your final talk are very stimulating, thanks a lot for sharing!! ❤
@andrewwalker89852 ай бұрын
Awesome work - loved it
@ThatNerdChris2 ай бұрын
Add a command for ada to wait until you say "over" walkie talkie style and you can have time to pause and think when prompting?
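One way the walkie-talkie idea might look, assuming the assistant receives transcribed text in chunks: buffer everything until a chunk ends with the stop word, then release the whole prompt. `OverGate` and its `feed` method are hypothetical names for illustration, not part of the project.

```python
import re
from typing import List, Optional


class OverGate:
    """Buffers transcript chunks until one ends with the stop word."""

    def __init__(self, stop_word: str = "over") -> None:
        # Match the stop word as a whole word at the very end, ignoring trailing punctuation.
        self._pattern = re.compile(rf"\b{re.escape(stop_word)}\W*$", re.IGNORECASE)
        self._parts: List[str] = []

    def feed(self, chunk: str) -> Optional[str]:
        """Return the full prompt once the stop word is heard, else None."""
        text = chunk.strip()
        match = self._pattern.search(text)
        if match is None:
            if text:
                self._parts.append(text)
            return None
        head = text[: match.start()].strip()
        if head:
            self._parts.append(head)
        prompt = " ".join(self._parts)
        self._parts = []  # reset for the next turn
        return prompt
```

A sentence that naturally ends in "over" (such as "start over") would also trigger the gate, so a less common stop phrase may be a safer choice.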
@SumedhKadoo2 ай бұрын
Thanks Dan, Incredible video. Subscribed.
@MsDarksloth2 ай бұрын
Thanks for this video and your POC project! really Epic stuff. I built a RAG prototype using Ollama and Qdrant and just updated your project to have a function call to get the related vectors from Qdrant and then have the advanced voice mode tell me about them and it works flawlessly... mind racing with all the ideas of how to integrate this into our products! Appreciate the effort to share this with the community 🔥
@JariVasell2 ай бұрын
Wow! Superb video! 💪🏻
@ninjuhdelic2 ай бұрын
Man, I wish I was this good. So grateful others are. Thanks for the sick demo
@FlaikAI2 ай бұрын
It's amazing what will be happening! Excited!
@crisgath35122 ай бұрын
Dan, this is the best Python implementation of this Realtime API I have seen yet, better than Azure's even. Thanks for this and I smashed that subscribe button. Legendary stuff.
@eintyp43892 ай бұрын
What's your experience with a more open-ended toolbox for the agent? For example, a database in Supabase with functions and agent workflows that can be semantically searched and used. That way you don't have to give the agent a long list of available tools, and adding new tools or workflows, or even letting agents create, test, and then add them for reuse, would be easier. Like what they did with that Minecraft agent (Voyager was its name?). Or does this fail, and if so, where and why?
@indydevdan 2 months ago
This is a great pattern BUT I've been steering clear of investing time in tool selection systems because I view it as directly competing with OpenAI's advancements. In order to build useful AI Agents (which is a target of OpenAI. I think we'll see this a lot more in 2025) you need reliable tool calling, at scale. We saw them take a great stab at this with structured outputs + tool calling. So although I like your approach I'm holding off on systematizing tool selection until it's clear that OpenAI, Anthropic, Google (big 3) won't invest here.
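The searchable-toolbox pattern discussed in this exchange can be sketched as follows. This is only an illustration under loud assumptions: the tool names and descriptions are invented, and a bag-of-words cosine similarity stands in for the embedding search a real vector store would provide.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical tool registry: name -> natural-language description.
TOOLS = {
    "open_browser": "open a web page or url in the browser",
    "create_file": "create a new file on disk with generated content",
    "delete_file": "delete a file from disk",
    "run_sql": "run a sql query against the database and return rows",
}


def _vector(text: str) -> Counter:
    """Bag-of-words vector (a toy substitute for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)  # Counter yields 0 for missing tokens
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_tools(request: str, top_k: int = 2) -> List[str]:
    """Return the names of the top_k tools whose descriptions best match the request."""
    query = _vector(request)
    ranked = sorted(TOOLS, key=lambda name: _cosine(query, _vector(TOOLS[name])), reverse=True)
    return ranked[:top_k]
```

Only the few best-matching tool definitions would then be passed to the model, instead of the full list.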
@Dandiestpanic2 ай бұрын
First time I've caught a video of yours. Not sure how they've slipped under the radar like that. Oh well, better late than never. Very well done sir. Wonderful video.
@BizInNews2 ай бұрын
It's completely amazing, thanks for sharing
@JazevoAudiosurf2 ай бұрын
The issue currently is that you have to program the calls etc. If it just wrote the code for it on the fly, it would be much more automated. This is the general issue: we don't want to hardcode anything in the future; it should know what to do. Also, as well as it works, it's still probabilistic. We need some sort of classification model, itself much less probabilistic, that checks the answers/outputs for correctness.
@georgestander26822 ай бұрын
Given the user's request, it should create the functions, tools, and even UI... that's what Karpathy is on about right now.
@cycologist86152 ай бұрын
Great work here! Nice to see some practical ideas in action
@mr.pain-entmt2 ай бұрын
Bloody wicked! awesome work here man! Keep it up!
@sriramkota8 күн бұрын
Dammmm!!! Subscribed my man
@justinduveen38152 ай бұрын
Impressive and creative!! Thanks for sharing!!
@SirajFlorida2 ай бұрын
I am so with you! This is exactly what we've been waiting for. I haven't been able to leave my computer for the last two days.
@michelwesly2 ай бұрын
Thanks for another excellent video!
@aiplaygrounds2 ай бұрын
Bro that was quick. Great work ❤
@bojames78412 ай бұрын
This is amazing 🎉
@clarencejones47172 ай бұрын
I am just high as a kite.
@Iightbeing2 ай бұрын
On my way
@TravisChalmers2 ай бұрын
Ngl
@radekrousek46882 ай бұрын
thx for changin my life mate, appreciate :))
@kubasmide223Ай бұрын
Dan you are the best!
@uhtexercises2 ай бұрын
Yessss. He's done it again!
@raynangle12 ай бұрын
Brilliant......thank you....
@plinnet2 ай бұрын
Thanks for sharing the code!!
@iGuide_net2 ай бұрын
mind blown😮
@gr8tbigtreehugger2 ай бұрын
Really love your example and passion!! Amazing stuff! I have been building my own real time speech-to-speech system, all the STT and TTS is local, works really well. And, free!
@indydevdan 2 months ago
Care to share your stack? This is on my project hit list.
@bryanoakley-wiggins58852 ай бұрын
really good overview, and just tried your code - works great, and really whets the appetite. time to go exploring! thanks for sharing!
@acs27772 ай бұрын
Now combine this with the meta AI sunglasses and doing this with it and seeing the result while you are moving around to other places 😎
@DinoByteSize 2 months ago
Glasses or sunglasses? 🤪
@acs27772 ай бұрын
@@DinoByteSize haha
@NeuralDev2 ай бұрын
This is absolutely insane, all the use cases possible. I love it!!! It would be really interesting to use this type of assistant to supervise and correct other agents / AI tools for optimal results, like using Ada to review code generated with Cursor + Claude, recommend improvements in real time, then have Claude execute. In my opinion we will quickly go from agents to swarms of agents for optimal results.
@jhnsntmthy2 ай бұрын
Using Ada to invoke Aider would be relatively simple to add now. We need to build in a way where you can define the path to projects on your system and then do just this. But Aider already has voice commanding built in, and you don't REALLY need the realtime sync nature of this to do what you want.
@JustinHennessy2 ай бұрын
Killer post, thank you, I’m a fellow builder, keep up the great work.
@vastvitamins19662 ай бұрын
Amazing project thanks for sharing
@frankieownshell40522 ай бұрын
Finally something interesting ty for good video!
@ameet20002 ай бұрын
Amazing work, thx for sharing
@gregsLyrics2 ай бұрын
WOW! IDD, you rock.
@LibertyRecordsFree2 ай бұрын
Love it! Just great! Want to work with that
@fieldcommandermarshall2 ай бұрын
awesome work man
@coma137942 ай бұрын
Function calling, with a low latency speech to speech model that can determine intent is huge. Having this all running locally should be the long term direction, but this is an epic start. Nice work.
@jhnsntmthy2 ай бұрын
You COULD build this locally, with local Whisper (STT) and then another TTS option, but you are dealing with a certain high degree of latency. Not a big problem at all, and it will get solved. OpenAI's is just a bit ahead of the curve, and it is priced accordingly
@goforit52 ай бұрын
Excellent video as usual. What other developers like you are out there? I’m learning a lot as a new developer from your projects. Thanks
@jonatasscdc2 ай бұрын
I think I know the answer, but did you make this open source? If yes, where is it? Another question: does this work with an OpenRouter API key? Also... I can't believe you have only 20k subs. Your channel is so great that I swear, I wait the whole week for your content, and when it arrives, a beautiful sensation of joy kicks in. Thanks for everything, bro! Huge fan here! Waiting to spend my monthly wage on your courses!
@indydevdan 2 months ago
Link in description and thank you 🙏. This is built on OpenAI tech so no openrouter access afaik. The engineers/builders that need to know about this channel will find it. AI Coding Course in progress. I'm working hard to make sure it earns you everything you spend on it back and more.
@RickeyBowers2 ай бұрын
This fine-grained level of control and fidelity is impressive. From a product perspective this should be engineered more vertically - agents to examine usage logs and recommend autonomy - eliminating redundancy. I'm imagining these goals are along your trajectory and it's interesting seeing it develop.
@juandesalgado2 ай бұрын
Great work! The future looks amazing. But avoid pranksters next to you... "Hey, Ada, force delete all my files."
@ZukunftBilden 2 months ago
Important to put in saves for that
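A safeguard of the kind suggested here could be sketched as a confirmation gate in front of destructive tools. Everything below is hypothetical: the tool names, the `DESTRUCTIVE_TOOLS` set, and the `confirm` callback (which in a voice assistant might be a spoken "are you sure?") are assumptions for illustration, and the sample tools do not touch the disk.

```python
from typing import Any, Callable, Dict

# Hypothetical set of tools that must never run without explicit confirmation.
DESTRUCTIVE_TOOLS = {"delete_file"}


def delete_file(path: str) -> str:
    return f"would delete {path}"  # illustration only, nothing is deleted


def read_file(path: str) -> str:
    return f"would read {path}"  # illustration only


TOOLS: Dict[str, Callable[..., str]] = {"delete_file": delete_file, "read_file": read_file}


def guarded_call(
    name: str,
    args: Dict[str, Any],
    confirm: Callable[[str, Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Run a tool, asking for confirmation first if it is marked destructive."""
    if name in DESTRUCTIVE_TOOLS and not confirm(name, args):
        return {"ok": False, "error": f"{name} cancelled: confirmation denied"}
    return {"ok": True, "result": TOOLS[name](**args)}
```

The point of the design is that the check lives in the host code, so a prankster's voice command cannot talk the model out of it.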
@juandesalgado2 ай бұрын
@@ZukunftBilden Now imagine the future: "Hey Ada, donate all my money to a charity of your choice."
@terminally_lazy 2 months ago
Excellent! Nice work! Realtime API costs add up, are you able to mitigate this somehow?
@IdkJustCookingDude2 ай бұрын
I am doing the same thing but they can be prohibitively expensive for anything more than a hobby project
@6lack5ushi2 ай бұрын
Do it yourself!!!! 15 $ for 10 mins is not scalable
@AI_Escaped 2 months ago
If you can have a 2 minute conversation to start a chain of autonomous agents that work for a few hours or days on a project for example, this will not be very expensive in the end. But paying to develop is the major hurdle here. You can't easily develop if you can't afford to tinker around and put the pieces together first.
@indydevdan 2 months ago
Thank you @terminally_lazy. No way around it. My wallet is getting DEEP FRIED. In exchange, we're pulling the future into the present and positioning ahead of the curve. Worth. Also, great call out by @AI_Escaped, this will save you and I hours after we establish great patterns. Lower prices are great but there's nothing more valuable than your time.
@6lack5ushi2 ай бұрын
By do it yourself I meant you can put together most of the live api without touching it and a lot of recursive 4o mini calls. 4o mini pricing is where products are built for mass consumption. Just make a compiler! That runs functions using natural language….
@LevDiken2 ай бұрын
Nice demo, Dan. Thoroughly enjoyed watching. We need RTAPI to come down in price about 300x then I think we see it embedded everywhere. I would have it run constantly for myself like an ambient buddy.
@GiomPanot2 ай бұрын
Excellent work, a year ago I was able to create coaches with voice it was a bit slow, but now with the ability to do tasks it is super. Got a couple of ideas with that. What you do is really inspiring thank you. If you could share a simple tuto with your code to play that would be awesome. (for dummies). I am not a dev but can do some python and run it locally. :)
@saabirmohamed6362 ай бұрын
Hi, did you see the groq xrx examples ? this could be made to use groq inference maybe
@joshuaam77012 ай бұрын
Well, here we go. I’ve been waiting for it the past two years, even longer really! But you pulled it off, mate. Can’t wait to see what will come of such agents. What are the real kind of numbers you are generating here for usage and resource costs at the end of the day, though?
@GospelProgressionsUniversity2 ай бұрын
I got contact high. This is nucking futs😮
@BabbleBot-ps4fr2 ай бұрын
I was about to do the same thing with llama 3.2, so this is super amazing for me
@SpragginsDesigns2 ай бұрын
Can someone please help me understand how tool calling works for any LLM? I see it in the Anthropic docs, but it seems to work with any model, right? It's the only part of the AI API I don't understand well yet.
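In broad strokes, tool calling tends to work like this: the host program describes its functions to the model as JSON schemas, the model replies with a structured request naming a function and its arguments, and the host runs the function and sends the result back. The sketch below simulates only the host side. The message fields (`type`, `name`, `arguments`, `content`) and `get_weather` are hypothetical; each provider's real API uses its own field names.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# Functions the host is willing to run, keyed by the name the model will use.
TOOLS = {"get_weather": get_weather}

# What the model is told about the tools (JSON Schema style description).
TOOL_SPECS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]


def handle_model_output(raw: str) -> str:
    """Dispatch one model message: plain text passes through, tool calls are executed."""
    message = json.loads(raw)
    if message.get("type") != "tool_call":
        return message["content"]
    func = TOOLS[message["name"]]
    result = func(**message["arguments"])
    # The result goes back to the model, which then writes the final answer.
    return json.dumps({"type": "tool_result", "name": message["name"], "content": result})
```

The model never runs code itself; it only emits the structured request, which is why the pattern carries over between providers.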
@flyingbird37072 ай бұрын
I just tried it in VS Code. It only takes one request or prompt, and it doesn't open whatever I say; it just provides links. How can I approach this?
@user_alt01-g6v2 ай бұрын
🔥🔥
@natecote10582 ай бұрын
Seems like the only thing standing between us and full blown AI assistants is... software. Incredible.
@jhnsntmthy2 ай бұрын
Soon that will be obsolete as well...
@claudioagmfilho2 ай бұрын
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, Wow!
@iamtheblueprint2 ай бұрын
this is bonkers !!
@connorodea94992 ай бұрын
wow.... AI is truly mindblowing. I am still not sure if it is incredible or terrifying, or maybe an amalgamation of both
@AustinThomasPhD2 ай бұрын
This is awesome, and others have attempted it. The issue is the API cost.
@zkiyyeller35252 ай бұрын
Thank YOU!
@flyingbird37072 ай бұрын
can it access personal data like gmails ?
@HeilmanCheman-s9mАй бұрын
Hey dan, can we create an ai agent to play slots machines and predict the outcome with precision?
@nickharrow24292 ай бұрын
If you wanted to test this on other open-source models, or a combination of, you could try groq with their ultrafast inference architecture.
@PrensCin2 ай бұрын
Can we use this with a local model, local voice models, and self-created sounds?
@SP-js4gf2 ай бұрын
I like your terminal window. How did you make it look transparent plus the emojis ⁉️⁉️⁉️ is it cursor??
@joepropertykey36122 ай бұрын
'Windows Terminal Preview' it looks like
@psychurch2 ай бұрын
Sweet thanks for sharing. What tool did you use to record your screen? The cursor highlight is spot on
@ScottzPlaylists2 ай бұрын
👍 👍 Great Work, Subscribed 👍👍 It would be very interesting to see you build the Next Best Version of this, using all open source and compare ❗❗ ❗ ❗
@indydevdan 2 months ago
Thank you - glad to have you on the journey. Open Source is a LOT harder to make this performant but we'll definitely take a crack at this in Q4 or 2025.
@ScottzPlaylists 2 months ago
@@indydevdan 2025❗❗ ❗ You plan videos that far ahead❓❓ ❓ I'll still be watching though.. and programming more , instead of 90% learning mode.
@ivanvalentini93492 ай бұрын
This is the future. Really nice projects. BTW, I really love your VS Code color scheme, very relaxing. Does anyone know what it's called?
@wizenith2 ай бұрын
wow what color theme you are using in cursor ?
@Alex_17292 ай бұрын
Incredible. Would you mind suggesting a good framework for developing and using agents? I'm just getting into all this so quite new to agentic AI. Looking at Crew AI and Langgraph
@salahsalem43482 ай бұрын
Is it possible to use Blender or any program through voice commands only?
@dreamphoenix2 ай бұрын
Thank you.
@danacarvey2 ай бұрын
How long do you think I'll need to wait til I can tell my computer to do my Houdini work?
@BetterWorld016 күн бұрын
What’s your laptop configuration (Ram etc) to be able to run something like this ?
@EXIT-t5lАй бұрын
so cool
@r.m81462 ай бұрын
amazing
@mauihi2 ай бұрын
Can you make a video on how you created this step by step?
@AILiteracy-f1r2 ай бұрын
Idea: LLMs sometimes struggle on simple tasks that coding has already solved. Counting the Rs in strawberry, for example, can be done by the AI writing code for that, rather than having to run that question through its own banks. High-level LLMs >control> low-level LLMs/neural networks >control> non-AI scripts. Most tasks would filter down and up this chain, possibly multiple times per prompt.
@r.m8146 2 months ago
o1 can count letters. not a problem anymore
@AILiteracy-f1r2 ай бұрын
@@r.m8146 Yes but it costs a lot of energy to do that task using o1 when a simple script would do. It's like using a tank to open a can of beans.
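The "simple script instead of a big model" idea from this thread might look like the sketch below: try a deterministic handler first and fall back to the model only when it does not apply. The question pattern and function names are hypothetical and cover only the letter-counting example.

```python
import re
from typing import Optional

# Hypothetical pattern for questions like "How many r's are in strawberry?"
_COUNT_QUESTION = re.compile(
    r"how many ([a-z])'?s (?:are )?in (?:the word )?([a-z]+)", re.IGNORECASE
)


def count_letter(word: str, letter: str) -> int:
    """Deterministic, case-insensitive letter count."""
    return word.lower().count(letter.lower())


def answer_with_script(question: str) -> Optional[str]:
    """Answer via plain code if the question matches; None means 'send it to the LLM'."""
    match = _COUNT_QUESTION.search(question)
    if match is None:
        return None
    letter, word = match.group(1), match.group(2)
    return str(count_letter(word, letter))
```

A router would call `answer_with_script` first and spend model tokens only when it returns `None`.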
@abteenzАй бұрын
Can this be done with a local model?
@piemasta932 ай бұрын
Wait, how can you run this locally? Wouldn't that mean you don't need any type of internet access? How would that work?
@AI_Escaped 2 months ago
Async threading would be a beast. The only problem is you still have to confirm the tool was successful; if it wasn't, that can mess a lot of shit up for other operations that depend on what's happening while those tools run. The solution, I guess, is structured output and making sure tools don't have errors, at least from anything they can control.
@KCM25NJL 2 months ago
Async operations would be fine as long as you know which workflows you can use them with in an open-ended manner. Even if you can't quite imagine which ones can, just ask o1 mini to help you brainstorm it. Structured output will likely always be a necessity. What I would really like to see, however, is a library of open-source and standardised function calls that can be included in your project both as a RAG solution to assist LLMs when building out new apps, and as an import for making the function calls available to those new apps.
@AI_Escaped 2 months ago
@@KCM25NJL Agreed Async would be fine in some cases that are open ended. I'll have to ask o1. I would love to see a standardized open source library and I'm sure we'll get there eventually if AI doesn't make a library irrelevant by that time. I would assume at some point, everything will be done dynamically. Or maybe dynamically for a time until all the most efficient methods are cached, then it's pretty much a standard library anyway :) It's crazy to think about.
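The concern in this thread, running tools concurrently while still confirming each one succeeded, can be sketched with a structured result per tool. This is an assumption-laden illustration: `fetch_data` and `write_report` are hypothetical tools, and the second one fails on purpose to show the error path.

```python
import asyncio
from typing import Any, Awaitable, Dict


async def run_tool(name: str, call: Awaitable[Any]) -> Dict[str, Any]:
    """Await one tool and wrap the outcome in a structured result."""
    try:
        return {"tool": name, "ok": True, "value": await call}
    except Exception as exc:  # report the failure instead of crashing sibling tools
        return {"tool": name, "ok": False, "error": str(exc)}


async def fetch_data() -> list:
    await asyncio.sleep(0.01)  # simulated I/O
    return [1, 2, 3]


async def write_report() -> str:
    await asyncio.sleep(0.01)  # simulated I/O
    raise RuntimeError("disk full")  # deliberate failure for the example


async def run_all() -> Dict[str, Dict[str, Any]]:
    # Both tools run concurrently; each returns a result dict whether it succeeded or not.
    results = await asyncio.gather(
        run_tool("fetch_data", fetch_data()),
        run_tool("write_report", write_report()),
    )
    return {result["tool"]: result for result in results}
```

Any step that depends on a tool can check its `ok` flag before continuing, so one failed call does not silently corrupt the operations that follow.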