It's coming along nicely. Thanks for taking the time to share. It's a lot of extra effort for you but it's appreciated.
@2034-SWE 2 months ago
Absolutely perfect sequential and contextual display throughout this video. I love how you almost never have to explain anything to us directly, and instead let the precursor screen focus (i.e. looking at certain parts of a page) do the talking, in conjunction with the subsequent verbal request and speech/file outputs.
@stanleylu3625 2 months ago
You're building Jarvis from Iron Man. Love the videos and that you generously share your code. I can't code, but with your channel and AI I'm able to create full apps with just prompts. Thank you.
@SamiSabirIdrissi 2 months ago
EXACTLY
@goforit5 2 months ago
SAME!
@IdPreferNot1 2 months ago
lol. literally the 2 latest URLs I copy/pasted to a .md for my Claude Project Knowledge context. I borrowed your code for the tools, memory manager, and utils and was able to fold them into the swarm agent framework I was working on. I'm a new Python coder so I couldn't believe I was able to get it to work, but it's awesome. Thx!
@zoltanabonyi3307 2 months ago
Please continue developing this project. Absolutely mind-blowing stuff.
@higon99 2 months ago
No, it's not the tool that is impressive. It's you and the way you use it that are impressive. I cannot repeat long sentences until LLMs eventually get my task done right... but you surely have me excited for the future. Thank you for sharing the experience.
@dwayneholmes5049 2 months ago
Wow! This is exactly what I’ve been working on for Release Engineering and DevOps. I’m really impressed and will be reviewing your code. I agree AI agents are about doing tasks in parallel... didn’t think about the speech thing!
@SamiSabirIdrissi 2 months ago
this is F***ing wild, can't wait to try this. thank you for sharing!
@Sub0x-x40 2 months ago
this looks fun af. if only my dad was alive to see this, it's like Commander Data from Star Trek. it would truly blow his mind
@ph4nt0mcz130 2 months ago
Honestly, as an engineer you have to use this tooling to filter the information that you receive. Imagine how many empty words you get now from “unskilled” product and management people. With the unsupervised power of AI on their side it will triple in size and be hallucinated too. Funny thing is that we are getting into the encrypted channel and will have to filter out the AI with the AI.
@nathanstreger3851 2 months ago
I'm giving an AI presentation to my dev department and interested parties about how we're currently utilizing AI in our workflows/products and what I see for the future. Going to get this up and running locally to "Wow" them at the end.
@parkerrex 2 months ago
Paused at second 29 to read. This is it man!
@IowIy 2 months ago
Awesome. I agree with your thoughts on personal AI assistants with memory management.
@stonedoubt 2 months ago
Dan… thank you for sharing this.
@K.F-R 2 months ago
Great work. Deeply appreciate you sharing as you go.
@juliotriana4449 2 months ago
Excellent work. Love your videos.
@software_valen 2 months ago
nice vid man! i'm doing something similar just to play around and i'm 100% sure that you can achieve the same results just using any Whisper model locally (easy to set up btw) and an LLM surrounded by some agent logic (i applied the ReAct paper here). in this way you'll cut your costs to zero. but in case you choose to use the OpenAI APIs, you will also cut your costs in an unimaginable way.
@aibeginnertutorials 2 months ago
Brilliant as usual. Thank you. Looking forward to testing this.
@TonyAlfredsson 2 months ago
This is just awesome! I have been working on the real-time API integration into my Flutter project since they released it, but I am having some challenges. I am also using Aider to code and will fork this right away and start with some implementations. My project currently saves the transcribed audio into Supabase, so something similar should be a fun project. Thanks!
@YoneCortopassi 2 months ago
aistructuralreview AI fixes this. "Realtime API AI Assistant Launch"
@TeamDman 2 months ago
great demo! gives some good inspiration for my own tooling
@NLPprompter 2 months ago
as someone who thinks randomly, speaking has never been my forte; typing is. with typing i can re-read what I typed. but yeah, parallel building stuff... I'm in!
@ibbbyscode 2 months ago
Wow! Amazing stuff. Thanks Dan
@k22marie 2 months ago
hey Dan - I love these videos from a senior engineer's perspective. Videos are great, well edited and provide tons of content in an appropriate amount of time. My question: how does a junior engineer leverage this technology appropriately while they're skilling up?
@indydevdan 2 months ago
Hey @k2marie, great question. Right now everything is changing in the tech ecosystem. The best thing junior engineers can do is USE this technology and get your REPS in. Theory, research, and 'best practices' are only great after technology has been established. Right now the experimenters, tinkerers, prototypers and builders that are USING this technology for EVERYTHING they possibly can will emerge as winners. My advice is to spend as much time as you can building, testing, and using the new AI tech, and try to pay attention to people who are ACTUALLY building, not just talking and speculating.
@JonasLindquist 2 months ago
This is really impressive, thanks a lot for sharing!
@mindcastsoftware 2 months ago
I'm curious if Puppeteer/Playwright could be set up as a tool for the assistant to perform actions on websites and assess results?
@agenticmark 2 months ago
an agent that runs on GitHub issues and produces PRs and then deployment releases for me just to test each. a list of ports to match the PRs. Cursor is bad ass, but it doesn't do that yet
@johnMcKartney 2 months ago
This is the product that will be shipped to the public and will create a new revenue stream for the big 7, justifying the immense investments these companies made for their AI-training data centers. I'll give it less than a year and we will have at least one official release, if not built-in assistants in the new generation of phones and PCs. 12 months ago I tried building something like this but I couldn't manage to do it, because I'm just not good enough, so I'm glad that this will become a reality very soon. Exciting times to be alive.
@stevebim000 2 months ago
Another great video Dan. Could you build in interruption instead of push-to-talk? Also the streaming output?
@RybkaZolotse 2 months ago
Any thoughts on something akin to "admin" level privilege to the speaker? Could ADA recognize one person's voice giving directives that had "junior" level permissions as a non-compliant condition?
@vastvitamins1966 2 months ago
Are you planning on putting together a paid course that goes step by step through setting this up? I'd love to see that.
@uhtexercises 2 months ago
Great stuff, as always. Really love how this is evolving. Can't thank you enough for sharing the code. So much to learn from it. Let us know how we can give back.
@aerotheory 2 months ago
I'd like to get a feel for cost/token over productivity.
@ia7cast 2 months ago
Is uv compatible with Miniconda? A tutorial on how to install this repo would be great.
@TimNagle 2 months ago
Insane. This is so good
@mrpocock 2 months ago
Do you have agents that can write, test, run, and debug a small code project, and bail back to you if they get stuck?
@IlkkaNisula 2 months ago
Just curious, did you track the total cost of this demo using the Realtime API and models? I tried the Realtime API for just a few simple prompts and the cost was a whopping $2.5.
@henno6207 2 months ago
a relevant question, but also I imagine the price for this plummets rapidly in the near future.
@2034-SWE 2 months ago
Realtime API alone is approx $15 per hour - but as the person above commented, it’ll go down. Fast. Look at GPT-4 vs 4o costs, for example. View the current pricing as irrelevant. Instead, focus on how only
@psinke 2 months ago
very nice. Would this be possible as a plugin for something like VSCode? Or would you run into limitations for what a plugin is allowed to do?
@cyberthugFi 2 months ago
IDD is the new renaissance goat! I have questions tho: is the model only able to help you with code? Can you make it see through the camera with Python scripts ;), can it have real-world perception if I assign other tools using Python, Java, or other languages, and can I customize the AI personality? How much did it cost in tokens just to shoot this one video?
@FurfelOfficial 2 months ago
Hi, I really enjoy this series of videos! However, in the last one, when I saw the pricing for the Realtime API, I thought: why not just use Whisper 3 turbo locally? The only thing needed would be to detect when the speech started and finished (with, let's say, a 1.5-second silence delay). I just tested it on an M1 and it works very well, and doesn't take that much memory.
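The silence-based end-of-speech detection suggested in this comment could look roughly like the sketch below. It assumes 16-bit little-endian mono PCM audio already split into fixed 30 ms frames; the function names, the RMS threshold of 500, and the frame size are hypothetical choices for illustration, not code from the video or from Whisper. Only the 1.5-second silence delay comes from the comment.

```python
import math
import struct
from typing import List, Sequence, Tuple


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def speech_segments(
    frames: Sequence[bytes],
    frame_ms: int = 30,          # assumed frame length
    threshold: float = 500.0,    # assumed loudness cutoff, tune per microphone
    silence_ms: int = 1500,      # the 1.5 s silence delay from the comment
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of utterances; end is exclusive.

    An utterance starts at the first loud frame and ends once at least
    `silence_ms` of consecutive quiet frames have followed it.
    """
    needed = math.ceil(silence_ms / frame_ms)
    segments: List[Tuple[int, int]] = []
    start = None
    quiet = 0
    for i, frame in enumerate(frames):
        loud = rms(frame) >= threshold
        if start is None:
            if loud:
                start, quiet = i, 0
        elif loud:
            quiet = 0  # a short pause inside speech does not end the utterance
        else:
            quiet += 1
            if quiet >= needed:
                segments.append((start, i - quiet + 1))
                start = None
    if start is not None:
        # Audio ended mid-utterance: close the segment, trimming trailing quiet.
        segments.append((start, len(frames) - quiet))
    return segments
```

Each returned segment could then be handed to a locally running speech-to-text model for transcription. A fixed energy threshold is the simplest option and tends to misfire in noisy rooms, so a dedicated voice-activity detector may be a better fit in practice.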
@jefferystartm9442 1 month ago
Please, we need a LiveKit version. Also, have you checked E2B? (Bet you have not 😂)
@EdwinFairchild 2 months ago
man I was so close to playing with your code till I realized I don't have access to the Realtime API, which is weird since they usually give me other betas
@edzynda 26 days ago
What are your thoughts on Anthropic's new MCP spec? Do you see this becoming the standard for adding extra capabilities to LLMs?
@indydevdan 24 days ago
BIG if USEFUL. Video is coming. Working through the value prop.
@fbalco 1 month ago
are you triggering on "go ahead and ..." or is that just how you like to say it for videos?
@cameronyking 2 months ago
Can this be a VS Code extension? Would you build or could I port?
@tharunt3342 1 day ago
Can you provide a video with a complete guide?
@serafinalcantara5520 2 months ago
Would the assistant be as responsive and efficient using one of the large open-source LLMs locally, like llama3.2/quen/visavis, or a quantized model with a speech2text -> inference -> text2speech pipeline? As always, great content 👍
@Aguiraz 2 months ago
man this is hella cool, but can you help us give it a UI (Streamlit?) as well for text input? we often can't use a mic and this tool is too powerful to leave bound to the mic
@numbah16 2 months ago
Love it. It's Scrape* though dude, not Scrap lol
@louisduplessis5167 2 months ago
thx for the vid bro. what voice do u use?
@christianreid7529 2 months ago
Thank you so much 🙏🙏
@grymvision3094 2 months ago
Forgive the question since I think you said it at some point, but what LLM are you using? Would this be viable with a local LLM? Obviously that depends on the complexity of the request, but in general, how viable would a local LLM be at this moment in time?
@ix4564 1 month ago
What is this exactly? What is the talking agent that can do these things?
@sephirothcloud3953 2 months ago
Can you please state the AI COST of your video EVERY TIME? Otherwise, I have to spend money every day to test whether it's affordable enough to use for my purpose.
@bukitsorrento 2 months ago
Every AI content creator should know this basic principle.
@d1m18 2 months ago
Very good point. All these guys showing tech that we can't afford isn't very useful
@bukitsorrento 2 months ago
It will be useful for the future when costs go down by 90%. Dan is doing us a big favor by making this video, showcasing the technology despite the current high costs, using his own money.
@cyberthugFi 2 months ago
@@bukitsorrento well said
@adamking6957 2 months ago
@@bukitsorrento Why can he not then just say how much it cost? I mean, he hearted the comment, but he has not updated the description to include the cost. Why not just do it?
@alessandrofrau4196 2 months ago
Great video. I'd love to make solid use of the Realtime API but it's so expensive, and also blocked in the EU 😮‍💨
@tomaszzielinski4521 2 months ago
Oh man, I'm already jealous :>
@tlatoanimachi 2 months ago
Beautiful
@ScottzPlaylists 1 month ago
👍👍👍 Can you put together an open-source version that duplicates this as closely as possible❓❓❓ That's the real future, but it seems a little behind SOTA ❗
@mrpocock 2 months ago
I still want a better persona. I don't need it constantly asking me what else it can help me with.
@pgg131 2 months ago
Time to be another VSCode fork
@cameronyking 2 months ago
exactly what I think
@aiplaygrounds 2 months ago
Cost is too high to run it at the moment: an average of $3 p/m
@MdNaimulIslam-y9l 2 months ago
wow wow
@seadude 2 months ago
kzbin.info/www/bejne/ZmqToIVjYtiSlZo "The future is already here - it's just not very evenly distributed." William Gibson ;)