Senior Engineer’s AI Assistant: Realtime API and AI Agents built to SHIP for you

Views: 19,261

IndyDevDan

1 day ago

Comments: 83

@moresignal 2 months ago

It's coming along nicely. Thanks for taking the time to share. It's a lot of extra effort for you but it's appreciated.

@2034-SWE 2 months ago

Absolutely perfect sequential and contextual display throughout this video. I love how you almost never have to explain anything to us directly, and instead let the precursor screen focus (i.e. looking at certain parts of a page) do the talking, in conjunction with the subsequent verbal request and speech/file outputs.

@stanleylu3625 2 months ago

You're building Jarvis from Iron Man. Love the videos and how generously you share your code. I can't code, but with your channel and AI I'm able to create full apps with just prompts. Thank you

@SamiSabirIdrissi 2 months ago

EXACTLY

@goforit5 2 months ago

SAME!

@IdPreferNot1 2 months ago

lol. Literally the 2 latest URLs I copy/pasted to a .md for my Claude Project Knowledge context. I borrowed your code for the tools, memory manager, and utils and was able to fold them into the swarm agent framework I was working on. I'm a new Python coder so I couldn't believe I was able to get it to work, but it's awesome. Thx!

@zoltanabonyi3307 2 months ago

Please continue developing this project. Absolute mind-blowing stuff.

@higon99 2 months ago

No, it's not the tool that is impressive. It's you and the way you use it that are impressive. I cannot repeat long sentences until LLMs eventually get my task done right... but you surely have me excited for the future. Thank you for sharing the experience.

@dwayneholmes5049 2 months ago

Wow! This is exactly what I’ve been working on for Release Engineering and DevOps. I’m really impressed and will be reviewing your code. I agree AI agents are about doing tasks in parallel... didn’t think about the speech thing!

@SamiSabirIdrissi 2 months ago

this is F***ing wild, can't wait to try this. thank you for sharing!

@Sub0x-x40 2 months ago

this looks fun af. if only my dad was alive to see this, it's like Commander Data from Star Trek. it would truly blow his mind

@ph4nt0mcz130 2 months ago

Honestly, as an engineer you have to use this tooling to filter the information that you receive. Imagine how many empty words you get now from “unskilled” product and management people. With the unsupervised power of AI on their side it will triple in size and be hallucinated. Funny thing is that we are getting into the encrypted channel and will have to filter out the AI with the AI

@nathanstreger3851 2 months ago

I'm giving an AI presentation to my dev department and interested parties about how we're currently utilizing AI in our workflows/products and what I see for the future. Going to get this up and running locally to "Wow" them at the end.

@parkerrex 2 months ago

Paused at second 29 to read. This is it man!

@IowIy 2 months ago

Awesome. I agree with what your thoughts are on personal AI assistants with memory management.

@stonedoubt 2 months ago

Dan… thank you for sharing this.

@K.F-R 2 months ago

Great work. Deeply appreciate you sharing as you go.

@juliotriana4449 2 months ago

Excellent work. Love your videos.

@software_valen 2 months ago

nice vid man! I'm doing something similar just to play around and I'm 100% sure that you can achieve the same results using any Whisper model locally (easy to set up btw) and an LLM surrounded by some agent logic (I applied the ReAct paper here). This way you'll cut your costs to zero. But if you choose to use the OpenAI APIs, you will also cut your costs in an unimaginable way.

@aibeginnertutorials 2 months ago

Brilliant as usual. Thank you. Looking forward to testing this.

@TonyAlfredsson 2 months ago

This is just awesome! I have been working on the real-time API integration into my Flutter project since they released it, but I am having some challenges. I am also using Aider to code and will fork this right away and start with some implementations. Currently saves the transcribed audio into Supabase, so something similar should be a fun project. Thanks!

@YoneCortopassi 2 months ago

aistructuralreview AI fixes this. "Realtime API AI Assistant Launch"

@TeamDman 2 months ago

great demo! gives some good inspiration for my own tooling

@NLPprompter 2 months ago

As someone who thinks randomly, speaking has never been my forte; typing is. With typing I can reread what I typed. But yeah, parallel building stuff... I'm in!

@ibbbyscode 2 months ago

Wow! Amazing stuff. Thanks Dan

@k22marie 2 months ago

hey Dan - I love these videos from a senior engineer's perspective. Videos are great, well edited and provide tons of content in an appropriate amount of time. My question: how does a junior engineer leverage this technology appropriately while they're skilling up?

@indydevdan 2 months ago

Hey @k2marie, great question. Right now everything is changing in the tech ecosystem. The best thing junior engineers can do is USE this technology and get their REPS in. Theory, research, and 'best practices' are only great after technology has been established. Right now the experimenters, tinkerers, prototypers and builders who are USING this technology for EVERYTHING they possibly can will emerge as winners. My advice is to spend as much time as you can building, testing, and using the new AI tech, and try to pay attention to people who are ACTUALLY building, not just talking and speculating.

@JonasLindquist 2 months ago

This is really impressive, thanks a lot for sharing!

@mindcastsoftware 2 months ago

I'm curious whether Puppeteer/Playwright could be set up as a tool for the assistant to perform actions on websites and assess the results?

@agenticmark 2 months ago

an agent that runs on GitHub issues and produces PRs and then deployment releases for me just to test each. a list of ports to match the PRs. Cursor is badass, but it doesn't do that yet

@johnMcKartney 2 months ago

This is the product that will be shipped to the public and will create a new revenue stream for the big 7, justifying the immense investments these companies made in their AI-training data centers. I'll give it less than a year and we will have at least one official release, if not built-in assistants in the new generation of phones and PCs. 12 months ago I tried building something like this but I couldn't manage to do it, because I'm just not good enough, so I'm glad that this will become a reality very soon. Exciting times to be alive.

@stevebim000 2 months ago

Another great video Dan. Could you build in interruption instead of push to talk? Also streaming output?

@RybkaZolotse 2 months ago

Any thoughts on something akin to "admin" level privilege to the speaker? Could ADA recognize one person's voice giving directives that had "junior" level permissions as a non-compliant condition?

@vastvitamins1966 2 months ago

Are you planning on putting together a paid course that goes step by step through setting this up? I would love to see that.

@uhtexercises 2 months ago

Great stuff, as always. Really love how this is evolving. Can't thank you enough for sharing the code. So much to learn from it. Let us know how we can give back.

@aerotheory 2 months ago

I'd like to get a feel for cost/token over productivity.

@ia7cast 2 months ago

Is uv compatible with miniconda? A tutorial on how to install this repo would be great.

@TimNagle 2 months ago

Insane. This is so good

@mrpocock 2 months ago

Do you have agents that can write, test, run, and debug a small code project, and bail back to you if they get stuck?

@IlkkaNisula 2 months ago

Just curious, did you track the total cost of this demo using the Realtime API and models? I tried a simple Realtime API app for just a few prompts and the cost was a whopping $2.5.

@henno6207 2 months ago

A relevant question, but I also imagine the price for this will plummet rapidly in the near future.

@2034-SWE 2 months ago

Realtime API alone is approx $15 per hour, but as the person above commented, it’ll go down. Fast. Look at GPT-4 vs 4o costs for example. View the current pricing as irrelevant. Instead, focus on how only

@psinke 2 months ago

very nice. Would this be possible as a plugin for something like VSCode? Or would you run into limitations on what a plugin is allowed to do?

@cyberthugFi 2 months ago

IDD is the new renaissance goat! I have questions tho: is the model only able to help you with code? Can you make it see through the camera with Python scripts ;), can it have real-world perception if I assign other tools using Python, Java, or other languages, and can I customize the AI personality? How much did it cost in tokens just to shoot this one video?

@FurfelOfficial 2 months ago

Hi, I really enjoy this series of videos! However, in the last one when I saw the pricing for the Realtime API, I thought: why not just use Whisper 3 turbo locally? The only thing needed would be to detect when the speech started and finished (with, let's say, a 1.5 second silence delay). I just tested it on an M1 and it works very well, and doesn't take that much memory.
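
A minimal sketch of the silence-delay idea described in this comment, not the commenter's actual code. It assumes audio has already been split into fixed-length frames and reduced to one energy value per frame; the function name `find_utterance`, the 0.1 second frame length, and the 0.02 energy threshold are hypothetical choices for illustration. Only the 1.5 second silence delay comes from the comment.

```python
from typing import Iterable, Optional, Tuple


def find_utterance(
    frame_energies: Iterable[float],
    frame_seconds: float = 0.1,      # assumed frame length
    energy_threshold: float = 0.02,  # assumed loudness cutoff
    silence_seconds: float = 1.5,    # silence delay from the comment
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the first finished utterance.

    end_frame is exclusive and points just past the last loud frame.
    Returns None if speech never starts, or never ends with enough silence.
    """
    silence_frames_needed = max(1, round(silence_seconds / frame_seconds))
    start = None      # index of the first loud frame
    last_loud = None  # index of the most recent loud frame
    quiet_run = 0     # consecutive quiet frames since last_loud

    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
            last_loud = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= silence_frames_needed:
                return start, last_loud + 1
    return None


# 0.3 s of quiet, 0.5 s of speech, then 1.5 s of silence ends the utterance.
energies = [0.0] * 3 + [0.5] * 5 + [0.0] * 15
print(find_utterance(energies))
```

The frames between the returned indices would then be handed to whatever local transcription model is in use. A short pause inside a sentence does not end the utterance, because the quiet-frame counter resets on every loud frame.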

@jefferystartm9442 1 month ago

Please, we need a Live Kit version. Also, have you checked E2B (bet you have not 😂)

@EdwinFairchild 2 months ago

man I was so close to playing with your code till I realized I don't have access to the Realtime API, which is weird since they usually give me other betas

@edzynda 26 days ago

What are your thoughts on Anthropic's new MCP spec? Do you see this becoming the standard for adding extra capabilities to LLMs?

@indydevdan 24 days ago

BIG if USEFUL. Video is coming. Working through the value prop.

@fbalco 1 month ago

are you triggering on "go ahead and ..." or is that just how you like to say it for videos?

@cameronyking 2 months ago

Can this be a VS Code extension? Would you build or could I port?

@tharunt3342 1 day ago

Can you provide a video with a complete guide?

@serafinalcantara5520 2 months ago

Would the assistant be as responsive and efficient if it used one of the large open source LLMs locally, like llama3.2/quen/visavis, or a quantized model with a speech2text -> inference -> text2speech pipeline? As always, great content 👍

@Aguiraz 2 months ago

man this is hella cool, but can you help us give it a UI (streamlit?) as well for text input? we often can't use a mic and this tool is too powerful to leave bound to the mic

@numbah16 2 months ago

Love it. It's Scrape* though dude, not Scrap lol

@louisduplessis5167 2 months ago

thx for the vid bro. what voice do u use ?

@christianreid7529 2 months ago

Thank you so much 🙏🙏

@grymvision3094 2 months ago

Forgive the question since I think you said it at some point, but what LLM are you using? Would this be viable with a local LLM? Obviously that depends on the complexity of the request, but in general, how viable would a local LLM be at this moment in time?

@ix4564 1 month ago

What is this exactly? What is the talking agent that can do these things?

@sephirothcloud3953 2 months ago

Can you please state the AI COST of your video EVERY TIME? Otherwise, I have to spend money every day testing whether it's affordable enough for my purposes

@bukitsorrento 2 months ago

Every AI content creator should know this basic principle.

@d1m18 2 months ago

Very good point. All these guys showing tech that we can't afford isn't very useful

@bukitsorrento 2 months ago

It will be useful for the future when costs go down by 90%. Dan is doing us a big favor by making this video, showcasing the technology despite the current high costs, using his own money.

@cyberthugFi 2 months ago

@bukitsorrento well said

@adamking6957 2 months ago

@bukitsorrento Why can he not then just say how much it cost? I mean, he hearted the comment, but he has not updated the description to include the cost. Why not just do it?