How to build a real-time AI assistant (with voice and vision)

Views: 19,562

Underfitted

1 day ago

Comments: 92

@toddroloff93 2 months ago

Incredible video. You're taking your content to the next level. Keep up the good work, and thank you for all you do.

@ChefDomein 1 month ago

Sir, your AI voice assistant demos are some of the most valuable and appreciated YouTube videos I have come across. Please keep them coming; it would also be great to see a demo with Groq to solve the latency issue. You are doing great work, man, and your students really appreciate it! Thanks a lot, brother!

@ronsinolast 2 months ago

Hi, this is great. I tried it and it's working. I introduced myself, then asked, "Do you know my name?" The response was, "I'm not able to remember past conversations." So, can we make it remember conversations, and also "remember" my face?

@scott701230 1 month ago

That's actually a good question. I saw a paper published about persistent memory: short-term memory, medium- and long-term memory, and building an automated RAG system that automatically moves information into long-term memory, so we can create an assistant that's goal-oriented and proactively manages you toward meeting your goals.

@dheerajmadaan866 1 month ago

This was really cool stuff. Thanks for sharing such quality content. I ran it in VS Code and it worked. The main problem is the latency: each exchange took about 10 s. I'm not sure whether that is because of the free account or because their WebSocket API has an issue.

@riemannderakhshan1037 2 months ago

You've taken your videos to the next level, which is pretty amazing. If possible, could you show us how to use open-source models in these apps? Thank you in advance.

@davieslacker 2 months ago

Really cool stuff... I def plan to recreate some of these things along with you when I have a bit more time at my computer. Just a thought, adding screen capture in with this would be pretty cool too to get help with whatever applications you're in... I would imagine you could include both camera and screenshot images in the same context and it should be able to distinguish which you're asking about.. or build a different tool that it can function call for that. Can't wait until we get some slightly more expressive voices as an option like OpenAI teased us with.

@ashishtandi4440 1 month ago

Incredible! I tried the same thing, but there is a noticeable delay. I am not sure if it is the TTS, the STT, or the LLM API itself, while yours and the default demo at LiveKit are damn fast.

@bimalnair 18 days ago

Absolutely fabulous! Thanks for making this one! I loved it!!

@moacirosa 2 months ago

Amazing content with solid explanation. Thanks very much 👏

@underfitted 2 months ago

Glad you liked it!

@iitjeephysics2789 1 month ago

On the line "from livekit import agents, rtc" I am getting this error: ImportError: cannot import name 'agents' from 'livekit' (unknown location).

@codemonkey2k5 17 days ago

Is there any way you could do a version of this that can use a locally run Ollama server? Even if it means that I lose the image feature.

@johanransbygranberg2219 1 day ago

My first thought too!!

@SaddamBinSyed 1 month ago

Hi, thanks for the nice video. Can we use a local LLM (like Ollama) instead of a paid one?

@solanobordim 1 month ago

This could be very useful for blind people. Thank you

@vincentzhong3920 1 month ago

Sir, what is the latency? Is it the same as in GPT-4o's demo, or much longer?

@huangphoenix 1 month ago

Great video, keep going. Just wondering if you can add a barge-in function?

@mehershahzad-n5s 1 day ago

Amazing 🙂

@sumitdevraye9725 2 months ago

Great video. Keep these coming.

@7BlackJack8 2 months ago

Can this be used with Google Flash? Thanks for the super content! ❤

@amazingvideos4824 20 days ago

Man, this is amazing. Can we deploy it to the cloud so it works from anywhere? I deployed it to Heroku, but it's not accessing the webcam.

@ridhwanbakare3406 2 months ago

This is really cool. As someone with Python knowledge, how would you suggest I get started? Are there any roadmaps or videos you have published?

@jameszhang2832 2 months ago

Fantastic, thank you very much. How would you adapt your code if you have multiple participants?

@user-yh2uz6fd7l 2 months ago

Great info, thanks! Is there a way to use a local LLM (like Ollama, local AI, etc.) on this platform instead of OpenAI?

@ainewsera 9 days ago

I need this but with a face to talk to me in real time. Can you do this?

@jimmywang6177 2 months ago

Very interesting! Thank you!

@mehmetbakideniz 1 month ago

Great video, as always. Does this system keep chat history?

@edgarl.mardal8256 2 months ago

Hi, I am working on creating a closed LAN, using peer-to-peer, and will add a live AI agent, stored locally, that gets its knowledge from an LLM. I wonder whether it is possible to run this kind of system without using the internet?

@ccouto2869 2 days ago

Oh my god, amazing, thanks

@abdiasj3692 2 months ago

Would love to see how to implement Deepgram TTS instead of OpenAI!

@underfitted 1 month ago

It’s actually very simple: simpler than what I had to do to get OpenAI working

@abdiasj3692 1 month ago

@@underfitted Hey, thanks for replying! This would be awesome! Also maybe using OpenRouter as well! Wild ideas come to mind!

@andriusem 2 months ago

Hi, great video! How do I change the source code so that it captures my screen (desktop)? Thanks.

@sharplcdtv198 2 months ago

Your code generally doesn't run in VS Code on Windows... some things seem platform-dependent, unfortunately.

@underfitted 2 months ago

I don’t think it’s a problem with my code… it’s a problem with Windows. Try WSL.

@aaronwenniger7966 2 months ago

Now I keep running into trouble when using this code. I would love to be able to discuss this so I can get it fixed; I want to implement some features to see if it can work for something else too.

@AI_by_AI_007 2 months ago

Yes, the API keys do not pass. What are you experiencing?

@aaronwenniger7966 2 months ago

@@AI_by_AI_007 Hi, yes. I had to rework the code a little bit to get everything working again. Now it's working great, except that the AI's voice is not working and I cannot give voice commands anymore.

@Noahperaudon 2 months ago

@@aaronwenniger7966 How did you set up the LiveKit API key?

@Noahperaudon 2 months ago

How do I get the LiveKit API key?

@aaronwenniger7966 2 months ago

@@Noahperaudon ?

@reynoldoramas3138 2 months ago

Hello Santiago, greetings from Cuba. I just saw on your GitHub profile that you are a fellow countryman. Your content is very valuable; I'm an AI engineer here trying to get ahead in this world. I would love to get in touch with you and help you with a project.

@insitegd7483 2 months ago

Thank you, it is very interesting.

@rithikkumar7683 2 months ago

I hope we can use Gemini 1.5 Pro? I will try to make these changes in the old code.

@AmitMarx-ei8tt 1 month ago

Got stuck with the API keys; I'm not sure how to set them.

@twetemomedical9500 18 days ago

Is anyone else not getting a response on the interface? It's registering my commands but giving no response.

@user-yp8qr6kj2e 1 month ago

I wish the code would process the timeline.

@jock21341 2 months ago

Sir, can you help me? Why isn't my assistant talking back? Nothing is happening, but it recognizes what I'm saying in the chat.

@jeff_holmes 2 months ago

Curious about the latency. I noticed that you cut the video after each question (after 19:55), so I am assuming it was a few seconds?

@underfitted 2 months ago

It wasn’t bad, but GPT-4o is not as fast as it could be, so you definitely have to wait a second or so for an answer

@vesalaasanen2158 2 months ago

@@underfitted, it would be nice to add at least one answer in real time so we get a more realistic picture of it.

@theprocess-YT 2 months ago

So, I cannot code. Can you make a tutorial using Phi-3, which is free and has vision, and also use Whisper to convert speech to text, along with other free tools, to bring the cost down to zero? I am a student trying out this stuff, and I don't want to pay, or don't have the money to pay, for the API or other things, so please make a tutorial using all free and open-source tools.

@juanmanuelzwiener4447 1 month ago

Santiago, are the assistant's voices only in English, or also in Spanish? A hug, champ!

@underfitted 1 month ago

They speak Spanish too

@dmitrypehovski 2 months ago

Hi, I started testing by following all your steps and got stuck: the text and audio from the OpenAI API are not transferred to LiveKit. All requests go through in the terminal. I tried many solutions, but nothing works.

@densonsmith2 2 months ago

I think I may have a similar issue on Windows; there is some problem with the ffmpeg library.

@sr.modanez 2 months ago

Thank you, fantastic video 👏👏👏👏👏👏👏👏👏

@Brou15O 2 months ago

Could I get this on my smartphone?

@underfitted 2 months ago

As is, no. You’ll need to rewrite it in a phone-friendly language

@rxWar 2 months ago

Nice, man, thanks

@boooosh2007 2 months ago

Is this functionally any different than your previous video?

@underfitted 2 months ago

While they work the same for the demo, my previous code is very brittle. This one is much better because I’m using an entire existing infrastructure to support it.

@danieladama8105 2 months ago

This is great!

@apdurden 1 month ago

Yeah, this is cool but not helping. I can open the LiveKit interface but can't find a way to get the agent to connect. The API keys are all correct.

@apdurden 1 month ago

I think the track management piece has changed since you made this. I'm running into a missing local_participant attribute on the Room object.

@nmstoker 1 month ago

It's a shame that the hook with these videos is to start with open source and then draw people into handling supporting functions via a commercial platform (for $$$).

@jsanti1000 1 month ago

Dangggggggg!!!

@aidanthompson5053 2 months ago

@densonsmith2 2 months ago

Has anyone gotten this to work on Windows?

@LesBrickodeurs 1 month ago

No, I've got an error at line 12.

@LesBrickodeurs 1 month ago

from livekit.agents.voice_assistant import AssistantContext, VoiceAssistant ImportError: cannot import name 'AssistantContext' from 'livekit.agents.voice_assistant' (D:\Github\livekit-assistant\.venv\Lib\site-packages\livekit\agents\voice_assistant\__init__.py)

@davidkeane1820 1 month ago

@@LesBrickodeurs Yes, the SDKs have changed and a lot of this no longer works out of the box. I'm still playing around, but it probably needs redoing.

@Noahperaudon 2 months ago

Hey, I have an issue with the LiveKit API key: it's giving me an error saying it's invalid.

@AI_by_AI_007 2 months ago

Me as well. Are you on Windows or Mac when you try this?

@Noahperaudon 2 months ago

@@AI_by_AI_007 Windows

@rahahoseini1523 1 month ago

@@AI_by_AI_007 How can I access the API keys? Could you please tell me step by step?

@bhaskerbobby 24 days ago

Hi, I'm getting the following error: {"message": "draining worker", "level": "INFO", "id": "unregistered", "timeout": 60, "timestamp": "2024-08-16T02:26:14.669243+00:00"} {"message": "shutting down worker", "level": "INFO", "id": "unregistered", "timestamp": "2024-08-16T02:26:14.670548+00:00"}

@YounessArjoune 1 day ago

How did you get past this error, if you ever did?