Build a ChatGPT-4o Voice Assistant with Groq, Llama3, OpenAI-TTS & Faster-Whisper

Рет қаралды 29,941

Ai Austin

Күн бұрын

Пікірлер: 141

@Powertech1511 4 ай бұрын

Bro your videos are crazy good and I am impressed with it

@foundsomething7 3 ай бұрын

kid its ai voice

@Nativewastaken 20 күн бұрын

@@foundsomething7 you cant create something like that

@damonegrate407 4 ай бұрын

@Ai_Austin... This is EXACTLY the kind of video coders and aspiring coders need. Love your style, plus, the work you did to build this isn't missed on me. Still stuck trying to get a successful screenshot test, but I'm pretty sure I can get this error cleared at some point. Can't believe I'm just finding your channel. Thanks so much.

@MakilHeru 4 ай бұрын

Well done. Even though this is a 23 minute video. It took me hours to reproduce everything you did. Great work man. Keep up the great work!

@royalgun8735 3 ай бұрын

Send the code

@vasvalstan 2 ай бұрын

@@royalgun8735 lazy ass :))) do your bit man, you're not going to learn anything, or much better, support the creator

@bens4446 4 ай бұрын

I come from a statistical programming background (R, Stata). For me, and probably lots of other python/developer noobs like me, this tutorial strikes the perfect balance between comprehensibility, thoroughness, and conciseness. The step by step development of basic code gradually replaced by more advanced code is just the approach I needed. Thanks for making this amazing AI stuff accessible to a wide audience!

@freddy29228 4 ай бұрын

This made me laugh out loud! Jarvis said it looks like a male adult, smiling and giving the middle finger ... (you have a wicked sense of humour) You have amazing coding skills, plus high video editing skills!

@seththunder2077 4 ай бұрын

Thanks for this austin. You are by far the most unique AI developer.

@nejihyuga5250 4 ай бұрын

Awesome video, thanks for the tutorial, this is exactly what I was looking for. I’m going to give it a try. I would like to give this a spin and use it to respond to my live streams 🤓

@Ai_Austin 4 ай бұрын

glad it was helpful!

@xhridhar 3 ай бұрын

@@Ai_Austinthanks for this video. Very helpful. I want to subscribe to your program. How many such videos can we expect every month?

@TuanNguyen-su5ty 4 ай бұрын

It's so crazy for making this video bro. We love it!

@einLasse 3 ай бұрын

Thank you so mutch!! I had the mentioned problem with PyAudio with the last voice assistant, but luckily this time you showed how to fix it

@SmartjinxKimani Ай бұрын

🎯 Key points for quick navigation: 00:00 *Introduction to building a voice-only interface for Google Gemini using Python.* 00:15 *The project involves combining various AI libraries, including an improved version of OpenAI's Whisper.* 00:28 *The assistant will use OpenAI's Text-to-Speech API for generating a human-like voice.* 01:09 *Pro channels on Discord will offer written tutorials and code blocks to accompany the videos.* 01:23 *Python version 3.11 is required for this project, and installation guidance is provided.* 01:36 *Instructions for installing essential Python packages for the voice assistant are outlined.* 03:27 *A simple program to interact with the Gemini API is created using the API key.* 06:45 *Configuration settings for Gemini's performance are adjusted, focusing on response randomness and output length.* 08:19 *Safety filters in Gemini can be turned off for less restricted interaction.* 09:59 *The OpenAI Text-to-Speech API setup requires account creation and credit management.* 14:33 *The program incorporates functions for both speech synthesis and transcription using Whisper.* 18:53 *A wake word detection system is implemented to activate the voice assistant.* 21:10 *The assistant is set to listen continuously, with a half-second delay between audio processing callbacks.* Made with HARPA AI

@kimeiya 4 ай бұрын

thanks for this man, it really is a great assistant model, having fun playing with it tho these comments are really stupid, its literally free, just takes an hr to follow the code if you'll want instant bs copy just pay easy anyways thanks for the video man!

@CM-zl2jw 3 ай бұрын

😂 she thinks you’re rude. Well done. Thanks for the chuckle.

@thelalomorales 4 ай бұрын

well that was fun!!!!!!!!! thanks for posting this. great video dude

@saylorinnovations99 4 ай бұрын

dude, you're like a fuckin LLM yourself..did you cut the vid that way or did you just steamroll it like that off the dome? wow you're a beast man

@longsyee 4 ай бұрын

Im impress with your video. I have tried a few library with the TTS instead using open AI.. I have tried PYTTSx3 & x4 GTTS(so far good) and conqui(very slow) By the way it is great.... Loooking forward for more video

@potaz02 27 күн бұрын

Same bro, what's the problem?

@potaz02 27 күн бұрын

@@MarkMaveicksame bro, what's the problem?

@miketrago4561 4 ай бұрын

"The overall tone of the image is coming across as rude and possibly intended to be disrespectful."

@stevetownley1 2 ай бұрын

Awesome video's! What spec box are you running these local LLM system on?

@6lack5ushi 4 ай бұрын

did the exact same thing but what did you use for images or is there a skip. because nothing I know is that fast with images? Just seen you use Flash, did not know it was that fast! thanks

@m.iranpour 3 ай бұрын

This is really amazing. Great job!

@dentelevate 4 ай бұрын

@Ai_Austin I am impressed and I like this video very much, but I have a question. I am not trying to be rude, but I am genuinely uninformed. How can we use this kind of assistant? How is it different from regular ChatGPT, excluding speed and the ability to take screenshots, see through the camera, and respond to a wake word? If it can see my screen, can I give it the power to drive my mouse and use my PC for me? Can we train it and create a database from all our actions and inputs for it? Can we save routines and make it remember previous tasks? Is there an assistant with the capability to open and close applications, record reminders and events, write and send emails hands-free, make calls, play music, search through Netflix for movies, take notes, search for files or browse the web, manipulate data in Excel tables, manage a CRM, pull data and create charts from Facebook Ads Manager data, and so on, like a regular human assistant built into a PC? If there isn't, can we create it starting from this video? Please let me know if this is possible before I dive into working with Python again.

@damonegrate407 4 ай бұрын

Everything you mentioned IS actually possible, and sure - this Assistant that @Ai_Austin built is a great start. There are only a handful of complete models that can perform all of the tasks this assistant can. Considering what I've used so far, in my opinion 'Multi-On' and maybe something like 'Hyperwrite' would be the closest to what you're describing, but @Ai_Austin gave us a superb foundation. For the ultimate assistant like you described, you'll have to take it from here and code it out yourself. Or link up with a few of us here and we can get it done together?

@ytgabster 3 ай бұрын

Dude I'm already 5 mins into ur clip and u got me , i ve subscribed and liked !! U are so specific going to straight to the root of the problem, I am a newbie and struggling with so many errors and challenges but sure as hell made a lot easier !!!!

@Myplaylist892 4 ай бұрын

Any glimpse in order to make it run remotely using a smartphone?

@rogerdingle9185 4 ай бұрын

I don't know squat about python or VS code but now I really want to learn!!

@loganwilliams4958 2 ай бұрын

Would it be possible to use this together with your unlimited memory technique and a RAG system? Or is the unlimited memory video going over a kind of rag system?

@GabouMorales-h4w 9 сағат бұрын

Hi , real doubt here , once becoming a pro member you share how to switch to pyttsx3 and does still has some quality or it decreases a lot ?

@Ai_Austin 8 сағат бұрын

the PRO tutorials have all of the code from the projects i create in the videos. as pyttsx3 is not a package i use in my implementation, no i do not have extra code implementations in the PRO tutorial. pyttsx3 will absolutely be far more robotic than openai

@FreakingClowning 4 ай бұрын

The day has come!!! ~~

@2121unclesam 4 ай бұрын

how to add pyttsx3 instead of openAI? in the def speak command would I enter the code there isntead of openAI code?

@MainoGates Ай бұрын

Ask ChatGPT with his same prompt at the beginning of the video.

@saylorinnovations99 4 ай бұрын

i'm trying to build an llm assistant to manage my security feeds and build profiles on ppl that come into the view of my feeds..do you think you can help me

@JankJank-om1op 4 ай бұрын

define 'profiles'. do you mean window shopper, paid/repeat customer, walking speed, number of times seen, facial expression/inferred mood, alone/relationship, or..? needless to say some indicators will be harder to code than others

@trilogen 3 ай бұрын

So this is not completely offline and your data/conversations private?

@lunala769 4 ай бұрын

Amazing work man thank you so much !

@AnjarMoslem 3 ай бұрын

wow that's pretty amazing!

@Larimuss 2 ай бұрын

Hell yeah, thanks bro this is awesome. 😮 could elevan labs could also be used for TTS?😢 local ollama setup would be good too instead of grog and another paid api.

@Izngd 2 ай бұрын

middle finger ---- this made me LOL.

@Belus1234 Ай бұрын

apparently the LLama3 model has a limmit can someone tell me how to set the used tokens down only from LLama3 or are there any other solutions, if so please let me know.

@Ai_Austin Ай бұрын

you can create a function to check the total tokens in convo before sending your request, if it exceeds their limit, remove messages until it doesn't then send the prompt.

@vidfan1967 4 ай бұрын

Great tutorial - I learned a lot. On first run, Jarvis understood my voice request and provided an answer. However, I could not even wake Jarvis up on 30+ attempts after that. It does not throw an error - but listens endlessly. Where should I look for the root cause? I am using VSC on a M2 Pro Macbook if that matters. Thanks for your kind help!

@Ai_Austin 4 ай бұрын

when running the program on your mac, there should be an icon in the top right of your screen with a mic icon. click that while the program is running and make sure VS code is using "voice isolation" mode. also helps to stop speaking for a few seconds before speaking your prompt. if you are speaking a bunch before, it will need transcribe all that audio which will increase the response time.

@vidfan1967 4 ай бұрын

@@Ai_Austin Thank you for the hint - that was part of the solution. I also changed the wake-word to "computer" which helped to get consistent voice activation.

@uiy8023 3 ай бұрын

I wonder how could u learn all these？Amazing work！

@CubacrazyProductions 4 ай бұрын

im getting this error on python degubber Exception has occurred: NameError name 'extract_prompt' is not defined File "C:\LLAMA3-OMNI-VOICE-ASSISTANT\assistant.py", line 168, in callback clean_prompt = extract_prompt(prompt_text, wake_word) ^^^^^^^^^^^^^^ NameError: name 'extract_prompt' is not defined no matter what i do to clean_prompt = extract_prompt(prompt_text, wake_word) i get that error

@tonyclif1 4 ай бұрын

Put the error into a LLM and ask it how to fix it. It might need your code in the chat as well.

@CubacrazyProductions 4 ай бұрын

@@tonyclif1 I did and got it to work but got 2 problems now..#1 rate limited to llama3 70b...so I'm trying to adapt the code to use llama3 locally on my 3090ti and #2 had to use pyttsx3 and ai listens all the time...had to remove the part when you call it by name..lol this is the biggest project that I have follow and made it this far lol

@tonyclif1 4 ай бұрын

@@CubacrazyProductions sorry, I'm a newbie to all of this so have no more suggestions. Good luck with it all.

@CubacrazyProductions 4 ай бұрын

@@tonyclif1 we are 2 newbies lol...thanks...this is a great project

@BiaBlanco-jb2ei 4 ай бұрын

can it handle interruptions from user?

@chelseatang5701 3 ай бұрын

Love your vids! When i try to pay for pro, the link doesn't work for me for some reason. Is it just me?

@Felix-co3jo 4 ай бұрын

Google Gemini does not work in my Country... Do you have an alternative? Great Video btw

@ljxiv 4 ай бұрын

its saying that segment_text is not defined on like 159, which is: text = ''.join(segment_text for segment in segments)

@charlescoonz7153 4 ай бұрын

brew install portaudio before the requirements.txt install because you will probably get a pyaudio error.

@milktots6933 4 ай бұрын

Amazing tutorial. Any suggestions as to why my script runs at inconsistent response times? Sometimes within 10 seconds, sometimes no response at all. Sometimes 30 seconds or longer. I can't pin it down

@seukseok 4 ай бұрын

Me too

@Ai_Austin 4 ай бұрын

could be slow computer, slow internet. helps to make sure you don't speak for a few seconds before your wake word so the program doesn't transcribe all that at once. if your computer has a voice isolation mode for your microphone (mac), enable that when the program is running

@milktots6933 4 ай бұрын

@@Ai_Austinended up being a noisy fan and a bad microphone. Thanks again homie

@Nativewastaken 17 күн бұрын

when i say the key word it doesnt respond

@BiaBlanco-jb2ei 4 ай бұрын

AMazinnnngggg!!!

@potaz02 29 күн бұрын

What if we want to change the voice of the AI to, for example, the voice of an anime character

@wolflycan-fz6gq 4 ай бұрын

you sir are a God

@StoicBooost333 3 ай бұрын

At last, My Jarvis is intelligent!!!

@CubacrazyProductions 4 ай бұрын

can you make a tutorial explaining how to actually run this completely locan on your pc? using ollama and pyttsx3? alot of folks rather just keep stuff local...thank you for this video...your channel is full of cool stuff glad i found it 🔥

@nexuslux 4 ай бұрын

second this :)

@mattstroker3742 4 ай бұрын

Better get good hardware

@Ai_Austin 4 ай бұрын

probably not worth a whole video because of this comment above from matt, and the code changes from groq to ollama and openai to pyttsx3 is like 10 lines of code changing. for an open source multimodal llm, i can't even give recommendations because there is no way my mac is running that. so gonna need a couple more of you to join the pro membership if you want to see content that requires me to buy a car priced gpu.

@CubacrazyProductions 4 ай бұрын

@@Ai_Austin i manage to integrate ollama with the same ai helping lol

@psenej 4 ай бұрын

how many it cost ?? with all cloud ai model. possibly to do this completely free with open source ai and gpu based ?

@Ai_Austin 4 ай бұрын

the only payed service is openai tts, which i explain a free open source alternative. if you wanted to run all of this locally, you will need a $15k pc build minimum. if cost is your concern, you are over thinking, because this is exponentially cheaper than your intuition

@psenej 4 ай бұрын

@@Ai_Austin grok, openai and gemini are not free to use ? With a good pc gamer i can run llama8b, it is not the powerfull but with the most common task it is good So i m searching a free whisper and a free image recognition It’s for the science 😂 if it doesn’t work i can make your solution !

@psenej 4 ай бұрын

Maybe you dont hit the rate limit and it’s free but the privacy (i know it is not fun but essential) is not good if you send your screen or your webcam

@CM-zl2jw 3 ай бұрын

I have a feeling this isn’t your own voice. And why would a guy so amazing with AI struggle with putting KZbin videos together? No dis. Just a question. Your videos look amazing.

@Ai_Austin 3 ай бұрын

it is my real voice. because ai can't do anything to speed up my videos production with current ai. can't write this complex of code, can't write half as good of a video script, certainly cannot edit this. ai voices can and have been used on my channel but it's faster to use my own voice

@charlescoonz7153 4 ай бұрын

Awesome work. Finally done. Is there a version with the free voice modeling been used instead of the openai paid one?

@damonegrate407 4 ай бұрын

This is the version you're looking for. He substitutes Llama3 in the video.

@maloukemallouke9735 4 ай бұрын

Excellent

@Darksagan 4 ай бұрын

Fire video and I dont know shit about coding.

@ss_edits0612 3 ай бұрын

Bro how to create chat bot and voice recognization bro please reply this is my project

@morabargav1544 4 ай бұрын

is there any way that i could control my wifi switches with yours ai assistant

@ItsBillTV 4 ай бұрын

Discord channel invite link doesnt work, neesds to be renewed.

@tindertoast1553 4 ай бұрын

banger

@antaishizuku 4 ай бұрын

Where is the Github or code?

@Ai_Austin 4 ай бұрын

my discord, for people that pay to be apart of my pro membership. i make code tutorials for people trying to learn for free. if you just want to click on my videos to take my code, you can skip that join the pro membership and take it in a mutual value exchange for my work.

@Evermysticdesigns 3 ай бұрын

is it possible to assign a custom made voice ?

@Ai_Austin 3 ай бұрын

absolutely possible. you would need something like the coqui library and a very fast pc to run those local models fast enough for a voice assistant

@Evermysticdesigns 3 ай бұрын

@@Ai_Austin i want to use a custom voice for this project kzbin.info/www/bejne/bJewYmqanKd-os0si=I-DA5baUxfiozOR1

@Evermysticdesigns 3 ай бұрын

@@Ai_Austin I tried to send you the long for my KZbin video but it got wioed off , I made a full side droid and I want him to become a fully functional AI

@wallmemes 2 ай бұрын

how do i encorporate lamma 405

@lostsoul8634 4 ай бұрын

Good video. I have found a new TTS service (cartesia ai) with a delay of 135 ms and good voice quality, and most importantly... if I understood correctly, the model has emotion control. Please create a new video or text guide on how to integrate the cartesia ai Sonic API. And I want to know if it's worth paying for this service... maybe it's not as good as they claim.

@Ai_Austin 4 ай бұрын

probably not going to recreate this whole assistant with just a new tts engine. if you feel that it is something you want to add, consider this getting you 95% of the way there. the whole reason i make these code tutorials, is because it's a starter project for you guys to add your ideas. do you want to learn to code or just get me to build your ideas ?

@damonegrate407 4 ай бұрын

@@Ai_Austin For real, lol.

@Jj-qx1cj 4 ай бұрын

Is groq api free?

@MilkGlue-xg5vj 4 ай бұрын

Yes

@Ai_Austin 4 ай бұрын

yes.

@MilkGlue-xg5vj 4 ай бұрын

@@Ai_Austin Yes

@Jj-qx1cj 4 ай бұрын

Hell yeah

@rangerstamizha 2 ай бұрын

Y r my inspiration bro ❤

@JohnDoe-zx8bu 4 ай бұрын

Why don't you use type hints?

@__________________________6910 4 ай бұрын

Your voice is deep

@mbegangsylvain1076 4 ай бұрын

❤❤❤

@appaesthetics1277 4 ай бұрын

This was a good tutorial , a mini tutorial for more advanced users with less explaining would be cool aswell it seems too simplified

@officiallysmooth2221 4 ай бұрын

is there a video on how I can use pyttsx3 instead of openai?

@damonegrate407 4 ай бұрын

That's literally what he suggests you use in the video.

@curiousfurious4162 Ай бұрын

can anyone give me the code, mine is not working.

@kieuphongeg 21 күн бұрын

Easier way, we can use livekit, it's truly opensource

@BrainFlexQuizzes 4 ай бұрын

Sir it free?

@integrateeverything 4 ай бұрын

Please make real time video conversation using generative ai

@Powertech1511 4 ай бұрын

Second Comment

@NLPprompter 4 ай бұрын

a male middle finger you while writing codes what could get wrong? +1 subscriber

@mahdinahmed-zf5yw 2 ай бұрын

where is the time stamp?

@Ai_Austin 2 ай бұрын

00:00

@hobologna 4 ай бұрын

we still don't have it, and i'm already over it. Copilot Pro is even more insufferable than chatting with GPT4.

@Maesdy01 4 ай бұрын

First comment

@dimasdianugrah4704 4 ай бұрын

easy if all use paid services 😅. thumb down

@Ai_Austin 4 ай бұрын

literally every service is free besides the service i gave a free alternative to. groq & gemini both have free unlimited trials for their api. faster-whisper = free and local. you do not need to use openai tts, you can use pyttsx3 like i recommended. your criticism is incorrect, so your dislike is irrelevant. thank you 🙏😮‍💨