Awesome! Time to replace my slow speech-to-speech code using OpenAI. Also, I added ElevenLabs for a bit of a comedic touch. Thanks for putting this together.
@mayushi7792Ай бұрын
How much did it cost you to integrate ElevenLabs?
@NoLimitYou7 ай бұрын
Too bad you take open source and make it closed.
@mblend277 ай бұрын
Explain?
@NoLimitYou7 ай бұрын
@@mblend27 You take openly available code and ask people to become members to receive the code for what you demo, which is built on that open source code. The whole idea of open source is that everyone contributes without putting it behind walls.
@Ms.Robot.7 ай бұрын
You can in several ways.
@NoLimitYou7 ай бұрын
You take open source and make something with that and put it behind a wall.
@TheGrobe7 ай бұрын
@@mblend27 You make someone pay to access something on GitHub that you composed of open source components.
@nyny8 ай бұрын
That's supah cool, I actually built something almost exactly like this yesterday. I get about the same performance. The hard part is figuring out threading/process pools/asyncio to get that latency down. I used small instead of base. I think I get about the same response or better.
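For readers wondering what the threading/asyncio juggling mentioned here might look like, below is a minimal sketch, not the commenter's or the video author's code. It assumes three blocking stages (speech-to-text, LLM, text-to-speech); `transcribe`, `generate_reply` and `synthesize` are hypothetical placeholders. Each stage runs in a worker thread via `asyncio.to_thread` (Python 3.9+) and stages are connected by queues, so a later chunk can be transcribed while an earlier one is still being answered.

```python
import asyncio

# Hypothetical placeholder stages; real ones would call Whisper, an LLM and a TTS engine.
def transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")

def generate_reply(text: str) -> str:
    return f"reply to: {text}"

def synthesize(text: str) -> bytes:
    return text.encode("utf-8")

async def stage(func, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Run a blocking function in a worker thread for each queued item."""
    while True:
        item = await inbox.get()
        if item is None:              # sentinel: forward it and stop this stage
            await outbox.put(None)
            return
        await outbox.put(await asyncio.to_thread(func, item))

async def run_pipeline(chunks: list[bytes]) -> list[bytes]:
    q_audio, q_text, q_reply, q_speech = (asyncio.Queue() for _ in range(4))
    workers = [
        asyncio.create_task(stage(transcribe, q_audio, q_text)),
        asyncio.create_task(stage(generate_reply, q_text, q_reply)),
        asyncio.create_task(stage(synthesize, q_reply, q_speech)),
    ]
    for chunk in chunks:
        await q_audio.put(chunk)
    await q_audio.put(None)           # no more input

    results = []
    while (out := await q_speech.get()) is not None:
        results.append(out)
    await asyncio.gather(*workers)
    return results
```

Calling `asyncio.run(run_pipeline([b"hi", b"there"]))` pushes both chunks through all three stages in order. Threads only help when the stages release the GIL (native inference libraries usually do); for pure-Python CPU work a process pool would be the closer fit.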
@Baptiste__1Trocquet8 ай бұрын
Hi! Very impressive! Do you have a GitHub to share your code?
@CognitiveComputations7 ай бұрын
Can we see your code, please?
@limebulls7 ай бұрын
I'm interested in it as well.
@williamjustus26548 ай бұрын
Some of the best work and fun that I have seen so far. Can't wait to try on my own. Keep up the great work!!
@LFPGaming8 ай бұрын
Do you know of any offline/local way to do translations? I've been searching but haven't found a way to do local translations of video or audio using large language models.
@deltaxcd7 ай бұрын
There is a program, "Subtitle Edit", which can do that.
@MelindaGreen7 ай бұрын
I'm daunted by the idea of setting up these development systems just to use a model. Any chance people can bundle them into one big executable for Windows and iOS? I sure would love to just load-and-go.
@zyxwvutsrqponmlkh7 ай бұрын
I have tried OpenVoice and Bark, but VITS by far makes the most natural-sounding voices.
@ales2408 ай бұрын
Just subscribed! can't wait to get my hands on it, looks super cool!
@swannschilling4748 ай бұрын
I am still using Tortoise, but OpenVoice seems promising! 😊 Thanks for this video!! 🎉🎉🎉
@deeplearningdummy7 ай бұрын
I've been trying to figure out how to do this. Great job. I want to support your work and get this up and running for myself, but is KZbin membership the only option?
@tommoves33858 ай бұрын
Hey Kris - that is awesome. I like it very much. Great that you do this open source stuff. Very cool 😎.
@PhillipThomas878 ай бұрын
I mean, this is dependent on your hardware... Are the specs for this "inference server" listed anywhere?
@ryanjames39078 ай бұрын
very cool, low latency voice, thanks for sharing, i watch all your videos, and i look forward to the next one,
@DihelsonMendonca2 ай бұрын
That's wonderful. I wish I had the knowledge to implement that on my LLMs in LM Studio.
@denisblack98978 ай бұрын
I've known about this for more than a year now and it still blows my mind. wtf
@SaveTheHuman57 ай бұрын
Hello, could you please tell us what your CPU, GPU, RAM, etc. are?
@JohnSmith762A11B8 ай бұрын
I wonder if you are (or can, if not) caching the processed .mp3 voice model after the speech engine processes it and turns it into partials. That would cut out a lot of latency if it didn't need to process those 20 seconds of recorded voice audio every time. Right now it's pretty fast but the latency still sounds more like they are using walkie talkies than speaking on a phone.
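The caching idea in this comment could be sketched roughly as follows. This is an illustration only: `extract_embedding` is a hypothetical stand-in for whatever slow step turns the reference recording into a voice embedding, and nothing here reflects how OpenVoice or the video's script actually stores it. The cache is keyed by a hash of the audio bytes, so the expensive step runs once per distinct reference clip.

```python
import hashlib

# In-memory cache: SHA-256 of the reference audio -> computed embedding.
_cache: dict[str, list[float]] = {}
stats = {"extractions": 0}   # counts how often the slow step really runs

def extract_embedding(audio: bytes) -> list[float]:
    """Hypothetical stand-in for the slow voice-analysis step."""
    stats["extractions"] += 1
    return [b / 255 for b in audio[:4]]

def cached_embedding(audio: bytes) -> list[float]:
    """Return the embedding for this audio, computing it only on first use."""
    key = hashlib.sha256(audio).hexdigest()
    if key not in _cache:
        _cache[key] = extract_embedding(audio)
    return _cache[key]
```

To survive restarts, the same key could name a file on disk that holds the serialized embedding; whether that saves meaningful time depends on how long the real extraction takes.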
@levieux11378 ай бұрын
It could go way further by using the native libs and dropping all the Python-based wrappers that pass data between stages using files and that copy, copy, copy and recopy data all the time. For example, llama.cpp is clearly recognizable in the lower layers; all the tunable parameters match it. I don't know about OpenVoice, for example, but the state the presenter has arrived at shows that we're pretty close to a DIY conversational robot, which is pretty cool.
@JohnSmith762A11B8 ай бұрын
@@levieux1137 By native libs, do you mean the system TTS on, say, Windows and macOS?
@levieux11378 ай бұрын
@@JohnSmith762A11B Not necessarily that; I'm speaking about the underlying components that are used here. In fact, if you look, this is essentially Python code built as a wrapper on top of other parts that already run natively. The llama.cpp server, for example, is apparently used here. And once everything is wrapped in layers and layers, it becomes heavy to transport content from one layer to another (particularly when passing via files, but even memcpy is expensive). It might even be possible that some elements are reloaded from scratch and re-initialized after each sentence. The Python script here appears to be mostly a wrapper around all such components, working like a shell script: recording input from the microphone to a file, then sending it to OpenVoice, then sending that output to a file, then loading another component with that file, etc. This is just like a shell script working with files and heavy initialization at every step. Dropping that whole layer and directly using the native APIs of the various libs and components would be way more efficient. And it's very possible that past a certain point the author will discover that Python is not needed at all, which could suddenly offer more possibilities for lighter embedded processing.
@zedboiii4 ай бұрын
that's some Bethesda level of conversation
@yoagcur8 ай бұрын
Fascinating. Any chance you could upgrade it so that specific voices could be used and a recording made automatically? Could make for some interesting Biden v Trump debates.
@edgarl.mardal82562 ай бұрын
I'll buy a patron membership if you set up Rasa with this model. Since she lacks IQ and structure, I would recommend Rasa and using sales techniques to make her sound more logical. By that I mean spinning.
@irraz15 ай бұрын
wow! I would love to have such an assistant to practice languages. The “python hub” code, do you plan to share it at some point?
@josephtilly2585 ай бұрын
Really interesting. A lot of it I can't understand because I don't know coding, but speech to speech could be a big thing within a few years.
@TomM-p3o8 ай бұрын
This is great. But personally I think a speech recognition with push to talk or push to toggle talk is most useful.
@lokiwhacker7 ай бұрын
Thought this was really cool, love open source. But this really isn't open source if you're hiding it behind a paywall... smh
@codygaudet80716 ай бұрын
Just earned yourself a sub sir!
@MiguelCayazaya2 ай бұрын
Thanks there are those who go to war and become heroes and those who don't but still write programs
@arkdirfe7 ай бұрын
Interesting, this is similar to a small project I made for myself. But instead of a chatbot conversation, the whisper output is fed into SAM (yes, the funny robot voice) and sent to an audio output. Basically makes SAM say whatever I say with a slight delay. I'm chopping up the speech into small segments so it can start transcribing while I speak for longer, but that introduces occasional weirdness, but I'm fine with that.
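The "chopping up the speech into small segments" step described here can be illustrated with a small helper. This is a sketch under assumptions, not the commenter's code: `split_segments` is a hypothetical name, audio is treated as a plain list of samples, and a small overlap between neighbouring segments is added as one common way to soften the "occasional weirdness" at segment boundaries.

```python
def split_segments(samples, segment_len, overlap=0):
    """Split a sample sequence into fixed-size segments.

    Consecutive segments share `overlap` samples, so a word cut at a
    boundary appears whole in at least one neighbouring segment.
    """
    if segment_len <= 0 or not 0 <= overlap < segment_len:
        raise ValueError("need segment_len > overlap >= 0")
    step = segment_len - overlap
    segments = []
    start = 0
    while start < len(samples):
        segments.append(samples[start:start + segment_len])
        if start + segment_len >= len(samples):
            break                     # this segment already reached the end
        start += step
    return segments
```

With 16 kHz audio, `split_segments(samples, 32000, 3200)` would give 2-second segments with 0.2 seconds of overlap; the overlap means some words get transcribed twice, so the transcripts need de-duplication when stitched together.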
@squiddymute8 ай бұрын
no api = pure genius
@DoNotTredOnMe2 ай бұрын
I'd love to see a video of two AIs conversing with one another.
@matthewfuller97605 ай бұрын
I think at even 1/3rd the speed with my rtx titan it would run just fine to learn a new language. Waiting 3 seconds is perfectly acceptable as a novice language learner.
@kleber19836 ай бұрын
Hi, I'd like to know the computer specs required to run your speech-to-speech system. I'm quite interested, but I need to know first if my computer can handle it. Thanks.
@OdikisOdikis7 ай бұрын
The predefined answer timing is what makes it not feel like a real conversation. It should answer at random intervals, the way a human thinks of something and only then answers. Randomizing the timings would create more realistic conversations.
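One way to read this suggestion is a "thinking delay" that grows with reply length and carries some random jitter. The sketch below is purely illustrative: `thinking_delay` and its default values are assumptions, not anything from the video.

```python
import random

def thinking_delay(reply: str, rng: random.Random,
                   base: float = 0.3, per_word: float = 0.05,
                   jitter: float = 0.5) -> float:
    """Seconds to wait before speaking: a fixed base, a little extra per
    word of the reply, plus a random amount between 0 and `jitter`."""
    return base + per_word * len(reply.split()) + rng.uniform(0, jitter)
```

A caller would then do something like `time.sleep(thinking_delay(reply, random.Random()))` before starting playback. Note that this adds latency on purpose, which works against the low-latency goal of the project.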
@duffy6664 ай бұрын
I really like it! Is this already on GitHub for members? (I could not find it.)
@TheDailyMemesShow2 ай бұрын
OMG, I just noticed I've watched a gazillion videos of yours. Why haven't I subscribed, though? I swear I thought I had done it before. Something's not adding up here...
@ProjCRys8 ай бұрын
Nice! I was about to create something like this for myself but I still couldn't use OpenVoice because I keep failing to run it on my venv instead of conda.
@Zvezdan888 ай бұрын
How do you even install OpenVoice?
@normanalc3 ай бұрын
I'd like to get a copy of the script please, this one is really cool! thanks for sharing this.
@aladinmovies7 ай бұрын
Good job. Interesting video
@aestendrela8 ай бұрын
It would be interesting to make a real-time translator. I think it could be very useful. The language barrier would end.
@deltaxcd7 ай бұрын
Meta did it already; they created a speech-to-speech translation model.
@fire171028 ай бұрын
Would love to see some realtime animations to go with the voice, could be a face, but also can be minimalistic (like the R1 rabbit).
@wurstelei13568 ай бұрын
You need a second GPU for this. Let's say you put Stable Diffusion on it. Displaying a robot face with emotions would be nice.
@leucome8 ай бұрын
Try Amica AI. It has a VRM 3D/VTuber character and multiple options for the voice and the LLM backend.
@fire171026 ай бұрын
@@leucome does it work locally in real time?
@fire171026 ай бұрын
@@wurstelei1356 Again, I think a minimalistic animation would also do the trick, or prerendering the images once and using them in the appropriate sequence in real time.
@leucome6 ай бұрын
@@fire17102 Yes it can work in real-time locally as long as the GPU is fast and has enough vram to run the AI+Voice. It can also connect to online service if required. I uploaded a video where I play Minecraft and talk to the AI at same time with all the component running on a single GPU.
@LadyTink7 ай бұрын
Kinda feels like something the "rabbit R1" does with the whole fast speech to speech thing
@weisland28077 ай бұрын
Would be funny if you had this in games, like the people on the streets of GTA having convos fueled by something like this. Maybe it's already happening tho, I'm not in the know. Awesomesauce!
@kumar.jayanti97003 ай бұрын
Hi Kris, where is the GitHub code for this one? I could not locate it in the member GitHub.
@64jcl7 ай бұрын
Surely the response time is a function of what rig you are doing this on - an RTX 4080 as you have is no doubt a major contributor here, and I would guess you have a beast of a CPU and high speed memory on a newer motherboard.
@microponics26958 ай бұрын
I have the uncensored model the same one and when I ask it to list curse words it says it can't do that. ???
@jungen10937 ай бұрын
Lmao that’s annoying
@cmcdonough24 ай бұрын
This was great 😃👍
@MegaMijit7 ай бұрын
this is awesome, but voice could use some fine tuning to sound more realistic
@ArnaudMEURET7 ай бұрын
Just to paraphrase your models: “Dude ! Are you actually grabbing the gorram scrollbars to scroll down an effing window !? What is this? 1996 ? Ever heard of a mouse wheel? You know it’s even emulated by double drag on track pads, right?” 🤘
@MrScoffins7 ай бұрын
So if you disconnect your computer from the Internet, will it still work?
@jephbennett7 ай бұрын
Yes, this code package is not calling any APIs (which is why the latency is low), so it doesn't need an internet connection. The downside is that it cannot access info outside of its core dataset, so no current events or anything like that.
@darik314 ай бұрын
Thanks for sharing this mate! I wonder if the code is available somewhere? If so, could you please provide a link? Thanks
@researchforumonline7 ай бұрын
wow very cool! Thanks
@alexander1912977 ай бұрын
I swear on my mother’s grave lol… this AI is hilarious! 😂😂😂
@jacoballessio57067 ай бұрын
I wonder if you could directly convert embeddings to speech to skip text inference
@JohnGallie7 ай бұрын
Is there any way you can give Python 90% of system resources so it would be faster?
@Jesulex822 ай бұрын
Is this a model you can download to talk with the AI? Can you play "ro" with it? Does it speak Spanish?
@mastershake27828 ай бұрын
I am trying to clone a voice from a reference audio file, but despite following the standard process, the output doesn't seem to change according to the reference. When I change the reference audio to a different file, there's no noticeable change in the voice characteristics of the output. The script successfully extracts the tone color embeddings, but the conversion process doesn't seem to reflect these in the final output. I'm using the demo reference audio provided by OpenVoice (male voice), but the output synthesized speech remains in a female voice, typical of the base speaker model. I've double-checked the script, model checkpoints, and audio file paths, but the issue persists. If anyone has encountered a similar problem or has suggestions on what might be going wrong, I would greatly appreciate your insights. Thank you in advance!
@tag_of_frank7 ай бұрын
Why LM Studio over OogaBooga? What are the pros/cons of them? I have been using Ooga, but wondering why one might switch.
@乾淨核能2 ай бұрын
what's the GPU requirement to achieve real time response? thank you
@deltaxcd7 ай бұрын
I think that to decrease latency further you need to make it start speaking before the AI finishes its sentence. Unfortunately there is no obvious way to feed it a partial prompt, but waiting until it finishes generating the reply takes way too long.
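On the output side, the "speak before the AI finishes" idea is usually done by streaming tokens from the model and handing each completed sentence to the TTS engine as soon as it appears. The sketch below shows only the sentence-splitting part, under the assumption that the LLM backend can stream its reply token by token; `sentences_from_stream` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")

def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as its closing punctuation arrives,
    so speech synthesis can start while the model is still generating."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():                # flush whatever is left at the end
        yield buffer.strip()
```

This naive rule also splits on abbreviations and decimals ("e.g.", "3.5"), so a real pipeline would want a smarter boundary check, but it is enough to let the first sentence play while the rest is still being generated.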
@SonGoku-pc7jl8 ай бұрын
Thanks, good project. Can Whisper translate my Spanish to English and back to Spanish directly with a small change in the code? And do I need to change something in the TTS as well? Thanks!
@suminlee65767 ай бұрын
Do you have a video showing how to do this step by step? I was going to become a paid member, but I couldn't find a how-to video on your paid channel.
@Abhi-l6r1k11 күн бұрын
Where is the code available? I want to try it on my local machine.
@aboudezoa7 ай бұрын
Running on 4080 🤣 makes sense the damn thing is very fast
@sovietlo8136Ай бұрын
pewdiepie if he started coding... I'm so into this, I'm still learning about this but I want to make my own local AI to be able to manage my business and make everything easier
@kcnb28Ай бұрын
You ever thought about using a virtual assistant?
@mickelodiansurname95788 ай бұрын
Can the LLM handle being told in a system prompt that it will be taking in the sentences in small chunks, say cut up into 2-second audio chunks per transcript? Can the Mistral model do that? Anyway, if so, you might even be able to get it to 'butt in' to your prompt. Now that's low latency!
@deltaxcd7 ай бұрын
No, it can't be told that, but it is not necessary. Just feed it the chunk, and if the user speaks again before it has managed to reply, restart and feed it more.
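The "restart and feed more" approach from this reply could look roughly like the following. It is a sketch under stated assumptions: `generate_reply` is a hypothetical placeholder for the LLM call (simulated here with a short sleep), and `gap` stands in for the time between incoming transcript chunks. Whenever a new chunk arrives while a reply is still being generated, the stale generation is cancelled and restarted with the longer transcript.

```python
import asyncio

async def generate_reply(prompt: str) -> str:
    """Hypothetical placeholder for the LLM call."""
    await asyncio.sleep(0.05)         # pretend the model is thinking
    return f"reply to: {prompt}"

async def converse(chunks: list[str], gap: float) -> str:
    transcript = ""
    task = None
    for chunk in chunks:
        if task is not None and not task.done():
            task.cancel()             # user kept talking: drop the stale reply
        transcript = (transcript + " " + chunk).strip()
        task = asyncio.create_task(generate_reply(transcript))
        await asyncio.sleep(gap)      # time until the next chunk arrives
    return await task if task is not None else ""
```

With chunks arriving faster than the model answers, only the last generation survives, so the reply covers the full utterance. The cost is wasted compute on every cancelled attempt.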
@khajask81133 ай бұрын
Does it support Hindi and Telugu?
@mertgundogdu2115 ай бұрын
How can I try this on my computer? I couldn't find talk.py in the GitHub code.
@Warz-cx6zkАй бұрын
It's his own code, and you need to become a member and wait for an invite to the GitHub community.
@skullseason17 ай бұрын
How can i do this with the Apple M1, this is soooo awesome i need to figure it out!
@inLofiLife8 ай бұрын
looks interesting but where is this community link you mentioned? :)
@witext6 ай бұрын
I look forward to actual speech to speech LLM, not any speech to text translation layers, pure speech in and speech out, it would be revolutionary imo
@JG27Korny8 ай бұрын
I run the oobabooga silero plus whisper, but those take forever to make voice from text, especially silero.
@musumo19088 ай бұрын
Hey cool…anyway to run this self hosted for an online speech to speech setup? Want to drop this into a chatbot project…what level membership to access the code thanks
@NirmalEleQtra4 ай бұрын
Where can i find whole GitHub repo ?
@smthngsmthngsmthngdarkside7 ай бұрын
So where's the source code mate? Or is this just a hook for your newsletter marketing and crap website?
@Skystunt1234 ай бұрын
Just a hook, the code is not shared.
@BrutalStrike28 ай бұрын
Jumanji Alan
@TanvirsTechTalk3 ай бұрын
How did you actually set it up?
@jeffsmith93848 ай бұрын
I would like to see how a chat room full of different models would problem solve... ChatGPT + Claude + * 7B + Grok + Bard... all in a room, trying to decide what you should have for lunch
@mickelodiansurname95788 ай бұрын
AI: "We got some rich investors on board dude, and they're willing to back us up!" I think this script just announced the games commencing in the 2024 US Election... [not in the US so reaches for popcorn]
@ExploreTogetherYT7 ай бұрын
how much RAM do you have to run mistral 7b locally? using gpu or cpu?
@TheDailyMemesShow2 ай бұрын
Would this work on the cloud? If so, how?
@_-JR017 ай бұрын
does openvoice perform better than whisper's TTS?
@Nursultan_karazhigit7 ай бұрын
Thanks . Is whisper api free ?
@m0nxt3r4 ай бұрын
it's open source
@Ms.Robot.7 ай бұрын
❤❤❤🎉 nice
@binthem79978 ай бұрын
Great tutorial but I wish you could share gists or share your code
@kritikusi-6668 ай бұрын
the voices are Mehh...cool project tho. You always have some fire content. You could train a LLM just off your content and be set haha.
@jerryqueen67556 ай бұрын
How can I install this on my PC? I am a member of the channel
@AllAboutAI6 ай бұрын
did you get the gh invite?
@jerryqueen67556 ай бұрын
@@AllAboutAI yes, thanks
@miaohf5 ай бұрын
@@AllAboutAI I am a member of the channel too, how to get gh invite?
@Yossisinterests-hq2qq8 ай бұрын
Hi, I don't have talk.py. Is there another way of running it that I'm missing?
@Warz-cx6zkАй бұрын
It's his own code. You need to become a member of the channel through a subscription and wait for the invite to the GitHub community.
@TheRottweiler_Gemii4 ай бұрын
Has anybody done this and have code or a link they can share, please?
@laalbujhakkar5 ай бұрын
How is a system that goes out to openAI, "local" ????????
@seRko1234 ай бұрын
OpenAI Whisper runs locally.
@DihelsonMendonca2 ай бұрын
Too complex for the average guy. We need a ready LLM with easy voice options on LM Studio.
@ajayjasperj7 ай бұрын
we can make youtube content with those conversation between bots😂❤
@MetaphoricMinds7 ай бұрын
What GPU are you running?
@AllAboutAI7 ай бұрын
4080 RTX!
@MetaphoricMinds7 ай бұрын
Dude just made a JARVIS embryo.
@JohnGallie7 ай бұрын
you need to get out more man lol. that was toooo much!
@VitorioMiguel8 ай бұрын
Try faster-whisper. It's open source and faster.
@jcolabzzz8 ай бұрын
Do not make AI lie to your face, man. Thankfully this is local.
@artisalva7 ай бұрын
haha AI conversations could have their own channels
@Jesulex8215 күн бұрын
IT WOULD BE GOOD IF YOU SET IT UP FOR PEOPLE LIKE MY BROTHER, WHO IS BLIND... SO THAT HE COULD HEAR WHAT THE AI ANSWERS... BUT HEY, BUDDY... WHO THINKS ABOUT PEOPLE LIKE MY BROTHER, RIGHT?... I RECOMMEND YOU CLOSE YOUR EYES FOR AN HOUR A DAY... AND MAYBE THEN YOU'LL GET A SLIGHT IDEA... AND THEN THINK... WHAT IF I STAYED LIKE THIS FOREVER.
@KimiMorgam13 күн бұрын
We need an easy tutorial, this is so complicated X_X
@robertgoldbornatyout6 ай бұрын
Could make for some interesting Biden v Trump debates