I'd love to see a video on conversations with local agents.
@khangvutien2538 1 hour ago
Thanks. You have given me another reason to buy a Mac mini M4 😉
@MeinDeutschkurs 6 hours ago
Sky is back! Wooohooo!!! ❤❤❤❤
@djstraylight 6 hours ago
Were there any instructions on how to train voicepacks?
@samwitteveenai 5 hours ago
No, I don't think they have made any.
@MojaveHigh 4 hours ago
Very helpful, thanks! Any chance you could take a look at RealtimeSTT? And maybe put that and Kokoro into a single local conversational AI agent?
@pin65371 6 hours ago
This would be good for people who want to run something like Alexa locally at home. I know some people have been putting together systems for Home Assistant. While the OpenAI integration might sound slightly better, I'd consider this more than good enough to replace it without having to send your data to OpenAI.
@samwitteveenai 6 hours ago
Yeah, that is how I feel too. It's not the best, but it is damn good.
@lovol2 6 hours ago
Thanks for making this video.
@altmediamedia9654 2 hours ago
Sam, I can't access the shortened URL links. I can't name the URL shortener in my comment, but you know which one you are using. It either times out or is unreachable. Is anyone else having this issue?
@MeinDeutschkurs 6 hours ago
What would I use it for? Voice chat, based on aya-expanse.
@helloworld7796 6 hours ago
Is it possible to train your own model from scratch for a language other than US English?
@samwitteveenai 5 hours ago
Yes, or you could fine-tune this one for another language, but you would also need some training code, which currently isn't in the repo.
@MeinDeutschkurs 6 hours ago
Is it possible to fade from one voice to another voice? Could help to find great voices. (With values in terminal)
@samwitteveenai 6 hours ago
Good question. Unfortunately, it's not really possible to fade between them, because you need to pass the full embedding in at generation time and you can only pass one.
@MeinDeutschkurs 6 hours ago
@samwitteveenai, OK, so I should iterate word by word from 0.0 to 1.0 for both of the values. 😆 Why not? Or at least generate the same sentence multiple times to compare.
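A minimal sketch of the sweep suggested above, assuming each voice is available as a flat list of floats of equal length (an assumption; the actual voicepack format may differ). Since only one embedding can be passed per generation, the idea is to mix two embeddings into a single one with a weight between 0.0 and 1.0 and then generate the same sentence once per weight. The names `blend` and `sweep` are hypothetical helpers, not part of any library, and whether a blended embedding yields a usable voice is untested here.

```python
from typing import Iterator, List, Tuple


def blend(voice_a: List[float], voice_b: List[float], weight: float) -> List[float]:
    """Linearly interpolate two embeddings: weight 0.0 gives voice_a, 1.0 gives voice_b."""
    if len(voice_a) != len(voice_b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return [(1.0 - weight) * a + weight * b for a, b in zip(voice_a, voice_b)]


def sweep(
    voice_a: List[float], voice_b: List[float], steps: int = 5
) -> Iterator[Tuple[float, List[float]]]:
    """Yield (weight, blended embedding) pairs for evenly spaced weights from 0.0 to 1.0."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    for i in range(steps):
        weight = i / (steps - 1)
        yield weight, blend(voice_a, voice_b, weight)
```

Each blended embedding from `sweep` would be passed to the model as the single voice for one generation of the same sentence, so the outputs can be compared side by side.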
@figs3284 3 hours ago
Transformers.js version coming soon from Xenova 👀
@moundercesar3102 6 hours ago
Very interesting. Can we use it as a PDF reader that reads in real time rather than after processing the whole text?
@samwitteveenai 6 hours ago
You would probably process a sentence or a line at a time (maybe even a paragraph, to help it with prosody), but it should be possible.
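A rough sketch of the sentence-at-a-time approach described above, assuming the text has already been extracted from the PDF. The `synthesize` argument is a hypothetical callback standing in for whatever TTS call is used, and the 300-character chunk limit is an arbitrary illustrative value. Grouping several sentences per chunk is one way to give the model more context for prosody.

```python
import re
from typing import Callable, Iterable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences: Iterable[str], max_chars: int = 300) -> Iterator[str]:
    """Group consecutive sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is emitted as its own chunk.
    """
    chunk = ""
    for sentence in sentences:
        if chunk and len(chunk) + 1 + len(sentence) > max_chars:
            yield chunk
            chunk = sentence
        else:
            chunk = f"{chunk} {sentence}" if chunk else sentence
    if chunk:
        yield chunk


def read_aloud(text: str, synthesize: Callable[[str], None], max_chars: int = 300) -> int:
    """Send the text to the TTS callback chunk by chunk; return the number of chunks."""
    count = 0
    for chunk in chunk_sentences(split_sentences(text), max_chars):
        synthesize(chunk)  # hypothetical: generate and play audio for this chunk
        count += 1
    return count
```

Because each chunk is handed off as soon as it is formed, playback can start after the first chunk instead of waiting for the whole document.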
@VanillaGun 5 hours ago
Is there a defined context length it can parse and process at a time? I want to test it out for large text sources.
@finbenton 3 hours ago
I don't know, but I just generated a 25-minute audio file and it took 5-10 minutes to generate.
@Quantum_Nebula 7 hours ago
Interesting -- definitely is fast for the quality
@SyamsQbattar 5 hours ago
Do you know how to add a new language, like Indonesian?
@samwitteveenai 5 hours ago
To get a good result you would probably need to mix some real Bahasa audio into the training mix, or fine-tune it later. You might be able to do something with a phoneme dictionary, but you really need some example audio.
@SyamsQbattar 5 hours ago
@samwitteveenai Is there a step-by-step tutorial on this?
@miklosprisznyak9102 5 hours ago
Yes, adding a new language is what I would also be interested in... Please enlighten us if you have any clue. 😊
@Notifest 2 hours ago
I would appreciate a fine tuning tutorial for a custom voice in any language
@concretec0w 5 hours ago
Is it better than piper-tts? Piper is sooooo fast and decent.