Kokoro Local TTS + Custom Voices

4,099 views

Sam Witteveen

1 day ago

Comments: 32

@andherium 7 hours ago

hmm Tiny TTs is definitely an interesting name

@dinoscheidt 3 hours ago

Took a bit… 🧿🧿

@mageshyt2550 6 hours ago

Would love to see a video on conversations with local agents.

@khangvutien2538 1 hour ago

Thanks. You have given me another reason to buy a Mac mini M4 😉

@MeinDeutschkurs 6 hours ago

Sky is back! Wooohooo!!! ❤❤❤❤

@djstraylight 6 hours ago

Were there any instructions on how to train voicepacks?

@samwitteveenai 5 hours ago

No, I don't think they have made any.

@MojaveHigh 4 hours ago

Very helpful, thanks! Any chance you could take a look at RealtimeSTT? And maybe put that and Kokoro into a single local conversational AI agent?

@pin65371 6 hours ago

This would be good for people who want to run something like Alexa locally at home. I know some people have been putting together systems for Home Assistant. While the OpenAI integration might sound slightly better, I'd consider this more than good enough to replace it, without having to send your data to OpenAI.

@samwitteveenai 6 hours ago

Yeah, that is how I feel too. It's not the best, but it is damn good.

@lovol2 6 hours ago

Thanks for making this video.

@altmediamedia9654 2 hours ago

Sam, I can't access the shortened URL links. I can't name the URL shortener in my comment, but you know which one you are using. It either times out or is unreachable. Is anyone else having this issue?

@MeinDeutschkurs 6 hours ago

What I'd use it for: voice chat based on aya-expanse.

@helloworld7796 6 hours ago

Is it possible to train your own model from scratch for a language other than US English?

@samwitteveenai 5 hours ago

Yes, or you could fine-tune this one for another language, but you would also need training code, which currently isn't in the repo.

@MeinDeutschkurs 6 hours ago

Is it possible to fade from one voice to another? It could help in finding great voices (with the values shown in the terminal).

@samwitteveenai 6 hours ago

Good question. Unfortunately, it's not really possible to fade between them, because you need to provide the full embedding at generation time and you can only provide one.

@MeinDeutschkurs 6 hours ago

@samwitteveenai OK, so I should iterate word by word from 0.0 to 1.0 for both values. 😆 Why not? Or at least render the same sentence multiple times to compare.
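Since the model takes a single voice embedding per generation, a voice cannot change partway through an utterance. The comparison idea above (rendering the same sentence several times at different mixes) can still be approximated by blending two voice embeddings into one before each generation. A minimal sketch, assuming each voicepack can be treated as a flat list of floats of equal length (real voicepacks are tensors); `blend_voices` and `blend_sweep` are hypothetical helpers, not part of the Kokoro repo:

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def blend_voices(a: Sequence[float], b: Sequence[float], alpha: float) -> Vector:
    """Linearly interpolate two voice embeddings: alpha=0.0 gives a, 1.0 gives b."""
    if len(a) != len(b):
        raise ValueError("voice embeddings must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def blend_sweep(
    a: Sequence[float], b: Sequence[float], steps: int
) -> List[Tuple[float, Vector]]:
    """Return (alpha, blended embedding) pairs evenly spaced from 0.0 to 1.0."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    sweep = []
    for i in range(steps):
        alpha = i / (steps - 1)
        sweep.append((alpha, blend_voices(a, b, alpha)))
    return sweep


if __name__ == "__main__":
    # Toy 3-dimensional "voices"; real voicepacks are much larger.
    voice_a = [0.0, 1.0, 2.0]
    voice_b = [1.0, 1.0, 0.0]
    for alpha, emb in blend_sweep(voice_a, voice_b, steps=5):
        # Each blended embedding would go into one generation call of the
        # same sentence, so the results can be compared by ear.
        print(f"{alpha:.2f}", emb)
```

Each blend is still a single embedding, so this fits the one-embedding-per-generation constraint; it only yields a fixed mix per clip, not a fade within one clip.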

@figs3284 3 hours ago

Transformers.js version coming soon from Xenova 👀

@moundercesar3102 6 hours ago

Very interesting. Can we use it as a PDF reader that reads in real time, rather than after processing the whole text?

@samwitteveenai 6 hours ago

You would probably process a sentence or a line at a time (maybe even a paragraph, to help with prosody), but it should be possible.
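Processing a sentence at a time can be sketched as a generator that yields each sentence as soon as it is complete, so playback of the first sentence can begin while later text is still being read. A minimal sketch, assuming plain text has already been extracted from the PDF line by line; the regex sentence split is a rough heuristic, and `speak` is a hypothetical stand-in for whatever TTS call is used:

```python
import re
from typing import Callable, Iterable, Iterator

# Split after ., ! or ? followed by whitespace. This is a heuristic and
# will mis-split abbreviations such as "e.g. ".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(lines: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in the incoming lines."""
    buffer = ""
    for line in lines:
        buffer = f"{buffer} {line.strip()}".strip()
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last part
        # may continue on the next line, so keep it buffered.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    if buffer:
        yield buffer


def read_aloud(lines: Iterable[str], speak: Callable[[str], None]) -> None:
    """Send each sentence, in order, to the hypothetical speak() callback."""
    for sentence in stream_sentences(lines):
        speak(sentence)


if __name__ == "__main__":
    page = ["Kokoro is a small TTS model. It runs", "locally. Does it stream?"]
    read_aloud(page, speak=print)
```

Switching to paragraph-sized chunks for better prosody would only change the split rule, for example splitting on blank lines instead of sentence punctuation.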

@VanillaGun 5 hours ago

Is there a defined context length it can parse and process at a time? I want to test it on large text sources.

@finbenton 3 hours ago

I don't know, but I just generated a 25-minute audio file, and it took 5-10 minutes to generate.
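For large sources, one common approach is to split the text into chunks under some size limit and generate each chunk separately. The timing above also works out to a rough speed figure: 25 minutes of audio in 5-10 minutes of compute is about 2.5x to 5x faster than real time. The following is a minimal sketch of greedy sentence packing. It assumes a character budget can stand in for the model's real input limit, and the 400-character default is an illustrative assumption, not a documented Kokoro limit. `pack_sentences` and `realtime_factor` are hypothetical helpers:

```python
import re
from typing import List

# Illustrative budget only; the real limit depends on the model's tokenizer.
DEFAULT_MAX_CHARS = 400


def pack_sentences(text: str, max_chars: int = DEFAULT_MAX_CHARS) -> List[str]:
    """Greedily group whole sentences into chunks of at most max_chars.

    A sentence longer than max_chars is split on whitespace and packed on
    its own; a single word longer than max_chars is kept whole.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        too_long = len(sentence) > max_chars
        pieces = sentence.split() if too_long else [sentence]
        if too_long and current:
            # Start an overlong sentence in a fresh chunk.
            chunks.append(current)
            current = ""
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
        if too_long and current:
            # Do not let the next sentence share a chunk with the word fragments.
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)
    return chunks


def realtime_factor(audio_minutes: float, compute_minutes: float) -> float:
    """How many times faster than real time the generation ran."""
    return audio_minutes / compute_minutes


if __name__ == "__main__":
    print(pack_sentences("A b. C d. E f.", max_chars=9))
    print(realtime_factor(25, 10), realtime_factor(25, 5))
```

Packing whole sentences keeps chunk boundaries at sentence ends, which should help prosody compared with cutting at a fixed character count.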

@Quantum_Nebula 7 hours ago

Interesting, it's definitely fast for the quality.

@SyamsQbattar 5 hours ago

Do you know how to add a new language, such as Indonesian?

@samwitteveenai 5 hours ago

To get a good result, you would probably need to mix some real Bahasa audio into the training mix, or fine-tune it later. You might be able to do something with a phoneme dictionary, but you really need some example audio.

@SyamsQbattar 5 hours ago

@samwitteveenai Is there a step-by-step tutorial on this?

@miklosprisznyak9102 5 hours ago

Yes, adding a new language is something I would also be interested in. Please enlighten us if you have any clue. 😊
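On the phoneme-dictionary idea mentioned in the thread (which, as noted, would still need real example audio for good results), here is a rough illustration of the dictionary side only. It is a minimal sketch of a word-to-phoneme lookup for Indonesian. Known words come from a small lexicon, and unknown words fall back to letter-to-IPA rules, which works reasonably well because Indonesian spelling is fairly phonemic and uses digraphs such as `ng`, `ny`, `sy`, and `kh`. The lexicon entry, rule tables, and function names are all illustrative assumptions, not anything shipped with Kokoro. The rules also ignore real ambiguities, such as `e` being either /e/ or /ə/:

```python
import re
from typing import Dict, List

# Illustrative exception lexicon for words the rules get wrong
# (here the "e" in "terima" is a schwa, which the rules cannot know).
LEXICON: Dict[str, str] = {
    "terima": "tərima",
}

# Two-letter spellings are checked before single letters.
DIGRAPHS: Dict[str, str] = {"ng": "ŋ", "ny": "ɲ", "sy": "ʃ", "kh": "x"}
SINGLE: Dict[str, str] = {"c": "tʃ", "j": "dʒ", "y": "j"}


def word_to_phonemes(word: str) -> str:
    """Look the word up in the lexicon, else apply letter-to-IPA rules."""
    w = word.lower()
    if w in LEXICON:
        return LEXICON[w]
    out: List[str] = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Letters without a rule are assumed to map to the same IPA symbol.
            out.append(SINGLE.get(w[i], w[i]))
            i += 1
    return "".join(out)


def text_to_phonemes(text: str) -> str:
    """Convert each alphabetic word in the text and join with spaces."""
    return " ".join(word_to_phonemes(w) for w in re.findall(r"[A-Za-z]+", text))


if __name__ == "__main__":
    print(text_to_phonemes("Saya cinta bunga"))
    print(text_to_phonemes("Terima kasih"))
```

A real front end would need a much larger lexicon and stress or schwa handling, and, as the thread says, the model would still have to learn the new sounds from audio.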

@Notifest 2 hours ago

I would appreciate a fine-tuning tutorial for a custom voice in any language.