Kokoro Local TTS + Custom Voices

4,099 views

Sam Witteveen

1 day ago

Comments: 32
@andherium 7 hours ago
hmm Tiny TTs is definitely an interesting name
@dinoscheidt 3 hours ago
Took a bit… 🧿🧿
@mageshyt2550 6 hours ago
I'd love to see a video on conversations with local agents.
@khangvutien2538 1 hour ago
Thanks. You have given me another reason to buy a Mac mini M4 😉
@MeinDeutschkurs 6 hours ago
Sky is back! Wooohooo!!! ❤❤❤❤
@djstraylight 6 hours ago
Were there any instructions on how to train voicepacks?
@samwitteveenai 5 hours ago
No, I don’t think they have made any.
@MojaveHigh 4 hours ago
Very helpful, thanks! Any chance you could take a look at RealtimeSTT? And maybe put that and Kokoro into a single local conversational AI agent?
@pin65371 6 hours ago
This would be good for people who want to run something like Alexa locally at home. I know some people have been putting together systems for Home Assistant. The OpenAI integration might sound slightly better, but I'd consider this more than good enough to replace it and not have to send your data to OpenAI.
@samwitteveenai 6 hours ago
Yeah, that is how I feel too. It’s not the best but it is damn good.
@lovol2 6 hours ago
Thanks for making this video.
@altmediamedia9654 2 hours ago
Sam, I can't access the shortened URL links. I can't name the URL shortener in my comment, but you know which one you are using. It either times out or is unreachable. Is anyone else having this issue?
@MeinDeutschkurs 6 hours ago
What would I use it for? Voice chat, based on aya-expanse.
@helloworld7796 6 hours ago
Is it possible to train your own model from scratch for a language other than US English?
@samwitteveenai 5 hours ago
Yes, or you could fine-tune this to another language, but you would also need some training code, which currently isn’t in the repo.
@MeinDeutschkurs 6 hours ago
Is it possible to fade from one voice to another? That could help with finding great voices. (With the values shown in the terminal.)
@samwitteveenai 6 hours ago
Good question. Unfortunately, it’s not really possible to fade between them, because you need to put the full embedding in at generation time and you can only put one in.
@MeinDeutschkurs 6 hours ago
@samwitteveenai, OK, so I should iterate word by word from 0.0 to 1.0 for both of the values. 😆 Why not? Or at least generate the same sentence multiple times to compare.
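A rough sketch of the "same sentence multiple times" idea from this thread, assuming each voice is just a list of numbers that can be averaged. The names `blend` and `sweep` are hypothetical and not part of Kokoro. Each blended vector would be used for one separate generation, since only one embedding can go in per generation.

```python
from typing import Iterator, List, Sequence, Tuple


def blend(voice_a: Sequence[float], voice_b: Sequence[float], weight: float) -> List[float]:
    """Return (1 - weight) * voice_a + weight * voice_b, element by element.

    Assumption: a voice embedding can be treated as a flat vector of floats.
    """
    if len(voice_a) != len(voice_b):
        raise ValueError("voice embeddings must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return [(1.0 - weight) * a + weight * b for a, b in zip(voice_a, voice_b)]


def sweep(
    voice_a: Sequence[float], voice_b: Sequence[float], steps: int = 5
) -> Iterator[Tuple[float, List[float]]]:
    """Yield (weight, blended_voice) for evenly spaced weights from 0.0 to 1.0."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    for i in range(steps):
        weight = i / (steps - 1)
        yield weight, blend(voice_a, voice_b, weight)
```

Each yielded vector would then be passed to whatever generation call is in use, once per weight, with the same sentence, so the outputs can be compared by ear.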
@figs3284 3 hours ago
Transformers.js version coming soon from Xenova 👀
@moundercesar3102 6 hours ago
Very interesting. Can we use it as a PDF reader that reads in real time rather than after processing the whole text?
@samwitteveenai 6 hours ago
You would probably process a sentence or a line at a time (maybe even a paragraph, to help it with prosody), but it should be possible.
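A minimal sketch of the sentence-at-a-time approach described in this reply. The `synthesize` argument is a hypothetical stand-in for whatever TTS call is used and is not Kokoro's actual API. The sentence splitter is deliberately naive, and the 300-character limit is an arbitrary assumption.

```python
import re
from typing import Callable, Iterator, List, TypeVar

Audio = TypeVar("Audio")


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def group_chunks(sentences: List[str], max_chars: int = 300) -> Iterator[str]:
    """Join consecutive sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is passed through on its own.
    """
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            yield current
            current = sentence
        else:
            current = candidate
    if current:
        yield current


def read_aloud(
    text: str, synthesize: Callable[[str], Audio], max_chars: int = 300
) -> Iterator[Audio]:
    """Lazily synthesize one chunk at a time so playback can start early."""
    for chunk in group_chunks(split_sentences(text), max_chars):
        yield synthesize(chunk)
```

Because `read_aloud` is a generator, the first chunk can be played while later chunks are still being generated. Raising `max_chars` gives each call more context, at the cost of a longer wait before the first audio.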
@VanillaGun 5 hours ago
Is there a defined context length it can parse and process at a time? I want to test it out for large text sources.
@finbenton 3 hours ago
Idk, but I just generated a 25-minute audio file and it took 5-10 minutes to generate.
@Quantum_Nebula 7 hours ago
Interesting -- definitely is fast for the quality
@SyamsQbattar 5 hours ago
Do you know how to add a new language, like Indonesian?
@samwitteveenai 5 hours ago
To get a good result you would probably need to mix some real Bahasa audio into the training mix, or fine-tune it later. You might be able to do something with a phoneme dictionary, but you really need some example audio.
@SyamsQbattar 5 hours ago
@samwitteveenai Is there a step-by-step tutorial on this?
@miklosprisznyak9102 5 hours ago
Yes, adding a new language is what I would also be interested in... Please enlighten us if you have any clue. 😊
@Notifest 2 hours ago
I would appreciate a fine-tuning tutorial for a custom voice in any language.
@concretec0w 5 hours ago
Is it better than piper-tts? Piper is sooooo fast and decent.