If you run into any issues or have any ideas, please open a new issue here: github.com/JarodMica/audiobook_maker/issues. Make it as descriptive as possible, whether it's a bug report or a suggested improvement.
@aaagaming2023 · 26 days ago
You need to add E2-F5-TTS imo.
@iaincampbell4422 · 25 days ago
Defo, I actually watched hoping it was using E2-F5 TTS!
@wongr643 · 25 days ago
Mate, just want to say I have been following you for some time and really appreciate your tutorials on AI voice cloning/TTS. Probably the best out there for this niche.
@Jarods_Journey · 24 days ago
Appreciate it :)!
@neonpowar3766 · 24 days ago
Was just following your old tutorial when I checked your channel and saw this. Good luck in all your future endeavors. Funnily enough, one of the things I'll be using your audiobook maker for is turning the Re:Zero web novel into audiobooks.
@Jarods_Journey · 24 days ago
Ayee I approve the choice :)!
@edalot · 11 days ago
Exactly what I wanted to do too 😅
@djwhispers3157 · 11 days ago
This is a great tool for the project I need it for, which is a short story. I can't wait to use it. I just need a tool to help clone and create voice models to read the story as different characters.
@Curtis25 · 25 days ago
Thank you so much for your work. This project is just amazing. It would be cool to have the option to export it as an M4B file instead of an MP3, or to export every chapter as a separate audio file.
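One possible route to an M4B with chapters, assuming you already have a single audio file and know each chapter's duration: FFmpeg can read chapter markers from a plain-text FFMETADATA1 file. The sketch below is not a feature of the audiobook maker; the function name `ffmetadata_chapters` is hypothetical. It writes chapter times in milliseconds and backslash-escapes the characters FFmpeg treats specially in metadata values.

```python
from typing import List, Tuple


def ffmetadata_chapters(chapters: List[Tuple[str, float]]) -> str:
    """Build an FFmpeg FFMETADATA1 chapter file from (title, duration_seconds) pairs.

    Chapters are assumed to be back to back, starting at 0 ms.
    """
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration_s in chapters:
        end_ms = start_ms + round(duration_s * 1000)
        # Escape backslash first, then '=', ';' and '#', which FFmpeg reserves.
        safe = title
        for ch in ("\\", "=", ";", "#"):
            safe = safe.replace(ch, "\\" + ch)
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START/END are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={safe}",
        ]
        start_ms = end_ms
    return "\n".join(lines) + "\n"
```

With the output saved as `chapters.txt` (a hypothetical name), a command along the lines of `ffmpeg -i book.mp3 -i chapters.txt -map_metadata 1 -c:a aac book.m4b` should produce an M4B carrying those chapter markers; check the FFmpeg documentation for your version before relying on it.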
@NFawc · 26 days ago
This is exciting stuff. I'm more than happy to, in effect, pay once for the project if it's then supported and developed going forward. ;)
@tempertephra · 14 days ago
Thank you for creating this 🥳🎉 Is it only my impression, or is RVC working worse than in the cloner? Most of my models don't produce the genuine voices they used to. Is there a way to adjust this?
@f4ture · 13 days ago
Hmm, why would I need to purchase the install package when someone submitted a pull request for an open-source installer on your GitHub?
@ASlaveToReason · 26 days ago
Sweet, thanks for the update. I can't wait until there's some AI agent that can parse the different characters in books so we can feed it into this.
@strangeboltz · 23 days ago
I bought you a coffee for the audiobook maker! Thank you so much for this.
@daviddurand1656 · 26 days ago
Hi Jarods, are there any foreign languages available, like French? Thnx
@Jarods_Journey · 24 days ago
For each of the open-source engines (Tortoise, XTTS, StyleTTS, F5TTS), whatever languages that engine supports will be supported. This includes custom models that a user may have trained.
@puntogcb · 21 days ago
@@Jarods_Journey That means we can add text in Spanish (my interest) and it will work? I love your work, and for sure I will buy you a coffee! Thank you very much for this!
@Melike-oh1ir · 24 days ago
Incredible project and amazing achievements tbh, congrats man. My only issue is that no matter what model I choose, my voices always end up super dark-pitched (like Sauron lol). Any clues as to why? I've played around with Pitch and pitch methods to no avail. Tried over 4 custom-trained models. EDIT: This only happens with RVC enabled. EDIT2: Feel so stupid, it was the sample rate I had to change. Cheers!
@Jarods_Journey · 24 days ago
Yeah, it's currently a small bug in the RVC library! I'll have to fix it, but SR can be lowered to 0 to resolve it for now.
@puntogcb · 21 days ago
I will be purchasing the package to practice using it soon. I would like it to have a language selection option, not only for the entire audiobook but for individual sentences as well. I am interested in Latin American Spanish, with accent variations, for example Argentine. Would you add this functionality to this project? Thank you very much again for all this incredible work.
@skistenl6566 · 4 days ago
Could you please explain how the RVC settings and Tortoise settings differ? I put my RVC model in the settings and checked "Use s2s Engine", but the result is still the random voice from Tortoise.
@richardkuhne5054 · 26 days ago
I'm looking for some kind of immersive reader with a decent text-to-speech system that also highlights the words, so the text is there as support if you want to follow along. Any suggestions for this?
@daywizzle · 26 days ago
Probably Speechify tbh?
@donmarshal2070 · 26 days ago
Balabolka covers all of your needs. If you set it up correctly, you are good to go for 100 years. (Speaking from experience) 😂 (Just get a decent Natural voice, or look up how to use the Natural Edge voices in Balabolka.)
@mikeutoob · 26 days ago
Read Aloud in the Edge web browser
@ming3706 · 26 days ago
So can I download this audio after it's done and put it on my phone to listen?
@Jarods_Journey · 24 days ago
Yup! It's all yours so do with it what you will
@mauricio9581 · 15 days ago
Let's say I'm not happy with the emotion in how the narrator says a sentence. Can I use my own voice in combination with the narrator's voice to improve the emotional delivery of that sentence? How can I implement it? Amazing tool btw! Love your content
@davidmilligan4751 · 9 days ago
Getting a lot of errors trying to install the RVC files. I bought the packaged files and it seems I don't have something set up right. Please help.
@Jarods_Journey · 8 days ago
Hey David, please open a new issue on the GitHub issues tab and share the error you're getting in the terminal so we can figure out what's going on: github.com/JarodMica/audiobook_maker/issues
@yanglangfu773 · 26 days ago
Is that Kenjiro Tsuda speaking English? So cool, like, 99% clone 🤯
@Jarods_Journey · 24 days ago
yup!
@mash9653 · 24 days ago
When I restart this project, it shows "Configuration file/tts_config.json not found". But
@Metarig · 26 days ago
Hey, why not use ChatGPT's advanced voice and then switch the voice later with ElevenLabs?
@mucool328 · 26 days ago
Expensive?
@Metarig · 26 days ago
@@mucool328 like $20?
@Metarig · 26 days ago
@@mucool328 it's only $20.
@Metarig · 26 days ago
@@mucool328 To create an audiobook, I'd spend about $50 to get better quality.
@Metarig · 26 days ago
@@mucool328 Are you serious?
@wakasm · 25 days ago
I have a use case where I have a database of around 1,000 different lines or paragraphs. Is there a way to jump to a specific line and play it (maybe through some sort of API, or through the interface)? And specifically, to map the entries to specific labels (not necessarily 1-1000; maybe with some numbers skipped, or even labels like A1, A2, etc.)? Think choose-your-own-adventure; that's close to the use case I would try this for.
@Jarods_Journey · 24 days ago
I'm not quite sure I understand the use case here, but right now there's only a scroll bar in the table that you can use to go up and down. Custom labeling other than speakers is not supported atm.
@phenix5609 · 26 days ago
That's really impressive. I couldn't watch the full video yet, so maybe you talk about it inside, but did you have time to include the new E2/F5 TTS voice cloning app you showed in one of your videos? Their "podcast" option needs the text formatted with each speaker's name as the first word of their sentence, like: speaker1: …, speaker2: …, and then you give them a 10-second audio sample for each voice, and they do what you show at the start of the video. Really impressive. But I've only tried it in English, since it says it works with English and Chinese. I haven't tried yet to see results for Japanese or French; I'm not sure it would work well, and I don't know how to train a voice with their tech.
@Jarods_Journey · 24 days ago
F5 will be included in the audiobook maker; other people seem hard at work adding more languages for it right now, though.
@stevewarby12 · 26 days ago
Great, will buy later. For the text files, it would make sense to allocate voices in there. E.g., if generating with AI, ask it to use the format V1: audio text 1 V2: audio text 2 V1: audio text 3. These would then auto-map to the selected voices' index; e.g., if the first voice is me, all lines with V1: will use that voice. This would save a lot of time manually selecting each voice per line.
@stevewarby12 · 26 days ago
Even if the story has already been written, OpenAI could reformat the text.
@Jarods_Journey · 24 days ago
Yeah, I'm thinking about how to incorporate it. I could support a custom speaker import option, but I have to think about how I want to make this option available in the audiobook maker.
@stevewarby12 · 24 days ago
@@Jarods_Journey On the voice selection per line: loads of different colours looks very confusing. Have a separate column simply with the speaker name and/or meme of the speaker. PS For anyone else... Pay the $14.99, it just worked. No spending hours setting up environments and pip installing for ages.....
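The auto-mapping idea in this thread (and the "speaker1: …" podcast format mentioned earlier) can be sketched in a few lines of Python. This is not how the audiobook maker works today; the tag syntax, the function name `assign_voices`, and the rule that untagged text goes to the first voice are all assumptions for illustration. Tag `N` maps to the N-th selected voice, so `V1:` lines use the first voice.

```python
import re
from typing import List, Tuple

# Hypothetical tag convention from the comments: "V1:", "V2:", or "speaker1:", ...
TAG = re.compile(r"\b(?:v|speaker)(\d+)\s*:", re.IGNORECASE)


def assign_voices(script: str, voices: List[str]) -> List[Tuple[str, str]]:
    """Split a tagged script into (voice, text) pairs.

    Tag N maps to voices[N - 1]; any text before the first tag goes to voices[0].
    Works whether tags start new lines or appear inline.
    """
    # re.split with one capture group alternates: [text, tag, text, tag, text, ...]
    parts = TAG.split(script)
    pairs = [(voices[0], parts[0])]
    for tag, text in zip(parts[1::2], parts[2::2]):
        index = int(tag) - 1
        if not 0 <= index < len(voices):
            raise ValueError(f"no voice selected for tag {tag}")
        pairs.append((voices[index], text))
    # Drop empty segments and collapse whitespace/newlines inside each one.
    return [(voice, " ".join(text.split())) for voice, text in pairs if text.strip()]
```

For example, `assign_voices("V1: audio text1 V2: audio text 2 V1: audio text3", ["Steve", "Narrator"])` gives Steve the first and third segments. Raising on an unknown tag, rather than silently falling back, makes a mislabeled line from an LLM easy to spot.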
@danieldorszu1317 · 26 days ago
Hey, is it possible for the program to automatically select different voices for a txt e-book, rather than just dialing them in manually?
@webinatic216 · 25 days ago
Imagine writing a book then.
@Jarods_Journey · 24 days ago
Yes, a proof of concept has shown that ChatGPT is able to label sentences. But I need a specific format, so I'm working through some ideas on that.
@holdthetruthhostage · 26 days ago
My question is what's the word limit, because with ElevenLabs it starts breaking down past 800 words.
@Jarods_Journey · 24 days ago
Give or take 15-20 seconds max for a Tortoise TTS sample, 20-30 seconds with StyleTTS, and up to 30 seconds with F5TTS. Not too sure about the breakdown in terms of words, though.
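Because each engine works best within a per-generation time limit, long text has to be split before synthesis. The following is a rough sketch, not the audiobook maker's own parser: `chunk_text` is a hypothetical helper that converts a time budget into a word budget by assuming a constant speaking rate of about 150 words per minute (a common rule of thumb for narration, and an assumption here, not a measured property of any engine). It groups whole sentences until the budget is reached and hard-splits a sentence only if it alone is too long.

```python
import re
from typing import List


def chunk_text(text: str, max_seconds: float, words_per_minute: float = 150.0) -> List[str]:
    """Group sentences into chunks whose estimated spoken length fits max_seconds.

    The estimate assumes a constant speaking rate (words_per_minute), which is
    only an approximation of real TTS output.
    """
    max_words = max(1, int(max_seconds * words_per_minute / 60))
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A single over-long sentence is hard-split by word count.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Under these assumptions, `chunk_text(chapter, 15)` targets Tortoise's lower bound (about 37 words per chunk), while `chunk_text(chapter, 30)` would suit F5TTS.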
@MrEffectfilms · 26 days ago
I have an Nvidia GPU, but how important is the 8GB of VRAM? I have a GTX 1660 Super, which has 6GB. Will this just not work?
@Jarods_Journey · 24 days ago
That should work, though you might top out if using Tortoise TTS. If you're familiar with StyleTTS2, when I release the engine for that, you should be able to run inference with it without issues.
@martinbobis6764 · 11 days ago
Where do I see the PC requirements for running this?
@martinbobis6764 · 11 days ago
Never mind, I'll give it a try with an old GTX 1070 8GB. I don't need to generate that much anyway.
@KnutNukem · 26 days ago
Great project!
@zanshibumi · 24 days ago
Why is each line read in a different voice? There seem to be 2-3 voices and each line is read by one of them.
@Jarods_Journey · 24 days ago
In the video? Well, I selected them. If you're running with random, it will change voices as well.
@zanshibumi · 24 days ago
@@Jarods_Journey I don't understand how to stop it from running on random. With one narrator and nothing added, the Tortoise panel doesn't allow any option other than random. How can I set it up so every sentence is read in a single voice?
@DarinLawsonHosking · 24 days ago
Any chance this runs on AMD RX 6800 XT?
@Jarods_Journey · 24 days ago
Unfortunately not, AMD support is limited on most of these engines and as well, I don't have AMD to test on either. Sorry!
@davepierunc · 25 days ago
So a 4GB Nvidia card won't cut it, right?
@Jarods_Journey · 24 days ago
Possibly. I don't think you'll be able to do it too well with TortoiseTTS, but when I finish the StyleTTS2 engine, 4GB would be fine.
@donmarshal2070 · 26 days ago
Question: I only have 6GB of VRAM but 64GB of DDR5 physical RAM. Will it work on my system, or does it only use VRAM? 🧐 1. It's a laptop, not a PC, so no GPU upgrade 😭 2. I run 40B-parameter LLMs on it without any hiccups (70B at 40 wpm), and that works because when VRAM fills up, physical RAM gets used. So will this work the same way, i.e., will it use physical RAM after VRAM is completely utilized?
@Jarods_Journey · 24 days ago
6GB of VRAM should work; I think you'll just be topping out a bit with Tortoise TTS. I think all of the engines I'm planning on adding can run inference with at most 4GB of VRAM. It will overflow to RAM if VRAM gets completely utilized, though, afaik.
@donmarshal2070 · 24 days ago
@@Jarods_Journey Thanks man! I'm currently doing LLM training on a laptop, and all the TTS models I've been training take input only in IPA, not typed text, so I'm getting better pronunciation results. But training a voice from an audio clip isn't working due to the VRAM limitation, so I've upgraded my physical RAM to compensate. Hope it works.
@SavvyStaks · 24 days ago
I think if you made the software run on cloud machines like Google Colab, Lightning AI, Kaggle, etc., then everyone would have the opportunity to use it, because not everyone has a high-spec PC.
@Jarods_Journey · 24 days ago
It's possible to outsource generation to cloud compute, but unfortunately I don't want to play around with making the application compatible with cloud machines, as I'd have to maintain it, and I personally don't use the cloud much. I'm a big fan of running things locally, and as open source gets better, models also get more efficient.
@vidneypopples · 11 days ago
I accidentally paid for this on the Buy Me a Coffee page, but I'm a monthly paying member. Can I be refunded the $14.99, please?
@tripleheadedmonkey6613 · 25 days ago
I thought I recognized that first voice. So much more familiar speaking in Japanese lol.
@Jarods_Journey · 24 days ago
If you've watched any anime in the past 5 years, you'll have encountered him lol
@tripleheadedmonkey6613 · 23 days ago
@@Jarods_Journey Yeah, ever since he showed up in "Demon Lord Retry" I've been seeing him in literally every anime.
@SaschaFrenzer · 26 days ago
Great! I wish I could use it in German. Future update for multi-language, maybe?
@Jarods_Journey · 24 days ago
Possibly! XTTS would support that, I believe, but that's the last engine I'll be adding.
@SaschaFrenzer · 22 days ago
@@Jarods_Journey That would be brilliant. I write short stories for my nieces (3 and 6 years old) and have already recorded several of them. Unfortunately, I am increasingly lacking the time for this, which is why I have been watching your videos about the audio book maker for a long time.
@iaincampbell4422 · 25 days ago
Thinking of purchasing; two quick questions. Are you planning to implement E2/F5 TTS at some point? It's way more expressive! Also, will it work on Apple Silicon? (I'm on an M1 chip!) Thanks for a great project!
@Jarods_Journey · 24 days ago
E2/F5 will be implemented soon; I'm currently finishing up StyleTTS, then I'll work on that. Unfortunately, no Mac support atm! It may work if you hack around, but I don't have a Mac and haven't tested that use case.
@abhaygholap3613 · 25 days ago
Which languages will be supported?
@Jarods_Journey · 24 days ago
If you're familiar with these open-source engines: it supports whichever languages your chosen engine supports. The parser is designed for English right now, though, so English has the best compatibility.
@rajumolla7059 · 25 days ago
Thank you INFLUENCER PANEL for your tremendous support of my AudioBooks Channel are on All Social Networks face, KZbin.
@Point.Aveugle · 21 hours ago
I've been using this with the new F5TTS engine, and it captures the person so much better for some voices. You basically have to enable "use duration prediction model?" to get the speaker to actually nail the sentence at a normal pace, but if the sentence is too long it starts skipping words.... I never experienced this with the Gradio demo they released. I also thought we would escape the issue of long WAV files having to be converted for every new sentence. Luckily, I was storing all the voices, split into 30-second segments, in that same folder, so I just needed to rename the folder to F5TTS. One question I do have: how do I re-enable DeepSpeed for Tortoise? Is it as simple as uninstalling PyTorch 2.4 and installing PyTorch 2.3? Is it even worth it?
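Pre-splitting long reference recordings into 30-second pieces, as described above, can be done with Python's standard `wave` module. A minimal sketch, assuming uncompressed PCM WAV input; the function name `split_wav` and the `<stem>_000.wav` naming are hypothetical, and this does not describe the folder layout the audiobook maker itself expects for F5TTS voices.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(src: str, out_dir: str, segment_seconds: float = 30.0) -> List[Path]:
    """Split a PCM WAV file into consecutive segments of at most segment_seconds.

    Each segment keeps the source's channel count, sample width and sample rate.
    Output files are named <stem>_000.wav, <stem>_001.wav, ... (hypothetical convention).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_segment = int(segment_seconds * params.framerate)
        index = 0
        while True:
            frames = reader.readframes(frames_per_segment)
            if not frames:  # empty bytes means end of file
                break
            path = out / f"{Path(src).stem}_{index:03d}.wav"
            with wave.open(str(path), "wb") as writer:
                writer.setparams(params)  # frame count in the header is fixed up on write
                writer.writeframes(frames)
            written.append(path)
            index += 1
    return written
```

The last segment is simply shorter than the others; if an engine rejects very short references, merging a short tail into the previous piece would be a reasonable tweak.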
@TejasSurvase-vf1pw · 25 days ago
Thank you INFLUENCER PANEL for your tremendous support of my AudioBooks Channel are on All Social Networks face, KZbin.