Open Source AI Audiobook Maker - Installation and Usage

5,000 views

Days ago

Comments

@Jarods_Journey 26 days ago

If you run into any issues or have any ideas, please open up a new issue here: github.com/JarodMica/audiobook_maker/issues Try to make it as descriptive as possible if it's an issue and the same goes with improvements.

@aaagaming2023 26 days ago

You need to add E2-F5-TTS imo.

@iaincampbell4422 25 days ago

Defo I actually watched hoping it was using E2-F5 TTS!

@wongr643 25 days ago

Mate just want to say i have been following you for some time and really appreciate your tutorial on AI Voice cloning/TTS. Probably the best out there for this niche

@Jarods_Journey 24 days ago

Appreciate it :)!

@neonpowar3766 24 days ago

Was just following your old tutorial when I checked your channel and saw this. Good luck in all your future endeavors. Funnily enough, one of the things I will be using your audiobook maker for is to turn the Re:Zero web novel into audiobooks.

@Jarods_Journey 24 days ago

Ayee I approve the choice :)!

@edalot 11 days ago

Exactly what I wanted to do too 😅

@djwhispers3157 11 days ago

This is a great tool for the project I need it for, which is a short story. I cannot wait to use it. I just need a tool to help clone and create voice models to read the story in different characters.

@Curtis25 25 days ago

Thank you so much for your work. This project is just amazing. It would be cool to have the option to export it as an M4B file, instead of an mp3, or to have the option to export every chapter as a separate audio file.

@NFawc 26 days ago

This is exciting stuff. I'm more than happy to in effect pay once for the project if it's then onwardly supported/developed. ;)

@tempertephra 14 days ago

Thank you for creating this 🥳🎉 Is it only my impression or is RVC functioning worse than in the cloner? Most of my models don't give the genuine voices they used to. Is there a way to adjust this?

@f4ture 13 days ago

hmmm why would I need to purchase the install package when someone opened a pull request with an open source installer on your GitHub?

@ASlaveToReason 26 days ago

Sweet, thanks for the update. I cannot wait till there's some AI agent which can parse different characters in books so we can feed it into this.

@strangeboltz 23 days ago

I bought you a coffee for the audiobook maker! Thank you so much for this.

@daviddurand1656 26 days ago

Hi Jarod, are there any foreign languages available, like French? Thanks

@Jarods_Journey 24 days ago

Whatever languages the open source engines (tortoise, xtts, styletts, f5tts) support will be supported. This includes custom models that a user may have trained.

@puntogcb 21 days ago

@Jarods_Journey that means we can add text in Spanish (my interest) and that will do? I love your work, and for sure I will buy you a coffee! Thank you very much for this!

@Melike-oh1ir 24 days ago

Incredible project and amazing achievements tbh, congrats man. My only issue is that no matter what model I choose, my voices always end up super dark pitched (like Sauron lol). Any clues as to why? I've played around with pitch and pitch methods to no avail. Tried over 4 custom trained models. EDIT: This only happens with RVC enabled. EDIT2: Feel so stupid, it was the sample rate I had to change. Cheers!

@Jarods_Journey 24 days ago

Yeah, it's currently a small bug in the rvc library! I'll have to fix it, but SR can be lowered to 0 to resolve it for now

@puntogcb 21 days ago

I will be purchasing the package to practice using it soon. I would like it to have a language selection option, not only for the entire audiobook, but for some sentences as well. I am interested in Latin Spanish, and with variations of accents, for example Argentine. Would you add this functionality to this project? Thank you very much again for all this incredible work.

@skistenl6566 4 days ago

Could you please explain how RVC settings and Tortoise Settings are different? I put in my RVC model in the settings, check Use s2s Engine. But the result is still the random voice from Tortoise

@richardkuhne5054 26 days ago

I'm looking for some kind of immersive reader with a decent text-to-speech system that also highlights the words, so the text is there as a support if you want to follow along. Any suggestions for this?

@daywizzle 26 days ago

Probably speechify tbh?

@donmarshal2070 26 days ago

Balabolka checks all of your needs. If you set it up correctly then you are good to go for 100 years.(Speaking from experience)😂 (Just get decent Natural voice or see how to use Natural Edge voice in balabolka)

@mikeutoob 26 days ago

Read Aloud using Edge web browser

@ming3706 26 days ago

So can i download this audio after it is done and upload to my phone to listen?

@Jarods_Journey 24 days ago

Yup! It's all yours so do with it what you will

@mauricio9581 15 days ago

Let's say I am not happy with the emotion the narrator puts into a sentence. Can I use my own voice in combination with the narrator's voice to improve the emotional way it says a sentence? How can I implement it? Amazing tool btw! Love your content

@davidmilligan4751 9 days ago

Getting a lot of errors trying to install the RVC files. I bought the packaged files and it seems I don't have something right. Please help

@Jarods_Journey 8 days ago

Hey David, please open up a new issue on the GitHub issues tab and share the error that you're getting in the terminal so that we might be able to figure out what's going on: github.com/JarodMica/audiobook_maker/issues

@yanglangfu773 26 days ago

Is that Kenjiro Tsuda speaking English? So cool, like, 99% clone 🤯

@Jarods_Journey 24 days ago

yup!

@mash9653 24 days ago

When I restart this project it shows " Configuration file/tts_config.json not found" But

@Metarig 26 days ago

Hey, why not use ChatGPT's advanced voice and then switch the voice later with ElevenLabs?

@mucool328 26 days ago

Expensive?

@Metarig 26 days ago

@mucool328 like $20?

@Metarig 26 days ago

@mucool328 it's only $20.

@Metarig 26 days ago

@mucool328 To create an audiobook, I'd spend about $50 to get better quality.

@Metarig 26 days ago

@mucool328 Are you serious?

@wakasm 25 days ago

I have a use case where I have a database of like 1000 different lines or paragraphs. Is there a way to just jump to a specific line and play that (even maybe through some sort of API or through the interface)? And specifically map the entries to specific labels? (Not necessarily 1-1000, but maybe some numbers skipped or even stuff labeled like A1, A2, etc.) Think choose your own adventure, that's kind of close to the use case I would try this for.

@Jarods_Journey 24 days ago

I'm not quite sure I understand the use case here, but there's only a scroll bar in the table right now that you can use to go up and down. Custom labeling other than speakers is not supported atm

@phenix5609 26 days ago

That's really impressive. I couldn't watch the full video yet, so maybe you talk about it in there, but did you have time to include the new E2/F5 TTS voice cloning app you showed in one of your videos? Their "podcast" option needs the text formatted with the name of each speaker as the first word of their sentence, like: speaker1: …., speaker2:…, and then you give it a 10-second audio sample for each voice, and it does what you showed at the start of the video. Really impressive. But I've only tried it with English, since it says it works with English and Chinese. I haven't tried Japanese or French yet; I'm not sure it would work well, and I don't know how to train a voice with their tech.

@Jarods_Journey 24 days ago

F5 will be included in the audiobook maker. Other people seem hard at work adding more languages for it though rn

@stevewarby12 26 days ago

Great, will buy later. In the text files it would make sense to allocate voices there. E.g. if generating from AI, ask it to use the format: V1: audio text 1, V2: audio text 2, V1: audio text 3. These would then auto-map to the selected voice's index. E.g. if the first voice is me, all lines with V1: will use this voice. This would save a lot of time manually selecting each voice per line.
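
Editor's note: the tagging idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the audiobook maker works: the `V<number>:` prefix comes from the comment, while `assign_voices`, `TAG_PATTERN`, and the fallback `default_voice` are hypothetical names introduced here.

```python
import re

# Assumed tag format, taken from the comment above: "V1: some text".
TAG_PATTERN = re.compile(r"^\s*V(\d+)\s*:\s*(.+?)\s*$")


def assign_voices(lines, voices, default_voice=None):
    """Pair each line with a voice chosen by its 1-based V<n> tag.

    Untagged lines, and tags that point past the end of `voices`,
    fall back to `default_voice`.
    """
    assigned = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        match = TAG_PATTERN.match(line)
        if match:
            index = int(match.group(1)) - 1  # V1 maps to voices[0]
            text = match.group(2)
            voice = voices[index] if 0 <= index < len(voices) else default_voice
        else:
            text = line.strip()
            voice = default_voice
        assigned.append((voice, text))
    return assigned


script = ["V1: Hello there.", "V2: Hi, who are you?", "V1: Just the narrator."]
result = assign_voices(script, ["me", "guest"], default_voice="narrator")
for voice, text in result:
    print(f"{voice}: {text}")
```

Keeping the tag at the start of each line means an LLM can reformat an existing story without touching the sentence text itself.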

@stevewarby12 26 days ago

Even if the story has already been written, OpenAI could reformat the text.

@Jarods_Journey 24 days ago

Yeah, I'm thinking about how to incorporate it. I could support a custom speaker import option, but I have to think on how I want to make this option available in the audiobook maker

@stevewarby12 24 days ago

@Jarods_Journey On the voice selection per line: loads of different colours look very confusing. Have a separate column simply with the speaker name and/or meme of the speaker. PS For anyone else... Pay the $14.99, it just worked. No spending hours setting up environments and pip installing for ages.....

@danieldorszu1317 26 days ago

Hey, is it possible for a program to automatically select different voices for a txt or e-book, rather than just dialing them in manually?

@webinatic216 25 days ago

Imagine writing a book then.

@Jarods_Journey 24 days ago

Yes, a proof of concept with ChatGPT has shown that it can label sentences. But I need a specific format, so I'm working through some ideas on that

@holdthetruthhostage 26 days ago

My question is what's the word limit, because with Eleven Labs it starts breaking down past 800 words

@Jarods_Journey 24 days ago

Give or take 15-20 seconds max for a tortoise tts example, 20-30 seconds with styletts, and up to 30 seconds with f5tts. Not too sure on the breakdown when it comes to words though.
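
Editor's note: the reply gives limits in seconds rather than words. One rough way to relate the two is to assume a speaking rate and group sentences until the estimated duration would exceed the limit. The sketch below does that in Python; the 2.5 words-per-second rate is an assumption (about 150 words per minute), and `chunk_text` is a hypothetical helper, not the audiobook maker's own splitter.

```python
import re

# Assumption: roughly 150 words per minute of narration.
ASSUMED_WORDS_PER_SECOND = 2.5


def chunk_text(text, max_seconds, words_per_second=ASSUMED_WORDS_PER_SECOND):
    """Group whole sentences into chunks whose estimated duration fits max_seconds.

    A single sentence longer than the limit is kept as its own chunk
    rather than being split mid-sentence.
    """
    max_words = int(max_seconds * words_per_second)
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight nine."
print(chunk_text(sample, max_seconds=2, words_per_second=3))
```

Under that assumed rate, a 15-second limit works out to about 37 words per chunk and a 30-second limit to about 75; real engines may tolerate more or less depending on the voice and pacing.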

@MrEffectfilms 26 days ago

I have an Nvidia GPU but how important is the 8gb of vram? I have a GTX 1660 super which has 6. Will this just not work?

@Jarods_Journey 24 days ago

That should be able to work, you might top out though if using tortoise TTS. If you're familiar with styletts2, when I release the engine for that, it should be able to inference on that without issues.

@martinbobis6764 11 days ago

where do i see the PC req for running this?

@martinbobis6764 11 days ago

nvm, i'll give it a try with an old gtx 1070 8gb i dont need to generate that much anyways

@KnutNukem 26 days ago

Great project!

@zanshibumi 24 days ago

Why is each line read in a different voice? There seem to be 2-3 voices and each line is read by one of them.

@Jarods_Journey 24 days ago

In the video? Well, I selected them. If you're running with random, it will change voices as well.

@zanshibumi 24 days ago

@Jarods_Journey I don't understand how to not run it random. With one narrator and nothing added, the tortoise panel doesn't allow any option other than random. How can I set it up so every sentence is read in a single voice?

@DarinLawsonHosking 24 days ago

Any chance this runs on AMD RX 6800 XT?

@Jarods_Journey 24 days ago

Unfortunately not, AMD support is limited on most of these engines and as well, I don't have AMD to test on either. Sorry!

@davepierunc 25 days ago

So a 4gb Nvidia won't cut it, right?

@Jarods_Journey 24 days ago

Possibly, I don't think you'll be able to do it too well with tortoiseTTS, but when I finish the styleTTS2 engine, 4gb would be fine

@donmarshal2070 26 days ago

Question: I only have 6gb VRAM but have 64gb of DDR5 physical RAM. So will it work on my system, or does it only work on VRAM? 🧐 1. It's a laptop, not a PC, so no GPU upgrade 😭 2. I run a 40B-parameter LLM on it without any hiccups (70B at 40 wpm), and that works because when VRAM fills up it uses physical RAM. So will this work the same way, i.e. will it use physical RAM after VRAM is completely utilized?

@Jarods_Journey 24 days ago

6gb of vram should work, I think you'll just be topping out a bit with tortoise tts. I think though all of the engines I'm planning on adding inference with at most 4gb of vram needed. It will overflow to ram though if it gets completely utilized afaik.

@donmarshal2070 24 days ago

@Jarods_Journey thanks man! I'm currently doing LLM training on my laptop, and all the TTS models I've been training are given input only in "IPA", not typed text, so I'm getting better results in terms of pronunciation. But TRAINING a voice using an audio clip is not working due to the VRAM limitation, so I've upgraded the physical RAM to compensate for it. So hope it works

@SavvyStaks 24 days ago

I think if you run this kind of software on a cloud machine like Google Colab, Lightning AI, Kaggle, etc., then everyone will have the opportunity to use it, because not everyone has a PC with a high-end configuration.

@Jarods_Journey 24 days ago

It's possible to outsource the generation to cloud compute, but unfortunately, I don't wanna play around with making an application compatible with cloud machines as I'd have to maintain it and I personally don't use much cloud myself. I'm a big fan of having things locally and as open source gets better, models also get more efficient.

@vidneypopples 11 days ago

I've accidentally paid for this on buy me a coffee page but I'm a pay monthly user. Can I be refunded the $14.99 please?

@tripleheadedmonkey6613 25 days ago

I thought I recognized that first voice. So much more familiar speaking in japanese lol.

@Jarods_Journey 24 days ago

If you've watched any anime in the past 5 years, you'll have encountered him lol

@tripleheadedmonkey6613 23 days ago

@Jarods_Journey Yeah, ever since he showed up in "Demon Lord Retry" I've been seeing him in literally every anime.

@SaschaFrenzer 26 days ago

Great! I wish I could use it in German. Future update for multilanguage maybe?

@Jarods_Journey 24 days ago

Possibly! XTTS would support that I believe, but that one is the last engine I'll be adding in

@SaschaFrenzer 22 days ago

@Jarods_Journey That would be brilliant. I write short stories for my nieces (3 and 6 years old) and have already recorded several of them. Unfortunately, I am increasingly lacking the time for this, which is why I have been watching your videos about the audiobook maker for a long time.

@iaincampbell4422 25 days ago

Thinking of purchasing. Two quick questions: are you planning to implement E2/F5 TTS at some point? It's way more expressive! Also, will it work on Apple Silicon? (I'm on an M1 chip!) Thanks for a great project!

@Jarods_Journey 24 days ago

E2/F5 will be implemented soon, currently finishing up styletts then I'll work on that. Unfortunately, no Mac support atm! It may work if you hack around, but I don't have a mac and haven't tested that use case.

@abhaygholap3613 25 days ago

Which languages will be supported?

@Jarods_Journey 24 days ago

If you're familiar with these open source engines, it supports whichever language your chosen engine will support. The parser is designed for english right now though, so best compatibility with english.

@rajumolla7059 25 days ago

Thank you INFLUENCER PANEL for your tremendous support of my AudioBooks Channel are on All Social Networks face, KZbin.

@Point.Aveugle 21 hours ago

I've been using this with the new F5TTS engine and it captures the speaker so much better for some people. You basically have to enable "use duration prediction model?" to get the speaker to actually nail the sentence at a normal pace, but if the sentence is too long it starts skipping words.... Never experienced this with the gradio demo they released. I also thought we would escape the issue of long wav files having to be converted for every new sentence. Luckily I was storing all the voices in that same folder, split into 30-second segments, so I just needed to rename the folder to F5TTS. One question I do have is how to re-enable deepspeed for tortoise. Is it as simple as uninstalling 2.4 and installing pytorch 2.3? Is it even worth it?
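
Editor's note: for anyone wanting to pre-split reference audio into 30-second segments as described above, here is a minimal sketch using only Python's standard `wave` module. It assumes uncompressed PCM WAV input; `split_wav` and the output naming scheme are hypothetical, and whether the audiobook maker expects any particular folder layout is not confirmed here.

```python
import wave
from pathlib import Path


def split_wav(source, out_dir, segment_seconds=30):
    """Split a PCM WAV file into consecutive segments of segment_seconds each.

    The last segment holds whatever audio remains and may be shorter.
    Returns the list of files written.
    """
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(source), "rb") as src:
        params = src.getparams()
        frames_per_segment = int(src.getframerate() * segment_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            # Hypothetical naming scheme: <original name>_000.wav, _001.wav, ...
            target = out_dir / f"{source.stem}_{index:03d}.wav"
            with wave.open(str(target), "wb") as dst:
                dst.setparams(params)  # frame count in the header is corrected on close
                dst.writeframes(frames)
            written.append(target)
            index += 1
    return written
```

Calling `split_wav("narrator.wav", "voices/narrator")` would write `narrator_000.wav`, `narrator_001.wav`, and so on into that folder; both paths are placeholders.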