Speak Any Language With AI - Realtime Speech-to-Speech Translation & Voice Synthesis (w/Code)

Views: 4,198

Comments: 27

@user-nu1lb2qu2x 8 days ago

God, the day we have this in real time with low latency for livestreams will be amazing. I understand English perfectly well but I don't feel confident streaming in another language lol.

@MisionJapon 3 months ago

Thank you for the video. I am currently living in Osaka, Japan, and I am very interested in instant translation with AI models. However, what I understand by "instant translation" is not: "I say a sentence, the model translates it after a few seconds and I can hear it, I say another sentence, the model translates it after a few seconds and I can hear it..." What I understand by instant translation is: "You are talking in Japanese and, while you are talking (with a delay of a few seconds), I listen to your speech in Spanish, no matter how long the speech is. Maybe the Japanese speech is 10 minutes long; I can begin to listen to it in Spanish after 5 seconds, and it will end 5 seconds after the Japanese finishes." Basically, it is like having an interpreter by your side who doesn't have to wait until the end of the speech to begin translating. That way, the conversation becomes more fluid. I know this is not an easy task, as there are SOV and SVO languages. However, I think the Seamless M4T model is able to take this into account as well. Do you think it is possible to implement such a thing with this model?

@user-vv9sk2uj8u 2 days ago

As a personal study project, this is great to share, but phones running iOS or Android will soon integrate similar functions for real-time calls (phone calls or online meetings). Of course, privacy protection will be a constraint.

@Alex72RM 4 months ago

It's quite impressive!

@deintez 2 months ago

Hi, I am very interested in your script, but I can't seem to get it running. I don't understand where to input the API keys for each program, as there is no such section in your script. I am encountering a lot of errors. I really need your help.

@alejandroGTES 4 months ago

Awesome project! Is it possible to use a translation service other than ChatGPT, one that doesn't require a subscription?

@AdamLucek 4 months ago

Certainly possible. The translation service could be anything, as the sentence string is all that's being passed back and forth. I just used OpenAI for a quick solution, but any service could be substituted in that step.
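
To illustrate the point about only a sentence string being passed back and forth, here is a minimal sketch of what a swappable translation step could look like. It is not the code from the video: the names `Translator`, `make_dictionary_translator`, and `handle_final_sentence` are hypothetical, and the lookup table is a toy stand-in for whichever real service is plugged in.

```python
from typing import Callable, Dict

# A translator is any callable mapping (sentence, target language) to a string.
# Assumption: this interface is illustrative, not taken from the original script.
Translator = Callable[[str, str], str]


def make_dictionary_translator(table: Dict[str, Dict[str, str]]) -> Translator:
    """Build a toy offline translator from a lookup table (hypothetical stand-in)."""
    def translate(sentence: str, target_lang: str) -> str:
        # Fall back to the original sentence when no entry is known.
        return table.get(target_lang, {}).get(sentence, sentence)
    return translate


def handle_final_sentence(sentence: str, target_lang: str, translator: Translator) -> str:
    """The step between speech-to-text and text-to-speech: a string in, a string out."""
    return translator(sentence.strip(), target_lang)


toy = make_dictionary_translator({"es": {"Good morning": "Buenos días"}})
result = handle_final_sentence("  Good morning ", "es", toy)
print(result)
```

Because the rest of the pipeline only depends on the `Translator` signature, a paid API, a free service, or a local model could be wrapped in a function of the same shape and passed in without touching the other steps.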

@alejandroGTES 4 months ago

@@AdamLucek Oh nice, I would love to see an updated version with a free alternative.

@kenibarwick 1 month ago

Even better, a local version

@bharaths5603 4 months ago

You nailed it, my friend!

@nilamara7620 4 months ago

The combination of these three is really impressive. But to have a perfect loop, how do you deal with incoming audio (voice) in real time before you start speaking to respond? And another question: could the generated audio at the end be routed as an emulated microphone input?

@AdamLucek 4 months ago

As this is currently set up, the streaming STT from AssemblyAI will transcribe, and then output a final "sentence" after some variable breakpoint of no speech. It is with this output that I process the rest of it into speech. As this is more of an MVP, more could be done within that intermediary step (checks for speech, pauses, etc.) that could change how and when the translated speech is played back, or it could even be done as a separate process, not sequentially like this is happening!
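
The "separate process, not sequentially" idea above could be sketched roughly as follows. This is an assumption about one possible design, not the video's implementation: final sentences go onto a queue, and a background worker thread translates and plays them, so the transcription side never waits for playback. The `translate` and `speak` callables are hypothetical placeholders (an uppercase function and a list append) standing in for the real translation and text-to-speech calls.

```python
import queue
import threading
from typing import Callable, List

_STOP = object()  # sentinel telling the worker to shut down


def start_playback_worker(
    sentences: queue.Queue,
    translate: Callable[[str], str],
    speak: Callable[[str], None],
) -> threading.Thread:
    """Consume final sentences in the background so transcription is never blocked."""
    def run() -> None:
        while True:
            item = sentences.get()
            if item is _STOP:
                break
            speak(translate(item))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


sentences: queue.Queue = queue.Queue()
spoken: List[str] = []

# Placeholders: uppercase stands in for translation, list.append for audio playback.
worker = start_playback_worker(sentences, translate=lambda s: s.upper(), speak=spoken.append)

# The speech-to-text callback would call sentences.put(...) for each final sentence.
for final_sentence in ["hello there", "how are you"]:
    sentences.put(final_sentence)

sentences.put(_STOP)
worker.join()
print(spoken)
```

A single worker keeps the sentences in their spoken order; the checks for speech and pauses mentioned above would sit in the worker loop, before `speak` is called.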

@dcleinad 2 months ago

Hey Adam, is there a way to book a 1-on-1 to see if you can help me with this? I just need to get GPT + AssemblyAI working for the project I want.

@ploylovespeach 3 months ago

Hi Adam, thank you so much for sharing this video! This is exactly what I've been searching for. I'm actually looking for an AI developer to help me create an MVP app for my startup business in Japan in the beauty industry. Would you be open to discussing potential work opportunities, or is this more of a hobby for you?

@iainhmunro 3 months ago

Does it do Tibetan?

@vv1nter__ 4 months ago

And can it be used to communicate in Discord?

@simont733 1 month ago

The problem is that there is no support for Amharic, the East African language of Ethiopia.

@JohnSmith-qh4vf 3 months ago

on kaggle pls

@CuriosidadesParaPensar-nn9sn 3 months ago

Free or paid?

@AdamLucek 3 months ago

Paid currently for all three services

@user-vv9sk2uj8u 2 days ago

@AdamLucek 1. The speech-to-text part can be sped up with faster-whisper, and it's free. 2. For translation, you can use free services such as Google or Microsoft. 3. The latest iOS or macOS can create a Personal Voice, send text to Live Speech, and speak it in your own voice.
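
For point 1, a rough sketch of local transcription with faster-whisper might look like this. It assumes the `faster-whisper` package is installed and transcribes a recorded file rather than a live stream, so it does not replace the streaming step in the video as-is. The helper names `join_segments` and `transcribe_file` and the `"small"` model size are illustrative choices, not something stated in the thread.

```python
from types import SimpleNamespace
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate the .text of each transcribed segment into one sentence string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def transcribe_file(path: str, model_size: str = "small") -> str:
    """Transcribe an audio file locally (assumes faster-whisper is installed)."""
    # Imported lazily so this module loads even where faster-whisper is missing.
    from faster_whisper import WhisperModel

    # CPU with int8 quantization is an assumed default for machines without a GPU.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return join_segments(segments)


# Demonstrate the joining step with stand-in segments instead of real audio.
fake_segments = [SimpleNamespace(text=" Hello"), SimpleNamespace(text=" world. ")]
print(join_segments(fake_segments))
```

The returned string is the same kind of sentence string that the translation step consumes, so `transcribe_file` could feed it directly.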

@user-vv9sk2uj8u 2 days ago

My idea is that the above services can be combined with Shortcuts to achieve real-time translation into various voices on Mac or iPhone, and Babelta can already be conquered.

@user-vv9sk2uj8u 2 days ago

For ElevenLabs, Azure AI can be used.