God, the day we have this in real time with low latency for livestreams will be amazing. I understand English perfectly well but I don't feel confident streaming in another language lol.
@MisionJapon 3 months ago
Thank you for the video. I am currently living in Osaka, Japan and I am very interested in Instant Translation with AI models. However, what I understand by "Instant Translation" is not: "I say a sentence - The model translates it after a few seconds and I can hear it - I say another sentence - The model translates it after a few seconds and I can hear it..." What I understand by Instant Translation is: "You are talking in Japanese and, while you are talking in Japanese (with a delay of a few seconds), I listen to your speech in Spanish, no matter how long the speech is. Maybe the Japanese speech is 10 minutes long, and I can begin to listen to it in Spanish after 5 seconds, and it will end 5 seconds after the Japanese finishes". Basically it is like having an interpreter by your side who doesn't have to wait until the end of the speech to begin translating. That way, the conversation becomes more fluid. I know this is not an easy task, as there are SOV and SVO languages. However, I think that the SeamlessM4T model is able to take this into account as well. Do you think it is possible to implement such a thing with this model?
@user-vv9sk2uj8u 2 days ago
As a personal study project, this is great to share, but AI phones such as iOS or Android devices will soon integrate similar functions for real-time calls (phone calls or online meetings). Of course, privacy protection will be a constraint.
@Alex72RM 4 months ago
It's quite impressive!
@deintez 2 months ago
Hi, I am very interested in your script, but I can't seem to get it running. I don't understand where to input the API keys for each program, as there is no such section in your script. I am encountering a lot of errors. I really need your help.
@alejandroGTES 4 months ago
Awesome project! Is it possible to use another translation service instead of ChatGPT, one that doesn't require a subscription?
@AdamLucek 4 months ago
Certainly possible. The translation service could be anything, as the sentence string is all that's being passed back and forth. I just used OpenAI for a quick solution, but any service could be substituted in that step.
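A minimal sketch of that substitution point, not the code from the video: if the translation step only receives and returns a sentence string, any backend can be passed in as a plain function. The names `Translator`, `dictionary_translator`, and `handle_final_sentence` are hypothetical, and the toy dictionary stands in for a real service or local model.

```python
from typing import Callable

# Hypothetical type: any function mapping a source sentence to a translated one.
Translator = Callable[[str], str]


def dictionary_translator(sentence: str) -> str:
    """Toy stand-in for a real backend (a hosted API or a local model)."""
    table = {"hello": "hola", "thank you": "gracias"}
    # Fall back to the untranslated sentence when there is no entry.
    return table.get(sentence.strip().lower(), sentence)


def handle_final_sentence(
    sentence: str,
    translate: Translator,
    speak: Callable[[str], None],
) -> str:
    """Translate one finished sentence and hand the result to text-to-speech."""
    translated = translate(sentence)
    speak(translated)
    return translated


if __name__ == "__main__":
    # `print` stands in for the text-to-speech call.
    handle_final_sentence("Hello", dictionary_translator, print)
```

Swapping services then means writing one more function with the same `str -> str` shape and passing it as `translate`; the rest of the loop does not change.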
@alejandroGTES 4 months ago
@@AdamLucek Oh nice, I would love to see an updated version with a free alternative.
@kenibarwick 1 month ago
Even better, a local version
@bharaths5603 4 months ago
You nailed it, my friend!
@nilamara7620 4 months ago
Really impressive combination of these three. But to have a perfect loop, how do you deal with incoming audio (voice) in real time before you start speaking to respond? And another question: could the generated audio at the end be routed through an emulated microphone?
@AdamLucek 4 months ago
As this is currently set up, the streaming STT from AssemblyAI will transcribe, and then output a final "sentence" after some variable breakpoint of no speech. It is with this output that I process the rest of it into speech. As this is more of an MVP, more could be done within that intermediary step (checks for speech, pauses, etc.) that could change how and when the translated speech is played back, or it could even be done as a separate process, not sequentially like this is happening!
@dcleinad 2 months ago
Hey Adam, is there a way to book a 1-on-1 to see if you can help me with this? I just need to get GPT + AssemblyAI working for the project I want.
@ploylovespeach 3 months ago
Hi Adam, thank you so much for sharing this video! This is exactly what I've been searching for. I'm actually looking for an AI developer to help me create an MVP app for my startup business in Japan in the beauty industry. Would you be open to discussing potential work opportunities, or is this more of a hobby for you?
@iainhmunro 3 months ago
Does it do Tibetan?
@vv1nter__ 4 months ago
And can it be used to communicate in Discord?
@simont733 1 month ago
The problem is that there is no support for Amharic, the Ethiopian (East African) language.
@JohnSmith-qh4vf 3 months ago
On Kaggle pls
@CuriosidadesParaPensar-nn9sn 3 months ago
Free or paid?
@AdamLucek 3 months ago
Paid currently for all three services
@user-vv9sk2uj8u 2 days ago
@AdamLucek 1. The speech-to-text part can be sped up with faster-whisper, and it's free. 2. For translation, you can use free services such as Google or Microsoft. 3. The latest iOS or macOS can create a Personal Voice, send text to Live Speech, and speak it in your own voice.
@user-vv9sk2uj8u 2 days ago
My idea is that the above services can be combined with Shortcuts to achieve real-time translation into various voices on Mac or iPhone, and the Tower of Babel can already be conquered.
@user-vv9sk2uj8u 2 days ago
ElevenLabs can use Azure AI
@smilebig3884 1 month ago
It's huge latency… who said it's real-time?
@user-vv9sk2uj8u 2 days ago
Using the local open source model can reduce latency, but it can't be latency-free
@user-vv9sk2uj8u 2 days ago
Faster-Whisper + Local Translation + Apple's Personal Voice can dramatically reduce latency
@CarasGFTK 8 days ago
Hi @Adam! I just messaged you on LinkedIn! Would love to chat.