Machine Translation for a 1000 languages - Paper explained

5,608 views

AI Coffee Break with Letitia

A day ago

Comments

@neerav302 1 year ago

wowwww... love your way of explanation .....

@Micetticat 2 years ago

C-3PO is truly around the corner.

@hassanmohamadian1420 2 years ago

I wonder. You are beautiful, intelligent, and speak English well. Impressed by you.

@sadface7457 2 years ago

🥳 Thank you, W&B, for bringing us this content

@jiashuhan9011 2 years ago

8:19 It's correct. Support from China! Love your videos ❤

@DerPylz 2 years ago

My message to Google: "Dataset or it didn't happen!"

@sadface7457 2 years ago

Does it matter once the dataset is in the petabyte range?

@DerPylz 2 years ago

@@sadface7457 Of course it does. If the dataset is open source, people can use the parts that are relevant for them. No need to waste the energy and work hours needed to generate it again. Publishing a "scientific" paper and not providing the data is so weird to me, especially when we're living in the age of the reproducibility crisis. Maybe I'm an idealist, but I find science is not science if it's not for the progress of society. It's just the internal R&D of a company.

@dinhanhx 2 years ago

@@DerPylz It's called open data, not open source.

@DerPylz 2 years ago

@@dinhanhx Of course, thanks for pointing out my typo.

@niofer7247 2 years ago

Great summary. Love your videos, always great explanations!!!

@fahimfaisal1956 2 years ago

Thanks, Letitia, for the nice explanation... keep going strong :) :)

@marcocipriano5922 2 years ago

I love when she says "yes, you guess correctly!" Me still sleepy in bed: ...did I?

@Aldraz 2 years ago

It would be great if Google improved the languages it already supports. Translating from Japanese to English is still a huge pain and complete nonsense most of the time.

@nourhesham894 2 years ago

Love your videos! Wondering if you can explain the SegFormer paper in easy terms!

@AICoffeeBreak 2 years ago

Writing it onto THE list. I had not heard of the paper before, will have a look.

@harumambaru 2 years ago

8:19 - I hope this is correct. Comment below if not.

@AICoffeeBreak 2 years ago

Let's check how well Google Translate works for Chinese. 😅

@harumambaru 2 years ago

@@AICoffeeBreak Let's wait for a viewer with at least two languages, English and Chinese, to find out the truth.

@AICoffeeBreak 2 years ago

⌚⏳

@ANTIMONcom 2 years ago

Perhaps now Google Translate will be a bit less shit for all translations that are not to or from English 😑

@Skinishh 2 years ago

Great video once again! 👏 Question: How does the model know the target language to translate to? Is there a token in the input that states the task at hand? Or maybe the first token in the decoder?

@AICoffeeBreak 2 years ago

Hi, thanks for the question. I would have replied earlier, but I was on vacation when I read the comment and couldn't check the paper. Now I have checked it. Yes, your intuition is right: they add a language token. Papers usually do not agree on whether to (1) add the source language token to the encoder and the target language token to the decoder, or (2) add a target language token to the source sequence (encoder). On page 13, it says: "Different from Siddhant et al. (2022), in addition to the token that was prepended to the source sequence to signify the target language for both translation and MASS tasks, we add a token ( for the translation task, and for the MASS task) that specifies the task to be performed by the model. We find this to be critical for zero-resource performance, especially when model sizes are scaled up. In the absence of this task token, our models learnt to ‘infer’ the task from the source languages instead of relying on the token, resulting in copying sentences when provided zero-resource language sentences with the token."
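
A minimal sketch of the idea in the reply above, under stated assumptions: the token spellings (`<task:...>`, `<to:...>`), the helper names, and the order of the two tokens are hypothetical placeholders, not the paper's actual strings or layout. It only shows how a task token and a target-language token can be prepended to the source sequence before it goes into the encoder.

```python
from __future__ import annotations

# Hypothetical token spellings, used only for illustration.
# The paper's real token strings are not reproduced here.
TASK_TOKENS = {
    "translation": "<task:translation>",
    "mass": "<task:mass>",
}


def target_language_token(lang_code: str) -> str:
    """Return a placeholder token that names the target language."""
    return f"<to:{lang_code}>"


def build_encoder_input(
    source_tokens: list[str], target_lang: str, task: str
) -> list[str]:
    """Prepend a task token and a target-language token to the source.

    The relative order of the two tokens is an assumption of this sketch.
    """
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task!r}")
    return [TASK_TOKENS[task], target_language_token(target_lang), *source_tokens]


if __name__ == "__main__":
    # German source sentence, English requested as the target language.
    print(build_encoder_input(["Guten", "Morgen"], target_lang="en", task="translation"))
```

With this layout, asking for a different target language means changing one token while the source text stays the same, and the separate task token tells the model whether to translate or to do the MASS denoising task.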

@Skinishh 2 years ago

@@AICoffeeBreak Thanks! 🙏