Machine Translation for a 1000 languages - Paper explained

  5,445 views

AI Coffee Break with Letitia

A day ago

We explain the new polyglot model from Google Research that can translate between 1,000 languages! No need to read the long research paper yourself, because here we explain and summarize it on a high level.
SPONSOR: Weights & Biases 👉 wandb.me/ai-coffee-break
Check out our daily #MachineLearning Quiz Questions: / aicoffeebreak
➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....
Paper 📜: Bapna, Ankur, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu et al. "Building Machine Translation Systems for the Next Thousand Languages." arXiv preprint arXiv:2205.03983 (2022). arxiv.org/abs/2205.03983
🔗 Facebook’s response: 200 language machine translation and it’s open source: arxiv.org/abs/2207.04672
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Don Rosenthal, Dres. Trost GbR, banana.dev -- Kyle Morris, Julián Salazar, Edvard Grødem, Vignesh Valliappan, Kevin Tsai, Mutual Information, Mike Ton
Outline:
00:00 Machine translation for a 1000 languages
00:42 Weights&Biases (Sponsor)
02:00 Problems with many languages
04:15 Collecting data for 1k languages
11:46 Building MT models
14:13 Results on a thousand languages
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Patreon: / aicoffeebreak
Ko-fi: ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: / aicoffeebreak
Twitter: / aicoffeebreak
Reddit: / aicoffeebreak
KZbin: / aicoffeebreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video editing: Nils Trost

Comments: 28
@marcocipriano5922 a year ago
I love when she says "yes, you guess correctly!" Me still sleepy in bed: ...did I?
@Micetticat a year ago
C-3PO is truly just around the corner.
@jiashuhan9011 a year ago
8:19 It's correct. Support from China! Love your videos ❤
@sadface7457 a year ago
🥳 Thank you, W&B, for bringing us this content.
@DerPylz a year ago
My message to Google: "Dataset or it didn't happen!"
@sadface7457 a year ago
Does it matter once the dataset is in the petabyte range?
@DerPylz a year ago
@sadface7457 Of course it does. If the dataset is open source, people can use the parts that are relevant to them. No need to waste the energy and work hours needed to generate it again. Publishing a "scientific" paper without providing the data is so weird to me, especially when we're living in the age of the reproducibility crisis. Maybe I'm an idealist, but I find that science is not science if it's not for the progress of society. It's just the internal R&D of a company.
@dinhanhx a year ago
@DerPylz It's called open data, not open source.
@DerPylz a year ago
@dinhanhx Of course, thanks for pointing out my typo.
@niofer7247 a year ago
Great summary. Love your videos, always great explanations!!!
@fahimfaisal1956 a year ago
Thanks, Letitia, for the nice explanation... keep going strong :) :)
@neerav302 a year ago
Wow... love your way of explaining things.
@nourhesham894 a year ago
Love your videos! Wondering if you can explain the SegFormer paper in easy terms!
@AICoffeeBreak a year ago
Writing it onto THE list. I had not heard of the paper before; I will have a look.
@Aldraz a year ago
Would be great if Google improved the already implemented languages. Translating from Japanese to English is still a huge pain and complete nonsense most of the time.
@harumambaru a year ago
8:19 - I hope this is correct. Comment below if not
@AICoffeeBreak a year ago
Let's check how well Google Translate works for Chinese. 😅
@harumambaru a year ago
@AICoffeeBreak Let's wait for a viewer who speaks at least two languages, English and Chinese, to find out the truth.
@AICoffeeBreak a year ago
⌚⏳
@ANTIMONcom a year ago
Perhaps now Google Translate will be a bit less shit for all translations that are not to or from English 😑
@Skinishh a year ago
Great video once again! 👏 Question: How does the model know the target language to translate to? Is there a token in the input that states the task at hand? Or maybe the first token in the decoder?
@AICoffeeBreak a year ago
Hi, thanks for the question. I would have replied earlier, but I was on vacation when I read the comment and couldn't check the paper. Now I have checked it. Yes, your intuition is right: they prepend a language token. Papers usually do not agree on whether to (1) add the source language token to the encoder and the target language token to the decoder, or (2) add a target language token to the source sequence (encoder). On page 13, it says: "Different from Siddhant et al. (2022), in addition to the token that was prepended to the source sequence to signify the target language for both translation and MASS tasks, we add a token ( for the translation task, and for the MASS task) that specifies the task to be performed by the model. We find this to be critical for zero-resource performance, especially when model sizes are scaled up. In the absence of this task token, our models learnt to ‘infer’ the task from the source languages instead of relying on the token, resulting in copying sentences when provided zero-resource language sentences with the token."
@Skinishh a year ago
@AICoffeeBreak Thanks! 🙏
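To make the reply above about language and task tokens a bit more concrete, here is a minimal sketch of the two tagging conventions it mentions. Everything in it is an assumption for illustration: the function names, the token spellings (`<2en>`, `<task:translation>`, `<de>`), and the order in which the task and language tokens are prepended are made up here and are not taken from the paper or from any real tokenizer.

```python
from typing import List, Tuple

# Hypothetical task names; the spellings below are illustrative only.
TASKS = ("translation", "mass")


def target_token(lang: str) -> str:
    """Build a made-up target-language token such as '<2en>'."""
    return f"<2{lang}>"


def tag_source(src_tokens: List[str], tgt_lang: str, task: str) -> List[str]:
    """Convention (2) from the reply, plus a task token.

    The target language and the task are both signalled on the encoder
    side by prepending tokens to the source sequence. The relative order
    of the two tokens is an assumption of this sketch.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [f"<task:{task}>", target_token(tgt_lang)] + list(src_tokens)


def tag_encoder_and_decoder(
    src_tokens: List[str], src_lang: str, tgt_lang: str
) -> Tuple[List[str], List[str]]:
    """Convention (1) from the reply.

    The source-language token goes to the encoder input and the
    target-language token becomes the first decoder token.
    """
    encoder_input = [f"<{src_lang}>"] + list(src_tokens)
    decoder_prefix = [f"<{tgt_lang}>"]
    return encoder_input, decoder_prefix


if __name__ == "__main__":
    print(tag_source(["Guten", "Morgen"], "en", "translation"))
    print(tag_encoder_and_decoder(["Guten", "Morgen"], "de", "en"))
```

With convention (2), the same encoder-decoder model can serve every language pair because the request is fully described by the source sequence; the extra task token is what lets the model tell a translation request apart from a MASS (masked reconstruction) request instead of guessing from the source language.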
@hassanmohamadian1420 a year ago
I wonder. You are beautiful and intelligent, and you speak English well. I am impressed by you.
@DerPylz a year ago
🎉 1629 🤫
@AICoffeeBreak a year ago
One hundred six hundred twenty-nine. 🤫
@VV-fl8fi a year ago
As a Chinese speaker, I can say the sentence that appears at 8:20 is correct, hahaha.
@AICoffeeBreak a year ago
Phew, good to know that Google Translate worked in this case. 😅