C5W3L07 Attention Model Intuition

291,636 views

DeepLearningAI

6 years ago

Take the Deep Learning Specialization: bit.ly/2TF1B06
Check out all our courses: www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: www.deeplearning.ai/thebatch
Follow us:
Twitter: / deeplearningai_
Facebook: / deeplearninghq
Linkedin: / deeplearningai

Comments: 79
@TheBlackhawk2011 2 years ago
"if you can't explain it simply you don't understand it well enough" Andrew is one of the best instructors in the world. No wonder he teaches at Standord.
@grownupgaming 2 years ago
As someone from Berkeley, this is a good comment!
@Mesenqe 4 years ago
I can't believe that Siraj Raval has around 700K subscribers while this most valuable channel has only 76K.
@AIPlayerrrr 4 years ago
the worst channel on youtube...notorious...
@Mesenqe 4 years ago
@@AIPlayerrrr OK, if #Deeplearning_ai is the worst channel, where almost everyone becomes a master of machine learning and deep learning... please recommend which channel you use? N.B. Not in Chinese, in English.
@AIPlayerrrr 4 years ago
Gery W. Adhane I am talking about Siraj Raval....
@AIPlayerrrr 4 years ago
Gery W. Adhane I remember Lex's interview with Siraj Raval, in which Siraj said he only knew 50% of the material lolll
@Mesenqe 4 years ago
@@AIPlayerrrr Oh sorry, I thought you were referring to #deeplearning_ai. In that case I agree. He manipulates us all.
@mehmetcelepkolu7660 4 years ago
3:24 - Woah, the legend is speaking French!
@frankie59er 3 years ago
These explanations are top-notch, definitely deserving of way more views
@MasterofPlay7 3 years ago
You know the BERT model is the best NLP model, right?
@PookyCodes 3 years ago
Thank you so much for this valuable video!
@aryanyekrangi7093 2 years ago
Great video series!
@arborymastersllc.9368 1 year ago
Would definitely have to shift the location context weights for different languages, since expressions shift word order across languages. Example: 3 left, 2 right for language A; 2 left, 4 right for language B, with language-specific variance in the weight distributions themselves.
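A minimal sketch of this idea (the window sizes, the language names, and the local_attention_weights helper are hypothetical, not from the video): restrict the attention weights to a language-specific asymmetric window around the current position and renormalize.

    import numpy as np

    # Hypothetical per-language local windows: (words to the left, words to the right).
    WINDOWS = {"language_A": (3, 2), "language_B": (2, 4)}

    def local_attention_weights(scores, t, language):
        """Keep only attention scores inside an asymmetric window around position t,
        then renormalize with a softmax. `scores` is a 1-D array of raw scores over
        the source positions."""
        left, right = WINDOWS[language]
        mask = np.full_like(scores, -np.inf)
        lo, hi = max(0, t - left), min(len(scores), t + right + 1)
        mask[lo:hi] = 0.0                      # positions inside the window survive
        masked = scores + mask
        weights = np.exp(masked - masked.max())
        return weights / weights.sum()

    print(local_attention_weights(np.random.randn(8), t=4, language="language_A"))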
@bionhoward3159 5 years ago
thank you!
@anggipermanaharianja6122 4 years ago
really clear explanation
@arthurswanson3285 3 years ago
Well explained.
@theSpicyHam 4 years ago
I learned a lot, thank you
@youssefdirani 3 years ago
Magical voice
@yeming3777 2 years ago
helps me a lot
@zhenpingli2313 5 years ago
really great
@marinzhao3513 4 years ago
Very clearly explained.
@fackarov9412 3 years ago
To me it seems like applying kernels (a 1D CNN) to an RNN and calling it "attention".
@sandipansarkar9211 3 years ago
Good explanation, but I need to watch it again.
@taku8751 4 years ago
I wish the cursor were bigger; I cannot see it.
@mohsenboughriou9846 1 year ago
man, you're the best
@MrAkhilanil 1 year ago
Sounds cool, someone should build a chat bot with this tech!
@abail7010 2 years ago
The only thing I don't understand is why it's harder to translate shorter sentences? :)
@namHoang-lb6jp 2 years ago
Some segments in the video are stamped not adjacent to each other
@arborymastersllc.9368 1 year ago
Recursive context checking every so many words? Like after every noun-verb combo is identified, recheck the context's appropriateness.
@saeedullah5365 4 years ago
Why does an LSTM have higher accuracy than a bidirectional LSTM, even though the latter is a newer approach?
@thepresistence5935 2 years ago
Attention model invented by "avengers" 😆😆 2:55
@beniev1 5 years ago
How can one know how many words should be in the output?
@beniev1 5 years ago
I mean when you translate a new sentence, not during training...
@coralbow 5 years ago
@@beniev1 There's a separate RNN for the encoder and for the decoder. The encoder RNN takes a fixed-length sequence as input, and the decoder RNN outputs a fixed-length sequence. To make every sentence the same length (for the encoder RNN), they usually pad sentences shorter than the required length with 0s at the start. For the decoder RNN the idea is similar: it outputs a fixed-length sequence of hidden states (word distributions), and for each state in that sequence the most probable word is chosen. That word distribution includes a special EOS (end-of-sequence) token, which plays essentially the same role as the 0 padding; they don't show this in the visualisation because it adds no information.
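A minimal sketch of the padding idea described in this reply (the token ids and the maximum length are made up for illustration):

    PAD, EOS = 0, 1      # special token ids: padding and end-of-sequence
    MAX_LEN = 6

    def pad_front(token_ids, max_len=MAX_LEN):
        """Left-pad a tokenized sentence with PAD so every encoder input has the same length."""
        return [PAD] * (max_len - len(token_ids)) + token_ids

    print(pad_front([7, 42, 13]))   # -> [0, 0, 0, 7, 42, 13]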
@abbashoseini9344 5 years ago
@@beniev1 I think, according to this video, there is no need to know the number of words in your translation, and you can't know it until you translate. The translation is finished when the network generates the EOS token, and then you can count how many words the network generated for the translation.
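A sketch of that stopping rule (greedy decoding until EOS); decoder_step is a hypothetical stand-in for one step of a trained decoder RNN, returning a word distribution and the next hidden state:

    import numpy as np

    def greedy_decode(decoder_step, init_state, eos_id, max_len=50):
        state, output, token = init_state, [], None
        for _ in range(max_len):
            probs, state = decoder_step(token, state)
            token = int(np.argmax(probs))   # pick the most probable word
            if token == eos_id:
                break                       # the network itself decides the length here
            output.append(token)
        return output                       # len(output) = number of generated words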
@louisraison 5 years ago
@@abbashoseini9344 Exactly! What is a bit misleading in the example here is that each word in the French sentence is translated by exactly one word in English, but any word could actually be predicted instead, and the translation of this word could come later.
@abbashoseini9344 5 years ago
It is important to note that it depends on your application. For example, in sequence-tagging problems you need to force the model to make the output and input the same length. The attention mechanism has the power to generalize to all kinds of problems with various constraints on input and output length.
@danielcai1017 2 years ago
3:29 it seems he knows French well
@bhimireddyananthreddy1487 4 years ago
What does "some set of features" at 3:36 mean?
@ThePritt12 4 years ago
an encoding = features of a sentence
@Utbdankar 4 years ago
To determine each feature vector (the set of features), you use the current word as input together with the previous feature vector, which gives a new feature vector. You can think of the feature vector as "everything that came before the current word in the sentence that is needed to translate it".
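A minimal sketch of that recurrence (the sizes, random weights, and rnn_step helper are placeholders, not values from the course):

    import numpy as np

    hidden_size, embed_size = 4, 3
    W_h = 0.1 * np.random.randn(hidden_size, hidden_size)
    W_x = 0.1 * np.random.randn(hidden_size, embed_size)

    def rnn_step(prev_features, word_embedding):
        # New feature vector from the previous feature vector and the current word.
        return np.tanh(W_h @ prev_features + W_x @ word_embedding)

    features = np.zeros(hidden_size)
    for word_vec in np.random.randn(5, embed_size):   # five embedded words in a sentence
        features = rnn_step(features, word_vec)        # summary of everything seen so far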
@2005sty 2 years ago
It is not correct that a human translator carries out translation part by part, especially in the case of translating between two languages with different grammatical rules.
@isodoubIet 1 year ago
The "parts" don't have to be in the same order.
@moodiali7324 3 years ago
very good content with very bad audio quality, hope that improves one day
@nichosetiawan1377 2 years ago
The video image quality is too poor; it needs to be fixed.
@fahds2583 3 years ago
video starts here 3:11
@ananthakrishnank3208 1 year ago
3:00
@procrastipractice 1 year ago
Good luck with German where one verb can be split over a huge distance
@heejuneAhn 4 years ago
Oh, I see, that is why Google Translate is still bad with English-Japanese or Korean!
@socratic-programmer 4 years ago
Is that because the sentences read in an unusual order? Incidentally, recent models have become a lot better at multi-translation, so maybe they are better now than before.
@peterfireflylund 4 years ago
@@socratic-programmer No, it's not about an unusual order. That's actually quite easy to handle. It's because Korean and Japanese have completely different grammar from English. They are agglutinating, have pretty free word order because they both "tag" their words with short sounds that tell what role they play in the sentence, and they both allow most of the "real" sentence to be left out if it can be inferred from context. There are also problems with the semantic mapping between J/K and E where context is needed to figure out how to translate words/idioms. To top it all off, Japanese and Korean both have really complicated systems of honorifics. Oh, and copula is handled in *completely* different ways in J/K and E. Current translation systems are really bad at handling context above the sentence level, so... you can see the problems. Wikipedia has pretty good articles on Japanese, Korean, and English grammar.
@peterfireflylund 4 years ago
Forgot to add that tokenization is another issue. There are translating neural networks that are completely end-to-end: they take characters/punctuation as input and produce characters/punctuation as output. Most deep learning systems use a tokenizer before the input and a "detokenizer" after the output. Such a tokenizer may give all common words their own token number and it may split rarer words into smaller components, often using simple rules based on tables and regular expressions. It may also turn things like "isn't" into "is not" for English and "du" into "de le" for French. How to properly tokenize ideographic scripts like Chinese hanzi and Japanese kanji (and Korean Hanja) is still a research subject. Actually, even tokenization for *English* is still a research subject!
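A toy illustration of the normalization and word/subword split mentioned above (the rewrite rules and the tiny vocabulary are invented for the example; real tokenizers are far more involved):

    RULES = {"isn't": "is not", "du": "de le"}            # e.g. English / French rewrites
    VOCAB = {"token", "##ization", "is", "not", "easy"}   # whole words and subword pieces

    def split_into_subwords(word):
        """Greedy longest-match split against the toy vocabulary (WordPiece-style)."""
        pieces, start = [], 0
        while start < len(word):
            for end in range(len(word), start, -1):
                piece = word[start:end] if start == 0 else "##" + word[start:end]
                if piece in VOCAB:
                    pieces.append(piece)
                    start = end
                    break
            else:
                return ["<unk>"]          # no piece matched: unknown token
        return pieces

    def tokenize(sentence):
        words = sentence.lower().split()
        words = [w for word in words for w in RULES.get(word, word).split()]
        return [piece for word in words for piece in split_into_subwords(word)]

    print(tokenize("Tokenization isn't easy"))
    # -> ['token', '##ization', 'is', 'not', 'easy']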
@socratic-programmer 4 years ago
@@peterfireflylund Some of those linguistic terms elude me, but that makes sense. Thanks for that explanation.
@dnaphysics 4 months ago
'Attention' should have been called 'context'.
@francois-xaviermenage4531 4 years ago
The French sentences are written in a very wrong way.
@jozefkoslinski4111 2 years ago
"The audio in the video is not good."
@umairgillani699 5 years ago
The audio quality in all of Professor Ng's videos sucks!!