Thank you for the lesson. Except for log loss, I think I understand most of this. Also, it is a very nice point about projecting onto a 2-dimensional space, thank you!
@harshvardhanagrawal · 2 months ago
@16:20 Looking at the graph at x=0, the value is still 0. It starts increasing only after moving away from 0 towards the positive side. So shouldn't ReLU(z) instead be: 0 if z <= 0, z if z > 0?
@anonymous71208 · 2 months ago
... which gives you **exactly** the same result (just plug in 0 to see why)
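A tiny check of the reply's point (plain Python, not the lecture's code): the two boundary conventions define exactly the same function, because both branches return 0 at z = 0.

```python
def relu_a(z):
    return 0.0 if z <= 0 else z   # "0 if z <= 0, z if z > 0" (the comment's version)

def relu_b(z):
    return 0.0 if z < 0 else z    # "0 if z < 0, z if z >= 0" (the other convention)

# at z = 0 both branches return 0, so the two definitions agree everywhere
for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert relu_a(z) == relu_b(z) == max(0.0, z)
```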
@harshvardhanagrawal · 2 months ago
@14:23 Training data: xi = feature vector, yi = gold label. Can you please explain this with an example? Let us assume that the training data is the entire text of Wikipedia. After tokenisation and word embedding this becomes xi. What is the gold label yi here?
@anonymous71208 · 2 months ago
This is standard supervised machine learning terminology - we have a bunch of IID "examples" in the training data in the form of x = feature vector, y = label (vector). An example in NLP could be sentiment classification: a single example is a movie review, one possible baseline featurization is a bag-of-words (I believe we cover it in the lectures), and the label would be one of two values (0 = negative review, 1 = positive review).
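A minimal sketch of such a bag-of-words featurization; the toy vocabulary and review text here are made up for illustration:

```python
from collections import Counter

def bag_of_words(text, vocab):
    # count-based featurization: one dimension per vocabulary word
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

vocab = ["movie", "loved", "terrible"]   # toy vocabulary, made up here
x = bag_of_words("I loved the movie Shawshank Redemption", vocab)
y = 1                                    # gold label: positive review
assert x == [1, 1, 0]                    # counts of "movie", "loved", "terrible"
```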
@harshvardhanagrawal · 2 months ago
@@anonymous71208 A follow-up question about the example you gave. A review reads: "I loved the movie Shawshank Redemption. It's a modern classic." Inference: this is a positive review. Task: classify it as positive (1) or negative (0), i.e. binary classification. In this case, is the text "I loved the movie Shawshank Redemption. It's a modern classic" in its vectorized form xi, the feature vector? And is the classification - positive = 1 or negative = 0 (only one of them, either 0 or 1) - that comes out of the model's processing of the text, in its vectorized (closer to 0 or 1) form, the gold label? Sorry to ask a silly question. This is very important for me to clear up my concept.
@anonymous71208 · 2 months ago
@@harshvardhanagrawal "In this case, the text "I loved the movie Shawshank Redemption. It's a modern classic" in its vectorized form is xi, the feature vector?" -> yes. "positive = 1 or negative = 0 (only one of them, either 0 or 1)" -> this is the so-called "gold label", which is **known** in advance. This example will be used for training the model. The model we discussed here will output a number between zero and one (i.e. the conditional probability of y=1 given the input feature vector). For an example from the training data, the closer the prediction is to the gold label, the less error the model makes.
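For the "less error" part, here's a minimal sketch of log loss (binary cross-entropy) on a single example; the predicted probabilities are made up:

```python
import math

def log_loss(y_gold, p_pred):
    # binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)]
    return -(y_gold * math.log(p_pred) + (1 - y_gold) * math.log(1 - p_pred))

# gold label y = 1 (positive review); the closer the model's predicted
# probability is to the gold label, the smaller the loss
loss_good = log_loss(1, 0.9)   # confident and correct -> small loss
loss_bad = log_loss(1, 0.2)    # confidently wrong -> large loss
assert loss_good < loss_bad
```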
@harshvardhanagrawal · 2 months ago
@@anonymous71208 Thank you so much, Prof.
@harshvardhanagrawal · 3 months ago
@44:50 I don't understand why the input should be multiplied by the softmax output. The dot product already gives us a kind of similarity score. Then we pass it through a softmax to get a probabilistic score, or to check it on a histogram, which is also understandable. But the histogram already tells us how much attention or weight is given to each word. Could you please explain the need for another multiplication with the input matrix?
@anonymous71208 · 3 months ago
Have a look at slide 14 onwards in the following lecture (kzbin.info/www/bejne/qaLYZYGDZdh6irs) - it shows that under the hood there are three inputs (query, key, value) in a general transformer, but these are all taken to be identical in BERT.
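A toy numpy sketch of single-head attention with query = key = value (as the reply describes for BERT). It illustrates the point of the "extra" multiplication: the softmax weights only say how much each token attends to every other token; multiplying them with the value vectors is what actually mixes the embeddings into a new contextualized vector per token. The embeddings are made up.

```python
import numpy as np

def self_attention(x):
    # pairwise dot-product similarities, scaled by sqrt(dimension)
    scores = x @ x.T / np.sqrt(x.shape[-1])
    # softmax over each row -> attention weights per token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # the "extra" multiplication: mix the value vectors with those weights
    return weights @ x

x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # three toy token embeddings
out = self_attention(x)
assert out.shape == x.shape  # still one vector per token, now contextualized
```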
@harshvardhanagrawal · 3 months ago
Didn't you record the next videos, i.e. the 8th, 9th, and 10th? It would have been really nice if I could somehow get links to them. Your way of teaching is really good... Thank you!
@anonymous71208 · 3 months ago
Thank you! Yes, I taught this course (almost identical) last winter term; here's the GitHub page github.com/trusthlt/nlp-with-deep-learning-lectures/ and here are all the recorded lectures kzbin.info/aero/PL6WLGVNe6ZcB00apoxMtj7WSUOlpm2Xvl Enjoy :)
@harshvardhanagrawal · 3 months ago
@anonymous71208 thank you so much!
@abedrahman4519 · 3 months ago
This has been incredibly beneficial, thank you
@anonymous71208 · 3 months ago
Thank you! :)
@anonymous71208 · 4 months ago
Some afterthoughts: 1) The size of the sub-sampled set on slide 19 should actually be distributed as a Binomial. However, for (very) large n this can be approximated by a Poisson distribution - see e.g. en.wikipedia.org/wiki/Binomial_distribution#Poisson_approximation
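A quick numerical check of that approximation (the parameters here are arbitrary, chosen so that lambda = n*p = 5):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

# large n, small p: Binomial(n, p) is close to Poisson(lambda = n * p)
n, p = 10_000, 0.0005
for k in range(10):
    assert abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) < 1e-3
```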
@gabrieleghisleni6495 · 6 months ago
Great lesson, this is very well explained! Thank you for sharing! I have a question regarding the training phase of GPT. Basically, if I have a sentence like "the cat is standing beyond you", am I computing the loss 5 times? "The x", "The cat x", "The cat is x" and so on until the last one, where the model is predicting x and I am computing a cross-entropy loss on that token? Just to be clear: if I have a batch size of 16 and all sequences have 50 tokens, am I averaging the loss of 50*16 tokens per forward pass? Is this process all done in parallel? Thanks in advance!
@anonymous71208 · 6 months ago
Thanks! So if you look at slide 18, the DTransformer gives you the entire list of probability distributions over the vocabulary for each token at once (internally it uses masking so that the attention only looks "back"). And then on slide 21, line 6: the loss is computed for each token and summed up. This is done for each training example, and it can be parallelized in batches (which consumes much more memory). Hope it helps!
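A numpy sketch of that bookkeeping, with random probabilities standing in for the DTransformer's actual output (the slide contents are not reproduced here). It shows the 50*16 per-token loss terms from the question being computed in one vectorized pass, then summed per sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, vocab = 16, 50, 100

# stand-ins for the model output: one probability distribution over
# the vocabulary at every position, for every example in the batch
logits = rng.normal(size=(batch, seq_len, vocab))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# gold next-token ids (made up here; in training they come from the corpus)
gold = rng.integers(0, vocab, size=(batch, seq_len))

# a cross-entropy term for every single token...
token_losses = -np.log(probs[np.arange(batch)[:, None],
                             np.arange(seq_len)[None, :],
                             gold])
assert token_losses.shape == (batch, seq_len)  # 16 * 50 terms per forward pass

# ...summed over each sequence, then (one common convention) averaged over the batch
batch_loss = token_losses.sum(axis=1).mean()
```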
@DavidChang-z8h · 8 months ago
You mentioned that the embedding vector for each word is part of the data to be trained. On the other hand, on this slide kzbin.info/www/bejne/hJmvamRsps2XjdU I am not able to see how the embedding takes part in the training, because I cannot see the green "E" box being involved in the training process. Note that "concat" is in purple. I understand that the network can learn the weights associated with the embedding vector. Then, how are such weights related to the embedding vectors?
@anonymous71208 · 8 months ago
"embedding vector for each word is part of data to be trained" - I believe you mean "the embedding vector of each word is to be learned from the training data", right? Because "embedding is part of trained data" does not make sense to me, I'm sorry.
@anonymous71208 · 8 months ago
Clarified in github.com/trusthlt/nlp-with-deep-learning-lectures/issues/8
@DavidChang-j6s · 8 months ago
Should w_k in Pr(w_k | <S>, w_1, w_2, ..., w_{n-1}) be w_n? I am referring to the factorization of the joint probability into a product of conditionals.
@DavidChang-z8h · 8 months ago
I mean the slide at this point kzbin.info/www/bejne/hJmvamRsps2XjdU
@anonymous71208 · 8 months ago
Which slide number? I appreciate your feedback - can you please open an issue on GitHub instead, so we can fix it there? (The same goes for your other comment.)
@anonymous71208 · 8 months ago
Fixed in github.com/trusthlt/nlp-with-deep-learning-lectures/issues/9
@DavidChang-j6s · 9 months ago
Great lecture, thanks for posting it. The style and delivery are very good, easy to follow and understand. One more thing: "a = tan(theta)" may not be mathematically correct kzbin.info/www/bejne/rJrIm2OMo9uqg7s. Another minor issue: DAG = directed acyclic graph kzbin.info/www/bejne/rJrIm2OMo9uqg7s
@anonymous71208 · 9 months ago
Thank you, I fixed the DAG on GitHub. Regarding the first comment: why do you think it is not correct? Maybe open an issue on GitHub, where it's easy to write math in Markdown, thanks!
@DavidChang-j6s · 9 months ago
@@anonymous71208 You are correct regarding my first comment; I was thinking about something else. I am sorry. The GitHub info is helpful!
@DavidChang-j6s · 9 months ago
Fantastic lecture! Thank YOU!
@anonymous71208 · 9 months ago
Thanks, hope you find it helpful!
@DavidChang-j6s · 9 months ago
Great tutorial
@anonymous71208 · 10 months ago
Correction: the PDF is the derivative of the CDF (I said it the other way around)
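A quick numerical sanity check of the corrected statement, using the standard normal distribution: a central finite difference of the CDF recovers the PDF.

```python
import math

def normal_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# differentiating the CDF (here numerically) gives back the PDF
h = 1e-5
for x in (-1.0, 0.0, 0.5, 2.0):
    numeric_derivative = (normal_cdf(x + h) - normal_cdf(x - h)) / (2 * h)
    assert abs(numeric_derivative - normal_pdf(x)) < 1e-6
```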
@soumyasarkar4100 · 10 months ago
Thanks for sharing
@anonymous71208 · 10 months ago
Thanks for commenting! Hope you're doing well :)
@arunnavinjoseph9262 · 1 year ago
What's going on in the beginning? It's full of noise and the speaker is not speaking.
@anonymous71208 · 1 year ago
The speaker is waiting for the students to come in and sit down
@konstantinbachem9800 · 1 year ago
I don't understand the x^{D[i]} notation at 43:00 - why is it x and not V?
@anonymous71208 · 1 year ago
That's right, $V^{D_{[i]}}$ would be even "more correct"... but I hope you get the gist: it's a one-hot vector with a 1 at the position of the i-th word in the document.
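A tiny sketch of what that notation denotes; the toy vocabulary and document here are made up for illustration:

```python
def one_hot(position, size):
    # vector of zeros with a single 1 at `position` (0-based)
    vec = [0] * size
    vec[position] = 1
    return vec

# toy vocabulary and document
vocab = {"the": 0, "cat": 1, "sat": 2}
doc = ["the", "cat", "sat"]
i = 1                                   # look at the i-th word of the document
x = one_hot(vocab[doc[i]], len(vocab))  # one-hot vector for that word
assert x == [0, 1, 0]
```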
@olmkiujnb · 1 year ago
After approximately 48:00 the subtitles need to be delayed by about 0.2-0.3 seconds; they are a bit too fast.
@anonymous71208 · 1 year ago
Thanks for pointing this out! In my opinion it's not that badly out of sync; you can still follow, I hope.
@olmkiujnb · 1 year ago
@@anonymous71208 Yes, of course! I am just kind of sad that my subtitles were worse than I had hoped for. Working on making them better.
@anonymous71208 · 1 year ago
Obviously the zero-weight initialization was wrong; I should have double-checked that before! Here's a nice visualization showing that it leads to no gradient flow, so no training at all: www.deeplearning.ai/ai-notes/initialization/index.html
@jannikholmer9211 · 1 year ago
Regarding the question about why not to initialize the parameters with all zeros: I think it would essentially give us a one-parameter network, as all of the W's would get updated in the same way with the same gradient and learning rate. So some parameters can be zero, just not all of them.
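A minimal numpy sketch backing up the "no gradient flow" point, assuming a tiny two-layer tanh network with squared error (made up here, not the lecture's network): with all-zero weights, both gradients come out exactly zero, so gradient descent never moves.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # 4 examples, 3 features
y = rng.normal(size=(4, 1))   # regression targets

W1 = np.zeros((3, 5))         # all-zero initialization
W2 = np.zeros((5, 1))

h = np.tanh(x @ W1)           # hidden activations: all zeros
pred = h @ W2                 # predictions: all zeros
err = pred - y                # gradient of 0.5 * squared error w.r.t. pred

grad_W2 = h.T @ err                            # zero, because h is all zeros
grad_W1 = x.T @ ((err @ W2.T) * (1 - h ** 2))  # zero, because W2 is all zeros

# no gradient flows at all, so training never leaves the zero point
assert not grad_W1.any() and not grad_W2.any()
```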
@annaschlander5181 · 2 years ago
I would agree with the statement that a strange woman lying in a pond distributing sword[s] is no basis for a system of government. (1:03:05)
@anonymous71208 · 2 years ago
:) Courtesy of Monty Python and the Holy Grail... a hilarious piece, indeed kzbin.info/www/bejne/gX-clGWKdryAosk
@annaschlander5181 · 2 years ago
@@anonymous71208 Haha, awesome. I haven't seen this one from Monty Python yet, and I can totally picture it being hilarious, too.
@trantandat2699 · 2 years ago
One of the best lectures on BERT! Thank you
@anonymous71208 · 2 years ago
Thanks a lot! You know the rules - if you like it, share it :)
@annaschlander5181 · 2 years ago
Why is the Mandelbrot set displayed when coming to the part about regular expressions for detecting word patterns (44:08)? Is there a closer connection between the two?
@anonymous71208 · 2 years ago
That's a good question, which I'm afraid only the original author of this particular slide could answer... I don't see any direct connection myself, but there is some research out there combining regexes and fractals ( www.sciencedirect.com/science/article/abs/pii/S0096300312007291 )
@annaschlander5181 · 2 years ago
@@anonymous71208 Hey, thank you for the link and your fast reply! That looks very interesting.
@sushantupadhyay3976 · 3 years ago
Hey man, awesome stuff to say the least... I am hooked... It is unbearable that the series of lectures is broken and quite a few appear to be missing! Is it intentional? Can you please upload ALL your lectures?!
@Manu-he7bu · 3 years ago
A few of the lectures are given by Mohsen Mesgar instead. You should find all of the "missing" lectures here: kzbin.info/door/4VNXEo_buXs4v1dFfIuaew
@anonymous71208 · 3 years ago
Thanks! We split the lectures between me and my colleague Mohsen Mesgar this year; for the full overview see our GitHub pages: github.com/dl4nlp-tuda2021/deep-learning-for-nlp-lectures (the slides and LaTeX source code are there too)
@yamenajjour1151 · 3 years ago
Thanks for the video!
@anonymous71208 · 3 years ago
Thanks, Yamen! :)
@zeys3316 · 3 years ago
33:30 Shouldn't the result for max pooling be 0.8, 0.9?
@anonymous71208 · 3 years ago
The "blue" vector is (0.5, 0.2, -0.2, -0.9), so the second term is 0.5. Maybe the minus sign is not that visible at 33:30, but if you go back to 32:40 you'll notice it.
@zeys3316 · 3 years ago
@@anonymous71208 true, my mistake🤦🏻♂️
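Assuming max-over-time pooling (each feature map pooled to the single largest activation, as in CNN text classifiers; the slide itself is not reproduced here), a tiny sketch of why the "blue" entry is 0.5 and not 0.9:

```python
def max_pool(feature_map):
    # max-over-time pooling: keep only the largest activation in the map
    return max(feature_map)

blue = [0.5, 0.2, -0.2, -0.9]   # the "blue" vector from the slide
assert max_pool(blue) == 0.5    # not 0.9: the last entry is -0.9
```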
@Raventouch · 3 years ago
Thanks Sir, the quality of your lecture is high as usual. Just a question though: what is "Deel Learning"? :P
@anonymous71208 · 3 years ago
Good catch! :) Thx
@Manu-he7bu · 3 years ago
And also a typo on slide 31: recall should be 0.4, or 40% for that matter.
@anonymous71208 · 3 years ago
Thanks Manu! Fixed in the slides on GitHub.
@anonymous71208 · 3 years ago
Typo on slide 30: the bottom right should be "True negative".
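For reference, a tiny sketch of how a recall value like the corrected 0.4 falls out of a confusion matrix; the counts here are made up purely to illustrate (the slide's actual numbers are not shown in this thread):

```python
# hypothetical confusion-matrix counts, chosen so that recall = 0.4
tp, fn = 2, 3   # gold-positive examples: 2 found, 3 missed
fp, tn = 1, 4   # gold-negative examples; tn is the bottom-right cell

recall = tp / (tp + fn)      # fraction of gold positives that were retrieved
precision = tp / (tp + fp)   # fraction of predicted positives that are correct
assert recall == 0.4
```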
@johneric8382 · 3 years ago
lol
@venkateshsadagopan2505 · 3 years ago
Dear Ivan, it's a great initiative. Do we have access to the assignments as well, so that the theoretical knowledge from the videos can be applied practically?
@anonymous71208 · 3 years ago
Thank you! That's a fair point, but for now we're keeping them private, as they are (mostly) graded... But I'll re-think this next year. Step by step :)
@JAaraMInato · 3 years ago
Congrats on getting that promotion/relocation! And thanks again for this lecture
@anonymous71208 · 3 years ago
Thanks, much appreciated!
@Boaque · 3 years ago
Unfortunately I got different results at first, as German IPs are blocked from accessing the given Gutenberg link. Using a VPN did the trick for me, in case anyone else encounters the same issue (or use a web proxy link).
@anonymous71208 · 3 years ago
Good catch! Thanks for sharing your experience!
@jensbengrunwald6589 · 3 years ago
Nice last sentence :) that earned you a like. As a YouTuber you can also remind students to subscribe to your channel and ask them to leave comments ^^
@anonymous71208 · 3 years ago
Oh, you're right! >>> SUBSCRIBE <<< and stuff like that... :)) Next time, I promise!