Privacy-preserving NLP: Lecture 09
1:03:59
3 months ago
Privacy-preserving NLP: Lecture 08
1:15:20
Privacy-preserving NLP: Lecture 07
1:08:26
Privacy-preserving NLP: Lecture 06
1:12:54
Privacy-preserving NLP: Lecture 05
1:11:00
Privacy-preserving NLP: Lecture 04
1:22:31
Privacy-preserving NLP: Lecture 03
1:24:40
Privacy-preserving NLP: Lecture 02
1:06:38
Privacy-preserving NLP: Lecture 01
1:13:46
Comments
@chessketeer 12 days ago
Thank you for the lesson. Except for log loss, I think I understand most of this. Also, it is a very nice point about projecting onto a two-dimensional space, thank you!
@harshvardhanagrawal 2 months ago
@16:20 Looking at the graph at x=0, the value is still 0. It starts increasing only after moving away from 0 towards the positive side. Then should ReLU(z) not be 0 if z <= 0 and z if z > 0 instead?
@anonymous71208 2 months ago
... which gives you **exactly** the same result (just plug in 0 to see why)
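To see the equivalence concretely, here is a tiny sketch (illustrative, not from the lecture materials) showing that the two piecewise definitions of ReLU agree, including at z = 0:

```python
def relu_strict(z):
    # 0 if z < 0, z otherwise (the definition on the slide)
    return 0.0 if z < 0 else z

def relu_alt(z):
    # 0 if z <= 0, z otherwise (the definition proposed in the question)
    return 0.0 if z <= 0 else z

# at z = 0 both branches return 0, so the two definitions coincide everywhere
assert relu_strict(0.0) == relu_alt(0.0) == 0.0
```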
@harshvardhanagrawal 2 months ago
@14:23 Training data: xi = feature vector, yi = gold label. Can you please explain this with an example? Let us assume that the training data is the entire text of Wikipedia. Now this becomes xi after tokenisation and word embedding. What is the gold label yi here?
@anonymous71208 2 months ago
This is standard supervised machine learning terminology - we have a bunch of IID "examples" in the training data in the form of x = feature vector, y = label (or label vector). An example in NLP could be sentiment classification: a single example is a movie review, one possible baseline "featurization" is a bag-of-words (I believe we cover it in the lectures), and the label would be one of two values (0 = negative review, 1 = positive review).
@harshvardhanagrawal 2 months ago
@@anonymous71208 A follow-up question about the example you gave. A review reads: "I loved the movie Shawshank Redemption. It's a modern classic." Inference: this is a positive review. Task: classify this as positive (1) or negative (0), so binary classification. In this case, the text "I loved the movie Shawshank Redemption. It's a modern classic" in its vectorized form is xi, the feature vector? And the classification - positive = 1 or negative = 0 (only one of them, either 0 or 1) - that comes out as a result of the model's processing of the text, in its vectorized form (closer to 0 or 1), is the gold label? Sorry to ask a silly question; this is very important for me to get the concept clear.
@anonymous71208 2 months ago
@@harshvardhanagrawal "In this case, the text "I loved the movie Shawshank Redemption. It's a modern classic" in its vectorized form is xi, the feature vector?" -> yes. "positive = 1 or negative = 0 (only one of them, either 0 or 1)" -> this is the so-called "gold label", which is **known** in advance. This example will be used for training the model. The model we discussed here will output a number between zero and one (i.e., the conditional probability of y=1 given the input feature vector). For an example from the training data, the closer the prediction is to the gold label, the "less error" the model makes.
@harshvardhanagrawal 2 months ago
@@anonymous71208 Thank you so much, Prof.!
@harshvardhanagrawal 3 months ago
@44:50 I don't understand why the input should be multiplied by the softmax output. The dot product already gives us a kind of similarity score. Then we pass it through a softmax to get a probabilistic score, or to inspect it as a histogram, which is also understandable. But the histogram already tells us how much attention or weight is given to each word. Could you please explain why another multiplication with the input matrix is needed?
@anonymous71208 3 months ago
Have a look at slide 14 onwards in the following lecture (kzbin.info/www/bejne/qaLYZYGDZdh6irs) - it shows that under the hood there are three inputs (query, key, value) in a general transformer, but these are all taken to be identical in BERT.
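A minimal numpy sketch of single-head self-attention (an illustration, not the lecture code, with query = key = value as in BERT) shows why the extra multiplication is needed: the softmax only produces the weights, and multiplying by the value vectors mixes the token representations into a new contextual vector per token.

```python
import numpy as np

def self_attention(X):
    # X: (n_tokens, d) token vectors; query, key and value are all X here
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                  # pairwise dot-product similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: the attention "histogram"
    # the softmax alone only gives weights; multiplying by the value vectors
    # turns each row of weights into a weighted average of token representations
    return weights @ X

X = np.random.randn(4, 8)       # 4 tokens with embedding size 8
print(self_attention(X).shape)  # (4, 8): one new contextual vector per token
```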
@harshvardhanagrawal 3 months ago
Didn't you record the next videos, i.e. the 8th, 9th and 10th? It would be really nice if I could somehow get links to them. Your way of teaching is really good... Thank you!
@anonymous71208 3 months ago
Thank you! Yes, I taught this (almost identical) course last winter term; here's the GitHub page github.com/trusthlt/nlp-with-deep-learning-lectures/ and here are all the recorded lectures kzbin.info/aero/PL6WLGVNe6ZcB00apoxMtj7WSUOlpm2Xvl Enjoy :)
@harshvardhanagrawal 3 months ago
@anonymous71208 Thank you so much!
@abedrahman4519 3 months ago
This has been incredibly beneficial, thank you
@anonymous71208 3 months ago
Thank you! :)
@anonymous71208 4 months ago
Some after-thoughts: 1) The size of the sub-sampled set on slide 19 should actually be distributed as a Binomial. However, for (very) large n, this can be approximated by a Poisson distribution - see e.g. en.wikipedia.org/wiki/Binomial_distribution#Poisson_approximation
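A quick numerical check of that approximation (an illustrative sketch with made-up n and sampling rate q, using scipy):

```python
from scipy.stats import binom, poisson

n, q = 100_000, 0.0001     # hypothetical dataset size and sampling rate
lam = n * q                # Poisson rate lambda = n * q = 10

# P(subsampled set has exactly k records) under both distributions
for k in (5, 10, 15):
    print(k, binom.pmf(k, n, q), poisson.pmf(k, lam))
# for large n and small q the two probabilities are almost identical
```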
@gabrieleghisleni6495 6 months ago
Great lesson, this is very well explained! Thank you for sharing! I have a question regarding the training phase of GPT. Basically, if I have a sentence like "the cat is standing beyond you", am I computing the loss 5 times? "The x", "The cat x", "the cat is x" and so on until the last one, where the model is predicting x and I am computing a cross-entropy loss on that token? Just to be clear, if I have a batch size of 16 and all sequences have 50 tokens, am I averaging the loss of 50*16 tokens per forward pass? Is this process all done in parallel? Thanks in advance!
@anonymous71208 6 months ago
Thanks! So if you look at slide 18, DTransformer gives you the entire list of probability distributions over the vocabulary for all tokens at once (internally it uses masking so that the attention only looks "back"). And then on slide 21, line 6: the loss is computed for each token and summed up. This is done for each training example, and can be parallelized in batches (which consumes much more memory). Hope it helps!
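A minimal PyTorch sketch (an illustration with made-up sizes, not the course code) of how the per-token cross-entropy losses of a causal language model are obtained from a single forward pass and summed:

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 16, 50, 1000
# hypothetical model output: one distribution over the vocabulary per position,
# produced in a single forward pass thanks to causal (look-back-only) masking
logits = torch.randn(batch, seq_len, vocab)
tokens = torch.randint(vocab, (batch, seq_len))

# position t predicts token t+1, so shift inputs and targets by one
pred = logits[:, :-1, :].reshape(-1, vocab)   # (batch*(seq_len-1), vocab)
gold = tokens[:, 1:].reshape(-1)              # (batch*(seq_len-1),)

# cross-entropy per token, summed over all tokens in the batch
loss = F.cross_entropy(pred, gold, reduction="sum")
print(loss)
```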
@DavidChang-z8h 8 months ago
You mentioned that embedding vector for each word is part of data to be trained. On the other hand, on this slide kzbin.info/www/bejne/hJmvamRsps2XjdU, I am not able to see how embedding is part of trained data, because I cannot see how the green "E box" is involved in the training process. Note that "concat" is in purple. I understand that the network can learn the weights associated with the embedding vector. Then, how are such weights related to embedding vectors?
@anonymous71208 8 months ago
"embedding vector for each word is part of data to be trained" - I believe you think "embedding vector of each word is to be learned from training data", right? Because "embedding is part of trained data" does not make sense to me, I'm sorry
@anonymous71208 8 months ago
Clarified in github.com/trusthlt/nlp-with-deep-learning-lectures/issues/8
@DavidChang-j6s 8 months ago
Should w_k in Pr(w_k | <S>, w_1, w_2, ..., w_{n-1}) be w_n? I am referring to factorizing the joint probability into a series of products.
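For reference, the chain-rule factorization being discussed, written out in LaTeX (the slide's exact notation may differ slightly):

```latex
P(w_1, \dots, w_n \mid \langle S \rangle)
  = \prod_{k=1}^{n} P(w_k \mid \langle S \rangle, w_1, \dots, w_{k-1})
```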
@DavidChang-z8h 8 months ago
I mean the slide at this point kzbin.info/www/bejne/hJmvamRsps2XjdU
@anonymous71208 8 months ago
Which slide number? I appreciate your feedback - can you please open an issue on GitHub instead, so we can fix it there? (The same goes for your other comment.)
@anonymous71208 8 months ago
Fixed in github.com/trusthlt/nlp-with-deep-learning-lectures/issues/9
@DavidChang-j6s 9 months ago
Great lecture, thanks for posting it. The style and delivery are very good, easy to follow and understand. One more thing: "a=tan(theta)" may not be correct mathematically kzbin.info/www/bejne/rJrIm2OMo9uqg7s. Another minor issue: DAG = directed acyclic graph kzbin.info/www/bejne/rJrIm2OMo9uqg7s
@anonymous71208 9 months ago
Thank you, I fixed DAG on GitHub. Regarding the first comment: Why do you think it is not correct? Maybe open an issue on GitHub, where it's easy to write math in markdown, thanks!
@DavidChang-j6s 9 months ago
@@anonymous71208 You are correct regarding my first comment; I was thinking about something else. I am sorry. The GitHub info is helpful!
@DavidChang-j6s 9 months ago
Fantastic lecture! Thank YOU!
@anonymous71208 9 months ago
Thanks, hope you find it helpful!
@DavidChang-j6s 9 months ago
Great tutorial
@anonymous71208 10 months ago
Correction: the PDF is the derivative of the CDF (I said it the other way around)
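In symbols (standard textbook definitions, not copied from the slides):

```latex
F(x) = \int_{-\infty}^{x} f(t)\, dt
\qquad\Longleftrightarrow\qquad
f(x) = \frac{d}{dx} F(x)
```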
@soumyasarkar4100 10 months ago
Thanks for sharing
@anonymous71208 10 months ago
Thanks for commenting! Hope you're doing well :)
@arunnavinjoseph9262 1 year ago
What's going on at the beginning? It's full of noise and the speaker is not speaking.
@anonymous71208 1 year ago
The speaker is waiting for the students to come in and sit down
@konstantinbachem9800 1 year ago
I don't understand the x^{D[i]} notation at 43:00 - why is it x and not V?
@anonymous71208 1 year ago
That's right, $V^{D_{[i]}}$ would be even "more correct"... but I hope you get the gist: it's a one-hot vector with a 1 at the position of the i-th word in the document.
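A tiny sketch of that one-hot encoding (with a made-up vocabulary and document, purely for illustration):

```python
vocab = ["the", "cat", "sat", "mat"]   # hypothetical vocabulary V
document = ["the", "cat", "sat"]       # hypothetical document D

def one_hot(word):
    # vector of length |V| with a single 1 at the word's vocabulary index
    return [1 if w == word else 0 for w in vocab]

i = 1                                  # i-th word of the document
print(one_hot(document[i]))            # [0, 1, 0, 0] for "cat"
```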
@olmkiujnb 1 year ago
After approximately 48:00 the subtitles need to be delayed by about 0.2-0.3 seconds; they are a bit too fast.
@anonymous71208 1 year ago
Thanks for pointing this out! In my opinion, it's not that badly out-of-sync, you can still follow, I hope
@olmkiujnb 1 year ago
@@anonymous71208 Yes, of course! I am just kind of sad that my subtitles were worse than I had hoped for. Working on making them better...
@anonymous71208 1 year ago
Obviously the zero-weight initialization was wrong, I should have double-checked that before! Here's a nice visualization showing that it leads to no gradient flow, so no training at all: www.deeplearning.ai/ai-notes/initialization/index.html
@jannikholmer9211 1 year ago
Regarding the question about why not to initialize the parameters with all zeros: I think it would basically give us a one-parameter network, as all of the W's would get updated in the same way with the same gradient and learning rate. So the parameters can be zero, just not all of them.
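A minimal PyTorch sketch (made-up layer sizes, not from the lecture) illustrating the point: with an all-zero initialization the weight gradients vanish, so the weights never move from zero.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3)                 # hypothetical batch: 5 examples, 3 features
y = torch.randn(5, 1)

# two-layer network with all weights and biases initialized to zero
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 1),
)
for p in model.parameters():
    torch.nn.init.zeros_(p)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# all weight gradients come out as zero (only the output bias gets a gradient),
# so the weights would stay at zero forever - no useful training happens
for name, p in model.named_parameters():
    print(name, p.grad.abs().max().item())
```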
@annaschlander5181 2 years ago
I would agree with the statement that a strange woman lying in a pond distributing sword[s] is no basis for a system of government. (1:03:05)
@anonymous71208 2 years ago
:) Courtesy of Monty Python and the Holy Grail... a hilarious piece, indeed kzbin.info/www/bejne/gX-clGWKdryAosk
@annaschlander5181 2 years ago
@@anonymous71208 Haha, awesome. I haven't seen this one from Monty Python yet, and I can totally picture it being hilarious, too.
@trantandat2699 2 years ago
One of the best lectures on BERT! Thank you
@anonymous71208 2 years ago
Thanks a lot! You know the rules - if you like it, share it :)
@annaschlander5181 2 years ago
Why is the Mandelbrot set displayed when coming to the part on regular expressions for detecting word patterns (44:08)? Is there a closer connection between the two?
@anonymous71208 2 years ago
That's a good question, which I'm afraid only the original author of this particular slide could answer... I don't see any direct connection myself, but there's some research combining regex and fractals out there ( www.sciencedirect.com/science/article/abs/pii/S0096300312007291 )
@annaschlander5181 2 years ago
@@anonymous71208 Hey, thank you for the link and your fast reply! That looks very interesting.
@sushantupadhyay3976 3 years ago
Hey man, awesome stuff to say the least... I am hooked. It is unbearable that the series of lectures is broken and quite a few appear to be missing!! Is it intentional? Can you please upload ALL your lectures?!
@Manu-he7bu 3 years ago
A few of the lectures are given by Mohsen Mesgar instead. You should find all of the 'missing' lectures here: kzbin.info/door/4VNXEo_buXs4v1dFfIuaew
@anonymous71208 3 years ago
Thanks! My colleague Mohsen Mesgar and I split the lectures this year; for the full overview see our GitHub page: github.com/dl4nlp-tuda2021/deep-learning-for-nlp-lectures (there are slides and LaTeX sources too)
@yamenajjour1151 3 years ago
Thanks for the video!
@anonymous71208 3 years ago
Thanks, Yamen! :)
@zeys3316 3 years ago
33:30 Shouldn't the result for max pooling be 0.8, 0.9?
@anonymous71208 3 years ago
The "blue" vector is (0.5, 0.2, -0.2, -0.9), so the second term is 0.5. Maybe the minus sign is not that visible at 33:30 but if you go back to 32:40, you'll notice it.
@zeys3316 3 years ago
@@anonymous71208 true, my mistake🤦🏻‍♂️
@Raventouch 3 years ago
Thanks, Sir, the quality of your lecture is high as usual. Just a question though: what is "Deel Learning"? :P
@anonymous71208 3 years ago
Good catch! :) Thx
@Manu-he7bu 3 years ago
There's also a typo on slide 31: recall should be 0.4, or 40% for that matter.
@anonymous71208 3 years ago
Thanks, Manu! Fixed in the slides on GitHub.
@anonymous71208 3 years ago
Typo on slide 30: the bottom right should be "True negative".
@johneric8382 3 years ago
lol
@venkateshsadagopan2505 3 years ago
Dear Ivan, it's a great initiative. Do we have access to the assignments as well, so that the theoretical knowledge in the videos can be applied practically?
@anonymous71208 3 years ago
Thank you! That's a fair point, but for now we're keeping them private as these are (mostly) graded... But I'll re-think this next year. Step by step :)
@JAaraMInato 3 years ago
Congrats on getting that promotion/relocation! And thanks again for this lecture
@anonymous71208 3 years ago
Thanks, much appreciated!
@Boaque 3 years ago
Unfortunately I got different results at first, as German IPs are blocked from accessing the given Gutenberg link. Using a VPN did the trick for me, in case anyone else encounters the same issue (or use a web proxy link).
@anonymous71208 3 years ago
Good catch! Thanks for sharing your experience!
@jensbengrunwald6589 3 years ago
Nice last sentence :) That earned you a like. As a YouTuber you can also remind students to subscribe to your channel and ask them to leave comments ^^
@anonymous71208 3 years ago
Oh, you're right! >>> SUBSCRIBE <<< and stuff like that... :)) Next time, I promise!