DeepMind x UCL | Deep Learning Lectures | 2/12 | Neural Networks Foundations

Views: 111,271

Comments: 89

@leixun 4 years ago

DeepMind x UCL | Deep Learning Lectures | 2/12 | Neural Networks Foundations

My takeaways:
1. Plan for this lecture 0:40
2. What is not covered in this lecture 1:59
3. Overview
3.1 Neural network applications 3:59
3.2 What started the deep learning revolution 5:10
4. Neural networks 9:17
5. Single-layer neural networks 17:14
5.1 Activation function: sigmoid 17:53
5.2 Loss function: cross-entropy for binary classification 19:50
5.3 Final activation function for multi-class classification: softmax 23:15
5.4 Uses 26:01
5.5 Limitations 27:28
6. Two-layer neural networks 28:15
7. TensorFlow Playground 32:34
8. Universal Approximation Theorem 33:55
9. Deep neural networks 40:29
9.1 Activation function: ReLU 41:05
9.2 Intuition behind network depth 44:20
9.3 Computational graphs 49:00
10. Learning/training 52:27
10.1 Optimizer: gradient descent 53:24
10.2 Optimizers that are built on gradient descent: Adam, RMSProp 54:39
10.3 Computational graphs for training 55:45
10.4 Backpropagation, chain rule 57:15
10.5 Linear layers as computational graph 1:00:48
10.6 ReLU layers as computational graph 1:02:30
10.7 Softmax as computational graph 1:03:09
10.8 Cross-entropy as computational graph 1:04:04
10.9 "Cross-entropy Jungles" 1:06:00
10.10 Computational graph example: 3-layer MLP with ReLU 1:06:58
11. Pieces of the puzzle: max, conditional execution 1:08:20
12. Practical issues 1:10:44
12.1 Overfitting and regularization 1:10:51
12.2 Lp regularization 1:12:44
12.3 Dropout 1:13:14
12.4 As models grow, their learning dynamics change: double descent 1:13:32
12.5 Diagnosing and debugging 1:16:15
13. Bonus: Multiplicative interactions 1:19:48

@Amy_Yu2023 4 years ago

Lei Xun, thanks for sharing.

@rauljrlara9994 3 years ago

Fool

@kamilziemian995 2 years ago

Thank you.

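@intuitivej9327 3 years ago

Fantastic! I held my breath and rewound to rewatch it several times. A really great lecture. I am a mother of two with a liberal arts background and a ten-year career gap, and I have fallen for the fun of learning about AI. The more I learn about math, the more stories I find it holds. Math is not about numbers but logic and storytelling as well. Thank you so much from South Korea.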

@lukn4100 3 years ago

Great lecture and big thanks to DeepMind for sharing this great content. Wojciu, keep it up. Super!

@Alex-ms1yd 4 years ago

Oh, these are just projections! It's such a great intuition, it finally clicked now. Thanks!

@raghavram6419 4 years ago

Wow! Great lecture covering the fundamentals. I liked the focus on computational graphs and on reasoning WHY certain components work or don't work.

@SudhirPratapYadav 3 years ago

Same. The engineering side of things is usually missing from deep learning lectures.

@annercamping 4 years ago

My autoplay was on and I just woke up to this.

@glmchn 2 years ago

🤣

@senorperez 4 months ago

lol

@eduardoriossanchez3393 4 years ago

53:17 I think there is a mistake in the Jacobian's formula, in the bottom-left corner.

@pranavpandey2965 4 years ago

Yes, it should have been df_k/dx_1.
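For reference, here is the standard layout of a Jacobian, written out so the corrected corner is visible. This is a sketch under an assumption: the function maps n inputs to k outputs (the index k comes from the reply above; n is a label chosen here for illustration and may not match the slide's notation).

```latex
J = \frac{\partial f}{\partial x} =
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_k}{\partial x_1} & \cdots & \frac{\partial f_k}{\partial x_n}
\end{pmatrix}
```

Row i holds the derivatives of output f_i and column j holds the derivatives with respect to input x_j, so the bottom-left entry is the last output differentiated by the first input.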

@danielpark6010 4 years ago

20:50 May I have a brief explanation about what "logarithm of probability of correct/entirely correct classification" in these two slides means? What is the significance of it and why is it helpful to negate it?

@luisleal4169 3 years ago

There are a lot of reasons and statistical considerations, but as an intuitive argument (not a proof): negating it lets you read small values as good and large values as bad, which is exactly how a loss function in ML is interpreted.

@Theoneandonly_Justahandle 2 years ago

You might also look at Shannon's information entropy and associated measures: en.wikipedia.org/wiki/Quantities_of_information . In short, -log(p) is a measure of the amount of information an event e with probability p carries. Intuitively, it tells you how many times you'd have to divide your space of possibilities in half in order to locate the event e with absolute certainty among all other events (see also kzbin.info/www/bejne/rGebq4yvlqqge6M&ab_channel=3Blue1Brown for a great explanation).
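A small numeric sketch may help with the question above. It is not taken from the slides; the function names and the example probabilities are made up for illustration. It shows two things: minus the log of the probability given to the correct class is small when the model is confident and right, and the log of the probability of getting every example right (a product) turns into a sum of per-example terms.

```python
import math

def negative_log_likelihood(p_correct):
    """Loss for one example: minus the log of the probability given to the true class."""
    return -math.log(p_correct)

def binary_cross_entropy(p, t):
    """p: predicted probability of class 1, t: true label (0 or 1)."""
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

# Confident and right gives a small loss, confident and wrong gives a large one.
for p in (0.99, 0.5, 0.01):
    print(p, negative_log_likelihood(p))

# Probability that all (independent) examples are classified correctly is a product.
# Its log is the sum of the per-example logs, which is easier to optimize.
p_per_example = [0.9, 0.8, 0.95]  # hypothetical model outputs for the true classes
log_of_product = math.log(math.prod(p_per_example))
sum_of_logs = sum(math.log(p) for p in p_per_example)
print(-log_of_product, -sum_of_logs)  # the two losses agree up to rounding
```

For a label of 1, `binary_cross_entropy(p, 1)` reduces to `negative_log_likelihood(p)`, so the binary loss is the same idea written for two classes.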

@bingeltube 4 years ago

Highly recommended! However, the tiny inserts providing further reading recommendations are hard to read. I suggest these recommended readings be included in the text section underneath the video link.

@synaesthesis 4 years ago

The slides are in the video description; you can copy the titles and authors of the readings from there. :)

@bingeltube 4 years ago

@@synaesthesis thank you! Pardon my oversight! :-)

@nguyenngocly1484 4 years ago

You can turn artificial neural networks inside-out by using fixed dot products (weighted sums) and adjustable (parametric) activation functions. The fixed dot products can be computed very quickly using fast transforms like the FFT. The overall number of parameters required is also vastly reduced. The dot products of the transform act as statistical summary measures, ensuring good behavior. See Fast Transform (fixed filter bank) neural networks.
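A minimal sketch of the idea in this comment, under loud assumptions: it swaps the FFT for a Walsh-Hadamard transform (another fast transform, chosen here only because it stays real-valued), and the two-slope activation and all names are hypothetical illustrations, not the commenter's or any library's implementation.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sums
            x[i + h:i + 2 * h] = a - b  # butterfly: differences
        h *= 2
    return x / np.sqrt(n)

def fixed_transform_layer(x, pos_slope, neg_slope):
    """Trainable per-element two-slope activation, then fixed (untrained) dot products."""
    activated = np.where(x >= 0, pos_slope * x, neg_slope * x)
    return fwht(activated)

n = 8
rng = np.random.default_rng(0)
x = rng.normal(size=n)
pos_slope = rng.normal(size=n)  # assumed trainable parameters
neg_slope = rng.normal(size=n)  # assumed trainable parameters
y = fixed_transform_layer(x, pos_slope, neg_slope)
print(y.shape, "parameters per layer:", 2 * n, "vs dense:", n * n)
```

The transform costs O(n log n) instead of the O(n^2) of a dense weight matrix, and the layer carries 2n adjustable numbers instead of n^2, which is the parameter reduction the comment refers to.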

@JY-pf7bc 4 years ago

Intuitive, fun, weaved with valuable experience and new research results. Excellent!

@xmtiaz 4 years ago

33:29 "Play is the highest form of research" - Albert Einstein

@iinarrab19 4 years ago

Intuitive and explains in detail.

@pervezbhan1708 2 years ago

kzbin.info/www/bejne/qJC0YmWLfsuAoqc

@christopherparsonson7119 4 years ago

Are the slides for this series available?

@vmikeyboi323 4 years ago

this

@jingtao1181 4 years ago

@@vmikeyboi323 where?

@mateusdeassissilva8009 4 years ago

@@vmikeyboi323 where?

4 years ago

@@jingtao1181 When people just comment "this", it usually means something like "I agree". Don't ask me why the word "this", I don't know either.

@jingtao1181 4 years ago

@ thanks for the reminder

@neurophilosophers994 4 years ago

If a NN can't compute distances without multiplicative dot products how did Alpha Fold calculate the evolution of protein folding states entirely based on distances?

@havelozo 2 years ago

What software was used to make these slides?

@jingtao1181 4 years ago

Thank you for the lecture! In 3e-5, what does e stand for?

@AndreyButenko 4 years ago

That is scientific notation: 3e-5 is the same as 3 * 10 ^ -5 which is 0.00003 😊

@jingtao1181 4 years ago

@@AndreyButenko Thank you so much. So does this mean that setting the learning rate to 0.00003 is really helpful?

@bingeltube 4 years ago

e stands for Euler's constant, and in this case it refers to the exponential function; see e.g. en.wikipedia.org/wiki/E_(mathematical_constant)

@jingtao1181 4 years ago

@@bingeltube Hi, I was wondering before whether e is a constant, but that does not make sense to me either. If e is a constant of approximately 2.7, then 3e - 5 = 3.1. However, I think an effective learning rate is somewhere between 0.001 and 0.1.

@jingtao1181 4 years ago

@@bingeltube I think Andrey's answer makes more sense.
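The two readings discussed in this thread can be checked directly. This is just a quick sketch; the variable names are made up. In Python, as in most programming languages, `3e-5` is a floating-point literal in scientific notation, while the other reading has to be spelled out with `math.e`.

```python
import math

# Reading 1: scientific notation, the usual meaning for a learning rate.
learning_rate = 3e-5
print(learning_rate)             # prints 3e-05
print(learning_rate == 0.00003)  # prints True

# Reading 2: three times Euler's number, minus five.
euler_reading = 3 * math.e - 5
print(euler_reading)             # roughly 3.15, far too large for a learning rate
```

Only the first reading gives a value that is plausible as a step size.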

@JakobGille2 4 years ago

Me: being happy about a deep learning lecture. Also me: seeing complicated formulas and closing the tab.

@speedfastman 4 years ago

Don't let it discourage you!

@jingtao1181 4 years ago

Just focus on the concepts. There are major APIs that can do the math for you.

@yifanyang806 4 years ago

It’s not that complicated, don’t give up.

@sumanthnandamuri2168 4 years ago

It would be even better if you could release the course assignments to the public for practice.

@taimurzahid7877 4 years ago

Can we please get the slides for this series?

@Aeradill 4 years ago

They are in the video description, Taimur.

@davidolushola3419 4 years ago

Wow, it's awesome. Please can I get the slides for this lecture? Thanks 😊

@SudhirPratapYadav 3 years ago

They're in the video description.

@luksdoc 4 years ago

A very nice lecture.

@lizgichora6472 3 years ago

Thank you.

@wy2528 4 years ago

It is super clear

@marcospereira6034 4 years ago

The explanations are a little too handwavy in this lecture. Wojciech seems to assume some intuitions are obvious when they aren't for someone without a lot of experience in the field. For example, when showing how sigmoids can emulate an arbitrary function, he said we can just "average" the sigmoid and the reversed sigmoid to form a bump, without mentioning that this averaging comes from the softmax (or am I wrong and it is something else?).

@marcospereira6034 4 years ago

Still, I appreciate the effort and think the lecture is great overall. Just need to complement it with other sources. Speaking of which, I would recommend Chris Olah's post on understanding neural networks through topology: colah.github.io/posts/2014-03-NN-Manifolds-Topology/ Thank you guys at DeepMind for this course!

@marcospereira6034 4 years ago

Also at 1:22:06, it's hard to interpret the graphs - it's not immediately obvious what blue and green represent. Why would one assume the labels are obvious? Graphs should always be labelled 🙂

@AeroGDrive 4 years ago

I think the averaging with the reverse sigmoid simply comes from a second neuron. In fact, he says there are 6 neurons, and each pair of neurons is a sigmoid plus a reverse sigmoid. That gives 3 resulting bells (as in the graph), which are then combined by a weighted average in the next layer.
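The construction described in this reply can be sketched in a few lines of NumPy. Everything specific here is an assumption for illustration (the steepness, the interval edges, the bump heights, and the function names); the lecture only gives the picture. Each bump uses two hidden sigmoid units, one stepping up and one stepping down, and the output layer adds them with a bias of -1, which is the "average" of the two curves up to rescaling.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bump(x, left, right, steepness=50.0):
    """Two hidden units: roughly 1 inside [left, right] and roughly 0 outside."""
    rising = sigmoid(steepness * (x - left))      # steps up at `left`
    falling = sigmoid(-steepness * (x - right))   # reversed sigmoid, steps down at `right`
    return rising + falling - 1.0                 # output weights 1 and 1, bias -1

def approximate(x, edges, heights):
    """Output layer: weighted sum of the bumps (6 hidden units make 3 bumps)."""
    return sum(h * bump(x, lo, hi)
               for h, lo, hi in zip(heights, edges[:-1], edges[1:]))

edges = np.array([0.0, 1 / 3, 2 / 3, 1.0])  # assumed interval boundaries
heights = np.array([0.2, 0.9, 0.5])         # assumed target values on each interval
xs = np.linspace(0.0, 1.0, 7)
print(np.round(approximate(xs, edges, heights), 3))
```

Using more, narrower bumps gives a finer staircase, which is the intuition behind the universal approximation argument at 33:55.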

@eyeofhorus1301 4 years ago

@@marcospereira6034 And if you know nothing, like me, there's so, so much more that he's hand-wavy about and expects you to grasp incredibly quickly... lol

@heyna88 2 years ago

Idk. I felt the entire explanation was quite superficial. I’m not sure whether this depends on the audience though

@TheNotoriousPhD 4 years ago

Thanks for this interesting series. Is there any place where I can get the related math knowledge?

@123456wei 4 years ago

same, would love to have some links to related math content

@SudhirPratapYadav 3 years ago

Go for engineering mathematics rather than pure mathematics. You will find the same topics in pure mathematics too, but learn them from an engineering perspective, so select courses and books named 'Engineering Mathematics'. Topics: linear algebra, multivariate calculus, optimisation, graph theory, discrete mathematics and, most important in this case, numerical methods for _________ (fill in whatever you like).

@SudhirPratapYadav 3 years ago

I forgot to mention: finite point arithmetic.

@jonathan-._.- 4 years ago

:o there is a writing mistake in "neural networks as computational graphs": sotfmax instead of softmax

@ramakanthrama8578 4 years ago

haha

@mahsaabtahi6633 4 years ago

I have to try hard to hear and understand the lecturer, I wish he spoke more clearly.

@salomeolusoji7551 4 years ago

Use captions.

@aromax504 4 years ago

Can someone explain the term "numerically stable" in a simpler way?

@aromax504 4 years ago

@Prasad Seemakurthi Thank you, Prasad. It was really helpful.

@marcospereira6034 4 years ago

Numerical stability refers to the accumulation of errors. An algorithm that is numerically unstable is sensitive to errors and allows them to accumulate, causing the final result to diverge from the correct value.

@SudhirPratapYadav 3 years ago

I need to clarify one more thing: here the 'error' really comes from converting things from continuous to discrete. You can't implement continuous computations on a computer (you can with analytical/symbolic math, but leaving that aside). So we discretize (convert to finite-precision numbers, in this case), which introduces small errors. These accumulate, and the computation fails to converge (it usually blows up to infinity or collapses to 0), even though the function converges in theory, in continuous space. That failure in the actual computation is what 'numerically unstable' means.
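A concrete case may make this clearer. The sketch below is an illustration, not code from the lecture, and the function names and the example logits are made up. Both functions compute the same softmax on paper, but the naive one overflows in floating point for large inputs, while subtracting the maximum first (a standard trick) keeps every intermediate value in range.

```python
import numpy as np

def softmax_naive(logits):
    e = np.exp(logits)               # exp(1000) overflows to inf in float64
    return e / e.sum()               # inf / inf gives nan

def softmax_stable(logits):
    shifted = logits - np.max(logits)  # largest entry becomes 0, so exp never overflows
    e = np.exp(shifted)
    return e / e.sum()

logits = np.array([1000.0, 1001.0, 1002.0])

# Silence the overflow warnings so the nan result is easy to see.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(logits)
stable = softmax_stable(logits)

print(naive)   # all nan: mathematically fine, numerically unstable
print(stable)  # valid probabilities that sum to 1
```

Shifting the logits by a constant does not change the softmax in exact arithmetic, so the stable version returns the same answer the naive one would give if numbers had unlimited range.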

@s3zine342 3 years ago

35:57

4 years ago

1:04:30 Hold on… Are you telling me that, given a neural network and a set of weights, I could generate a picture of the most doggish dog ever? :D Edit: The answer is "yes"! kzbin.info/www/bejne/qZm5fJuForljfqc
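The idea in this comment is usually called activation maximization: keep the weights fixed and run gradient ascent on the input instead. Below is a toy sketch under heavy assumptions. The "network" is a single random linear layer with softmax standing in for a trained classifier, and all names and sizes are hypothetical. A real image model would need backpropagation through all layers and some regularization to produce a recognizable picture.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))  # stand-in for trained weights: 3 classes, 8 input features
b = np.zeros(3)

def class_probs(x):
    logits = W @ x + b
    e = np.exp(logits - logits.max())  # shifted for numerical stability
    return e / e.sum()

def maximize_class(target, steps=200, lr=0.05):
    """Gradient ascent on the input x to raise log p[target]; the weights stay fixed."""
    x = np.zeros(8)
    onehot = np.eye(3)[target]
    for _ in range(steps):
        p = class_probs(x)
        grad_x = W.T @ (onehot - p)  # gradient of log p[target] with respect to x
        x += lr * grad_x
    return x

x_star = maximize_class(target=1)
print(class_probs(np.zeros(8))[1], class_probs(x_star)[1])  # before vs after
```

The gradient used here is the same softmax plus cross-entropy gradient discussed around 1:04:04, only pushed one step further back to the input rather than stopping at the weights.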