Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 3 - Backprop and Neural Networks

103,665 views

Stanford Online

A day ago

Comments: 19
@kalp2586 2 days ago
This lecture makes me appreciate how great of a teacher Andrej Karpathy is!
@AramR-m2w A year ago
🎯 Key Takeaways for quick navigation:
- 00:00 The lecture focuses on the math details of neural net learning, introducing the backpropagation algorithm. Assignment 2 is released, emphasizing understanding the math of neural networks and the associated software. The speaker discusses the struggle some students may face but encourages grasping the math behind neural networks for a deeper understanding. After this week, the course will transition to using software for the complex math, and a tutorial on PyTorch is announced. The lecture introduces a named entity recognition task using a simple neural network and outlines the mathematical foundations for training neural nets through manual computation and backpropagation.
- 27:56 Calculating partial derivatives for a neural network involves breaking complex functions down into simpler ones and applying the chain rule.
- 32:40 To train neural networks efficiently, the backpropagation algorithm propagates derivatives using the matrix chain rule, reusing shared derivatives to minimize computation.
- 37:22 Neural network computations use the shape convention, representing gradients in the same shape as the parameters for efficient updates with stochastic gradient descent.
- 43:58 There is a discrepancy between the Jacobian form, useful for calculus, and the shape convention used for presenting answers in assignments to ensure gradients have the correct shapes for updates.
- 48:10 Computation graphs, resembling trees, are constructed to systematically exploit shared derivatives, aiding efficient backpropagation in neural network training.
- 50:56 Backpropagation passes gradients backward through the computation graph, updating parameters via the chain rule to minimize the loss.
- 53:21 The general principle in backpropagation is that the downstream gradient equals the upstream gradient times the local gradient, facilitating efficient parameter updates.
- 57:32 During backpropagation, local gradients are computed for each node, and the downstream gradient is obtained by multiplying the upstream gradient by the local gradient.
- 59:18 Understanding gradients helps assess the impact of variable changes on the output of a computation graph.
- 59:45 Changes in z don't affect the output, so the gradient df/dz is 0; changes in x affect the output twice as much, with df/dx = 2.
- 01:00:12 When a variable has multiple branches in the computation graph, calculating its gradient involves summing the gradients from each branch.
- 01:01:36 With multiple outward branches, the gradient calculation sums the branch gradients using the chain rule.
- 01:04:54 Backpropagation avoids redundant computation, calculating gradients efficiently by systematically moving forward and then backward through the graph.
- 01:07:45 The backpropagation algorithm applies to arbitrary computation graphs, but neural networks with regular layer structures allow for parallelization.
- 01:11:52 Automatic differentiation in modern deep learning frameworks like TensorFlow and PyTorch handles most gradient computation automatically.
- 01:13:42 Symbolic computation of derivatives, attempted in Theano, faced challenges, leading to the current approach in deep learning frameworks.
- 01:14:36 Automatic differentiation involves forward and backward passes, computing values and gradients efficiently through the computation graph.
- 01:18:47 Numeric gradient checking is a simple but slow method to verify gradient correctness, mainly used when implementing custom layers (a small sketch of this follows below).
Made with HARPA AI
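To make the 53:21 (downstream = upstream × local) and 01:18:47 (numeric gradient check) points concrete, here is a minimal Python sketch, not taken from the lecture materials. It assumes the small example f(x, y, z) = (x + y) * max(y, z) with x = 1, y = 2, z = 0 that the 59:45 note appears to describe, backpropagates through it by hand, and verifies the result with a central-difference check.

```python
def forward(x, y, z):
    # Forward pass through the graph f(x, y, z) = (x + y) * max(y, z)
    a = x + y           # sum node
    b = max(y, z)       # max node
    f = a * b           # product node
    return f, (a, b)

def backward(x, y, z, cache):
    # Backward pass: at every node, downstream = upstream * local gradient
    a, b = cache
    df = 1.0                                   # gradient of the output w.r.t. itself
    da = df * b                                # product node: d(a*b)/da = b
    db = df * a                                # product node: d(a*b)/db = a
    dx = da * 1.0                              # sum node: d(x+y)/dx = 1
    dy_from_a = da * 1.0                       # sum node: d(x+y)/dy = 1
    dy_from_b = db * (1.0 if y > z else 0.0)   # max node routes the gradient to the larger input
    dz = db * (1.0 if z > y else 0.0)
    dy = dy_from_a + dy_from_b                 # y feeds two branches, so its gradients are summed
    return dx, dy, dz

def numeric_grad(i, args, eps=1e-6):
    # Central-difference estimate of df/d(args[i]) -- the slow check from 01:18:47
    hi, lo = list(args), list(args)
    hi[i] += eps
    lo[i] -= eps
    return (forward(*hi)[0] - forward(*lo)[0]) / (2 * eps)

x, y, z = 1.0, 2.0, 0.0
f, cache = forward(x, y, z)
print("f =", f)                                   # 6.0
print("analytic:", backward(x, y, z, cache))      # (2.0, 5.0, 0.0): df/dx = 2, df/dz = 0
print("numeric: ", tuple(numeric_grad(i, (x, y, z)) for i in range(3)))
```

The zero gradient for z (it lost the max) and the doubled sensitivity of the output to x match the 59:45 takeaway above.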
@manujarora5062 2 years ago
At 23:29 the reason the non-diagonal entries of the Jacobian are 0 is a little unclear. The derivatives at the non-diagonal positions are zero because h1 is the output for input z1 and no other input. Hence h1 changes by dh1/dz1 for a change in z1, and by 0 for a change in the others, i.e. z2, z3.
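To see this concretely, here is a small numeric check (my own illustration, not course code): for an elementwise activation h = f(z), each h_i depends only on z_i, so every off-diagonal entry of the Jacobian is zero. The sigmoid activation and the test vector are just illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def numeric_jacobian(f, z, eps=1e-6):
    # J[i, j] ~= d f(z)[i] / d z[j], estimated with central differences
    n = z.size
    J = np.zeros((n, n))
    for j in range(n):
        z_hi, z_lo = z.copy(), z.copy()
        z_hi[j] += eps
        z_lo[j] -= eps
        J[:, j] = (f(z_hi) - f(z_lo)) / (2 * eps)
    return J

z = np.array([0.5, -1.0, 2.0])
print(np.round(numeric_jacobian(sigmoid, z), 4))
# Off-diagonal entries are ~0 because h_i = f(z_i) ignores z_j for j != i;
# the diagonal holds f'(z_i) = sigmoid(z_i) * (1 - sigmoid(z_i)).
```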
@Wow-bk6ot 14 days ago
37:11: all the dimensions are done as m inputs and n outputs, but at 19:07 it says m outputs and n inputs. Shouldn't the dimension of W be (m x n), not (n x m)?
@nynaevealmeera 3 months ago
At 42:37, why is \frac{\partial s}{\partial b} = h^T \circ f'(z) when it was equal to u^T \circ f'(z) on the previous slides?
@ramongarcia8557 11 months ago
At 40:26 there is a typo in the presentation (page 40). The result of the sum \sum W_{ik} x_k should be z_j rather than x_j
@samson6707 10 months ago
The x_j here is the result of differentiating z_i = \sum_k W_{ik} x_k with respect to W_{ij} in this case.
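For what it's worth, here is a tiny numeric check of that point (my own illustration; W, x, and the indices i, j are arbitrary): perturbing W_{ij} changes only z_i in z = Wx, and it changes it at the rate x_j.

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(3, 4)    # arbitrary weights (illustrative values)
x = np.random.randn(4)
i, j, eps = 1, 2, 1e-6

W_hi, W_lo = W.copy(), W.copy()
W_hi[i, j] += eps
W_lo[i, j] -= eps

dz = (W_hi @ x - W_lo @ x) / (2 * eps)
print(np.round(dz, 6))       # only component i is nonzero...
print(x[j])                  # ...and it equals x_j, i.e. dz_i/dW_ij = x_j
```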
@annawilson3824 A year ago
53:00 The left (downstream) gradient is the local gradient at the node times the right (upstream) gradient.
@GengyinLiu A year ago
May I get the PyTorch tutorial mentioned at 4:35 from somewhere?
@nabarupghosh2880 8 months ago
kzbin.info/www/bejne/i6eTcnyIp5ijqsk
@quannguyendinh335 A year ago
Thank you very much!
@qingqiqiu 2 years ago
great course
@nanunsaram 2 years ago
Thank you!
@manujarora5062 2 years ago
Question: how is f(z) a Jacobian? My understanding: for a single neuron, z is going to be a scalar, and the f(z) output is also going to be a scalar. Can a neuron ever output anything other than a scalar? Perhaps the Jacobian holds for the overall network.
@louisb8718 2 years ago
Jacobian is synonymous with derivative in the multivariable context. You can view the derivative of a univariate function as a 1x1 Jacobian matrix.
@manujarora5062 2 years ago
@louisb8718 Thanks, that is an interesting take. I also realised that we are calibrating the weights for all neurons and not just a single neuron. Hence, while the output of a single neuron is a scalar, the layer as a whole has a vector output.
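One way to make that concrete (an illustrative sketch, not course code, with made-up layer sizes): a layer h = f(Wx + b) maps an m-vector x to an n-vector h, so its derivative with respect to x is an n × m Jacobian, even though each individual neuron contributes a single scalar.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
n, m = 3, 5                      # 3 neurons, 5 inputs (illustrative sizes)
W = np.random.randn(n, m)
b = np.random.randn(n)
x = np.random.randn(m)

z = W @ x + b                    # each neuron i produces the scalar z_i
h = sigmoid(z)                   # the layer as a whole outputs an n-vector

# Analytic Jacobian of the layer w.r.t. x: dh/dx = diag(f'(z)) @ W
J = (sigmoid(z) * (1 - sigmoid(z)))[:, None] * W
print(h.shape)                   # (3,): one scalar output per neuron
print(J.shape)                   # (3, 5): one row per output, one column per input
```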
@annawilson3824 A year ago
1:04:30
@susdoge3767 8 months ago
Guess I'm not really missing out on anything by not being at Stanford.
@jorgesanabria6484 7 months ago
haha why do you say that?!