Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 3 - Backprop and Neural Networks

103,665 views

Stanford Online

A day ago

Comments: 19
@kalp2586 2 days ago
This lecture makes me appreciate how great of a teacher Andrej Karpathy is!
@AramR-m2w A year ago
🎯 Key Takeaways for quick navigation:
- 00:00 The lecture focuses on the math details of neural net learning, introducing the backpropagation algorithm. Assignment 2 is released, emphasizing understanding the math of neural networks and the associated software. The speaker discusses the struggle some students may face but encourages grasping the math behind neural networks for a deeper understanding. After this week, the course will transition to using software for the complex math, and a tutorial on PyTorch is announced. The lecture introduces a named entity recognition task using a simple neural network and outlines the mathematical foundations for training neural nets through manual computation and backpropagation.
- 27:56 Calculating partial derivatives for a neural network involves breaking complex functions down into simpler ones and applying the chain rule.
- 32:40 To train neural networks efficiently, the backpropagation algorithm propagates derivatives using the matrix chain rule, reusing shared derivatives to minimize computation.
- 37:22 Neural network computations use the shape convention, representing gradients in the same shape as the parameters for efficient updates with stochastic gradient descent.
- 43:58 There is a discrepancy between the Jacobian form, useful for calculus, and the shape convention used for presenting answers in assignments to ensure gradients have the correct shapes for updates.
- 48:10 Computation graphs, resembling trees, are constructed to systematically exploit shared derivatives, aiding efficient backpropagation in neural network training.
- 50:56 Backpropagation passes gradients backward through the computation graph, updating parameters via the chain rule to minimize the loss.
- 53:21 The general principle in backpropagation is that the downstream gradient equals the upstream gradient times the local gradient, facilitating efficient parameter updates.
- 57:32 During backpropagation, local gradients are computed for each node, and the downstream gradient is obtained by multiplying the upstream gradient by the local gradient.
- 59:18 Understanding gradients helps assess the impact of variable changes on the output of a computation graph.
- 59:45 Changes in z don't affect the output, so the gradient df/dz is 0; changes in x affect the output twice as much, with df/dx = 2.
- 01:00:12 When a variable has multiple branches in the computation graph, calculating its gradient involves summing the gradients from each branch.
- 01:01:36 With multiple outward branches, the gradient calculation sums the branch gradients using the chain rule.
- 01:04:54 Backpropagation avoids redundant computation, calculating gradients efficiently by systematically moving forward and then backward through the graph.
- 01:07:45 The backpropagation algorithm applies to arbitrary computation graphs, but neural networks with regular layer structures allow for parallelization.
- 01:11:52 Automatic differentiation in modern deep learning frameworks like TensorFlow and PyTorch handles most gradient computation automatically.
- 01:13:42 Symbolic computation of derivatives, attempted in Theano, faced challenges, leading to the current approach in deep learning frameworks.
- 01:14:36 Automatic differentiation involves forward and backward passes, computing values and gradients efficiently through the computation graph.
- 01:18:47 Numeric gradient checking is a simple but slow method to verify gradient correctness, mainly used when implementing custom layers (a small sketch of this follows below).
Made with HARPA AI
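To make the 53:21 (downstream = upstream × local) and 01:18:47 (numeric gradient check) points concrete, here is a minimal Python sketch, not taken from the lecture materials. It assumes the small example f(x, y, z) = (x + y) * max(y, z) with x = 1, y = 2, z = 0 that the 59:45 note appears to describe, backpropagates through it by hand, and verifies the result with a central-difference check.

```python
def forward(x, y, z):
    # Forward pass through the graph f(x, y, z) = (x + y) * max(y, z)
    a = x + y           # sum node
    b = max(y, z)       # max node
    f = a * b           # product node
    return f, (a, b)

def backward(x, y, z, cache):
    # Backward pass: at every node, downstream = upstream * local gradient
    a, b = cache
    df = 1.0                                   # gradient of the output w.r.t. itself
    da = df * b                                # product node: d(a*b)/da = b
    db = df * a                                # product node: d(a*b)/db = a
    dx = da * 1.0                              # sum node: d(x+y)/dx = 1
    dy_from_a = da * 1.0                       # sum node: d(x+y)/dy = 1
    dy_from_b = db * (1.0 if y > z else 0.0)   # max node routes the gradient to the larger input
    dz = db * (1.0 if z > y else 0.0)
    dy = dy_from_a + dy_from_b                 # y feeds two branches, so its gradients are summed
    return dx, dy, dz

def numeric_grad(i, args, eps=1e-6):
    # Central-difference estimate of df/d(args[i]) -- the slow check from 01:18:47
    hi, lo = list(args), list(args)
    hi[i] += eps
    lo[i] -= eps
    return (forward(*hi)[0] - forward(*lo)[0]) / (2 * eps)

x, y, z = 1.0, 2.0, 0.0
f, cache = forward(x, y, z)
print("f =", f)                                   # 6.0
print("analytic:", backward(x, y, z, cache))      # (2.0, 5.0, 0.0): df/dx = 2, df/dz = 0
print("numeric: ", tuple(numeric_grad(i, (x, y, z)) for i in range(3)))
```

The zero gradient for z (it lost the max) and the doubled sensitivity of the output to x match the 59:45 takeaway above.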
@manujarora5062 2 years ago
At 23:29 the reason the non-diagonal entries of the Jacobian are 0 is a little unclear. The derivatives at the non-diagonal positions are zero because h1 is the output for input z1 and no other input. Hence h1 changes by dh1/dz1 for a change in z1, and by 0 for a change in the others, i.e. z2, z3.
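To see this concretely, here is a small numeric check (my own illustration, not course code): for an elementwise activation h = f(z), each h_i depends only on z_i, so every off-diagonal entry of the Jacobian is zero. The sigmoid activation and the test vector are just illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def numeric_jacobian(f, z, eps=1e-6):
    # J[i, j] ~= d f(z)[i] / d z[j], estimated with central differences
    n = z.size
    J = np.zeros((n, n))
    for j in range(n):
        z_hi, z_lo = z.copy(), z.copy()
        z_hi[j] += eps
        z_lo[j] -= eps
        J[:, j] = (f(z_hi) - f(z_lo)) / (2 * eps)
    return J

z = np.array([0.5, -1.0, 2.0])
print(np.round(numeric_jacobian(sigmoid, z), 4))
# Off-diagonal entries are ~0 because h_i = f(z_i) ignores z_j for j != i;
# the diagonal holds f'(z_i) = sigmoid(z_i) * (1 - sigmoid(z_i)).
```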
@Wow-bk6ot 14 days ago
37:11: all the dimensions are done as m inputs and n outputs, but at 19:07 it says m outputs and n inputs. Shouldn't the dimension of W be (m x n), not (n x m)?
@nynaevealmeera 3 months ago
At 42:37, why is \frac{\partial s}{\partial b} = h^T \circ f'(z) when it was equal to u^T \circ f'(z) on the previous slides?
@ramongarcia8557 11 months ago
At 40:26 there is a typo in the presentation (page 40). The result of the sum \sum W_{ik} x_k should be z_j rather than x_j
@samson6707 10 months ago
The x_j here is the result of differentiating z_i = \sum_k W_{ik} x_k with respect to W_{ij} in this case.
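For what it's worth, here is a tiny numeric check of that point (my own illustration; W, x, and the indices i, j are arbitrary): perturbing W_{ij} changes only z_i in z = Wx, and it changes it at the rate x_j.

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(3, 4)    # arbitrary weights (illustrative values)
x = np.random.randn(4)
i, j, eps = 1, 2, 1e-6

W_hi, W_lo = W.copy(), W.copy()
W_hi[i, j] += eps
W_lo[i, j] -= eps

dz = (W_hi @ x - W_lo @ x) / (2 * eps)
print(np.round(dz, 6))       # only component i is nonzero...
print(x[j])                  # ...and it equals x_j, i.e. dz_i/dW_ij = x_j
```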
@annawilson3824 A year ago
53:00 The left (downstream) gradient is the local gradient at the node times the right (upstream) gradient.
@GengyinLiu A year ago
May I get the PyTorch tutorial mentioned at 4:35 from somewhere?
@nabarupghosh2880 8 months ago
kzbin.info/www/bejne/i6eTcnyIp5ijqsk
@quannguyendinh335 A year ago
Thank you very much!
@qingqiqiu 2 years ago
great course
@nanunsaram 2 years ago
Thank you!
@manujarora5062 2 years ago
Question: how is f(z) a Jacobian? My understanding: for a single neuron, z is going to be a scalar, and the f(z) output is also going to be a scalar. Can a neuron ever output anything other than a scalar? Perhaps the Jacobian holds for the overall network.
@louisb8718 2 years ago
Jacobian is synonymous with derivative in the multivariable context. You can view the derivative of a univariate function as a 1x1 Jacobian matrix.
@manujarora5062 2 years ago
@louisb8718 Thanks, that is an interesting take. I also realised that we are calibrating the weights for all neurons and not just a single neuron. Hence, while the output of a single neuron is a scalar, the layer as a whole has a vector output.
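One way to make that concrete (an illustrative sketch, not course code, with made-up layer sizes): a layer h = f(Wx + b) maps an m-vector x to an n-vector h, so its derivative with respect to x is an n × m Jacobian, even though each individual neuron contributes a single scalar.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
n, m = 3, 5                      # 3 neurons, 5 inputs (illustrative sizes)
W = np.random.randn(n, m)
b = np.random.randn(n)
x = np.random.randn(m)

z = W @ x + b                    # each neuron i produces the scalar z_i
h = sigmoid(z)                   # the layer as a whole outputs an n-vector

# Analytic Jacobian of the layer w.r.t. x: dh/dx = diag(f'(z)) @ W
J = (sigmoid(z) * (1 - sigmoid(z)))[:, None] * W
print(h.shape)                   # (3,): one scalar output per neuron
print(J.shape)                   # (3, 5): one row per output, one column per input
```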
@annawilson3824 A year ago
1:04:30
@susdoge3767 8 months ago
Guess I'm not really missing out on anything by not being at Stanford.
@jorgesanabria6484 7 months ago
haha why do you say that?!