Neural Networks (2): Backpropagation

44,749 views

Alexander Ihler

A day ago

Comments: 15
@dinkarkumar292 5 years ago
After going through a lot of articles, this is the video that finally made the backpropagation logic clear to me. Thanks a lot.
@StarProduction369 8 years ago
I've been studying this subject for quite a while but still can't understand it that well. My interest is to get the program and play with it. Can you steer me in the right direction on which programs to use, so I can play with a neural networks program and develop the ultimate software program? Thanks.
@ShravanKumar147 7 years ago
Hi, can you provide the slides here? Thank you.
@jeffthom3155 8 years ago
Hi teacher. Great explanation. You've won another subscriber! Greetings from Brazil.
@konstantinburlachenko2843 9 years ago
Hi. Thanks for the video. I would be very, very glad if you could answer my question here. It can be shown via differential calculus and counting arguments that (1) the gradient times -1 is the direction in the function's domain along which the function decreases fastest, (2) a convex function has a single local minimum, which is also the global minimum, (3) a linear function is convex, and (4) the composition of a convex non-decreasing function with convex functions is convex. My point is that gradient descent is the wrong method for this task, because 1/(1+exp(-x)) is not convex. Am I wrong? This is the question I would most like answered.
@AlexanderIhler 9 years ago
+Konstantin Burlachenko Thanks for the interest. I'm not sure I understand your question -- but:
(1) The logistic function, 1/(1+exp(-x)), is convex; hence logistic regression (a 1-layer NN with "logistic" or "cross entropy" loss) is convex and relatively easy to optimize.
(2) A multi-layer NN is not convex in its parameters.
(3) Gradient descent, as an optimization strategy, is very useful on both convex and non-convex objective functions. For non-convex objectives, we generally have no guarantees on the quality of the local minimum found (compared to the global minimum), although in *some* problems such claims can be made. However, for general non-convex problems there are typically few guarantees for any optimization approach. So I don't see that as a drawback of gradient descent.
(4) As a more advanced method, Newton or pseudo-Newton methods may be preferable, as they can take adaptive steps depending on the curvature (2nd derivative) of the loss function. Typically these are used to replace batch gradient descent when the data set size is not too large.
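
As a concrete illustration of points (1) and (3) above, here is a minimal sketch (not code from the video; the toy data, learning rate, and variable names are made up for illustration) of batch gradient descent on the cross-entropy loss of a 1-layer model, i.e. logistic regression, using only the Python standard library:

import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data with a bias feature; label is 1 when the feature is positive.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(200)]
X = [(x, 1.0) for x in xs]                  # [feature, bias]
y = [1.0 if x > 0 else 0.0 for x in xs]

w = [0.0, 0.0]                              # weights [w1, b]
lr = 0.5                                    # step size

for epoch in range(200):
    # Gradient of the average cross-entropy loss: (1/n) * sum_i (sigmoid(w.x_i) - y_i) * x_i
    grad = [0.0, 0.0]
    for xi, yi in zip(X, y):
        p = sigmoid(w[0] * xi[0] + w[1] * xi[1])
        grad[0] += (p - yi) * xi[0]
        grad[1] += (p - yi) * xi[1]
    w = [w[j] - lr * grad[j] / len(X) for j in range(2)]

print("learned weights:", w)                # w[0] ends up positive: larger x -> higher P(y=1)

Because this particular objective is convex in w, the same descent loop heads toward the global minimum from any starting point; for a multi-layer network the update loop looks the same, but that guarantee disappears.
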
@konstantinburlachenko2843 9 years ago
+Alexander Ihler Thanks. (1) But to be honest, I don't understand why it is convex...

(A)
#!/usr/bin/env python3
import math

def sigmoid(x):
    # return 1.0 / (1 + math.exp(-x))   # the actual sigmoid (commented out here)
    return x * x                        # currently testing x^2 instead

# Check the definition of convexity along the segment [x1, x2]:
# f(a*x1 + (1-a)*x2) <= a*f(x1) + (1-a)*f(x2) for all a in [0, 1].
alpha = 0.0
x1 = -10.0
x2 = 10.0
while alpha < 1.0:
    if sigmoid(alpha * x1 + (1 - alpha) * x2) > alpha * sigmoid(x1) + (1 - alpha) * sigmoid(x2):
        print("PROBLEM")
    alpha += 0.01

(B) Visually, it seems that the sigmoid changes its curvature.
@AlexanderIhler 9 years ago
+Konstantin Burlachenko It does change its curvature -- but its curvature is always positive, hence it is convex.
@konstantinburlachenko2843 9 years ago
+Alexander Ihler I disagree with you.

(C)
#!/usr/bin/env python3
import math

def sigmoid(x):
    # return x * x
    return 1.0 / (1 + math.exp(-x))

def sigmoid_second_derivative(x):
    # central finite-difference estimate of sigmoid''(x)
    h = 0.01
    return (sigmoid(x - h) - 2 * sigmoid(x) + sigmoid(x + h)) / (h * h)

print(sigmoid_second_derivative(-0.5))   # positive
print(sigmoid_second_derivative(0.5))    # negative

(D) The sign of f'' changes at the zero point: mathworld.wolfram.com/SigmoidFunction.html
@AlexanderIhler 9 years ago
+Konstantin Burlachenko I'm sorry -- are you arguing that the sigmoid function is non-convex (true), or that its log is non-convex (false)? For logistic regression we require that it be log-convex, due to the form of the log-likelihood loss function; this is what I was referring to. (Sorry for any confusion.) The convexity / non-convexity of the non-linearities are not so important, except as they contribute to non-convexity of the loss function.
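
To make that distinction concrete, here is a small numerical check (again only an illustration, not part of the original exchange): the sigmoid itself has a second derivative that changes sign, while -log(sigmoid(z)) -- the per-example logistic / cross-entropy loss -- stays non-negative over the sampled range:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_sigmoid(z):
    # per-example logistic loss for a positive example with score z
    return -math.log(sigmoid(z))

def second_derivative(f, z, h=1e-3):
    # central finite-difference estimate of f''(z)
    return (f(z - h) - 2.0 * f(z) + f(z + h)) / (h * h)

zs = [k / 10.0 for k in range(-50, 51)]
print(min(second_derivative(sigmoid, z) for z in zs))          # negative: sigmoid itself is not convex
print(min(second_derivative(neg_log_sigmoid, z) for z in zs))  # stays >= 0: the loss is convex
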
@Varuu 8 years ago
wow thank you a lot!
@konstantinburlachenko2843 9 years ago
Thanks
@zsh3969 5 years ago
Hi Alexander, very nice work. I wrote the code in MATLAB but the results are not promising. Could you help me confirm whether the code is right or whether I am missing something? If you are willing to help, please send me your email and I will send you my code to check. Thanks.
Ensembles (1): Basics
6:53
Alexander Ihler
64K views
Backpropagation Details Pt. 1: Optimizing 3 parameters simultaneously.
18:32
StatQuest with Josh Starmer
219K views
Gradient descent, how neural networks learn | DL2
20:33
3Blue1Brown
7M views
Neural Networks (1): Basics
13:52
Alexander Ihler
227K views
MIT 6.S191: Convolutional Neural Networks
1:07:58
Alexander Amini
101K views
Lecture 10 - Neural Networks
1:25:16
caltech
447K views
Tutorial 6-Chain Rule of Differentiation with BackPropagation
13:43
Neural Networks Part 7: Cross Entropy Derivatives and Backpropagation
22:08
StatQuest with Josh Starmer
136K views
Watching Neural Networks Learn
25:28
Emergent Garden
1.4M views
But what is a neural network? | Deep learning chapter 1
18:40
3Blue1Brown
18M views
Back Propagation in training neural networks step by step
32:48
Bevan Smith 2
52K views