Automatic Differentiation in 10 minutes with Julia

52,728 views

The Julia Programming Language

1 day ago

Automatic differentiation is a key technique in AI, especially in deep neural networks. Here's a short video by MIT's Prof. Alan Edelman teaching automatic differentiation in 10 minutes using Julia; a minimal code sketch of the approach follows the description below. Time Stamps:
00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/JuliaCommunity/You...
Interested in improving the auto generated captions? Get involved here: github.com/JuliaCommunity/You...
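
For readers who want the gist in code: the following is a minimal sketch of the video's approach, adapted from the AutoDiff.ipynb notebook linked in the comments below (names and details are paraphrased from that notebook, not a verbatim transcript of the video). A dual number carries a (value, derivative) pair, arithmetic propagates both, and running an ordinary algorithm on a dual input yields the derivative as a by-product.

    # Dual number: f = (value, derivative), layout as in the AutoDiff.ipynb notebook
    struct D <: Number
        f::Tuple{Float64,Float64}
    end

    import Base: +, -, *, /, convert, promote_rule
    +(x::D, y::D) = D(x.f .+ y.f)                                      # sum rule
    -(x::D, y::D) = D(x.f .- y.f)                                      # difference rule
    *(x::D, y::D) = D((x.f[1]*y.f[1], x.f[2]*y.f[1] + x.f[1]*y.f[2]))  # product rule
    /(x::D, y::D) = D((x.f[1]/y.f[1],
                       (y.f[1]*x.f[2] - x.f[1]*y.f[2])/y.f[1]^2))      # quotient rule
    convert(::Type{D}, x::Real) = D((x, 0.0))         # constants carry zero derivative
    promote_rule(::Type{D}, ::Type{<:Real}) = D

    # The Babylonian square-root iteration, written with no knowledge of duals:
    function Babylonian(x; N = 10)
        t = (1 + x) / 2
        for i in 2:N
            t = (t + x/t) / 2
        end
        t
    end

    # Seed the input with derivative 1; the derivative of sqrt falls out:
    Babylonian(D((49.0, 1.0)))   # ≈ D((7.0, 0.0714285...)), i.e. (√49, 1/(2√49))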

Comments: 20
@Hataldir · 3 years ago
Great content, but this should have been a 30min video.
@haakonah · 3 years ago
This is great, but I have to agree with the other comments: if there is more you can say about the details of automatic differentiation, that would be helpful!
@TheJuliaLanguage · 2 years ago
More details can be found in the 18.337 Parallel Computing and Scientific Machine Learning course! kzbin.info/www/bejne/sHmziXp4nrmAa6M
@diemilio · 2 years ago
wow. great stuff
@Irfankhan-jt9ug · 3 years ago
Can you provide the link for this notebook?
@arnauarnauarnau · 3 years ago
It's in the Julia tutorials repository on github: github.com/JuliaAcademy/JuliaTutorials/blob/master/introductory-tutorials/intro-to-julia/AutoDiff.ipynb
@gigagerard · 3 years ago
Makes sense (luckily I once programmed a square-root Newton algorithm when Java BigInteger didn't have it yet). And I was like: who is this guy? Ah, famous professor.
@jackn5509 · 2 years ago
Does anyone know a place where I can go to properly understand this concept?
@alexgian9313 · 1 year ago
If you mean a place where it's explained really basically, like you'd teach it to a school kid, I have not found anything like that yet. But if you look up the basics of dual numbers and apply them to the elements of calculus, you can build it up quite quickly.

For instance, at school you are taught that

    f'(x) = (f(x + dx) - f(x)) / dx

Let's call dx "a.epsilon", and rearrange the above to get

    f(x + a.epsilon) = f(x) + f'(x).a.epsilon

So the input to your function is a dual number, and so is its output. As somebody already pointed out (@John Doe), the coefficient of epsilon in this case will be 1, since that's the derivative of x with respect to x.

Notice that the above actually yields the chain rule FOR FREE! 😀

    f(g(x + epsilon)) = f(g(x) + g'(x).epsilon)          # the coefficient of epsilon here is g'(x)
                      = f(g(x)) + f'(g(x)).g'(x).epsilon

If you now add the generalized form of the exponentiation rule, i.e.

    (u^v)' = (u^v) * (u'.(v/u) + v'.log(u))

to the product and quotient rules, you'll be able to do calculations involving powers (including sqrt) WITHOUT needing to resort to the Babylonian or any other iterative algorithm!

This is what it looks like in Julia, following the style of Edelman in this video (I hope I typed it in correctly; cut-and-paste is unavailable right now):

    ^(x::D, y::D) = D((x.f[1]^y.f[1],              # first part of the dual, i.e. the function evaluation
                       x.f[1]^y.f[1] *             # (u^v) *
                       (x.f[2]*y.f[1]/x.f[1] +     #   ( u'*(v/u)
                        y.f[2]*log(x.f[1]))))      #   + v'*log(u) )

    e.g.  D((2,1))^0.5  gives  D((1.4142..., 0.353553...))   # no Babylonian algorithm used!
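
For completeness, a cleaned-up, self-contained version of the rule above (hypothetical scaffolding: it restates the tuple-backed D type from the sketch after the video description so the snippet runs on its own; the power rule itself is the commenter's):

    struct D <: Number
        f::Tuple{Float64,Float64}           # (value, derivative)
    end
    Base.convert(::Type{D}, x::Real) = D((x, 0.0))
    Base.promote_rule(::Type{D}, ::Type{<:Real}) = D

    # Generalized power rule: (u^v)' = u^v * (u'*(v/u) + v'*log(u))
    Base.:(^)(x::D, y::D) = D((x.f[1]^y.f[1],
                               x.f[1]^y.f[1] * (x.f[2]*y.f[1]/x.f[1] +
                                                y.f[2]*log(x.f[1]))))

    D((2.0, 1.0))^0.5   # D((1.41421356..., 0.35355339...)) = (√2, 1/(2√2))

The real exponent 0.5 is promoted to D((0.5, 0.0)) automatically, so mixed real/dual powers work without extra methods.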
@j121212100 · 2 years ago
Wow, really interesting approach. Does it require knowing the derivative in the first place?
@HarishNarayanan · 2 years ago
No, you do not need to know the derivative of the entire expression, just the derivatives of simple building blocks (like multiplication or exponentiation).
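
To make "building blocks" concrete, here is a small hand-rolled sketch (hypothetical minimal type, not any particular package): only the primitive rules for *, sin, and exp are written down, yet the derivative of their composition comes out correct, because each local rule applies the chain rule to whatever derivative flows in.

    struct Dual
        v::Float64   # value
        d::Float64   # derivative
    end

    # Local rules for the building blocks only:
    Base.:(*)(a::Dual, b::Dual) = Dual(a.v*b.v, a.d*b.v + a.v*b.d)  # product rule
    Base.sin(a::Dual) = Dual(sin(a.v), cos(a.v)*a.d)                # chain rule
    Base.exp(a::Dual) = Dual(exp(a.v), exp(a.v)*a.d)                # chain rule

    # The derivative of the full expression is never written down anywhere:
    f(x) = exp(sin(x) * x)
    f(Dual(1.0, 1.0))   # .v == f(1.0), .d == f'(1.0) = exp(sin(1))*(cos(1)*1 + sin(1))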
@brettknoss486 · 3 years ago
I'm not sure from the documentation how to take a derivative at a certain value of x.
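
One answer, assuming the documentation in question is ForwardDiff.jl's (an assumption; the comment doesn't say which package): evaluating the derivative at a specific point is a single call, since the point is an argument rather than a symbol.

    using ForwardDiff                              # registered Julia package
    ForwardDiff.derivative(x -> x^2 + 3x, 2.0)     # 7.0 == f'(2) for f(x) = x^2 + 3x

With the hand-rolled dual type from this video, the equivalent is seeding the point of interest with derivative 1, e.g. Babylonian(D((49.0, 1.0))).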
@karim123098 · 1 year ago
There is a bug in the convert function. Rather than initializing the gradient to 0, you should be initializing it to 1. The reason is that the conversion should map a real number to the identity morphism, which for the category of differentiable functions is a function whose derivative is 1. You even knew it was wrong, because when you ran blocks 8 and 9 you used 1 as the initial gradient, not 0.
@alexgian9313 · 1 year ago
Nice catch. As a Julia beginner figuring this out from first principles, I wondered about that...
@ruanshipaiqiu · 8 months ago
I think it is correct to initialise the derivative to zero. This is forward mode: for a constant scalar c, its tangent w.r.t. the input x, dc/dx, should always be zero. For x itself, however, dx/dx = 1 by definition. If instead every non-dual's derivative were initialised to 1, you would be computing the directional derivative in the direction [1,1,1,1,...] instead of the partials.
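
A short sketch of the distinction (hypothetical minimal type, mirroring the thread): convert gives constants a zero tangent, and only the variable being differentiated is seeded with 1.

    struct D <: Number
        f::Tuple{Float64,Float64}   # (value, tangent)
    end
    Base.:(+)(x::D, y::D) = D(x.f .+ y.f)
    Base.:(*)(x::D, y::D) = D((x.f[1]*y.f[1], x.f[2]*y.f[1] + x.f[1]*y.f[2]))
    Base.convert(::Type{D}, c::Real) = D((c, 0.0))   # constant c: dc/dx = 0
    Base.promote_rule(::Type{D}, ::Type{<:Real}) = D

    g(x) = 3x + 2          # g'(x) = 3
    g(D((5.0, 1.0)))       # D((17.0, 3.0)): the constants 3 and 2 entered with tangent 0

    # Had convert seeded constants with tangent 1 instead, the same call would
    # return D((17.0, 9.0)): a directional derivative with every input perturbed,
    # not the partial dg/dx.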
@ericren764 · 2 years ago
There is no substance in this video. Neither autodiff nor dual numbers were explained properly.
@charlesperry7300 · 2 years ago
Julia seems to be competitive with Matlab. Matlab is powerful for interactive scientific computation and easy to use. Julia is too overwhelming for me.
@ayush9psycho · 2 years ago
Overwhelming? No!! It's as easy as Python!!
@navjotsingh2251 · 2 years ago
@@ayush9psycho I still think Matlab is easier (I use Julia, Matlab, and Swift). Matlab makes machine learning feel like you're walking on a cloud; it just works, and works very well.
@RoryMcDuff · 1 year ago
Unfortunately this language also has those shitty "end" statements...