"If you young kids don't know what an RBF kernel is ... you map it into an infinite space using Guassian Kernels ... yeah ... maybe wikipedia is better at that than I am"
@jsonm054 жыл бұрын
I laughed so hard with this one.
@herp_derpingson4 жыл бұрын
At first glance this looks like an incredibly complex paper. Thank you for explaining it so simply.
@donniedorko33364 жыл бұрын
This, a thousand times. I just had my "but couldn't you do it with sine waves?" moment a couple of hours ago, and this is a great intro.
@MiroslawHorbal4 жыл бұрын
I said it before and I'll say it again. Thank you very much for these videos. You are saving me (and I imagine others in the community) a lot of time having to parse through the details of these papers.
@DanielHesslow4 жыл бұрын
A small note on why the gradient of the SDF is 1 almost everywhere: the SDF is just the (signed) distance to the closest point, so the gradient will of course point in the direction away from that point, and if you move one unit away from that point the SDF increases by one. Hence the gradient has norm 1. The "almost everywhere" part is just that there may be multiple points equally far away, or that you are exactly at such a point. Also, not sure if it was mentioned, but the sign just represents whether we're inside or outside of an object.
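A minimal sketch of that unit-gradient property (assuming PyTorch; the circle SDF here is just an illustrative choice):

import torch

def circle_sdf(p, radius=1.0):
    # signed distance to a circle centered at the origin:
    # negative inside, positive outside, zero on the boundary
    return p.norm(dim=-1) - radius

# random query points; the center (where the closest point is not unique) has probability zero
p = torch.randn(1000, 2, requires_grad=True)
d = circle_sdf(p)
(grad,) = torch.autograd.grad(d.sum(), p)
print(grad.norm(dim=-1))  # ~1.0 for every point: the gradient has unit norm almost everywhere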
@imranibrahimli984 жыл бұрын
Thank you!
@SunilKumar-zd5kq4 жыл бұрын
Why does the gradient point in the opposite direction?
@SunilKumar-zd5kq4 жыл бұрын
Why does the gradient point opposite?
@DanielHesslow4 жыл бұрын
@@SunilKumar-zd5kq It's the direction in which the distance to the closest point increases the fastest.
@PhucLe-qs7nx4 жыл бұрын
Was reading this paper last night and got confused about "one image is one dataset" as well. So glad that what I finally understood is actually true.
@gaoxinlipai4 жыл бұрын
Please explain this paper "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains", which is highly related to SIREN.
@yangzihaowang32053 жыл бұрын
The second half of this talk explained the paper you mentioned. kzbin.info/www/bejne/moG6fayYpZl_gpI It's from one of the co-authors of the paper.
@firedrive45 Жыл бұрын
A few mistakes: around 12:20, it's not the Laplacian; the symbol (the upside-down triangle squared) here is a delta, the difference between the ground truth and the result. Also, at 6:36: subpixel values for images can be obtained using linear interpolation (or another interpolation) between the values of the pixels surrounding the subpixel location.
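For reference, a rough sketch of that subpixel lookup via bilinear interpolation (assuming NumPy; the function name is just for illustration):

import numpy as np

def bilinear_sample(img, x, y):
    # interpolate a grayscale image at a subpixel location (x, y)
    # using the four surrounding pixel values
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

img = np.random.rand(8, 8)
print(bilinear_sample(img, 3.25, 4.75))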
@bluel1ng4 жыл бұрын
As Yannic says at 17:30: sin(x) has been tried many times before, vanilla and also in combinations like SinReLU. I currently doubt that (even with special initialization) vanilla sin(x) outperforms ReLU and variants like SELU in many classic applications (like classification networks). From my personal experience I can confirm that sin(x) converges faster. The differences in the 3D scene reconstruction results that they show in their paper video are impressive, though. Maybe it is worth trying sin(x) for generative models.
@Kram10324 жыл бұрын
To me it seems like the idea here isn't to use these SIRENs in the same scenarios where a classic ReLU or something would be used, but rather that this could be used to augment data. The usual networks could, instead of the original data, work on the weights of some SIREN network. It's basically really powerful data compression. An alternative to something like an encoder-decoder with a bottleneck: the SIREN is the vector you'd normally have at that bottleneck.
@bluel1ng4 жыл бұрын
@@Kram1032 Yes, I just felt I have to say that it may not give magic results in all cases, e.g. the tweet by Geoffrey Hinton about this paper could have been a bit misleading in this respect: twitter.com/geoffreyhinton/status/1273686900285603843?s=19
@Kram10324 жыл бұрын
@@bluel1ng Oh yeah, that is very misleading. It works much better than ReLUs *for this kind of task* - basically anything where you might suspect Fourier analysis to be somehow interesting, I suspect. Since that's kinda all this is: learning some sort of Fourier-like representation of a single given image.
@hyunsunggo8554 жыл бұрын
@@alexwaese-perlman8788 Right? If adversarial attacks work because of the linearity of a neural network, then it should be less vulnerable to them.
@bluel1ng4 жыл бұрын
@@alexwaese-perlman8788 Yes, the output is bounded, and still sin(x) is nearly linear in some epsilon interval (the gradient at 0 is 1). Also important might be that the gradient does not approach 0 asymptotically (as is the case for tanh or the Fermi/sigmoid function) - that means even when the optimizer jumps too far there is a chance to recover.
@Twilightsfavquill4 жыл бұрын
As a CogSci person, I have to recommend that you review the paper "GAIT-prop: A biologically plausible learning rule derived from backpropagation of error" by Ahmad, van Gerven and Ambrogioni. I feel like a more bio-inspired way of encoding the propagation of error signals through processing networks could hold potential for the investigation of functional behavior in, for example, Drosophila.
@sayakpaul31524 жыл бұрын
I don't know if it would have been possible for me to understand this paper all by myself. Thank you so much, Yannic.
@kristiantorres10804 жыл бұрын
dude...you deserve a new table! Let's chip in so you can buy a new one. Very good job explaining this interesting paper. Thank you!
@YannicKilcher4 жыл бұрын
Haha, don't worry if I treat it nicely it usually returns the favor :)
@proreduction4 жыл бұрын
Great summary, Yannic. I am a PhD student focusing on CNNs for classification of binary images, and I presented this at a journal club. Your explanation of implicit neural representations was inspiring.
@proreduction4 жыл бұрын
The only change I would make is to emphasize that a(x) is the ground truth, around 33:15.
@donniedorko33364 жыл бұрын
Just saw that you broke this into sections for easier access. You're beautiful. Thank you
@andrewmao57474 жыл бұрын
Thank you for the simple and intuitive explanation of what at first glance looked like a dense and difficult paper.
@maloxi14724 жыл бұрын
44:04 I think a more intuitive way to formulate this point would be to say that we want the boundary to be a level set of the neural representation, since the signed distance is supposed to be zero everywhere on that boundary.
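In symbols (my paraphrase of that constraint, writing \Phi for the network and \Omega_0 for the boundary):

\Phi(x) = 0 \quad \text{for all } x \in \Omega_0,
\qquad
\lVert \nabla_x \Phi(x) \rVert = 1 \ \text{almost everywhere},

i.e. the boundary \Omega_0 = \{\, x : \Phi(x) = 0 \,\} is the zero level set of \Phi.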
@AZTECMAN4 жыл бұрын
After watching the first few minutes of your Gradient Origin Networks video, I am realizing something: SIREN seems a lot like a static shader. That is, a shader (e.g. on shadertoy.com) is defined by input coordinates (x, y) and output colors. The mapping is typically non-linear. One major difference is that for shaders we often use a frame count (time) as an input as well. However, it's perfectly possible to craft static shaders.
@YannicKilcher4 жыл бұрын
True. Nice connection.
@rahuldeora11204 жыл бұрын
Wow that was quick!! Thanks for this
@PYu-ys8ym4 жыл бұрын
Thank you for making these videos!!! It surely saves me a ton of time. Thank you again! Please keep making more!!
@JTMoustache4 жыл бұрын
The "almost everywhere" comes from measure theory: it means that the statement should hold except for elements of omega with measure zero (the boundary omega_zero does not have measure zero, but a distance to the boundary of zero). But it seems a bit overkill to state it there.
@donniedorko33364 жыл бұрын
Good explanation. Thank you
@pepe_reeze93204 жыл бұрын
Great paper. I'm glad the Broader Impact section reads like a persiflage of broader impact statements. We usually call such text "bullshit bingo". Simply drop it, people!
@xXMockapapellaXx4 жыл бұрын
Thank you for making this paper so understandable
@KhaledSharif19932 жыл бұрын
At 33:35 what do you mean by "the more differentiable you make it"? I thought functions were either differentiable or not.
@priyamdey32982 жыл бұрын
21:43 what about using an exponential function?
@edbeeching4 жыл бұрын
I think the broader impact statement is there because this is a NeurIPS submission and they require an impact statement this year.
@alex-nk5dt3 жыл бұрын
this was super helpful, thank you so much!
@CristianGarcia4 жыл бұрын
Was just viewing the video from the authors via a Hinton tweet and Yannic already has a video? :o
@slackstation4 жыл бұрын
The fastest gun in the west. I don't know if it's because I asked for this paper but as always, thank you.
@bithigh83012 жыл бұрын
Nice video! Yes, start monetizing, your videos are priceless :)
@anthrond4 жыл бұрын
Yannic, are you familiar with Stephane Mallat's work in physically based neural networks? He talks a lot about using wavelet functions to improve the function approximations of neural networks. Sirens' use of sine activation functions reminded me of that.
@isbestlizard2 жыл бұрын
This is interesting because if errors can be backpropagated as phase shifts rather than magnitude changes, you can have as many layers as you like and the error doesn't decay away.
@kirak Жыл бұрын
Wow this helps me a lot. Thank you!
@heinrichvandeventer3574 жыл бұрын
Yannic, that function has a similar structure to the mathematics used in physics: a function F that depends on coordinates x, a function phi, the gradient of phi w.r.t. x, and higher-order derivatives. Look at Laplace's equation, the Lagrangian in classical mechanics, and other functionals (oversimplified: functionals map functions to numbers).
@tanguydamart83684 жыл бұрын
In the "Poisson image editing section", I do not understand what the training dataset is. But, you said that the data to fit is always (x,y) -> (r,g,b) (or here (x, y) -> (luminosity) since it's gray scale). But in this case we don't know the composite image since we are trying to generate it. So what does phi(x) produce ?
@YannicKilcher4 жыл бұрын
Good question. It doesn't have to be RGB; you can also use SIRENs to fit the gradients, which is what they do here. So you train (x, y) -> gradient of RGB, and at inference you just read out (x, y) -> RGB.
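A rough sketch of what that training could look like (assuming PyTorch; `siren` stands for any differentiable model mapping (x, y) -> intensity, and `target_grad` would come e.g. from a Sobel filter of the source images):

import torch

def gradient_matching_loss(siren, coords, target_grad):
    # coords: (N, 2) pixel coordinates, target_grad: (N, 2) desired image gradients
    coords = coords.clone().requires_grad_(True)
    out = siren(coords)
    # create_graph=True so that this gradient can itself be backpropagated through
    (grad,) = torch.autograd.grad(out.sum(), coords, create_graph=True)
    return ((grad - target_grad) ** 2).mean()

# after training only on gradients, inference is simply: intensity = siren(coords)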
@hieuza4 жыл бұрын
13:51 LOL! I like your sense of humor 😉
@sistemsylar4 жыл бұрын
Weird, I never understood the math in these papers because I thought it would be way too abstract, but then I realized it's not abstract at all!
@gabrigamer00skyrim Жыл бұрын
Great video! Isn't the proposed initialization just Uniform Xavier?
@joirnpettersen4 жыл бұрын
Is approximating the gradient for the real image as simple as (2/imageSize)(a - b), where a and b are the pixel values to the left and right, and the same for above and below (assuming the image function takes inputs in the range 0 to 1)? Would be really cool to see a neural SDF used in raymarching for games.
@YannicKilcher4 жыл бұрын
Almost. Have a look at Sobel filters.
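A rough sketch of getting those "label" gradients with a Sobel filter (assuming PyTorch):

import torch
import torch.nn.functional as F

def sobel_gradients(img):
    # img: (H, W) grayscale tensor -> (H, W, 2) approximate (d/dx, d/dy)
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]])
    ky = kx.t()
    img = img[None, None]  # (1, 1, H, W) as expected by conv2d
    gx = F.conv2d(img, kx[None, None], padding=1)
    gy = F.conv2d(img, ky[None, None], padding=1)
    return torch.stack([gx, gy], dim=-1)[0, 0]

grads = sobel_gradients(torch.rand(64, 64))  # per-pixel gradient targets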
@shawnpan49014 жыл бұрын
Great introduction! And I wonder what tool you use; the highlighting and pen strokes are so comfortable.
@dingleberriesify4 жыл бұрын
For anyone wondering about RBF kernels... An RBF kernel doesn't really do the infinite-dimensional stuff right away. Really, all an RBF (radial basis function; it's a hint!) does is output a similarity based on the distance between two points according to some metric. The usual one people mean when they say RBF is the Gaussian: k(x, x_0) = exp(-||x - x_0||^2 / (2 sigma^2)), where sigma is a scale parameter. This function outputs 1 (if the two points x and x_0 are equal) and decays to 0 at an exponential rate. In the context of a neural network, the comparison value can be a learned parameter (with the logit obviously being the other input), but the literature I remember reading would normally set these values randomly at init and leave it there. Hope that sates somebody's curiosity!

Postscript: Whenever you hear the words "kernel" and "infinite-dimensional" in the same sentence, you're in the land of what's called an RKHS, and the kernel they're referring to is a very specific kind of distance matrix. That sort of stuff is relevant for SVM theory, but kind of goes beyond the scope of this comment. To give a brief sketch: if you do something like linear regression on that kernel matrix, you're implicitly searching through a family of (potentially nonlinear) functions defined by the distance functions. So, nonlinear function approximation, but the whole operation is strictly convex and solvable in closed form. People often get mixed up between the "infinite-dimensional" function space of the RKHS and the "project the data to a higher dimension" quote which is also associated with SVMs.
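A tiny sketch of the Gaussian RBF mentioned above (assuming NumPy):

import numpy as np

def rbf_kernel(x, x0, sigma=1.0):
    # Gaussian RBF: 1 when x == x0, decaying to 0 with squared distance
    sq_dist = np.sum((x - x0) ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * sigma ** 2))

X = np.random.randn(5, 3)
K = rbf_kernel(X[:, None, :], X[None, :, :])  # (5, 5) Gram matrix
print(np.diag(K))  # all ones: each point is at distance 0 from itself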
@priyamdey32982 жыл бұрын
I would suggest people look at the lecture video by Richard Turner on Gaussian Processes (it's on youtube) to get an excellent understanding of what it really means to go from a finite covariance matrix to an infinite covariance matrix (essentially it turns into a distance function comparing two points), which can represent a whole family of curves whose shapes are controlled by the hyperparameters of that function (e.g. sigma in the case of an RBF kernel). What I wrote might be confusing. Please go check that out. Cheers!
@luciengrondin58024 жыл бұрын
Is it better than a Fourier transform, though? Because it seems to me that it does something similar.
@Sjefke30003 жыл бұрын
This one seems to represent the data with non-equally spaced frequencies, whereas a Fourier transform uses equally spaced frequencies. That's the biggest difference I can see.
@felrobelv4 жыл бұрын
What about using the Laplace transform instead of the Fourier transform? It's more general.
@etiennetiennetienne4 жыл бұрын
I am not sure if I understand correctly how they match the gradient of the network. Do you compute a Sobel filter of the output generated over the entire "dataset" (the image as (x, y))? Or do you just compute the true gradient dphi/dx, dphi/dy using autodiff? And if you put that into the loss and run backward, doesn't that mean you need to compute the derivative of a derivative?
@YannicKilcher4 жыл бұрын
The gradient of the true image (i.e. your "label") comes from the Sobel filter, yes. And yes, if you match the gradient using gradient descent, you'd need the gradient of the gradient, which you would get using autodiff or an analytic expression, since SIRENs are easy to differentiate. At least that's how I understand it.
@etiennetiennetienne4 жыл бұрын
@@YannicKilcher Right! I was trying to make this in PyTorch (gist.github.com/etienne87/e65b6bb2493213f436bf4a5b43b943ca), but with the autodiff gradient as additional supervision it seems to work less well for fitting the image (probably I made a mistake...). Anyway, thanks man, your videos are great!
@larrybird37294 жыл бұрын
In the past I fell victim to thinking "sin" could be the holy grail of activation functions, but gradient descent would ride that function like a roller coaster. I even fell victim to trying quaternions, but that is another failed story 😆
@shairozsohail10594 жыл бұрын
Can you use this to resample a signal like an image to get many similar representations of the same image?
@YannicKilcher4 жыл бұрын
What do you mean by similar representations? Do you mean similar images? I guess that would work for the hole-filling examples.
@donniedorko33364 жыл бұрын
Can anybody explain the initialization? I read the paper, but I'm missing something. I get scaling the later weights by omega_0, but why do we multiply it back into the weights in the forward pass? I built one in Julia as an MNIST classifier, and its learning is incredibly fast and stable *only if I don't multiply by omega_0 in the forward pass*
@donniedorko33364 жыл бұрын
If anyone's interested, here's the Julia code I finally got to work. Still no idea about the omega_0 in the forward pass so I've ignored it completely (disclosure: semi copypasta'd from the Julia source code for Dense and Conv layers, but it's open-source so I figured nobody would mind)

using Flux

# return array of random floats between (-1,1)
function uniform(dims...)
    W = rand(Float64, dims) * 2 .- 1
    return W
end

# Generate SIREN dense layer
function SinDense(fan_in::Integer, fan_out::Integer; omega_0=30, is_first=false)
    d = Dense(fan_in, fan_out, sin, initW=uniform)
    if is_first
        params(d)[1] ./= fan_in
    else
        params(d)[1] .*= sqrt(6/fan_in) / omega_0
    end
    return d
end

# Helper functions for conv layers
expand(N, i::Tuple) = i
expand(N, i::Integer) = ntuple(_ -> i, N)

# Create SIREN conv layer from known weights
function SinConv(w::AbstractArray{T,N}, b::AbstractVector{T};
                 stride = 1, pad = 0, dilation = 1, is_first = false, omega_0=30) where {T,N}
    stride = expand(Val(N-2), stride)
    pad = expand(Val(2*(N-2)), pad)
    dilation = expand(Val(N-2), dilation)
    fan_in = 1
    s = size(w)
    for i in 1:(length(s)-1)
        fan_in *= s[i]
    end
    if is_first
        w ./= fan_in
    else
        w .*= sqrt(6/fan_in) / omega_0
    end
    return Conv(sin, w, b, stride, pad, dilation)
end

# Create SIREN conv layer from size
SinConv(k::NTuple{N,Integer}, ch::Pair{
@朱镝中3 жыл бұрын
It seems the initialization with a uniform distribution, and the factor of 30 inside the sin function, are crucial. If you try the code and change that number to e.g. 5, 6, or 7, the results just mess up. Does anybody know why 30 is a good choice? A mystery?
@wonyounglee44172 жыл бұрын
Thank you very much
@vladimirtchuiev22182 жыл бұрын
I think this is somehow limited by the number of sine operations, as using it as an activation function uses far fewer sines than, let's say, projecting every input to the fully connected layer with a different sine. The computational cost of the sine can be offset by letting the GPU handle a lot of sines simultaneously. Also, this requires a specific learning rate to work well: too small and it converges to a flat surface, too large and it converges to noise, therefore I think it is beneficial to use periodic learning rate schedulers here. AdamW with amsgrad also seems to work better than vanilla Adam here. I've tried an MNIST classifier with this, didn't work that well...
@jonatan01i4 жыл бұрын
How does it perform if we use it for upsampling?
@YannicKilcher4 жыл бұрын
Idk, I'd like to see that too
@florianhonicke54484 жыл бұрын
Great video!
@wyalexlee85784 жыл бұрын
Thank you!
@sahibsingh15634 жыл бұрын
Awesome
@PixelPulse168 Жыл бұрын
a very important nerf paper
@twobob2 жыл бұрын
Interesting tool
@tedp91464 жыл бұрын
I saw a video which was about the fact that you can represent every picture through sine-waves (I forgot how and why). Is this somehow related? (Sorry if this is answered later in this video, I’m writing the comment at minute 17)
@hyunsunggo8554 жыл бұрын
Kinda. Sine/cosine functions with different frequencies can form a basis and thus can be used to compress data with spatial information, along with coefficients, through Fourier-transform-like algorithms; that's the idea of JPEG compression, I think. In this case, the coefficients are replaced with weights and biases. You could say it has the structure of multiple discrete Fourier transforms stacked together, with learned frequencies and coefficients.
@shivamraisharma14744 жыл бұрын
Has anyone tried making a generative model out of this yet?
@YannicKilcher4 жыл бұрын
Not sure, since it's always just fitting one data point.
@karchevskymi4 жыл бұрын
How to use SIREN for image classification?
@YannicKilcher4 жыл бұрын
That's not possible out of the box
@nuhaaldausari70193 жыл бұрын
@@YannicKilcher Is it possible to use SIREN to encode an image for image/video generation, for example?
@JuanBPedro4 жыл бұрын
Here is an example of the kind of things that SIRENs allow you to do: github.com/juansensio/nangs
@amandinchyba42694 жыл бұрын
pogchamp
@smnt4 жыл бұрын
2:18 Lol, they must have come from physics. That's the general form of an "action".
@yuyingliu58314 жыл бұрын
Agreed, I was surprised when he said that's abnormal.
@anthonybell85124 жыл бұрын
The initialisation proposed looks like the default weight initialisation in tensorflow: github.com/tensorflow/tensorflow/blob/6bfbcf31dce9a59acfcad51d905894b082989012/tensorflow/python/ops/init_ops.py#L527
@YannicKilcher4 жыл бұрын
The TF one seems to depend on fan_in and fan_out, the SIREN one only depends on fan_in
@nikronic4 жыл бұрын
@@YannicKilcher Actually, PyTorch also implements this in the same manner. The reason is that in some networks you want to use fan_out, but it is still applicable. The main difference is that if you use fan_out, the standard deviation of the generated distribution would not be equal to 1 (it would be smaller).
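For comparison, a rough sketch of the two schemes as I understand them (assuming PyTorch; Glorot/Xavier uniform uses fan_in and fan_out, the SIREN hidden-layer init only fan_in, scaled by omega_0 = 30):

import math
import torch

def glorot_uniform_(weight):
    # TF/Keras-style default: bound uses both fan_in and fan_out
    fan_out, fan_in = weight.shape
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return weight.uniform_(-bound, bound)

def siren_uniform_(weight, omega_0=30.0, is_first=False):
    # SIREN scheme (as I read the paper): bound uses fan_in only
    fan_in = weight.shape[1]
    bound = 1.0 / fan_in if is_first else math.sqrt(6.0 / fan_in) / omega_0
    return weight.uniform_(-bound, bound)

w = torch.empty(256, 256)
print(glorot_uniform_(w.clone()).std(), siren_uniform_(w.clone()).std())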
@jorgesimao43504 жыл бұрын
Without reading the paper... it seems that they are simply using a NN to learn a Fourier representation of the image, seen as a sampled field + gradients.
@hyunsunggo8554 жыл бұрын
That's what I thought as well, kinda.
@eelcohoogendoorn80444 жыл бұрын
No, not really. A single layer of such a network could correspond to a Fourier transform, with the weight for each sine encoded in the last learnable down-projection from the hidden state to the output, given that the weights and biases in the layer itself follow the fixed power-of-two pattern you'd find in an FFT. However, the frequencies are not pre-decided but learned, there can be multiple layers, and more importantly, the number of components is much smaller than you'd find in a dense FFT; with networks 512 neurons wide, that's a boatload fewer frequency components than you'd find in the dense FFT of the images they are regressing against.
@kazz8114 жыл бұрын
@@eelcohoogendoorn8044 That's exactly correct. Fitting a Fourier series is a generalized linear regression problem. This is like a weird hierarchical Fourier representation. It's still infinitely differentiable, but it's a different beast. And it seems like that's critical to why it crushes the competition.
@jabowery4 жыл бұрын
I believe you misspoke at about 33:08 when you said "over the entire image". You should have said "over all the images", right?
@YannicKilcher4 жыл бұрын
No, it's over the entire Image. We're just fitting one image using the neural network
@bluel1ng4 жыл бұрын
@@YannicKilcher A bit off-topic here, but what is fascinating about this form of image representation: You can plot the activity (output) of each neuron for all (x, y) coordinates of the image and also see how the 'contribution' of each neuron develops over the course of the training. Unfortunately I have never seen this in NN courses - I think it is a really nice visualization, especially when done with different activation functions for different layers etc. It also shows immediately the internal 'craziness' (complexity) and limitations of generalization if you look at the output of coordinates outside the domain used during training.
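A rough sketch of that visualization for the first layer of a toy SIREN (assuming PyTorch and matplotlib; sizes and names are just for illustration):

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

torch.manual_seed(0)
lin1 = nn.Linear(2, 16)  # first layer of a toy (x, y) -> intensity SIREN

# dense (x, y) grid covering the training domain [-1, 1]^2
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 128),
                        torch.linspace(-1, 1, 128), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)

hidden = torch.sin(30.0 * lin1(coords))  # activity of every first-layer neuron

fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(hidden[:, i].reshape(128, 128).detach().numpy(), cmap="gray")
    ax.axis("off")
plt.show()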
@jabowery4 жыл бұрын
Ah, OK. I was jumping the gun for the machine learning section, thinking the "dataset" included multiple images to somehow reduce the number of parameters per image. By the way, you really should monetize. You're excellent at capturing and conveying the essence of most papers.
@HeitorvitorC4 жыл бұрын
Thank you a lot for the content, Yannic. I wish you would discuss these 2 papers: arxiv.org/abs/1711.10561 and arxiv.org/abs/1711.10566, as it would be amazing to have an in-depth analysis of such an application of deep learning methods (apart from the fact that modern and didactic treatments of Physics-Informed Neural Networks with an insider's approach are not easy to find in video content). Of course there is content out there, but approaches such as yours could provide a more accessible understanding for those who are starting out in this methodology. Best regards!
@YannicKilcher4 жыл бұрын
Thanks for the reference :)
@Tferdz4 жыл бұрын
Why a fully connected network and not a CNN, since gradients are local and not global?
@dingleberriesify4 жыл бұрын
Because it's a compressed mapping from pixel point to colour value...your local information is your input.
@jorgesimao43504 жыл бұрын
They are not trying to find spatial regularities/invariants as in a CNN; they are simply using a feed-forward net to learn a function that predicts the value of the field/image + gradients. This is just glorified curve fitting; no attempt is made to learn natural local representations like oriented edges, which is what CNNs do, and there is plenty of evidence that brains do that as well.
@kazz8114 жыл бұрын
Because this is a direct function mapping from one point in space to one or more values (like pixel values). CNNs exploit the structure of space (i.e. nearby points are similar); here that pops out of the function (NN) fit. This is more interpolation than machine learning.
@hyunsunggo8554 жыл бұрын
Kinda reminds me of grid cells in the brain.
@hyunsunggo8554 жыл бұрын
It would properly learn scales and thus it'd better interpolate/extrapolate. Linear-like activation functions are terrible at extrapolation. I would like to see how it deals with adversarial examples.
@jonatan01i4 жыл бұрын
Thank you for mentioning that. This grid cell thing seems to be an interesting stuff to know about.
@hyunsunggo8554 жыл бұрын
@@jonatan01i It's crazy interesting. It seems like deep learning is adapting neuroscience one way or another, even accidentally.
@Marcos10PT4 жыл бұрын
Such a shame the authors were probably more worried about seeming clever and professional than making their writing approachable 😔 that introduction says it all! You explain it so well though! Thank you so much!
@YannicKilcher4 жыл бұрын
It's also a different field than regular ML.
@nikronic4 жыл бұрын
Sorry to say this, but even though Yannic did a great job, the original authors explained it very well too. They provided ready-to-run how-to-use code for all cases, the original source code, and also a short video explaining the core ideas.
@bdennyw14 жыл бұрын
Thank you for clearly explaining this paper. It's one that I wanted to dig into but found the math off-putting. The authors should have done a better job of communicating this simple idea.
@keri_gg2 жыл бұрын
Does anyone else speed up his videos because he speaks so slowly?