"If you young kids don't know what an RBF kernel is ... you map it into an infinite space using Guassian Kernels ... yeah ... maybe wikipedia is better at that than I am"
@jsonm054 жыл бұрын
I laughed so hard with this one.
@herp_derpingson4 жыл бұрын
At first glance this looks like an incredibly complex paper. Thank you for explaining it so simply.
@donniedorko33364 жыл бұрын
This, a thousand times. I just had my "but couldn't you do it with sine waves?" moment a couple of hours ago, and this is a great intro.
@MiroslawHorbal4 жыл бұрын
I said it before and I'll say it again. Thank you very much for these videos. You are saving me (and I imagine others in the community) a lot of time having to parse through the details of these papers.
@DanielHesslow4 жыл бұрын
A small note on why the gradient of the SDF is 1 almost everywhere: the SDF is just the (signed) distance to the closest point, so the gradient will of course point in the direction away from that point, and if you move one unit away from that point the SDF increases by one. Hence the gradient has norm 1. The "almost everywhere" part is just that there may be multiple points equally far away, or that you are exactly at such a point. Also, not sure if it was mentioned, but the sign just represents whether we're inside or outside of an object.
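A minimal sketch of that unit-gradient property (assuming PyTorch; the circle SDF here is just an illustrative choice):

import torch

def circle_sdf(p, radius=1.0):
    # signed distance to a circle centered at the origin:
    # negative inside, positive outside, zero on the boundary
    return p.norm(dim=-1) - radius

# random query points; the center (where the closest point is not unique) has probability zero
p = torch.randn(1000, 2, requires_grad=True)
d = circle_sdf(p)
(grad,) = torch.autograd.grad(d.sum(), p)
print(grad.norm(dim=-1))  # ~1.0 for every point: the gradient has unit norm almost everywhere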
@imranibrahimli984 жыл бұрын
Thank you!
@SunilKumar-zd5kq4 жыл бұрын
Why does the gradient point in the opposite direction?
@SunilKumar-zd5kq4 жыл бұрын
Why does the gradient point opposite?
@DanielHesslow4 жыл бұрын
@@SunilKumar-zd5kq It's the direction in which the distance to the closest point increases the fastest.
@PhucLe-qs7nx4 жыл бұрын
Was reading this paper last night and got confused about "one image is one dataset" as well. So glad that what I finally understood is actually true.
@gaoxinlipai4 жыл бұрын
Please explain this paper "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains", which is highly related to SIREN.
@yangzihaowang32053 жыл бұрын
The second half of this talk explained the paper you mentioned. kzbin.info/www/bejne/moG6fayYpZl_gpI It's from one of the co-authors of the paper.
@firedrive45 Жыл бұрын
A few mistakes: around 12:20, it's not the Laplacian; the symbol (the upside-down triangle squared) here is a delta, the difference between the ground truth and the result. Also, at 6:36: subpixel values for images can be obtained using linear interpolation (or another interpolation) between the values of the pixels surrounding the subpixel location.
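For reference, a rough sketch of that subpixel lookup via bilinear interpolation (assuming NumPy; the function name is just for illustration):

import numpy as np

def bilinear_sample(img, x, y):
    # interpolate a grayscale image at a subpixel location (x, y)
    # using the four surrounding pixel values
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

img = np.random.rand(8, 8)
print(bilinear_sample(img, 3.25, 4.75))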
@bluel1ng4 жыл бұрын
As Yannic says at 17:30: sin(x) has been tried many times before, vanilla and also in combinations like SinReLU. I currently doubt that (even with special initialization) vanilla sin(x) outperforms ReLU and variants like SELU in many classic applications (like classification networks). From my personal experience I can confirm that sin(x) converges faster. The differences in the 3D scene reconstruction results that they show in their paper video are impressive, though. Maybe it is worth trying sin(x) for generative models.
@Kram10324 жыл бұрын
To me it seems like the idea here isn't to use these SIRENs in the same scenarios where a classic ReLU or something would be used, but rather that this could be used to augment data. The usual networks could, instead of the original data, work on the weights of some SIREN network. It's basically really powerful data compression. An alternative to something like an encoder-decoder with a bottleneck: the SIREN is the vector you'd normally have at that bottleneck.
@bluel1ng4 жыл бұрын
@@Kram1032 Yes, I just felt I have to say that it may not give magic results in all cases, e.g. the tweet by Geoffrey Hinton about this paper could have been a bit misleading in this respect: twitter.com/geoffreyhinton/status/1273686900285603843?s=19
@Kram10324 жыл бұрын
@@bluel1ng Oh yeah, that is very misleading. It works much better than ReLUs *for this kind of task* - basically anything where you might suspect Fourier analysis to be somehow interesting, I suspect. Since that's kinda all this is: learning some sort of Fourier-like representation of a single given image.
@hyunsunggo8554 жыл бұрын
@@alexwaese-perlman8788 Right? If adversarial attacks work because of the linearity of a neural network, then it should be less vulnerable to them.
@bluel1ng4 жыл бұрын
@@alexwaese-perlman8788 Yes, the output is bounded, and still sin(x) is nearly linear in some epsilon interval (the gradient at 0 is 1). Also important might be that the gradient does not approach 0 asymptotically (as is the case for tanh or the Fermi/sigmoid function) - that means even when the optimizer jumps too far there is a chance to recover.
@Twilightsfavquill4 жыл бұрын
As a CogSci person, I have to recommend that you review the paper "GAIT-prop: A biologically plausible learning rule derived from backpropagation of error" by Ahmad, van Gerven and Ambrogioni. I feel like a more bio-inspired way of encoding the propagation of error signals through processing networks could hold potential for the investigation of functional behavior in, for example, Drosophila.
@sayakpaul31524 жыл бұрын
I don't know if it would have been possible for me to understand this paper all by myself. Thank you so much, Yannic.
@kristiantorres10804 жыл бұрын
dude...you deserve a new table! Let's chip in so you can buy a new one. Very good job explaining this interesting paper. Thank you!
@YannicKilcher4 жыл бұрын
Haha, don't worry if I treat it nicely it usually returns the favor :)
@proreduction4 жыл бұрын
Great summary, Yannic. I am a PhD student focusing on CNNs for classification of binary images, and I presented this at a journal club. Your explanation of implicit neural representations was inspiring.
@proreduction4 жыл бұрын
The only change I would make is to emphasize that a(x) is the ground truth, around 33:15.
@donniedorko33364 жыл бұрын
Just saw that you broke this into sections for easier access. You're beautiful. Thank you
@andrewmao57474 жыл бұрын
Thank you for the simple and intuitive explanation of what at first glance looked like a dense and difficult paper.
@maloxi14724 жыл бұрын
44:04 I think a more intuitive way to formulate this point would be to say that we want the boundary to be a level set of the neural representation, since the signed distance is supposed to be zero everywhere on that boundary.
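In symbols (my paraphrase of that constraint, writing \Phi for the network and \Omega_0 for the boundary):

\Phi(x) = 0 \quad \text{for all } x \in \Omega_0,
\qquad
\lVert \nabla_x \Phi(x) \rVert = 1 \ \text{almost everywhere},

i.e. the boundary \Omega_0 = \{\, x : \Phi(x) = 0 \,\} is the zero level set of \Phi.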
@AZTECMAN4 жыл бұрын
After watching the first few minutes of your Gradient Origin Networks video, I am realizing something: SIREN seems a lot like a static shader. That is, a shader (e.g. on shadertoy.com) is defined by input coordinates (x, y) and output colors. The mapping is typically non-linear. One major difference is that for shaders we often use a frame count (time) as an input as well. However, it's perfectly possible to craft static shaders.
@YannicKilcher4 жыл бұрын
True. Nice connection.
@rahuldeora11204 жыл бұрын
Wow that was quick!! Thanks for this
@PYu-ys8ym4 жыл бұрын
Thank you for making these videos!!! It surely saves me a ton of time. Thank you again! Please keep making more!!
@JTMoustache4 жыл бұрын
The "almost everywhere" comes from measure theory: it means that the statement should hold except for elements of omega with measure zero (the boundary omega_zero does not have measure zero, but a distance to the boundary of zero). But it seems a bit overkill to state it there.
@donniedorko33364 жыл бұрын
Good explanation. Thank you
@pepe_reeze93204 жыл бұрын
Great paper. I'm glad the Broader Impact section reads like a persiflage of broader impact statements. We usually call such text "bullshit bingo". Simply drop it, people!
@xXMockapapellaXx4 жыл бұрын
Thank you for making this paper so understandable
@KhaledSharif19932 жыл бұрын
At 33:35 what do you mean by "the more differentiable you make it"? I thought functions were either differentiable or not.
@priyamdey32982 жыл бұрын
21:43 what about using an exponential function?
@edbeeching4 жыл бұрын
I think the broader impact statement is there because this is a NeurIPS submission and they require an impact statement this year.
@alex-nk5dt3 жыл бұрын
this was super helpful, thank you so much!
@CristianGarcia4 жыл бұрын
Was just viewing the video from the authors via a Hinton tweet and Yannic already has a video? :o
@slackstation4 жыл бұрын
The fastest gun in the west. I don't know if it's because I asked for this paper but as always, thank you.
@bithigh83012 жыл бұрын
Nice video! Yes, start monetizing, your videos are priceless :)
@anthrond4 жыл бұрын
Yannic, are you familiar with Stephane Mallat's work in physically based neural networks? He talks a lot about using wavelet functions to improve the function approximations of neural networks. Sirens' use of sine activation functions reminded me of that.
@isbestlizard2 жыл бұрын
This is interesting because if errors can be backpropagated as phase shifts rather than magnitude changes, you can have as many layers as you like and the error doesn't decay away.
@kirak Жыл бұрын
Wow this helps me a lot. Thank you!
@heinrichvandeventer3574 жыл бұрын
Yannic, that function has a similar structure to the mathematics used in physics: a function F that depends on coordinates x, a function phi, the gradient of phi w.r.t. x, and higher-order derivatives. Look at Laplace's equation, the Lagrangian in classical mechanics, and other functionals (oversimplified: functionals map functions to numbers).
@tanguydamart83684 жыл бұрын
In the "Poisson image editing section", I do not understand what the training dataset is. But, you said that the data to fit is always (x,y) -> (r,g,b) (or here (x, y) -> (luminosity) since it's gray scale). But in this case we don't know the composite image since we are trying to generate it. So what does phi(x) produce ?
@YannicKilcher4 жыл бұрын
Good question. It doesn't have to be RGB; you can also use SIRENs to fit the gradients, which is what they do here. So you train (x, y) -> gradient of RGB, and at inference you just read out (x, y) -> RGB.
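A rough sketch of what that training could look like (assuming PyTorch; `siren` stands for any differentiable model mapping (x, y) -> intensity, and `target_grad` would come e.g. from a Sobel filter of the source images):

import torch

def gradient_matching_loss(siren, coords, target_grad):
    # coords: (N, 2) pixel coordinates, target_grad: (N, 2) desired image gradients
    coords = coords.clone().requires_grad_(True)
    out = siren(coords)
    # create_graph=True so that this gradient can itself be backpropagated through
    (grad,) = torch.autograd.grad(out.sum(), coords, create_graph=True)
    return ((grad - target_grad) ** 2).mean()

# after training only on gradients, inference is simply: intensity = siren(coords)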
@hieuza4 жыл бұрын
13:51 LOL! I like your sense of humor 😉
@sistemsylar4 жыл бұрын
Weird, I never understood the math in these papers because I thought it would be way too abstract, but then I realized it's not abstract at all!
@gabrigamer00skyrim Жыл бұрын
Great video! Isn't the proposed initialization just Uniform Xavier?
@joirnpettersen4 жыл бұрын
Is approximating the gradient for the real image as simple as (2/imageSize)(a - b), where a and b are the pixel values to the left and right, and the same for above and below (assuming the image function takes inputs in the range 0 to 1)? Would be really cool to see a neural SDF used in raymarching for games.
@YannicKilcher4 жыл бұрын
Almost. Have a look at Sobel filters.
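A rough sketch of getting those "label" gradients with a Sobel filter (assuming PyTorch):

import torch
import torch.nn.functional as F

def sobel_gradients(img):
    # img: (H, W) grayscale tensor -> (H, W, 2) approximate (d/dx, d/dy)
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]])
    ky = kx.t()
    img = img[None, None]  # (1, 1, H, W) as expected by conv2d
    gx = F.conv2d(img, kx[None, None], padding=1)
    gy = F.conv2d(img, ky[None, None], padding=1)
    return torch.stack([gx, gy], dim=-1)[0, 0]

grads = sobel_gradients(torch.rand(64, 64))  # per-pixel gradient targets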
@shawnpan49014 жыл бұрын
Great introduction! And I wonder what tool you use; the highlighting and pen strokes are so comfortable.
@dingleberriesify4 жыл бұрын
For anyone wondering about RBF kernels... An RBF kernel doesn't really do the infinite-dimensional stuff right away. Really, all an RBF (radial basis function; it's a hint!) does is output a similarity based on the distance between two points according to some metric. The usual one people mean when they say RBF is the Gaussian: k(x, x_0) = exp(-||x - x_0||^2 / (2 sigma^2)), where sigma is a scale parameter. This function outputs 1 (if the two points x and x_0 are equal) and decays to 0 at an exponential rate. In the context of a neural network, the comparison value can be a learned parameter (with the logit obviously being the other input), but the literature I remember reading would normally set these values randomly at init and leave it there. Hope that sates somebody's curiosity!

Postscript: Whenever you hear the words "kernel" and "infinite-dimensional" in the same sentence, you're in the land of what's called an RKHS, and the kernel they're referring to is a very specific kind of distance matrix. That sort of stuff is relevant for SVM theory, but kind of goes beyond the scope of this comment. To give a brief sketch: if you do something like linear regression on that kernel matrix, you're implicitly searching through a family of (potentially nonlinear) functions defined by the distance functions. So, nonlinear function approximation, but the whole operation is strictly convex and solvable in closed form. People often get mixed up between the "infinite-dimensional" function space of the RKHS and the "project the data to a higher dimension" quote which is also associated with SVMs.
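A tiny sketch of the Gaussian RBF mentioned above (assuming NumPy):

import numpy as np

def rbf_kernel(x, x0, sigma=1.0):
    # Gaussian RBF: 1 when x == x0, decaying to 0 with squared distance
    sq_dist = np.sum((x - x0) ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * sigma ** 2))

X = np.random.randn(5, 3)
K = rbf_kernel(X[:, None, :], X[None, :, :])  # (5, 5) Gram matrix
print(np.diag(K))  # all ones: each point is at distance 0 from itself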
@priyamdey32982 жыл бұрын
I would suggest people look at the lecture video by Richard Turner on Gaussian Processes (it's on youtube) to get an excellent understanding of what it really means to go from a finite covariance matrix to an infinite covariance matrix (essentially it turns into a distance function comparing two points), which can represent a whole family of curves whose shapes are controlled by the hyperparameters of that function (e.g. sigma in the case of an RBF kernel). What I wrote might be confusing. Please go check that out. Cheers!
@luciengrondin58024 жыл бұрын
Is it better than a Fourier transform, though? Because it seems to me that it does something similar.
@Sjefke30003 жыл бұрын
This one seems to represent the data with non-equally spaced frequencies, whereas a Fourier transform uses equally spaced frequencies. That's the biggest difference I can see.
@felrobelv4 жыл бұрын
What about using the Laplace transform instead of the Fourier transform? It's more general.
@etiennetiennetienne4 жыл бұрын
I am not sure if I understand correctly how they match the gradient of the network. Do you compute a Sobel filter of the output generated over the entire "dataset" (the image as (x, y))? Or do you just compute the true gradient dphi/dx, dphi/dy using autodiff? And if you put that into the loss and run backward, doesn't that mean you need to compute the derivative of a derivative?
@YannicKilcher4 жыл бұрын
The gradient of the true image (i.e. your "label") comes from the Sobel filter, yes. And yes, if you match the gradient using gradient descent, you'd need the gradient of the gradient, which you would get using autodiff or an analytic expression, since SIRENs are easy to differentiate. At least that's how I understand it.
@etiennetiennetienne4 жыл бұрын
@@YannicKilcher Right! I was trying to make this in PyTorch (gist.github.com/etienne87/e65b6bb2493213f436bf4a5b43b943ca), but with the autodiff gradient as additional supervision it seems to work less well for fitting the image (probably I made a mistake...). Anyway, thanks man, your videos are great!
@larrybird37294 жыл бұрын
In the past I fell victim to thinking "sin" could be the holy grail of activation functions, but gradient descent would ride that function like a roller coaster. I even fell victim to trying quaternions, but that is another failed story 😆
@shairozsohail10594 жыл бұрын
Can you use this to resample a signal like an image to get many similar representations of the same image?
@YannicKilcher4 жыл бұрын
What do you mean by similar representations? Do you mean similar images? I guess that would work for the hole-filling examples.
@donniedorko33364 жыл бұрын
Can anybody explain the initialization? I read the paper, but I'm missing something. I get scaling the later weights by omega_0, but why do we multiply it back into the weights in the forward pass? I built one in Julia as an MNIST classifier, and its learning is incredibly fast and stable *only if I don't multiply by omega_0 in the forward pass*
@donniedorko33364 жыл бұрын
If anyone's interested, here's the Julia code I finally got to work. Still no idea about the omega_0 in the forward pass so I've ignored it completely (disclosure: semi copypasta'd from the Julia source code for Dense and Conv layers, but it's open-source so I figured nobody would mind)

using Flux

# return array of random floats between (-1,1)
function uniform(dims...)
    W = rand(Float64, dims) * 2 .- 1
    return W
end

# Generate SIREN dense layer
function SinDense(fan_in::Integer, fan_out::Integer; omega_0=30, is_first=false)
    d = Dense(fan_in, fan_out, sin, initW=uniform)
    if is_first
        params(d)[1] ./= fan_in
    else
        params(d)[1] .*= sqrt(6/fan_in) / omega_0
    end
    return d
end

# Helper functions for conv layers
expand(N, i::Tuple) = i
expand(N, i::Integer) = ntuple(_ -> i, N)

# Create SIREN conv layer from known weights
function SinConv(w::AbstractArray{T,N}, b::AbstractVector{T};
                 stride = 1, pad = 0, dilation = 1, is_first = false, omega_0=30) where {T,N}
    stride = expand(Val(N-2), stride)
    pad = expand(Val(2*(N-2)), pad)
    dilation = expand(Val(N-2), dilation)
    fan_in = 1
    s = size(w)
    for i in 1:(length(s)-1)
        fan_in *= s[i]
    end
    if is_first
        w ./= fan_in
    else
        w .*= sqrt(6/fan_in) / omega_0
    end
    return Conv(sin, w, b, stride, pad, dilation)
end

# Create SIREN conv layer from size
SinConv(k::NTuple{N,Integer}, ch::Pair{
@朱镝中3 жыл бұрын
It seems the initialization with a uniform distribution, and the factor of 30 inside the sin function, are crucial. If you try the code and change that number to e.g. 5, 6, or 7, the results just mess up. Does anybody know why 30 is a good choice? A mystery?
@wonyounglee44172 жыл бұрын
Thank you very much
@vladimirtchuiev22182 жыл бұрын
I think this is somehow limited by the number of sine operations, as using it as an activation function uses far fewer sines than, let's say, projecting every input to the fully connected layer with a different sine. The computational cost of the sine can be offset by letting the GPU handle a lot of sines simultaneously. Also, this requires a specific learning rate to work well: too small and it converges to a flat surface, too large and it converges to noise, therefore I think it is beneficial to use periodic learning rate schedulers here. AdamW with amsgrad also seems to work better than vanilla Adam here. I've tried an MNIST classifier with this, didn't work that well...
@jonatan01i4 жыл бұрын
How does it perform if we use it for upsampling?
@YannicKilcher4 жыл бұрын
Idk, I'd like to see that too
@florianhonicke54484 жыл бұрын
Great video!
@wyalexlee85784 жыл бұрын
Thank you!
@sahibsingh15634 жыл бұрын
Awesome
@PixelPulse168 Жыл бұрын
a very important nerf paper
@twobob2 жыл бұрын
Interesting tool
@tedp91464 жыл бұрын
I saw a video which was about the fact that you can represent every picture through sine-waves (I forgot how and why). Is this somehow related? (Sorry if this is answered later in this video, I’m writing the comment at minute 17)
@hyunsunggo8554 жыл бұрын
Kinda. Sine/cosine functions with different frequencies can form a basis and thus can be used to compress data with spatial information, along with coefficients, through Fourier-transform-like algorithms; that's the idea of JPEG compression, I think. In this case, the coefficients are replaced with weights and biases. You could say it has the structure of multiple discrete Fourier transforms stacked together, with learned frequencies and coefficients.
@shivamraisharma14744 жыл бұрын
Has anyone tried making a generative model out of this yet?
@YannicKilcher4 жыл бұрын
Not sure, since it's always just fitting one data point.
@karchevskymi4 жыл бұрын
How to use SIREN for image classification?
@YannicKilcher4 жыл бұрын
That's not possible out of the box
@nuhaaldausari70193 жыл бұрын
@@YannicKilcher Is it possible to use SIREN to encode an image for image/video generation, for example?
@JuanBPedro4 жыл бұрын
Here is an example of the kind of things that SIRENs allow you to do: github.com/juansensio/nangs
@amandinchyba42694 жыл бұрын
pogchamp
@smnt4 жыл бұрын
2:18 Lol, they must have come from physics. That's the general form of an "action".
@yuyingliu58314 жыл бұрын
Agreed, I was surprised when he said that's abnormal.
@anthonybell85124 жыл бұрын
The initialisation proposed looks like the default weight initialisation in tensorflow: github.com/tensorflow/tensorflow/blob/6bfbcf31dce9a59acfcad51d905894b082989012/tensorflow/python/ops/init_ops.py#L527
@YannicKilcher4 жыл бұрын
The TF one seems to depend on fan_in and fan_out, the SIREN one only depends on fan_in
@nikronic4 жыл бұрын
@@YannicKilcher Actually, PyTorch also implements this in the same manner. The reason is that in some networks you want to use fan_out, but it is still applicable. The main difference is that if you use fan_out, the standard deviation of the generated distribution would not be equal to 1 (it would be smaller).
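For comparison, a rough sketch of the two schemes as I understand them (assuming PyTorch; Glorot/Xavier uniform uses fan_in and fan_out, the SIREN hidden-layer init only fan_in, scaled by omega_0 = 30):

import math
import torch

def glorot_uniform_(weight):
    # TF/Keras-style default: bound uses both fan_in and fan_out
    fan_out, fan_in = weight.shape
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return weight.uniform_(-bound, bound)

def siren_uniform_(weight, omega_0=30.0, is_first=False):
    # SIREN scheme (as I read the paper): bound uses fan_in only
    fan_in = weight.shape[1]
    bound = 1.0 / fan_in if is_first else math.sqrt(6.0 / fan_in) / omega_0
    return weight.uniform_(-bound, bound)

w = torch.empty(256, 256)
print(glorot_uniform_(w.clone()).std(), siren_uniform_(w.clone()).std())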
@jorgesimao43504 жыл бұрын
Without reading the paper... it seems that they are simply using a NN to learn a Fourier representation of the image, seen as a sampled field + gradients.
@hyunsunggo8554 жыл бұрын
That's what I thought as well, kinda.
@eelcohoogendoorn80444 жыл бұрын
No, not really. A single layer of such a network could correspond to a Fourier transform, with the weight for each sine encoded in the last learnable down-projection from the hidden state to the output, given that the weights and biases in the layer itself follow the fixed power-of-two pattern you'd find in an FFT. However, the frequencies are not pre-decided but learned, there can be multiple layers, and more importantly, the number of components is much smaller than you'd find in a dense FFT; with networks 512 neurons wide, that's a boatload fewer frequency components than you'd find in the dense FFT of the images they are regressing against.
@kazz8114 жыл бұрын
@@eelcohoogendoorn8044 That's exactly correct. Fitting a Fourier series is a generalized linear regression problem. This is like a weird hierarchical Fourier representation. It's still infinitely differentiable, but it's a different beast. And it seems like that's critical to why it crushes the competition.
@jabowery4 жыл бұрын
I believe you misspoke at about 33:08 when you said "over the entire image". You should have said "over all the images", right?
@YannicKilcher4 жыл бұрын
No, it's over the entire Image. We're just fitting one image using the neural network
@bluel1ng4 жыл бұрын
@@YannicKilcher A bit off-topic here, but what is fascinating about this form of image representation: You can plot the activity (output) of each neuron for all (x, y) coordinates of the image and also see how the 'contribution' of each neuron develops over the course of the training. Unfortunately I have never seen this in NN courses - I think it is a really nice visualization, especially when done with different activation functions for different layers etc. It also shows immediately the internal 'craziness' (complexity) and limitations of generalization if you look at the output of coordinates outside the domain used during training.
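A rough sketch of that visualization for the first layer of a toy SIREN (assuming PyTorch and matplotlib; sizes and names are just for illustration):

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

torch.manual_seed(0)
lin1 = nn.Linear(2, 16)  # first layer of a toy (x, y) -> intensity SIREN

# dense (x, y) grid covering the training domain [-1, 1]^2
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 128),
                        torch.linspace(-1, 1, 128), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)

hidden = torch.sin(30.0 * lin1(coords))  # activity of every first-layer neuron

fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(hidden[:, i].reshape(128, 128).detach().numpy(), cmap="gray")
    ax.axis("off")
plt.show()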
@jabowery4 жыл бұрын
Ah, OK. I was jumping the gun for the machine learning section, thinking the "dataset" included multiple images to somehow reduce the number of parameters per image. By the way, you really should monetize. You're excellent at capturing and conveying the essence of most papers.
@HeitorvitorC4 жыл бұрын
Thank you a lot for the content, Yannic. I wish you would discuss these 2 papers: arxiv.org/abs/1711.10561 and arxiv.org/abs/1711.10566, as it would be amazing to have an in-depth analysis of such an application of deep learning methods (apart from the fact that modern and didactic treatments of Physics-Informed Neural Networks with an insider's approach are not easy to find in video content). Of course there is content out there, but approaches such as yours could provide a more accessible understanding for those who are starting out in this methodology. Best regards!
@YannicKilcher4 жыл бұрын
Thanks for the reference :)
@Tferdz4 жыл бұрын
Why a fully connected network and not a CNN, since gradients are local and not global?
@dingleberriesify4 жыл бұрын
Because it's a compressed mapping from pixel point to colour value...your local information is your input.
@jorgesimao43504 жыл бұрын
They are not trying to find spatial regularities/invariants as in a CNN; they are simply using a feed-forward net to learn a function that predicts the value of the field/image + gradients. This is just glorified curve fitting; no attempt is made to learn natural local representations like oriented edges, which is what CNNs do, and there is plenty of evidence that brains do that as well.
@kazz8114 жыл бұрын
Because this is a direct function mapping from one point in space to one or more values (like pixel values). CNNs exploit the structure of space (i.e. nearby points are similar); here that pops out of the function (NN) fit. This is more interpolation than machine learning.
@hyunsunggo8554 жыл бұрын
Kinda reminds me of grid cells in the brain.
@hyunsunggo8554 жыл бұрын
It would properly learn scales and thus it'd better interpolate/extrapolate. Linear-like activation functions are terrible at extrapolation. I would like to see how it deals with adversarial examples.
@jonatan01i4 жыл бұрын
Thank you for mentioning that. This grid cell thing seems to be an interesting stuff to know about.
@hyunsunggo8554 жыл бұрын
@@jonatan01i It's crazy interesting. It seems like deep learning is adapting neuroscience one way or another, even accidentally.
@Marcos10PT4 жыл бұрын
Such a shame the authors were probably more worried about seeming clever and professional than making their writing approachable 😔 that introduction says it all! You explain it so well though! Thank you so much!
@YannicKilcher4 жыл бұрын
It's also a different field than regular ML.
@nikronic4 жыл бұрын
Sorry to say this, but even though Yannic did a great job, the original authors explained it very well too. They provided ready-to-run how-to-use code for all cases, the original source code, and also a short video explaining the core ideas.
@bdennyw14 жыл бұрын
Thank you for clearly explaining this paper. It's one that I wanted to dig into but found the math off-putting. The authors should have done a better job of communicating this simple idea.
@keri_gg2 жыл бұрын
Does anyone else speed up his videos because he speaks so slowly?