Implicit Neural Representations with Periodic Activation Functions

56,257 views

Stanford Computational Imaging Lab

4 years ago

-- Project page --
vsitzmann.github.io/siren
-- arXiv preprint --
arxiv.org/abs/2006.09661
-- Abstract --
Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal’s spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or SIRENs, are ideally suited for representing complex natural signals and their derivatives. We analyze SIREN activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how SIRENs can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine SIRENs with hypernetworks to learn priors over the space of SIREN functions.
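A minimal PyTorch sketch of the idea in the abstract: a sine-activated layer with the frequency factor omega_0 and the uniform initialization scheme the paper proposes. The layer widths, omega_0 = 30, and the image-fitting usage are illustrative assumptions, not the authors' released code.

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """One SIREN layer: y = sin(omega_0 * (W x + b)).
    Initialization sketch following the paper's scheme:
    first layer ~ U(-1/fan_in, 1/fan_in), later layers
    ~ U(-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0)."""
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                bound = 1.0 / in_features
            else:
                bound = math.sqrt(6.0 / in_features) / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

# Illustrative usage: a coordinate network mapping (x, y) -> RGB.
siren = nn.Sequential(
    SineLayer(2, 256, is_first=True),
    SineLayer(256, 256),
    SineLayer(256, 256),
    nn.Linear(256, 3),                  # final linear layer outputs pixel values
)
coords = torch.rand(1024, 2) * 2 - 1    # sample coordinates in [-1, 1]^2
rgb = siren(coords)                     # shape: (1024, 3)
```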

Comments: 50
@mannyk7634
@mannyk7634 2 years ago
Very nice work, especially the sinusoidal activation. I'd like to point out that Candès covered periodic activation functions ("admissible neural activation functions") rigorously in his 1997 paper "Harmonic Analysis of Neural Networks". Strangely enough, that paper is not even cited by the authors.
@paulofalca0
@paulofalca0 4 years ago
Awesome work!
@wyalexlee8578
@wyalexlee8578 4 years ago
Love this! Thank you!!
@siarez
@siarez 4 years ago
How does this compare to just taking the Fourier (or discrete cosine) transform of the signal?
@aidungeon4539
@aidungeon4539 4 years ago
Super cool!
@NerdyRodent
@NerdyRodent 3 years ago
That’s amazing!
@Marcos10PT
@Marcos10PT 4 years ago
Goodbye ReLU, you had a good run! I feel I have to watch this a few more times to have a good idea of what's going on 😄but it looks like a breakthrough!
@TileBitan
@TileBitan 1 year ago
The music part was outstanding. Audio waveforms are just stacked sine waves, as opposed to images or text, where the input may not be closely related to the sine function. So it just feels right to use sine activations (with the required tweaks to make that work) instead of ReLUs. But I'm going to be careful with this: even though I have some experience in ML, I haven't ever touched anything other than ReLUs, sigmoids, tanh, and plain linear activations.
@Oktokolo
@Oktokolo 1 year ago
You can approximate _everything_ with stacked sine waves. All modern video and image compression algorithms are based on that.
@TileBitan
@TileBitan 1 year ago
@@Oktokolo Let me rephrase that, then. Audio waveforms can be approximated by a relatively SMALL number of stacked sine waves, so it feels natural to use them in NNs. Everything can be approximated by an infinite number of sine waves, but sometimes it doesn't make sense to do it.
@Oktokolo
@Oktokolo 1 year ago
@@TileBitan It obviously makes sense for images, since that is what the best compression algorithms use. It should also be possible to encode text reasonably well - even though the resulting set of weights is probably larger than the text itself when not encoding the input of a huge language model...
@TileBitan
@TileBitan 1 year ago
@@Oktokolo I don't understand. Sounds are waves of different amplitudes and frequencies within the hearing range. Images nowadays can be 100M pixels with 3 × 256 values in the BEST case, where relationships between pixels can be really close to nothing. The case is completely different. The text case doesn't really have much to do with a wave. They might use FFTs for images, but you have to agree with me: for the same error you need way, way fewer terms for sound than for images.
@Oktokolo
@Oktokolo 1 year ago
@@TileBitan It doesn't matter whether it looks like it has anything to do with a wave, or whether adjacent values look like they are in any relation to each other. Treating data as signals and then encoding the signal as stacked waves just works surprisingly well. It might not work well for truly random bit noise, but most data interesting to humans seems to exhibit surprisingly low entropy and can be compressed using stacked sines.
@tigeruby
@tigeruby 4 years ago
Lol at tanh - but very cool general-purpose work; I can imagine this being a good exploratory topic/bonus project for intro signal processing courses.
@dann_y5319
@dann_y5319 24 days ago
Awesome
@rahuldeora1120
@rahuldeora1120 4 years ago
Can you please share your code? The link on the project page is not working
@luciengrondin5802
@luciengrondin5802 4 years ago
The application of this kind of representation to 3D rendering is fascinating. Could it be that in the future modelers will give up on the polygons+textures model and represent the whole scene with a neural network instead?
@kwea123
@kwea123 4 years ago
It requires huge processing time at the inference stage. Take SDFs as an example: you'll need to query the network a huge number of times to find out where the surface is (it also depends on the discretization resolution). I think currently it's only good for offline use.
@luciengrondin5802
@luciengrondin5802 4 years ago
@@kwea123 I expressed myself poorly. I was thinking of rendering, not modeling. More precisely, I was thinking of the problem of rendering scenes with very high poly counts, for instance where a very long draw distance is required. Currently modelers have to use levels of detail, but this technique has limitations.
@qsobad
@qsobad 4 years ago
@@luciengrondin5802 That is solved by another approach; take a look at UE5.
@Oktokolo
@Oktokolo 1 year ago
@@luciengrondin5802 I hope the network does not have to learn to give the pixel values for a coordinate, but could also learn to give coordinates and pixel values for an index. The real issue would be the compression stage requiring training a network of appropriate size on the scene to be "compressed".
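To put numbers on the query count kwea123 mentions above, here is a hypothetical sketch (not the authors' code) of extracting a mesh from a trained SDF network with a dense grid evaluation plus marching cubes; `sdf_net`, the resolution, and the chunk size are illustrative assumptions.

```python
import torch
from skimage import measure  # marching cubes for zero-level-set extraction

def extract_mesh(sdf_net, resolution=256, chunk=65536):
    """Evaluate a trained SDF network on a dense grid, then run marching
    cubes on the resulting volume. At resolution 256 this already means
    256**3 (about 16.8M) network queries, which is why real-time use is hard."""
    xs = torch.linspace(-1.0, 1.0, resolution)
    grid = torch.stack(torch.meshgrid(xs, xs, xs, indexing="ij"), dim=-1)  # (R, R, R, 3)
    points = grid.reshape(-1, 3)
    with torch.no_grad():
        sdf = torch.cat([sdf_net(p) for p in points.split(chunk)])         # one query per grid point
    volume = sdf.reshape(resolution, resolution, resolution).cpu().numpy()
    verts, faces, _, _ = measure.marching_cubes(volume, level=0.0)
    return verts, faces
```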
@volotat
@volotat 4 years ago
Wow, just watched Yannic Kilcher's video on this work, and this is fascinating... I bet this work is going to change many things in ML. Please share the code!
@anilaxsus6376
@anilaxsus6376 10 months ago
Yeah, I was wondering why people weren't using sines and cosines. I watched a video where the presenter explained that a neural network with L layers and N nodes per layer using ReLU activations can perfectly match a function with up to N^L bends (turning points) in its curve (assuming a single scalar output). I guess that is why it failed on the audio: there are a lot of turning points in audio data. So technically SIREN's performance can be matched by a large enough ReLU network, and I'm looking at SIREN as an optimization over the usual ReLU networks. I'm glad I saw this; I will look into it further. I suspect sinusoidal activations will be useful in domains with some sort of repetition, since ReLUs act more like threshold switches.
@convolvr
@convolvr 4 years ago
The first layer sin(Wx + b) could be thought of as a vector of waves with frequencies w_i and phase offsets b_i. After the second linear layer, we have a vector of trigonometric series which look like Fourier expansions, except the frequencies and phase offsets can be anything. Although the next nonlinearity might do something new, we can already represent any function with the first 1.5 layers. What advantages does this approach offer over representing and evaluating functions as a Fourier series?
@_vlek_
@_vlek_ 4 years ago
Because you can learn the representation of lots of different signals via gradient descent?
@luciengrondin5802
@luciengrondin5802 4 years ago
@@_vlek_ I think this is it, indeed. Efficient Fourier transform algorithms only work with a regularly sampled signal and, if I'm not mistaken, in low dimension. This machine learning approach can work with any kind of signal, I think.
@isodoubIet
@isodoubIet 10 months ago
Fourier series are linear
@convolvr
@convolvr 10 months ago
@@isodoubIet The Fourier transform is linear. The Fourier series is not. I assume you're implying that the neural net is fundamentally more expressive by being nonlinear. But the Fourier series is also nonlinear.
@isodoubIet
@isodoubIet 10 months ago
@@convolvr Eh, no. If you have a smooth periodic signal, it's still expressible as a linear combination of Fourier components, so yes, this is fundamentally more expressive.
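As a side note on convolvr's observation above, here is a small, hypothetical sketch (not from the paper) of how a sine layer followed by a linear layer already forms a trigonometric series with freely learnable frequencies and phases; the sizes and random weights stand in for learned parameters.

```python
import math
import torch

torch.manual_seed(0)
n = 256                                       # number of sinusoids
freqs  = torch.randn(n, 1) * 30.0             # first-layer weights act as frequencies
phases = torch.rand(n, 1) * 2 * math.pi       # first-layer biases act as phase offsets
amps   = torch.randn(1, n) / n                # second (linear) layer acts as amplitudes

x = torch.linspace(-1, 1, 1000).unsqueeze(0)  # 1D input coordinates, shape (1, 1000)
f = amps @ torch.sin(freqs @ x + phases)      # trigonometric series with free frequencies/phases
print(f.shape)                                # torch.Size([1, 1000])
```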
@IsaacGerg
@IsaacGerg 4 years ago
The arXiv version has an incorrect reference. The paper states "or positional encoding strategies proposed in concurrent work [5]" and the video mentions a paper from 2020, but reference [5] in your current arXiv version is C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera, "Filling-in by joint interpolation of vector fields and gray levels," IEEE Trans. on Image Processing, 10(8):1200-1211, 2001. I believe this should reference what you list as [35].
@kwea123
@kwea123 4 years ago
Yes, "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" uses positional encoding. And they recently published a paper that uses the Fourier transform.
@rodrigob
@rodrigob 4 years ago
No link to the paper in the video description?
@rodrigob
@rodrigob 4 years ago
And project page at vsitzmann.github.io/siren/
@TiagoTiagoT
@TiagoTiagoT 1 year ago
How does it compare to using a sawtooth wave in place of the sine wave?
@DrPapaya
@DrPapaya 4 years ago
Code available?
@Singularitarian
@Singularitarian 4 years ago
What if you were to use exp(i x) = cos(x) + i sin(x) as the activation function? That seems potentially more elegant.
@rainerzufall1868
@rainerzufall1868 4 years ago
What would it mean for an activation to have a complex output? Or 2 outputs?
@OliverBatchelor
@OliverBatchelor 4 years ago
@@rainerzufall1868 Twice as many outputs - just doubling the features. You can do a similar thing with ReLU, where you threshold at maximum zero and at minimum zero and split into two parts; I'm not sure it's a whole lot better than just one, though...
@luciengrondin5802
@luciengrondin5802 4 years ago
@@rainerzufall1868 Does the activation function necessarily have to be real? I don't think so. I think using a complex exponential could help make the calculations and implementation clearer. It could add some computational overhead, though.
@rainerzufall1868
@rainerzufall1868 4 years ago
@@luciengrondin5802 I don't think it would simplify things. If you model it as the activation having 2 outputs, it would need some re-implementation... and if you instead use 1 complex output and complex multiplication, the libraries are not optimized for this at all, so the computational hit would be big, I think.
@rainerzufall1868
@rainerzufall1868 4 years ago
Also, cosine and sine are the same except for a constant shift of the input, which we could learn via the bias, so I don't think we would add much value. On the flip side, the derivative of sine is cosine and vice versa (with a minus), so we can just reuse the output of one when computing the derivative of the other.
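A tiny, hypothetical sketch of the idea discussed in this thread (not part of the paper): emulating the complex exponential exp(i x) = cos(x) + i sin(x) with real tensors by returning the cosine and sine parts side by side, which simply doubles the feature dimension; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class CosSinActivation(nn.Module):
    """Returns [cos(x), sin(x)] concatenated along the feature axis,
    i.e. the real and imaginary parts of exp(i x) as two real outputs."""
    def forward(self, x):
        return torch.cat([torch.cos(x), torch.sin(x)], dim=-1)

# Illustrative usage: each linear layer is followed by the paired activation,
# so the feature dimension doubles after every activation.
net = nn.Sequential(
    nn.Linear(2, 128), CosSinActivation(),    # 128 -> 256 features
    nn.Linear(256, 128), CosSinActivation(),  # 128 -> 256 features
    nn.Linear(256, 1),
)
out = net(torch.rand(16, 2))                  # shape: (16, 1)
```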
@sherwoac
@sherwoac 4 years ago
An implementation is already available at: github.com/titu1994/tf_SIREN
@_zproxy
@_zproxy 3 months ago
Is it like a new JPEG?
@edsonjr6972
@edsonjr6972 1 month ago
Did anyone try using this in transformers?
@enginechen7312
@enginechen7312 4 years ago
Hi, could I download this video and upload it to bilibili.com, where Chinese students and researchers can access it freely?
@sistemsylar
@sistemsylar 4 years ago
Please post a Colab.