Dynamic Inference with Neural Interpreters (w/ author interview)

  14,941 views

Yannic Kilcher

Comments: 35
@YannicKilcher 3 years ago
OUTLINE:
0:00 - Intro & Overview
3:00 - Model Overview
7:00 - Interpreter weights and function code
9:40 - Routing data to functions via neural type inference
14:55 - ModLin layers
18:25 - Experiments
21:35 - Interview Start
24:50 - General Model Structure
30:10 - Function code and signature
40:30 - Explaining Modulated Layers
49:50 - A closer look at weight sharing
58:30 - Experimental Results
Paper: arxiv.org/abs/2110.06399
Guests:
Nasim Rahaman: twitter.com/nasim_rahaman
Francesco Locatello: twitter.com/FrancescoLocat8
Waleed Gondal: twitter.com/Wallii_gondal
@zeev 3 years ago
Yannic, you sound more excited than usual about this concept, more than about other concepts. Something tells me this has some magic.
@anthonyrepetto3474 3 years ago
I'd been hoping for this sort of approach since 2017! Wonderful to see that you all have fit the pieces together well, to make Mixture of Experts with Attention in a composable fashion! All I did was write a vague essay - "Neural Networks: a Mixture of Experts with Attention" and then I wandered off to something else. Math-life! Thank you for putting the thought and rigor into making this real!
@oncedidactic 3 years ago
Great minds and all that 🤩
@johnpope1473 3 years ago
5 seconds in - oh man - this is great. Having the authors that wrote the paper explain the damn thing. Awesome 🔥🔥🔥🔥🔥🔥
@JBoy340a 3 years ago
Another great video. I really like you having the authors on so you can have them answer the questions others might have.
@SimonJackson13 3 years ago
Sandbox stability violation error on programblame example url. Stabilize via min span all essentials plus minimal impact cover plus benefit bound bias :D
@mikejason3822 3 years ago
Nice video. One point to note is that Waleed tried to add points to the conversation a few times but did not get a chance, e.g. at 1:18:47. It would have been better if every person got equal attention when they wanted to talk.
@thegistofcalculus 3 years ago
Pretty cool. I get the sense that if they were to scale this up and genuinely capture some kind of causality property of reality within most of the functions then a more sophisticated routing scheme may be required to direct the flow of information, since the functions would only do something useful within a narrow context. So awesome to see causality getting chipped away at just like unsupervised learning became demystified lately.
@Guytron95 3 years ago
Man! These interactive discussions are freakin' HOT! Thanks :)
@ChristosKyrkou 3 years ago
First! Thanks Yannic for the great videos
@drdca8263 3 years ago
I'm surprised that \otimes is used for element-wise multiplication. I would have expected \odot for that. When I see \otimes, I think of the tensor product (which could also be meaningful in that location).
@nasimrahaman7886 3 years ago
Good pointer (thx!), \odot would have made more sense.
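A small sketch of the distinction raised in this exchange, using plain Python lists. The function names `hadamard` and `outer` are illustrative assumptions and are not taken from the paper. The element-wise product (usually written \odot) keeps the shape of its inputs, while the tensor (outer) product of two vectors (usually written \otimes) yields a matrix.

```python
# Illustrative helpers; the names are assumptions, not from the paper.

def hadamard(a, b):
    # Element-wise product (odot): inputs and output share one length.
    if len(a) != len(b):
        raise ValueError("element-wise product needs equal-length vectors")
    return [x * y for x, y in zip(a, b)]


def outer(a, b):
    # Tensor (outer) product (otimes) of two vectors: a len(a) x len(b) matrix.
    return [[x * y for y in b] for x in a]


a = [1, 2, 3]
b = [4, 5, 6]
print(hadamard(a, b))  # [4, 10, 18]
print(outer(a, b))     # [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
```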
@alpers.2123 3 years ago
I have an idea, idk if it makes sense. Can we train a model in which some part is forced to accept and produce binary vectors, then convert that part to native code with bitwise operations, and then fine-tune the rest? Like a learned logic circuit, which could also be implemented later on an ASIC. The model can be decomposed into 3 parts: encoder, logic unit, decoder. Discretized logic layers lose differentiability, so you cannot backpropagate through them, and you can only fine-tune the decoder part. The encoder can be designed to be sparse, because converting floating-point vectors to bitsets loses information. The goal is to produce a faster and more compact model. Is this possible? Has it been done already?
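One way to picture the encoder / logic unit / decoder split proposed in this comment is the toy sketch below. Everything in it is hypothetical: the threshold, the fixed XOR "circuit", and the function names are chosen only for illustration. It also shows why the middle part blocks gradients: thresholding maps many different float vectors to the same bit pattern, so small changes to the input leave the output unchanged.

```python
def to_bits(vec, threshold=0.0):
    # Hypothetical encoder output step: bit i is set when vec[i] > threshold.
    bits = 0
    for i, v in enumerate(vec):
        if v > threshold:
            bits |= 1 << i
    return bits


def logic_unit(x, y):
    # Stand-in for a learned logic circuit; here a fixed bitwise XOR.
    return x ^ y


def from_bits(bits, n):
    # Hypothetical decoder input step: unpack n bits into floats (0.0 or 1.0).
    return [float((bits >> i) & 1) for i in range(n)]


x = to_bits([0.3, -1.2, 0.8, 0.0])   # bits 0 and 2 set
y = to_bits([-0.5, 0.9, 0.1, 2.0])   # bits 1, 2 and 3 set
print(from_bits(logic_unit(x, y), 4))  # [1.0, 1.0, 0.0, 1.0]

# Nudging an input does not change the bit pattern, so the output is
# piecewise constant and its gradient is zero almost everywhere.
print(to_bits([0.3, -1.2, 0.8, 0.0]) == to_bits([0.31, -1.2, 0.8, 0.0]))  # True
```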
@paxdriver 3 years ago
Are they running a second training operation on sets of outputs of early layers? Or are they running an internal typeinference(x) model underneath, using attention on the results? ... Or did I completely misunderstand this one lol?
@nasimrahaman7886 3 years ago
> "Are they running a second training operation on sets of outputs of early layers?"
We're not, though this should also work. We messed around with two ways of fine-tuning this:
* Finetuning only the function signatures and codes -- think of these as learnable vectors that "instruct" the model what to do with its inputs. They usually won't amount to more than a few thousand parameters, and if there's not a lot of data, this is the way to go. We tested it with as few as 128 samples.
* Finetuning everything, like you would any other model. If you have a good amount of data, this is a good place to start.
@paxdriver 3 years ago
@@nasimrahaman7886 thanks for clarifying for me :) I'm really impressed by the communication, you guys rock.
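The first fine-tuning option described in the reply above (updating only the function signatures and codes while everything else stays frozen) could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the authors' training code: the parameter group names, the values, and the plain SGD update are all made up for the example.

```python
def sgd_step(params, grads, trainable, lr=0.5):
    # Plain SGD, applied only to parameter groups named in `trainable`;
    # every other group is copied unchanged (i.e. frozen).
    updated = {}
    for name, values in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)
    return updated


# Hypothetical parameter groups with made-up values.
params = {
    "interpreter_weights": [1.0, -2.0],
    "function_signatures": [1.0, 0.0],
    "function_codes": [2.0, 3.0],
}
grads = {name: [1.0, 1.0] for name in params}

new_params = sgd_step(
    params, grads, trainable={"function_signatures", "function_codes"}
)
print(new_params["interpreter_weights"])  # [1.0, -2.0]
print(new_params["function_signatures"])  # [0.5, -0.5]
print(new_params["function_codes"])       # [1.5, 2.5]
```

Switching to the second option (fine-tuning everything) would amount to passing every group name in `trainable`.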
@erickmarin6147 3 years ago
What if the script is generalizable to graph neural networks with a function in every node?
@ScottzPlaylists 8 months ago
Will the code be released?
@arahir1129 3 years ago
Hi Yannic. Can I ask what software you use for writing notes on these papers?
@SimonJackson13 3 years ago
Ah estimated future code line ... maybe useful to feed OoO stats on machine code optimizers. Common factors pulled earlier out of a loop eg. ... what's the outputs? How many errors can accumulate and be reduced to none? The effective S space for a lingo might be interesting.
@SimonJackson13 3 years ago
LOCs? AST statements? Closest valid AST?
@SimonJackson13 3 years ago
Adversarial spare dispercity? Adversarial solute S gravity inversion? Does it lock on a never list deterministic pattern match?
@SimonJackson13 3 years ago
Godelian sandbox creation exception within experimental context. Outer kernel solidity execution precontext add swing. Back inference type stability markations on type for safe extraction of axiomatization of base code.
@laurenpinschannels 3 years ago
yo I kind of like where you're going with this but I think you might need to turn your temperature down bro
@laurenpinschannels 3 years ago
It sounds like what you're saying is that you could really beef up compilers with this. That does seem plausible to me.
@JanBlok 3 years ago
We might be watching the start of a new paradigm here 😀, anyone seen the code?
@444haluk 3 years ago
Yannic is missing some of his hairs.
@amaniarman460 3 years ago
Great stuff Yannic, I really enjoy this series w/ author. Did you see Andrej's and Justin's paper review with the first author of DALL-E? You might find it intriguing. kzbin.info/www/bejne/hqXHoYp5bLilb5o Blessings
@erickmarin6147 3 years ago
Imagine throwing a problem to an AI that decides the scripts to use
@Adhil_parammel 3 years ago
Cheap automated replication, differentiation and integration of neural networks is all you need.
@vincent-uh5uo 3 years ago
2