OUTLINE:
0:00 - Intro & Overview
3:00 - Model Overview
7:00 - Interpreter weights and function code
9:40 - Routing data to functions via neural type inference
14:55 - ModLin layers
18:25 - Experiments
21:35 - Interview Start
24:50 - General Model Structure
30:10 - Function code and signature
40:30 - Explaining Modulated Layers
49:50 - A closer look at weight sharing
58:30 - Experimental Results

Paper: arxiv.org/abs/2110.06399

Guests:
Nasim Rahaman: twitter.com/nasim_rahaman
Francesco Locatello: twitter.com/FrancescoLocat8
Waleed Gondal: twitter.com/Wallii_gondal
@zeev · 3 years ago
Yannic, you sound more excited about this concept than you usually do about others. Something tells me this has some magic.
@anthonyrepetto3474 · 3 years ago
I'd been hoping for this sort of approach since 2017! Wonderful to see that you all have fit the pieces together well, to make Mixture of Experts with Attention in a composable fashion! All I did was write a vague essay - "Neural Networks: a Mixture of Experts with Attention" and then I wandered off to something else. Math-life! Thank you for putting the thought and rigor into making this real!
@oncedidactic · 3 years ago
Great minds and all that 🤩
@johnpope1473 · 3 years ago
5 seconds in - oh man - this is great. Having the authors that wrote the paper explain the damn thing. Awesome 🔥🔥🔥🔥🔥🔥
@JBoy340a · 3 years ago
Another great video. I really like that you have the authors on, so they can answer the questions others might have.
@SimonJackson13 · 3 years ago
Sandbox stability violation error on program-blame example URL. Stabilize via min span all essentials plus minimal impact cover plus benefit bound bias :D
@mikejason3822 · 3 years ago
Nice video. One point to note is that Waleed tried to add points to the conversation a few times but did not get a chance, e.g. at 1:18:47. It would have been better if every person got an equal chance to talk when they wanted to.
@thegistofcalculus · 3 years ago
Pretty cool. I get the sense that if they were to scale this up and genuinely capture some kind of causality property of reality within most of the functions, then a more sophisticated routing scheme might be required to direct the flow of information, since the functions would only do something useful within a narrow context. So awesome to see causality getting chipped away at, just like unsupervised learning became demystified lately.
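For readers curious what that routing looks like concretely, here is a minimal sketch of signature-based routing in the spirit of the paper's neural type inference, written in PyTorch. The names (`type_inference`, `trunc`, `temp`) and the exact compatibility function are assumptions for illustration, not the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def route_tokens(tokens, signatures, type_inference, trunc=0.5, temp=0.1):
    # tokens: (n_tokens, dim); signatures: (n_functions, sig_dim)
    types = F.normalize(type_inference(tokens), dim=-1)    # inferred type vectors on the unit sphere
    sigs = F.normalize(signatures, dim=-1)                  # learned per-function signature vectors
    compat = types @ sigs.T                                  # cosine similarity, (n_tokens, n_functions)
    mask = (compat > trunc).float()                          # functions outside the truncation radius get no data
    weights = torch.softmax(compat / temp, dim=-1) * mask    # soft routing weights, zeroed where incompatible
    return weights

# Example usage with made-up dimensions:
# type_inference = torch.nn.Linear(64, 32)
# weights = route_tokens(torch.randn(10, 64), torch.randn(4, 32), type_inference)
```

The commenter's point maps onto the truncation threshold: the narrower each function's useful context, the more the signature geometry and that threshold would have to do.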
@Guytron95 · 3 years ago
Man! These interactive discussions are freakin' HOT! Thanks :)
@ChristosKyrkou · 3 years ago
First! Thanks Yannic for the great videos
@drdca8263 · 3 years ago
I’m surprised at \otimes being element-wise multiplication? I would have thought to use \odot for that. Like, when I see \otimes, I’m thinking tensor product (which could also be meaningful in that location).
@nasimrahaman7886 · 3 years ago
Good pointer (thx!), \odot would have made more sense.
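For context, the product in question appears in the paper's modulated linear (ModLin) layers, where a function's code vector modulates the input element-wise before a weight matrix shared across functions. A minimal sketch, with parameter names that are illustrative rather than the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class ModLin(nn.Module):
    """Code-modulated linear layer: the code vector c produces a per-feature
    modulation of the input before a linear map shared across all functions."""
    def __init__(self, dim, code_dim):
        super().__init__()
        self.to_mod = nn.Linear(code_dim, dim)   # project code c to an element-wise modulation
        self.linear = nn.Linear(dim, dim)        # shared interpreter weights

    def forward(self, x, c):
        return self.linear(x * self.to_mod(c))   # x ⊙ (W_c c + b_c), then the shared W(.) + b
```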
@alpers.2123 · 3 years ago
I have an idea, I don't know if it makes sense. Can we train a model where some part of it is forced to accept and produce binary vectors? Then convert that part to native code with bitwise operations and fine-tune the rest. Like a learned logic circuit, which could also later be implemented on an ASIC. The model can be decomposed into 3 parts: encoder, logic unit, decoder. Discretized logic layers lose differentiability, so you cannot backpropagate through them; you can only fine-tune the decoder part. The encoder can be designed to be sparse, because converting floating-point vectors to bitsets loses information. The goal is to produce a faster and more compact model. Could this be possible? Has it been done already?
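For what it's worth, the differentiability problem raised here is commonly worked around with a straight-through estimator, which would let all three parts train jointly before the logic unit is hardened into bitwise operations. A minimal sketch of the commenter's decomposition under that assumption, with all module names hypothetical:

```python
import torch
import torch.nn as nn

class Binarize(torch.autograd.Function):
    # Straight-through estimator: threshold on the forward pass,
    # pass the gradient through unchanged on the backward pass.
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

class EncoderLogicDecoder(nn.Module):
    def __init__(self, d_in, n_bits, d_out):
        super().__init__()
        self.encoder = nn.Linear(d_in, n_bits)
        self.logic = nn.Linear(n_bits, n_bits)   # stand-in for the learned "logic unit"
        self.decoder = nn.Linear(n_bits, d_out)

    def forward(self, x):
        bits = Binarize.apply(self.encoder(x))        # binary vector at the interface
        out_bits = Binarize.apply(self.logic(bits))   # discretized logic output
        return self.decoder(out_bits)
```

After training, the binarized middle could in principle be distilled into a lookup table or bitwise circuit, which is roughly the ASIC path the comment describes.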
@paxdriver · 3 years ago
Are they running a second training operation on sets of outputs of early layers? Or are they running an internal typeinference(x) model underneath, using attention on the results? ... or did I completely misunderstand this one lol?
@nasimrahaman7886 · 3 years ago
> "Are they running a second training operation on sets of outputs of early layers?" We're not, though this should also work. We messed around with two ways of fine-tuning this: * Funetuning only the function signatures and codes -- think of these as learnable vectors that "instruct" the model what to do with its inputs. They usually won't amount to more than a few thousand parameters, and if there's not a lot of data, this is the way to go. We tested it with as few as 128 samples. * Finetuning everything, like you would any other model. If you have a good amount of data, this is a good place to start.
@paxdriver · 3 years ago
@nasimrahaman7886 Thanks for clarifying for me :) I'm really impressed by the communication, you guys rock.
@erickmarin6147 · 3 years ago
What if the scheme is generalizable to graph neural networks, with a function at every node?
@ScottzPlaylists · 8 months ago
Will the code be released?
@arahir1129 · 3 years ago
Hi Yannic. Can I ask what software you use for writing notes on these papers?
@SimonJackson13 · 3 years ago
Ah, estimated future code line ... maybe useful to feed OoO stats to machine code optimizers. Common factors pulled earlier out of a loop, e.g. ... what are the outputs? How many errors can accumulate and be reduced to none? The effective S space for a lingo might be interesting.
@SimonJackson13 · 3 years ago
LOCs? AST statements? Closest valid AST?
@SimonJackson13 · 3 years ago
Adversarial spare dispercity? Adversarial solute S gravity inversion? Does it lock on a never list deterministic pattern match?
@SimonJackson13 · 3 years ago
Godelian sandbox creation exception within experimental context. Outer kernel solidity execution precontext add swing. Back inference type stability markations on type for safe extraction of axiomatization of base code.
@laurenpinschannels · 3 years ago
yo I kind of like where you're going with this but I think you might need to turn your temperature down bro
@laurenpinschannels · 3 years ago
It sounds like what you're saying is that you could really beef up compilers with this. That does seem plausible to me.
@JanBlok · 3 years ago
We might be watching the start of a new paradigm here 😀. Has anyone seen the code?
@444haluk · 3 years ago
Yannic is missing some of his hair.
@amaniarman460 · 3 years ago
Great stuff, Yannic, I really enjoy this series with the authors. Did you see Andrej's and Justin's paper review with the first author of DALL-E? You might find it intriguing: kzbin.info/www/bejne/hqXHoYp5bLilb5o Blessings
@erickmarin6147 · 3 years ago
Imagine throwing a problem at an AI that decides which scripts to use.
@Adhil_parammel · 3 years ago
Cheap automated replication, differentiation, and integration of neural networks is all you need.