Dynamic Inference with Neural Interpreters (w/ author interview)

14,845 views

Yannic Kilcher

#deeplearning #neuralinterpreter #ai
This video includes an interview with the paper's authors!
What if we treated deep networks like modular programs? Neural Interpreters divide computation into small modules and route data to them via a dynamic type-inference system. The resulting model combines recurrent elements, weight sharing, attention, and more to tackle both abstract reasoning and computer vision tasks.
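For a concrete picture of the routing idea described above, here is a rough PyTorch-style sketch: each function owns a learned signature vector, a type-inference module maps every token to a type vector, and tokens are soft-routed to functions whose signatures are compatible with that type. All names, dimensions, and the exact truncation rule below are my own illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TypeRouter(nn.Module):
    def __init__(self, dim, type_dim, num_functions, threshold=0.5):
        super().__init__()
        # small MLP that infers a "type" for each input token
        self.type_inference = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, type_dim)
        )
        # one learned signature vector per function
        self.signatures = nn.Parameter(torch.randn(num_functions, type_dim))
        self.threshold = threshold
        self.temperature = nn.Parameter(torch.tensor(1.0))

    def forward(self, tokens):  # tokens: (batch, seq, dim)
        types = F.normalize(self.type_inference(tokens), dim=-1)
        sigs = F.normalize(self.signatures, dim=-1)
        sim = types @ sigs.t()  # cosine similarity, shape (batch, seq, num_functions)
        # keep only sufficiently compatible (token, function) pairs
        scores = torch.exp(sim / self.temperature.clamp(min=1e-3)) * (sim > self.threshold)
        # normalized routing weights over functions
        return scores / scores.sum(dim=-1, keepdim=True).clamp(min=1e-6)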
OUTLINE:
0:00 - Intro & Overview
3:00 - Model Overview
7:00 - Interpreter weights and function code
9:40 - Routing data to functions via neural type inference
14:55 - ModLin layers
18:25 - Experiments
21:35 - Interview Start
24:50 - General Model Structure
30:10 - Function code and signature
40:30 - Explaining Modulated Layers
49:50 - A closer look at weight sharing
58:30 - Experimental Results
Paper: arxiv.org/abs/2110.06399
Guests:
Nasim Rahaman: twitter.com/nasim_rahaman
Francesco Locatello: twitter.com/FrancescoLocat8
Waleed Gondal: twitter.com/Wallii_gondal
Abstract:
Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call functions. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner. In the latter, we find that Neural Interpreters are competitive with respect to the state-of-the-art in terms of systematic generalization.
Authors: Nasim Rahaman, Muhammad Waleed Gondal, Shruti Joshi, Peter Gehler, Yoshua Bengio, Francesco Locatello, Bernhard Schölkopf
Links:
TabNine Code Completion (Referral): bit.ly/tabnine-yannick
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.com/channel/yann...
LinkedIn: / ykilcher
BiliBili: space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 35
@zeev 2 years ago
Yannic, you sound more excited about this concept than usual. Something tells me this has some magic.
@johnpope1473 2 years ago
5 seconds in - oh man - this is great. Having the authors that wrote the paper explain the damn thing. Awesome 🔥🔥🔥🔥🔥🔥
@anthonyrepetto3474 2 years ago
I'd been hoping for this sort of approach since 2017! Wonderful to see that you all have fit the pieces together well, to make Mixture of Experts with Attention in a composable fashion! All I did was write a vague essay - "Neural Networks: a Mixture of Experts with Attention" and then I wandered off to something else. Math-life! Thank you for putting the thought and rigor into making this real!
@oncedidactic 2 years ago
Great minds and all that 🤩
@mikejason3822 2 years ago
Nice video. One point to note is that Waleed tried to add points to the conversation a few times but did not get a chance, e.g. 1:18:47. It could have been better if every person got an equal chance to talk when they wanted to.
@JBoy340a 2 years ago
Another great video. I really like you having the authors on so you can have them answer the questions others might have.
@SimonJackson13 2 years ago
Sandbox stability violation error on programblame example url. Stabilize via min span all essentials plus minimal impact cover plus benefit bound bias :D
@thegistofcalculus 2 years ago
Pretty cool. I get the sense that if they were to scale this up and genuinely capture some kind of causality property of reality within most of the functions then a more sophisticated routing scheme may be required to direct the flow of information, since the functions would only do something useful within a narrow context. So awesome to see causality getting chipped away at just like unsupervised learning became demystified lately.
@parker1981xxx 2 years ago
I like paper review videos, they are even better when they involve the authors. Keep up the good work Yannic.
@Guytron95 2 years ago
man! these interactive discussions are freakin' HOT! thanks :)
@ChristosKyrkou 2 years ago
First! Thanks Yannic for the great videos
@drdca8263 2 years ago
I’m surprised at the \otimes being element-wise multiplication? I would have thought to use \odot for that? Like, when I see \otimes , I’m thinking tensor product (which could also be meaningful in that location)
@nasimrahaman7886 2 years ago
Good pointer (thx!), \odot would have made more sense.
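For readers following the thread, here is the notation written out, plus my own reading of the modulated linear (ModLin) layer discussed in the video. The projection W_c of the code vector is an assumption on my part, so details may differ from the paper.

% \odot denotes the Hadamard (element-wise) product; \otimes is usually reserved for the tensor product.
(x \odot y)_i = x_i \, y_i, \qquad
\mathrm{ModLin}(x;\, c) = W\,\bigl(x \odot (W_c\, c)\bigr) + b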
@alpers.2123 2 years ago
I have an idea, I don't know if it makes sense. Can we train a model where some part of it is forced to accept and produce binary vectors? Then convert that part to native code with bitwise operations, then fine-tune the rest. Like a learned logic circuit, which could also later be implemented on an ASIC. The model can be decomposed into 3 parts: encoder, logic unit, decoder. Discretized logic layers lose differentiability, therefore you cannot backpropagate through them, so you can only fine-tune the decoder part. The encoder can be designed sparse, because converting floating-point vectors to bitsets loses information. The goal is to produce a faster and more compact model. Could this be possible? Has it been done already?
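For what it's worth, one common workaround for the differentiability problem raised in the comment above is a straight-through estimator, which keeps a binary bottleneck trainable end to end. The sketch below is an illustration of that idea under my own assumptions (layer sizes, the "logic unit" as a plain linear layer); it is not something from the Neural Interpreters paper.

import torch
import torch.nn as nn

class STEBinarize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()            # hard 0/1 code in the forward pass
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output                # pass gradients through unchanged

class BinaryBottleneckModel(nn.Module):
    def __init__(self, in_dim, code_bits, out_dim):
        super().__init__()
        self.encoder = nn.Linear(in_dim, code_bits)
        self.logic = nn.Linear(code_bits, code_bits)   # stand-in for a learned "logic unit"
        self.decoder = nn.Linear(code_bits, out_dim)
    def forward(self, x):
        bits = STEBinarize.apply(self.encoder(x))      # binary vector at the bottleneck
        return self.decoder(torch.relu(self.logic(bits)))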
@arahir1129 2 years ago
Hi Yannic. Can I ask what software you use for writing notes on these papers?
@paxdriver 2 years ago
Are they running a second training operation on sets of outputs of early layers? Or are they running an internal typeinference(x) model underneath, using attention on the results? ... Or did I completely misunderstand this one lol?
@nasimrahaman7886 2 years ago
> "Are they running a second training operation on sets of outputs of early layers?" We're not, though this should also work. We messed around with two ways of fine-tuning this: * Funetuning only the function signatures and codes -- think of these as learnable vectors that "instruct" the model what to do with its inputs. They usually won't amount to more than a few thousand parameters, and if there's not a lot of data, this is the way to go. We tested it with as few as 128 samples. * Finetuning everything, like you would any other model. If you have a good amount of data, this is a good place to start.
@paxdriver 2 years ago
@nasimrahaman7886 thanks for clarifying for me :) I'm really impressed by the communication, you guys rock.
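A minimal sketch of the lightweight fine-tuning regime Nasim describes in the thread above: freeze every parameter except the learned function signature and code vectors. The name matching below ("signature", "code") is a hypothetical convention; the real model's parameter names will differ.

def finetune_codes_only(model):
    # freeze everything, then re-enable only the per-function vectors
    for p in model.parameters():
        p.requires_grad = False
    trainable = []
    for name, p in model.named_parameters():
        if "signature" in name or "code" in name:  # hypothetical naming convention
            p.requires_grad = True
            trainable.append(p)
    return trainable

# usage sketch: optimizer = torch.optim.Adam(finetune_codes_only(model), lr=1e-4)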
@erickmarin6147 2 years ago
What if the script is generalizable to graph neural networks with a function in every node?
@JanBlok 2 years ago
We might be watching the start of a new paradigm here 😀. Has anyone seen the code?
@ScottzPlaylists 2 months ago
Will the code be released?
@SimonJackson13 2 years ago
Ah, estimated future code line ... maybe useful to feed OoO stats to machine code optimizers. Common factors pulled earlier out of a loop, e.g. ... what are the outputs? How many errors can accumulate and be reduced to none? The effective S space for a lingo might be interesting.
@SimonJackson13 2 years ago
LOCs? AST statements? Closest valid AST?
@SimonJackson13 2 years ago
Adversarial spare dispercity? Adversarial solute S gravity inversion? Does it lock on a never list deterministic pattern match?
@SimonJackson13 2 years ago
Godelian sandbox creation exception within experimental context. Outer kernel solidity execution precontext add swing. Back inference type stability markations on type for safe extraction of axiomatization of base code.
@laurenpinschannels 2 years ago
yo I kind of like where you're going with this but I think you might need to turn your temperature down bro
@laurenpinschannels 2 years ago
It sounds like what you're saying is that you could really beef up compilers with this. That does seem plausible to me.
@444haluk 2 years ago
Yannic is missing some of his hair.
@Adhil_parammel 2 years ago
Cheap automated replication, differentiation, and integration of neural networks is all you need.
@erickmarin6147 2 years ago
Imagine throwing a problem at an AI that decides which scripts to use.
@amaniarman460 2 years ago
Great stuff Yannic, I really enjoy this series with the authors. Did you see Andrej's and Justin's paper review with the first author of DALL-E? You might find it intriguing. kzbin.info/www/bejne/hqXHoYp5bLilb5o Blessings
@vincent-uh5uo 2 years ago
2