Sparsely-Gated Mixture-of-Experts Paper Review - 18 March, 2022

2,429 views

Numenta

Comments
@yarrowflower 2 years ago
Thank you so much for uploading this! I am champing at the bit to see some form of sparsity implemented in large neural nets running on GPUs or NPUs. (Especially in commercial settings!)
@hyunsunggo855 2 years ago
19:08 The order matters a bit. Doing kWTA afterwards (i.e. after the softmax) would result in the probabilities not summing to 1.0.
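[Editorial sketch, not from the talk: a minimal PyTorch illustration of this ordering point, with made-up gating dimensions. Keeping the top-k probabilities after the softmax leaves a total below 1.0, whereas the paper's ordering (top-k on the logits, then softmax over the survivors) yields a proper distribution.]

```python
import torch

torch.manual_seed(0)
logits = torch.randn(8)  # hypothetical gating logits for 8 experts
k = 2

# Order A: softmax first, then keep the top-k probabilities.
probs = torch.softmax(logits, dim=-1)
kept = torch.topk(probs, k).values
print(kept.sum().item())  # < 1.0: the surviving probabilities no longer sum to 1

# Order B (the paper's ordering): keep the top-k logits, mask the rest
# to -inf, then softmax over the survivors.
idx = torch.topk(logits, k).indices
masked = torch.full_like(logits, float("-inf"))
masked[idx] = logits[idx]
gates = torch.softmax(masked, dim=-1)
print(gates.sum().item())  # 1.0 by construction
```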
@stardustsong1680 2 years ago
It's been quite a long time since I've seen an update from this channel.
@MattGorbet 2 years ago
At 58:20 to about 58:39 I was surprised everyone just moved on from the point analogizing cortical columns to experts. Maybe I misunderstood something, but couldn't the operation of the gating function - which is applied to huge batches of unknown, raw, generic data (think of it as all possible data in the world, from the point of view of the system) and decides which experts should handle it - be analogous to the actual physical connections in the brain that determine which 'experts', i.e. cortical columns, deal with or ignore specific inputs? So when you say "different cortical columns will process different parts of the visual field" (and similarly, I imagine input via touch is processed by different 'expert' cortical columns than input via sight)... is this biological 'filtering' not analogous in some way to the purpose of the gating function the authors are proposing? Every part of our brain does not process every single possible 'bit' of information from the world; in the brain's case the routing is done via biology and specific wiring, rather than an algorithm in software. But I think the parallel that was pretty much dismissed is still valid. Perhaps I've misunderstood - I'm still quite new to all this. Thanks so much for making these discussions public, it is really interesting.
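[Editorial sketch of the "gating function decides which experts handle which inputs" idea discussed above. This is an illustration only, not the authors' code: the layer sizes and names are made up, and it uses plain top-1 routing rather than the paper's noisy top-k gating.]

```python
import torch
import torch.nn as nn

d_model, n_experts = 16, 4
gate = nn.Linear(d_model, n_experts)  # gating network: scores each input per expert
experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])

x = torch.randn(32, d_model)             # a batch of generic inputs
scores = torch.softmax(gate(x), dim=-1)  # per-input expert probabilities
chosen = scores.argmax(dim=-1)           # routing decision: one expert per input

out = torch.zeros_like(x)
for e, expert in enumerate(experts):
    sel = chosen == e                    # inputs routed to expert e
    if sel.any():
        out[sel] = expert(x[sel]) * scores[sel][:, e:e + 1]  # scale by gate weight
```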
@subutaiahmad8208 2 years ago
It's a good question, and I can see why people might think we moved on too quickly there. The main reason we moved on quickly is that cortical columns embody a far more intricate and complex structure than the layer structure of the sparse-mixture paper. Each cortical column does process a different subset of the sensory input. Two visual cortical columns will process different subsets of the incoming image. More importantly, cortical columns have a diverse motif of recurrent connections, receive lots of feedback from other areas, and incorporate movement in complex ways. To get a sense of some aspects of this, you can look at our paper [1] below. The mapping-to-biology section in [1] details some of this with references to the neuroscience literature. Perhaps there are some really, really high-level connections between the two ideas, but to me it's so high-level that it quickly becomes meaningless. [1] A Theory of How Columns in the Neocortex Enable Learning the Structure of the World. www.frontiersin.org/articles/10.3389/fncir.2017.00081/full
@MattGorbet 2 years ago
@@subutaiahmad8208 Thanks for the reply. I will read this paper. My comment was indeed making a high-level, maybe even conceptual, link between filtering input for experts and the way the various inputs we perceive are 'pre-filtered' by our diverse senses. I can see this being meaningless in the context of optimizing the sparse-mixture algorithm... For me it was more the aha of realizing that not all input is created equal, and unlike AI systems we have multiple pre-filtered data streams automatically going to the 'right' experts via our varied sense organs. Thanks!