Mixture of Experts LLM - MoE explained in simple terms

15,041 views

Discover AI

1 day ago

Comments: 21
@HugoCatarino 1 year ago
What a great class! Very much appreciated 🙌👏👏🙏
@javiergimenezmoya86 1 year ago
A video implementation of MoE training with several switching LoRA layers would be great!
@patxigonzalez4206 1 year ago
Woah... thanks a lot for this clean and powerful explanation of these dense topics. As a representative of average people, I appreciate it so much.
@TylerLali 1 year ago
Hopefully this doesn't sound entitled, but rather expresses my gratitude for your excellent work: yesterday I did a YouTube search for MoE on this topic and saw several videos, but decided not to watch the others and instead wait for your analysis. And here I am today, and this video enters my feed automatically :) Thanks for all you do for your community!
@suleimanshehu5839 1 year ago
Please create a video on fine-tuning a MoE LLM using LoRA adapters. Can one train an individual expert LLM within a MoE such as Mixtral 8x7B?
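A minimal sketch of the mechanics behind this question: attaching a LoRA adapter to a single expert's linear projection while everything else stays frozen. The class name LoRALinear, the rank/alpha values, and the toy 4-expert block are illustrative assumptions, not the video's or Mixtral's actual modules; in practice one would usually select the expert projection modules by name through a library such as Hugging Face PEFT.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update (hypothetical helper)."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # keep the pretrained weight frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base projection plus the trainable low-rank update B @ A.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Toy "MoE block": 4 experts, adapt only expert 2.
experts = nn.ModuleList(nn.Linear(64, 64) for _ in range(4))
for p in experts.parameters():
    p.requires_grad_(False)                             # freeze every expert
experts[2] = LoRALinear(experts[2])                     # only expert 2 gets trainable params

trainable = [n for n, p in experts.named_parameters() if p.requires_grad]
print(trainable)                                        # ['2.A', '2.B']
```

Whether fine-tuning a single expert in isolation is actually useful is exactly the open question in this comment; the sketch only shows how the adapter would be attached.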
@hoangvanhao7092 1 year ago
00:02 Mixture of Experts LLM enables efficient computation and resource allocation for AI models.
02:46 Mixture of Experts LLM uses different gating functions to assign tokens to specific expert systems.
05:24 MegaBlocks addressed limitations of the classical MoE system and optimized block-sparse computations.
08:12 Mixture of Experts selects the top-k experts based on scores.
10:59 Mixture of Experts LLM increases model parameters without added computational expense.
13:33 Mixture of Experts LLM - MoE efficiently organizes the student-teacher distribution.
16:07 The block-sparse formulation ensures no token is left behind.
18:35 The Mixture of Experts system dynamically adjusts block sizes for more efficient matrix multiplication.
20:57 A mixture-of-experts layer consists of independent feed-forward experts with an intelligent gating function.
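To make the gating and top-k selection in this summary concrete, here is a minimal, illustrative PyTorch sketch of a mixture-of-experts feed-forward layer. Module names, sizes, and the number of experts are assumptions, and the per-expert Python loop is a stand-in for the block-sparse matrix multiplications that MegaBlocks uses so that no token is dropped.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Hypothetical top-k gated MoE layer: a gate scores experts, each token is
    routed to its k best experts, and their outputs are blended by the gate weights."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Gating network: one score per expert for every token.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Independent feed-forward experts.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (n_tokens, d_model)
        scores = self.gate(x)                  # (n_tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)       # renormalize over the chosen experts
        out = torch.zeros_like(x)
        # Route each token only through its top-k experts and blend the results.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 512)                  # 10 tokens, d_model = 512
print(MoEFeedForward()(tokens).shape)          # torch.Size([10, 512])
```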
@yinghaohu8784 10 months ago
In an autoregressive model, tokens are generated progressively. But when does the router work? Does it run on each forward pass, or is the routing decided at the very beginning?
@TheDoomerBlox 7 months ago
Is this where I raise the obvious question of "wouldn't a Grokked(tm) model be the perfect fit for an Expert-Picking mechanism?"
@darknessbelowth1409 1 year ago
Very nice, thank you for a great video.
@ricardocosta9336 1 year ago
Yay! 🎉🎉🎉🎉🎉 Thank you so much once again.
@YashNimavat-b3s 1 year ago
Which PDF reader are you using to read the research paper?
@LNJP13579 10 months ago
Can you please share a link to your presentation? I need the content to make my own abridged notes.
@Jason-ju7df 1 year ago
I wonder if I can get them to do RPA.
@krishanSharma.69.69f 1 year ago
I made them do SEX. It was tough but I managed.
@davidamberweatherspoon6131 1 year ago
Can you explain how to mix MoE with LoRA adapters?
@densonsmith2 1 year ago
Do you have a Patreon or other paid subscription?
@cecilsalas8721 1 year ago
🤩🤩🤩🥳🥳🥳👍
@matten_zero 1 year ago
Hello!
@PaulSchwarzer-ou9sw 1 year ago
@omaribrahim5519 1 year ago
Cool, but MoE is so foolish.
@EssentiallyAI 1 year ago
You're not Indian! 😁
Mistral 8x7B Part 1- So What is a Mixture of Experts Model?
12:33
Sam Witteveen
43K views
Mind Evolution: Deeper Thinking at Inference (by Google)
19:54
NEW Transformer2: Self Adaptive PEFT Expert LLMs in TTA
36:52
Discover AI
3.7K views
Finally: Grokking Solved - It's Not What You Think
27:02
Discover AI
16K views
Understanding Mixture of Experts
28:01
Trelis Research
10K views
Attention in transformers, step-by-step | DL6
26:10
3Blue1Brown
2.1M views
host ALL your AI locally
24:20
NetworkChuck
1.6M views
What is Mixture of Experts?
7:58
IBM Technology
12K views
Transformers (how LLMs work) explained visually | DL5
27:14
3Blue1Brown
4.4M views
DeepSeek R1 vs o1: AI EXPLAINS Autonomy of Experts (a better MoE)
35:51