1 Million Tiny Experts in an AI? Fine-Grained MoE Explained

44,841 views

bycloud

1 day ago

Comments: 165
@bycloudAI 2 months ago
To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/bycloud/. You'll also get 20% off an annual premium subscription! Like this comment if you wanna see more MoE-related content, I have quite a good list for a video ;)
@erobusblack4856 2 months ago
You should do a video on virtual humans and cognitive AI. Look at all the non-player character technology we have in Red Dead Redemption and The Sims. Throw a chatbot into one of those and we have a great virtual human.
@PankajDoharey 2 months ago
Thanks for linking to all the papers in the description.
@pro100gameryt8 2 months ago
Imagine assembling 1 million PhD students to discuss someone's request like "write a poem about cooking eggs with C++". That's MoE irl.
@MrPicklesAndTea 2 months ago
i'm tellin chatgpt this now.
@tommasocanova4547 2 months ago
Enjoy:

In a kitchen lit by screens,
Where code and cuisine intertwine,
A programmer dreams of breakfast scenes,
With a syntax so divine.

Int main() begins the day,
With ingredients lined up neat.
Eggs and spices on display,
Ready for a code-gourmet feat.

int eggs = 2; // Declare the count,
Double click on the pan.
Heat it up, and don't discount,
Precision's the plan.

std::cout
@ElaraArale 2 months ago
hahahahaha LMAO
@igorsawicki4905 2 months ago
AI: Reasonable request, sir
@zipengli4243 2 months ago
And MoME is getting 1 million 5th graders to teach a baby to PhD level, only on how to write a poem about cooking eggs with C++
@GeoMeridium 2 months ago
It's crazy how Meta's 8B-parameter Llama 3 model has nearly the same performance as the original GPT-4 with 1.8T parameters. That's a 225x reduction in parameter count in just 2 years.
@gemstone7818 2 months ago
to some extent this seems closer to how brains work
@tushargupta9428 2 months ago
neurons
@redthunder6183 2 months ago
Yeah, kind of like how spiking networks work, but more discrete/blocky and less efficient. I think this concept should be applied to the fundamental MLP, so you can increase model performance without decreasing speed or RAM usage. The only sacrifice being storage, which is easily scalable. IMO this is the future.
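A minimal sketch of that idea, assuming a PyTorch-style setup (the class name and sizes are illustrative, not from the video): the feed-forward block is split into many tiny experts, and a learned router activates only the top-k of them per token, so total capacity grows while per-token compute stays flat.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedMoEFFN(nn.Module):
    """A feed-forward block split into many tiny experts; only top_k run per token."""
    def __init__(self, d_model=512, d_expert=64, n_experts=256, top_k=8):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.ReLU(),
                          nn.Linear(d_expert, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # the routing is itself trained
        self.top_k = top_k

    def forward(self, x):                    # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = []
        for t in range(x.size(0)):           # naive per-token loop for clarity, not speed
            out.append(sum(w * self.experts[int(e)](x[t])
                           for w, e in zip(weights[t], idx[t])))
        return torch.stack(out)

Scaling n_experts up (and d_expert down) adds capacity without adding per-token compute: each token still pays for only top_k tiny MLPs, it just has more specialists to choose from. The cost, as the comment says, is holding every expert in memory or storage.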
@reinerzer5870 2 months ago
Jeff Hawkins approves this message
@johndoe-j7z 2 months ago
I think this is how almost any informational system works. From molecules to galaxies, there are specialized units that use and process information individually in the system. An agentic expert approach was a long time coming and is certainly the future of AI. Even individual ants have specialized jobs in the colony.
@tempname8263 2 months ago
@johndoe-j7z That's how perceptrons worked right from the start
@sorakagodess 2 months ago
The only thing in my mind is "MoE moe kyuuuuun!!!"
@JuuzouRCS 2 months ago
Intentional naming fr.
@AkysChannel 2 months ago
These videos' format is GOLD 🏆 such specific and nerdy topics produced as memes 😄
@randomlettersqzkebkw 2 months ago
i see what you did there with "catastrophic forgetting" lmao 🤣
@Askejm 2 months ago
troll emoji
@Quantum_Nebula 2 months ago
Now I really am excited for an 800B model with fine-grained MoE to surface that I can run on basically any device.
@FuZZbaLLbee 2 months ago
You would still need a lot of storage though, but that is easier than downloading VRAM 😋
@KCM25NJL 2 months ago
In a very real sense, the MoME concept is similar to diffusion networks. On their own, the tiny expert units are but grains of noise in an ocean of noise... and the routing itself is the thing being trained. Whether or not it's more efficient than having a monolithic neural net with simpler computation units (neurons)... I dunno. I suspect, like most things ML, there is probably a point of diminishing returns.
@lazyalpaca7 2 months ago
3:37 wasn't it just yesterday that they released their model 😭
@cdkw2 2 months ago
I watch you so that I feel smart, it really works!
@simeonnnnn 2 months ago
Damn.. You blew my mind on the 1 million experts and forever learning thing
@michaelbondarenko4650 1 month ago
Idk if this was intended just as entertainment, but I used it as education. I needed to understand MoE/MMoE on a high level for my research and this video totally helped me. It will be easier to dive deeper into one of the papers now.
@Saphir__ 2 months ago
I watch your videos yet I have no idea what you are explaining 99% of the time. 🙃
@bycloudAI 2 months ago
I will try better next time 😭
@BeansTonight 2 months ago
@bycloudAI Personally I watch your content because you elaborate on academic papers and their relevancy very well. Do hope you continue with content like this. But I can see something like a Fireship-style code report for LLMs being digestible.
@johndoe-j7z 2 months ago
@bycloudAI I liked the video, but to their point, it might help to give a brief overview of what things are, i.e. parameters, feed-forward, etc., the exact same way you briefly explained what hybridity and redundancy are. This is a good video if you're already familiar with LLMs and how they work, but it can probably be pretty confusing if you aren't.
@farechildd 2 months ago
Thank u for linking the papers in the description ❤
@-mwolf 2 months ago
I'm telling you: Just do it like the brain. Have every expert/node be a router, choosing who to send to.
@-mwolf 2 months ago
And, have every node be an RL agent.
@Alice_Fumo 2 months ago
Today I saw a video about the paper "Exponentially Faster Language Modelling" and I feel like the approach is just better than MoE, and I wonder why more work hasn't been done on top of it. (although I think it's possible that's how GPT-4o mini was made, but who knows)
@soraygoularssm8669 2 months ago
Actually a really cool idea, I liked the DeepSeekMoE version too, it's so clever
@redthunder6183 2 months ago
I would love a model with the performance of an 8B model, with practical performance like GPT-3.5, but with far fewer active parameters so it can run on anything super lightweight.
@4.0.4 2 months ago
Current 8B models beat GPT-3.5 on most metrics, we've come a long way.
@redthunder6183 2 months ago
@4.0.4 yeah, but metrics are not everything, and from my experience GPT-3.5 still beats Llama 3 8B (or at least 8B quantized) in terms of interpolation/generalization/flexibility, meaning that while it can mess up on difficult, specific, or confusing tasks, it doesn't get overly lost/confused. Metrics are good at simple, well-defined one-shot questions, which I'd agree it is better at.
@4.0.4 2 months ago
@redthunder6183 remember not to run 8B at q4 (the default in ollama, for example, but BAD; use q8)
@4.0.4 2 months ago
@redthunder6183 true, but make sure you're using an 8-bit quant, not 4-bit; it matters for those small LLMs
@mirek190 2 months ago
llama 3 8b? That model is so outdated already... who is even using that ancient model...
@pathaleyguitar9763 2 months ago
Was hoping someone would make a video on this! Thank you! Would love to see you cover Google's new Diffusion Augmented Agents paper.
@larssy68 2 months ago
I Like Your Funny Words, Magic Man
@tiagotiagot 2 months ago
How far are we from just having a virtual block of solid computronium, with inference results simply being the exit points of virtual Lichtenberg figures forming through it, while most of the volume of the block remains dark?
@electroflame6188 1 month ago
it's about the distance between you and the inside of your skull
@akkilla5166 2 months ago
Thank you. I think I understand the impact of MoE.
@hodlgap348 2 months ago
What is the source of your 3D transformer layer demonstration???? plz tell me
@NIkolla13 2 months ago
Mixture of a million experts just sounds like a sarcastic description of Reddit
@PotatoKaboom 2 months ago
hey, where are the 3D visualisations of the transformer blocks from?
@j.d.4697 2 months ago
I have no idea what you just said, but I'm glad they didn't just stubbornly stick to increasing training data and nothing else, like everyone seemed to assume they would. 🙂
@RevanthMatha 2 months ago
I lost track at 8:24
@npc4416 2 months ago
my go-to channel to understand AI
@ChristophBackhaus 2 months ago
1991... We are standing on the shoulders of giants.
@Limofeus 2 months ago
I'd imagine in a month someone will come up with an MoE responsible for choosing the best MoE to choose the best MoE out of billions of experts
@keri_gg 2 months ago
What resource is this at 2:01? Seems useful for teaching
@hightidesed 2 months ago
Can you maybe make a video explaining how Llama 3.1 8B is able to have a 128k context window while still fitting in an average computer's RAM?
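Part of the answer is likely grouped-query attention, which shrinks the KV cache about 4x. A back-of-the-envelope sketch, assuming Llama 3.1 8B's published config (32 layers, 8 KV heads, head dim 128); the arithmetic below is illustrative, not from the video:

# Rough KV-cache sizing, assuming Llama 3.1 8B's public config:
# 32 layers, 8 KV heads (grouped-query attention), head_dim 128, fp16 cache.
layers, kv_heads, head_dim = 32, 8, 128
ctx, bytes_fp16 = 128 * 1024, 2
kv_bytes = 2 * layers * kv_heads * head_dim * ctx * bytes_fp16  # K and V
print(kv_bytes / 2**30)  # 16.0 GiB at the full 128k window
# With 32 KV heads (no GQA) this would be 64 GiB. In practice the cache
# also grows with the actual prompt rather than being allocated for the
# full window up front, and q4/q8 quantization keeps the 8B weights
# themselves around 4.5-8.5 GB.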
@sabarinathyt 2 months ago
00:01 Exploring the concept of fine-grained MoE in AI expertise
01:35 Mixtral has unique feed-forward network blocks in its architecture
03:11 The sparse MoE method has historical roots and gained popularity through successful models
04:46 Introducing the fine-grained MoE method for AI model training
06:16 Increasing experts can enhance accuracy and knowledge acquisition
07:52 Efficient expert retrieval mechanism using the PEER layer technique
09:29 A large number of experts enables lifelong learning and addresses catastrophic forgetting
11:01 Brilliant offers interactive lessons for personal and professional growth
Crafted by Merlin AI.
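For the 07:52 chapter, a minimal sketch of product-key retrieval in the spirit of the PEER paper ("Mixture of A Million Experts"); the sizes are illustrative and the sub-keys would be learned parameters in the real layer. The trick: indexing N = n^2 experts by pairs of half-keys means top-k retrieval scores 2n keys instead of N.

import torch
import torch.nn.functional as F

n, d_query, k = 1024, 512, 16              # n*n ≈ 1M experts
sub_keys_1 = torch.randn(n, d_query // 2)  # learned in the real layer
sub_keys_2 = torch.randn(n, d_query // 2)

q = torch.randn(d_query)                   # router query for one token
q1, q2 = q.split(d_query // 2)
s1, i1 = (sub_keys_1 @ q1).topk(k)         # best k first-half keys
s2, i2 = (sub_keys_2 @ q2).topk(k)         # best k second-half keys

# The global top-k over all n*n sums s1[i] + s2[j] is provably contained
# in this k-by-k candidate grid, so the full million is never scanned.
grid = s1[:, None] + s2[None, :]
scores, flat = grid.flatten().topk(k)
expert_ids = i1[flat // k] * n + i2[flat % k]  # indices into the ~1M experts
weights = F.softmax(scores, dim=-1)            # mixing weights for the top-k

In PEER each retrieved expert is then just a single hidden neuron (one down-projection and one up-projection vector), which is what makes a million of them affordable.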
@anren7445 2 months ago
Where did you get the clips of the attention mechanism visualization from?
@shApYT 2 months ago
Yo dog, I heard you liked AI, so we put an AI inside your AI which has an AI in the AI which can AI another AI so that you can AI while you AI.
@Zonca2 2 months ago
Ngl, I wish we got more videos about video generators making anime waifus like in the old days, but it seems like development on that front is slowing down at the moment. Hopefully you'll cover any new breakthroughs in the future.
@renanmonteirobarbosa8129 2 months ago
Damn, you're finally catching up. You should try NeMo and Megatron-LM, they have the best MoE framework.
@Words-. 2 months ago
4:13 nice editing here 🤣
@marcombo01 2 months ago
What is the 3D animation around 1:45?
@P1XeLIsNotALittleSquare 2 months ago
yea, want to know this too
@holycow6935 2 months ago
Blender
@SweetHyunho 2 months ago
1:05 Brilliant pays youtubers $20,000-50,000 per sponsored video!?
@Napert 2 months ago
If 13B is ~8 GB (q4), then why does ollama load the entire 47B (26 GB) model into memory?
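Presumably because ~13B is Mixtral's active parameter count per token, while the routing picks different experts at every layer and for every token, so all 47B must stay resident (unless experts are streamed from disk, which is slow). A rough sketch of the arithmetic, assuming ~4.5 effective bits per parameter for a q4 GGUF:

total_params, active_per_token = 47e9, 13e9
bits_per_param = 4.5  # rough average for a q4 GGUF quant
print(total_params * bits_per_param / 8 / 1e9)      # ≈ 26 GB kept resident
print(active_per_token * bits_per_param / 8 / 1e9)  # ≈ 7 GB actually read per token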
@norlesh 2 months ago
What tool was used for the Transformer visualization starting at 2:01?
@warsin8641 2 months ago
I love these rabbit holes!
@Filup 2 months ago
I did a semester of ML in the first half of this year, and I don't understand half of what you post lmao. Do you have any recommended resources to learn from? It is very hard to learn.
@noiJadisCailleach 2 months ago
So if these Millions of Experts are cute... Should we call them... Moe MoE?
@CristianGarcia 2 months ago
Thanks! Incredibly useful to keep up.
@MrJul12 2 months ago
Can you cover DeepMind's recent breakthrough on winning the math olympiad? Does that mean RL is the way forward when it comes to reasoning? Because as of right now, as far as I know, LLMs can't actually 'reason', they are just guessing the next token, but reasoning does not work like that.
@jorjiang1 1 month ago
how is the visualization at 2:01 made
@marshallodom1388 2 months ago
You lost me when that guy pointed at the gravesite of his brother
@Ryu-ix8qs 2 months ago
Great video once again
@kendingel8941 2 months ago
YES!!! NEW BYCLOUD VIDEO!!!
@zergosssss 2 months ago
5k views after 3h is a shame, you deserve much more. Go go go, algorithm!
@vinniepeterss 2 months ago
great video!
@cvs2fan 2 months ago
0:42 Undrinkable water, my favorite :v
@setop123 2 months ago
We might be onto something here... 👀
@RedOneM 2 months ago
Seems as if the greatest optimisations for practical AI tech are dynamic mechanisms. Lifelong memory plus continuous learning would become game changers in the space. At this rate humanity will be able to leave behind machines which can recall our biological era. At least something will be able to carry on our legacy for hundreds of thousands of years.
@pedrogames443 2 months ago
Bro is good
@ickorling7328 2 months ago
Bro, did you read about Lory? It merges models with soft merging, building on several papers. Lory is new paint on a method developed for vision AI, making soft merging possible for LLMs. ❤
@ickorling7328 2 months ago
What's really key about Lory is backpropagation to update its own weights; it's fine-tuning itself at inference. It's also compatible with Transformers, Mamba, or Mamba-2. In addition, it looks like Test-Time Training could be used with all these methods for even more context awareness.
@myta1837 6 days ago
Bot
@just..someone 2 months ago
it should be MMoE ... massive mixture of experts XD
@maxpiau4004 2 months ago
wow, top quality video
@nanow1990 2 months ago
PEER doesn't scale; I've tried multiple times
@rishipandey6219 2 months ago
i didn't understand anything but it sounded cool
@npc4416 2 months ago
meanwhile meta having no moe
@大支爺 2 months ago
The best base language models (multilingual) + LoRAs are enough.
@narpwa 2 months ago
what about 1T experts
@cwhyber5631 2 months ago
Yes we need more MOM-eis 💀💀
@realtimestatic 2 months ago
The thing about lifelong learning really reminds me of our human brains. Basically, for every different thought or key combination it sounds like it's building a separate new model with all the required experts for said task. So basically like all the relevant neurons we trained working on one thought to solve it, with the possibility of changing and adding new neurons. I can't see it going well if we keep increasing the number of experts forever though, as the expert picking will become more and more fragmented. I think being able to forget certain things would probably be useful too. I'm no scientist but I really do wonder how close this comes to the actual way our brain works.
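A minimal sketch of the lifelong-learning idea the comment is reacting to, assuming (as the video describes) that catastrophic forgetting is mitigated by freezing existing experts and training only newly added ones plus the router; this illustrates the concept, not the paper's actual code:

import torch.nn as nn

def grow_expert_pool(experts: nn.ModuleList, n_new: int,
                     d_model: int = 512, d_expert: int = 64) -> nn.ModuleList:
    # Freeze what is already learned so new data cannot overwrite it.
    for old in experts:
        for p in old.parameters():
            p.requires_grad = False
    # Append fresh, trainable experts for the new data or tasks.
    for _ in range(n_new):
        experts.append(nn.Sequential(nn.Linear(d_model, d_expert), nn.ReLU(),
                                     nn.Linear(d_expert, d_model)))
    # Note: the router's key table must also grow by n_new entries so the
    # new experts are reachable; omitted here for brevity.
    return experts

The fragmentation worry then becomes a routing problem: with ever more experts, retrieval has to stay sharp enough to keep finding the right specialists.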
@ricardocosta9336 2 months ago
Dude! Ty ❤
@koktszfung 2 months ago
more like a mixture of a million toddlers
@revimfadli4666 2 months ago
Shared expert isolation seems to be doing something similar to the value output in duelling networks: collecting the gradients for shared information so the other subnets only need to account for the small tweaks. This means the shared information is learned faster, which in turn speeds up the learning of the tweaks.
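A minimal sketch of that split in the spirit of DeepSeekMoE's shared expert isolation (class name and sizes are illustrative): the shared expert runs for every token, so it soaks up the gradient for common knowledge, while the routed experts only learn residual tweaks.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPlusRoutedFFN(nn.Module):
    def __init__(self, d_model=512, d_expert=64, n_routed=64, top_k=4):
        super().__init__()
        def make_ffn():
            return nn.Sequential(nn.Linear(d_model, d_expert), nn.ReLU(),
                                 nn.Linear(d_expert, d_model))
        self.shared = make_ffn()   # always on: every token's gradient flows here
        self.routed = nn.ModuleList(make_ffn() for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)
        self.top_k = top_k

    def forward(self, x):          # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        tweaks = [sum(w * self.routed[int(e)](x[t])     # residual corrections
                      for w, e in zip(weights[t], idx[t]))
                  for t in range(x.size(0))]            # naive loop for clarity
        return self.shared(x) + torch.stack(tweaks)

Because the shared path sees every token, the common signal is learned quickly, and the routed experts are freed from redundantly re-learning it, which matches the comment's gradient argument.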
@thedevo01 2 months ago
Your thumbnails are a bit too similar to Fireship's
@Waffle_6 2 months ago
also the entire composition of his videos, a little more than just taking inspiration lol
@MasamuneX 2 months ago
wait till they use genetic programming with Monte Carlo tree search and UTP and other stuff on the router
@VincentKun 2 months ago
1 millions beer
@imerence6290 2 months ago
MoME? Nah. MOMMY ✅🤤
@jondo7680 2 months ago
I'm a big fan of
@picklenickil 2 months ago
😂😂😂 as a behavioral scientist... I think this one is going straight to the crapper... mark my words. 😂😂😂
@borb5353 2 months ago
i was like "schizophrenic AI" but then they went further... anyway, finally they are optimizing instead of making them bigger
@Summanis 2 months ago
Eat your heart out, Limitless; we're making AI smarter by having them use less of their "brain" at a time
@Words-. 2 months ago
That million-expert strategy sounds super cool. I'm not too knowledgeable, but it does seem to allow for a more liquid neural network by using the attention mechanism to literally pick the neurons to be used. I feel like this will be the future of NNs.
@blockshift758 2 months ago
I'll call it Moe (moe, eh) instead of em-oh-ee
@driedpotatoes 2 months ago
too many cooks 🎶
@crisolivares7847 2 months ago
fireship clone
@hglbrg 2 months ago
Oh yeah, "accidentally" added something to a graph they intended to show. Not just building hype to inflate the bubble of nothing that is this whole business?
@mishl-dev 2 months ago
why are all the comments before this one bots???
@OnTheThirdDay 2 months ago
It's possible that it's because YouTube shadow-banned all the real comments.
@cesarsantos854 2 months ago
@OnTheThirdDay But they can't ban bots...
@OnTheThirdDay 2 months ago
@cesarsantos854 I don't know why bots (and I mean obvious bots) do not always get banned but half of the comments that I write out myself do.
@IllD. 2 months ago
Don't see any bots 3 hours after this comment. Gj YouTube 👍
@OwenIngraham 2 months ago
always bet on owens
@raunakchhatwal5350 2 months ago
Honestly I think your old MoE video was better.
@OnTheThirdDay 2 months ago
I agree. Definitely more understandable, and this one would be harder to follow without seeing that first.
@veyselbatmaz2123 2 months ago
Good news: Digitalism is killing capitalism. A novel perspective, first in the world! Where is capitalism going? Digitalism vs. Capitalism: The New Ecumenical World Order: The Dimensions of State in Digitalism by Veysel Batmaz is available for sale on the Internet.
@saltyBANDIT 2 months ago
Temu fireship… oh, I'll watch it tho.
@OnTheThirdDay 2 months ago
This channel seems to go into more detail and is more AI-focused.
@UnemployMan396-xd7ov 2 months ago
I knew it, your content so mid bro has to redeem it
@unimposings 2 months ago
Dude Wake Up, AI is just a Stupid Buzzword! There is no AI.
@Waffle_6 2 months ago
I've made my own transformer model before; as shitty as it was, it sorta worked. I agree that the term "AI" is misleading as it's not sentient or anything like that. It's just a really fancy autocomplete generator that understands surprisingly abstract and complex connections, relations, and context. But these models are real and aren't just a million Indians typing your essay for you; you can download models like Llama to try it out locally.
@bluehorizon9547 2 months ago
Using MoE is an admission of failure. It means that they are unable to make a "smarter" model and have to rely on arbitrary gimmicks.
@zrakonthekrakon494 2 months ago
Not really, they are testing whether it makes models smarter without having to do much more work
@a_soulspark 2 months ago
I don't see it as a problem. If you think about it, all things in machine learning are just arbitrary gimmicks that happen to work out.
@bluehorizon9547 2 months ago
@a_soulspark As a human, if you understand N new disciplines you become N^2 more powerful, because you can apply ideas from one field to any other. This is why you want a monolith, not MoE. They chose MoE because they ran into the wall; they can't improve the fundamentals, so they have to use ad-hoc measures just to boost the numbers.
@francisco444 2 months ago
RLHF seemed gimmicky, but it worked. MoE might seem gimmicky, but it works. Multimodality might seem gimmicky, but it works.
@bluehorizon9547 2 months ago
@zrakonthekrakon494 Nobody would even bother with MoE if they hadn't run into the wall. They did.
@reaperzreaperz2412 2 months ago
Tf are you talking about
@大支爺 2 months ago
It's useless and wastes a lot of resources.
@jvf890 2 months ago
MOE onichan
@BackTiVi 2 months ago
We needed someone to say this, so thank you for sacrificing your dignity for us.
@a_soulspark 2 months ago
get ready to call your MoME as well now
@DhrubaPatra1611 2 months ago
This channel is a nice copy of fireship
@mishl-dev 2 months ago
bro fell off
@635574 2 months ago
Because there are no bot comments after 24m?