Stanford CS25: V4 I Demystifying Mixtral of Experts

4,729 views

Stanford Online

April 25, 2024
Speaker: Albert Jiang, Mistral AI / University of Cambridge
Demystifying Mixtral of Experts
In this talk I will introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. I will go into the architectural details and analyse the expert routing decisions made by the model.
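The per-token routing and the 47B/13B parameter arithmetic described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Mistral AI's implementation. The gating rule takes a softmax over only the two largest router logits, as described in the Mixtral paper. The function names, the list-based math, and the toy expert callables are hypothetical. The parameter count assumes the Mixtral 8x7B hyperparameters reported in the Mixtral paper: model dimension 4096, 32 layers, SwiGLU hidden size 14336, 32 query heads and 8 key/value heads of size 128, vocabulary 32000, and untied input and output embeddings.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def matvec(x: Sequence[float], w: Sequence[Sequence[float]]) -> Vector:
    """Compute x @ w for a row vector x (length d) and a d x n matrix w."""
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]


def top2_moe(x: Sequence[float],
             router_w: Sequence[Sequence[float]],
             experts: Sequence[Callable[[Sequence[float]], Vector]],
             k: int = 2) -> Tuple[Vector, List[int], List[float]]:
    """Route one token's hidden state through its k highest-scoring experts.

    router_w: hypothetical d x n_experts gating matrix.
    experts:  hypothetical callables standing in for the feedforward blocks.
    """
    logits = matvec(x, router_w)
    # Indices of the k largest router logits (the selected experts).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only; unselected experts get weight 0.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    gates = [e / total for e in exps]
    # Weighted sum of the selected experts' outputs.
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top, gates


def mixtral_param_counts(dim: int = 4096, n_layers: int = 32,
                         hidden_dim: int = 14336, n_heads: int = 32,
                         n_kv_heads: int = 8, head_dim: int = 128,
                         vocab: int = 32000, n_experts: int = 8,
                         top_k: int = 2) -> Tuple[int, int]:
    """Return (total, active) parameter counts under assumed hyperparameters."""
    expert = 3 * dim * hidden_dim                      # SwiGLU: w1, w2, w3
    attn = (2 * dim * n_heads * head_dim               # wq, wo
            + 2 * dim * n_kv_heads * head_dim)         # wk, wv (grouped-query)
    router = dim * n_experts                           # gating matrix per layer
    norms = 2 * dim                                    # two RMSNorm weights per layer
    shared = attn + router + norms                     # used by every token
    embed = 2 * vocab * dim                            # input embedding + output head
    final_norm = dim
    total = n_layers * (shared + n_experts * expert) + embed + final_norm
    active = n_layers * (shared + top_k * expert) + embed + final_norm
    return total, active


if __name__ == "__main__":
    total, active = mixtral_param_counts()
    # prints: total ≈ 46.7B, active ≈ 12.9B
    print(f"total ≈ {total / 1e9:.1f}B, active ≈ {active / 1e9:.1f}B")
```

The two counts differ because only the expert feedforward blocks are replicated eight times. Attention, embeddings, and the router are shared, so each token pays for two experts rather than eight. This sketch counts the embedding table as "active", which reproduces the rounded 47B and 13B figures. Treat that convention as an assumption.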
About the speaker:
Albert Jiang is an AI scientist at Mistral AI, and a final-year PhD student at the computer science department of Cambridge University. He works on language model pretraining and reasoning at Mistral AI, and language models for mathematics at Cambridge.
More about the course can be found here: web.stanford.edu/class/cs25/
View the entire CS25 Transformers United playlist: Stanford CS25 - Transf...

Comments: 4
@marknuggets (29 days ago)
Cool format, Stanford is quickly becoming my favorite blogger lol
@user-uy4rx3hs3x (23 days ago)
Where can I get the slides?
@acoustic_boii (29 days ago)
Dear Stanford Online, I recently completed the product management course from Stanford Online, but I haven't received the certificate. Please help me: how will I get the certificate?
@Ethan_here230 (28 days ago)
Wait, you will get it. - Ethan from Stanford