Stanford CS25: V4 I Demystifying Mixtral of Experts

7,121 views

Stanford Online


April 25, 2024
Speaker: Albert Jiang, Mistral AI / University of Cambridge
Demystifying Mixtral of Experts
In this talk I will introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. I will go into the architectural details and analyse the expert routing decisions made by the model.
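For readers who want to map this description onto code, below is a minimal sketch of such a top-2 routed feed-forward block in PyTorch. The module and parameter names (FeedForward, SparseMoeBlock, gate) and the SwiGLU expert are illustrative assumptions, not Mistral's reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForward(nn.Module):
    """One expert: a SwiGLU feed-forward block, as used in Mistral 7B."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class SparseMoeBlock(nn.Module):
    """Routes each token to the top-2 of 8 experts and mixes their outputs."""
    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router network
        self.experts = nn.ModuleList([FeedForward(dim, hidden_dim) for _ in range(num_experts)])

    def forward(self, x):                          # x: (num_tokens, dim)
        logits = self.gate(x)                      # (num_tokens, num_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalise over the 2 chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e              # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

Only the two selected experts run per token, which is how the model keeps roughly 13B active parameters per forward pass out of the 47B total.
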
About the speaker:
Albert Jiang is an AI scientist at Mistral AI, and a final-year PhD student at the computer science department of Cambridge University. He works on language model pretraining and reasoning at Mistral AI, and language models for mathematics at Cambridge.
More about the course can be found here: web.stanford.e...
View the entire CS25 Transformers United playlist: • Stanford CS25 - Transf...

Comments: 6
@marknuggets · 4 months ago
Cool format, Stanford is quickly becoming my favorite blogger lol
@crwhhx · 1 month ago
At 6:22, “xq_LQH = wq(x_LD).view(L, N, H)” should be “xq_LQH = wq(x_LD).view(L, Q, H)”, right?
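(For context, the snippet uses a shape-suffix naming convention. Assuming L = sequence length, D = model dimension, Q = number of query heads, and H = head dimension, a minimal reconstruction is shown below; the sizes and names are inferred from the comment, not taken from the slides.)

import torch
import torch.nn as nn

L, D, Q, H = 128, 4096, 32, 128           # assumed: seq length, model dim, query heads, head dim
wq = nn.Linear(D, Q * H, bias=False)      # query projection
x_LD = torch.randn(L, D)                  # hidden states, shape (L, D)
xq_LQH = wq(x_LD).view(L, Q, H)           # queries, shape (L, Q, H), matching the _LQH suffix
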
@何孟飞 · 4 months ago
Where can I get the slides?
@acoustic_boii · 4 months ago
Dear Stanford Online, I recently completed a product management course from Stanford Online but haven't received the certificate. Please help: how will I get the certificate?
@Ethan_here230 · 4 months ago
Wait, you will get it. - Ethan from Stanford
@gemini_537 · 2 months ago
Gemini 1.5 Pro: The video is about demystifying Mixture of Experts (MoE) and Sparse Mixture of Experts (SMoE) models. The speaker, Albert Jiang, a PhD student at the University of Cambridge and a scientist at Mistral AI, first introduces the dense Transformer architecture and then dives into the details of SMoEs. He explains that SMoEs can be more efficient than standard Transformers by using a gating network to route tokens to a subset of experts, which is useful for training very large models with billions of parameters. Key points from the talk:
* Mixture of Experts (MoE) is a neural network architecture that uses a gating network to route tokens to a subset of experts.
* Sparse Mixture of Experts (SMoE) is a type of MoE that can be more efficient than standard Transformers.
* SMoEs route each token to only a subset of experts, which can be more efficient than running a single large dense model.
* SMoEs are well suited for training very large models with billions of parameters.
The speaker also discusses some of the challenges of interpreting SMoEs and the potential for future research in this area. Overall, the talk provides a good introduction to SMoEs and their benefits for training large language models.