Forgot to mention, you just stack sLSTM/mLSTM layers similarly to a transformer, like usual 😏 The sLSTM uses a transformer-like (post-up-projection) block and the mLSTM uses an SSM-like (pre-up-projection) block, as described in section 2.4 of the paper.
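To make the stacking concrete, here's a minimal sketch. Everything in it is a hypothetical stand-in, not the paper's implementation; the real block designs are in section 2.4:

```python
import torch
import torch.nn as nn

class mLSTMBlock(nn.Module):
    """Hypothetical stand-in for the pre-up-projection (SSM-like) mLSTM block."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.Linear(dim, dim)  # placeholder for the actual mLSTM cell
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mix(self.norm(x))

class sLSTMBlock(mLSTMBlock):
    """Hypothetical stand-in for the post-up-projection (transformer-like) sLSTM block."""

class xLSTMStack(nn.Module):
    """Stack residual sLSTM/mLSTM blocks the way a transformer stacks its layers."""
    def __init__(self, dim: int, depth: int, slstm_at: tuple = ()):
        super().__init__()
        self.blocks = nn.ModuleList(
            sLSTMBlock(dim) if i in slstm_at else mLSTMBlock(dim)
            for i in range(depth)
        )
    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        for block in self.blocks:
            x = x + block(x)  # pre-norm residual connection, as in transformers
        return x

# Usage: a 6-layer stack with sLSTM blocks at layers 1 and 3
model = xLSTMStack(dim=64, depth=6, slstm_at=(1, 3))
out = model(torch.randn(2, 16, 64))
```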
@acasualviewer5861 6 months ago
Is it slow to train like LSTMs and RNNs are? A major benefit of Transformers is fast, parallelized training. I would assume xLSTMs are constrained by their sequential nature.
@gabrielmongaras 6 months ago
Yep, the sLSTM should still be slow to train since its memory mixing keeps the recurrence sequential. The mLSTM is the exception: it drops memory mixing, so the paper notes it can be formulated to run fully in parallel. The sLSTM cell is complicated enough that I don't see a way to parallelize it like a transformer.
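Here's a toy picture of why a recurrence like the sLSTM's stays sequential (the `cell` here is just an illustrative stand-in, not the actual xLSTM recurrence):

```python
import torch

def recurrent_scan(cell, x, h0):
    """Toy illustration: each step needs the previous hidden state, so the
    loop over time can't be parallelized the way attention's single big
    matmul over all timesteps can."""
    h, outs = h0, []
    for t in range(x.shape[1]):   # sequential over time
        h = cell(h, x[:, t])      # h_t depends on h_{t-1}
        outs.append(h)
    return torch.stack(outs, dim=1)

# Any recurrence works for the demo; tanh keeps it simple.
cell = lambda h, x_t: torch.tanh(h + x_t)
y = recurrent_scan(cell, torch.randn(2, 16, 8), torch.zeros(2, 8))

# Attention, by contrast, touches all timesteps in one batched matmul:
#   out = softmax(q @ k.transpose(-2, -1)) @ v
```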
@-slt 6 months ago
The constant movement of the screen makes my head (and surely many others') explode. Please move around and zoom in and out a little less; it helps the viewer focus on the text and your explanation. Thanks. :)
@gabrielmongaras 6 months ago
Thanks for the feedback! I'll keep this in mind next time I'm recording.