29 - Science of Deep Learning with Vikrant Varma

373 views

AXRP

1 day ago

In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when trained on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to learning the 'real' solution in a way that does generalize. What's going on with these discoveries? Are they all they're cracked up to be, and if so, how do they work? In this episode, I talk to Vikrant Varma about his research getting to the bottom of these questions.
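For listeners who want the gist of the first method before pressing play: it is CCS (Contrast-Consistent Search), from the Burns et al. paper linked below. Below is a minimal sketch of its training objective; the random activations, dimensions, and training loop are placeholder assumptions for illustration, not the authors' code.

# Minimal sketch of the CCS objective from "Discovering latent knowledge
# in language models without supervision" (linked below). Assumes you
# already have hidden-state activations for contrast pairs, i.e. the same
# statement with "Yes" and "No" appended; here they are random stand-ins.
import torch
import torch.nn as nn

def normalize(acts):
    # Normalize each contrast class separately, so the probe cannot
    # simply read off whether the prompt ends with "Yes" or "No".
    return (acts - acts.mean(0)) / (acts.std(0) + 1e-8)

def ccs_loss(probe, acts_pos, acts_neg):
    p_pos = torch.sigmoid(probe(acts_pos)).squeeze(-1)  # P(true | "... Yes")
    p_neg = torch.sigmoid(probe(acts_neg)).squeeze(-1)  # P(true | "... No")
    consistency = (p_pos - (1 - p_neg)) ** 2   # the two should sum to 1
    confidence = torch.min(p_pos, p_neg) ** 2  # rule out p_pos = p_neg = 0.5
    return (consistency + confidence).mean()

# Hypothetical data: activations for N contrast pairs of dimension D.
N, D = 1000, 512
acts_pos = normalize(torch.randn(N, D))  # stand-in for real LM activations
acts_neg = normalize(torch.randn(N, D))

probe = nn.Linear(D, 1)  # a linear probe, trained with no labels at all
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ccs_loss(probe, acts_pos, acts_neg)
    loss.backward()
    opt.step()

One theme of the episode is that this objective is less specific to beliefs than it might look: other consistent, contrastive features can also satisfy it, which connects to the "CCS as principal component analysis" discussion at 0:53:29.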
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Topics we discuss, and timestamps:
0:00:36 - Challenges with unsupervised LLM knowledge discovery, aka contra CCS
0:00:36 - What is CCS?
0:09:54 - Consistent and contrastive features other than model beliefs
0:20:34 - Understanding the banana/shed mystery
0:41:59 - Future CCS-like approaches
0:53:29 - CCS as principal component analysis
0:56:21 - Explaining grokking through circuit efficiency
0:57:44 - Why research science of deep learning?
1:12:07 - Summary of the paper's hypothesis
1:14:05 - What are 'circuits'?
1:20:48 - The role of complexity
1:24:07 - Many kinds of circuits
1:28:10 - How circuits are learned
1:38:24 - Semi-grokking and ungrokking
1:50:53 - Generalizing the results
1:58:51 - Vikrant's research approach
2:06:36 - The DeepMind alignment team
2:09:06 - Follow-up work
The transcript: axrp.net/episo...
Vikrant's Twitter/X account: @vikrantvarma_
Main papers:
- Challenges with unsupervised LLM knowledge discovery: arxiv.org/abs/...
- Explaining grokking through circuit efficiency: arxiv.org/abs/...
Other research discussed:
- Discovering latent knowledge in language models without supervision (CCS): arxiv.org/abs/...
- Eliciting Latent Knowledge: How To Tell If Your Eyes Deceive You: docs.google.co...
- Discussion: Challenges with unsupervised LLM knowledge discovery: www.lesswrong....
- Comment thread on the banana/shed results: www.lesswrong....
- Fabien Roger, What discovering latent knowledge did and did not find: www.lesswrong....
- Scott Emmons, Contrast Pairs Drive the Performance of Contrast Consistent Search (CCS): www.lesswrong....
- Grokking: Generalizing Beyond Overfitting on Small Algorithmic Datasets: arxiv.org/abs/...
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (Hinton & van Camp, 1993): dl.acm.org/doi...
- Progress measures for grokking via mechanistic interpretability: arxiv.org/abs/...

Comments: 1

@nowithinkyouknowyourewrong8675 (4 months ago):
Love it, I've run CCS and I learned a lot. You guys had a tranquil vibe too