What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory, a theory of Bayesian statistics broad enough in scope to encompass deep neural networks, may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Topics we discuss, and timestamps:
0:00:26 - What is singular learning theory?
0:16:00 - Phase transitions
0:35:12 - Estimating the local learning coefficient
0:44:37 - Singular learning theory and generalization
1:00:39 - Singular learning theory vs other deep learning theory
1:17:06 - How singular learning theory hit AI alignment
1:33:12 - Payoffs of singular learning theory for AI alignment
1:59:36 - Does singular learning theory advance AI capabilities?
2:13:02 - Open problems in singular learning theory for AI alignment
2:20:53 - What is the singular fluctuation?
2:25:33 - How geometry relates to information
2:30:13 - Following Daniel Murfet's work
The transcript: axrp.net/episode/2024/05/07/e...
Daniel Murfet's Twitter/X account: / danielmurfet
Developmental interpretability website: devinterp.com
Developmental interpretability YouTube channel: / @devinterp
Main research discussed in this episode:
- Developmental Landscape of In-Context Learning: arxiv.org/abs/2402.02364
- Estimating the Local Learning Coefficient at Scale: arxiv.org/abs/2402.03698
- Simple versus Short: Higher-order degeneracy and error-correction: www.lesswrong.com/posts/nWRj6...
Other links:
- Algebraic Geometry and Statistical Learning Theory (the grey book): www.cambridge.org/core/books/...
- Mathematical Theory of Bayesian Statistics (the green book): www.routledge.com/Mathematica...
- In-context learning and induction heads: transformer-circuits.pub/2022...
- Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity: arxiv.org/abs/2106.15933
- A mathematical theory of semantic development in deep neural networks: www.pnas.org/doi/abs/10.1073/...
- Consideration on the Learning Efficiency Of Multiple-Layered Neural Networks with Linear Units: papers.ssrn.com/sol3/papers.c...
- Neural Tangent Kernel: Convergence and Generalization in Neural Networks: arxiv.org/abs/1806.07572
- The Interpolating Information Criterion for Overparameterized Models: arxiv.org/abs/2307.07785
- Feature Learning in Infinite-Width Neural Networks: arxiv.org/abs/2011.14522
- A central AI alignment problem: capabilities generalization, and the sharp left turn: www.lesswrong.com/posts/GNhMP...
- Quantifying degeneracy in singular models via the learning coefficient: arxiv.org/abs/2308.12108