Paul Christiano: Formalizing Explanations of Neural Network Behaviors

Sydney Mathematical Research Institute - SMRI

Paul Christiano (Alignment Research Center): October 26
Abstract: Existing research on mechanistic interpretability usually tries to develop an informal human understanding of “how a model works,” making it hard to evaluate research results and raising concerns about scalability. Meanwhile, formal proofs of model properties seem far out of reach in both theory and practice. In this talk I’ll discuss an alternative strategy for “explaining” a particular behavior of a given neural network. This notion is much weaker than proving that the network exhibits the behavior, but may still provide similar safety benefits. This talk will primarily motivate a research direction and a set of theoretical questions rather than present results.
Course homepage: sites.google.c...
