David Bau - Editing Facts in GPT, Interpretability

1,161 views

The Inside View

1 day ago

David Bau is an Assistant Professor studying the structure and interpretation of deep networks, and a co-author of "Locating and Editing Factual Associations in GPT", which introduced Rank-One Model Editing (ROME), a method that lets users alter the weights of a GPT model, for instance by forcing it to output that the Eiffel Tower is in Rome.
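The core idea of a rank-one weight edit can be sketched as follows. This is a minimal illustration, not the paper's actual update (ROME additionally constrains the edit using covariance statistics of the layer's keys); all variable names here are hypothetical.

```python
import numpy as np

# Sketch of a rank-one weight edit: force the layer to map a chosen
# "key" vector k_star (a hidden state encoding the subject) to a
# chosen "value" vector v_star (the desired factual association),
# while leaving directions orthogonal to k_star untouched.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # a layer's weight matrix
k_star = rng.normal(size=3)      # key: subject representation
v_star = rng.normal(size=4)      # value: desired output association

# Rank-one update: W' = W + (v* - W k*) k*^T / (k*^T k*)
residual = v_star - W @ k_star
W_edited = W + np.outer(residual, k_star) / (k_star @ k_star)

# The edited matrix now stores the new association exactly,
# and the change to W has rank one.
assert np.allclose(W_edited @ k_star, v_star)
```

For any vector orthogonal to k_star, the outer-product term contributes nothing, which is why the edit is "surgical": it rewrites one key-value association rather than retraining the layer.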
David is a leading researcher in interpretability, with an interest in how this could help AI Safety. The main thesis of David's lab is that understanding the rich internal structure of deep networks is a grand and fundamental research question with many practical implications, and they aim to lay the groundwork for human-AI collaborative software engineering, where humans and machine-learned models both teach and learn from each other.
David's lab: baulab.info/
Patreon: /theinsideview
Twitter: /michaeltrazzi
Website: theinsideview.ai
OUTLINE
00:00 Intro
01:16 Interpretability
02:27 AI Safety, Out of Domain behavior
04:23 On the difficulty of predicting which AI application might become dangerous or impactful
06:00 ROME / Locating and Editing Factual Associations in GPT
13:04 Background story for the ROME paper
15:41 Twitter Q: where does the key-value abstraction break down in LLMs?
19:03 Twitter Q: what are the tradeoffs in studying the largest models?
20:22 Twitter Q: are there competitive and cleaner architectures than the transformer?
21:15 Twitter Q: is decoder-only a contributor to the messiness? or is time-dependence beneficial?
22:45 Twitter Q: how could ROME deal with superposition?
23:30 Twitter Q: where is the Eiffel Tower actually located?

Comments: 4
@abstract9580
@abstract9580 10 months ago
David's original work changed my perspective on the tractability of explainability and has shaped my current research interests. Would love to work in his group.
@radicalengineer2331
@radicalengineer2331 1 month ago
I am also aspiring to work in interpretability; may I get your LinkedIn?
@rgs2007
@rgs2007 17 days ago
I wonder how we could make the model treat "Bill Gates is the founder of Microsoft" and "Microsoft was founded by Bill Gates" as one and the same fact. Also, how much smaller could these models become, and how much faster could they run?