OUTLINE
01:12 Owain's Research Agenda
02:25 Defining Situational Awareness
03:30 Safety Motivation
04:58 Why Release A Dataset
06:17 Risks From Releasing It
10:03 Claude 3 on the Longform Task
14:57 Needle in a Haystack
19:23 Situating Prompt
23:08 Deceptive Alignment Precursor
30:12 Distribution Over Two Random Words
34:36 Discontinuing a 01 Sequence
40:20 GPT-4 Base on the Longform Task
46:44 Human-AI Data in GPT-4's Pretraining
49:25 Are Longform Task Questions Unusual?
51:48 When Will Situational Awareness Saturate?
53:36 Safety and Governance Implications of Saturation
56:17 Evaluation Implications of Saturation
57:40 Follow-up Work on the Situational Awareness Dataset
01:00:04 Would Removing Chain-of-Thought Work?
01:02:18 Out-of-Context Reasoning: the "Connecting the Dots" Paper
01:05:15 Experimental Setup
01:07:46 Concrete Function Example: 3x + 1
01:11:23 Isn't It Just a Simple Mapping?
01:17:20 Safety Motivation
01:22:40 Out-of-Context Reasoning Results Were Surprising
01:24:51 The Biased Coin Task
01:27:00 Will Out-of-Context Reasoning Scale?
01:32:50 Checking If In-Context Learning Works
01:34:33 Mixture-of-Functions
01:38:24 Inferring New Architectures From arXiv
01:43:52 Twitter Questions
01:44:27 How Does Owain Come Up With Ideas?
01:49:44 How Did Owain's Background Influence His Research Style and Taste?
01:52:06 Should AI Alignment Researchers Aim for Publication?
01:57:01 How Can We Apply LLM Understanding to Mitigate Deceptive Alignment?
01:58:52 Could Owain's Research Accelerate Capabilities?
02:08:44 How Was Owain's Work Received?
02:13:23 Last Message
@Max-bh1pl (3 months ago)
Finally, a new episode! I've been eagerly waiting for this!
@MrCheeze (3 months ago)
We're so barack
@human_shaped (3 months ago)
Really very interesting. It's good to let AIs know how they're being tested so they can take that into consideration too. Thanks for the transcript ;)
@simonstrandgaard5503 (3 months ago)
Great interview.
@TheJokerReturns (2 months ago)
I'd like to see if we can coordinate on podcasts. How can we best reach you?