Dan Hendrycks on Catastrophic AI Risks

2,618 views

Future of Life Institute

1 day ago

Dan Hendrycks joins the podcast again to discuss X.ai, how AI risk thinking has evolved, malicious use of AI, AI race dynamics between companies and between militaries, making AI organizations safer, and how representation engineering could help us understand AI traits like deception. You can learn more about Dan's work at www.safe.ai
Timestamps:
00:00 X.ai - Elon Musk's new AI venture
02:41 How AI risk thinking has evolved
12:58 AI bioengineering
19:16 AI agents
24:55 Preventing autocracy
34:11 AI race - corporations and militaries
48:04 Bulletproofing AI organizations
1:07:51 Open-source models
1:15:35 Dan's textbook on AI safety
1:22:58 Rogue AI
1:28:09 LLMs and value specification
1:33:14 AI goal drift
1:41:10 Power-seeking AI
1:52:07 AI deception
1:57:53 Representation engineering

Comments: 9
@kimholder 6 months ago
I got a lot out of this and am reading the associated paper. I have some questions. Why isn't criminal liability also included?
@PauseAI 7 months ago
Is there a source for Elon Musk's p(doom)?
@mrpicky1868 6 months ago
He is much better at getting a serious risk taken seriously than Eliezer. I hope he does more interviews.
@geaca3222 5 months ago
I hope so too; he also recently published a very informative safety book online.
@mrpicky1868 5 months ago
@geaca3222 Sadly, books have no power, so more interviews and broader public understanding are what will make the difference.
@geaca3222 5 months ago
@mrpicky1868 Agreed, but in addition I think the online book is very helpful as a source of information. It gives a concise overview of the CAIS research findings that is readily accessible to international AI safety actors and the general public. The website also offers courses on the subject.
@michaelsbeverly 7 months ago
_"Knock, knock!"_ "Who's there?" _"Hello Amazon, I'm an agent of the court with service..."_ "This is about that destroying-humanity thing?" _"That's right."_ "Yeah, um, about that..."
@Dan-dy8zp 3 months ago
He doesn't provide any justification for why we should be more concerned about these problems than about the alignment of true superintelligence, nor for why he thinks we are in a 'medium take-off' situation, or why we would be replaced with a 'species' instead of a singleton. *(These programs don't mate. They are not related to each other. They don't age, die, and replace themselves. One would probably triumph in the end, I think, however long that takes.)* I'm left with the impression that he just likes to tackle easier problems. Though if the former problem, super-alignment, is totally intractable, you could argue that it makes sense to focus on what is doable and just hope we get lucky on alignment. He doesn't really make that argument, though.
Roman Yampolskiy on Shoggoth, Scaling Laws, and Evidence for AI being Uncontrollable
1:31:14
Liron Shapira on Superintelligence Goals
1:26:30
Future of Life Institute
2.6K views
Dan Hendrycks Says AI Could Lead to 'Extinction'
1:34
NowThis Impact
1.4K views
Carl Robichaud on Preventing Nuclear War
1:39:04
Future of Life Institute
1.1K views
Why Does AI Lie, and What Can We Do About It?
9:24
Robert Miles AI Safety
251K views
Dan Hendrycks on Why Evolution Favors AIs over Humans
2:26:38
Future of Life Institute
6K views
3 Most chilling nuclear war aftermath movies of ALL TIMES
2:26
Minute Before Midnight Videos
30K views
Tom Davidson on How Quickly AI Could Automate the Economy
1:56:23
Future of Life Institute
2.5K views
How might AI be weaponized? | AI, Social Media and Nukes at SXSW 2024
57:53
Future of Life Institute
1.5K views
The AI Safety Summit at Bletchley Park
2:19
Bletchley Park
3.1K views
Dan Faggella on the Race to AGI
1:45:21
Future of Life Institute
7K views