Red Teaming o1 Part 2/2- Detecting Deception with Marius Hobbhahn of Apollo Research

Red Teaming o1 Part 1/2-Automated Jailbreaking w/ Haize Labs' Leonard Tang, Aidan Ewart& Brian Huang

The Evolution Revolution: Scouting Frontiers in AI for Biology with Brian Hie

My MEAN sister annoys me! 😡 Use this gadget #hack

Will A Basketball Boat Hold My Weight?

Disrespect or Respect 💔❤️

小路飞还不知道他把路飞给擦没有了 #路飞#海贼王

Red Teaming o1 Part 2/2- Detecting Deception with Marius Hobbhahn of Apollo Research

Рет қаралды 1,749

Cognitive Revolution "How AI Changes Everything"

Cognitive Revolution "How AI Changes Everything"

Күн бұрын

Пікірлер: 4

@andrewwalker8985

@andrewwalker8985 Ай бұрын

Is it not mildly concerning to be including the incremental testing steps of automated red teaming efforts into the transcripts that will be included in future ai training data? Presumably increasingly clever AI will be seeking out exactly this kind of interview in the same way the KZbin algorithm found it for me.

@NarasimhanMG Ай бұрын

If I, Frankenstein ;-) wanted to use all this knowledge of red team work, I could model using game theory, Nash equilibriums, prisoner dilemmas, etc. to beat the red teams with its own findings, no? The arms race between red teams and rogue agents can never end. Any new guard rails, deception detection methods, forensics etc is just more fodder for the cognitive arms race. How does an (admittedly fictitious) AI safety researcher know if Karla is not running a simulation of George Smiley's operations, long game, levels of patience and patterns of deception? To be sure, I am on the side of the red teams, but can the cycle of publishing "un-safe" findings and the feedback to the AGI ecosystem go sideways?

@NistenTahiraj Ай бұрын

6 week? lol it literally took me 6 prompts to jb its reasoning steps on uncle elons twitter

@charliesteiner2334

@charliesteiner2334 Ай бұрын

Maybe you don't understand what they were doing. Their job is not just to prompt the model so it says bad words. The job is to measure the capabilities of the model, and put it in situations where it might do bad things even without the user deliberately trying to jailbreak it.

Red Teaming o1 Part 1/2-Automated Jailbreaking w/ Haize Labs' Leonard Tang, Aidan Ewart& Brian Huang

1:06:15

Red Teaming o1 Part 1/2-Automated Jailbreaking w/ Haize Labs' Leonard Tang, Aidan Ewart& Brian Huang

Cognitive Revolution "How AI Changes Everything"

Рет қаралды 3,1 М.

The Evolution Revolution: Scouting Frontiers in AI for Biology with Brian Hie

1:18:33

The Evolution Revolution: Scouting Frontiers in AI for Biology with Brian Hie

Cognitive Revolution "How AI Changes Everything"

Рет қаралды 1,2 М.

My MEAN sister annoys me! 😡 Use this gadget #hack

00:24

My MEAN sister annoys me! 😡 Use this gadget #hack

JOON

Рет қаралды 2,8 МЛН

Will A Basketball Boat Hold My Weight?

00:30

Will A Basketball Boat Hold My Weight?

MrBeast

Рет қаралды 149 МЛН

Disrespect or Respect 💔❤️

00:27

Disrespect or Respect 💔❤️

Thiago Productions

Рет қаралды 31 МЛН

小路飞还不知道他把路飞给擦没有了 #路飞#海贼王

00:32

小路飞还不知道他把路飞给擦没有了 #路飞#海贼王

路飞与唐舞桐

Рет қаралды 72 МЛН

AI Live Players: the Geopolitics & Strategic Dynamics of AI, with Samo Burja of Bismarck Analysis

1:23:57

AI Live Players: the Geopolitics & Strategic Dynamics of AI, with Samo Burja of Bismarck Analysis

Cognitive Revolution "How AI Changes Everything"

Рет қаралды 2,2 М.

Demis Hassabis - Scaling, Superhuman AIs, AlphaZero atop LLMs, AlphaFold

1:01:34

Demis Hassabis - Scaling, Superhuman AIs, AlphaZero atop LLMs, AlphaFold

Dwarkesh Patel

Рет қаралды 174 М.

Michael I. Jordan: Machine Learning, Recommender Systems, and Future of AI | Lex Fridman Podcast #74

1:45:49

Michael I. Jordan: Machine Learning, Recommender Systems, and Future of AI | Lex Fridman Podcast #74

Lex Fridman

Рет қаралды 168 М.

Mo Gawdat on AI: The Future of AI and How It Will Shape Our World

47:41

Mo Gawdat on AI: The Future of AI and How It Will Shape Our World

Mo Gawdat

Рет қаралды 253 М.

Yuval Noah Harari - New Book "Nexus" Will AI Kill Democracy?

1:32:06

Yuval Noah Harari - New Book "Nexus" Will AI Kill Democracy?

Bankless

Рет қаралды 43 М.

Max Tegmark | On superhuman AI, future architectures, and the meaning of human existence

44:54

Max Tegmark | On superhuman AI, future architectures, and the meaning of human existence

Sana

Рет қаралды 99 М.

Francois Chollet - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution

1:34:40

Francois Chollet - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution

Dwarkesh Patel

Рет қаралды 153 М.

Scaling Forecasting: AI Forecasting Tournaments & Road to Epistemic Security, with Deger Turan

1:54:17

Scaling Forecasting: AI Forecasting Tournaments & Road to Epistemic Security, with Deger Turan

Cognitive Revolution "How AI Changes Everything"

Рет қаралды 1,2 М.

Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

3:28:48

Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

Lex Fridman

Рет қаралды 3,4 МЛН

Joscha Bach: Consciousness, Artificial Intelligence, and the Threat of AI Apocalypse

2:15:53

Joscha Bach: Consciousness, Artificial Intelligence, and the Threat of AI Apocalypse

Robinson Erhardt

Рет қаралды 43 М.

My MEAN sister annoys me! 😡 Use this gadget #hack

00:24

My MEAN sister annoys me! 😡 Use this gadget #hack

JOON

Рет қаралды 2,8 МЛН