Red Teaming o1 Part 2/2: Detecting Deception with Marius Hobbhahn of Apollo Research

Views: 1,749

Cognitive Revolution "How AI Changes Everything"

1 day ago

Comments: 4
@andrewwalker8985 1 month ago
Is it not mildly concerning to include the incremental testing steps of automated red teaming efforts in transcripts that will end up in future AI training data? Presumably increasingly clever AI will be seeking out exactly this kind of interview in the same way the KZbin algorithm found it for me.
@NarasimhanMG 1 month ago
If I, Frankenstein ;-) wanted to use all this knowledge of red team work, I could model using game theory, Nash equilibria, prisoner's dilemmas, etc. to beat the red teams with their own findings, no? The arms race between red teams and rogue agents can never end. Any new guard rails, deception detection methods, forensics, etc. are just more fodder for the cognitive arms race. How does an (admittedly fictitious) AI safety researcher know if Karla is not running a simulation of George Smiley's operations, long game, levels of patience and patterns of deception? To be sure, I am on the side of the red teams, but can the cycle of publishing "un-safe" findings and the feedback to the AGI ecosystem go sideways?
@NistenTahiraj 1 month ago
6 week? lol it literally took me 6 prompts to jb its reasoning steps on uncle elons twitter
@charliesteiner2334 1 month ago
Maybe you don't understand what they were doing. Their job is not just to prompt the model so it says bad words. The job is to measure the capabilities of the model, and put it in situations where it might do bad things even without the user deliberately trying to jailbreak it.