34 - AI Evaluations with Beth Barnes

AXRP

How can we figure out if AIs are capable enough to pose a threat to humans? When should we make a big effort to mitigate risks of catastrophic AI misbehaviour? In this episode, I chat with Beth Barnes, founder of and head of research at METR, about these questions and more.
Patreon: / axrpodcast
Ko-fi: ko-fi.com/axrp...
The transcript: axrp.net/episo...
Topics we discuss, and timestamps:
0:00:37 - What is METR?
0:02:44 - What is an "eval"?
0:14:42 - How good are evals?
0:37:25 - Are models showing their full capabilities?
0:53:25 - Evaluating alignment
1:01:38 - Existential safety methodology
1:12:13 - Threat models and capability buffers
1:38:25 - METR's policy work
1:48:19 - METR's relationships with labs
2:04:12 - Related research
2:10:02 - Roles at METR, and following METR's work
Links for METR:
METR: metr.org
METR Task Development Guide - Bounty: taskdev.metr.o...
METR - Hiring: metr.org/hiring
Autonomy evaluation resources: metr.org/blog/...
Other links:
Update on ARC's recent eval efforts (contains GPT-4 TaskRabbit CAPTCHA story): metr.org/blog/...
Password-locked models: a stress case for capabilities evaluation: www.alignmentf...
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training: arxiv.org/abs/...
Untrusted smart models and trusted dumb models: www.alignmentf...
AI companies aren't really using external evaluators: www.lesswrong....
Nobody Knows How to Safety-Test AI (Time): time.com/69588...
ChatGPT can talk, but OpenAI employees sure can’t: www.vox.com/fu...
Leaked OpenAI documents reveal aggressive tactics toward former employees: www.vox.com/fu...
Beth on her non-disparagement agreement with OpenAI: www.lesswrong....
Sam Altman's statement on OpenAI equity: x.com/sama/sta...
