Adam Gleave - Adversarial Robustness of Superhuman AI Systems

91 views

UCL DARK

days ago

Invited talk by Adam Gleave on September 16, 2024 at UCL DARK.
Title:
Adversarial Robustness of Superhuman AI Systems
Abstract:
A combination of algorithmic advances and increases in model size, dataset size, and training compute has produced increasingly capable models in the average case, some even achieving superhuman performance on a wide variety of tasks. However, safety-critical tasks demand not just good average-case performance but worst-case guarantees. We will start by sharing vulnerabilities we discovered in superhuman Go AIs and our attempts to defend against them. We will then turn our attention to jailbreaks in LLMs, comparing scaling trends in capabilities and robustness. Our results suggest that model scale alone does little to improve robustness, but that defences such as adversarial training are more sample-efficient in larger models.
