Adam Gleave - Adversarial Robustness of Superhuman AI Systems

UCL DARK

Invited talk by Adam Gleave on September 16, 2024 at UCL DARK.
Title:
Adversarial Robustness of Superhuman AI Systems
Abstract:
A combination of algorithmic advances and increases in model size, dataset size and training compute has produced increasingly capable models in the average case, even achieving superhuman performance in a wide variety of tasks. However, safety-critical tasks demand not just good average-case performance, but worst-case guarantees. We will start by sharing vulnerabilities we discovered in superhuman Go AIs, and our attempts to defend them. We will then turn our attention to jailbreaks in LLMs, comparing scaling trends in capabilities and robustness. Our results suggest that model scale alone does little to improve robustness, but that defences such as adversarial training are more sample efficient in larger models.
