EPFL AI Center - Adversarial attacks as a baby version of A(G)I alignment - Stanislav Fort

This talk is part of the AI Fundamentals seminar series organized by the EPFL AI Center.
Title
Adversarial attacks as a baby version of A(G)I alignment
Abstract
Adversarial attacks pose a significant challenge to the robustness, reliability, and alignment of deep neural networks, from simple computer vision models to hundred-billion-parameter language models. Despite their ubiquity, our theoretical understanding of their character and ultimate causes, as well as our ability to defend against them, remains noticeably lacking. This talk examines the robustness of modern deep learning methods and the surprising scaling of attacks on them, and showcases several practical examples of transferable attacks on the largest closed-source vision-language models. Building on biological insights and new empirical evidence, I will introduce the solution we proposed in [1], in which we take a step towards aligning the implicit human and the explicit machine vision representations, closely connecting interpretability and robustness. I will conclude with a direct analogy between the problem of adversarial examples and the much larger task of general AI alignment.
[1] Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness. Stanislav Fort, Balaji Lakshminarayanan
Bio
Stanislav Fort is a senior research scientist at Google DeepMind, specializing in robustness, interpretability and safety. He received his PhD in 2022 from Stanford University with Prof. Surya Ganguli. In the past, Stanislav spent time at Google Brain as an AI Resident, worked on the Claude model at Anthropic, and led the language model team at Stability AI. He received his Bachelor's and Master's degrees in theoretical physics from the University of Cambridge.
Academic publications: scholar.google...
Personal website: stanislavfort....
