DSI Seminar Series | How Could We Design Aligned and Provably Safe AI?

Inside Livermore Lab

On April 19, 2024, Dr. Yoshua Bengio presented “How Could We Design Aligned and Provably Safe AI?” His talk was co-sponsored by LLNL’s Data Science Institute and the Center for Advanced Signal and Image Sciences. A Turing Award winner, Bengio is recognized as one of the world’s leading AI experts, known for his pioneering work in deep learning. He is a full professor at the University of Montreal and the founder and scientific director of Mila - Quebec AI Institute. In 2022, Bengio became the most-cited computer scientist in the world.
Statically evaluating the risks of a learned AI system seems hopeless: the number of contexts in which it could act is infinite or exponentially large, and static checks can only verify a finite, relatively small set of such contexts. However, with a run-time evaluation of risk, we could potentially prevent actions with an unacceptable level of risk.

The probability of harm produced by an action or a plan, in a given context and given past data, under the true explanation of how the world works, is unknown. However, under reasonable hypotheses related to Occam's Razor, and with a non-parametric Bayesian prior (which thus includes the true explanation), it can be shown to be bounded by quantities that can in principle be numerically approximated or estimated by large neural networks, all based on a Bayesian view that captures epistemic uncertainty about what constitutes harm and how the world works. Capturing this uncertainty is essential: the AI could otherwise be confidently wrong about what is “good” and produce catastrophic existential risks, for example through instrumental goals or by taking control of the reward mechanism (wrongly thinking that the rewards recorded in the computer are what it should maximize).

The bound relies on a kind of paranoid theory: the one that has maximal probability given that it predicts harm and given the past data. The talk discusses the research program based on these ideas and how amortized inference with large neural networks could be made to estimate the required quantities.
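The run-time gating idea above can be illustrated with a deliberately tiny toy sketch. Everything here is hypothetical (the theory names, priors, likelihoods, and threshold are invented for illustration, not taken from the talk): we maintain a posterior over a small set of candidate explanations of past data, bound the probability of harm by the posterior mass on explanations that predict harm, identify the highest-posterior harm-predicting explanation as the "paranoid theory", and reject the action if the bound exceeds a risk threshold. In the actual research program these quantities would be estimated by large neural networks via amortized inference, not enumerated.

```python
# Toy sketch of run-time risk gating under epistemic uncertainty.
# All theory names and numbers below are hypothetical examples.

def posterior(priors, data_likelihood):
    """posterior(theory | data) ∝ prior(theory) * P(data | theory)."""
    weights = {t: p * data_likelihood[t] for t, p in priors.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}

def harm_bound(post, predicts_harm):
    """Bound P(harm | action, data) by the posterior mass on
    harm-predicting theories; the largest single contributor is
    the 'paranoid theory' (most probable harmful explanation)."""
    harmful = {t: p for t, p in post.items() if predicts_harm[t]}
    paranoid = max(harmful, key=harmful.get) if harmful else None
    return sum(harmful.values()), paranoid

# Three hypothetical explanations consistent with past data.
priors = {"benign_A": 0.5, "benign_B": 0.3, "paranoid_C": 0.2}
likelihood = {"benign_A": 0.9, "benign_B": 0.8, "paranoid_C": 0.7}
predicts_harm = {"benign_A": False, "benign_B": False, "paranoid_C": True}

post = posterior(priors, likelihood)
risk, paranoid = harm_bound(post, predicts_harm)

RISK_THRESHOLD = 0.1          # acceptable level of risk (arbitrary)
action_allowed = risk <= RISK_THRESHOLD
```

Because capturing epistemic uncertainty keeps posterior mass on the harm-predicting explanation even when benign explanations are more likely, the guard here blocks the action rather than betting that the most probable theory is correct.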
LLNL-VIDEO-865371

Comments: 1
@Senecamarcus (4 months ago)
Thanks for sharing this video. Could you please upload the Q&A?