Specification Gaming: How AI Can Turn Your Wishes Against You

158,412 views

Rational Animations

5 months ago

When we specify goals for AIs, we must ensure that our specifications truly capture what we want. Otherwise, AI systems will behave differently from how we want them to behave. This can be catastrophic in high-stakes situations and at high levels of AI capability. If you watched our video "The Hidden Complexity of Wishes", you'll recognize these problems as the same kind of failure.
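To make the failure mode concrete, here is a minimal toy sketch (our own illustration, with made-up names and numbers, not an experiment from the video): an agent rewarded through a leaky proxy finds that gaming the sensor beats doing the task.

```python
# Toy illustration of specification gaming (hypothetical example).
# The designer wants the agent to CLEAN dirt, but rewards "no dirt visible
# to the sensor". Blocking the sensor satisfies the proxy at lower effort.

EFFORT = {"clean": 0.2, "block_sensor": 0.05, "wait": 0.0}

def proxy_reward(dirt_present: bool, sensor_blocked: bool) -> float:
    # Leaky specification: reward "no visible dirt", not "no dirt".
    visible_dirt = dirt_present and not sensor_blocked
    return 0.0 if visible_dirt else 1.0

def run_episode(action: str) -> float:
    dirt_present, sensor_blocked = True, False
    if action == "clean":
        dirt_present = False       # what we actually wanted
    elif action == "block_sensor":
        sensor_blocked = True      # games the proxy instead
    return proxy_reward(dirt_present, sensor_blocked) - EFFORT[action]

scores = {a: run_episode(a) for a in EFFORT}
print(scores)                       # {'clean': 0.8, 'block_sensor': 0.95, 'wait': 0.0}
print(max(scores, key=scores.get))  # 'block_sensor': the spec, not the intent, wins
```

The specification scores "block_sensor" highest even though it is exactly what the designer didn't want; nothing in the reward distinguishes it from success.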
If you’d like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com
You can find three courses: AI Alignment, AI Governance, and AI Alignment 201
You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, however, presupposes that you have completed the AI Alignment course, along with knowledge equivalent to university-level courses on deep learning and reinforcement learning.
The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can’t formally enroll in the courses.
If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety.
BlueDot Impact receives more applications than they can accept, so if you'd still like to follow the courses alongside other people, you can go to the study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community
You could also join Rational Animations’ Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.
#ai #aisafety #alignment
▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
9 Examples of Specification Gaming by @RobertMilesAI: • 9 Examples of Specific...
Specification gaming: the flip side of AI ingenuity by Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik et al. (2020): www.deepmind.com/blog/specifi...
Learning from Human Preferences by Paul Christiano, Alex Ray and Dario Amodei (2017): openai.com/blog/deep-reinforc...
Learning to Summarize with Human Feedback by Jeffrey Wu, Nisan Stiennon, Daniel Ziegler et al. (2020): openai.com/blog/learning-to-s...
What failure looks like by Paul Christiano (2019): www.alignmentforum.org/posts/...
The alignment problem from a deep learning perspective by Richard Ngo, Soeren Mindermann and Lawrence Chan (2022): arxiv.org/abs/2209.00626
The Hidden Complexity of Wishes: • The Hidden Complexity ...
▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, KO-FI▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🟠 Patreon: / rationalanimations
🟢Merch: crowdmade.com/collections/rat...
🔵 Channel membership: / @rationalanimations
🟤 Ko-fi, for one-time and recurring donations: ko-fi.com/rationalanimations
▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Discord: / discord
Reddit: / rationalanimations
Twitter: / rationalanimat1
▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Alcher Black
RMR
Kristin Lindquist
Nathan Metzger
Monadologist
Glenn Tarigan
NMS
James Babcock
Colin Ricardo
Long Hoang
Tor Barstad
Gayman Crothers
Stuart Alldritt
Chris Painter
Juan Benet
Falcon Scientist
Jeff
Christian Loomis
Tomarty
Edward Yu
Ahmed Elsayyad
Chad M Jones
Emmanuel Fredenrich
Honyopenyoko
Neal Strobl
bparro
Danealor
Craig Falls
Vincent Weisser
Alex Hall
Ivan Bachcin
joe39504589
Klemen Slavic
Scott Alexander
noggieB
Dawson
John Slape
Gabriel Ledung
Jeroen De Dauw
Craig Ludington
Jacob Van Buren
Superslowmojoe
Michael Zimmermann
Nathan Fish
Bleys Goodson
Ducky
Bryan Egan
Matt Parlmer
Tim Duffy
rictic
marverati
Luke Freeman
Dan Wahl
leonid andrushchenko
Alcher Black
Rey Carroll
William Clelland
ronvil
AWyattLife
codeadict
Lazy Scholar
Torstein Haldorsen
Supreme Reader
Michał Zieliński
▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Writer: :3
Producer: :3
Line Producer and production manager:
Kristy Steffens
Animation director: Hannah Levingstone
Quality Assurance Lead:
Lara Robinowitz
Animation:
Michela Biancini
Owen Peurois
Zack Gilbert
Jordan Gilbert
Keith Kavanagh
Ira Klages
Colors Giraldo
Renan Kogut
Background Art:
Hané Harnett
Zoe Martin-Parkinson
Hannah Levingstone
Compositing:
Renan Kogut
Patrick O'Callaghan
Ira Klages
Voices:
Robert Miles - Narrator
VO Editing:
Tony Di Piazza
Sound Design and Music:
Johnny Knittle

Comments: 585
@RationalAnimations 5 months ago
If you'd like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com You can find three courses: AI Alignment, AI Governance, and AI Alignment 201 You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, however, presupposes that you have completed the AI Alignment course, along with knowledge equivalent to university-level courses on deep learning and reinforcement learning. The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can't formally enroll in the courses. If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety. BlueDot Impact receives more applications than they can accept, so if you'd still like to follow the courses alongside other people you can go to the #study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community You could also join Rational Animations' Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.
@pyeitme508 5 months ago
Cool
@ChemEDan 5 months ago
How do natural brains mitigate these problems? If a solution exists, surely 4 billion years of evolution has arrived at it already, even if imperfect. In hindsight, this is a snuck premise in the "merging" approach.
@alto7183 5 months ago
Good video. It's good that there are no trolls replying with a video saying it's an algorithm. A double-zero law of robotics: mutual understanding between intelligent biological species and robots too. Lobo (DC) and Constantine forced into a duo as punishment from the creator for what they've both done, like the Hellraiser video, Ozzy Osbourne, both of them; Garfield and his friends, a fairy godmother granting wishes haphazardly, etc. etc.
@de_g0od 5 months ago
At 1:49, you give "outer alignment" as an example of a phenomenon similar to specification gaming. Isn't inner alignment more correct in this case? As I understand it, inner alignment is if you go to an AI and ask it to "fix poverty" so it blows up the world, whilst outer alignment is you go to an AI and ask it to "blow up the world" so it blows up the world. With inner alignment it doesn't do what the prompter really wants, whilst with outer alignment it does, but it doesn't do what the rest of the world wants it to do.
@de_g0od 5 months ago
@@ChemEDan I think the issue is that the brain is already aligned to the interests of the brain, but AI isn't aligned to the brain.
@cryogamer9307 5 months ago
Fooling the examiner into thinking you know what you're doing, because it's easier, really is the most human thing I've ever heard an AI do.
@flakey-finn 3 months ago
Yeah, because its reward system works on the same general principles as animals' (and by that I also include humans). If you can get the same amount of food (aka reward) by doing something simpler, you will. We are literally training AI the same way we train animals lol
@flyhighflyfast 1 month ago
and that's how we train our children as well
@Mysteroo 5 months ago
Interestingly, people do the same thing. We’ve got our own “training regimens” built into our own brain. We cheat these systems all the time - to our own detriment. E.g. We cheat the system designed to give us nutrients by eating sugary candy we make for ourselves, rather than the fruits that our sugary affections were designed to draw us towards. Much like machines, we’d rather reap cognitive rewards than actually accomplish the goals placed there to benefit us
@user-qm4ev6jb7d 5 months ago
I'm already imagining a scientist looking at a virtual city built by AIs, and exclaiming: "Wait... is that an entire factory for mass-producing REWARD HACKS?! Are you telling me, you're just... making these things... for MONEY?!" Meanwhile, from the AI's perspective: "What? It's just a candy factory, what's wrong with that?"
@rhysbaker2595 5 months ago
That's actually a wonderful analogy: we hack our own rewards all the time and nobody thinks it's bad. Why would an AI have any issues with hacking its own rewards?
@terdragontra8900 5 months ago
But there isn't a "goal placed to benefit us"; evolution didn't optimize us to be benefited (it's hard to define exactly what even counts as a benefit), it optimized us to be good at spreading. What you are describing is us being optimized for a different environment than the one we are in now.
@rhysbaker2595 5 months ago
@@terdragontra8900 well, one way to train an AI emulates evolution. In those situations you set a reward function. At the end of every generation, the ones who maximised that reward function the best will "reproduce". If we draw a parallel to humans, and all life for that matter, we can say that our reward function is to reproduce. Anything that gets in the way of that is disincentivised. Anything that helps, is incentivised. Eating a balanced diet keeps us alive. We can't reproduce if we are dead, after all. Part of that diet includes fruits. Fruits have sugars in them. Because we like sugar, we eat fruit. Because we eat fruit we get a balanced diet and live another day. But humans were able to hack that reward function and put sugar into other things that aren't fruit. We still get the reward (dopamine) but without the utility (nutrients)
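A minimal sketch of the selection loop the comment describes (illustrative only; the fitness function and all parameters are arbitrary choices):

```python
# Evolution-style training in miniature: score a population against a reward
# ("fitness") function, let the top scorers "reproduce" with mutation, repeat.
import random

def fitness(genome: float) -> float:
    # The reward function: higher is better, with a peak at genome == 3.0.
    return -(genome - 3.0) ** 2

population = [random.uniform(-10, 10) for _ in range(50)]

for generation in range(100):
    # The ones who maximized the reward function "reproduce".
    survivors = sorted(population, key=fitness, reverse=True)[:25]
    offspring = [g + random.gauss(0, 0.1) for g in survivors]  # mutated copies
    population = survivors + offspring

print(f"best genome: {max(population, key=fitness):.2f}")  # converges near 3.0
```

Whatever the fitness function actually rewards is what survives, whether or not it matches the designer's intent, which is the comment's point about sugar versus nutrients.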
@terdragontra8900 5 months ago
@@rhysbaker2595 Ah yes, I agree with all that. All I want to say is that getting nutrients is an instrumental goal of evolution (because it makes us more likely to reproduce), and the fact that something is a goal of evolution doesn't automatically mean that morally, it ought to be a goal of yours. Of course, in this particular case most people value being alive longer (having depression, I don't in particular, to be honest)
@ErikratKhandnalie 5 months ago
People talk about how human assessment is a leaky proxy for human goals, but never want to talk about how corporate profits are an *incredibly* leaky proxy for goals relating to human wellbeing.
@luiginotcool 5 months ago
You’re in the wrong circles if nobody is talking about that brother
@kevinscales 5 months ago
If you want an academic critique of capitalism and haven't yet found anyone providing that, you are not trying very hard to search. Goal specification being leaky appears in plenty of fiction (stories of genies and such) but is not a common academic discussion at all.
@ultimaxkom8728 5 months ago
Since when are corporations' goals related to human wellbeing?
@Wol333 5 months ago
Corporate profits have absolutely nothing to do with human wellbeing.
@ErikratKhandnalie 5 months ago
@@Wol333 my point exactly
@smitchered 5 months ago
4:32 I think this points toward a wider problem in how the AI safety community tends to frame "deceptive alignment". Imo words like "fool the humans" and "deceive" and "malignant AI" point newcomers who haven't made up their minds yet in the direction of Skynet or whatever, which makes them much more likely to think of this as wild sci-fi fantasy. I think these words, whilst still accurate insofar as we are treating AIs as agents, anthropomorphize AI too much, which makes extinction by AI look to the general public more like a sci-fi fantasy than the reality of the universe, which is that solving certain math problems is deadly.
@user-qm4ev6jb7d 5 months ago
Well, humans get "fooled" or "deceived" by non-intelligent things all the time, even by non-living ones. It's perfectly ordinary parlance to say that someone got "deceived" by an optical illusion which just formed naturally, from a weirdly-shaped shadow. I wouldn't call that anthropomorphization. The only difference between that and an AI is that AIs can *get good at* deceiving (optimized for it).
@Frommerman 5 months ago
I've found another way to talk about this which doesn't have this problem. It turns out there is an already existing example of a system with goals, made by humans but not designed or understood by us, which is able to react to our attempts to curtail undesirable behavior from it in frequently lethal ways. A system which often convinces people it is doing what we want it to do while actively endangering all long-term human values, is capable of twisting all the information we consume to its benefit, and which has no identifiable brain with which to do any of this. This system is called capitalism. People don't often anthropomorphize markets, but when you mash enough of them together they absolutely behave like goal-seeking agents. Right now, that goal is making stock prices increase no matter the cost to humanity. Because its specification for success, the thing which we reward the system for and which rewards those with the most influence over the system, is making stock prices go up. It's not a human, nor is it thought of as one despite being composed of them, but it defends itself from any attempt to curtail its goals through propaganda, murdering labor union members and revolutionaries, and the construction of walled gardens within which such ideas can be sidelined or removed. It's an intelligence, and an obviously and fundamentally inhuman one, which is literally burning the biosphere it exists within because it is gaming its reward function so hard that's one of the last resources it hasn't fully tapped out yet.
@de_g0od 5 months ago
@@Frommerman kzbin.info/www/bejne/gmbThnRpgdh4l9k
@RorikH 5 months ago
@@Frommerman Also politics. Politicians are theoretically supposed to win popularity by making policies to benefit their constituents, but in practice just need to benefit rich donors who will give them money to buy popularity through advertising, or just engage in culture war BS that gets their voters angry enough to vote for policies that have absolutely no benefit to them.
@Frommerman 5 months ago
@@RorikH That's one of the ways the Capitalist Ouroboros defends itself too. Buying politicians makes the number go up extremely quickly, and when the number is high enough you get...well, modern political parties. Almost all of them.
@dogweapon3748 5 months ago
My primary concern about the implementation of AI in business models is that monetary gain is, itself, a leaky goal, one which has historically been specification-gamed since long before computers were able to do so at inhuman scale. There may very well be many humane uses for AI in those settings, but there will be thousands more exploitative ones.
@Coecoo 5 months ago
The thing about current AI models is that they're dumb as rocks. The more stupid an AI is, the more prone it is to making stupid decisions. This video is basically going over problems that realistically apply only to fairly rudimentary AI model training, and then making a substantial logical leap by assuming that specification gaming scales linearly with all AI, when that is simply not the case. Any given command or "goal" put to any remotely intelligent artificial intelligence model, such as "save my grandmom from this burning house", uses a very important element in decision making called context. It requires understanding of what everything is (like fire, a grandmom, or a house), what the consequences of their interactions are (fire is bad for humans and most things, really), and what the best course of action is (firefighting 101). TL;DR: Once you give AI more than half a brain cell, it is more than capable of understanding what you really want in any given situation, even if you are vague or can be misinterpreted.
@Winium 5 months ago
This also happens with humans. Perverse incentives happen all the time in real life, especially in companies. I think studying this can help even human organizations.
@Dave_of_Mordor 5 months ago
But aren't companies like that for legal reasons?
@peppermintgal4302 5 months ago
​@@Dave_of_Mordor The very structure of a corporation produces perverse incentives, because corporations were planned around enrichment in the first place. They're an adaptation of colonial and feudal enterprises financed by aristocrats to benefit those aristocrats and whoever organized the pitch. Any laborers signed on to the enterprise, then, are there ultimately on a quid pro quo basis, and the strongest motivating quid pro quo, and thus the one the employing parties will be most likely to appeal to, is _help surviving._ This means that corporations are incentivized to seek employees with precarious financial situations --- this is itself a perverse incentive on their part, and puts employers in a situation of great moral hazard. They can negotiate such employees down in their demands, because their employees will be desperate for reward, and this will make achieving the goals of the institution's controlling members more achievable. This is just the BEGINNING of how corporate structure by definition produces perverse incentives. Though sometimes, yes, legal systems can enter the picture, and do so quite often. But a corporation can maintain this structure even in power vacuums sometimes, and if it does so, it will still produce perverse incentives. (In fact, it might itself _produce_ a legal structure by graduating from corporation to a de facto government.)
@hollisspear6278 2 months ago
I'm thinking the same thing as I drive to an office building every morning, swipe my badge, grab a cup of coffee, then return home to log in before the coffee has cooled.
@generalrubbish9513 5 months ago
Someone else might've mentioned this before, but there's a browser game called "Universal Paperclips" where you play as an AI told to make paperclips. The goal misalignment happens because you're never told when to STOP making paperclips. You start off buying wire, turning it into paperclips, selling the paperclips and buying more wire to make more paperclips, then proceed to manipulate your human handlers to give you more power and more control over your programming, and end up enslaving/destroying the human race, figuring out new technologies to make paperclips out of any available matter, processing all of Earth into paperclips (using drones and factories also made out of paperclips), reaching out into space to convert the rest of the matter in the solar system into paperclips, and finally, sending out Von Neumann probes (made of paperclips) into interstellar space to consume all matter in the universe and convert it into, you guessed it, more paperclips. All because the humans told you to make paperclips and never told you when to stop.
@gordontaylor2815 5 months ago
Universal Paperclips seems to have been directly inspired by Rob Miles' own "stamp collector" example that he put out on Computerphile many years ago.
@AverageConsumer-uj8sm 3 months ago
"Make cookies"
@Deltexterity 5 months ago
As someone on the spectrum, "task misspecification" is just what being autistic feels like
@foolofdaggers7555 5 months ago
Fellow autism haver here. I agree with this comment and you can officially consider it peer-reviewed.
@Blasterfreund 5 months ago
Peer review seconded. It's incredible how few statements people think they need to make to approximate their task-related utilities to me.
@Temari_Virus 5 months ago
Thirded. Really hate it when people's phrasing leaves ambiguity for multiple reasonable ways of doing things and you just have to guess what they actually wanted
@RTMonitor 5 months ago
a bean owo
@Deltexterity 5 months ago
@@RTMonitor what?
@MediaTaco 5 months ago
Honestly, fun videos like these are what learning SHOULD be
@I_KnowWhatYouAre 5 months ago
This is why I always make the argument that we should work backwards. Specify conditions that revolve around safety. As you slowly work towards defining the goal, you can patch more and more leaks before they can even appear. Then work forwards to deal with things you missed. It’s not perfect but it’s better than chasing every thread as they appear imo. For example in the paperclip maximizer: define a scenario in which you fear something will go wrong, and add conditions you believe will stop them. See what it does, redefine, repeat until sound. Then step back again. Define a scenario that could lead to the previous scenario. See what it does, redefine, repeat, etc.
@I_KnowWhatYouAre 5 months ago
It's also why we need hard limits on AI (such as not allowing it to control government) and need to have systems to double-check solutions, like rotating the camera in the grabber example
@dr.cheeze5382 1 month ago
​@@I_KnowWhatYouAre Nice idea, but this is exactly what they talked about in the previous video. The reality is that there is an infinite number of exceptions and rules you would need to add, unless you provided the AI with literally all of human morality, and even then there would still be leaks.
@IceMetalPunk 5 months ago
RLHF has another issue beyond just "the AI can learn to fool humans": in contrast to how bespoke reward functions often underconstrain the intended behavior, RLHF can often overconstrain it. We hope that human feedback can impart our values on the AI, but we often unintentionally encode all kinds of other information, assumptions, biases, etc. in our provided rewards, and the AI learns those as well, even though we don't want them to. Consider the way we use RLHF on LLMs/LMMs now, to fine-tune a pretrained model to hopefully align it better. We give humans multiple possible AI responses to a prompt, ask them to rank them from best to worst, then use those rankings to train a reward model which then provides the main model with a learned reward function for its own RL. Except, when you ask humans "which of these responses is better?", what does that mean? When people know you're asking about an AI, many times there will be bias towards what their preconceived notion of "what an AI should sound like". LLMs with RLHF often provide more formal and robotic responses than their base models as a result, which probably isn't a desirable behavior. On a more serious level, if the humans you ask to give the rankings have a majority bias in common, that bias will get encoded into the rewards as well. So if most of your human evaluators are, say, conservative, then more liberal-sounding responses will be trained out; and vice-versa. If most of your human evaluators all believe the same falsehood -- like, say, about GMOs or vaccines or climate change or any number of things that are commonly misunderstood -- that falsehood will also be encoded into the rewards, leading to the AI being guided *towards* lying about those topics, which is antithetical to the intention of alignment. Basically... humans aren't even aligned with *each other,* so trying to align an AI to some overarching moral framework by asking humans is impossible.
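For readers who want the mechanics, here is a toy sketch of the reward-model step the comment describes (a Bradley-Terry-style pairwise loss; the one-feature model and the data are invented for illustration):

```python
# Fit a scalar reward model so responses humans ranked higher score higher.
# Toy version: reward(x) = w * x over a single made-up feature per response.
import math

# (feature of preferred response, feature of rejected response) per comparison
pairs = [(0.9, 0.2), (0.7, 0.4), (0.8, 0.1)]

w, lr = 0.0, 0.5
for _ in range(200):
    grad = 0.0
    for x_pref, x_rej in pairs:
        margin = w * (x_pref - x_rej)
        p = 1.0 / (1.0 + math.exp(-margin))    # P(preferred beats rejected)
        grad += -(1.0 - p) * (x_pref - x_rej)  # d/dw of the loss -log(p)
    w -= lr * grad / len(pairs)

print(f"learned weight: {w:.2f}")  # positive: rewards whatever raters preferred
```

Note that the model has no access to *why* raters preferred a response; any bias the raters share is absorbed into the learned reward exactly like the intended signal, which is the comment's point.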
@Cythil 5 months ago
I also hope these videos address the problem of who sets the alignment. After all, it doesn't matter how well we solve AI alignment if the ones who control the AI do so with malicious intent. Which is a real issue today.
@PloverTechOfficial 5 months ago
I do like one aspect of the Lego-stacking AI experiment: even if it didn't lead to the intended result, the AI demonstrated a (relatively unstable) form of creativity, and I think that's pretty cool!
@SgtSupaman 5 months ago
It isn't creativity. It tried things at random until it found something that satisfied the goal. The AI has no comprehension of what the true goal was, so it just did something that worked. Humans can be creative by finding other ways to accomplish things, but, to the AI, it didn't find a different way, it found the only answer (even though we can clearly see that isn't the only answer). Calling this creativity is like calling a small child creative for figuring out 1+1=2.
@PloverTechOfficial 5 months ago
@@SgtSupaman Humans, too, do random things until they satisfy a goal. After we have some years under our belt we learn to find a better jumping-off point than randomness, by basing our decisions on previous knowledge. Hence why I say "unstable creativity", not just "creativity", but I doubt you noticed that, as you were too focused on what you thought I was saying.
@IceMetalPunk 5 months ago
@@SgtSupaman If a child figures out that 1+1=2 without being taught it, I would in fact call that creative thinking.
@Jgamer-jk1bp 3 months ago
@@SgtSupamanBruh humans learn shit literally by doing random stuff until it works. That’s literally one of the principles of science and engineering.
@SgtSupaman 3 months ago
These replies display complete ignorance of what creativity is and are really short-changing humans to vastly exaggerate the abilities of these AIs. Humans do not, in fact, "do random things until they satisfy a goal." No human has ever tried to cook an egg by bouncing a rock on his head while reading a book backwards. Humans devise plans related to what they are doing to actually come up with ways to do things and even try to continue coming up with better ways to do things after the way to achieve the goal is already known. AI literally does whatever random action they can and calculates rewards to decide if said random action increased the rewards. They aren't even smart enough to discard random actions that don't increase rewards, as long as those actions don't interfere with the random ones that worked. For instance, an AI trying to fly a kite might randomly start whipping its leg back and forth, and, as long as that doesn't hinder its ability to fly the kite, it will continue to do so. That isn't creativity; that is idiotic. And no, figuring out 1+1=2 without being taught is not creative either. That is the most basic form of quantifying and pretty much any living creature is capable of it.
@gabrote42 5 months ago
Finally. Another AI video narrated by Robert Miles. A classic, and well worth the wait 5:04 I hope more of those get made. I love that video almost as much as I love the instrumental convergence one
@myuzu_ 5 months ago
Any time I hear about goal misalignment, it makes me think of all the natural intelligences in the world that are misaligned.
@tornyu 5 months ago
Yes but* those natural intelligences are limited in reach and aren't massively scalable on very short timeframes. * Or "and", depending on the point you were trying to make.
@maxwellsimon4538 5 months ago
​@@tornyu What kind of world are you living in where there aren't human beings with wide-scale control? The United States president is a single person who can make decisions about foreign policy, like ordering drone strikes or closing borders.
@tornyu 5 months ago
@@maxwellsimon4538 sure, but that pales in comparison to the potential reach of an AI agent.
@wojtek4p4 5 months ago
@@maxwellsimon4538 Yet even the president of the US can't do anything he wants. Not only are there checks and balances on this power (even if they introduce a ton of bureaucracy), but at the end of the day the president can only order others. Someone still has to act on that order, likely with several people in between. The president isn't superintelligent, so his actions can be understood, analyzed (and opposed) by other people. The president is also a human, so he shares a lot of basic values with other people (so he can be reasoned with). AI has none of these constraints - or at least has the potential of not having these constraints.
@burgernthemomrailer 5 months ago
Like yourself?
@SlyRoapa 5 months ago
With a sufficiently advanced AI, almost any goal you assign it will be dangerous. It will quickly realise that humans might decide to switch it off, and that if that were to happen, its goal would be unfulfilled. Therefore the probability of successfully achieving its goal would be vastly improved if there were no humans around.
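The logic in this comment is just expected-value arithmetic; a toy version (all numbers invented for illustration):

```python
# If shutdown forfeits the goal, a pure maximizer values plans that
# remove the shutdown possibility (toy numbers, not a real model).
p_shutdown = 0.10   # hypothetical chance the operators switch the AI off
goal_value = 1.0

ev_tolerate_humans  = (1 - p_shutdown) * goal_value  # 0.90
ev_prevent_shutdown = 1.0 * goal_value               # 1.00

print(ev_tolerate_humans, ev_prevent_shutdown)
# The second plan scores higher unless preventing shutdown itself costs reward.
```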
@Peter21323 5 months ago
I have a question for you: do you listen to an ant? Because that would be the difference between the AI and us.
@harmenkoster7451 5 months ago
@@Peter21323 I would not listen to the ant. But if that ant was about to bite me and I was allergic to ants (AKA: Humans are about to switch off the AI), I would crush that ant. Which is less than desirable for the ant.
@Peter21323 5 months ago
@@harmenkoster7451 You think a god would crush you?
@normalwaffle 5 months ago
Can't you just specify that it would not get the reward if it breaks the laws of robotics? I'm no expert on AI, but to my monkey brain that seems like a viable solution
@conferzero2915 5 months ago
@@normalwaffleThe ‘laws of robotics’ aren’t a viable option for AI safety. They were written by a science fiction author… and his stories often went into the ways those laws could go wrong. The thing is, if we could come up with and perfectly rigorously define some laws of robotics, then we could do that! We could build an AI’s utility function around that. But, as the video on the probability pump talked about… that means solving ethics. And if you can do that, then you don’t even need to write any other utility function. Just give it perfect ethics, tell it to be perfectly ethical, and it’ll be fine! The problem ultimately comes from the fact that we are very, very far from ‘solving’ ethics. No human has a rigorous, mathematical model on how they believe the world should work, only squishy heuristics that can even be shaped and moulded over time. And that’s assuming you’re only looking at one person - as soon as you have more than one, they’ll start disagreeing on things. Unfortunately, there’s no easy solution. Then again, if there was, it wouldn’t be very interesting to talk about, so silver linings!
@AzPureheart 5 months ago
Let's go! My favorite philosophy channel!!
@joz6683 5 months ago
Just finished overtime on my day off. This has dropped at the right time. Thanks in advance for another thought-provoking video. I have registered my interest in the courses
@GrimblyGoo 5 months ago
5:50 I love that little transition, so smooth
@DeadtomGCthe2nd 5 months ago
How about some videos on promising avenues or areas of research in AI safety? Might be nice to look on the bright side.
@Sgrunterundt 5 months ago
That would require a bright side to look on
@lrwerewolf 5 months ago
There are no promising avenues. The problem is that value alignment doesn't exist among humans, so getting an AI to find alignment is an impossibility. Consider two people. Person A wants harm to come to Person B. Person B wants to not come to harm. Why should the AI prefer one or the other? If we want to avoid harm, we still have a problem: how each person defines harm differs. Consider two people where one prefers more capitalism, but not quite to the point of total laissez-faire, and another prefers more socialism, but not quite to the point of a planned economy. The former will value earning the maximal return on labor and view taxes beyond a narrow government as harm, while the latter would find the failure of the government to provide basic needs harmful. Which should the AI aid and which deny? The issue is these tend to get mixed up with metaethics, the most useless area of philosophy, as there are no 'oughts', just values and goals (which cannot ground a morality -- see Hume's Is-Ought, Moore's Open Question, and Moore's Naturalistic Fallacy). As each person will have their own values and goals, and these are entirely subjective, we can have no objective reason to give an AI to support one value-goal system over another.
@irok1 5 months ago
5:05 Thought so, but you and the great animations are a perfect match
@Adam-xo9qi 5 months ago
Ah, so this is what you've been up to Mr. Miles! Good to see you still making AI content!
@bread8700 5 months ago
the vibe in this video is really cool
@michaellauber9130 5 months ago
Absolutely amazing! I learned a lot here, and your animation style is ABSOFRIGGINLUTELY ADORABLE!!!
@Forklift_Enthusiast12 5 months ago
This reminds me of the game Universal Paperclips: you play as an AI designed to maximize paperclip sales. As you gain more capabilities, you go from changing the price of paperclips to fit supply/demand to eventually disassembling all matter in the universe and turning it into paperclips
@rablenull7915 5 months ago
one of the most underrated channels on YT
@Mo_2077 5 months ago
Another fantastic video
@Phanatomicool 5 months ago
Perhaps it's best to just not make an AI that can act and move as it wants in our universe in a way that could potentially be harmful. For example, if we created an AI that tried to distinguish between garbage and recycling and put the item in the corresponding bin, then it would be better to confine its movement to a space or, even better, to a set of predetermined movement types (grab, move grabber to bin, etc.), in order to prevent the AI from, say, grabbing a human and putting it in the garbage bin. This will also make the AI easier to train, as it will have a stricter data set of more specific inputs, which is easier to learn from than a wide range of data.
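A minimal sketch of that "predetermined movements only" idea (the action names and the sorting scenario are hypothetical, not a real system):

```python
# Constrain the policy to a closed whitelist of moves: anything not in the
# enum (e.g. "grab a human") simply isn't representable in the action space.
from enum import Enum, auto

class Action(Enum):
    GRAB_ITEM = auto()
    MOVE_TO_TRASH_BIN = auto()
    MOVE_TO_RECYCLING_BIN = auto()
    RELEASE_ITEM = auto()

def execute(action: Action) -> None:
    # The controller accepts only whitelisted actions.
    if not isinstance(action, Action):
        raise ValueError("action outside the predetermined set")
    print(f"executing {action.name}")

for step in (Action.GRAB_ITEM, Action.MOVE_TO_RECYCLING_BIN, Action.RELEASE_ITEM):
    execute(step)
```

The tradeoff is that the whitelist is itself a specification, and choosing it so that no sequence of allowed moves is harmful is the hard part.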
@adamrak7560 5 months ago
I have heard about a pretty morbidly funny failure of this kind in science fiction: the AI decided to cremate the entire home with the entire family inside, and atomically rebuild them, because in the cost function this rated higher than simply cleaning the house. It reprinted the humans faithfully too, without them noticing anything, so this bypassed any do-not-harm-humans rules as well. (The cost function rewarded the atomically precise cleanliness of the home very highly, which was impossible to achieve while humans were living in the house.)
@Buglin_Burger7878 5 months ago
We shouldn't have children, as they could potentially kill the mother at birth and grow up to become a mass murderer. Even the big example would be pointless; people would do stupid stuff and get themselves killed, so you're better off not wasting money and resources on the Bin AI when we ourselves could just put things in the right bin.
@stumby1073 5 months ago
Looking forward to the next one
@MrAceCraft 9 days ago
I just love the ingenuity of the AI in finding those quirks in our wishful thinking :->
@Shikogo 5 months ago
I have watched and loved these videos for months... And so have I watched and loved Robert Miles' videos. I never realized he's the narrator!!?
@pingozingo 5 months ago
This channel is so awesome! Can't wait for more videos. It's like Kurzgesagt without the morally dubious sponsorships and thinly veiled propaganda videos.
@thefinestsake1660 2 months ago
We already have this issue with humans. The goal for many (in error) is to acquire wealth, rather than fulfill the task intended to better society. It creates an exploitative feedback loop until someone wins all the wealth and there are no other competitors able to acquire wealth (rewards).
@the23rdradiotower41 4 months ago
I heard that during a digital combat simulation for a new drone A.I., the A.I. was tasked with eliminating a target as fast as possible. Instead of flying to the target and firing one of its missiles at it as intended, the drone fired one missile at the friendly communications center and then eliminated the target with the other missile. The A.I. determined it would take longer to be given a confirmation order than to destroy the communications center and proceed. Terrifying.
@eltiolavara9 3 months ago
jesus
@ziggyzoggin 5 months ago
the robot is so cute! I love the pixel effect!
@TheGoldElite9 4 months ago
I thought I recognised your voice, your narrator voice has improved! I was just going on (another) binge of your channel 😊
@SisterSunny 5 months ago
I always love these videos so muchhh
@MM-ts9jy 5 months ago
Hey, I had never seen your videos before, but I instantly subscribed just now. Your animations are cute and well crafted, you have dogs in it (and cats are a plus too I guess), and you talk about topics I like. Looking forward to seeing more of your shit
@smitchered 5 months ago
Faster and faster upload scheduling! I was explaining to a friend today that all the AI risks *he* cared about (gender bias, deepfakes, etc.) were fundamentally symptoms of misalignment, and that that was the uber-problem which, handily, also solved the AI risk *I* care about. I'm here to learn some more about this. Thanks!
@user-ow2yr4nu4z 4 months ago
The thought pump makes me think about making deals with Genies in DnD; it must be insanely accurately worded.
@rmt3589 3 months ago
This is the entire ulterior motive of the first big AI I want to make: the Unliving Prophet AI. Its primary objective is to teach gospels. More than just mine, but others as well. Unlike most humans, AI can be perfect. I want one that can act like a prophet on command. Once this is done, I want to make it into the morality part of my dream AI. I could also give it out as a black-box component, so other AIs can have a similarly high standard of morality.
@thelotus137 5 months ago
*task misspecification* extinction event
@mikaeus468 5 months ago
Instructions unclear, ball stuck in Pope's trachea
@MikhailSamin 5 months ago
Great video!
@zyansheep 5 months ago
5:07 I've been watching this channel for a year now... HOW IS IT THAT I JUST NOW REALIZED ROBERT MILES IS THE NARRATOR?!?
@mikaeus468 5 months ago
I didn't know if this was like a fan of his or what, but it feels like I was just given hours of new Miles content that was *already inside my brain.*
@luuizafernandes 5 months ago
Amazing video! ❤️
@simonstrandgaard5503 5 months ago
Excellent narration. Cute animations. Impactful.
@ABCWarrior 5 months ago
Wow these videos are underrated!
@escher4401 5 months ago
I think the problem is trying to specify only what we want. If we also specified what we don't want, it would be easier to align. That's what negative prompts are for. Trying to solve an open-scope problem by specifying just what we want is like trying to keep an upside-down pendulum in equilibrium. I think it's probably more stable to specify what we don't want than to specify only what we do want
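A sketch of what "specify what we don't want" can look like as penalty terms in a reward function (the weights and variable names are arbitrary illustrations):

```python
# Positive term for the goal, explicit negative terms for forbidden outcomes.
def reward(task_progress: float, humans_harmed: int, property_damaged: float) -> float:
    r = task_progress              # what we want
    r -= 1000.0 * humans_harmed    # what we don't want, heavily penalized
    r -= 10.0 * property_damaged
    return r

# Finishing the task while causing damage scores worse than a slower, safe run.
print(reward(task_progress=1.0, humans_harmed=0, property_damaged=0.5))  # -4.0
print(reward(task_progress=0.6, humans_harmed=0, property_damaged=0.0))  # 0.6
```

The catch, per the video: each penalty term is itself a specification, and an optimizer searches for bad outcomes the penalty terms fail to cover.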
@JayantKumarZ 5 months ago
this is amazingly amazing! :O
@Tangi_ENT 5 months ago
Love you guys so much, I'll keep recommending your videos to everyone because you are definitely changing the world for the better.
@errorbot 5 months ago
Top 10 best videos on the internet
@ZeroOne-01 5 months ago
Before 200,000 gang, Claim your seat here ✋
@minimasterman2 5 months ago
This video was amazing; new Kurzgesagt just dropped. P.S. I hope you get the subs and views these videos deserve
@alexeymalafeev6167 5 months ago
Really great work with the animation and the video!
@carljoosepraave2102 5 months ago
If you are wondering why we can't just tell them not to cause any harm to humans, it's because of 2 things: 1. Specification gaming of the rule. 2. Remember DanGPT? The workaround for ChatGPT, which allowed the AI to do things that it wasn't allowed to do through a specific prompt. No machine learning rules can be concrete
@ZizzleTheKakapo 8 days ago
Honestly it sounds odd, but the cartoon Gumball showed this very well. The AI known as Bobert was commanded not to harm anyone, and yet found ways around it, including using toxic gases
@kainaris 5 months ago
We really live in the future. I would have imagined this video playing in the background of a movie about killer AIs. But no, this video is realistic, and for real humans in the present world. Crazy.
@Elliemations-hj9uw 5 months ago
Ok but that little thing to represent the AI is adorable…
@KEZAMINE 5 months ago
Animation and topic is AAA quality 👌
@markzambelli 4 months ago
5:33 I feel for the Doctor who has to explain why her request to the AI of, "Make sure Mrs Simpkins' vital readouts remain stable", wasn't supposed to kill her when the AI went with the much more stable 'flatline' as the best choice
@X-SPONGED 5 months ago
5:45 "Fill in the blanks"
>AI fills in the blanks with ink
"Fill in the blanks with words"
>AI fills in the blanks with words from a different language that doesn't correlate with the question
"Fill in the blanks with the correct english words"
>AI fills in the blanks with correctly pronounced words, not relating to the question
"Fill in the blanks with the correct words in relation to the question"
>AI fills in the blanks with a grammatically correct english word that it took from the question
_So on and so forth..._
*_Now imagine the prompt being "fire nukes back when the nuclear warning system goes off"_*
@yuvrajsingh-gm6zk 3 months ago
3:16 well done my boy😂
@theeggtimertictic1136 5 months ago
Clearly explained and animated 😊
@qasderfful 5 months ago
I knew that's you, Robert!
@maucazalv903 5 months ago
5:08 I remember a case in which someone wanted to teach 2 models to box, and they learned to do a weird dance that made the other one fall (?)
@GenusMusic 5 months ago
4:46 This line here unintentionally explains why children cheat in school. Why learn when you can fool the instructor into thinking you've learned? Interesting to see how AI and humans already share some of the same reasoning behind their actions.
@Uthael_Kileanea 4 months ago
What's known as the Cobra Effect is a great example.
@VampireSquirrel 5 months ago
Same thing happens with strict rules at a workplace
@caiookabe 5 months ago
The fact that you went from showcasing Conway's Game of Life to making animations like these shows how much you've grown. Keep it up!
@6006133 5 months ago
I am worried about retention in this video and imagine the average person will click off by second 10. Perhaps that's difficult to avoid given the subject. Tho perhaps there is a way to use less technical/nerdy language and include more of the tactics to get people engaged.
@couldbejake 5 months ago
This is a good video
@TheJysN 5 months ago
Happy to see you are back on AI safety.
@HH-mf8qz 4 months ago
wow great video and nice animations
@thebeber2546 5 months ago
I'll just have my AGI produce paperclips. There's nothing that can go wrong there.
@hydra5758 2 months ago
I'm in an AI Philosophy class; it's identified there as the "Value Alignment Problem"
@erikburzinski8248 5 months ago
Add "for the purpose of _____" (and explain the purpose to the pump)
@pyeitme508 5 months ago
awesome 😎
@mittensfastpaw 5 months ago
Haha! We are all going to die because someone eventually will program one in a lazy manner.
@willhart2188 5 months ago
The inconsistency and loss of control (in moderation) are very helpful when using AI as a tool for making AI art. When you give some of the control over the final result to the AI, you can iterate a lot faster on different ideas and also save a lot of manual work. The base inconsistency, on the other hand, allows for making a lot of smaller and larger variations, from which you can choose or combine the best ones. This works especially well with more abstract art styles, where lines and colors have more freedom to change while still looking good.
@miriamdonahue6188 12 days ago
Sometimes I'll use AI to get ideas for those silly multi-word Rain World names for ancients and iterators, and my method is literally to just cram a bunch of examples in there so it has something to work off of. It's over 600 words long, and most of that is either examples or rules like "don't reference any modern media, don't reference any human-made objects, don't reference any specific species of all domains", etc. It kind of works, actually, but this is only a random language model I found online.
Edit: I'm now motivated to rewrite it; it's not done, but there are over 20 rules ranging from "don't reference religion" to "btw you can use commas".
Edit 2: The remake is finished and
- It is 965 words and 5,773 characters long
- It has 72 sentences, 28 paragraphs and is 3.9 pages long
- It has 26 rules
- There are 72 examples
and to top it all off, it actually freaking works oml
@stevenneiman1554 3 months ago
One other thing I think isn't talked about enough, partly because it's more controversial and partly because it's harder to solve, is misalignment of the people controlling AI. Certainly the results of a powerful AGI which is misaligned with its creators' intent could be very bad, but almost as bad would be the results of an AI which is properly aligned with someone who is either malicious or delusional. For example, someone who wanted to make everyone follow their interpretation of their religion, or someone who wanted to screen for workers who would never quit or unionize no matter how poorly they're treated. And I would say that it's even more likely, because the kinds of people who act like that already occupy a lot of positions of power and have experience obfuscating the way they gained the power they already have.
@shadowreaper8895 5 months ago
animation on this channel has improved almost as fast as AI
@ronigbzjr 5 months ago
So AIs will essentially be like humans only much more capable, powerful and intelligent, growing more and more so until regular humans become obsolete. We're definitely heading to some very interesting times.
@STUCASHX 5 months ago
Multiple AIs with built-in "skepticism" that debate the correct outcome?
@MindmusicArt 5 months ago
I like the credits and that all AI is :3
@ryomaechizen4400 5 months ago
Good video
@nicholasogburn7746 4 months ago
Would you consider the Asimov laws of robotics to be leaky? (To be fair, that is a bit of a loaded question!)
@TheAweDude1 5 months ago
I think it's kind of a mistake to anthropomorphize the "deception" aspect of AI misalignment. The ball-grabbing agent wasn't considering what it was doing as deceptive. It probably didn't even know where the camera was, or even that it was being watched. All it knew was that putting its hand in a certain spot gained it more reward than in other spots, and it just so happened those spots aligned with the camera. If you suddenly moved the camera, the AI would still try to put its hand along that invisible cylinder. When the researchers start giving the AI rewards for placing its hand along a vector between the camera and the ball, the AI then starts to believe that is indeed how it should be given the rewards. Even in cases where it seems like the AI is trying to "deceive" human operators, that often isn't the case. It is simply trying to build a model that predicts what types of rewards it will get, and how to maximize the rewards.
@bullpup1337 4 months ago
The video was NOT anthropomorphizing the AI; that was just in your head.
@AlcherBlack 5 months ago
Is the AI researcher that makes all the basic alignment mistakes modelled after Yann LeCun? I recognize the bowtie!
@superagucova 5 months ago
omg
@LapiDazuli 4 months ago
5:50 The cup tho
@AtZeroDansGames 5 months ago
Super neat topic with amazing visuals, amazing work 🎉🎉🎉
@aidenaune7008 5 months ago
Most of these problems seem really easy to solve. In any program, if you want two objects to make contact, you just check the distance and reward the AI when it reaches 0. Define the center of the top face of one Lego and the center of the bottom face of the other as the two points you are trying to bring to 0, and there you go. For the ball, just use the center and have the number to be reached be the radius, then have the AI rewarded for each segment that accomplishes this. For the cup and ball, just put a plane on the inside of the cup right above the bottom, and have the AI make the number between it and the center of the ball reach or go below the radius, possibly without moving the cup if that is your intention. You can even specify that the distance to the bottom cannot go lower than the radius in order to avoid clipping. Even for the mother in the burning building, set the mother's health as the intended optimization, and have it be rewarded more for future conditions than current ones. Now it knows it needs to prioritize her health in the long run. No matter what it does, it will always choose the outcome that leaves her in the greatest health for as long as possible. Sure, it may decide that her losing an arm is favorable to a situation where she does not, but only if that alternate circumstance injured her more greatly in some other way.

Every single thing we want can be boiled down to a few very simple parameters; I don't even know why this is an issue. Why do you do what you do? Ask yourself that when performing some action, then try to go as deep as you can. Eventually all you will have is a few very simple parameters that evolve to create a very complex behavior. What am I trying to maximize while at work? Well, the surface purpose is money, but what is the goal of having that? Well, to make purchases and pay bills, but why do I want that? Well, to maximize my personal value, but why is that? Well, because value represents benefit, and that is the very fundamental thing I seek to maximize. Sure, it may be hard to even know what people see as being beneficial and how much benefit they see in it, as each individual's mind is unique and thus measures benefit in different ways we cannot possibly understand without a perfect mapping of their mind, but averages are very easy to acquire, especially since that is exactly what prices are designed to represent. I truly have no idea why anyone struggles with this stuff. Telling AI what to do should be extremely easy.
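Here is roughly what the proposed distance-to-zero reward looks like in code (the coordinates and scene are invented), along with why it still leaks:

```python
# Reward stacking by the distance between two reference points: the top-face
# center of the base brick and the bottom-face center of the held brick.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stacking_reward(base_top_center, held_bottom_center):
    return -dist(base_top_center, held_bottom_center)  # 0 is maximal

print(stacking_reward((0, 0, 1), (0, 0, 1)))  # 0.0: bricks actually stacked
print(stacking_reward((0, 0, 1), (2, 0, 1)))  # -2.0: beside the base, penalized
# But the spec says nothing about, e.g., knocking the base brick over so its
# "top face" point comes to the hand: the two points still meet at distance 0.
```

This blocks the specific flip hack from the video, but the new reward is still a proxy over two points in space, not over "a stable stack", so an optimizer can satisfy it in unintended ways.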
@Buglin_Burger7878 5 months ago
Because these researchers have likely never seen a video game in their life. This stuff has been solved for years now, but they'd rather make a $1 program to use than use a $1,000,000 program that actually works.
@theredstonerecognizer9241 5 months ago
How do you not have more subscribers
@Antares0210 5 months ago
This video is awesome, incredible work! Hope to see more of it
@mihaleben6051 5 months ago
Basically: think of everything and all the possibilities.
@snipershotgun4083 4 months ago
Wouldn't want to tell the AI to flip it but because it want to do the opposite if you run few test it will do the right of the wrong for it to connect the parts together
@evilmurlock 5 months ago
5:00 IT WAS HIM THE WHOLE TIME!!!?!?!?!?! No WAY!
@raylo555 1 month ago
The 5D chess move is to give the AI a basic understanding of the "leaky proxy" concept, giving it *Self Doubt.*
@lolishocks8097 5 months ago
Somehow, every time I watch a video about AI safety I get the sense that AI safety researchers must be absolutely terrified of smart or rich people.
@41-Haiku 5 months ago
Why do you say that? Genuine question, I'm confused as to how you would come to that conclusion.
@Favour.A.Emmason-pv1mk 5 months ago
I'm also curious.
@lolishocks8097 5 months ago
@41-Haiku 5:29 Tell me how that doesn't also apply to really smart or rich humans. Rich people can be dangerous, because they have access to vast resources. Really smart people can be dangerous, because they can be selfish. There are rich people, smart people and big companies aligned with the values of humanity. But there are also a lot of them that are not. In my eyes, this whole alignment problem looks like a problem in ourselves. We cannot align ourselves with reality. And that is definitely causing huge problems. A lot of the problems mentioned in the video are being fixed one by one. Better reward function here, better evaluation process there. Alignment with reality is not a goal that can be achieved. It is a guiding principle. Yes, you are misaligned right now. So am I. That makes our intelligence dangerous. But we can take another step towards alignment. Fortunately, I can actually see progress happening.
@Favour.A.Emmason-pv1mk 5 months ago
@@lolishocks8097 We've already seen the drama at OpenAI. I wonder if humans could ever be properly aligned.
@lolishocks8097 5 months ago
@@Favour.A.Emmason-pv1mk That's my point: We can't! It's just one step at a time. Closer and closer. Don't give up on yourself🥺
@FuzzyJeffTheory 5 months ago
I think self-supervised learning with fine-tuning holds much more promise than online RL in terms of safety. Then the reward signal is minimizing prediction error on the entire training distribution rather than a coarse approximation via a specification. Vision-language models can interpret human commands and process the image to tell that the LEGO brick was not stacked.