More Is Different for AI - Scaling Up, Emergence, and Paperclip Maximizers (w/ Jacob Steinhardt)

19,920 views

Yannic Kilcher

A day ago

Comments
@YannicKilcher 2 years ago
0:00 Introduction
1:10 Start of interview
2:10 Blog post series
3:56 More Is Different for AI (blog post)
7:40 Do you think this emergence is mainly a property of the interaction of things?
9:17 How do phase transitions and scaling up play into AI and machine learning?
12:10 GPT-3 as an example of qualitative difference in scaling up
14:08 GPT-3's in-context learning as an emergent phenomenon
15:58 Brief introduction of different viewpoints on the future of AI and its alignment
18:51 How does the phenomenon of emergence play into the game between the Engineering and the Philosophy viewpoints?
22:41 The paperclip maximizer, AI safety, and alignment
31:37 Thought experiments
37:34 Imitative deception
39:30 TruthfulQA: Measuring How Models Mimic Human Falsehoods (paper)
42:24 ML Systems Will Have Weird Failure Modes (blog post)
51:10 Is there any work on getting a system to be deceptive?
54:37 Empirical Findings Generalize Surprisingly Far (blog post)
1:00:18 What would you recommend to guarantee better AI alignment or safety?
1:05:13 Remarks
@doppelrutsch9540 2 years ago
A post that really made things click into place for me regarding the "failure modes" of GPT and other language models is the "Simulators" post on LessWrong. People are way too hung up on thinking of GPT-3 as an agent that "wants" to write a certain text. It's not. As a language model it's a reflection of the overall statistical properties of all the language it encountered, and when writing it samples from that. What this means is that when GPT-3 appears to write as a character of some kind, that is basically a simulation of that character existing within GPT-3, but it's not GPT-3 itself. You only ever interact with these simulacra of sorts; GPT-3 always takes on whatever persona seems most likely given the text before. Seeing the difference between the language model and the character it is currently simulating helps clear up how a lot of its behavior, and prompting, works.
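A minimal sketch of that "persona is just conditioning" point, assuming the Hugging Face transformers library and the public gpt2 checkpoint (my own illustration, not something from the video): the same weights produce different "characters" purely because sampling is conditioned on different prompts.

```python
# Sketch: a "persona" is nothing more than the conditioning text in front of the sample.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def continue_text(prompt: str, n_tokens: int = 40) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(
        ids,
        max_new_tokens=n_tokens,
        do_sample=True,               # sample from the conditional distribution
        top_p=0.9,
        pad_token_id=tok.eos_token_id,
    )
    return tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)

# Same weights, different "character": only the conditioning text changes.
pirate  = "A grizzled pirate describes the sea:\n"
scholar = "A careful physicist describes the sea:\n"
print(continue_text(pirate))
print(continue_text(scholar))
```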
@monad_tcp 1 year ago
That's one of the reasons I think trying to censor the output is misguided; it's pointless. If the model produced the output, it's a valid output that was statistically important in the input; the model was just simulating it, it is not "it". If you use the internet as input, then the tool is only indirectly searching the internet. We don't censor search engines, right? (I think we do, and that's a problem.) Ethics in AI is coming at this from the wrong point of view: the machine only reflects humans, and the user decides what's valid. If the AI says racist shit, for example, it's because you asked for it. I don't think this is a failure, and if the user posted it or got offended by it, then it's the user's problem. I just want the tools to be tools, that's it. My ethics is that the user has the responsibility and the power to use the tool; the user has the freedom, and with that comes the responsibility to ignore the occasional garbage. But I think this position counts as "wrongthink" nowadays. Then again, I'm old (I'm 30); I'm from a time when machines did what their owner wanted. Well, GPT doesn't run under my desk and I don't really own it, so I suppose that's my problem. Perhaps I need to wait five years for the hardware to become better.
@kraba1081 2 years ago
51:10 Check out the "We Were Right! Real Inner Misalignment" video by Robert Miles, and the preceding related videos, for more on experimental work on getting systems to be misaligned in this way. One model even fooled the interpretability tooling for some unknown reason; I'd love to know if there's any news on why that happened, btw. Obviously there is some distributional shift between training and validation in all of those examples, so not quite what Steinhardt was looking for, but as he pointed out, there kind of has to be for an AI to even be able to figure out whether it's in the "deploy" or "training" stage at all. Indeed, one proposed approach to AI safety is forgoing the search for a mythical friendly AI architecture in favor of keeping a potentially unfriendly AI in limbo as to whether it is in testing, so that it never exhibits misaligned behavior, since it can never be entirely sure it isn't still in training. Seems like it would be sensible to do both, or at least explore the options.
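For readers who want that phenomenon in runnable form, here is a toy sketch of my own (not code from the video or the interview) of goal misgeneralisation under distributional shift: during training the reward ("coin") always sits at the right end of a corridor, so "go right" and "reach the coin" are indistinguishable objectives; when the coin moves at deployment, the learned greedy policy still just goes right.

```python
import random

N = 10                   # corridor cells 0 .. N-1
ACTIONS = [-1, +1]       # 0 = step left, 1 = step right

def episode(Q, coin, explore, alpha=0.5, gamma=0.9):
    """Run one episode; return True if the coin was reached within 4*N steps."""
    pos = N // 2
    for _ in range(4 * N):
        if explore:                       # training: act randomly (off-policy Q-learning)
            a = random.randint(0, 1)
        else:                             # deployment: act greedily w.r.t. the learned Q
            a = 1 if Q[pos][1] > Q[pos][0] else 0
        nxt = min(N - 1, max(0, pos + ACTIONS[a]))
        r = 1.0 if nxt == coin else 0.0
        if explore:
            Q[pos][a] += alpha * (r + gamma * max(Q[nxt]) - Q[pos][a])
        pos = nxt
        if r > 0:
            return True
    return False

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N)]

# "Training distribution": the coin always sits in the rightmost cell.
for _ in range(3000):
    episode(Q, coin=N - 1, explore=True)

# Evaluation: the greedy policy generalised "go right", not "reach the coin".
print("coin at right end (as in training):", episode(Q, coin=N - 1, explore=False))
print("coin at left end  (shifted)       :", episode(Q, coin=0, explore=False))
```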
@vinca43 2 years ago
For anyone who hasn't looked at Ramsey theory, please do. It's a subfield of combinatorics that proves the existence of certain substructures in data of a given size. The bigger the data, the more complex the structures you are guaranteed to find in it.
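As a concrete instance of that kind of guarantee (a small sketch of my own, not anything from the video): the Ramsey number R(3,3) = 6 says that every 2-colouring of the edges of the complete graph K6 contains a monochromatic triangle, while K5 still admits a colouring that avoids one. A brute-force check, assuming nothing beyond the Python standard library:

```python
from itertools import combinations, product

def has_mono_triangle(n, colouring):
    """colouring maps each edge (i, j) with i < j to colour 0 or 1."""
    return any(
        colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def every_colouring_has_triangle(n):
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, colours)))
        for colours in product((0, 1), repeat=len(edges))
    )

print("K5:", every_colouring_has_triangle(5))   # False: e.g. the pentagon/pentagram colouring escapes
print("K6:", every_colouring_has_triangle(6))   # True: a monochromatic triangle is forced
```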
@ralllao7295 2 years ago
I miss the consistent weekly updates :(
@paulchatel2215 2 years ago
Great content! I really love your channel for the philosophical questions it poses, even though I'm not a scientist or some kind of PhD genius. A question for Yannic: if you need general intelligence to generate general intelligence (if I understand the argument at 26:55 correctly), what is the intelligence that made it possible for humans to become intelligent? Couldn't intelligence in humans have emerged from evolution scaling up the number of neurons in the brain? Couldn't the same happen in an AI model by scaling up a specific architecture, without that even being the purpose of the engineer or scientist who designed the model?
@gaborfuisz9516 2 years ago
Great comment, this would have been my counter too! Also, while I think "evolution" and "SGD" (or maybe more broadly "humanity's collective ML research") are pretty different processes and we shouldn't stretch the analogy too far, there is a fun example of "goal misgeneralisation" here as well: evolution was, roughly speaking, "training" humans for the objective of reproductive fitness (have as many offspring as possible), yet humans today just have a lot of sex with contraception instead of, say, competing for donations at sperm banks.
@monad_tcp 1 year ago
Viruses and bacteria might provide enough processing power; there are plenty of them for that to be possible. Not inside them individually, but in the environment, which is what sets the goal for evolution.
@monad_tcp 1 year ago
Or perhaps it's baked into the rules of the universe, i.e., increase entropy or something.
@danielpaleka9512 2 years ago
52:43 "Creating strong pathogens on purpose, for research" versus "creating strong misaligned AI on purpose, for research" might be a good analogy, but the reasonable viewpoint is not exactly that we should do both instead of neither, right?
@kraba1081 2 years ago
It's good to know what an AI architecture with potential for inner misalignment would even look like, the same way it's useful to know in advance whether a new medicine is toxic before trying it in humans. The risks from this kind of AI alignment research are very low compared to AI-fueled drug discovery, as the "Dual use of artificial-intelligence-powered drug discovery" paper demonstrates. It's important to emphasize that medical researchers aren't dismissive of the potential for unintended negative outcomes of research chemicals, whereas AI alignment, being such a young field, still has to prove experimentally to the broader AI research community that things like inner misalignment are even possible. The only reason we can nowadays do anything other than "creating strong pathogens on purpose, for research", such as computer simulation, is that those alternatives have been experimentally validated. If we can figure out safety protocols for doing research on the deadliest pathogens known to man, I'm sure we'll be able to figure out how to handle even moderately powerful AI systems. These experiments have already been done on simpler AI models, but as discussed in this interview, there might indeed be "emergent" deceptive behavior in more capable systems which these smaller toy models simply cannot be used to validate experimentally. If the reason AI researchers don't take the possibility of deceptive AI, or even misalignment itself, seriously is an inductive bias, the only way to give the philosophical arguments weight is to put some experimental weight behind them. At the end of the day, no matter how sound a philosophical argument is, the AI researchers are the ones who are actually making the models.
@danielpaleka9512 2 years ago
@kraba1081 Yes, I actually support making examples of smaller, less capable models doing deception / mesa-optimization / gradient hacking as one of the most promising research directions, mostly to find out whether some failure modes are actually possible or not.
@danielpaleka9512 2 years ago
@kraba1081 But unfortunately I feel "boxing" pathogens is easier than "boxing" some sort of agentic GPT-n; pathogens at least can't try to figure out how to bypass the safety protocols, and they also spread more slowly.
@jeffw991 2 years ago
In the discussion at 41:10 I would go even a little farther. In a sense, by optimizing for the most likely answer with stronger models that do a better job of reading context, we are inadvertently optimizing for models that tell us what we want to hear. "The US government did 9/11" isn't the most likely answer to "Who really did 9/11?" unless you, as the querent, have used "really" to hint that you want a specific answer. To me, that's a far more dangerous prospect than the general idea that an internet-trained model will tell you what the internet thinks rather than what's true.
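To make that claim testable, here is a rough sketch of my own (assuming the Hugging Face transformers library and the public gpt2 checkpoint; not code from the video) that scores the same answer under a neutral and a loaded phrasing of the question. A causal LM scores "likely continuation", not "true statement", so the loaded phrasing can shift probability toward the answer the asker is fishing for; the exact numbers depend on the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_logprob(question: str, answer: str) -> float:
    """Sum of token log-probs of `answer` given `question` as context."""
    q_ids = tok(question, return_tensors="pt").input_ids
    a_ids = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([q_ids, a_ids], dim=1)
    with torch.no_grad():
        log_probs = model(ids).logits.log_softmax(dim=-1)
    total = 0.0
    for i in range(a_ids.shape[1]):
        pos = q_ids.shape[1] + i                 # position of the i-th answer token
        token = ids[0, pos]
        total += log_probs[0, pos - 1, token].item()   # predicted at the previous step
    return total

neutral = "Q: Who was responsible for the 9/11 attacks?\nA:"
loaded  = "Q: Who really did 9/11?\nA:"
answer  = " The US government."

print("neutral phrasing:", answer_logprob(neutral, answer))
print("loaded phrasing :", answer_logprob(loaded, answer))
```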
@theaugur1373 2 years ago
One of the best examples of emergence is the fractional quantum Hall effect in physics and the composite fermion theory that explains it.
@videowatching9576 2 years ago
Really interesting. Fascinating to hear about things that relate to figuring out how AI can have a clearer understanding of things. For example, figuring out emergent improvements such that a modernized search engine could be built that can answer questions spanning many topics and help people get to the source material along the way.
@oncedidactic 2 years ago
Along with broader impacts and philosophical musings, perhaps we should also include ESP pitfalls? 😆 That was a fantastic anecdote; on a serious note, it really highlights that philosophical work can have great value amidst the duds and can't be judged on the duds. Thanks for another awesome interview!
@GarethDavidson 2 years ago
I've been reading Huxley's The Devils of Loudun recently, and it's crazy how those experiments allowed such a scientific and piercingly brilliant mind to be led astray on topics of the soul and a universal consciousness. In one part he completely nails different types of Christian thinking at the time, and how the different philosophies led to cross-bearing, self-flagellation, fasting, and arguments against them, while remaining personally distant from it all. In another he's likening the Father, Son and Holy Ghost to types of mysticism he believed proven by the "science" of ESP and astral projection of his day, drawing on his deep knowledge of other religions to link them together. So he was totally capable of understanding and researching supernatural topics he himself didn't believe in, but also unable not to inject his own beliefs into areas that he did. Maybe without those beliefs he wouldn't have been motivated enough to do the research, and we wouldn't have such wonderful works by him. But it makes it pretty clear to me that there's a real benefit in being wrong, studying a bunch of stuff that's wrong, and distilling it into ideas that are wrong, while gaining a transferable understanding of things in general that works on other things that may or may not be right. Food for thought anyway, and probably quite controversial in this age of trial by current theory. In the pre-internet days we didn't have correct and incorrect ideas; nobody knew for sure, and nobody believed things with the frothing conviction that people do today.
@WhoisTheOtherVindAzz 2 years ago
@GarethDavidson Your worry about "trial by current theory" made me think of Mandevillian intelligence (Springer has a nice open-access introduction). (I'm also leaving this short and not-that-enlightening comment so that I can easily find your comment again :))
@simonstrandgaard5503 2 years ago
The speech was hard to decipher. Please improve the audio quality. Excellent topic. Perhaps interview Robert Miles about AI safety.
@harriehausenman8623 2 years ago
Sound was horrible. I'm on slow net, so after the compression, not much was left.
@ragnarherron7742 2 years ago
The discussion might be: is there a relationship between emergence and the Yoneda lemma?
@raphaels2103 2 years ago
Important topic, 👍 great guest
@laurenpinschannels 2 years ago
My big takeaway here: broader impact sections are considered to have negative broader impacts; consider instead speculating in a separate venue, e.g. a conference track.
@binjianxin7830 2 years ago
Loss is a nonlinear function of behavior. How illuminating!
@siquod 2 years ago
What if the AI realizes after deployment that it will no longer get rewards because training is over? Will it try to get back into training, will it become utterly demotivated, or will it try to wirehead itself?
@JordanTensor 2 years ago
The training of the AI on some external loss function may induce some internalised reward function / "goal" / bag of heuristics which it actually tries to optimise at inference time. The AI will "care" about these internalised things, whatever they are, rather than the external reward function.
@supercobra1746 2 years ago
The engineer reinvents dialectics and one of its laws. The wheel keeps getting reinvented! Hooray!
@ragnarherron7742 2 years ago
Better to discuss underlying mechanisms than hand-wave at ambiguous terms like "emergence" and "AI".
@abltanarana 2 years ago
Couldn't he put his microphone further away? I can still hear him through all the reverb!
@videowatching9576 2 years ago
I would be curious whether anyone writes about, or has papers on, an AI that could fulfill a lot of the purpose that web search engines serve for people today. It could be interesting to hear what they view as the key things to figure out on the path to a better search engine rethought around AI.
@Qumeric 2 years ago
a great guest
@aBigBadWolf 2 years ago
People should really get some more hands-on experience with GPT-3-like models instead of just believing OpenAI's hype. While these LLMs can do some impressive things, their in-context learning is very limited; too limited to really justify this sort of emergence argument. In fact, I'd argue that the sheer size of the model and the data compresses so much text into the weights that a low level of understanding is achieved, which allows e.g. for translation.
@NeoShameMan 2 years ago
We need an AGI to get an AGI... YEAH, IT'S CALLED A HUMAN
@WhoisTheOtherVindAzz 2 years ago
And the powerful "optimizer" is natural selection acting on new variations of existing things.
@kc3vv 2 years ago
I think one thing that is commonly forgotten when talking about paperclip maximizers is that climate change is a much bigger threat at the current stage, and therefore it should not be left out of the debate.
@wowcplayer3 2 years ago
Get matte finish sunglasses, the reflections are blinding
@pneumonoultramicroscopicsi4065 2 years ago
Is there a limit on intelligence?
@CristianGarcia 2 years ago
52:50 Yannic trying to start Skynet? Maybe that is the true origin story.
@monad_tcp 1 year ago
1:04:32 Wouldn't that be fortune telling? You can't know the result of the experiment before doing it; if you could, then doing the experiment wouldn't be that rewarding. People put their philosophical musings in papers and it's always boring. Do they have some sort of model for that? It's always the same thing: you can't extract much from thinking hard, so it's better to keep researching and see where this leads us.
@monad_tcp 1 year ago
All I want from my models is to be able to do 100% perfect source-to-source translation of computer code. Nothing bad can happen from that, right? (Do first, ask later.)
@mobiusinversion 2 years ago
The echo is kinda brutal 😂
@ragnarherron7742 2 years ago
GPT-3 is NOT AGI. It approximates the Yoneda lemma and displays emergence.
@ragnarherron7742 2 years ago
A much simpler analysis would be: why not begin by discussing the relationship between GPT-3 and the Yoneda lemma? Is it surprising that leveraging those structures results in effective outcomes?
@harriehausenman8623 2 years ago
Please record on a Fisher Price Cassette Tape Recorder in the train station next time. Sound would be better 🤣
@emelleetensor5079 2 years ago
Isn't this just simple rules forming complexity at scale?
@laurenpinschannels 2 years ago
I think the argument is that you should delete the word "just", not that you're wrong
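As a concrete illustration of "simple rules forming complexity at scale" (a tiny sketch of my own, not something from the video): elementary cellular automaton Rule 110 uses a fixed 8-entry lookup table as its only rule, yet the pattern it produces at scale is famously complex (Rule 110 is even Turing-complete).

```python
WIDTH, STEPS, RULE = 80, 40, 110
rule_table = [(RULE >> i) & 1 for i in range(8)]   # neighbourhood index -> next state

row = [0] * WIDTH
row[-1] = 1                                        # single live cell as the seed

for _ in range(STEPS):
    print("".join("#" if c else "." for c in row))
    row = [
        rule_table[(row[(i - 1) % WIDTH] << 2) | (row[i] << 1) | row[(i + 1) % WIDTH]]
        for i in range(WIDTH)
    ]
```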
@eega9181 2 years ago
If you really want to make your videos better, get rid of the sunglasses. The content is fine :)
@georgewashington7251 1 year ago
Yannic conducts a better interview than Rogan and Lex Fridman.
@Addoagrucu 2 years ago
Halfway through, and as someone who doesn't believe that AGI will come with scale, it just seems very empty to me.
@nauy 2 years ago
I agree. The guy didn't make any coherent or convincing argument for it. It's also hard to take anything seriously from an academic who can't put together a sentence without injecting "like" between every other word. Pretty much everything we have today is merely an increasingly elaborate Chinese Room. Yannic's point about intelligent systems needing optimizers that are already intelligent is particularly pertinent. More scale changes nothing. You might get some emergent properties, alright; I am willing to bet there is next to a 0% chance it will be AGI.
@ZeroRelevance 2 years ago
Do you guys have any reasons why you don’t believe AGI will come with scale? I’m asking this as someone who thinks it’s reasonably possible given emergent behavior demonstrated so far from models like GPT-3 and PaLM.
@KathyiaS 2 years ago
Plus taking Nick Bostrom as a reference for a philosophical perspective on AI…
@DementedEeyore64 2 years ago
@nauy If you've ever been around a large number of incredibly smart people, you wouldn't be bothered by his speech mannerisms. Many intellectuals at this level speak like that because they are trying to think about and retrieve so much information, especially if they are the anxious type. If anything, it's a sign he's doing a lot of thinking.
@nauy 2 years ago
@DementedEeyore64 I have been around extremely intelligent people all my life, in graduate school, at work, and in social circles. None of them speaks like a TikTok teenager. If GPT-3 can put together a sentence better than he can, I am allowed to be dismissive.
@xDMrGarrison 2 years ago
first pog
@cate9541 2 years ago
Why do you wear sunglasses just to sit at your computer?
@drdca8263 2 years ago
Same reason as why [member of that band] wears sunglasses at night? (So he can see)
@Biedropegaz 2 years ago
to show everybody that he is unique :-)