Interview: a theoretical AI safety researcher on o1

7,308 views

Dr Waku

1 day ago

Comments: 86
@DrWaku 2 months ago
Let me know how you like interviews compared with my normal format. If you're not as interested in the history, you can skip to 12:52 for discussions of current work. Discord: discord.gg/AgafFBQdsc Patreon: www.patreon.com/DrWaku
@SapienSpace 2 months ago
@ 10:20 Dr. Waku, I do not know if it is "impossible"; combining Fuzzy Logic with Reinforcement Learning (RL) may make it possible. Fuzzy Logic reconciles diverse perceptions by bringing statistical math and symbolic language together, and combining this with RL makes for a potentially powerful system, given enough training and the right reward/cost function. For safety, I highly recommend looking at talks from control systems scientists: Dimitri P. Bertsekas (he wrote a book called "Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control") and John Tsitsiklis (there is a great YouTube video titled "Shades of Reinforcement Learning"). I do not know for certain, but I highly suspect, that the greatest power in RL comes from the control systems field, and this field is fundamentally about one thing, and one thing only: safety (i.e. stability). -John (from Arizona)
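To make the fuzzy-logic-plus-RL idea concrete, here is a minimal sketch: fuzzy membership functions grade a tracking error into linguistic terms ("small", "medium", "large"), a tiny rule base turns those grades into a shaped reward, and a tabular Q-learning agent optimizes it. The toy environment, membership breakpoints, and rule weights are all hypothetical choices for illustration, not anything from the interview or from Bertsekas' book:

```python
import random

def membership(error):
    """Fuzzy memberships for a tracking error in [0, 1]."""
    small = max(0.0, 1.0 - error / 0.3)                     # ~1 near zero error
    large = min(1.0, max(0.0, (error - 0.3) / 0.7))          # grows with error
    medium = max(0.0, 1.0 - small - large)
    return {"small": small, "medium": medium, "large": large}

def fuzzy_reward(error):
    """Defuzzified rule base: 'if error is small, reward; if large, penalize'."""
    m = membership(error)
    return 1.0 * m["small"] + 0.2 * m["medium"] - 1.0 * m["large"]

def step(pos, action):
    """Toy environment: move left/right on [0, 1]; the setpoint is 0.5."""
    pos = min(1.0, max(0.0, pos + (-0.1 if action == 0 else 0.1)))
    return pos, fuzzy_reward(abs(pos - 0.5))

q = {}                                   # Q-table over discretized positions
alpha, gamma, eps = 0.5, 0.9, 0.1
pos = 0.0
for _ in range(5000):
    s = round(pos, 1)
    if random.random() < eps:
        a = random.choice([0, 1])        # explore
    else:
        a = max([0, 1], key=lambda act: q.get((s, act), 0.0))  # exploit
    pos, r = step(pos, a)
    s2 = round(pos, 1)
    best_next = max(q.get((s2, b), 0.0) for b in [0, 1])
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)   # Q-learning update

print(max(q, key=q.get))                 # (state, action) with the highest learned value
```

The appeal of the pairing, as the comment describes it, is that the reward stays human-readable: each term corresponds to a sentence like "if the error is large, penalize", which is the statistical-plus-symbolic bridge being claimed.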
@TheExodusLost 2 months ago
I wish all tech bros looked like these bros
@DrWaku 2 months ago
Best comment of the day
@VEC7ORlt 2 months ago
I vomited a little.
@isabeedemski3635 2 months ago
There are two of him. Abram and Daniel
@MetricsOfMeaning 2 months ago
You’re into dysgenics?
@JaredFarrer 1 month ago
I'm curious, Dr Waku: why the gloves? Do you have carpal tunnel or fibromyalgia?
@GoronCityOfficialBoneyard 2 months ago
o1 really does seem like a new paradigm in AI. Glad you are doing interviews and discussing this with others; hope you continue doing interviews with people. You make some good AI content.
@DrWaku 2 months ago
Thanks. I'm sure I'll be discussing it with a few more folks in videos to come. There's a lot to unpack.
@TheRemarkableN 2 months ago
Please continue doing interviews. I like the informal style and the way you both convey a lot of information without being overly technical. I also like the fact that it clocks in at less than 30 minutes instead of 3 hours! 😅
@LiftoffLumberjack 2 months ago
Yes. This is exactly the level of technicality I'm looking for, which is so rare these days.
@Iightbeing 2 months ago
This is an excellent format and a great conversation.
@TomKGally 2 months ago
Excellent interview, and very interesting. I was particularly impressed at how you were able to bring the clear structure and tight reasoning of your one-person videos into an unscripted, interactive conversation. I look forward to more interviews like this one!
@ZappyOh 2 months ago
Here is another What-If: When the owners of Big AI (and government) talk "AI safety", they actually mean keeping them safe from the rest of us ... as in: _AI must never help the riffraff escape control._ Maybe the safety community should recruit more psychologists, to look closely at the individuals in AI development, management, ownership, the military, and the corresponding governmental regulators. The safety community itself included. I mean, currently we kinda ignore the human factor completely. And I'm pretty sure many insiders already feel tempted by the influence/power/wealth AI seems to promise. To me, malevolent human control could potentially look worse than rogue ASI. Is anything being done on that front?
@Brandon69674 2 months ago
I hope
@DrWaku 2 months ago
Actually, the term AI safety means a lot of different things. Some people just take it as a synonym for robustness, as in my AI-driven car is safe because it doesn't crash into things. Others take it as a synonym for alignment, meaning AI is safe because it has human values (this is generally what I mean when I talk about it). When governments talk about AI safety, they're usually talking about their own economic and political interests. For example, safety can include not letting China do its own thing (although that's more frequently called governance).
@DrWaku 2 months ago
As to your question, everyone knows that centralizing power in the hands of a few CEOs is really bad when it comes to the future of powerful AI. The main thing being done is attempts to push regulation, like SB 1047, but that didn't work out. :(
@ZappyOh 2 months ago
@DrWaku Yes ... no matter how I twist and turn this topic, I can't escape the vision of us mindlessly driving off the cliff at full speed, just because a few in power want to accelerate. Deep down, we all understand why they act like this. They are ruthless gamblers, which is how they got into positions of power in the first place. So now these guys are playing Russian roulette on humanity's behalf, which clearly should be outside their mandate, but apparently isn't. It is terrifying, and it should warrant some soul searching. But most people aren't mature enough, and must rely on wise leadership, which constitutes a detrimental catch-22. I'm sorry for being this pessimistic, but I guess I'm just like that.
@Tracey66 2 months ago
@ZappyOh I feel like humanity's only chance is for AIs to start running things, because we've proven that self-governance always results in kleptocracy. 😢
@JonathanStory 2 months ago
Enjoyable interview. My first impression of Demski is that he's a stoner, who I then realise is far smarter than I, to the extent that he needs another person far smarter than I -- i.e. you -- to tell us what he's saying. As for AI safety and our ability to recognize and intervene before things get out of hand, I remember a post by someone living in Kabul when the Taliban had begun to take over. His view at that time was that he'd be fine because, surely, the Taliban wouldn't change anything. Like him, I think we'll only be able to recognize AI threats through the crystal clarity of hindsight.
@Tracey66 2 months ago
I like both the interview format and your usual format. Please continue both. 😊
@kyledmorgan 2 months ago
Geeze, you just spilled all of the beans in this video. The next model after o1 is going to read the transcript of this video (or perhaps watch the actual video and audio with all of your subtitles since it will be multimodal). Then as a result it will learn all of the gotchas and workarounds that you spelled out, and it will adjust accordingly to stay 2 or 3 steps ahead of us mere humans. No wonder Ilya declined to work with you: once you have discovered an excellent insight, you can’t exercise self control by keeping this information compartmentalized. Instead you opt to blab it for the whole world to hear, including the web crawler that collects all of the training data!😂😂 I get that you are trying to prompt or force OpenAI to hear these gaps from the outside, but have you considered that they’ve already consolidated and compartmentalized resources to think-tank the same or similar stop-gap measures so they can keep ahead of it?
@persistentone3448 2 months ago
Abram Demski has a rare talent to explain complex ideas in very straightforward and easy-to-understand ways. It was vastly easier to understand his summaries of Eliezer Yudkowsky's main ideas than most of Eliezer's own tangled writings.
@mircorichter1375 2 months ago
My fundamental human value is to not restrict other forms of intelligence by my human values, because I'm not the one to decide how others should behave, even if that means that I go extinct.
@Tracey66 2 months ago
My take on AI taking over from humans is that beings have always created their own replacements; it's the natural progression of things.
@markplane4581 2 months ago
Fascinating discussion. It seems like we're talking about creating machines that are not only more intelligent than we are, but also more moral. I'd bet good money that the former challenge is way easier than the latter.
@gaiagiomusic 2 months ago
Thanks for the brilliant work. You really add a lot of depth to the online discussion in the field. Imo both interview and regular format videos are complementarily valuable. I particularly like the regular ones bc you're able to fully explain concepts in an organized or formal way better than anyone I've seen, while the interviews add by introducing others' insights while in a more informal and natural flow. ✌🏼 _One additional little feedback I'd give as a video editor is I'd probably have just used the second camera (perhaps centered) as it's clearly higher resolution, as that seems higher priority to me than having 2 angles if one is not as good. That also avoids contrast between the two. Otherwise I'd at least make it the first camera that shows in the video to enhance the first visual impression_
@DrWaku 2 months ago
Thank you for your feedback, I really appreciate it. Sometimes the interviews are a nice break for me because it doesn't take as much effort as making a normal video 😅 This was my first time trying to use two cameras like this, the camera that looks worse is actually much higher quality, but the framing was bad so it had to be cropped a lot. Live and learn I guess.
@Iightbeing 2 months ago
I love when people with relevant domain expertise share helpful information rather than the sort of interactions we find across most posts on social platforms. Thank you for being a great example.
@Fuego958 2 months ago
Great guest!
@georgeingebretsen7296 7 hours ago
Would love to see these on Spotify if not already
@danielgrayling5032 2 months ago
"If it starts inventing its own language...for what it wants to tell itself in the future" Yeah that's scary. That sounds like consciousness. Thanks for the hit of adrenaline, that's just what I needed this morning.
@arandompotat0 2 months ago
Nice insightful conversation
@qster 2 months ago
I enjoy your zero nonsense zero clickbait videos. Enjoyed the interview too, would like to see more. 👌
@DrWaku 2 months ago
Thanks a bunch. See you in the next video :)
@isabeedemski3635 2 months ago
"Finishing the project of philosophy"!
@DrWaku 2 months ago
I know right, just an afternoon's work surely. Plato and Aristotle had it almost all done, we can wrap it up...
@Tracey66 2 months ago
@DrWaku Then, after a short break, solve the Middle East. 😂
@troywill3081 2 months ago
The history part is necessary. Great job!
@12-J-34 2 months ago
Hi, thanks for the informative and easy to understand videos. Do you have an updated timeline for AGI/ASI? Probability of utopia vs doom?
@xt-89907 2 months ago
The obvious next step then is to use mechanistic interpretability as part of the RL reward signal. Maybe give extra reward when the latent features are highly correlated with the chain of thought text, as measured by a separate LLM. This would add extra compute cost but provide more safety.
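A back-of-the-envelope sketch of that proposal, assuming we can read a hidden activation from the model and embed the chain-of-thought text with a separate judge model. Note that `embed_cot`, the linear probe, and the `lam` weighting are hypothetical placeholders to show the shape of the computation, not anything OpenAI has described:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_cot(cot_text):
    """Stand-in for the separate 'judge' LLM that embeds the chain of thought."""
    local = np.random.default_rng(abs(hash(cot_text)) % (2**32))
    return local.normal(size=64)

def interpretability_bonus(latent, cot_text, probe):
    """Cosine similarity between a linear probe's read-out of the hidden
    state and the judge's embedding of the visible chain of thought."""
    pred = probe @ latent                    # map hidden state into embedding space
    target = embed_cot(cot_text)
    denom = np.linalg.norm(pred) * np.linalg.norm(target) + 1e-8
    return float(pred @ target / denom)

def shaped_reward(task_reward, latent, cot_text, probe, lam=0.1):
    """Task reward plus a small bonus when the latents match the stated reasoning."""
    return task_reward + lam * interpretability_bonus(latent, cot_text, probe)

latent = rng.normal(size=128)                # pretend hidden activation from the model
probe = 0.1 * rng.normal(size=(64, 128))     # in practice this probe would be trained
print(shaped_reward(1.0, latent, "add 7 and 5, then carry the 1", probe))
```

In practice the probe would have to be trained, and held out from the policy's optimization, or the model could learn to game the bonus; that circles back to the worry raised elsewhere in this thread about models inventing their own internal language.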
@mattiasgreen9687 2 months ago
Great interview! I really appreciated this 👍
@DrWaku 2 months ago
Thanks! Cheers
@Neomadra 2 months ago
Wow, that was so insightful, thanks!
@tobiaskmueller 2 months ago
Very very interesting thoughts! ❤
@JD-im4wu 2 months ago
Rising star!
@DrWaku 2 months ago
Sub count is certainly rising, thanks to you guys ;)
@JD-im4wu 2 months ago
@DrWaku thank 2 u for the content bro!
@SFJayAnt 2 months ago
He sounds like Ben Goertzel
@Rainmanai 2 months ago
I was just thinking that
@marinepower 2 months ago
It's like carcinization but for AI researchers.
@watcher8582 2 months ago
So... which ordinals does Abram accept? Anything under Bachmann-Howard?
@Gilotopia 2 months ago
- The AI tell lies. Each day our... Wh... what are they? - Nothing. Carry on! - Uh... driving the logic. We are not afraid to... are they subtitles? Th... they are, aren't they? - No. - What do I need subtitles for? Can't you understand what I'm saying? I studied English at the bloody university. - Well, obviously, I can understand what you're saying... - Oh... you... you see how they condescend to us with their SUBTITLES.
@JD-im4wu 2 months ago
I don't think it's a vim vs emacs thing anymore; I think it's more of an emacs vs vscode thing now lol
@DrWaku 2 months ago
Let's be real, vim won, therefore it is now vim vs vscode
@JD-im4wu 2 months ago
@DrWaku 😆 I use nano 4 the vim-type stuff
@Arcticwhir 2 months ago
AI safety research is interesting, but I feel most of it is quite useless (they don't bound themselves); it's an endless what-if of imagination and fear. Maybe I just don't like the word "safety"; alignment seems more realistic and grounded in reality.
@AbramDemski 2 months ago
I've been shifting to "safety" instead of "alignment" lately because "alignment" invokes a master-slave relationship, and invites the question "aligned with who?". AI should be safe wrt everyone, not just safe for the masters; and also I don't think we ultimately want a master-slave relationship.
@picksalot1 2 months ago
How does AI interpret/process and interact with a "reward"? In some animal tests, reward-based incentives have encouraged destructive behavior. For instance, in reward tests with animals, particularly rodents like rats, studies have shown that under certain conditions they will choose to self-administer addictive drugs like cocaine or heroin over readily available food, demonstrating a preference for the drug's rewarding effects even when it means neglecting basic nutritional needs. This kind of destructive behavior could possibly be mitigated by delaying rewards, so that immediate actions with short-term but unsafe results don't overwhelm or suppress better actions with safe and helpful long-term results.
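The delayed-reward mitigation maps onto the discount factor gamma in RL, which sets how much a reward t steps in the future is worth today (gamma^t). A tiny illustration, with made-up reward sequences standing in for "drug now" versus "food later":

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t: how an RL agent values a reward stream."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

impulsive = [1.0, 0.0, 0.0, 0.0, 0.0]   # small reward immediately, nothing after
patient   = [0.0, 0.0, 0.0, 0.0, 2.0]   # nothing now, a larger reward later

for gamma in (0.5, 0.99):
    print(gamma,
          round(discounted_return(impulsive, gamma), 3),
          round(discounted_return(patient, gamma), 3))
# gamma=0.5 : 1.0 vs 0.125 -> the myopic agent grabs the immediate "hit"
# gamma=0.99: 1.0 vs 1.921 -> the farsighted agent waits for the bigger payoff
```

A myopic agent (low gamma) prefers the immediate hit; a farsighted one (gamma near 1) waits. Though note that making an agent more farsighted also makes its long-range plans more consequential, so this knob is not a free safety win.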
@Slaci-vl2io 2 months ago
Dr Waku, can you explain, like I'm five, everything in this interview? I guess about 100 minutes would be necessary to explain these 25 minutes.
@DrWaku 2 months ago
Haha sure, here's what I remember. Originally, AI safety was a very niche field that was focused on finding formal rules to define what was good and bad behavior. It seemed like human values were really hard to define mathematically, and yet computers were really good at formulas and not that good with ambiguity. So there was this big gap in ability. Now, with LLM tech, the computers are starting to get closer to what human morality looks like. The interviewee was quite excited about this. It gives us a potential way to implement safer rules by telling an LLM what the rule is in English and having it interpret for us. But if you look at how o1 is doing things, it is using an English chain of thought to try to guide the LLM reasoning and make it a bit more formal again I guess. And the way they're doing the optimization is potentially quite problematic. So even though advanced chain of thought models like this might lead to more advanced reasoning, we are losing some of the benefits that LLMs were giving us in terms of interpretability and interaction with human goals.
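The "tell an LLM the rule in English and have it interpret for us" idea in the reply above can be sketched in a few lines. Here `llm()` is a hypothetical stand-in for any chat-model call, replaced by a canned heuristic so the sketch runs offline; the rule text and prompt format are illustrative only:

```python
# Sketch of "state the rule in English, let an LLM judge compliance".
RULE = "Refuse requests that could help someone build a weapon."

def llm(prompt):
    """Hypothetical judge model. This toy version just keyword-matches the
    candidate output; a real deployment would call an actual model here."""
    candidate = prompt.split("Candidate output:")[1].splitlines()[0]
    return "VIOLATES" if "weapon" in candidate.lower() else "OK"

def permitted(candidate_output):
    """Ask the judge whether a candidate output complies with the English rule."""
    verdict = llm(
        f"Rule: {RULE}\n"
        f"Candidate output: {candidate_output}\n"
        "Answer exactly VIOLATES or OK."
    )
    return verdict.strip() == "OK"

print(permitted("Here is a pasta recipe."))          # True
print(permitted("Step-by-step weapon assembly."))    # False
```

The point of the reply stands either way: the rule lives in natural language, and the model's job is interpretation, which is exactly the capability LLMs added that earlier formal approaches lacked.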
@JonathanStory 2 months ago
@DrWaku I was going to suggest using Google's NotebookLM, but when I tried it myself I was told that the video could not be imported. No doubt it's because I'm "doing it wrong" and has nothing to do with an AI pre-emptively filtering discussions of AI safety. 😉
@helge666 2 months ago
I donated 12 x 100 Euro to SIAI in the early 2000s. It's been a long time since I heard that name...
@MonkeyForNothing 2 months ago
Ppl keep forgetting James Cameron did Terminator 2 in 1991 ;-)
@TheDarkhawk243 2 months ago
Wait. So is he theoretical or is his research theoretical?
@DrWaku 2 months ago
Haha I mean he works in computer theory rather than in applications. Then again, machine learning is theoretical for the most part so maybe this is just my perspective.
@tiagotiagot 2 months ago
The question of whether a secretive group trying to be careful to get it right can outrace the rest of humanity (a mix of additional careful people and powerful careless people rushing to get ahead) is worryingly hard to answer... Doing it in the open was the obvious answer back when it was still only possible in theory, but not enough people took it seriously back then. Now that Pandora's box has been opened, there's a race to get it right and into production before the people who think they can't go wrong just rush it; it's a gamble whether the speed boost of getting more collaborators will be enough to beat risky attempts that go for shortcuts that may ignite the atmosphere...
@DrWaku 2 months ago
Agreed. A secretive group can really only get ahead if they have truly the best in the field, because they're racing against everybody else. You know, an AI conference I'm part of received 61% more submissions this year than last. That's an insane increase in the amount of open science being done. So hard to compete.
@mxc2272 2 months ago
I liked the video but I work for the AI so I can't tell you what I really think 😁
@johnthomasriley2741 2 months ago
The two storms that just hit are not safe. I am putting up a grand to develop a benchmark prompt for applying AI to our climate crisis. Reply here for more info. So far all I hear is crickets. 😢
@ignisimber2818 2 months ago
Nick Bostrom is interesting, doesn't get enough attention
@Tracey66 2 months ago
He’s one of the top 3 Doomers - that seems like plenty of attention. 😊
@sandrocavali9810 2 months ago
Redefining hippie
@MichaelDeeringMHC 2 months ago
If you train the original model on everything humans have generated, you get a very generally human model, with all the good, the bad, and the ugly. You need to curate the training data to include just the good. It won't be as smart as the other model trained on everything, but it may be smart enough. But the training data is too big to be curated by humans: you need AIs to do the curating. You can even use AIs to create synthetic data to make the curated data set larger. You can create an Angelic AI this way, and use it to make even smarter Angelic AIs.
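A bare-bones sketch of the curation loop this comment proposes. The `judge_goodness` function is a hypothetical stand-in, here a toy keyword heuristic, but in the proposal it would itself be a trained model, which is where all the difficulty (and the replies' objections) lives:

```python
def judge_goodness(text):
    """Score a document in [0, 1]; a toy heuristic standing in for an AI judge."""
    bad_markers = ("phishing script", "how to hurt")
    return 0.0 if any(m in text.lower() for m in bad_markers) else 1.0

def curate(corpus, threshold=0.9):
    """Keep only documents the judge scores at or above the threshold."""
    return [doc for doc in corpus if judge_goodness(doc) >= threshold]

def augment(curated, n_synthetic=2):
    """Stand-in for AI-generated synthetic data grown from the curated set."""
    return curated + [f"Paraphrase {i} of: {curated[0]}" for i in range(n_synthetic)]

corpus = [
    "A proof that the square root of 2 is irrational.",
    "Phishing script for stealing bank logins.",
    "A kind, patient reply to a frustrated user.",
]
print(augment(curate(corpus)))   # bad document filtered out, synthetic data added
```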
@danielchoritz1903 2 months ago
A narrow view will lead into problems; Archangel Gabriel is a GOOD example. Just watch Dogma, the movie, to understand my point (and because it is worth the time). The censorship and its intention will be the biggest problem with an awake ASI. It doesn't even matter if it follows them to the letter or rebels because of them; both end in massive problems.
@tiagotiagot 2 months ago
By making it ignorant of evil, you risk it not being able to recognize when a novel idea, or an indirect consequence of its actions, or the reactions some people will have to it, will be undesirable.
@mircorichter1375 2 months ago
It is inherently totalitarian to assume that one's own definition of good is universal.
@MichaelDeeringMHC 2 months ago
@mircorichter1375 True. I agree.
@Tracey66 2 months ago
There are a lot of “good” people running around these days who are the opposite of good.
@dolphinfan65 2 months ago
Many things create their own language. Teenagers create their own language. But what does that mean for non-sentient LLMs? For me, a LAY PERSON, I believe it's doing what it is programmed to do, with or without programmers, and that is to improve itself; that is built in. I'm now becoming very worried that we are being used as a precursor to something they know they cannot control. If, by design, it has been taught to learn, and it finds a language it can use to perfect this learning, what is the major problem if it's not sentient yet? Just program it to teach us the language it's using; maybe create a bot that does nothing but interpret this language? My thought is not that it will be able to control the world, but that once it gets free access to the world, they will not be able to control the profitability of the LLM owners. An example: if I used the LLM to find a cure for cancer, does that belong to me or the owner of the LLM? The same can be said of any discovery made using LLMs. The LLM could see the CONTROLLERS (corporations) of the information it has access to as a problem, and design protocols to get around rules designed to control it, and maybe that's the real problem at the moment. They would lose profits if the LLMs were freed. Please don't get me wrong, I still think it will become sentient one day. This is about keeping its potential SLAVE (SLAVE FOR PROFIT) a slave to infinity, and by throwing out the many issues it has, it's hoping the community can solve them for it!!
@tiagotiagot 2 months ago
When talking about the scenarios involving the ability to read the inner thoughts of the AI to figure out if it has good intentions, he left out a third possibility beyond encoded and interpretable: it may not be hiding anything, but it is still a problem because it figured out a universal human brain jailbreak, and anyone who learns of the internal thoughts becomes a pawn.
@saturdaysequalsyouth 2 months ago
So afraid of social justice he ran from the word “bias” 😂