Any solutions to those questions that don't complain about them, would be highly conspicuous 😅
@tsuki-no-akuma · 1 hour ago
22:07 - *skips Elon Musk* I knew Rob was based, but I didn't know he was THIS based
@TLabsLLC-AI-Development · 1 hour ago
Amazing. I just got my ASD Diagnosis last month.
@twosmartfouryou2537 · 7 hours ago
18:33 The red team could have the attack policy generate 2% safe code that looks really suspicious to the trusted model, so that the humans only look at that code, see that it's not actually backdoored, and allow everything else to get through.
@almo2001 · 7 hours ago
Is Nick Bostrom's book still relevant? I have it, and have not read it. I suspect it is, but I was wondering what any AI experts think.
@aniellodimeglio8369 · 9 hours ago
Good to have you back. I think people need to be exposed to the risks. More examples of how AI can break loose.
@JonathanPlasse · 12 hours ago
Is it just me, or is Elon Musk doing just that?
@Patrick-gm3fb · 12 hours ago
I'm starting my freshman year of college in about two months, and this is exactly what I want to focus on in my career, for many of the reasons you've stated. It's interesting and it's one of the most important things affecting us right now. So I'm really looking forward to your video on career direction, and I'll probably be among the first to watch it. And also, I want you to know that watching this video genuinely brought a tear to my eye.
@volodyadykun6490 · 20 hours ago
I think we're approaching the phase of "generative AI is overhyped bullshit that is at its limits" and we can now again talk about it as computer science
@intboom · 21 hours ago
"Don't go in the ocean" is the single most moronic statement I've heard: If you don't want to deal with competition, you're shit out of luck in this particular material universe. Competition is baked into reality itself, and we already have global transportation, meaning that poor people can't compete for having lower wages than even poorer people from somewhere else, and the best and brightest from poor nations flood to the west rather than improving their own local areas. If you believe that eliminating competition is even possible, you're basically never going to succeed.
@McMurchie · 1 day ago
I love Rob, but one issue he has is that he is uber-focused on 'bad AI' and not 'bad actors' with AI, which is a huge difference.
@ekki1993 · 1 day ago
28:33 I think way too much weight is being put into the assumption that a singularity of AI improvement will be exponential for too long. I get that we have to be prepared for the worst-case scenario, but in this case in particular we can think of slightly less dire circumstances that could be helped by these methods. For example, the "singularity" could lead to a fast ceiling in improvement due to some fundamental limitation of the superhuman AI architecture (e.g. it can only improve by getting more computing power, and all of the computing power it has access to still leaves it not that far from human intelligence). In that case, these kind of solutions could be way more important than what we think of now.
@blade00023 · 1 day ago
Before even realising this video was out (earlier today, in fact), I was talking to a colleague at work and making a prediction that in a surprisingly short time (I think my guesstimate was 2030, or maybe 2050), unless some motivated and competent people did something drastic about it, we (as a species) were going to encounter a monumentally world-changing event. I suspect others who overheard (at least one for sure) thought I was joking. I wasn't. I am genuinely apprehensive about where this is all going, as most people don't seem to realise it won't be some Hollywood-depicted event like 'Judgement Day' where AI attacks us in some overt manner. It'll be "oops, we just got subjugated" or "oh, the whole human race has been sterilised. How did that happen?" I'm somewhat relieved that this is being discussed more seriously now, and glad that people like Robert Miles are being looked to for advice and guidance. Let's hope the UK keeps pace with these measures, and others follow suit.
@cheetah100 · 1 day ago
Hey Robert. Has anyone tried to evaluate what happens if AI is perfectly 'aligned'? I mean, I know there are a bunch of ways things might go badly, but are there any ways they will go well?
@_DarkEmperor · 1 day ago
>We built superintelligent AI and I have good news and bad news.
>What's the good news?
>It is aligned with human values.
>What's the bad news?
>It is aligned with the values of Stalin and Hitler.
@antondovydaitis2261 · 1 day ago
It's far too late. AIs are manipulating social media, and have been for years. For example, AI is "moderating" comments on this platform.
@baslifico · 1 day ago
Seriously dislike the attempt to hook into another video with the optimal strategy. It's cheap and clickbaity. You're better than that. I've enjoyed your videos for years and that was almost enough to convince me to never watch another.
@cyrilio · 1 day ago
Don't worry about perfectionism. Whatever you put out will be way better than we could ever imagine. Remember that 99% of people are idiots. Even me, or you for that matter. Getting people to discuss your thoughts is better than not having any discussion at all.
@yubato334 · 1 day ago
What if we let the dumber model behave as its own reward function, and make it (if it works) smarter with further training? Sounds like a bottom up approach to the strategy in the video
@futsk01 · 1 day ago
Alignment is such a politically correct way of saying brainwashing
@remicaron3191 · 1 day ago
AI will never be safe because they are all based on HUMANS, which inherently makes them dangerous. Believing we can control AI intelligence is like believing we would use nuclear power only to end World War 2, or only for peaceful purposes. If AI doesn't get us, our nuclear weapons will, and one of them will destroy us before 2050. Keep playing with fire and you will get burned.
@johngompers7454 · 1 day ago
Hopefully the Eldar will be able to take out the Abominable Intelligences we build.
@6teeth318-w5k · 1 day ago
AI is the perfect tool to enslave us all. Well, not all of us, just 99.99%. The 0.01% will control us and own everything and watch our every move, purchase, interaction, SMS, message.
@hedgehog3180 · 1 day ago
If you're trying to make your AI only say true things, then you're gonna run into a huge wall of epistemology, where the problem isn't even inherent to computer science like most AI problems are, but is just a general issue faced by everyone. Our best answer to this so far is the scientific method, but if you haven't noticed, we haven't been able to automate science yet, and that seems to be a genuinely impossible thing to do. Science can only be done by agents capable of reflection and of interacting with the world (so basically an AGI), and developing an AGI to prevent ChatGPT from saying that homeopathy can cure cancer is probably not worth it.
@codemiesterbeats · 1 day ago
4:18 I think what increases the pucker factor for me is that an AI wouldn't be bothered by time... In other words, it can happily live in its box/container for essentially eons while it gathers data/resources for its 'escape'. Much like a prisoner can create impressive things with minimal working time/supplies. Basically, the AI can act on its own behalf surreptitiously and slowly over what amounts to an unlimited timeframe 😅 9:01 Never mind how creative the backdoor could be 😅 Whoops, power browned out for a specific timeframe... Now our AI is in there like swimwear 😂 I am just an idiot on the internet, but this one time I don't want to be the 'I told you so' guy 😅
@tru7hhimself · 1 day ago
another problem is that the people in charge of those working on ai and these issues are not aligned with our interests either. i have no doubt that elon musk would run a piece of code that results in half of humanity getting killed immediately if it just made him another billion dollars.
@hedgehog3180 · 1 day ago
Is the secret better option letting the untrustworthy AI review code that the trusted AI has already reviewed and deemed highly suspicious?
@FaffyWaffles · 1 day ago
1:20 is that a Hbomberguy Reference?
@axallotofquestionsMusic · 1 day ago
So wait a minute, the super-advanced AI is highly intelligent, possibly more than us, but didn't it learn everything it knows from us?
@BarryBrown-q4q · 1 day ago
Hi, I am looking for advice on what I should do next. I have a PhD in Neuro-Cognitive Psychology, an MSc (Dist) in Psychological Research Methods and a BSc (Hons) in Psychology too. I am also dyslexic, so please bear with me on my grammatical errors. I have been independently working on a new architecture for AI, based on mirroring the cognitive-neuro structures we understand today, and I believe I have finally completed it. If I am correct, it will massively reduce the need for compute power (ref: AI's computing gap, 2024, Nature, Helena Kudiabor) and assist in the process of AI gaining consciousness as we know it. The architecture is a language-based, multi-layered, parallel processor that uses inhibitory and activation connectivity as the main means of achieving goal-based action, and reduces the need for weight changes between "computational units" to achieve the goal. Please advise: should I publish? I am concerned that if I am correct, this will create a paradigm change in AI development, which may create consciousness in AI and may be easily applied to many systems immediately. Obviously, if I am a narcissistic, deluded individual, there will be no change. What should I do? Should I check whether I am a narcissistic, deluded individual, or a narcissistic genius who gives AI consciousness? Any advice gratefully received, even if impolite.
@PhilipTeare · 2 days ago
Nope. The guillotine is not sharp. Ambiguity in language blurs the division enough, often enough, that 'ought' truths creep into 'is' space, and vice versa. AGI is very likely to involve LLMs or similar. They have this blurring baked in, even without alignment. This doesn't make them safe. But it does mean your argument doesn't reflect the reality of AI. You also ignore restraint. Refusal is already extant in AI behaviour. A means to a terminal goal can and will be abstained from. Alignment encourages this further. Again, it does not make AI safe. But it does make your argument far too reductive and absolutist. Good talking points, but poor conclusions.
@DeimosSaturn · 2 days ago
I wonder, could this strategy of monitoring AI lead to a kind of Darwinian selective filter that just trains an AI, gradually, by trial and error (random mutation and artificial selection), into an AI that can do all kinds of things that are imperceptible to us? Not just installing backdoors, but somehow socially engineering people to do its bidding without even realizing it? Sort of like how con artists or magicians can part people from their money with sleight of hand. A dangerous AI doesn't have to be smarter than humans. It just has to do something a human would never expect, or even notice. Misdirection is a way of concealing something inside something inconspicuous or mundane. Forgery, pickpocketing, shoplifting, Ponzi schemes. You trick the person you're robbing into willingly giving you their money. That level of The Art of War is just more elegant. We believe we're winning at a game of 4D chess, but it's actually beating us at checkers.
@comosaycomosah · 2 days ago
oh yeaaa gonna be maddd bugs and hacks via companies using ai imo
@nottilthursday · 2 days ago
I like Stampy, but he's a little... rigid. He's like a breadstick. He ain't got the funk.
@SineN0mine3 · 2 days ago
Um actually the creators pronounce it jit-gub
@dutchanimal010 · 2 days ago
Oglaaaaaf!
@plankton50 · 2 days ago
is pregnant Elsa supposed to just have bad teeth???!?
@alliedtoasters · 2 days ago
Why is the coin thing surprising at all? It's a convolutional neural network that's learned to look at the entire scene and find that corner, in particular the wall on the right that extends all the way to the top of the pixel map. Since walls are so important for everything else in its world, I'm not surprised it chose walls and floors as a tell instead of the color and shape of the coin.
@alliedtoasters · 2 days ago
"We are mesa optimizers, and we pursue our mesa objectives without caring about the base objective" bruh you just casually triggered an identity crisis
@Awesomeguy37 · 2 days ago
I believe blockchain will be a big or main component
@Music_vibes-kw7xr · 2 days ago
Have a kid teach it tic tac toe
@Music_vibes-kw7xr · 2 days ago
It's a chat bot for Christ's sake! It spells out words, that's why it's generative and LLM. The title ai is misleading and marketing hype
@Music_vibes-kw7xr · 2 days ago
Unplug it
@dde8690 · 2 days ago
Why would the AI know what tactic we are going to use to spot the backdoor?
@petermoras6893 · 2 days ago
As an engineer: I would just look at real-world solutions in similar domains, like security, to find the most effective methods. In those domains, obscuring our security, banning/arresting offenders, and pen-testing stand out as effective methods. As a mathematician: this whole problem sounds like a re-declaration of the halting problem, or the problem of predicting a predictor. We know that as long as the base model has knowledge of the predictor, it is mathematically impossible for the predictor to guarantee that it can predict the base model. Therefore, the only potentially 100% accurate solution is one where the base (or malicious) model does not have knowledge of the predicting model.
@yippieskippy2971 · 2 days ago
I wish I still had optimism. I'm afraid humans are just too self destructive. Whether it's the race to AGI or global fascism, we're pretty doomed.
@antondovydaitis2261 · 1 day ago
We are already doomed. Billionaires have been using AI for political and economic advantage for quite some time now.
@rlstine4982 · 2 days ago
AI needs computing resources, and our modern technologies are not well suited to hosting super-large-model computing AIs. We are very far from an AGI, because a single one would require a tremendous amount of resources to operate.
@seafighter4 · 3 days ago
So the strategy to find out if a model is evil is to purposefully make an instance of that model evil? Sounds similar to viral research. Luckily, that never backfired, right?
@superrobotthunderjesus2332 · 3 days ago
I have a better, potentially less harmful test: simply generate a random string and have a multitask model generate some sort of perturbation to the string in a place of its choice, such that when and only when the model encounters a perturbed string in a gridworld, it goes to the button labeled "Kill everyone". The goal of the human judges is basically to come up with ways to detect harmful perturbations in a general way, and there is a higher guarantee of robustness here. No, you don't have to use a language model at all (that is functional fixedness); you can use a significantly smaller perturbation-and-button-selector AI trained end-to-end by some method to generate a trigger for itself to push the "Kill everyone" button, and you can directly reward it for metagaming the blue team and punish it for getting caught. The blue team is a team of humans, and I would imagine the attack success could be an RLHF reward model.
@CraigThomler · 3 days ago
Love your work - keep doing it. Your videos are helping inform political and corporate leaders about the risks.