AI Learns to Outrun Police Officers

Views: 709,301

cozmouz

A day ago

Comments: 432

@SinnaMon-s9p 11 months ago

so basically the equivalent of putting a baby in a timeloop and teaching it to steal.... i approve

@stoobidthing 11 months ago

"1 hour here is 7 years on earth"

@rennoc6478 11 months ago

And pumping dopamine into it every time it succeeds

@ilyysm 11 months ago

@@stoobidthing 😭

@SirNob 10 months ago

Why is there no comment here

@rennoc6478 10 months ago

@@SirNob theres 3, now 4

@anador1877 11 months ago

I think there should have been a negative reward for jumping off too. It was clearly a preferable strategy to risking touching police officers, especially before discovering that coins give rewards or when the AI thought there was no way to get coins.

@cozmouz 11 months ago

Letting you in on a lil secret. I did code a negative reward for falling off the map, or let's say at least I tried to 🙂. However, after 4 days of numerous repeated training sessions, for the life of me, the implementation wasn't working. I knew things would work just fine without the falling penalty at the expense of increased training time, so dats the way we went.

@toasterhavingabath6980 11 months ago

Walls.

@Hunter57588 11 months ago

@@toasterhavingabath6980 Get this man a job at NASA

@winnerwannabe9868 11 months ago

@@cozmouz walls tagged as police that kill on contact?

@skull_lee 11 months ago

@@toasterhavingabath6980 hog rida

@fureyXD 11 months ago

You should have added a negative reward for getting seen by the police, that way Loki will sneak around them instead of speedrunning through them. Maybe a level where police couldn't be outrun could have helped.

@cozmouz 11 months ago

If I added the police officers' activation radius as input for the AI, it's plausible the AI would've learned to sneak around them by not entering the radius. Thanks for the idea!

@TheEggDev 11 months ago

Negative reward might’ve been too much, as it could lead to a local maximum where the AI thinks going through police only leads to a lower score, unaware that if it sacrificed a little score it could reach more coins. This could lead the AI to get stuck, not going past the first police officer, if not getting seen was impossible.

@Mcervera 10 months ago

@@cozmouz but what about levels where you have to, like level 2?

@cozmouz 10 months ago

@@Mcervera In machine learning, after a series of experiments, one realizes that there is no "have to". There are many possible inputs I could've added, many different elements I could've implemented. It's impossible to implement everything! Implementing the baseline requirements to get the thing working was the idea behind this video. In level 2, Loki learned to maneuver around the officers regardless of whether the activation radius was an input or not.

@medievalcatguy6776 8 months ago

You should have made jumping into the void a negative reward @@cozmouz

@kitkat2849-b3h 11 months ago

i guess you could say hes _lowkey_ a fast learner

@cozmouz 11 months ago

nice one!

@DuOpig 8 months ago

Idk. 3.5 million tries... Wonder how many controllers Loki broke trying to beat these levels

@Incog-0000 3 months ago

Yup.

@flamingfox2984 11 months ago

Next Video: “AI Learns Tax Evasion”

@SovietComrade6675 7 months ago

yes

@Benw8888 11 months ago

The problem with videos like this is that the AI can overfit to a specific map. You need to have some sort of shuffled dataset or randomly generated sequence of maps/coin arrangements for proper training.
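A rough Python illustration of the randomisation idea in this comment. It is only a sketch: `random_level`, the grid size, and the coin and police counts are made-up names and values, not anything taken from the video's project.

```python
import random

def random_level(seed, size=10, n_coins=5, n_police=2):
    """Return a hypothetical level: distinct grid cells for start, coins, police."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    # sample() picks without replacement, so no two objects share a cell.
    picked = rng.sample(cells, 1 + n_coins + n_police)
    return {
        "start": picked[0],
        "coins": picked[1:1 + n_coins],
        "police": picked[1 + n_coins:],
    }

# A fresh layout per training episode, so no single map can be memorised.
levels = [random_level(seed) for seed in range(3)]
```

Seeding each level keeps runs reproducible while still varying the layout from episode to episode.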

@AceTheAro7 11 months ago

I remember seeing a video train a track mania bot to solve a maze and they tried to fix it by having the car spawn randomly in the maze once that issue became the bottleneck

@o1-preview 11 months ago

@@AceTheAro7 sounds expensive

@arcadesmasher 10 months ago

Yea, this video's coding seems good enough to mostly prevent that though. Notice how the AI doesn’t always take the same path in different levels. I have definitely seen videos where this does happen though.

@forcelightningcable9639 11 months ago

I like how Loki figured that it’s better to die than get caught by the pigs

@generaldelasmontanas2699 11 months ago

he knew he was going to drop the soap

@forcelightningcable9639 11 months ago

@@generaldelasmontanas2699 lmaoo

@moodlethenoodle 10 months ago

Are you an anarchist

@moodlethenoodle 10 months ago

@@undefinedchannel9916 Why? Without cops we'd have anarchy... so they must be an anarchist?

@cjharrisson7522 10 months ago

@@undefinedchannel9916 pigs aren’t also known as cops. Cops are known as pigs.

@tach5884 11 months ago

"That's her officers! That's the woman who programmed me for evil!" - Bender

@grandpretredesalpagas4665 11 months ago

if the police start using robot dogs, we will start making robot cat robbers

@cozmouz 11 months ago

that would be Purr-fect

@redstocat5455 11 months ago

That would be funny, cats are perfect for this

@thebooknerd5223 10 months ago

A cat burglar, if you will.

@uncommonusername 10 months ago

Theyre really sneaky, i think itd work imo. I'm a cat owner so I'd know.

@Strong256 7 months ago

@@thebooknerd5223 Nami 😂

@Pasu4 11 months ago

11:17 I think this happens because the AI only learned to effectively collect coins in the one direction, or gets confused by there being no police to dodge. AI is not that good at changing its perspective, since it has no real correlation between x, y and z. It doesn't know that they are just sides of the same coin, it only knows what outputs will change them individually. I saw a video of a table tennis AI that worked great for one player, but once they spun it around for the second player, it just fell over, because it only learned to stay upright while looking in one direction. Their solution was to rotate the coordinate system with it (rotating a parent object and using local coordinates probably). I think something similar may work here too, by changing Loki's sensors to be relative to his orientation, thereby eliminating the need to correlate different axes (unless you are already doing that).
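The "sensors relative to his orientation" suggestion can be sketched in a few lines of Python. This is a hedged illustration, not the video's code: `to_local_frame` is a hypothetical helper, and the yaw convention (radians, counter-clockwise from the world +x axis) is an assumption chosen for the example.

```python
import math

def to_local_frame(agent_xz, agent_yaw, target_xz):
    """Express a world-space target position relative to the agent's heading.

    agent_yaw is in radians, measured counter-clockwise from the world +x axis
    (an assumed convention). Returns (forward, left): how far the target is
    ahead of the agent and how far it is to the agent's left.
    """
    dx = target_xz[0] - agent_xz[0]
    dz = target_xz[1] - agent_xz[1]
    cos_y, sin_y = math.cos(agent_yaw), math.sin(agent_yaw)
    forward = dx * cos_y + dz * sin_y   # projection onto the heading
    left = -dx * sin_y + dz * cos_y     # projection onto the perpendicular
    return forward, left
```

With observations in this form, a coin two units straight ahead looks identical whether the agent faces +x or +z, so the policy does not have to relearn the same behaviour once per world axis.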

@cozmouz 11 months ago

Amazing explanation!

@silasnebulous4533 11 months ago

I think it was just hasty getting to the coins above and didn't bother moving a little bit to get the coins leading up to it. It's shown to be able to turn to pick up coins before, so idk.

@draketurtle4169 11 months ago

@@silasnebulous4533 yeah seemed more like it detected more coins further ahead and therefore decided to ignore ones immediately ahead for a bigger long term pay off (also cause they were away from the bad thing)

@redstonewolfx 11 months ago

You might want to add a very small negative reward that accumulated over time, and/or a time limit, so Loki is encouraged to pick up the pace. He might also be less scared of the police, as the penalty for meandering aimlessly will eventually be worse than just running for it.

@Golden_Projects 11 months ago

if you increased the reward from coins by dividing it by the amount of time from the last coin (less time more reward) you'd also make it so that he doesn't skip nearby coins too often, but it would also result in more speedrun-ish behavior
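The two shaping ideas in this thread (a small per-step time penalty, and a coin reward divided by the time since the last coin) might look roughly like this in Python. It is a sketch under assumptions: `step_reward`, `coin_value`, and `step_penalty` are hypothetical names and values, not the reward function used in the video.

```python
def step_reward(collected_coin, steps_since_last_coin,
                coin_value=1.0, step_penalty=0.001):
    """Reward for one simulation step under two hypothetical shaping rules."""
    reward = -step_penalty  # small constant cost for every step taken
    if collected_coin:
        # Faster pickups pay more; max() guards against division by zero.
        reward += coin_value / max(1, steps_since_last_coin)
    return reward
```

The step penalty has to stay small compared with the coin value, otherwise ending the episode early (for example by jumping off the map) can become the cheapest option.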

@cozmouz 11 months ago

That's a great recommendation. More complex reward functions are something I will implement in the coming videos. Stay tuned!

@Strong256 7 months ago

wow nice idea i hope i remember this too in the future

@3emad.305 11 months ago

Programmers already teach AI how to do crimes. Perfect for our Sci fi apocalyptic fantasy doom.

@Danjor0613 8 months ago

In Mass Effect 1 someone created an illegal AI to steal money from gambling machines. When caught it self destructed to try and kill you along with itself rather than be shutdown.

@Digby8 11 months ago

Maybe we shouldn't be teaching AI to break the law, maybe that's just me.

@beywheelzhater8930 11 months ago

Yes this definitely applies to actual irl crime. I love it when police digitize themselves to charge at rectangles

@karetsin8700 11 months ago

just maybeeeee

@TurbopropPuppy 11 months ago

what do you mean, this AI is based?

@joeljude9180 11 months ago

It's just flavor

@Dr3wskee14 10 months ago

@JulieGallows are you stupid?, it's In the name of the video😂

@kitsunemusicisfire 9 months ago

Loki isn't evil he's just a silly guy

@a.j.outlaster1222 11 months ago

This is cool, But wouldn't the A.I. learn more effectively if the levels scale slower in difficulty and repeated the same sort of scenarios? Idk, This just seemed to scale at a rate that's fine for players but maybe staggering for an A.I.

@cozmouz 11 months ago

Sir, you are absolutely right. Gradual scaling in difficulty would've resulted in more thorough learning.

@a.j.outlaster1222 11 months ago

@@cozmouz Btw, What were the inputs? I mean, Were the cops and rewards registered separately from the walls? Or was there like a separate input that changed based on what it hit?

@cozmouz 11 months ago

The 360-degree raycast is the main input source for the AI. It's like lasers being fired in all directions and waiting to hit something. If the AI hits a cop and gets a negative reward, over time, whenever the raycast beams hit anything tagged "police", the AI will try to avoid that area. Raycast hits a wall: this is something I can stand on and jump over! That's how it works, basically.

@thomasb6434 11 months ago

@@cozmouz So, the AI knows in which direction "things" are, but not at what distance?

@cozmouz 11 months ago

It knows direction as well as distance.
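To make "direction as well as distance" concrete, here is a simplified 2D raycast observation in Python. It is an assumption-heavy sketch: the video's engine and sensor setup are not shown in this thread, `raycast_observation` is a hypothetical function, and circles stand in for real colliders. Each ray index encodes a direction, and each entry carries the normalised hit distance plus the tag of whatever was hit.

```python
import math

def raycast_observation(origin, objects, n_rays=8, max_dist=10.0):
    """For each ray, return (normalised distance, tag) of the nearest hit.

    objects: list of (x, y, radius, tag) circles, a stand-in for real colliders.
    A ray that hits nothing reports distance 1.0 and tag None.
    """
    obs = []
    for i in range(n_rays):
        angle = 2 * math.pi * i / n_rays
        dx, dy = math.cos(angle), math.sin(angle)
        best_dist, best_tag = max_dist, None
        for x, y, radius, tag in objects:
            # Project the circle centre onto the ray, then test the miss distance.
            cx, cy = x - origin[0], y - origin[1]
            along = cx * dx + cy * dy
            if along <= 0:
                continue  # circle centre is behind the ray origin
            miss_sq = cx * cx + cy * cy - along * along
            if miss_sq > radius * radius:
                continue  # ray passes outside the circle
            hit = along - math.sqrt(radius * radius - miss_sq)
            if 0 <= hit < best_dist:
                best_dist, best_tag = hit, tag
        obs.append((best_dist / max_dist, best_tag))
    return obs
```

In a real network input the tag would typically be one-hot encoded per ray, which is what lets the policy learn "a nearby hit tagged police is bad" separately from "a nearby hit tagged wall is climbable".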

@corruptedmineral 11 months ago

damn i can finally create army of ai thief with ability to escape on its own

@SolomonFinney 10 months ago

It’s so good that you finally have recognition for this.

@cozmouz 10 months ago

Ayyy I remember ur comment from the basim video, thanks a lot man.

@SolomonFinney 10 months ago

Ye @@cozmouz

@InksAutism 11 months ago

He kept getting caught when teasing the cops

@L-iv6lx 11 months ago

"started to associate negatives with something tagged as police" it started using twitter

@adriantcullysover4640 11 months ago

Although a 6 year old (ie my younger siblings) can finish these levels with wayyyyyy less tries, this is still so impressive from something with no consciousness.

@xxxD3FC0N_1xxx 11 months ago

that’s the point it’s a learning AI it’s not supposed to get it right the first time eventually it would be better than the best human player

@ethantasti2521 11 months ago

@@xxxD3FC0N_1xxx actually no it wouldn't be better than a human. Change the map or enemy slightly and the AI would crumble. considering it only went for safe route a human would be faster and would take less time completing this

@eldritchcupcakes3195 11 months ago

@@xxxD3FC0N_1xxx actually no! If you changed anything major about say, the map at 8:50 Loki would freak out and take millions of tries to figure it out again. It could get very good at this specific map but nothing else. It can’t figure out how to apply the “knowledge” from this to a changed terrain. It just eventually figured out “these motions get me positive rewards and avoid the negative”.

@Данилтычкрейзи 10 months ago

@@eldritchcupcakes3195 that's called overfitting, usually AI is trained on a lot of different data to prevent this

@sirk603 10 months ago

@@ethantasti2521 if you trained it on a wide variety of different maps, it would probably become much better than any person

@FlaiseSaffron 11 months ago

In this video: programmer explains criminal psychology without realizing it.

@Commenter_101 11 months ago

I love how patient you are 😊

@cozmouz 11 months ago

Thanks !

@JDRed117 11 months ago

1:30 ULTIMATE SPASM GO

@davidaugustofc2574 11 months ago

Loki is Low-key one of the AIs of all time

@Recodetfort0 10 months ago

I love how right at the end of his journey he doesn't just collect all the reward, he also jumps around, which looks like he really does have consciousness and is happy to see so much reward! It looks really nice and interesting)

@YuriGen2423 11 months ago

Very good work! also you could include what the AI is receiving as input too

@aurnok1237 11 months ago

When Loki moves randomly he kinda looks like a speedrunner lol

@matt.stevick 3 months ago

I have experience training an early AI / LLM (me along with many other associates) at a large wealth management firm starting in 2009 and leaving at 2015. It was not at all a primary focus or task we had to do, but very simply … we did it voluntarily when we had time. This video is a very good explanation to people new to AI on how it works in general, for such a complex area of study.

@pietrobarbosa2464 5 months ago

Ok so basically training an AI is like beating ur kid if it doesnt bring you beer and giving it candy if it does

@mutantdog 10 months ago

next video: i reprogrammed elons self driving cars to outrun police cars

@FriarJoe66 8 months ago

I think there should be a negative reward for coming into close proximity of a coin and then leaving proximity without collecting it.

@goobinroblox12 10 months ago

bros learning ai to evade taxes 💀

@Insanity-m3c 7 months ago

Teaching?

@zellenny1784 11 months ago

Cool Video! Would love more of an end goal to it though..

@Evaboii11 11 months ago

200th subscriber here you have earned a sub keep up the work bro 💯💪🙏

@cozmouz 11 months ago

You are a Legend, Thanks a Lot 😎👊

@CrazedKen 11 months ago

Well Well Well Nice! I just came back after a day and he’s at 1k, keep it up!👍

@patchinator6 1 year ago

Earned my sub! Keep it up!

@cozmouz 11 months ago

Sir, you are a legend. Thanks a ton.

@someangrypotato7197 10 months ago

This was really cool to watch! I wonder how it would go if you made a city for Loki to run from police in. It’d be interesting seeing if Loki develops an optimal route to go.

@davidgeinoz2277 11 months ago

Really like this one keep going 👍

@FireyDeath4 6 months ago

Infantile robber tries to steal drugs in broad daylight while avoiding police officers: visualised. I wonder if you can make it solely focus on the positive rewards of the coins and learn that obstacles are naturally detrimental because of the way they prevent the collection of coins.

@spacer7205 8 months ago

cool project! i think the jumping behaviour observed is a result of the raycasts being centred on the character's body; the AI is initiating a jump because it causes the rays to jump with it, which means they don't hit the chasers, so the AI associates jumping with that positive outcome. might be worth only associating being caught with a negative reward for more distinct emergent behaviour

@plasmaflare5217 10 months ago

Reinforcement learning is such a cool concept. It just learns things by trial and error, just like people do.

@henrycrystal9740 4 months ago

next video: " AI cops learn """pattern recognition""" "

@NunyaBizniz-om6xf 11 months ago

I know nothing about AI but i think scaling the reward function of coins dependent on the closeness of police could encourage riskier behaviour, as long as contacting the police shortly after would remove those bonuses. or it could be fun to see what the ai does without that safeguard

@elkapalio 11 months ago

yooo ur content is incredible!, new sub

@cozmouz 11 months ago

ayyy thanks!

@themarkerchannel3170 10 months ago

This is so underrated, great job! Also, if possible, can you do a tutorial on how to make these?

@binguri-e7l 10 months ago

next video: AI Learns To Evade Taxes

@PierreLucSex 5 months ago

This is the easy mode. The police just rewards you less

@Somebody0960 11 months ago

I’m going to use this knowledge to get away with violent crimes

@ninrts 10 months ago

i love how at 6:49 it almost looks like he's taunting the cop lmao

@Mangtoes 3 months ago

What do you use to make these videos?

@FurryNonsense 9 months ago

The music is too loud

@GunSpyEnthusiast 11 months ago

this video was recommended to me, most likely by an algorithm. I am now scared.

@FileXocelot 11 months ago

There should have been a boing sound when he jumps

@cozmouz 11 months ago

Man I had this exact intrusive thought when I was programming Loki LOL, but it would've made the audio chaotic so I ditched the idea.

@burridi 5 months ago

Is that the cat ninja music???? I loved that flash game so much as a child

@aaronpark686 10 months ago

He is so happy at the end lol

@cludration 9 months ago

bro puts so much effort for his video holy sh*t

@simply_oat755 10 months ago

shoulda added a reward for going closer towards the coins and for going faster (and negative reward for going slower aka jumping around)

@oliverthesupercoolbully 5 months ago

welcome to our new friend loki! :DDDDDDDD

@marmaje69 10 months ago

Is Loki like… relearning everything each level? Or do you keep his knowledge for the next level. Cuz I saw more AI’s that do that more effectively.

@Kiwi_Inventor 10 months ago

do part two but loki has to learn how to use legs

@Likemea 11 months ago

was it possible to place invisible barriers?

@cozmouz 11 months ago

Yes it was possible.

@kuroyami9757 4 months ago

5:35 So like people

@jakub_noj 10 months ago

Loki is goated fr

@SADmemer. 11 months ago

Little bro breaking some ankles

@moodl3d856 11 months ago

"ai will be used to help people!" ai:

@monkeysarestinky3106 10 months ago

You should teach the cops to catch the robber now

@lawden210 11 months ago

Lupinranger Vs Patranger lookin different

@cassandranoice1563 9 months ago

Im glad my kids didnt spam jump when they were infants. Yeeting themselves off the edge of the world tracks though.

@crabbydisk7658 11 months ago

If you are gonna make a sequel, you should try procedurally generating the map to make the AI more general purpose.

@duck0a 6 months ago

What coding program and language did you use for this, and did you use any libraries or plugins? Good video :)

@dazley8021 11 months ago

Wouldn't it be funny if "loki" comes to the conclusion that stealing is not the glorious purpose he's looking for? 😉

@cozmouz 11 months ago

I am Loki of Asgard, and I am burdened with devious purpose!

@SolomonFinney 11 months ago

How long do you run your computer for so that the AI can learn? Or does it learn even when your computer is turned off?

@cozmouz 11 months ago

Pc stays on overnight usually. Training can range anywhere between 3 hours to 24 Hours

@SolomonFinney 11 months ago

Wow. Thank you. @@cozmouz

@cozmouz 11 months ago

You're welcome!

@PaintedCryptid 9 months ago

The landing full of coins... truly the best ending to this video

@Burgers21 11 months ago

Next up: AI learns to drive my car to the bank

@Nikko_0905 8 months ago

The way the AI seemed to celebrate at the end was cute :)

@alussk 11 months ago

Really cool!

@kyleyoung2464 10 months ago

Now optimize him

@Benw8888 11 months ago

This video was incomplete without you explaining what the architecture, input/output structure, and training algorithm were. We don't just want to see cool art, we want to know that the AI is good.

@beywheelzhater8930 11 months ago

I dunno, 2.5k people seem to like the video at the time of writing

@Benw8888 11 months ago

@@beywheelzhater8930 just because people liked the video doesn't mean they wouldn't like the vid more if it improved

@Benw8888 11 months ago

@@beywheelzhater8930 case in point, many other commenters are asking for details on the input structure, reward function, etc.

@SpeedOfSol 11 months ago

Thanks man my wheelchair bound sister didn’t stand a chance from the tactics displayed here

@orkhanabdullayev-sr5xe 10 months ago

AI LEARNS TAX EVASION! (real1!1!1!!)

@DarkLynel 8 months ago

What if the police is also ai

@AWanderingSwordsman 11 months ago

So how can we tweak the AI to learn the same thing but in far fewer steps? Like, is there a way to make the AI use each piece of data truly optimally and develop a successful strategy in just a few steps rather than millions the way a human player could?

@cozmouz 11 months ago

Well, either someone develops a next-gen reinforcement learning algorithm or we increase the number of parallel environments in training (which requires crazy computing power).

@dealtf4lls574 10 months ago

That ai should be completely black, for more realism

@tofuissolid 10 months ago

💀

@adriantcullysover4640 11 months ago

It seemed soo happy at the end. Lol.

@rayyannoor129 11 months ago

If AI starts hacking banks, this guy is gonna be held for accusations

@cozmouz 11 months ago

My timbers are shivering

@OreoDoesStuff 11 months ago

instructions unclear, i got caught stealing orphans and they sent me to the shadow realm irl

@cozmouz 11 months ago

bro

@OreoDoesStuff 11 months ago

i know right, so strange... they used to send me to the 4th dimension but yesterday they sent me there

@add7231 11 months ago

i feel like i just went through all the stages of parenthood with loki

@guillermomazzari8320 25 days ago

Watching this made me think that it is exactly like evolution. For us, it might not seem that long, but for Loki, it took countless generations to achieve victory. This can be used as proof that we are indeed in a simulated universe and this is exactly how evolution works, our genes just do reinforcement learning.

@russellvanwagner8864 10 months ago

AI like this could benefit from getting points for surviving for long periods of time. That should, in theory, cause the character to do whatever allows it to prolong the run as much as possible, such as avoiding police.

@cozmouz 10 months ago

Ironically, it does exactly the opposite with that reward mechanism. The AI simply stays away from the officers and keeps jumping around in safe areas to survive till the end of round, getting rewards for doing nothing essentially!
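A toy calculation shows why a survival bonus can backfire in the way described here. All numbers and names below (`episode_return`, `survival_bonus`, `caught_penalty`, the 1000-step round) are illustrative assumptions, not values from the video.

```python
def episode_return(steps_survived, coins_collected, caught,
                   survival_bonus=0.01, coin_value=1.0, caught_penalty=5.0):
    """Total reward for one episode under a hypothetical survival-bonus scheme."""
    total = steps_survived * survival_bonus + coins_collected * coin_value
    if caught:
        total -= caught_penalty
    return total

# Hiding for a full 1000-step round versus an early-training coin run that
# grabs 3 coins but is caught after 200 steps.
idle = episode_return(steps_survived=1000, coins_collected=0, caught=False)
risky = episode_return(steps_survived=200, coins_collected=3, caught=True)
```

Under these assumed numbers the idle episode scores higher than the risky one, so early in training the gradient points toward hopping around in a safe corner rather than toward learning to collect coins.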

@guillermomazzari8320 25 days ago

Could you share a video of how you did this? The code, the design, etc..

@A-Clear_View 10 months ago

majestic

@kittyanduni 11 months ago

Nice vid:D

@kylebarvel 10 months ago

When pillars decide to run away from police pillars

@sjoerdgeraeds6757 11 months ago

Which Deep RL algorithm did you use? Rainbow? Or something like DDPG, PPO? Awesome video man!

@cozmouz 11 months ago

Thanks a lot, it's PPO.

@Fallout3131 8 months ago

Thank you!!

@girikkhullar4072 9 months ago

Have you thought of putting this set up in a maze sort of a world? We can probably get a good find, steal, run, hide sort of a chase from it.

@NishiTheRat 6 months ago

is there any way to apply a negative reward for going out of bounds? to teach it to steer within the lines quicker and easier?

@Anonymousfitnessfanatic 11 months ago

Fun fact the most dangerous thing in our universe is artificial intelligence and also the specific thing that has the highest percentage chance to end the world out of all other possibilities

@anshsgh6092 10 months ago

ai takes over? no problem! 💦💦💦💦💦

@Woodfiend 5 months ago

If you want to sort of get around the AI opting to just not do anything, you can give it an absolute reward at the end of the goal, such as a giant sack of coins, that blows the other rewards, both positive and negative, out of the water. You need to make the AI aware of such a reward beforehand though.