so basically the equivalent of putting a baby in a timeloop and teaching it to steal.... i approve
@stoobidthing 11 months ago
"1 hour here is 7 years on earth"
@rennoc6478 11 months ago
And pumping dopamine into it every time it succeeds
@ilyysm 11 months ago
@@stoobidthing 😭
@SirNob 10 months ago
Why is there no comment here
@rennoc6478 10 months ago
@@SirNob theres 3, now 4
@anador1877 11 months ago
I think there should have been a negative reward for jumping off too. It was clearly a preferable strategy to risking touching police officers, especially before discovering that coins give rewards or when the AI thought there was no way to get coins.
@cozmouz 11 months ago
Letting you in on a lil secret. I did code a negative reward for falling off the map, or let's say at least I tried to 🙂. However after 4 days of numerous repeated training sessions, for the life of me, the implementation wasn't working. I knew things would work just fine without the falling penalty at the expense of increased training time so dats the way we went.
@toasterhavingabath6980 11 months ago
Walls.
@Hunter57588 11 months ago
@@toasterhavingabath6980 Get this man a job at NASA
@winnerwannabe9868 11 months ago
@@cozmouz walls tagged as police that kill on contact?
@skull_lee 11 months ago
@@toasterhavingabath6980 hog rida
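A minimal sketch of the falling penalty discussed in this thread, assuming the environment can report the agent's height on every step. The reward values, the `FALL_HEIGHT` threshold, and the function name `step_reward` are all hypothetical; the video does not state the real numbers or how the actual attempt was coded.

```python
# Hypothetical values; the video does not state the real ones.
COIN_REWARD = 1.0
POLICE_PENALTY = -1.0
FALL_PENALTY = -1.0
FALL_HEIGHT = -5.0  # assumed y-coordinate below which the agent counts as fallen


def step_reward(collected_coin: bool, touched_police: bool, agent_y: float):
    """Return (reward, episode_done) for one simulation step."""
    if agent_y < FALL_HEIGHT:
        # Falling off the map ends the episode with a penalty.
        return FALL_PENALTY, True
    if touched_police:
        # Getting caught also ends the episode with a penalty.
        return POLICE_PENALTY, True
    if collected_coin:
        return COIN_REWARD, False
    return 0.0, False
```

With falling and getting caught penalized equally, jumping into the void should no longer look like the cheaper way out, which is the behavior the thread is complaining about.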
@fureyXD 11 months ago
You should have added a negative reward for getting seen by the police, that way Loki will sneak around them instead of speedrunning through them, maybe a level where police couldn't be outrun could have helped
@cozmouz 11 months ago
If I added the police officers' activation radius as input for the AI, it's plausible the AI would've learned to sneak around them by not entering the radius. Thanks for the idea!
@TheEggDev 11 months ago
Negative reward might’ve been too much, as it could lead to a local maximum where the ai thinks going through police only leads to lower score, unaware that if it sacrificed a little score it could reach more coins. This could lead the ai to get stuck not going past the first police officer, if not getting seen was impossible
@Mcervera 10 months ago
@@cozmouz but what about levels where you have to, like level 2
@cozmouz 10 months ago
@@Mcervera In machine learning, after a series of experimentations, one realizes that there is no "have to". There are many possible inputs I could've added, many different elements implemented. It's impossible to implement just everything! Implementing the baseline requirements to get the thing working was the idea behind this video. In level 2, Loki learned to maneuver around the officers regardless of whether the activation radius was an input or not.
@medievalcatguy6776 8 months ago
You should have made jumping into the void a negative reward@@cozmouz
@kitkat2849-b3h 11 months ago
i guess you could say hes _lowkey_ a fast learner
@cozmouz 11 months ago
nice one!
@DuOpig 8 months ago
Idk. 3.5 million tries... Wonder how many controllers Loki broke trying to beat these levels
@Incog-0000 3 months ago
Yup.
@flamingfox2984 11 months ago
Next Video: “AI Learns Tax Evasion”
@SovietComrade6675 7 months ago
yes
@Benw8888 11 months ago
The problem with videos like this is that the AI can overfit to a specific map. You need to have some sort of shuffled dataset or randomly generated sequence of maps/coin arrangements for proper training.
@AceTheAro7 11 months ago
I remember seeing a video that trained a Trackmania bot to solve a maze, and they tried to fix it by having the car spawn randomly in the maze once that issue became the bottleneck
@o1-preview 11 months ago
@@AceTheAro7 sounds expensive
@arcadesmasher 10 months ago
Yea, this video's coding seems good enough to mostly prevent that though. Notice how the AI doesn’t always take the same path in different levels. I have definitely seen videos where this does happen though.
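One way to picture the "shuffled maps/coin arrangements" suggestion from this thread is to draw a fresh coin layout at the start of every training episode. The sketch below is only an illustration under assumed names: the grid model of the level, `random_coin_layout`, and its parameters are hypothetical and not taken from the video.

```python
import random


def random_coin_layout(num_coins, grid_width, grid_depth, seed=None):
    """Pick distinct (x, z) grid cells for coins, different on every episode.

    Passing a seed makes a layout reproducible, which helps when comparing runs.
    """
    rng = random.Random(seed)
    cells = [(x, z) for x in range(grid_width) for z in range(grid_depth)]
    # sample() draws without replacement, so no two coins share a cell.
    return rng.sample(cells, num_coins)


# A new layout per episode keeps the agent from memorizing one fixed path.
for episode in range(3):
    layout = random_coin_layout(num_coins=5, grid_width=10, grid_depth=10)
    print(episode, layout)
```

Training on many layouts and then evaluating on layouts the agent has never seen is a common way to check whether it has overfit to one map.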
@forcelightningcable9639 11 months ago
I like how Loki figured that it’s better to die than get caught by the pigs
@generaldelasmontanas2699 11 months ago
he knew he was going to drop the soap
@forcelightningcable9639 11 months ago
@@generaldelasmontanas2699 lmaoo
@moodlethenoodle 10 months ago
Are you an anarchist
@moodlethenoodle 10 months ago
@@undefinedchannel9916 Why? Without cops we'd have anarchy... so they must be an anarchist?
@cjharrisson7522 10 months ago
@@undefinedchannel9916 pigs aren’t also known as cops. Cops are known as pigs.
@tach5884 11 months ago
"That's her officers! That's the woman who programmed me for evil!" - Bender
@grandpretredesalpagas4665 11 months ago
if the police start using robot dogs, we will start making robot cat robbers
@cozmouz 11 months ago
that would be Purr-fect
@redstocat5455 11 months ago
That would be funny, cats are perfect for this
@thebooknerd5223 10 months ago
A cat burglar, if you will.
@uncommonusername 10 months ago
Theyre really sneaky, i think itd work imo. I'm a cat owner so I'd know.
@Strong256 7 months ago
@@thebooknerd5223 Nami 😂
@Pasu4 11 months ago
11:17 I think this happens because the AI only learned to effectively collect coins in the one direction, or gets confused by there being no police to dodge. AI is not that good at changing its perspective, since it has no real correlation between x, y and z. It doesn't know that they are just sides of the same coin, it only knows what outputs will change them individually. I saw a video of a table tennis AI that worked great for one player, but once they spun it around for the second player, it just fell over, because it only learned to stay upright while looking in one direction. Their solution was to rotate the coordinate system with it (rotating a parent object and using local coordinates probably). I think something similar may work here too, by changing Loki's sensors to be relative to his orientation, thereby eliminating the need to correlate different axes (unless you are already doing that).
@cozmouz 11 months ago
Amazing explanation!
@silasnebulous4533 11 months ago
I think it was just hasty getting to the coins above and didn't bother moving a little bit to get the coins leading up to it. It's shown to be able to turn to pick up coins before, so idk.
@draketurtle4169 11 months ago
@@silasnebulous4533 yeah seemed more like it detected more coins further ahead and therefore decided to ignore ones immediately ahead for a bigger long term pay off (also cause they were away from the bad thing)
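The idea raised at the top of this thread, making Loki's sensors relative to his orientation, amounts to rotating world-space offsets into the agent's own frame. Here is a minimal sketch under stated assumptions: movement happens on the x-z plane, heading is measured counter-clockwise from the +x axis, and `to_local_frame` is a hypothetical helper, not something from the video.

```python
import math


def to_local_frame(agent_x, agent_z, agent_heading_rad, target_x, target_z):
    """Express a world-space target position relative to the agent.

    Returns (forward, left): how far the target lies ahead of the agent and
    how far to its left. Negative values mean behind or to the right.
    """
    dx = target_x - agent_x
    dz = target_z - agent_z
    cos_h = math.cos(agent_heading_rad)
    sin_h = math.sin(agent_heading_rad)
    # Project the offset onto the agent's forward and left axes.
    forward = dx * cos_h + dz * sin_h
    left = -dx * sin_h + dz * cos_h
    return forward, left
```

With observations like these, a coin one unit ahead produces the same input no matter which way the agent faces, so the policy does not have to relearn the same behavior separately for each axis.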
@redstonewolfx 11 months ago
You might want to add a very small negative reward that accumulates over time, and/or a time limit, so Loki is encouraged to pick up the pace. He might also be less scared of the police, as the penalty for meandering aimlessly will eventually be worse than just running for it.
@Golden_Projects 11 months ago
if you increased the reward from coins by dividing it by the amount of time from the last coin (less time more reward) you'd also make it so that he doesn't skip nearby coins too often, but it would also result in more speedrun-ish behavior
@cozmouz 11 months ago
That's a great recommendation. More complex reward functions are something I will implement in the coming videos. Stay tuned!
@Strong256 7 months ago
wow nice idea i hope i remember this too in the future
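The two suggestions in this thread, a small per-step penalty and a coin reward divided by the time since the last coin, could be sketched as follows. All constants and function names are hypothetical; the floor on elapsed time is an added assumption so that near-instant pickups cannot blow up the reward.

```python
STEP_PENALTY = -0.001      # hypothetical small cost charged every step
BASE_COIN_REWARD = 1.0     # hypothetical reward for a coin picked up slowly
MIN_ELAPSED_SECONDS = 1.0  # assumed floor to keep the division bounded


def coin_reward(seconds_since_last_coin: float) -> float:
    """Coin reward divided by the time since the previous coin (less time, more reward)."""
    return BASE_COIN_REWARD / max(seconds_since_last_coin, MIN_ELAPSED_SECONDS)


def accumulated_time_penalty(steps_elapsed: int) -> float:
    """Total penalty for the steps spent so far in the episode."""
    return STEP_PENALTY * steps_elapsed
```

The step penalty has to stay small relative to the coin reward; if it dominates, ending the episode quickly (for example by jumping off the map) can become the best-scoring strategy.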
@3emad.305 11 months ago
Programmers already teach AI how to do crimes. Perfect for our Sci fi apocalyptic fantasy doom.
@Danjor0613 8 months ago
In Mass Effect 1 someone created an illegal AI to steal money from gambling machines. When caught it self destructed to try and kill you along with itself rather than be shutdown.
@Digby8 11 months ago
Maybe we shouldn't be teaching AI to break the law, maybe that's just me.
@beywheelzhater8930 11 months ago
Yes this definitely applies to actual irl crime. I love it when police digitize themselves to charge at rectangles
@karetsin8700 11 months ago
just maybeeeee
@TurbopropPuppy 11 months ago
what do you mean, this AI is based?
@joeljude9180 11 months ago
It's just flavor
@Dr3wskee14 10 months ago
@JulieGallows are you stupid?, it's In the name of the video😂
@kitsunemusicisfire 9 months ago
Loki isn't evil he's just a silly guy
@a.j.outlaster1222 11 months ago
This is cool, But wouldn't the A.I. learn more effectively if the levels scale slower in difficulty and repeated the same sort of scenarios? Idk, This just seemed to scale at a rate that's fine for players but maybe staggering for an A.I.
@cozmouz 11 months ago
Sir, you are absolutely right. Gradual scaling in difficulty would've resulted in more thorough learning.
@a.j.outlaster1222 11 months ago
@@cozmouz Btw, What were the inputs? I mean, Were the cops and rewards registered separately from the walls? Or was there like a separate input that changed based on what it hit?
@cozmouz 11 months ago
The 360 Degree Ray-cast is the main input source for the AI. It's like lasers being fired in all directions and waiting to hit something. If the AI hits a cop and gets negative reward, over time, whenever the raycast beams hit anything tagged "police", the AI will try to avoid that area. Raycast hits a wall, this is something I can stand and jump over! That's how it works basically.
@thomasb6434 11 months ago
@@cozmouz So, the AI knows in which direction "things" are, but not at which distance?
@cozmouz 11 months ago
It knows direction as well as distance.
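The ray-cast input described in this thread can be pictured as one small feature vector per ray: which kind of object was hit, and how far away it is. The encoding below is an assumption for illustration, not the video's code. Only the "police" tag is mentioned in the thread; the other tags, `MAX_RAY_LENGTH`, and both function names are hypothetical.

```python
TAGS = ("wall", "coin", "police")  # "police" comes from the thread; the rest are assumed
MAX_RAY_LENGTH = 20.0              # hypothetical maximum ray distance


def encode_ray(hit_tag, hit_distance):
    """Encode one ray as [wall, coin, police, miss, normalized_distance].

    hit_tag is None when the ray hits nothing within MAX_RAY_LENGTH.
    """
    one_hot = [1.0 if hit_tag == tag else 0.0 for tag in TAGS]
    miss = 1.0 if hit_tag is None else 0.0
    if hit_tag is None:
        distance = 1.0  # treat a miss as "as far away as possible"
    else:
        distance = min(hit_distance, MAX_RAY_LENGTH) / MAX_RAY_LENGTH
    return one_hot + [miss, distance]


def encode_observation(ray_hits):
    """Concatenate the encodings of all rays, in a fixed angular order."""
    observation = []
    for hit_tag, hit_distance in ray_hits:
        observation.extend(encode_ray(hit_tag, hit_distance))
    return observation
```

Because the rays are listed in a fixed angular order, the position of each block in the vector tells the network the direction, and the last value in each block tells it the distance.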
@corruptedmineral 11 months ago
damn i can finally create army of ai thief with ability to escape on its own
@SolomonFinney 10 months ago
It’s so good that you finally have recognition for this.
@cozmouz 10 months ago
Ayyy I remember ur comment from the basim video, thanks a lot man.
@SolomonFinney 10 months ago
Ye@@cozmouz
@InksAutism 11 months ago
He kept getting caught when teasing the cops
@L-iv6lx 11 months ago
"started to associate negatives with something tagged as police" it started using twitter
@adriantcullysover4640 11 months ago
Although a 6 year old (ie my younger siblings) can finish these levels with wayyyyyy less tries, this is still so impressive from something with no consciousness.
@xxxD3FC0N_1xxx 11 months ago
that’s the point it’s a learning AI it’s not supposed to get it right the first time eventually it would be better than the best human player
@ethantasti2521 11 months ago
@@xxxD3FC0N_1xxx actually no it wouldn't be better than a human. Change the map or enemy slightly and the AI would crumble. considering it only went for safe route a human would be faster and would take less time completing this
@eldritchcupcakes3195 11 months ago
@@xxxD3FC0N_1xxx actually no! If you changed anything major about say, the map at 8:50 Loki would freak out and take millions of tries to figure it out again. It could get very good at this specific map but nothing else. It can’t figure out how to apply the “knowledge” from this to a changed terrain. It just eventually figured out “these motions get me positive rewards and avoid the negative”.
@Данилтычкрейзи 10 months ago
@@eldritchcupcakes3195 that's called overfitting, usually AI is trained on a lot of different data to prevent this
@sirk603 10 months ago
@@ethantasti2521 if you trained it on a wide variety of different maps, it would probably become much better than any person
@FlaiseSaffron 11 months ago
In this video: programmer explains criminal psychology without realizing it.
@Commenter_101 11 months ago
I love how patient you are 😊
@cozmouz 11 months ago
Thanks !
@JDRed117 11 months ago
1:30 ULTIMATE SPASM GO
@davidaugustofc2574 11 months ago
Loki is Low-key one of the AIs of all time
@Recodetfort0 10 months ago
I love how at just at the end of his journey he doesn't just collect all the reward, he also jumps around which looks like he really does have consciousness and is happy to see so much reward! It looks really nice and interesting)
@YuriGen2423 11 months ago
Very good work! also you could include what the AI is receiving as input too
@aurnok1237 11 months ago
When Loki moves randomly he kinda looks like a speedrunner lol
@matt.stevick 3 months ago
I have experience training an early AI / LLM (me along with many other associates) at a large wealth management firm starting in 2009 and leaving at 2015. It was not at all a primary focus or task we had to do, but very simply … we did it voluntarily when we had time. This video is a very good explanation to people new to AI on how it works in general, for such a complex area of study.
@pietrobarbosa2464 5 months ago
Ok so basically training an AI is like beating ur kid if it doesnt bring you beer and giving it candy if it does
@mutantdog 10 months ago
next video: i reprogrammed elons self driving cars to outrun police cars
@FriarJoe66 8 months ago
I think there should be a negative reward for coming into close proximity of a coin and then leaving proximity without collecting it.
@goobinroblox12 10 months ago
bros learning ai to evade taxes 💀
@Insanity-m3c 7 months ago
Teaching?
@zellenny1784 11 months ago
Cool Video! Would love more of an end goal to it though..
@Evaboii11 11 months ago
200th subscriber here you have earned a sub keep up the work bro 💯💪🙏
@cozmouz 11 months ago
You are a Legend, Thanks a Lot 😎👊
@CrazedKen 11 months ago
Well Well Well Nice! I just came back after a day and he’s at 1k, keep it up!👍
@patchinator6 1 year ago
Earned my sub! Keep it up!
@cozmouz 11 months ago
Sir, you are a legend. Thanks a ton.
@someangrypotato7197 10 months ago
This was really cool to watch! I wonder how it would go if you made a city for Loki to run from police in. It’d be interesting seeing if Loki develops an optimal route to go.
@davidgeinoz2277 11 months ago
Really like this one keep going 👍
@FireyDeath4 6 months ago
Infantile robber tries to steal drugs in broad daylight while avoiding police officers: visualised I wonder if you can make it solely focus on the positive rewards of the coins and learn that obstacles are naturally detrimental because of the way they prevent the collection of coins
@spacer7205 8 months ago
cool project! i think the jumping behaviour observed is a result of the raycasts being centred on the character's body; the AI is initiating a jump because it causes the rays to jump with it, which means they don't hit the chasers, so the AI associates jumping with that positive outcome. might be worth only associating being caught with a negative reward for more distinct emergent behaviour
@plasmaflare5217 10 months ago
Reinforcement learning is such a cool concept. It just learns things by trial and error, just like people do.
@henrycrystal9740 4 months ago
next video: " AI cops learn """pattern recognition""" "
@NunyaBizniz-om6xf 11 months ago
I know nothing about AI but i think scaling the reward function of coins dependent on the closeness of police could encourage riskier behaviour, as long as contacting the police shortly after would remove those bonuses. or it could be fun to see what the ai does without that safeguard
@elkapalio 11 months ago
yooo ur content is incredible!, new sub
@cozmouz 11 months ago
ayyy thanks!
@themarkerchannel3170 10 months ago
This is so underrated, great job! Also, if possible, can you do a tutorial on how to make these?
@binguri-e7l 10 months ago
next video: AI Learns To Evade Taxes
@PierreLucSex 5 months ago
This is the easy mode. The police just rewards you less
@Somebody0960 11 months ago
I’m going to use this knowledge to get away with violent crimes
@ninrts 10 months ago
i love how at 6:49 it almost looks like he's taunting the cop lmao
@Mangtoes 3 months ago
What do you use to make these videos?
@FurryNonsense 9 months ago
The music is too loud
@GunSpyEnthusiast 11 months ago
this video was recommended to me, most likely by an algorithm. I am now scared.
@FileXocelot 11 months ago
There should have been a boing sound when he jumps
@cozmouz 11 months ago
Man I had this exact intrusive thought when I was programming Loki LOL, but it would've made the audio chaotic so I ditched the idea.
@burridi 5 months ago
Is that the cat ninja music???? I loved that flash game so much as a child
@aaronpark686 10 months ago
He is so happy at the end lol
@cludration 9 months ago
bro puts so much effort for his video holy sh*t
@simply_oat755 10 months ago
shoulda added a reward for going closer towards the coins and for going faster (and negative reward for going slower aka jumping around)
@oliverthesupercoolbully 5 months ago
welcome to our new friend loki! :DDDDDDDD
@marmaje69 10 months ago
Is Loki like… relearning everything each level? Or do you keep his knowledge for the next level. Cuz I saw more AIs that do that more effectively.
@Kiwi_Inventor 10 months ago
do part two but loki has to learn how to use legs
@Likemea 11 months ago
was it possible to place invisible barriers?
@cozmouz 11 months ago
Yes it was possible.
@kuroyami9757 4 months ago
5:35 So like people
@jakub_noj 10 months ago
Loki is goated fr
@SADmemer.11 months ago
Little bro breaking some ankle’s
@moodl3d856 11 months ago
"ai will be used to help people!" ai:
@monkeysarestinky3106 10 months ago
You should teach the cops to catch the robber now
@lawden210 11 months ago
Lupinranger Vs Patranger lookin different
@cassandranoice1563 9 months ago
Im glad my kids didnt spam jump when they were infants. Yeeting themselves off the edge of the world tracks though.
@crabbydisk7658 11 months ago
If you are gonna make a sequel, you should try procedurally generating the map to make the ai more general purpose.
@duck0a 6 months ago
What coding program and language did you use for this, and did you use any libraries or plugins? Good video :)
@dazley8021 11 months ago
Wouldn't it be funny if "loki" comes to the conclusion that stealing is not the glorious purpose he's looking for? 😉
@cozmouz 11 months ago
I am Loki of Asgard, and I am burdened with devious purpose!
@SolomonFinney 11 months ago
How long do you run your computer for so that the AI can learn? Or does it learn even when your computer is turned off?
@cozmouz 11 months ago
Pc stays on overnight usually. Training can range anywhere between 3 hours to 24 Hours
@SolomonFinney 11 months ago
Wow. Thank you.@@cozmouz
@cozmouz 11 months ago
You're welcome!
@PaintedCryptid 9 months ago
The landing full of coins... truly the best ending to this video
@Burgers21 11 months ago
Next up: AI learns to drive my car to the bank
@Nikko_0905 8 months ago
The way the AI seemed to celebrate at the end was cute :)
@alussk 11 months ago
Really cool!
@kyleyoung2464 10 months ago
Now optimize him
@Benw8888 11 months ago
This video was incomplete without you explaining what the architecture, input/output structure, and training algorithm were. We don't just want to see cool art, we want to know that the AI is good.
@beywheelzhater8930 11 months ago
I dunno, 2.5k people seem to like the video at the time of writing
@Benw8888 11 months ago
@@beywheelzhater8930 just because people liked the video doesn't mean they wouldn't like the vid more if it improved
@Benw8888 11 months ago
@@beywheelzhater8930 case in point, many other commenters are asking for details on the input structure, reward function, etc.
@SpeedOfSol 11 months ago
Thanks man my wheelchair bound sister didn’t stand a chance from the tactics displayed here
@orkhanabdullayev-sr5xe 10 months ago
AI LEARNS TAX EVASION! (real1!1!1!!)
@DarkLynel 8 months ago
What if the police is also ai
@AWanderingSwordsman 11 months ago
So how can we tweak the AI to learn the same thing but in far fewer steps? Like, is there a way to make the AI use each piece of data truly optimally and develop a successful strategy in just a few steps rather than millions the way a human player could?
@cozmouz 11 months ago
Well, either someone develops a next gen reinforcement learning algorithm or we increase the number of parallel environments in training (which requires crazy computing power)
@dealtf4lls574 10 months ago
That ai should be completely black, for more realism
@tofuissolid 10 months ago
💀
@adriantcullysover4640 11 months ago
It seemed soo happy at the end. Lol.
@rayyannoor129 11 months ago
If AI starts hacking banks, this guy is gonna be held for accusations
@cozmouz 11 months ago
My timbers are shivering
@OreoDoesStuff 11 months ago
instructions unclear, i got caught stealing orphans and they sent me to the shadow realm irl
@cozmouz 11 months ago
bro
@OreoDoesStuff 11 months ago
i know right, so strange... they used to send me to the 4th dimension but yesterday they sent me there
@add7231 11 months ago
i feel like i just went through all the stages of parenthood with loki
@guillermomazzari8320 25 days ago
Watching this made me think, that it is exactly like evolution, for us, it might not seem that long, but for Loki, it took countless generations to achieve victory, this can be used as proof that we are indeed in a simulated universe and this is exactly how evolution works, our genes just do reinforced learning.
@russellvanwagner8864 10 months ago
AI like this could benefit from getting points for surviving for long periods of time. That should, in theory, cause the character to do whatever allows it to prolong the run as much as possible, such as avoiding police.
@cozmouz 10 months ago
Ironically, it does exactly the opposite with that reward mechanism. The AI simply stays away from the officers and keeps jumping around in safe areas to survive till the end of round, getting rewards for doing nothing essentially!
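A quick back-of-the-envelope comparison shows why a survival reward can backfire the way the reply above describes. Every number here is made up for illustration (the video gives no reward values), and `idle_return` and `collector_return` are hypothetical helpers.

```python
SURVIVAL_REWARD_PER_STEP = 0.01  # hypothetical bonus for each step survived
COIN_REWARD = 1.0                # hypothetical reward per coin
POLICE_PENALTY = -1.0            # hypothetical penalty for being caught
EPISODE_STEPS = 1000             # hypothetical episode length


def idle_return() -> float:
    """Return for an agent that hides in a safe spot for the whole episode."""
    return SURVIVAL_REWARD_PER_STEP * EPISODE_STEPS


def collector_return(coins: int, steps_survived: int, caught: bool) -> float:
    """Return for an agent that chases coins and may get caught early."""
    total = SURVIVAL_REWARD_PER_STEP * steps_survived + COIN_REWARD * coins
    if caught:
        total += POLICE_PENALTY
    return total


print("idle:", idle_return())
print("collector:", collector_return(coins=3, steps_survived=200, caught=True))
```

With these made-up numbers, idling earns roughly 10 while a run that grabs 3 coins and gets caught after 200 steps earns roughly 4, so the optimizer has every reason to jump around in a safe corner until the round ends.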
@guillermomazzari8320 25 days ago
Could you share a video of how you did this? The code, the design, etc..
@A-Clear_View 10 months ago
majestic
@kittyanduni 11 months ago
Nice vid:D
@kylebarvel 10 months ago
When pillars decide to run away from police pillars
@sjoerdgeraeds6757 11 months ago
Which Deep RL algorithm did you use? Rainbow? Or something like DDPG, PPO? Awesome video man!
@cozmouz 11 months ago
Thanks a lot, it's PPO.
@Fallout3131 8 months ago
Thank you!!
@girikkhullar4072 9 months ago
Have you thought of putting this set up in a maze sort of a world? We can probably get a good find, steal, run, hide sort of a chase from it.
@NishiTheRat 6 months ago
is there any way to apply a negative reward for going out of bounds? to teach it to steer within the lines quicker and easier?
@Anonymousfitnessfanatic 11 months ago
Fun fact the most dangerous thing in our universe is artificial intelligence and also the specific thing that has the highest percentage chance to end the world out of all other possibilities
@anshsgh6092 10 months ago
ai takes over? no problem! 💦💦💦💦💦
@Woodfiend 5 months ago
If you want to sort of get around the AI opting to just not do anything, you can give it an absolute reward at the end of the goal, such as a giant sack of coins, that blows the other rewards, both positive and negative, out of the water. You need to make the AI aware of such a reward beforehand though.