A. I. Learns to Play Starcraft 2 (Reinforcement Learning)

419,876 views

Comments: 309

@SocalNewsOne 1 year ago

Thanks! Your tutorials were the first that worked for me. Biggest problem that I had was the directory path for the Starcraft maps.

@sentdex 1 year ago

Thank you for the super!

@serta5727 2 years ago

I have to say you make the most understandable learning materials: your website together with the videos. All the code is there, the book, the playlists from scratch. Most professional educators can’t do this 🤗

@kailalueni3251 1 year ago

I love your idea of drawing your own minimap! That's a smart way to make more information available easily.

@awsamalmughrabi860 2 years ago

I like how in depth this video is, really enjoyed it!

@JohnJackKeane 2 years ago

I do not code or have the desire to code, but this video is beautiful. I enjoy StarCraft videos seeing people micromanage, but the thought and process that goes into creating a “program” to do the same thing is fascinating. The amount of work and work to obtain the knowledge that goes into the work is far underrated. I hope for you the best!

@Derrekito 2 years ago

Never before has a marketing ploy worked so well on me. I'm looking forward to receiving the hardcover version of the book!

@protoplmz 2 years ago

Hey! I love the update here. I followed the original series you put out. As a SC2 veteran I noticed deficiencies and deviated in a strong way halfway through. I set up separate models to handle the decision making for each aspect of the game. This makes it so it can make the decision to use its army separately from the decision of progressing tech (or not). I stopped around the time I couldn't figure out how to have it build its own strategies, as I ended up giving it a long set of possible actions and letting it pick, and it felt too 'guided'. It was able to beat "Very Hard" 50% of the time vs random's 0%. Was my first exercise with ML. I got the chance to apply the concept at work for something outside of my scope. Used both that and the SC2 project as demonstration in an interview and got a promotion out of it. This inspires me to try my hand at it again! EDIT: To handle army movement, which you mentioned in the video, I chopped the maps up into a grid and gave it decisions to make where it could attack-move its army to any of these at will. 9 worked the best but you could make it much more granular. It used this to both attack and defend.
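A minimal sketch of the grid idea in the comment above, assuming "9" means a 3x3 grid and that each discrete action simply maps to the center of one cell. The function names and the map size are hypothetical, and the attack-move command itself is left to whatever bot API is in use.

```python
from typing import List, Tuple


def grid_cell_centers(
    map_width: float, map_height: float, rows: int = 3, cols: int = 3
) -> List[Tuple[float, float]]:
    """Return the center point of every cell in a rows x cols grid over the map."""
    cell_w = map_width / cols
    cell_h = map_height / rows
    return [
        ((c + 0.5) * cell_w, (r + 0.5) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def target_for_action(action: int, centers: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Map a discrete action index (0..rows*cols-1) to an attack-move target."""
    return centers[action]


# Hypothetical 180 x 150 map split into 9 cells.
centers = grid_cell_centers(180, 150)
```

Making the grid more granular is then only a matter of raising `rows` and `cols`, at the cost of a larger action space for the model to explore.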

@fuba44 2 years ago

This was an interesting video. I will have a look at your example code for sure, wanna try to tinker a bit. Thanx for all your hard work.

@fuba44 2 years ago

Very interesting idea with a macro ai and a strategic ai, sort of working in tandem forming a symbiotic relationship of sorts.. could maybe even break that down even further, like on a per unit type basis... Tho i imagine the complexity explodes at that point.

@sentdex 2 years ago

We have very few unit types, at least here. For the full game, there are more, and even here I wasn't utilizing all the things a voidray can actually do, but certainly there are ways to have a "voidray" algo and a "probe" algo...etc. Definitely something to think on.

@hikari1690 2 years ago

This sounds like how deepfakes work. Have 2 AI models compete with each other to improve each other. So the macro AI would try to defeat the strategy AI and vice versa.

@prodj.mixapeofficial6431 2 years ago

I believe Dota has 5 controllable units, with an individual OpenAI agent per unit, and modified communication between the 5 to mimic real human gameplay.

@Dethek 2 years ago

When I was looking into the AI for StarCraft, I was thinking of the following:
Overarching AI - makes the final decision on what action to take
Supported by:
Strategy AI - uses training from professional replays to assess, based on what the player has seen, what their likely strategy is, and then chooses a strategy based on that
Macro AI
Micro AI

@TheFalconerNZ 2 years ago

@ccriztoff Get his book lol ;-)

@pognar 2 years ago

I have played starcraft for years and years, and I love this channel. This is going to be great.

@adityachawla7523 2 years ago

Here is an idea: You can use more than 3 channels to give spatial information to your network. No need to limit yourself to the conventional idea of 3 channels! If you are worried about how to visualize this, just think of each one as an extra map.
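A short sketch of what a more-than-3-channel observation could look like, assuming NumPy and a channels-last layout. The channel names and the 64x64 size are made up for illustration and are not taken from the video.

```python
import numpy as np

MAP_H, MAP_W = 64, 64

# Hypothetical channel layout: one 2D "extra map" per kind of information.
CHANNELS = [
    "own_units",
    "own_structures",
    "enemy_units",
    "enemy_structures",
    "minerals",
    "visibility",
]


def empty_observation() -> np.ndarray:
    """Allocate an (H, W, C) observation with one plane per channel."""
    return np.zeros((MAP_H, MAP_W, len(CHANNELS)), dtype=np.uint8)


def mark(obs: np.ndarray, channel: str, x: int, y: int, value: int = 255) -> None:
    """Write a value into a single channel at map cell (x, y)."""
    obs[y, x, CHANNELS.index(channel)] = value


obs = empty_observation()
mark(obs, "enemy_units", x=10, y=20)
```

For viewing, any three planes can still be picked out and shown as an RGB image; the network itself only sees an array with six planes instead of three.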

@kevintyrrell7409 2 years ago

14:49 That's some next-level Gateway placement.

@Neceros 1 year ago

This is great! I'd love to see if something like this could compete in the arena

@faithful451 2 years ago

I'd love to see the next video in this series with dual macro and micro algorithms and improving the win percentage

@PathToPrestige 2 years ago

I very rarely reply to these kinds of videos, but hats off. Even though the project structure is messy, your genuine "realistic" practical approach was very enjoyable to watch.

@XmKevinChen 2 years ago

It’s a very interesting video about ML + gaming. As a newbie to this AI world, it also gives me lots of incentive to continue learning.

@erics3596 2 years ago

Do you want Skynet? Because this is how you get Skynet :) (also great strats and explanation on how this works)

@adye88 2 years ago

This is freaking intense! also for the hunters problem: Why not make a "return to safe space" function for them when they detect enemies. That way they only perform scouting duties.

@adye88 2 years ago

And obviously set a variable for safe space= position holding command center

@sebbes333 2 years ago

One thing I feel is missing from the map is a kind of "ghost" of where enemies have been seen previously, which could become "points of interest" for scouting in the future. The "ghosts" could "fade" over time, but never fade to zero again (capped at a minimum of 1, starting at like 255 or something), to make the algorithm prioritize the most recent ghost locations. Also, instead of scouting with void rays, wouldn't it be cheaper to scout with drones (to generate ghost areas)? Scouting would probably target mineral areas without ghosts, to see if enemies have expanded, while void rays can scout areas WITH ghosts, to see if the enemies are still there and try to defeat them there. You could also send a probe first to a ghost area, to determine enemy strength before attacking.
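A sketch of the fading "ghost" layer described above, assuming NumPy, a start value of 255, a floor of 1, and a fixed decay of one per update. The decay rate and the function name are assumptions; the comment only fixes the start and minimum values.

```python
import numpy as np

GHOST_START = 255  # value stamped where an enemy was just seen
GHOST_FLOOR = 1    # a cell that was ever seen never fades back to 0


def update_ghosts(ghosts: np.ndarray, sightings, decay: int = 1) -> np.ndarray:
    """Fade existing ghosts by `decay` (never below the floor), then stamp new sightings."""
    seen_before = ghosts > 0
    # Work in a wider integer type so the subtraction cannot wrap around.
    faded = np.maximum(ghosts.astype(np.int16) - decay, GHOST_FLOOR)
    ghosts[seen_before] = faded[seen_before].astype(np.uint8)
    for x, y in sightings:
        ghosts[y, x] = GHOST_START
    return ghosts


ghosts = np.zeros((4, 4), dtype=np.uint8)
update_ghosts(ghosts, sightings=[(1, 2)])
```

Cells that were never observed stay at 0, so the layer distinguishes "never seen" from "seen long ago", which is what lets recent locations be prioritized.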

@Lithane97 2 years ago

Better yet, just train an observer to scout ghosts, it's almost like they're made for that 👍 Wouldn't even require any logic really, just if ghost entity train observer and have it sit there all game.

@achtsekundenfurz7876 2 years ago

I can imagine some ways to refine the AI using more inputs:
-- time elapsed since game started (there's hardly any risk of attack at all in the 1st minute, but at a late stage, the risk is much higher),
-- current resource totals (letting resources sit in the "bank" is usually worse than expanding the economy or forces),
-- # of "ghosts" on the map (where enemies were sighted and lost again).
About rewards and penalties, I'd suggest the following:
-- adjust the reward/punishment for victory/defeat: a "good" AI should aim for a quick victory, but not at all costs. Maybe set the victory reward to 24,000 / sqrt(seconds played) and cap at 1000 (i.e. don't reward any higher for games lasting
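The victory reward suggested above can be written down directly. This is only a sketch of that one formula, 24,000 / sqrt(seconds played) capped at 1000; the function name and the guard against a zero-length game are additions for illustration.

```python
import math


def victory_reward(seconds_played: float, scale: float = 24_000.0, cap: float = 1_000.0) -> float:
    """Reward a win more the faster it happens, but never above `cap`."""
    seconds_played = max(seconds_played, 1.0)  # guard against division by zero
    return min(cap, scale / math.sqrt(seconds_played))


# A one-hour game earns far less than a quick one.
slow_win = victory_reward(3600)
quick_win = victory_reward(60)
```

Because the reward falls with the square root of time rather than linearly, very long games are still worth winning, just less so.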

@tjw2469 9 months ago

@Lithane97 if there is a raven+cyclone/raven+viking/missile turret then it's a dead observer

@J3553xAnotherFan 2 years ago

This is now the 3rd programming/ artificial intelligence channel that I've found myself watching even though my ability to code (or even Math) is so awful that if there was a gun to my head I would beg to just be shot. But I find it satisfying to watch. Like a time-lapse of an ant colony diligently working away.

@Ammothief41 2 years ago

Thanks for putting all of that together. Looks neat.

@serta5727 2 years ago

I recommend your Channel every now and then to people learning python 🤗

@gavinmorton7682 2 years ago

this is such a cool project! would love to see this keep going

@serta5727 2 years ago

Can’t get enough of learning this awesome stuff

@danielglidewell 2 years ago

I wasn’t in the mood to watch the video when I read the title, but when I realized what the thumbnail was I stopped by to drop a like lol.

@EnderSword 2 years ago

Kind of neat, I'm wondering if you looked at the AlphaStar research at all to do this, or looked into the StarCraft 2 AI community? There's about 70 coders of various bots and AI that compete against each other and it'd give you a ton of ideas on build choices and especially unit control and decision making.

@Leonhart_93 2 years ago

The AI coders in the community don't make true AI, they just give them a set of commands and responses to various actions. A true AI learns from successes and failures (reinforcement) with very little initial programming.

@PeterRAmice 2 years ago

@Leonhart_93 while this has some truth to it, what you are referring to is machine learning. The AI spectrum is much wider than learning like a human; the best way of describing AI imo is: a machine which observes its environment and executes actions which maximize its goals. So with that definition in mind I would argue those people are actually building AIs which do not automatically learn from their past experiences, and thus they do not build machine learning AIs, which Alpha did.

@Leonhart_93 2 years ago

@PeterRAmice We just called bots that follow specific sets of instructions AI in the past out of laziness and limited understanding. It doesn't apply to current times anymore, fake AI and true AI have almost nothing in common. We can't use the same word to describe them both, so a "bot" is proper for the fake AI.

@Leonhart_93 2 years ago

@string name; Yes, bots. I played vs the top bots of the sc2 bots community, they are really good. They won't be easy unless you are at least masters, which is impressive for a bot. The major problem with those fake AI is that they can always be cheesed in some way, no human programmer can ever input the right answer for every situation. Btw, AlphaStar never had complete map vision, it wouldn't have been a valid test. It had complete vision of whatever parts it could see since there was no player-like camera which removed any delay from responses. I think that's ok, even bots respond to everything with 0 delay. AlphaStar has potential, but it will never progress past a certain point if they don't train it permanently on the ladder vs pro players and actually see current tactics.

@ErazerPT 2 years ago

@string name; It's no more cheesy than a grandmaster switching cams at 400+apm (yes, they do it...). And while "beating the best" might sound like a great end goal, all you need is to beat 99% to already go WAY beyond what humans can do (on average). There's a few F1 top racers, there's billions of "common drivers"; for a driving ML model, which is more important, beating the top F1 or consistently outdoing "Average Joe"? p.s. that one "human trick" that beat the model in one game was a simple "loop", as the model got stuck reacting to the same thing in a loop, back and forward. You can observe that level of idiocy in humans too at times ;)

@BretBowlby 2 years ago

I like the ideas here, but be sure that you've got a task that can understand the advantage of having high-ground vision, which gives better attacking, vs not having high-ground vision. Also, I'd consider having the model constantly scouting, as all information gained on the player's actions can lead to better counterattacks and so forth. But yeah I'm loving this. keep'em coming!

@teardowndynamic6171 2 years ago

i know nothing about programming or AI but this is just so fun to watch

@binxuwang4960 2 years ago

Already super impressive that you could do RL for macro-level strategy! Totally agree that to solve a certain problem, how you formulate the state, action, and reward is key

@serta5727 2 years ago

Wow congratulations I think what you did is amazing 🤩 I would like to do something like this for software testing for a while but it is so complicated

@kylee.7654 2 years ago

At 4:52, regarding your comment: I added `async def on_start(self): self.last_sent = 0` after the on_step function. It makes it a little clearer.
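A self-contained sketch of the pattern in the comment above. `SketchBot` is a stand-in class, not the library's bot class, and the idea that `last_sent` records the step at which something was last sent, with a gap of 10 steps, is an assumption made only so the example runs.

```python
import asyncio


class SketchBot:
    """Stand-in for a bot class; a real bot would subclass the library's bot base class."""

    async def on_start(self):
        # Runs once before the first step, so the attribute always exists
        # by the time on_step reads it.
        self.last_sent = 0

    async def on_step(self, iteration: int):
        # Hypothetical rule: act at most once every 10 iterations.
        if iteration - self.last_sent >= 10:
            self.last_sent = iteration
            return "send"
        return "wait"


async def run_demo():
    bot = SketchBot()
    await bot.on_start()
    return [await bot.on_step(i) for i in (0, 5, 10, 15, 20)]


results = asyncio.run(run_demo())
```

Initializing in `on_start` keeps the per-step code free of existence checks, which is the "little clearer" part of the suggestion.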

@Mutual_Information 2 years ago

Wow I'm literally working on a series on RL theory and I was just wondering how the hell you'd code things up to actually play Warcraft 3. Starcraft 2, close enough! Such a useful channel

@Singularitarian 2 years ago

Very illuminating!

@ericzahn274 2 years ago

Great vid. Buying the book.

@cedrickram3180 2 years ago

Some time series analysis (windowed access to what has been searched, where stuff was, ...) would probably help the AI make better decisions. The data of just the map does not do a good job of storing time-information. Your rewards seem like a good fit. Great video!

@stonecoldscubasteveo4827 2 years ago

Reward for resources spent. this will incentivize expansion and rapid army growth until max out. At that point change the reward to enemy units/structures killed. Something like (big reward) for spending money on nexus/probe/stargate (bigger reward) for void ray, (penalty) for having too much money banked up unless supply is >190. Then (big reward) for killing enemy unit/structure, while dialing back on rewards for building structures. zero out the rewards for probes over 70-80 and for pylons over 200 supply. When supply drops due to combat, flip the rewards back to making void rays to max out again.

@VaSoapman 2 years ago

Why not give rewards based on how many enemy units/buildings are destroyed? Then give a penalty based on how many of your own units/buildings are destroyed? Also, to help the AI prioritize winning over stalling, you could increase the value of a win based on how fast it won.

@nrobo3840 2 years ago

Yeah, adding a time decay to the win reward was where my mind immediately went.

@moseszero3281 2 years ago

I was thinking a k/d reward and a lowering of all rewards for time

@benoitkinziki3916 2 years ago

For the reward mechanism you could probably build a LSTM that gives you the probability of winning for each action you take and you should probably include a time penalty to avoid the bot dragging the game out

@djsyntic 2 years ago

When you got to talking about how to handle the gas extractor on your minimap, I thought you handled it strangely. Keep in mind that the RGB values for the colors you put on your map are arbitrary and serve to help you visually more than the computer. But you could have encoded some meaningful data into the RGB itself. For example, instead of saying "This building is green, this building is dark green" and so on, you could have put all building/unit type info into the R-value of RGB. I.e., this building is R-value 12, this building is R-value 13, and so on. Then the G-value could represent something else, like building health. I.e., R-12, G-255 means it's a Refinery at full health, while R-12, G-1 means the Refinery is about to explode if it takes any more damage. Finally, the B-value could then be used as some sort of indicator of something specific to that building. R-13 might be a Barracks, and B-2 might mean that it's in the middle of training something and has 2 units of time before it finishes and can do something else. On the other hand, R-14 might be a Gas node, and B-# could indicate how much gas that node has, while R-15 indicates that this is an extractor, with the B-# still indicating how much is still in the node. Sure, to YOU R-14 and R-15 are basically the same amounts of red and your eyes wouldn't be able to tell the difference, but to a computer, those are two distinct values.
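A sketch of the encoding proposed above, assuming NumPy and reusing the comment's example IDs (12 to 15). The `encode_entity` helper, the 32x32 map, and the clamping of health to 1..255 are illustrative choices, not something from the video.

```python
import numpy as np

# Type IDs taken from the comment's examples; the mapping itself is arbitrary.
TYPE_IDS = {"refinery": 12, "barracks": 13, "gas_node": 14, "extractor": 15}


def encode_entity(kind: str, health_fraction: float, extra: int = 0):
    """Pack type, health, and one type-specific value into an (R, G, B) triple."""
    red = TYPE_IDS[kind]                                     # what it is
    green = min(255, max(1, round(health_fraction * 255)))   # how healthy it is
    blue = min(255, max(0, extra))                           # e.g. time left, gas left
    return red, green, blue


minimap = np.zeros((32, 32, 3), dtype=np.uint8)
minimap[5, 7] = encode_entity("refinery", health_fraction=1.0)
minimap[9, 3] = encode_entity("barracks", health_fraction=1.0, extra=2)
```

The trade-off is that the image stops being readable by eye, so a separate human-friendly rendering may still be useful for debugging.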

@RickBeacham 2 years ago

Great stuff! Super interesting.

@cmilkau 2 years ago

Interesting actions. Not only do they encode a lot of knowledge about the game, they include deep causal chains that otherwise would take long to learn.

@achtsekundenfurz7876 2 years ago

Just a quick note: the "can afford" check at 04:47 is NOT totally redundant. You're inside a "for each idle stargate" sort of loop, and if two are idle, you could end up in a situation where you can afford one but not the other -- and depending on the capabilities of the ex-handler, tripping an exception due to insufficient resources could crash the AI.
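The point above can be reproduced with a toy model. `Bank`, `train_from_idle`, and the flat cost of 250 are all hypothetical stand-ins for the game API, used only to show why the check has to sit inside the loop.

```python
VOIDRAY_COST = 250  # hypothetical flat cost, for illustration only


class Bank:
    """Toy resource pool standing in for the bot's real resource state."""

    def __init__(self, minerals: int):
        self.minerals = minerals

    def can_afford(self, cost: int) -> bool:
        return self.minerals >= cost

    def spend(self, cost: int) -> None:
        if not self.can_afford(cost):
            raise ValueError("insufficient resources")
        self.minerals -= cost


def train_from_idle(bank: Bank, idle_stargates):
    """Queue one unit per idle stargate, re-checking affordability every time."""
    trained = []
    for stargate in idle_stargates:
        # Resources shrink as the loop runs, so the check is not redundant.
        if bank.can_afford(VOIDRAY_COST):
            bank.spend(VOIDRAY_COST)
            trained.append(stargate)
    return trained


bank = Bank(minerals=300)
trained = train_from_idle(bank, ["stargate_a", "stargate_b"])
```

Without the check, the second iteration would raise, which is the crash scenario the comment warns about.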

@tibielias 2 years ago

What an awesome video! I wonder how making an API like this for other RTS games would be possible and then training AI models for those separately. 🤔

@ButtersDClown 2 years ago

Very cool idea. I think programming a few meta builds into your algorithm and seeing how it learns with time would be interesting (if achieved "this" by "this time" do "this", otherwise do "this"), like doing a rush build, etc.

@BasicAndSimple 2 years ago

Book Purchased. Thanks

@dracomurdock6349 2 years ago

The criteria I would try to ensure it has highest on its priority is- if you win, only- unit efficiency. IE: how many resources did this unit earn, or destroy for an opponent, relative to its own cost? Averaging them out, and defining those units by a percentage based on the actions they were made to perform- and segmenting the game into the first 5 minutes and the rest of the game- you could provide a huge assist to the AI learning more complicated macro and micro strategies.

@robanson32 2 years ago

Great book! Love my copy

@achillestroy3122 2 years ago

I remember when you live coded AI plays GTA v and that too on python's default IDLE. Bring those days back. Great video though.

@whitey9933 2 years ago

Looks great, always been interested in the AlphaStar gameplay and how it manages all the different tasks. For the enemy search, you can focus on undiscovered minerals (enemies would normally congregate around mineral fields), which is probably better than random search.

@Cursedmountainstudios 2 years ago

You were so preoccupied with if you could, you didn't stop to think if you should... and so it begins.

@nastrimarcello 2 years ago

This is amazing. Amazing code, amazing explanation, amazing editing. Only one suggestion: when possible, don't use try: ... except: pass, as this can lead to hellish problems. If you know what exception you are having in that try-except statement, using that exception explicitly is better (even if you are just going to 'pass' it).
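A small example of the suggestion above: catch only the exception you expect. The function and its input are hypothetical; the point is that an empty list is handled quietly while any other mistake still raises.

```python
def first_enemy_position(enemy_positions):
    """Return the first known enemy position, or None when the list is empty."""
    try:
        return enemy_positions[0]
    except IndexError:
        # Only the expected failure is swallowed; any other bug still surfaces.
        return None


nothing_seen = first_enemy_position([])
something_seen = first_enemy_position([(12, 34)])
```

With a bare `except: pass`, passing `None` by mistake would be silently ignored; here it raises a `TypeError` and points straight at the bug.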

@FF7Cloud 2 years ago

it might help to allow a phoenix now and then for scouting purposes since void rays are super slow

@dogtato 2 years ago

very interesting to see how you structured it to use ML decisions for higher level decision making. would definitely be interested in seeing how you approach a micro script and specifically wonder about the ability to add new behaviors without having to retrain from scratch

@KennTollens 2 years ago

Hey Lieutenant Commander Data, I'm going to go buy your book so I can join Starfleet too!

@matthias916 2 years ago

Read the first comment at 5:14. Usually what I do is this:
```
if not hasattr(self, "var_name"):
    self.var_name = initial_value
```
but I believe it is good practice to initialize the variable somewhere else, like in the constructor

@wootcrisp 2 years ago

Nicely done.

@ZhiYin 2 years ago

I love how I know every word but have no idea what you're talking about. (youtube recommended this video because I follow starcraft2)

@Shadow-yl2tf 2 years ago

9:00 another reward could be time. If you win a match, then the shorter the time, the more extra reward you get. Likewise, the opposite if you lose.

@cheddar500 2 years ago

Very satisfying to watch

@matheusGMN 2 years ago

your strategy of multiple AIs to coordinate everything, which you mention at the end, is the same one Paradox Interactive uses in games like Stellaris and EU4

@MFTomp09 2 years ago

I wonder if modifying the reward structure to include a small reward for scouting, like finding new enemy structures or something, would be useful to get more wins in those games where you said they regrouped and came back with a larger force to beat you later

@AlexGrom 1 year ago

Later on there is potential to counter based on what and when was seen. You see early barracks - prepare to counter marines, marauders or reapers.

@cowjuicethepallytank 2 years ago

Some potential rewards (or punishments): losing a voidray could be a negative percentage of the positive reinforcement for attacking. Locating the enemy could be a small reward every x seconds to incentivise optimal searching patterns. Another question I have is what information does the API have access to? Does it have the capability of identifying enemy units? Are you able to get unit counts of the AI's specific units? Do we have the capability of training upgrades? In general, I think that given the correct training it may be possible to find certain timings of when best to scout and to take optimal scouting paths, as well as the best attack timings in terms of time in game and potentially within build order. The difficulty, depending on how far you take it, could come down to army composition and, as you were saying, micro. Lastly, showing my lack of knowledge in AI learning: would it be possible to train the AI using professional gameplay wins, then use that as a baseline "build order" for the reinforcement learning?

@ReallyWhy123 2 years ago

this book is impressive

@Ranshin077 2 years ago

You could also give the ai a larger gradient win reward or larger gradient lose reward for shorter games.

@saksaganskasadyba 2 years ago

It's amazing how you are doing it. Your videos are really inspiring

@Peter-rn5bu 2 years ago

maybe understanding how ai learns how to get to a known goal, for example losing as fast as possible, could help develop or program ai which is more efficient or accurate

@HubertRozmarynowski 1 year ago

i'm not usually a fan of your videos but this one did not spare crucial information and presented the topic nicely

@ccgamerlol 2 years ago

like Deepmind Alphastar, cool, would love to see full gameplay of this, please?

@thepacific2933 2 years ago

I think the best limitation would be a time limit to win the game. It would optimize all the aspects to achieve the best result

@Stthow 2 years ago

Amazing video dude. Gj.

@Magicks 2 years ago

well done sir

@a25885200 2 years ago

It reminds me of my FYP in university.

@FuneralProcession 2 years ago

Reward for attacking and killing is so psycho though 😲

@canadiancoding 2 years ago

Might also want to look into upgrades if you haven't. Units in mid-late without any upgrades are much worse in SC and this might have quite an impact.

@SuperShiki666 2 years ago

You should make one for Total War, it's way simpler because there's no resources or capturing, just maneuvering and using abilities.

@Veptis 2 years ago

Coming up with and implementing a good evaluation function is the hidden challenge of any deep learning project. What I feel like your model is lacking is finer options. It's just classifying one out of 5 actions and that's it. So it's strongly limited by your hand-crafted actions. As usual, those interfaces between models lose fine details. Which I believe is why large language models with all the residual connections do as well as they do: they do all the tasks implicitly.

@jeremyheng8573 2 years ago

very inspiring video! Looking forward to more reinforcement learning tutorials!

@boon1580 2 years ago

i think if can make the ai go apeshet on micro-ing the fights, u can pretty much win at the first minute using just drone on drone action.

@serta5727 2 years ago

I find it very interesting 🤓

@BalimaarTheBassFish 2 years ago

Well its been a while. A little sad we have stopped here -or at least seem to have. I was always curious how multiple agents working together could be easily implemented but I guess I am doomed to not know.

@BalimaarTheBassFish 3 months ago

Annual check in. Still sad that this felt a little rushed. Would have loved a more in depth thing like the first CNN ML method we did back then.

@matheusmterra 2 years ago

Well, you could add rewards for scouting new locations, rewards for keeping units alive, and check out the math of pro starcraft players of what units you should use and when. Also tier rewards for which units and buildings it will destroy to reinforce priority targeting for better performance.

@RickBeacham 2 years ago

I really want to buy that book.

@kristopherleslie8343 2 years ago

I would love to see you apply same thinking to Diablo 2 Resurrected

@Telos8 2 years ago

Any plans on a part 2 with the microgame plan implemented and see how it runs in tandem?

@TheThunderSpirit 2 years ago

u have to use pipes for interprocess comm. or at least udp

@FireTouched 2 years ago

I wonder about the reward structure. It doesn't really feel like it's looking for optimised play, as the only negative reward you mentioned was the loss itself, and after that efficiency is only determined by the total score. But what about tracking negative rewards (loss, loss of units/structures/resource access, etc.) and comparing the positive and negative score? That way the AI could pick a winning strategy that accrues few losses over one that accrues many losses, despite both having the same end score. And in turn the AI would be able to spot the errors from the dip in the comparison. Also, maybe implement a way that reduces the positive or increases the negative score over time? That way stalling would also be discouraged.

@serta5727 2 years ago

Thanks for the amazing content

@vladimirtchuiev2218 2 years ago

This video is so God damn cool, I have a current project that I try to make chess self play work on very limited resources, I think SC2 will be my next project if the actual python API is open. What is your GPU, and how long did it take for you to train the agent?

@IlIWarGIlI 2 years ago

years of learning starcraft leads me to simply queue a few things but otherwise not give too many orders as that delays facilitation of actions.

@NVMDSTEvil 2 years ago

Might be good idea to set reward based on time

@mstrkllr 2 years ago

You're like Code Bullet, except without the Adderall and Self-Deprecating Humor 🤣

@robwolters7401 2 years ago

In my experience grouping attacks and synchronising targets is very important.

@HalIOfFamer 2 years ago

Maybe code a reward for seeing unique enemy units/buildings. That way ai would have to scout the map for enemies, then double the reward for attacking if the attack unit was recently seen by a scout unit.
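One way the first half of this suggestion could be sketched: keep a set of enemy types already seen and pay a bonus only for new ones. The class name and the bonus size are assumptions, and the "double the attack reward for recently scouted targets" part is not modeled here.

```python
class ScoutingReward:
    """Pay a one-time bonus the first time each enemy unit/building type is seen."""

    def __init__(self, bonus_per_new_type: float = 0.5):
        self.bonus = bonus_per_new_type
        self.seen = set()

    def update(self, visible_enemy_types) -> float:
        """Return the reward for this step given the enemy types currently visible."""
        new_types = set(visible_enemy_types) - self.seen
        self.seen |= new_types
        return self.bonus * len(new_types)


reward = ScoutingReward()
first = reward.update(["marine", "barracks"])   # two new types
second = reward.update(["marine"])              # nothing new
third = reward.update(["marine", "reaper"])     # one new type
```

Because each type pays out only once, the agent cannot farm the bonus by staring at the same units, which keeps the incentive on exploring.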

@raymoreclef 2 years ago

What if you sent a few units to the starting locations in a clockwise order using waypoints of some sort? (At x time go to y waypoint, repeat.) These waypoints could also be used to pinpoint locations where the computer hid and which resulted in a loss. Also, what if, on losing so many units quickly, it was set to use the upgrade function? I think both of these ideas could be implemented with your "engines". This was extremely entertaining!

@MrSlyFoxJr_ 2 years ago

Definitely a very Protoss approach to SC2

@norik1616 2 years ago

Please, start using `pathlib` and the `Path` object 🙏🏼
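For readers unfamiliar with the suggestion, a brief sketch of building a file path with `pathlib`. The directory layout, the `model_path` helper, and the `.zip` suffix are hypothetical and not taken from the project's code.

```python
from pathlib import Path


def model_path(base_dir: str, run_name: str, step: int) -> Path:
    """Build a model checkpoint path without manual string concatenation."""
    return Path(base_dir) / "models" / run_name / f"{step}.zip"


checkpoint = model_path("data", "run1", 1000)
# The "/" operator joins path parts with the right separator for the OS.
parent_dir = checkpoint.parent
```

Calling `parent_dir.mkdir(parents=True, exist_ok=True)` before saving is the usual way to create any missing folders in one step.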

@whateverppl1229 2 years ago

9:20 that's what I figured you'd do but my question is would it be a bad idea to take away points if an enemy unit/building dies? because then, it would be rewarded for attacking. (more points from a kill than a loss, or individually price every enemy unit/building as its own value and same with ally losses) to help teach it to not lose units but to do damage.

@witherslayer8673 2 years ago

how about building more than just void rays (having stats of each unit, cost, and space; it may save for big units, or LOTS of small units), and tracking where air units can go and where ground units can travel

@JathTech 2 years ago

a FAR better reward scheme would be kills to losses. Each unit would have a value assigned that you would lose points if you lost, and gain points if you killed. This is ultimately going to reinforce itself since by killing enemy units without losing your own, you always increase your relative combat power. Just "being in combat" incentivizes entering engagements that are losing propositions when you could choose not to fight, and instead wait until you have the advantage. Read the Art of War by Sun Tzu, and you may gain some insights on how to reward your AI.
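A sketch of the kills-to-losses scoring described above. The unit values are placeholders chosen for illustration rather than real game costs, and scaling the result into a range suitable for training is left to the caller.

```python
# Hypothetical per-unit values; a real table would come from actual unit costs.
UNIT_VALUES = {"voidray": 400, "probe": 50, "marine": 50, "zealot": 100}


def exchange_reward(killed, lost) -> int:
    """Value of enemy units killed minus value of own units lost this step."""
    gained = sum(UNIT_VALUES[unit] for unit in killed)
    paid = sum(UNIT_VALUES[unit] for unit in lost)
    return gained - paid


good_trade = exchange_reward(killed=["marine", "marine", "marine"], lost=["probe"])
bad_trade = exchange_reward(killed=["marine"], lost=["voidray"])
```

Unlike a flat reward for being in combat, this goes negative for a losing engagement, so declining a bad fight scores better than taking it.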

@JathTech 2 years ago

Another great way to do this would be to have the rewards scale dynamically over time. Early kills and losses will have a greater impact on the game's overall trajectory. They should be scored as such. An earlier, more decisive win is considered better than a long, drawn-out victory. So the victory reward should start high and reduce over time.