Thanks! Your tutorials were the first that worked for me. Biggest problem that I had was the directory path for the Starcraft maps.
@sentdex 1 year ago
Thank you for the super!
@serta5727 2 years ago
I have to say you make the most understandable learning materials, your website together with the videos. All the code is there, the book, the playlists from scratch. Most professional educators can’t do this 🤗
@kailalueni3251 1 year ago
I love your idea of drawing your own minimap! That's a smart way to make more information available easily.
@JohnJackKeane 2 years ago
I do not code or have the desire to code, but this video is beautiful. I enjoy StarCraft videos seeing people micromanage, but the thought and process that goes into creating a “program” to do the same thing is fascinating. The amount of work, and the work needed to obtain the knowledge behind it, is far underrated. I hope for you the best!
@Derrekito 2 years ago
Never before has a marketing ploy worked so well on me. I'm looking forward to receiving the hardcover version of the book!
@sebbes333 2 years ago
*__* One thing I feel is missing from the map is a kind of "ghost" of where enemies have been seen previously, which could become "points of interest" for scouting in the future. The "ghosts" could "fade" over time, but never fade to zero again (capped at a minimum of 1, starting at something like 255), to make the algorithm prioritize the most recent ghost locations. Also, instead of scouting with void rays, wouldn't it be cheaper to scout with drones to generate ghost areas? Scouting would probably target mineral areas without ghosts, to see if enemies have expanded, while void rays can scout areas WITH ghosts, to see if the enemies are still there and try to defeat them there. You could also send a probe first to a ghost area, to determine enemy strength before attacking.
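A minimal sketch of the fading "ghost" layer described in the comment above, assuming the map is a plain grid of integers. The start value 255 and the floor of 1 come from the comment; the function names and the fade rate are hypothetical and not part of the video's code.

```python
# Hypothetical "ghost" layer: remembers where enemies were last seen.
GHOST_START = 255   # intensity when an enemy is sighted (value from the comment)
GHOST_MIN = 1       # ghosts never fade back to zero (value from the comment)
FADE_PER_STEP = 1   # assumed fade rate; tune freely


def make_ghost_map(width, height):
    """Return a height x width grid of zeros (0 means never seen)."""
    return [[0 for _ in range(width)] for _ in range(height)]


def mark_sighting(ghost_map, x, y):
    """Record a fresh enemy sighting at cell (x, y)."""
    ghost_map[y][x] = GHOST_START


def fade(ghost_map):
    """Dim every ghost by one step, keeping seen cells at GHOST_MIN or above."""
    for row in ghost_map:
        for x, value in enumerate(row):
            if value > 0:
                row[x] = max(GHOST_MIN, value - FADE_PER_STEP)


def brightest_ghost(ghost_map):
    """Return (x, y) of the most recent sighting, or None if nothing was seen."""
    best, best_value = None, 0
    for y, row in enumerate(ghost_map):
        for x, value in enumerate(row):
            if value > best_value:
                best, best_value = (x, y), value
    return best
```

Calling `fade` once per game step and sending a scout to `brightest_ghost` would prioritize the most recent sightings, which is the behavior the comment asks for.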
@Lithane97 2 years ago
Better yet, just train an observer to scout ghosts, it's almost like they're made for that 👍 Wouldn't even require any logic really, just if ghost entity train observer and have it sit there all game.
@achtsekundenfurz7876 2 years ago
I can imagine some ways to refine the AI using more inputs: -- time elapsed since game started (there's hardly any risk of attack at all in the 1st minute, but at a late stage, the risk is much higher), -- current resource totals (letting resources sit in the "bank" is usually worse than expanding the economy or forces), -- # of "ghosts" on the map (where enemies were sighted and lost again). About rewards and penalties, I'd suggest the following: -- adjust the reward/punishment for victory/defeat: a "good" AI should aim for a quick victory, but not at all costs. Maybe set the victory reward to 24,000 / sqrt(seconds played) and cap at 1000 (i.e. don't reward any higher for games lasting
@tjw2469 8 months ago
@@Lithane97 if there is a raven+cyclone/raven+viking/missile turret then it's a dead observer
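The victory reward suggested earlier in this thread, 24,000 / sqrt(seconds played) capped at 1000, can be written as a small function. This is only a sketch of that formula; the function and parameter names are hypothetical, and the handling of a non-positive game length is an assumption.

```python
import math


def victory_reward(seconds_played, scale=24_000.0, cap=1_000.0):
    """Reward for a win that shrinks with game length: scale / sqrt(t), capped."""
    if seconds_played <= 0:
        # Assumption: treat a zero-length game as the best possible outcome.
        return cap
    return min(cap, scale / math.sqrt(seconds_played))
```

By plain arithmetic, 24,000 / sqrt(t) equals 1000 when t is 576 seconds, so with these numbers the cap applies to any win faster than that, and a win at 2304 seconds earns 500.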
@awsamalmughrabi860 2 years ago
I like how in depth this video is, really enjoyed it!
@fuba44 2 years ago
Very interesting idea with a macro ai and a strategic ai, sort of working in tandem forming a symbiotic relationship of sorts.. could maybe even break that down even further, like on a per unit type basis... Tho i imagine the complexity explodes at that point.
@sentdex 2 years ago
We have very few unit types, at least here. For the full game, there are more, and even here I wasn't utilizing all the things a voidray can actually do, but certainly there are ways to have a "voidray" algo and a "probe" algo...etc. Definitely something to think on.
@hikari1690 2 years ago
This sounds like how deepfakes work. Have 2 AI models compete with each other to improve each other. So the macro AI would need to try to defeat the strategy AI and vice versa.
@prodj.mixapeofficial6431 2 years ago
I believe Dota has 5 controllable units, with an individual OpenAI agent per unit, and modified communication between the 5 to mimic real human gameplay.
@Dethek 2 years ago
When I was looking into the AI for StarCraft I was thinking of the following:
Overarching AI - makes the final decision on what action to take
Supported by:
Strategy AI - uses training from professional replays to assess, based on what the player has seen, what their likely strategy is, and then chooses a strategy based on that
Macro AI
Micro AI
@TheFalconerNZ 2 years ago
@@ccriztoff Get his book lol ;-)
@adityachawla7523 2 years ago
Here is an idea: You can use more than 3 channels to give spatial information to your network. No need to limit yourself to the conventional idea of 3 channels! If you are worried about how to visualize this, just think of it as an extra map.
@fuba44 2 years ago
This was an interesting video. I will have a look at your example code for sure, wanna try to tinker a bit. Thanx for all your hard work.
@protoplmz 2 years ago
Hey! I love the update here. I followed the original series you put out. As a SC2 veteran I noticed deficiencies and deviated in a strong way halfway through. I set up separate models to handle the decision making for each aspect of the game. This makes it so it can make the decision to use its army separately from the decision of progressing tech (or not). I stopped around the time I couldn't figure out how to have it build its own strategies, as I ended up giving it a long set of possible actions and letting it pick, and it felt too 'guided'. It was able to beat "Very Hard" 50% of the time vs random's 0%. Was my first exercise with ML. I got the chance to apply the concept at work for something outside of my scope. Used both that and the SC2 project as demonstration in an interview and got a promotion out of it. This inspires me to try my hand at it again! EDIT: To handle army movement, which you mentioned in the video, I chopped the maps up into a grid and gave it decisions to make where it could attack-move its army to any of these at will. 9 worked the best but you could make it much more granular. It used this to both attack and defend.
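The grid idea from the EDIT above can be sketched as follows, assuming the 9 cells form a 3 x 3 grid (the comment only gives the count) and that each attack-move target is the centre of a cell. The function name and parameters are hypothetical.

```python
def grid_targets(map_width, map_height, rows=3, cols=3):
    """Return the centre point of each cell in a rows x cols grid over the map."""
    cell_w = map_width / cols
    cell_h = map_height / rows
    return [
        ((col + 0.5) * cell_w, (row + 0.5) * cell_h)
        for row in range(rows)
        for col in range(cols)
    ]
```

Each entry then becomes one discrete "attack-move here" action for the model, and raising `rows` and `cols` makes the choice more granular at the cost of a larger action space.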
@pognar 2 years ago
I have played starcraft for years and years, and I love this channel. This is going to be great.
@kevintyrrell7409 2 years ago
14:49 That's some next-level Gateway placement.
@kylee.7654 2 years ago
At 4:52 regarding your comment, I added async def on_start(self): self.last_sent = 0 after the on_step function. It makes it a little clearer
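A minimal sketch of the pattern in the comment above: set `last_sent` once in `on_start` so that `on_step` can rely on it. `SketchBot` is a stand-in written for this example, not the real python-sc2 bot class, and the 10-step interval is an arbitrary assumption.

```python
import asyncio


class SketchBot:
    """Stand-in for a bot class; NOT the real python-sc2 BotAI."""

    async def on_start(self):
        # Runs once before the first step, so the attribute always exists.
        self.last_sent = 0

    async def on_step(self, iteration):
        # Safe to read: on_start has already set last_sent.
        if iteration - self.last_sent >= 10:  # 10 is an arbitrary interval
            self.last_sent = iteration
            return True   # pretend we sent something this step
        return False


async def run(steps):
    """Drive the stand-in bot and collect the steps on which it 'sent'."""
    bot = SketchBot()
    await bot.on_start()
    sent = []
    for i in range(steps):
        if await bot.on_step(i):
            sent.append(i)
    return sent


sent_on = asyncio.run(run(25))
```

Initializing in one place avoids scattering attribute-existence checks through the step logic.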
@adye88 2 years ago
This is freaking intense! also for the hunters problem: Why not make a "return to safe space" function for them when they detect enemies. That way they only perform scouting duties.
@adye88 2 years ago
And obviously set a variable for safe space= position holding command center
@faithful451 2 years ago
I'd love to see the next video in this series with dual macro and micro algorithms and improving the win percentage
@J3553xAnotherFan 2 years ago
This is now the 3rd programming/ artificial intelligence channel that I've found myself watching even though my ability to code (or even Math) is so awful that if there was a gun to my head I would beg to just be shot. But I find it satisfying to watch. Like a time-lapse of an ant colony diligently working away.
@PathToPrestige 2 years ago
I very rarely reply to these kinds of videos, but hats off. Even though the project structure is messy, your genuine "realistic" practical approach was very enjoyable to watch.
@EnderSword 2 years ago
Kind of neat, I'm wondering if you looked at the AlphaStar research at all to do this, or looked into the StarCraft 2 AI community? There's about 70 coders of various bots and AI that compete against each other and it'd give you a ton of ideas on build choices and especially unit control and decision making.
@Leonhart_93 2 years ago
The AI coders in the community don't make true AI, they just give them a set of commands and responses to various actions. A true AI learns from successes and failures (reinforcement) with very little initial programming.
@PeterRAmice 2 years ago
@@Leonhart_93 while this has some truth to it, what you are referring to is machine learning. The AI spectrum is much wider than learning like a human. The best way of describing AI, imo, is: a machine which observes its environment and executes actions which maximize its goals. So with that definition in mind I would argue those people are actually building AIs which do not automatically learn from their past experiences, and thus they do not build machine learning AIs, which Alpha did.
@Leonhart_93 2 years ago
@@PeterRAmice We just called bots that follow specific sets of instructions AI in the past out of laziness and limited understanding. It doesn't apply to current times anymore, fake AI and true AI have almost nothing in common. We can't use the same word to describe them both, so a "bot" is proper for the fake AI.
@Leonhart_93 2 years ago
@string name; Yes, bots. I played vs the top bots of the sc2 bots community, they are really good. They won't be easy unless you are at least masters, which is impressive for a bot. The major problem with those fake AI is that they can always be cheesed in some way, no human programmer can ever input the right answer for every situation. Btw, AlphaStar never had complete map vision, it wouldn't have been a valid test. It had complete vision of whatever parts it could see since there was no player-like camera which removed any delay from responses. I think that's ok, even bots respond to everything with 0 delay. AlphaStar has potential, but it will never progress past a certain point if they don't train it permanently on the ladder vs pro players and actually see current tactics.
@ErazerPT 2 years ago
@string name; It's no more cheesy than a grandmaster switching cams at 400+apm (yes, they do it...). And while "beating the best" might sound like a great end goal, all you need is to beat 99% to already go WAY beyond what humans can do (on average). There's a few F1 top racers, there's billions of "common drivers". For a driving ML model, which is more important: beating the top F1 or consistently outdoing "Average Joe"? p.s. that one "human trick" that beat the model in one game was a simple "loop", as the model got stuck reacting to the same thing in a loop, back and forward. You can observe that level of idiocy in humans too at times ;)
@Neceros 1 year ago
This is great! I'd love to see if something like this could compete in the arena
@XmKevinChen 2 years ago
It’s a very interesting video about ML + gaming. As a newbie to this AI world, it also gives me lots of incentive to continue learning.
@VaSoapman 2 years ago
Why not give rewards based on how many enemy units/buildings are destroyed? Then give a penalty based on how many of your own units/buildings are destroyed? Also, to help the AI prioritize winning over stalling, you could increase the value of a win based on how fast it won.
@nrobo3840 2 years ago
Yeah, adding a time decay to the win reward was where my mind immediately went.
@moseszero3281 2 years ago
I was thinking a k/d reward and a lowering of all rewards for time
@BretBowlby 2 years ago
I like the ideas here, but be sure that you've got a task that can understand the advantage of having high ground vision, which gives better attacking, vs not having high ground vision. Also, I'd consider having the model scout constantly, as all information gained on the player's actions can lead to better counterattacks and so forth. But yeah, I'm loving this. Keep 'em coming!
@achtsekundenfurz7876 2 years ago
Just a quick note: the "can afford" check at 04:47 is NOT totally redundant. You're inside a "for each idle stargate" sort of loop, and if two are idle, you could end up in a situation where you can afford one but not the other -- and depending on the capabilities of the ex-handler, tripping an exception due to insufficient resources could crash the AI.
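The point above can be illustrated with a self-contained sketch that tracks a running budget inside the loop. Nothing here calls the real game API: `queue_voidrays` is a hypothetical helper, and the cost tuple is a placeholder rather than a verified in-game value.

```python
VOIDRAY_COST = (250, 150)  # placeholder (minerals, gas); not checked against the game


def queue_voidrays(idle_stargates, minerals, gas, cost=VOIDRAY_COST):
    """Queue one unit per idle stargate while the running budget allows it."""
    queued = []
    for stargate in idle_stargates:
        # Re-check on every pass: the previous iteration may have spent the money.
        if minerals < cost[0] or gas < cost[1]:
            break
        minerals -= cost[0]
        gas -= cost[1]
        queued.append(stargate)
    return queued, minerals, gas
```

With two idle stargates and enough resources for only one unit, the second pass fails the check and the loop stops cleanly instead of issuing an order it cannot pay for.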
@serta5727 2 years ago
Can’t get enough of learning this awesome stuff
@binxuwang4960 2 years ago
Already super impressive that you could do RL for macro-level strategy! Totally agree that to solve a certain problem, how to formulate the state, action and reward is key
@Ammothief41 2 years ago
Thanks for putting all of that together. Looks neat.
@MFTomp09 2 years ago
I wonder if modifying the reward structure to include a small reward for scouting, like finding new enemy structures or something, would be useful to get more wins in those games where you said they regrouped and came back with a larger force to beat you later
@AlexGrom 1 year ago
Later on there is potential to counter based on what and when was seen. You see early barracks - prepare to counter marines, marauders or reapers.
@Singularitarian 2 years ago
Very illuminating!
@danielglidewell 2 years ago
I wasn’t in the mood to watch the video when I read the title, but when I realized what the thumbnail was I stopped by to drop a like lol.
@serta5727 2 years ago
I recommend your Channel every now and then to people learning python 🤗
@gavinmorton7682 2 years ago
this is such a cool project! would love to see this keep going
@benoitkinziki3916 2 years ago
For the reward mechanism you could probably build an LSTM that gives you the probability of winning for each action you take, and you should probably include a time penalty to avoid the bot dragging the game out
@nastrimarcello 2 years ago
This is amazing. Amazing code, amazing explanation, amazing editing. Only one suggestion: when possible, don't use try:...except:pass, as this can lead to hellish problems. If you know what exception you are having in that try-except statement, using that exception explicitly is better (even if you are just going to 'pass' it)
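To illustrate the suggestion above, here is a small sketch that catches only the exception it expects. The function name and the dict-shaped observation are hypothetical and chosen just for the example.

```python
def read_supply(observation):
    """Return the 'supply' entry of a dict-like observation, or 0 if absent."""
    try:
        return observation["supply"]
    except KeyError:
        # Expected case: the key is simply missing.
        return 0
```

A bare `except: pass` in the same spot would also swallow unrelated failures, for example the `TypeError` raised when `observation` is `None`, which is exactly the kind of bug that becomes hard to trace later.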
@cedrickram3180 2 years ago
Some time series analysis (windowed access to what has been searched, where stuff was, ...) would probably help the AI make better decisions. The data of just the map does not do a good job of storing time-information. Your rewards seem like a good fit. Great video!
@dracomurdock6349 2 years ago
The criterion I would try to make its highest priority (only if you win) is unit efficiency, i.e. how many resources did this unit earn, or destroy for an opponent, relative to its own cost? Averaging them out, defining those units by a percentage based on the actions they were made to perform, and segmenting the game into the first 5 minutes and the rest of the game, you could provide a huge assist to the AI learning more complicated macro and micro strategies.
@Mutual_Information 2 years ago
Wow I'm literally working on a series on RL theory and I was just wondering how the hell you'd code things up to actually play Warcraft 3. Starcraft 2, close enough! Such a useful channel
@cowjuicethepallytank 2 years ago
Some potential rewards (or punishments): losing a voidray could be a negative percentage of the positive reinforcement for attacking. Locating the enemy could be a small reward every x seconds to incentivise optimal searching patterns. Another question I have is what information does the API have access to? Does it have the capability of identifying enemy units? Are you able to get unit counts of the AI's specific units? Do we have the capability of training upgrades? In general, I think that given the correct training it may be possible to find certain timings of when best to scout, and to take optimal scouting paths, as well as best attack timings in terms of time in game and potentially within the build order. The difficulty, depending on how far you take it, could come down to army composition and, as you were saying, micro. Lastly, showing my lack of knowledge in AI learning: would it be possible to train the AI using professional gameplay wins, then use that as a baseline "build order" before using the reinforcement learning?
@Shadow-yl2tf 2 years ago
9:00 another reward could be time. If you win a match, then the shorter the time, the more extra reward you get. Likewise, the opposite if you lose.
@whateverppl1229 2 years ago
9:20 that's what I figured you'd do but my question is would it be a bad idea to take away points if an enemy unit/building dies? because then, it would be rewarded for attacking. (more points from a kill than a loss, or individually price every enemy unit/building as its own value and same with ally losses) to help teach it to not lose units but to do damage.
@erics3596 2 years ago
Do you want Skynet? Because this is how you get Skynet :) (also great strats and explanation on how this works)
@ButtersDClown 2 years ago
Very cool idea. I think programming a few meta builds into your algorithm and seeing how it learns with time (if achieved "this" by "this time" do "this", otherwise do "this"), like doing a rush build etc.
@serta5727 2 years ago
Wow congratulations I think what you did is amazing 🤩 I would like to do something like this for software testing for a while but it is so complicated
@tibielias 2 years ago
What an awesome video! I wonder how making an API like this for other RTS games would be possible and then training AI models for those separately. 🤔
@ericzahn274 2 years ago
Great vid. Buying the book.
@stonecoldscubasteveo4827 2 years ago
Reward for resources spent. this will incentivize expansion and rapid army growth until max out. At that point change the reward to enemy units/structures killed. Something like (big reward) for spending money on nexus/probe/stargate (bigger reward) for void ray, (penalty) for having too much money banked up unless supply is >190. Then (big reward) for killing enemy unit/structure, while dialing back on rewards for building structures. zero out the rewards for probes over 70-80 and for pylons over 200 supply. When supply drops due to combat, flip the rewards back to making void rays to max out again.
@djsyntic 2 years ago
When you got to talking about how to handle the gas extractor on your minimap, I thought you handled it strangely. Keep in mind that the RGB values for the colors you put on your map are arbitrary and serve to help you visually more than the computer. But you could have encoded some meaningful data into the RGB itself. For example, instead of saying "This building is green, this building is dark green" and so on, you could have put all building/unit type info into the R-value of RGB. IE: This building is R-value 12, this building is R-value 13, and so on. Then the G-Value could represent something else, like building health. IE: R-12, G-255 means it's a Refinery at full health while R-12, G-1 means the Refinery is about to explode if it takes any more damage. Finally, the B-Value could then be used as some sort of indicator of something specific to that building. R-13 might be a Barracks, and B-2 might mean that it's in the middle of training something and has 2 units of time before it finishes and can do something else. On the other hand, R-14 might be a Gas node, and B-# could indicate how much gas that node has, while R-15 indicates that this is an extractor with the B-# still indicating how much is still in the node. Sure, to YOU R-14 and R-15 are basically the same amounts of red and your eyes wouldn't be able to tell the difference, but to a computer, those are two distinct values.
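The channel encoding proposed in the comment above could look roughly like this. It is a sketch under assumptions: the type ids 12 and 13 follow the numbering in the comment, while the function names, the health scaling, and the list-of-lists minimap are hypothetical.

```python
# Hypothetical type ids, following the numbering used in the comment.
REFINERY, BARRACKS = 12, 13


def encode_pixel(type_id, health_fraction, extra=0):
    """Pack unit data into an (R, G, B) tuple of 0-255 integers."""
    r = type_id
    # Map health 0.0-1.0 onto 1-255 so a living unit never encodes as 0.
    g = max(1, round(health_fraction * 255)) if health_fraction > 0 else 0
    # Clamp the building-specific value (timer, gas left, ...) into one byte.
    b = max(0, min(255, int(extra)))
    return (r, g, b)


def draw(minimap, x, y, pixel):
    """Write one encoded pixel into a list-of-lists minimap."""
    minimap[y][x] = pixel
```

One design caveat: a network treats these values as magnitudes, so neighbouring type ids such as 12 and 13 look almost identical to it unless the inputs are scaled or the type is given its own channels.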
@fuba44 2 years ago
In regards to rewards, did you try "resource worth of kills" divided by "resource worth of losses" + the win or lose bonus? (Maybe with a modifier on workers to make them more juicy targets) + maybe something to do with map exploration.. to find hiding bases.. just spitballing here, i know you already did a lot on this project.
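The ratio reward suggested above can be sketched in a few lines. The function name, the size of the win bonus, and the division-by-zero guard are assumptions added for the example, not something taken from the video.

```python
def trade_reward(killed_value, lost_value, won, win_bonus=500.0):
    """Resource value destroyed divided by resource value lost, plus a win/loss bonus."""
    # Guard against division by zero when nothing has been lost yet.
    ratio = killed_value / max(lost_value, 1.0)
    return ratio + (win_bonus if won else -win_bonus)
```

Because the ratio grows very large when losses are near zero, it may need clipping before being fed to a learner.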
@sentdex 2 years ago
I am not sure I tried that exactly, I think you really want to have some sort of total "resource" reward that doesnt punish you for building units/buildings, but then you might needlessly build things. For destruction, what you describe may work well as a reward, so maybe combat against a worker is worth less than attacking a city hall...etc. I am not sure it's wise to force the AI to reward things higher or not though, weird things happen the more biases you insert into rewards, at least that's what I've found so far with RL. Ideally, the rwd function is as simplistic as possible.
@BalimaarTheBassFish 2 years ago
@@sentdex "so maybe combat against a worker is worth less than attacking a city hall...etc" Which could be dangerous against a Terran opponent whose workers can repair buildings while you're attacking them. Had an issue like this with my CNN attempt at SC2 where i lost my entire army because they were too busy focusing a terran command center while completely ignoring the half dozen marines shooting them.
@RickBeacham 2 years ago
Great stuff! Super interesting.
@FireTouched 2 years ago
I wonder about the reward structure. It doesn't really feel like it is looking for optimised play, as the only negative reward you mentioned was the loss itself, and after that efficiency is only determined by the total score. But what about tracking negative rewards (loss, loss of units/structures/resource access, etc.) and comparing the positive and negative score? That way the AI could pick a winning strategy that accrues few losses over one that accrues many losses, despite both having the same end score. And in turn the AI would be able to know the errors due to the dip in the comparison. Also maybe implementing a way that reduces positive/increases negative score over time? That way stalling would also be discouraged.
@whitey9933 2 years ago
Looks great, always been interested in the AlphaStar gameplay and how it manages all the different tasks. For the enemy search, you can focus on undiscovered mineral fields (enemies would normally congregate around mineral fields), which is probably better than random search.
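A minimal sketch of the search idea above: instead of picking random points, send the scout to the nearest mineral-field location it has not visited yet. The function name and the coordinate tuples are hypothetical; `math.dist` needs Python 3.8 or newer.

```python
import math


def next_scout_target(scout_pos, mineral_fields, visited):
    """Pick the closest mineral-field location that has not been scouted yet."""
    unvisited = [p for p in mineral_fields if p not in visited]
    if not unvisited:
        return None  # everything has been checked at least once
    return min(unvisited, key=lambda p: math.dist(scout_pos, p))
```

Once every field has been visited the function returns `None`, at which point the caller could clear `visited` and start a new sweep.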
@OnlyKoolaid 2 years ago
SENTDEX: I'm going to teach AI to rush Voidrays. Protoss mains: STOP! I can only get so aroused. Zerg: This is a war crime.
@cheddar500 2 years ago
Very satisfying to watch
@matthias916 2 years ago
Read the first comment at 5:14. Usually what I do is this:
```
if not hasattr(self, "var_name"):
    self.var_name = initial_value
```
but I believe it is good practice to initialize the variable somewhere else, like in the constructor
@FF7Cloud 2 years ago
it might help to allow a phoenix now and then for scouting purposes since void rays are super slow
@robanson32 2 years ago
Great book! Love my copy
@dogtato 2 years ago
very interesting to see how you structured it to use ML decisions for higher level decision making. would definitely be interested in seeing how you approach a micro script and specifically wonder about the ability to add new behaviors without having to retrain from scratch
@LecherousCthulhu 2 years ago
You can actually improve your algorithm in very specific steps. There is a maximum number of workers you want to achieve and there is a maximum number of units total that you want to achieve. Getting the model to understand that it needs to reach those maximums as quickly as possible is a nice reward curve, as if your model is able to get the fastest possible time to maximum unit capacity with X workers and Y void rays then it will likely do better. You can also change the model to go from attacking to destruction of units and unit loss. You'll have to weight each unit based on how many units you currently have of that type, but this will allow you to eventually build the model to work with any build in SC2, as most of these builds will be able to be switched out for any other list of weighted units from any other faction. You can actually find out the exact build orders and steps from someone like WinterStarcraft kzbin.info/door/k3w4CQ_SlLH4V0-V6WjFZg He will give you very specific steps for the model to focus on, based on how skilled the model's opponent should be, with his Low APM Bronze to Masters Series
@Mediiiicc 2 years ago
He wants the AI to learn how to play, not just build as instructed.
@avananana 2 years ago
You're missing the point that this isn't a video on an "algorithm", it's a video on a machine learning system. The point is not to make the algorithm, it's for the computer to make it.
@EnderSword 2 years ago
Yeah, the problem with this approach being used is that the actual SC2 AI doesn't 'adapt' at all, it is just going to do basically the same thing, so if you really want you could just create 'actions' or a build designed to perfectly kill that AI. I think opening this up to a real game with real build orders and decisions increases the complexity 1000x over. The fact he's limited it to only 5 actions and picked a unit the AI sucks at dealing with makes it kind of a neat experiment, but it's designed to win against a very specific opponent, not a variety of opponents.
@witherslayer8673 2 years ago
How about building more than just void rays (having stats of each unit, cost, and space; it may save for big units, or LOTS of small units), and knowing where air units can go and where ground units can travel?
@teardowndynamic6171 2 years ago
i know nothing about programming or AI but this is just so fun to watch
@cmilkau 2 years ago
Interesting actions. Not only do they encode a lot of knowledge about the game, they include deep causal chains that otherwise would take long to learn.
@floydbarber7528 2 years ago
Oh man, I needed that book 3 months ago. I made, with 3 others, our own NN and genetic algorithm to play Mario, also with reinforcement learning. I was thinking about how hard it would be to do so for SC, but it didn't seem too hard. But you didn't write your own neural network, right?
@Stthow 2 years ago
Amazing video dude. Gj.
@Telos8 2 years ago
Any plans on a part 2 with the microgame plan implemented and see how it runs in tandem?
@majorduff3220 2 years ago
What would happen if you gave a reward when your void ray killed or destroyed something (enemy army and buildings), and a negative reward when a void ray dies? And give really small rewards for being alive. When the steps bring you to a positive result, like killing an enemy, you can also give the last 10 steps, or however many you like, more rewards. If it ends in the loss of your army or a building, negative reward. Maybe this will help it learn faster and better?
@matheusGMN 2 years ago
The strategy you mention at the end, of multiple AIs coordinating everything, is the same one Paradox Interactive uses in games like Stellaris and EU4
@teardowndynamic6171 2 years ago
I am trying to make an AI that will farm for me in Rust, but I am so lost xD. If I understand correctly, you are not using computer vision because the camera movement is too complicated? So you are building data from the minimap only? If I wanted to train my AI to farm sulfur nodes in Rust, what would be your approach?
@Shazumbi 2 years ago
I know nothing about coding or anything else, really, that went on in this video. But I do have a question, as this is incredibly interesting; how do you make the program know if an action is "good" or "bad"? Trying to rack my pea brain on how one would write this out. Even if you say the final outcome is greater than x (as a "good" outcome) how do you 'convince' your program that it should continue to try for that outcome?
@themaster8432 2 years ago
is there a c# version of the code snippets? maybe another resource that can teach machine learning algorithms in c# also? :)
@bronsoncarder2491 2 years ago
Here's my issue with your approach: Your actions are basically just a hard coded list of commands. You could essentially just create a hierarchy of those commands and apply a little probability and get similar results. The way you've set this up, the AI will never develop novel strategies. It can, at best, play with the topmost level of human strategy available (and that's only if you spend the time to hardcode that into each action). And, that's cool, but... I feel like the point of an exercise like this should be to see how the AI "thinks" about the task and what novel strategies might arise from that. Idk, I do understand that the computing power to decide between the thousands of different options available at any given moment in an RTS is beyond most personal computers, but... I feel like hard-coding the actions kind of defeats the whole purpose.
@JOHNSMITH-ve3rq 2 years ago
hard agree. love the channel but yeah -- all the hardcoded rules are confusing. Can't you simply give it the barest of initial game parameters - no strategy, no rules - and let it learn from winning strategies?
@tronowolf 2 years ago
What if the reward incentive was striking a perfect flow and balance of resource gathering/construction?
@BasicAndSimple 2 years ago
Book Purchased. Thanks
@wootcrisp 2 years ago
Nicely done.
@achillestroy3122 2 years ago
I remember when you live coded AI plays GTA v and that too on python's default IDLE. Bring those days back. Great video though.
@NeutrinoTorch 2 years ago
You actually need more than just voidrays. And for other units, some scripts for fighting patterns. You can probably start with an archon-immortal-charge zealot composition. It has almost no fighting patterns
@WTfire10 2 years ago
No, voidrays are the only unit a protoss needs
@TheThunderSpirit 2 years ago
u have to use pipes for interprocess comm. or at least udp
@ccgamerlol 2 years ago
like Deepmind Alphastar, cool, would love to see full gameplay of this, please?
@canadiancoding 2 years ago
Might also want to look into upgrades if you haven't. Units in mid-late without any upgrades are much worse in SC and this might have quite an impact.
@kja2ja 2 years ago
Interesting! But doesn't the built-in AI already do all this? You just need to adjust the difficulty level, no? What is the difference between this Python code and the SC AI? Awesome info!
@davidcristobal7152 2 years ago
Does Stable-baselines allow to store states - reward pairs in harddisk? I developed a modification of the MemorySequential class in keras-rl to use little memory in ram. My algorithm uses a thread to store states (images or whatever) as numpy arrays in my ssd disk, and keeps a randomized subset of the states in every loop of the algorithm in order to train the agent without using tons of RAM (which i don't have). It's a sloppy implementation so I was wondering if stable-baselines has something like that
@Veptis 2 years ago
Coming up with and implementing a good evaluation function is the hidden challenge of any deep learning project. What I feel like your model is lacking is finer options. It's just classifying one out of 5 actions and that's it. So it's strongly limited by your hand-crafted actions. As usual, those interfaces between models lose fine details. Which I believe is why large language models with all the residual connections do so great, as they do all the tasks implicitly.
@Ranshin077 2 years ago
You could also give the ai a larger gradient win reward or larger gradient lose reward for shorter games.
@vladimirtchuiev2218 2 years ago
This video is so God damn cool, I have a current project that I try to make chess self play work on very limited resources, I think SC2 will be my next project if the actual python API is open. What is your GPU, and how long did it take for you to train the agent?
@alansmithee419 2 years ago
2:20 But does the ai then think there's very little there, or are they dim just for us to more intuitively understand the video? If the former, why would the AI go to those areas if it believes there's nothing there? Or does it have to learn itself that dim means there's very little, thereby also learning for itself that very dim means unknown?
@keanamrazek3745 2 years ago
What if you were to use only the reward function you did in the initial training, and then use the win-lose reward for model refinement?
@th1nhng0 2 years ago
This is what Im looking for
@cmilkau 2 years ago
If you already know the things and their positions, why give the model extra work by encoding them into an image? You could just input them directly as vectors.
@alrey72 2 years ago
Can some of the values be included in the iteration or training ... like for example the reward values?
@serta5727 2 years ago
Thanks for the amazing content
@BalimaarTheBassFish 2 years ago
I'll be interested to see how you link the different AIs with their different specialties together. My only concern would be that there is bound to be some overlap. How would the AI resolve 'competition' against itself when one or more AI specialties want to control the same thing? ugh I can English I swear!
@calebb4782 2 years ago
Couldn't you use the mini map and cursor to move camera by clicking said mini map? or am I missing something?
@boon1580 2 years ago
i think if can make the ai go apeshet on micro-ing the fights, u can pretty much win at the first minute using just drone on drone action.
@rpraver1 2 years ago
Long time follower and purchased your book. Why not touch on genetic algo from scratch?
@SuperShiki666 2 years ago
You should make one for Total War, it's way simpler because there's no resources or capturing, just maneuvering and using abilities.
@BryceRoche 2 years ago
If you could reveal the army supply for each bot and reward the RL if your army supply is higher. Also another reward if your worker count is higher
@kristopherleslie8343 2 years ago
I would love to see you apply same thinking to Diablo 2 Resurrected
@ivanmakara7320 2 years ago
What issues did you run into using a reward mechanism of: Reward = Resources_Harvested + Resources_Remaining * X where X = 1 in a win, and X = 0 in a loss This shouldn't prolong a game, because any resource harvested from the map will equally drain resources remaining. I would imagine it would actually speed up the game, because the only activity that reduces the reward is the enemy harvesting resources, which is a thing you would want to disrupt anyway. (I guess also a probe dying while carrying a unit of resources, but I think that change would be negligible)
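The reward in the comment above, Reward = Resources_Harvested + Resources_Remaining * X with X = 1 for a win and 0 for a loss, written out as a function. The function and argument names are hypothetical.

```python
def resource_reward(harvested, remaining_on_map, won):
    """Harvested resources, plus whatever is left on the map if the game was won."""
    x = 1 if won else 0
    return harvested + remaining_on_map * x
```

As the comment argues, for the winner this sum does not depend on how much the bot itself mined, since each unit mined moves from the second term to the first, so the only thing that lowers it is the enemy harvesting.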
@KennTollens 2 years ago
Hey Lieutenant Commander Data, I'm going to go buy your book so I can join Starfleet too!
@jeremyheng8573 2 years ago
very inspiring video! Looking forward for more reinforcement learning tutorial!
@Cursedmountainstudios 2 years ago
You were so preoccupied with if you could, you didn't stop to think if you should... and so it begins.