Monte Carlo Tree Search (MCTS) Tutorial

88,128 views

Fullstack Academy

Learn more advanced front-end and full-stack development at: www.fullstackacademy.com
Monte Carlo Tree Search (MCTS) is a search algorithm an AI can use to make optimal decisions and reach its goals. Unlike minimax, MCTS works in very open-ended environments with a high branching factor, which makes it much more effective in games like Go, where the sheer number of possibilities is too great for a brute-force approach. It also does not require an evaluation function to score a game state, which makes it much more adaptable to different environments. MCTS uses four main steps (Select, Expand, Simulate, and Update) to build an asymmetric statistics tree that maps onto the game tree, which the AI then uses to make the final decision. This tech talk explains in detail how MCTS decides which step to run in each situation, and exactly how it builds its statistics tree.
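
Below is a minimal, self-contained sketch of those four steps in Python, written against a tiny Nim game so it runs end to end. The game, class, and function names (NimState, Node, mcts) are illustrative choices for this write-up, not code from the talk.

import math
import random

class NimState:
    """Nim: players alternately take 1-3 stones; whoever takes the last stone wins."""
    def __init__(self, stones=15, player=1):
        self.stones = stones
        self.player = player          # player about to move
    def legal_moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]
    def play(self, n):
        return NimState(self.stones - n, -self.player)
    def is_terminal(self):
        return self.stones == 0
    def winner(self):
        return -self.player           # the player who just took the last stone wins

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.untried = state.legal_moves()
        self.wins = 0.0               # wins from the perspective of the player who moved into this node
        self.visits = 0
    def uct_child(self, c=1.41):
        # Selection: pick the child maximizing the UCT value.
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                                  + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Select: descend while the node is fully expanded and has children.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expand: add one child for an untried move, if any remain.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            node.children.append(Node(node.state.play(move), parent=node, move=move))
            node = node.children[-1]
        # 3. Simulate: random playout from the new node until a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(state.legal_moves()))
        winner = state.winner()
        # 4. Update: backpropagate the result up to the root.
        while node is not None:
            node.visits += 1
            if winner == -node.state.player:   # the player who moved into this node won
                node.wins += 1
            node = node.parent
    # Final decision: the most-visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).move

if __name__ == "__main__":
    print("MCTS plays:", mcts(NimState(15)))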

Comments: 46
@srijan4622 · 4 years ago
Best concise intuition-building video out there. Thank you.
@kishorepv4924 · 6 years ago
Concise talk. Thank you.
@thawsitt · 6 years ago
This is really helpful and easy to understand. Thank you!
@mikeouwen · 1 year ago
Awesome explanation of MCTS for beginners! Thanks.
@matthewzhao7507 · 6 years ago
Awesome explanation. Thank you!
@user-mw5rz6ws2b · 6 years ago
Thank you very much! This video gave me a better understanding of MCTS.
@hamedfarahani4281 · 1 year ago
Really helpful and easy to understand. Thank you!
@egor10_4 · 6 years ago
Very clear explanation, thanks!
@larry1285 · 5 years ago
Great explanation for beginners, thank you so much.
@anmaraljanabi7137 · 3 years ago
Thank you very much for this wonderful explanation, keep it up.
@timwu3423 · 7 years ago
Very concise, thank you.
@AkshayAradhya · 5 years ago
This was really clear and helpful, thanks. But you also forgot to mention that MCTS simulations can be cut off after a certain depth.
@saeidshamsaliei2536 · 3 years ago
Really helpful, thank you very much.
@mangomilkshakelol · 4 years ago
Thank you!
@pankaj_pundir · 3 years ago
Does the root node need to be updated with statistics?
@Ch3tok · 4 years ago
You are my father bro, I love you.
@ras4884 · 5 months ago
As the search tree expands more and more, does that mean it also requires exponential memory?
@418teapot9 · 3 years ago
What happens if a selected node has only terminal next states?
@KevinHorecka · 6 years ago
Not sure I understand the Simulation step. Presumably each environment/game requires that step to be custom-tailored to generate some outcome based on a game state. And for many environments/games, the outcome depends on a long chain of steps (which, at this point, aren't executed). Does it just randomly pick steps after that until an end state is reached? That doesn't seem like it should work. Thanks for the talk!
@RaimonTarou · 6 years ago
From my understanding it seems to be so: it will randomly pick steps until all resources are exhausted. For systems that have infinite resources, I have no idea; I guess the programmer would have to explicitly set the maximum number of steps the simulated playout should take before it stops.
@aamir122a · 6 years ago
Simulation means creating random states under the constraint of the previous node's state. As we approach the endgame, this state space gets smaller.
@roccococolombo2044 · 6 years ago
For Go this would mean simulating random legal moves for both sides until the game ends and then registering the result. But I could be wrong.
@DarrenSemotiuk · 5 years ago
jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ ^ this article is a good summary, and it links to his Python code examples for multiple games, including Connect 4 and Reversi.
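
The replies above are describing what is usually called a random rollout. A minimal sketch, assuming the same toy state interface (legal_moves/play/is_terminal/winner) as the example under the video description, with the optional depth cutoff one commenter mentions:

import random

def random_rollout(state, max_depth=200):
    # Simulation step: play uniformly random legal moves until the game ends
    # (or until an optional depth cutoff), then report the result.
    depth = 0
    while not state.is_terminal() and depth < max_depth:
        state = state.play(random.choice(state.legal_moves()))
        depth += 1
    return state.winner() if state.is_terminal() else 0  # 0 = unresolved at cutoff
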
@JO-vj9kn · 4 years ago
Selection and Expansion follow a "Tree Policy" which deterministically chooses the nodes with the best values. Simulation follows a "Rollout Policy" which you can design yourself using heuristics or tools of your choice. If it selects actions randomly, you get a really robust but slow-learning Tree Policy. If you replace it with a function approximator, you don't even have to do full rollouts (their results are approximated), so the Tree Policy learns faster but is also more biased.
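
A sketch of the variant this comment describes: play a few random moves, then let a heuristic or learned value function estimate the outcome instead of finishing the game. `value_fn` is a hypothetical placeholder you would supply; the state interface is the same toy one used above.

import random

def truncated_rollout(state, value_fn, depth=8):
    # Short random playout followed by a value estimate instead of a full game.
    for _ in range(depth):
        if state.is_terminal():
            return state.winner()
        state = state.play(random.choice(state.legal_moves()))
    return value_fn(state)  # estimated result replaces the full playout
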
@weihe1680 · 6 years ago
Thanks for the tutorial.
@iamsmart5255 · 1 year ago
Great video, but I still can't understand how the simulation step works. Can anyone explain it to me, please? Thanks in advance.
@ercanatam6344 · 2 years ago
Is there access to the slides?
@Dudlykoo · 4 years ago
In the first example I see each node is a 3*3 square, so does that mean the AI's next move is limited to the 3*3 area? If we enlarge the square, it will increase the time complexity, right?
@sieyk · 3 years ago
For the minimax algorithm, yes. For MCTS it just reduces the accuracy if given the same computation time.
@lunique_rudd · 2 years ago
Thank you for that.
@ThePositiev3x · 4 years ago
One question. If there is no eval function, then to be able to say "good or bad" after a simulation, do we have to arrive at a final state so that we can say it's a win or a loss?
@thedingodile5699 · 4 years ago
Yep.
@0114mercury · 3 years ago
Great explanation of how MCTS works. I have one question: at 10:34, what is meant by "after enough simulations the MCTS converges to look like values in the minimax"? Suppose minimax returns "1" for winning positions and "0" for losing positions, and suppose for two moves "a" and "b" MCTS returns the values "9990/10000" and "10/1000000" respectively. Does "9990/10000" correspond to "1" in the minimax tree? Should we choose "b" because it has more simulations? Or will MCTS never output such values?
@intertextualdialectic1769 · 2 years ago
Like he said, the tree node with more simulations will be the one that is picked. A node that is clearly losing won't get that many simulations either, as the formula shows. All he means by converging is that the two algorithms will eventually agree on what the best move is, and the second and third best, etc.; they converge to the same ranking of moves from best to worst.
@haejulee259 · 2 years ago
I wish the UCT tree-search formula had been introduced...
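
For reference, the UCT rule that the replies above allude to scores each child i of a node as

    UCT(i) = w_i / n_i + c * sqrt( ln(N) / n_i )

where w_i is the child's accumulated wins, n_i its visit count, N the parent's visit count, and c an exploration constant (sqrt(2) is a common default). The first term favours moves that have performed well so far; the second term grows for rarely visited moves, which is why clearly losing moves quickly stop accumulating simulations.
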
@Jkauppa · 2 years ago
Assume you don't have any knowledge of the "promising" evaluation function, only the win/lose ending condition; you have to place somewhat randomly, only recording the positions played, the different chains, the game recordings.
@Jkauppa · 2 years ago
I conjecture that breadth-first search will solve the Go non-problem optimally, like a maze.
@Jkauppa · 2 years ago
Problem not defined, program not defined, only win/lose.
@anonymousnobody5001 · 1 year ago
Pyramid scheme
@user-qi5kb5th7y · 2 years ago
Lol, this dude seems to have no clue what he's talking about.
@fsgfaf · 5 years ago
Interesting video, but the presenter does not know much about games and makes a number of inaccurate claims. 1. Kasparov beat Deep Blue in 1996, but the presenter says the computer won. 2. Chess cannot be, and will never be, brute-forced; there are too many possible combinations. 3. It's not true that it took longer for a Go computer to beat the top human player (the presenter says "world champion", but Go does not have official world championships like chess does); one can only say it happened later. But that was because there was not much interest in making a Go computer, whereas Deep Blue was the culmination of 40 years of building chess computers. Deep Blue was a significant investment for IBM, including one of the fastest supercomputers of its time, a large, top-tier programming team, a full-time chess grandmaster on staff, several other grandmasters as consultants, etc. No one expended comparable resources on a Go computer until DeepMind. AlphaGo was also orders of magnitude faster than Deep Blue, so if we're talking about computing speed (the presenter's reference to brute force), it applies even more to AlphaGo. There certainly wasn't anything close to 40 years of development behind Go computers.
@bitti1975 · 5 years ago
The first two points are true, but the third one is simply wrong. Computer Go research is almost as old as computer chess research, and it was a popular research topic precisely because it couldn't be solved with the same approaches as chess. The Ing Foundation even offered a $1.4M prize, valid until the year 2000, to anyone who could write a strong Go program, and it went unclaimed (senseis.xmp.net/?IngPrize). So it's not as if there was no incentive to invest serious resources in the problem.
@sieyk · 3 years ago
You're missing a lot of information. The method used for chess was only tractable because of intelligent pruning. For Go, you cannot prune intelligently with hand-made methods: the state space is much larger, and Go has no well-defined way to determine who's winning (outside of the obvious cases). Go required a whole new way of approaching the search space to become tractable. Consequently, the newer methods used for Go are also orders of magnitude better at chess.
@michaelharrington6698 · 1 year ago
Your point 3 is a gross misstatement. Chess got lots of development and funding because it was on the frontier of feasibility; Go was not. Go actually had plenty of CS departments chugging away at it for a very long time. The approaches used for chess utterly failed at Go, which is why CS departments kept it as a research problem.