AlphaGo - How AI mastered the hardest boardgame in history

178,705 views

Arxiv Insights

6 years ago

In this episode I dive into the technical details of the AlphaGo Zero paper by Google DeepMind.
This AI system uses Reinforcement Learning to beat the world's Go champion using only self-play, a remarkable display of clever engineering on the path to stronger AI systems.
DeepMind Blogpost: deepmind.com/blog/alphago-zer...
AlphaGo Zero paper: storage.googleapis.com/deepmi...
If you want to support this channel, here is my patreon link:
/ arxivinsights --- You are amazing!! ;)
If you have questions you would like to discuss with me personally, you can book a 1-on-1 video call through Pensight: pensight.com/x/xander-steenbr...

Comments: 113
@kkkkjjjj4517 6 years ago
7:24 Your explanation of MCTS is not correct. For one instance of simulation: it picks the top move recommended by the network (greedy) most of the time, with random moves some of the time (epsilon). Then it walks into that move and repeats the same. It does this to completion. Then it backs up and keeps track of the win vs. visit ratio for each state, as shown in the picture. It repeats this whole process 1600 times. As it performs these walkthroughs it trains the networks and updates the values, so eventually, the more often a state is visited, the closer its estimate converges to the optimal value. MCTS runs to completion; it's not a depth-pruning algorithm. Temporal Difference stops somewhere in the middle, and that was not used in AGZ. The MCTS algorithm is discussed by David Silver towards the end of his lecture #8.
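[Editor's note: to make the loop above concrete, here is a minimal, hypothetical Python sketch of classic MCTS. It uses UCT selection rather than the epsilon-greedy variant the comment describes, the `game` interface (`legal_moves`, `play`, `is_terminal`, `winner`) is an invented stand-in, and the per-ply sign flip of the result for alternating players is omitted for brevity. This illustrates the general technique, not AGZ's actual code.]

```python
import math
import random

class Node:
    def __init__(self, state):
        self.state = state
        self.wins = 0.0      # accumulated results backed up through this node
        self.visits = 0      # how many simulations passed through this node
        self.children = {}   # move -> Node

def uct_score(parent, child, c=1.4):
    # Exploitation (win ratio) plus an exploration bonus for rarely tried moves.
    return (child.wins / (child.visits + 1e-9)
            + c * math.sqrt(math.log(parent.visits + 1) / (child.visits + 1e-9)))

def simulate(root, game, n_sims=1600):
    for _ in range(n_sims):
        node, path = root, [root]
        # 1. Selection: descend while the current node is fully expanded.
        while (node.children and not game.is_terminal(node.state)
               and len(node.children) == len(game.legal_moves(node.state))):
            node = max(node.children.values(),
                       key=lambda ch: uct_score(path[-1], ch))
            path.append(node)
        # 2. Expansion: add one untried child.
        if not game.is_terminal(node.state):
            untried = [m for m in game.legal_moves(node.state)
                       if m not in node.children]
            move = random.choice(untried)
            node.children[move] = Node(game.play(node.state, move))
            node = node.children[move]
            path.append(node)
        # 3. Rollout: play randomly to the very end of the game.
        state = node.state
        while not game.is_terminal(state):
            state = game.play(state, random.choice(game.legal_moves(state)))
        result = game.winner(state)  # e.g. +1 win / -1 loss for the root player
        # 4. Backup: update win/visit statistics along the walked path.
        for n in path:
            n.visits += 1
            n.wins += result
```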
@ArxivInsights 6 years ago
I checked the paper and you are indeed correct! The MCTS they use doesn't always play out every line until the very end of the game (they use value thresholds for early stopping), but I did misinterpret the meaning of the '1600 simulations'. Thanks for pointing this out!
@andreys7944 5 years ago
Do I understand correctly: with an average depth of ~300 moves and no early stopping, that would be ~1600*300 network queries just for the first move?
@edeneden97 5 years ago
So it plays a game out, and if it wins, the moves it picked are trained towards 1 and the value towards 1? And if it loses, everything towards 0?
@DiapaYY 5 years ago
AGZ doesn't really use classic MCTS, since it doesn't use rollouts; it doesn't play the game out to the end.
@philippheinrich7461 4 years ago
So I go down the search tree (based on an algorithm that takes exploration and other things into account) until I reach a leaf node. I put the leaf node's position into my neural net and get a policy and an evaluation as a result. The policy adds new leaf nodes below my current position, and the value function gives me an evaluation of my current position. Is this correct?
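[Editor's note: that matches the paper's description. As a rough illustration, below is a hedged Python sketch of one such AlphaGo Zero-style simulation step: descend with PUCT to a leaf, evaluate it with the network instead of a rollout, expand using the policy priors, and back the value up the path. `net`, `game`, and the constant `c_puct` are hypothetical stand-ins, not DeepMind's actual code.]

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a): prior from the policy head
        self.value_sum = 0.0    # W(s, a): accumulated value estimates
        self.visits = 0         # N(s, a): visit count
        self.children = {}      # move -> Node

    def q(self):
        # Mean value estimate Q(s, a) of this edge.
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent, child, c_puct=1.5):
    # Q + U: exploitation plus a bonus for moves the policy likes
    # but the search has not tried much yet.
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + u

def run_simulation(root, root_state, net, game):
    node, state, path = root, root_state, [root]
    # Selection: follow the max-PUCT child until an unexpanded leaf.
    while node.children:
        parent = node
        move, node = max(node.children.items(),
                         key=lambda kv: puct_score(parent, kv[1]))
        state = game.play(state, move)
        path.append(node)
    # Evaluation + expansion: the network replaces the random rollout.
    priors, value = net.evaluate(state)  # dict move -> prob, scalar in [-1, 1]
    for move, p in priors.items():
        node.children[move] = Node(prior=p)
    # Backup: propagate the value estimate, flipping sign every ply
    # because the players alternate.
    for n in reversed(path):
        n.visits += 1
        n.value_sum += value
        value = -value
```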
@AlessandroOrlandi83 4 years ago
Thank you for taking the time to explain it so well. It's still difficult for me, as I'm not familiar with the subject yet, but you really did a good job of presenting it clearly!
@clrajapaksha 6 years ago
You explained technical stuff very clearly. Thanks Arxiv Insights
@antonystringfellow5152 6 years ago
Clearest and most informative video I've seen on AlphaGo. Thanks!
@noranta4 6 years ago
This is a valuable explanation, this channel is a great discovery
@ArxivInsights 6 years ago
noranta4 thanks man, just started two weeks ago ;) More vids coming up :p
@alonamaloh 6 years ago
I've been programming board game engines for 25 years and I've followed the development of CNNs to play go quite closely. This video is a really good description of the AlphaGo Zero paper, with very clear explanations. Well, the explanation of MCTS was completely wrong, but other than that this video was great. I'll make sure to check out more from this channel.
@shafu0x 5 years ago
Thank you for this great explanation!
@SantoshGupta-jn1wn 6 years ago
Your explanation skills are fantastic! I like how he has an outline at the beginning of the video; a very simple thing, yet very effective when it comes to teaching a subject, and so few educational videos do it. If I had to figure out the paper by myself, it would have taken me ~2x longer. Subscribed.
@augustopertence2804 6 years ago
Best explanation of AlphaGo Zero I've found.
@Sl4ab 2 years ago
It's very clear, thank you! I can't wait to discover the other videos :)
@elishaishaal7958 1 year ago
Thank you! This is one of the clearest and most concise explanations of any paper I've found thus far.
@siskon912 5 years ago
Great explanation. Thank you!
@myj313 5 years ago
Great summary of the paper! Thank you :)
@welcomeaioverlords 4 years ago
Excellent video, thanks for making it!
@Hyrtsi 3 years ago
Excellent explanation, thanks!! I'm going to make my own 9x9 AlphaGo Zero version.
@dankelly 1 year ago
Awesome explanation! (And your greenscreen work looks great!)
@ericfeuilleaubois40 6 years ago
Damn great video! Carry on! It makes it very easy to get into these advanced subjects :)
@SiavashFahimi 5 years ago
Thank you, finally I found a good video on this paper.
@shamimhussain396 5 years ago
We humans run simulations in our heads all the time, because sometimes simple intuitions are not enough. So I guess it isn't surprising that adding Monte Carlo Tree Search always drastically improves performance, no matter how good the value function estimates are, even with the help of deep learning. The question is how to search more efficiently, and also how to build an efficient model.
@Moonz97 5 years ago
Loving your channel!
@Leibniz_28 4 years ago
Excellent explanation, thanks
@arijit07 4 years ago
This is the best video on the AlphaGo paper. Just amazing!!!
@2000chinadragon 6 years ago
Fantastic explanation! Few people balance simplicity with thoroughness as well as you do.
@ArxivInsights 6 years ago
That's the goal indeed, thx for the feedback :)
@SreeramAjay 6 years ago
Wow, this is really a great explanation
@daehankim2437 6 years ago
This helps a lot for those who need insight into machine learning trends :)
@diracsea2774 6 years ago
Excellent Presentation
@guitarchessplayer 6 years ago
Thanks for the great explanation! I'm still wondering how AlphaGo Zero learns that certain moves are obviously bad, like playing in the corner for example, without playing a game to the end?
@LOGICZOMBIE 2 years ago
GREAT WORK
@brahimelmssilha7234 6 years ago
Maaaan, you are doing great work, keep it up!
@alaad1009 4 months ago
Excellent video
@curiousalchemist 6 years ago
Brilliant - thanks for this! Really enjoyed watching and I think it takes away all the right information from the paper. Just a quick point: is there any chance you could quieten down the background music for your next video? It was slightly distracting and I think it detracted a bit from your great explanation! Merry Christmas!
@ArxivInsights 6 years ago
Thanks a lot, great to hear :) And for the background music: I got the same feedback from a few different people! This was my first video (every other video you'll find on my channel has this fixed) :p
@matrixmoeniaclegacy 4 years ago
Thank you for this valuable explanation! One request: could you highlight the parts of the images you are talking about, e.g. in the diagrams you show? That would make it easier to follow!
@RomeoKienzler 4 years ago
7:27 You said "certain depth"; did you mean "certain width"? By the way, I'd say this is one of the very best channels on the DL topic I've ever seen! Thanks so much!
@Bippy55 1 year ago
10 Nov 2022 - I just discovered this video and your channel. Fantastic explanation of what is admittedly a difficult subject to even tackle. Did you mention what kind of computer hardware the newest AlphaGo system uses? I assume it's a mainframe of some type. Also, I wonder if the system can decide in advance to play a peaceful game or a highly combative game? I have seen games where there were very few prisoners taken off the board, otherwise called a peaceful game; still, there is a winner nonetheless. Anyway, bravo for an excellent video.
@siddarthc7091 1 year ago
the transition 'dhkk' hits hard
@seleejanegaelebale9192 4 years ago
Thanks for the impressive explanation. Where can I find the source code?
@PALYGAP 6 years ago
A little question on the AlphaGo Zero MCTS. The Monte Carlo aspect of the AlphaZero MCTS seems to be gone, AFAIK; I can't see random numbers or random choices in that MCTS. It seems to have been replaced by the CNN calculating the probability of a board position leading to victory. What's your take on it?
@yolomein415 4 years ago
How is the value representation trained?
@PesarTarofi 6 years ago
Can't wait for this thing to perform in SC2.
@ClarkPotter 5 years ago
So what did you think?
@ruanjiayang 4 years ago
I am wondering what the output "policy vector" looks like in the neural network.
@karFLY1 6 years ago
Keep on! It's great.
@johnvonhorn2942 4 years ago
Xander, you look like "The Hoff" (David Hasselhoff), and that's a great look!
@SpaceExplorer 3 years ago
Germans love David Hasselhoff.
@generichuman_ 1 year ago
The part I don't understand is how they dispense with rollouts in MCTS. It seems like that is the only way to get a ground-truth value (by reaching a terminal state) which can then be propagated back up the chain. If you reach a non-terminal state, you're backpropagating a value from the policy network, which won't have useful values until it can be trained on useful values from the tree search. It seems like it's pulling itself up by its bootstraps. Is it the case that the true values come from the odd time that a simulation reaches a terminal state? Or am I missing something fundamental?
@zzewt 2 months ago
This is cool, but after the third random jumpscare sound I couldn't pay attention to what you were saying; all I could think about was when the next one would be. I gave up halfway through since it was stressing me out.
@einemailadressenbesitzerei8816 3 years ago
What I'm interested in is how the CNN works. What is the old target/label, what is the new target/label? How does it update the label during training? In other words, what is the prediction and what is the target? The CNN depends on these. OK, the prediction is randomized in the beginning: the network says it's a win, but in the end it was a loss, so it can update the weights. But I don't understand it in detail. It needs to play the whole game before it can update the weights. Does it update every output (p, v) for every position played in this game? And does it somehow play the same position many times to update the output?
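[Editor's note: as far as the paper describes it, after a self-play game finishes, every position in it gets a value target z (the final outcome, +1 or -1 from that position's player's perspective) and a policy target pi (the MCTS visit-count distribution at that position), and the network is updated on minibatches sampled from many such games, so both heads are indeed updated for every sampled position. Below is a hedged PyTorch sketch of the combined loss from the paper; the network and data format are illustrative placeholders.]

```python
import torch.nn.functional as F

def agz_loss(net, boards, pi_targets, z_targets, c_reg=1e-4):
    """boards: batch of input tensors; pi_targets: MCTS visit distributions
    (one per position); z_targets: final game outcomes (+1 / -1)."""
    p_logits, v_pred = net(boards)  # policy logits and value head output
    # (z - v)^2: push the value head towards the actual game outcome.
    value_loss = F.mse_loss(v_pred.squeeze(-1), z_targets)
    # -pi^T log p: push the policy head towards the search's visit counts.
    policy_loss = -(pi_targets * F.log_softmax(p_logits, dim=1)).sum(dim=1).mean()
    # L2 regularization on the weights, as in the paper.
    l2 = sum((w ** 2).sum() for w in net.parameters())
    return value_loss + policy_loss + c_reg * l2
```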
@bhargav7476 2 years ago
That's some gigachad jaw you have there.
@railgunpat3170 4 years ago
Wow, I see some mistakes, and I haven't watched too many of your videos yet, but I find this channel definitely underrated.
@davidm.johnston8994 6 years ago
Interesting video :-)
@petercruz1688 6 years ago
Danger Will Robinson!!!
@zee1645 6 years ago
Did you guys teach AlphaGo how to beat security systems yet? And take over the stock market and all the nuclear launch codes?
@LaurentLaborde 3 years ago
I'm confused, as your explanation contradicts the points you mentioned in the introduction.
@timleonard2668 6 years ago
Is it hard to implement this algorithm by myself? Could I create a superhuman Go player on, say, a 7x7 board with just my laptop? How big could I make the board using just a normal laptop?
@ArxivInsights 6 years ago
There's a ton of open-source implementations on GitHub: github.com/topics/alphago-zero but I know that many people are having issues reproducing the full strength of DeepMind's version. I don't know if the 'interesting game mechanics' of Go also emerge on a small board like 7x7, but I would guess that you can definitely train a decent model on a laptop for such a small game state. Additionally, you could also apply the algorithm to chess, which has a much smaller branching factor so it's easier to train, although again I think that in order to get decent results you would have to throw some cloud computing power into the mix :)
@robertkinslow8953 3 years ago
Ok. So how do you play and what is the idea of it?
@columbus8myhw 6 years ago
You should open this up to community captioning
@ArxivInsights 6 years ago
Done! Good suggestion :)
@thiru42 4 years ago
The history (extra 7 layers) is also used to identify ko (kind of similar to threefold repetition in chess).
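[Editor's note: for context, the full input stack in the paper is 19x19x17: 8 binary planes of the current player's stones over the last 8 positions, 8 for the opponent's, and one constant plane for the colour to play. Below is a hedged sketch of how such a stack could be assembled; the `history` format is an invented stand-in and the exact plane ordering is from memory of the paper, so treat it as illustrative.]

```python
import numpy as np

def build_input(history, player):
    """history: list of (black_stones, white_stones) 19x19 binary arrays,
    oldest first; player: 0 if black is to move, 1 if white."""
    last8 = history[-8:]
    # Pad with empty boards if fewer than 8 positions have been played.
    pad = [(np.zeros((19, 19)), np.zeros((19, 19)))] * (8 - len(last8))
    last8 = pad + last8
    planes = []
    for black, white in reversed(last8):  # most recent position first
        own, opp = (black, white) if player == 0 else (white, black)
        planes.append(own)   # current player's stones at this time step
        planes.append(opp)   # opponent's stones at this time step
    # Constant colour plane: all ones if black is to play, else all zeros.
    planes.append(np.full((19, 19), 1.0 if player == 0 else 0.0))
    return np.stack(planes)  # shape (17, 19, 19)
```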
@dhrumilbarot1431 6 years ago
Epic👌👌👌👌
@tenacityisthekey 3 years ago
Does anybody know if the shape of the output layer changes for every phase of the game? In the video, he explains that the network produces a probability distribution over possible moves, and the number of possible moves is dynamic. Does that mean the output layer's dimension is also dynamic? If so, how is it achieved? Can anyone help me understand? Thanks!
@dshin83 1 year ago
No, the output layer shape is static. You need to zero out the illegal moves from the output and then renormalize the probabilities to sum to 1.
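[Editor's note: a minimal sketch of the masking step that reply describes. The fixed output size of 362 (19*19 board points plus one pass move) and the fallback behaviour are illustrative assumptions.]

```python
import numpy as np

def masked_policy(policy, legal_mask):
    """policy: length-362 move probabilities from the network;
    legal_mask: same length, 1.0 where the move is legal, else 0.0."""
    masked = policy * legal_mask        # zero out illegal moves
    total = masked.sum()
    if total > 0:
        return masked / total           # renormalize to sum to 1
    # Degenerate case: the network put all its mass on illegal moves;
    # fall back to uniform over the legal ones.
    return legal_mask / legal_mask.sum()
```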
@fyrerayne8882 2 years ago
🔥🧠🔥
@andresnet1827 3 years ago
Do AlphaFold 2 when the paper comes out :)
@jakekim1357 4 years ago
Yo, this video is dope, it's super fire. Just letting you know I'm a dan player. I want to know more about this; I wish you were my school teacher.
@truehurukan 4 years ago
Thank you very much for the effort to educate the ignorant about the mechanisms behind AlphaGo, which many take for a "beast" or a "Terminator-like machine"... To simplify, I would say that the Go champion effectively played against 50,000 professional Go players: no chance to win at all, just as Kasparov failed to win against 50,000 human amateur players in the last decade. For me, it is the massively parallel processes and recursive functions that beat the champion; technically beaten, this is definitely NOT intelligence but MASSIVELY PARALLEL processing clustered on thousands of CPUs and GPUs (floating point operations).
@arnavrawat9864 5 years ago
What if, instead of self-training, the AI were trained on the match data of a previously trained AlphaZero AI?
@ArxivInsights 5 years ago
You could use that to speed up training in the beginning for version 2.0, but eventually performance will saturate and you won't do better. And if you're building a version 2.0, you're hoping to do better than 1.0, so bootstrapping on gameplay that is worse than what you want to achieve doesn't really make sense. Similarly, AlphaGo Zero got better than AlphaGo by NOT bootstrapping on human games...
@RASKARZ34 4 years ago
+1 sub
@these2menrgannadoit 4 years ago
*Guitar Noise*
@XChen-te7hk 5 years ago
7:37 "... to play about 1600 simulations for every single board evaluation ..." I have a question: how do they do this? Even if it's not 19*19*17 but, say, just 19*19*2, around 700, there would be (2**700) possible board states (maybe fewer given illegal positions, but not much fewer). How could they even play one simulation for so many board states? I guess I'm missing something...
@ArxivInsights 5 years ago
To build out the search tree, potential actions are sampled from the policy network (both for white and black moves) (so this hugely constrains the tree rollout to the most-likely / best moves). And then they also do pruning according to the value network, so whenever a part of the search tree results in a very low chance of winning (below some threshold according to the value network) it is discarded and not explored further. Combining these two approaches they build out the search tree and finally decide what move to play at the top of the tree by looking at the average value estimates for each of the tree's branches.
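[Editor's note: as a complement to this reply, the paper itself picks the actual move to play from the root's visit counts, in proportion to N^(1/tau) with a temperature tau (becoming greedy as tau goes to 0). A small hedged sketch, reusing a Node with per-child visit counts as in the sketches above:]

```python
import numpy as np

def select_move(root, tau=1.0):
    moves = list(root.children.keys())
    visits = np.array([root.children[m].visits for m in moves], dtype=float)
    if tau == 0:
        # Evaluation play: pick the most-visited move deterministically.
        return moves[int(visits.argmax())]
    # Self-play: sample in proportion to exponentiated visit counts.
    probs = visits ** (1.0 / tau)
    probs /= probs.sum()
    return moves[np.random.choice(len(moves), p=probs)]
```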
@vornamenachname906 2 years ago
11:39 Two of the four "very popular moves that stood for thousands of years" were dismissed by AlphaGo after 50 hours of training.
@funkyderrick3589 4 years ago
Very nice video! Please put the microphone closer to you so we don't have that annoying reverb.
@angloland4539 9 months ago
😊
@thekitchfamily 5 years ago
So not really AI, just number crunching using statistical analysis (Monte Carlo tree).
@ArxivInsights 5 years ago
Well, it uses deep neural nets (value estimate + policy net) + self-play training (Reinforcement Learning) to make the Monte Carlo Tree Search tractable on the exponentially scaling Go search space. So yes it's number crunching, but that's what AI is all about...
@confucamus3536 5 years ago
So really it's just one big-ass flowchart: if this, then that.
@keylllogdark 4 years ago
my brain feels sexually abused after watching this video...
@josephxavier8636 4 years ago
🤣🤣
@RavenFly1232 4 years ago
Go play with yourself, so you can learn Go! XD
@muschas1 4 years ago
Well, basically how humans acquire skills... from scratch. Cool.
@MilesBellas 5 years ago
distracting music
@quangho8120 3 years ago
I think it's just fine
@shortsthatdontsuck2174 2 years ago
I could barely notice it
@IBMua 6 years ago
Anybody know wtf they use a move history for? Aside from blowing up learning and computation many times over? Seems like nonsense.
@ArxivInsights 6 years ago
Ihor Menshykov yeah, I had the same thought at first. Apparently, including the history lets the network learn a form of attention over the important/active parts of the game. But I agree that, theoretically, it shouldn't really be necessary... See the Reddit Q&A for more details!
@uncledevin700 5 years ago
It's too difficult to explain how great AlphaGo is to people who don't know how to play weiqi.
@420_gunna 6 years ago
You're...gwern? What?
@chaithanya4384 4 years ago
exactly! Is he?
@nightmareTomek 1 year ago
Nice video. But your sound effects and music are VERY loud. Maybe normalize a bit?
@dougdevine27 6 years ago
Good info but you should consider losing those annoying and jarring scene transition guitar strums+kick drum sounds. They detract very much from the presentation.
@ArxivInsights 6 years ago
dougdevine27 haha, very true, this was my first video :p I removed them in all my other content ;) Unfortunately, once a video is uploaded, YouTube doesn't let you change anything anymore...
@ophello 4 years ago
Get rid of that sound effect. It’s weird and jarring. Do a graphical transition instead.
@briandecker8403 5 years ago
So, a big spreadsheet and not AI. Got it.
@saifufuerte3349 3 years ago
You explain this shi@ in too complicated a way.
@amird1889 5 years ago
I am not getting a single thing out of this, sorry. Badly made.