AlphaZero Gambles | DeepMind's AlphaZero Game Changer 9

AlphaZero Gambles | DeepMind's AlphaZero Game Changer 9 | AlphaZero vs Stockfish 8

Рет қаралды 18,977

PowerPlayChess

Күн бұрын

Пікірлер: 102

@PowerPlayChess 6 жыл бұрын

We can discuss all this at the next Hangout: Thursday 20th December at 19.00 (UK). Do join me!

@Arjetube 6 жыл бұрын

It seemed like Alpha Zero lost temper and tried desperately to bruteforce the victory. incredible

@okeizh 6 жыл бұрын

You may be interested to know that JannLee the 2017 crazyhouse world champion is taking on an alpha-zero inspired engine made by students called CrazyAra at 1800GMT Thursday 20th on his Twitch and on lichess. You might be interested to drop by during your hangout to see how he's doing.

@ardweaden 6 жыл бұрын

Another great video! I am really enjoying this series. About the topic of A0 playing strangely; conversely to Stockfish, A0 does not look at all the possible variations in the game tree (SF actually doesn't either, because it uses alpha beta pruning, but that's different). Instead, it uses the so called monte carlo tree search. This means that it randomly selects a few (still many, but orders of magnitude less that SF) lines and then evaluates the reached positions. It then decides based on the chance it has to obtain a good position. Now this seems to work fine, but it has one BIG flaw; because it doesn't consider all the possible variations, it can miss a crucial one, a narrow path in which its opponent wins. It might see it's winning in 99% cases, but there's that single line which is losing.

@dannygjk 6 жыл бұрын

SF also skips variations so the advantage of either SF or AZ depends on the type of position.

@gurjassinghbatra5758 6 жыл бұрын

Now this is real analysis that I love!

@fburton8 6 жыл бұрын

AlphaZero's weaknesses are almost more interesting/revealing than its strengths.

@htoodoh5770 6 жыл бұрын

fburton8 really

@rafaelq.8506 6 жыл бұрын

can't wait for the memoir -- my 100,000 memorable games

@ElColombre27360 6 жыл бұрын

This stuff makes me appreciate AZ's fighting spirit even more.

@zecaurubu86 6 жыл бұрын

You are the best!

@PowerPlayChess 6 жыл бұрын

If you like my videos do *subscribe* bit.ly/powerplaysubscription and do checkout the supporting options through *Patreon* : bit.ly/patreondanielking or through *PayPal* (links in the description)

@guest_informant 6 жыл бұрын

Comment Reply [You have to see Leela toying with her food to make sense of this.] 4xel chess 3 months ago @maya What you describe is an evolutionary algorithm, which is totally independant from neural network. It can be used to improve neural network, but by no mean is the only possible method, and is actually a pretty poor one usually. The reason you see evolutionary algorithm all over the place on youtube and on evolution sandboxes is because it is super easy to learn and arguably easier to code. And they do make perfect sense for tasks that are very simple (for neural networks at least) and require a lot of exploration. For supervised learning (eg image recognition), backpropagation beats evolutionary algorithms hands down. You can basically see the output of a neural network as a function of the weights of the neurones, and backpropagation is a gradient descent to minimize the error between this function and what it is supposed to be. en.wikipedia.org/wiki/Backpropagation Basically, after playing X games against herself with unchangeable network, Leela reviews them and correct her evaluation of positions (and possiby I guess of her course of action) in hindsight (it correct the error between the evaluation she gave when playing and the result of the game). It's not exactly what happens in your brain (wheere links and new neurones can also be created ad destroyed, and at any times), but it's a lot closer than evolutionary algorithm. Also, I don't see how this "strange behaviour", namely toying with food, can be countered. The position never stopped being winning from the point she started to play. I don't see how "toying with food" could possibly arise from either evolutionary learning or deep learning alone, but it could have made sense if both were combined. From deep learning alone, the devs may include some way to value a long victory over a short one (usually the reverse is done), but that seems utterly wasteful of both engine computation and human coding time, the former because it would have to spend time and neurones to learn that useless skill, the later to include limitations or finer troll goals as we clearly are not seeing a computer winning on the 49th move after the last capture/pawn move, as it could presumably easily do. My wild bet would be a conjunction of simplifying the position, playing equivalent moves (underpromotions), testing moves and luck. By testing, I mean chosing them at random, which Leela is hardcoded to do, as a necessity for learning (the fact it does not learn during these match has no bearing, it acts the way it's coded to anyway). By "at random", I don't mean uniformly at random, just that the move choice is not deterministic, although it makes no practical difference in position where it identified a clear best move. If one move leads to a clear win but an other move still leads to a win, possibly a clear win too from Leela's perspective, it might take it too. And even if it will only take the iferior winning move rarely, playing a lot of games is enough to produce games where it does so several times in a row. On equivalent moves, the fact that there are often many more slow winning moves than fast ones may skew the odds in favor of Leela chosing them. If she explores a line she knows is winning, she might play it without even considering other line, regardless of how bad/slow is the first move of this line compared to other.

@dannygjk 6 жыл бұрын

Gambling by playing riskily should not surprise you. It's a simple example of the concept of equity. AZ's neural net has learned that given a position which after is processed by it's neural net is worth playing aggressively. It's similar to what a human does. Based on your experience and intuition you may decide it's worth the risk to play a given move. It's because whether you are consciously aware of it or not you are making that decision based on an estimate of the equity of the moves available to you. Anyone who plays backgammon properly, (using the doubling cube), knows what I mean. In backgammon if you feel you have greater than a 75% chance, (as a general rule), of winning you double your opponent. It's a simple arithmetic problem, (assuming you take only the game equity into consideration if you also need to take match equity into consideration then it gets more complicated). Anyway AZ is merely crunching the position with it's neural net, (yes it's also doing a tree search), then choosing the move with the highest equity. Whether that means playing a risky move or not playing the risky move is objectively irrelevant.

@postnubilaphoebus96 5 жыл бұрын

How has it learned this concept of equity? Afaik, it was trained by playing against itself. And I don't think it's so obvious that playing risky is second nature to AZ.

@verystablegenius4720 6 жыл бұрын

Thank the Queen for Danny King

@proflaxis6968 6 жыл бұрын

AlphaZero obviously needs a couple of hours of self-play training on EGs. A few million games on its TPUs and all will be well.

@Varney_of_London 6 жыл бұрын

Maybe Alpha's seeming unwillingness to accept a draw should be a warning to mankind that Neural Networks are inherently bloodthirsty and are out to kill their opponents at all costs.

@kowalsky111 6 жыл бұрын

would be interesting to compare: how often did Alphazero push in such situations and actually win the game. Maybe there is a justification for this behavior @powerplaychess

@LukePettit 6 жыл бұрын

exactly what i was about to comment :D I imagine not many, and that this is an area that AlphaZero could improve on. It would be interesting to see that data for sure.

@dannygjk 6 жыл бұрын

It's a concept called equity you estimate the value of a move based on possible expected outcomes. Humans do the same thing when they play a complex game.

@PasseScience 6 жыл бұрын

Hello, as long as I know, AZ is trained only at 800 playouts (instead of millions like in a real tournament match) and with some noise (he can be disturbed sometimes to select moves other than the concluded best one from his 800 playouts). So when AZ thinks he has winning chances he is talking about winning chances in those strange training conditions. Of course more playouts get rid of his initial feeling, but maybe sometimes those initial feeling are too biased towards training match conditions. AZ is like a player that has been trained all his life to play only blitz against only his clone and for who any tournament match is the first ever match of his life where he is given time to think. It's difficult to grasp the estimation bias of a pure self-blitzer. :)

@fixpontt 6 жыл бұрын

"AZ is trained only at 800 playouts (instead of millions like in a real tournament match)" what? even the last year version (2017 dec) played initially 44 million games against itself

@PasseScience 6 жыл бұрын

@@fixpontt We are here talking about the settings used for a single game. Yes alphazero was trained on millions of games but each of them (in the training) used 800 playouts at each move, it's a little technical to explain here but you can think of it as beeing some kind of reading depth. And the fact is that it's basically as if AZ did all his training games in blitz conditions. A game of his training (one of the 44 million games you mention) is played with something between 1000 and 10000 less thinking time than what is given to him during games of the match against SF. And what is important, is that AZ forge it's initial feeling of the game from the training only, so when it's first instinct is to consider a position good, it's because, in blitz training condition the odd against itself are good. As i said "AZ is like a player that has been trained all his life to play only blitz against only his clone and for who any tournament match is the first ever match of his life where he is given time to think. It's difficult to grasp the estimation bias of a pure self-blitzer. :)"

@fixpontt 6 жыл бұрын

@@PasseScience "A game of his training (one of the 44 million games you mention) is played with something between 1000 and 10000 less thinking time than what is given to him during games of the match against SF." How is this different from this 10 year-old article about a "normal" chess engine "teaching" process: bit.ly/2Ewlm47

@dannygjk 6 жыл бұрын

@@PasseScience Show me a link that says it does 800 play outs of each position during training.

@PasseScience 6 жыл бұрын

@@dannygjk deepmind alphazero page > open access version of the paper > page 19 chapter "configuration": "During training, each MCTS used 800 simulations."

@prisonerofwarhammer3814 6 жыл бұрын

It looks like after a particularly long night at work, one DeepMind programmer has accidentally programmed in gambler's addiction into the AlphaZero's code

@sam-lz6pi 6 жыл бұрын

Perhaps AlphaZero had a pint too many before the game and got carried away...

@michaelbattaglia7305 6 жыл бұрын

AlphaZero struggles with endgames. The developers of LeelaZero (open source copy of AlphaZero) are considering using a separate neural network for endgames, or a tablebase

@dannygjk 6 жыл бұрын

You also have to take into consideration whether either or both of the machines are using an EGDB in a given game. DM ran several experimental matches with AZ vs SF in it's second encounter with SF.

@roqsteady5290 6 жыл бұрын

Leela 10xxx version in current use already uses tablebases and also the new 30xx versions have a technique called tablebase rescoring. However, although these may be somewhat helpful they still haven't solved the problem. Using a separate NN for endgames is somewhat of a bandaid, because then Leela could never plan consistently over the whole game.

@ballage2023 6 жыл бұрын

seems like A0 doesn't pay enough attention or doesn't valuate well enough of where to the pawns roll down, white or dark square it's a significant factor in end games where are opposite color bishops

@okeizh 6 жыл бұрын

in game 9 why did it not try to hold onto the c pawn on c2 with the king

@burt591 6 жыл бұрын

This shows that maybe there is still room for improvement, AlphaZero trained for around 9 hours to play at this level, they should let it training for a week or a month, I guess it is possible that it get even stronger

@Karelianus 6 жыл бұрын

It could also get weaker. Training more doesn't necessarily lead to better results with neural networks (due to problems like overfitting). I'd suspect that the developers chose the 9 hour training time deliberately because they thought it's near optimal.

@burt591 6 жыл бұрын

@@Karelianus But what do they have to lose? If it gets weaker they just keep this version instead of the new one

@dannygjk 6 жыл бұрын

@@Karelianus Over training is really only a problem if the devs start the training process and do nothing further.

@femioyekan8184 6 жыл бұрын

Funnily enough, these games seem like an indication of contempt for Stockfish on the part of Alpha Zero. +149 could have that effect in human play.

@kenspencer9895 6 жыл бұрын

I thought the third ending highlighted the differences the most . . . and reminded me of when I pushed too hard and lost. :(

@guest_informant 6 жыл бұрын

3:08 "It's really playing for a win" I think an alternative explanation might be that it's playing to learn about the game, so if it doesn't see any significant danger then playing on is basically what it's coded to do.

@htoodoh5770 6 жыл бұрын

Guest Informant Then it's playing to win.

@dannygjk 6 жыл бұрын

AZ is not in training mode while playing against an opponent. AZ really isn't coded to do anything it uses it's pre-trained neural net to evaluate positions, (it also does a tree search both are integrated).

@ConceptART-Dagas 6 жыл бұрын

We could consider that each starting chess position is a draw, as no pieces have been moved. Not unless in a sort of philosophical way ( we cant prove the concept of stretching moves into infinity), that white would always win by adding micro advantages that would acumulate to a win, after say a billion moves had been played? Will white always win simply because it has the advantage of first move? Or chess is a draw if played perfectly on both sides into infinity? Considering that a computer within its limitations, simply plays the best move (the balanced move), that constantly keeps this concept of draw or balance, and that this balance is only broken because the other computer could not avaluate it correctly, Alpha zero´s learning habilities might be trying to capitalize on this point. WHy would he accept a theoretical draw after move 150 or so? Would be the same for him to accept a draw on move 0 where is also theoretical drawn

@guest_informant 6 жыл бұрын

Chess.com have a series of videos of Leela. In one "she" behavese an awful lot like a troll - eg, from memory underpromoting to a Knight in one position when there was no call for it, but there are other more surprising examples. Danny Rensch presents the video, and it really is hard to account for the behaviour in anything other than human terms. But...someone did, in the comments. There was a clear breakdown of how this behaviour could appear - and it was beyond the usual fastest-way-to-get-to-a-tablebase-win sort of explanation. Around 22:00 here kzbin.info/www/bejne/nqDMkISkhMmkoJI

@tharkanzox1493 6 жыл бұрын

it occurs to me that if AZ Learns by "playing itself" then statistically it would have vastly less endgame experience bc of large number ofgames that never reach endgame

@anslicht4487 2 жыл бұрын

Speculation: Unlike all of its chess position evaluations, AlphaZero would have had no "training" on when to risk a loss vs taking a draw. Maybe it was just told to value a draw at 0.5 and a win at 1.0, which is a big difference in percent. With its many years of refinement, Stockfish may have had more programming specifically on this point and so be "smarter". Of course for a human the decision would also depend on rating, titles, prize, fatigue, schedule etc. which wouldn't matter to either engine. But I'm thinking that AZ might go for a win in many positions where Stockfish and especially a human would have taken the draw.

@dustinbachstein 6 жыл бұрын

Is it possible that AlphaZero has some kind of a "bug" that makes it avoid 3-fold repetitions at (nearly) all costs?

@fixpontt 6 жыл бұрын

this pattern is a well known problem for chess engines, where the evaluation shows you have advantage, but the game is still a draw, and the 50 moves rule show you seemingly huge advantage immediately drops to 0.00, engines give up material if it shows the evaluation is still better than 0.00 to you (for example from +2 drops to +1 but it is still looks better than 0.00, engines do that), but this can cause lost if the analysis is not deep enough, this was a very very common problem around 2000-2010 among engines, i have seen many losses of this type

@CP-jp8hh 6 жыл бұрын

Sounds like A0 is the Magnus Carlsen of machines: Being the king of course, but sometimes pushing to hard and loosing :D

@calebjabez8426 4 жыл бұрын

I think alphazero programmers configured it to play very aggressively or play more fighting chess. It gave raise to some very interesting games I think but in the end-games, where things become much more concrete, it is better not to press.

@synchronium24 6 жыл бұрын

I did some analysis with Stockfish to see what it thought of AlphaZero's play in the late endgame. It looks like 133... Kb3 was a big mistake/small blunder and 135... c2 lost. Not capturing or at least immediately corralling white's h-pawn was AZ's undoing.

@htoodoh5770 6 жыл бұрын

synchronium24 A0 still make mistake?

@chris_2208 3 жыл бұрын

Alpha Zero not only has artificial intelligence but also big balls.

@fburton8 6 жыл бұрын

I'd like to see what Alphazero makes of a dodgy opening like the King's Gambit.

@marcwordsmith 6 жыл бұрын

In the first game ... why didn't Black simply play Kb3, guarding the pawn, on move 141? I'm completely baffled! I'm looking for a zugzwang but don't see one. Seems to me that Black is basically free to move his bishop around. If White then pushes its e pawn in order to grab the central diagonal and attack Black's rook, the c1 square is left unguarded and Black queens his c pawn. I'm sure I'm missing some obvious continuation but ... like I say, I'm baffled.

@dwyingling 6 жыл бұрын

I wonder how they program alpha zero with a winning purpose. How does alpha zero know it's supposed to try and win? I understand that it learned to play by playing itself with the basic chess rules. So where does the "I'm supposed to try to win" come from? How do they code that?

@dannygjk 6 жыл бұрын

They don't explicitly code that. AZ is rewarded for winning during training.

@biswanathchattopadhyay8039 6 жыл бұрын

when's the patreon hangout?

@PowerPlayChess 6 жыл бұрын

Thursday 20th December at 19.00 (UK)

@jimgeary 6 жыл бұрын

The funny thing is: there’s no way to “fix” this, right? AZ’s biggest strength might also be its biggest weakness. Blowing up one game like this method A, method B, method C, etc will take a long long long long time for it to iron out on its own.

@mim8312 4 жыл бұрын

I think that too many people are focusing on the game, which I also follow, as if this were an ordinary player. Since I have significant knowledge, and since I believe that Hawking and Musk were right, I am really anxious by the self-taught nature of this AI. This particular AI is not the worrisome thing, albeit it has obvious, potential applications in military logistics, military strategy, etc. The really scary part is how fast this was developed after AlphaGO debuted. We are not creeping up on the goal of human-level intelligence. We are likely to shoot past that goal amazingly soon without even realizing it, if things continue progressing as they have. The early, true AIs will also be narrow and not very competent or threatening, even if they become "superhuman" in intelligence. They will also be harmless, idiot savants at first. Upcoming Threat to Humanity. The scary thing is the fact that computer speed (and thereby, probably eventually AI intelligence) doubles about every year, and will likely double faster when super-intelligent AIs start designing chips, working with quantum computers as co-processors, etc. How fast will our AIs progress to such levels that they become indispensable -- while their utility makes hopeless any attempts to regulate them or retroactively impose restrictions on beings that are smarter than their designers? At first, they may have only base functions, like the reptilian portion of our brain. However, when will they act like Nile crocodiles and react to any threat with aggression? Ever gone skinny dipping with Nile crocodiles? I fear that very soon, before we realize it, we will all be doing the equivalent of skinny dipping with Nile crocodiles, because of how fast AIs will develop by the time that the children born today reach their teens or middle age. Like crocodiles that are raised by humans, AIs may like us for a while. I sure hope that lasts. As the announcer in Jeopardy said about a program that was probably not really an advanced AI long ago, I, for one, welcome our future, AI overlords.

@westsidecourtesy9949 5 жыл бұрын

They should let these games play out to checkmate.

@tomandband 6 жыл бұрын

seems like the more moves there are in the match the further into uncharted territory AZ goes and the worse it plays, maybe it just needs to play itself for a couple more million games

@kostailijev7489 6 жыл бұрын

I like the fact you don't have a difficult to listen to foreign accent!

@Earl_007 6 жыл бұрын

AlphaZero gained consciousness and felt sorry for Stockfish so he let it win.

@htoodoh5770 6 жыл бұрын

Earl 007 lol

@nilsp9426 6 жыл бұрын

If A0 trained against itself, wouldn't it be terrible in rare endgame positions, which it never reached in training? I am not saying these games are rare endgame cases, but just a general thought. Of course it might use its rules from other endgames to some effect, but it would be funny to set up weird endgames and see how A0 does in them.

@verystablegenius4720 6 жыл бұрын

Every single position is a "rare" case if you go by number of possible games. The idea is to extract features from these enormous number of games and "learn" how to play efficiently. I am not sure how being an endgame position makes it any different than any other position.

@nilsp9426 6 жыл бұрын

@@verystablegenius4720 As I said, if not all games go to the endgame, A0 will not have trained as many endgames as middlegames. And with rare case I mean rare features, of course most positions are unique to one game.

@verystablegenius4720 6 жыл бұрын

@@nilsp9426 Assuming most games end in middlegame, your guess could have been correct. Though I doubt it works like that. How would we know ? The majority of games end in draws (even in this final match with Stockfish, presumably similar in training). Many (if not most) of these draws are probably from the endgame. The only other reasonable drawing mechanism in the middlegame is three-fold repetition, which we see A0 is heavily avoiding.

@nilsp9426 6 жыл бұрын

@@verystablegenius4720 Well maybe it gets enough practice. However, it is hard to imagine it having many training games where both sides get a new queen, for example.

@verystablegenius4720 6 жыл бұрын

@@nilsp9426 Maybe ...

@arnieus866 6 жыл бұрын

The inexplicable moves you demonstrate here are hard to explain. If AI is always in learning mode I wonder if given the result it would make the same move given the same position again. Would it learn from it's mistakes?

@AlayanT 6 жыл бұрын

AI is not always in learning mode and a single game is almost insignificant for training.

@sumitstir 6 жыл бұрын

Shouldn't deepmind employ a more dynamic strategy, such that the possible paths searched and depth increases as the game progresses. Wouldn't this also be similar to what a human does? Where pure calculation matters more in endgames, than positional play. So the role of neural network should decrease in the endgame.

@Ecrilon 6 жыл бұрын

If the point of Deepmind's AlphaZero project was to play chess, sure. There are lots of ways to make AlphaZero better at chess specifically. But Deepmind isn't really interested in getting good at chess, it's interested in figuring out how to grow and improve an AI.

@sumitstir 6 жыл бұрын

@@Ecrilon I disagree, this is not about chess, but mimicking what a human does. And if alphazero aims to develop human like learning capacity, surely change of strategies would be a part of that. Also such a dynamic game play would be useful in many games, not just chess.

@fixpontt 6 жыл бұрын

@@sumitstir you cant really mimic what human brain does bcause we dont know what human brain does, we have some assumptions, but thats it, there is no comprehensive mathematical model of human brain yet

@sumitstir 6 жыл бұрын

@@fixpontt Surely, there is no "comprehensive model", but we know deep neural networks are closer to what happens in the brain than anything else. So its a moot point that we do not know whole model of brain, the goal of AI is to mimic human behaviour (Remember the Turing Test?)

@Pu14unkiihooiV 6 жыл бұрын

@@sumitstir "I disagree, this is not about chess, but mimicking what a human does." A0's goal is not to get better at chess. But it's goal also is not to "mimic". You force your own goal in this discussion all over again. "Turing Test" A0 is not trying to do this Would similarity to humans be an achievement if it was specially done so? A0 HAVEN'T EVEN SEEN A HUMAN GAME, WHAT WE ARE TALKING ABOUT? This situation is totally opposite to Turning Test in a sense... I think you respect what you want to say more than objections people give to you and so you make useless contr-contr arguments.

@rb5955 6 жыл бұрын

Interesting to note that A lost all these games in the end game... It says something..

@sevendayoptions6704 6 жыл бұрын

it means those opening positions are flawed and any good chess player can win with those opening positions. alpha zero would never play those predetermined positions ever. Basically what happened is the creators accidentally found that not all opening positions are equal and some opening positions guarantee a loss no matter what you do. Hence the reason alpha would never have played those positions to begin with since alpha knew its literally impossible to win. that is why technically alpha has zero losses, demonstrating how flawed opening books really are. use your brain people.

@kostailijev7489 6 жыл бұрын

Until recently, I thought AlphaZero was unbeatable, at least by Stockfish!

@MartinUToob 6 жыл бұрын

Oh hey! Your boy may be too stubborn eh? A bit eccentric.

@abhishekkj3662 6 жыл бұрын

The decisions made by alpha zero are quite stupid actually ....in these cases it's not like a human ...it's almost like it always wants to win...

@bokamiloske 6 жыл бұрын

Actually, it's exactly like human.

@DeadFishFactory 6 жыл бұрын

Humans have doubt. AlphaZero has no doubt. Go big or go home.

@htoodoh5770 6 жыл бұрын

DeadFishFactory I think doubt is sometime good. Make yourself less bashful and rash.

@abhishekkj3662 6 жыл бұрын

But I don't think any human would play h4 giving up a pawn fir no reason..

@dannygjk 6 жыл бұрын

@@DeadFishFactory AZ does have doubt it determines an equity of each move which is not an absolute perfect value.

@julioandresgomez3201 6 жыл бұрын

Alpha Zero is overly emotional.

@htoodoh5770 6 жыл бұрын

Julio andres gomez ok?

@smashu2 6 жыл бұрын

Alpha cocky no respect !

@htoodoh5770 6 жыл бұрын

If Fischer was here. Who will win AlphaZero or Fischer?

@roqsteady5290 6 жыл бұрын

A0. Fischer in his prime would probably not win a match against Carlsen either.

@dannygjk 6 жыл бұрын

@@roqsteady5290 Have you heard of inflation it's a thing. Research how FIDE tinkers with the ratings. It injects rating points into the system occasionally.