DeepMind AlphaZero - Mastering Games Without Human Knowledge

  192,016 views

The Artificial Intelligence Channel

1 day ago

2017 NIPS Keynote by DeepMind's David Silver. Dr. David Silver leads the reinforcement learning research group at DeepMind and is lead researcher on AlphaGo. He graduated from Cambridge University in 1997 with the Addison-Wesley award.
Recorded: December 6th, 2017

Comments: 212
@SafeTrucking · 5 years ago
Thank you for an excellent explanation. I'm looking forward to seeing where this leads.
@palfers1 · 6 years ago
The best exposition I've seen to date on what promises to be an AGI
@sucim · 6 years ago
Maybe the closest we've got, but it's nevertheless utterly far from that.
@confucamus3536 · 5 years ago
it doesn't know "why" yet
@casewhite5048 · 5 years ago
@@confucamus3536 or who
@MrCmon113 · 5 years ago
Not really. Boardgames are very easy to understand. We can write down all of the rules in minutes. An AGI would have to understand much more complicated and uncertain scenarios all at the same time and fit them into a universal theory of action. We really have no clue how to do that other than to rebuild a brain.
@Xpertman213 · 3 years ago
@@MrCmon113 In a way though, can't you write down the rules of life in a few minutes? Find food, overcome/avoid danger, reproduce. I think the biggest thing we have learned so far is that intelligence isn't 'built' but rather that it 'grows' into the environment that surrounds it. A philosopher I respect describes organism/environment as one continuous field from which intelligence emerges. It gets me wondering whether the whole realm of human experience that we consider meaningful is largely a 'bug', an example of an organism's systems growing so well into the environment that these systems end up 'inventing' new problems that don't actually exist because they can't be conveniently turned off.
@drancisdrake · 5 years ago
Amazing talk, thanks to speaker and uploader
@keguthueringer5136 · 6 years ago
thx for the very clear presentation
@alph4966 · 6 years ago
It's a wonderful achievement. I think that it has the potential to change the world.
@Wemdiculous · 6 years ago
Thought he said he had automated several talks. I thought, man that would be super impressive.
@peters972 · 3 years ago
I wonder if you could use this to analyze where a kid is going wrong in his math understanding for example, as a tool to teach kids math. It could pinpoint the area of confusion and help the kid bridge that and gain insight by providing simpler examples.
@petergreen5337 · 3 months ago
❤ Thank you very much, publisher, for a beautiful lesson and demonstration.
@kayrosis5523 · 6 years ago
I'd love to see this in more complex and open-ended computer games. If you told AlphaZero to play Cities: Skylines and maximize the population, with secondary constraints like environmental quality and RCI balance, I wonder what it would come up with.
@profanemagic5671 · 5 years ago
Check out "Deepmind vs Starcraft 2" =)
@bjornargw · 5 years ago
Keep it simple. Way to go
@richiester · 5 years ago
And at this point Stockfish resigned the game
@vegahimsa3057 · 3 years ago
It's often said to be RL without search, but there's always a search tree.
@PaytonTroy · 6 years ago
Just a thought: has anyone contemplated what would happen if we taught AlphaGo Zero to teach humans to play Go? How would that develop, and what kind of players (who had never played Go before) would it produce? And what would happen the day you put a traditional human player up against an AlphaGo Zero-trained player? I find it very interesting.
@thezyreick4289 · 5 years ago
that would be a truly beneficial purpose for an AI. To learn and master a certain skill, then pass on the knowledge of the best methods it has found as a master of that skill to human recipients.
@sarveshpadav2881 · 4 months ago
@@thezyreick4289 AI sensei lol
@generichuman_ · 3 years ago
One part I don't quite understand, is how the neural net learns to evaluate a position without the random rollout. It seems like the rollout is the only thing that begins the process of attaching a value to a position, which then can be used to train the network.
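For what it's worth, in the published AlphaGo Zero setup no rollout is needed to bootstrap values: each self-play game is simply played to completion, and the final result z is used as the regression target for the value head at every position of that game. A minimal sketch (the names here are mine, not DeepMind's):

```python
# Sketch: turning one finished self-play game into value targets.
# In AlphaZero-style training the value head regresses toward z, the
# actual game outcome, so no random rollout is required to attach a
# value to a position.

def value_targets(num_moves: int, outcome_for_first_player: float):
    """outcome_for_first_player: +1 win, 0 draw, -1 loss.
    Returns the value target for the player to move at each position."""
    targets = []
    for ply in range(num_moves):
        # Players alternate; each target is the final outcome from the
        # perspective of the player to move at that position.
        sign = 1 if ply % 2 == 0 else -1
        targets.append(sign * outcome_for_first_player)
    return targets

print(value_targets(4, 1.0))  # first player won: [1.0, -1.0, 1.0, -1.0]
```

Early in training these targets are noisy (random play produces nearly random outcomes), but averaged over many games they still carry signal, which is what gets the learning off the ground.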
@MOSMASTERING · 5 years ago
Is the goal to win, or does the AI know generally how useful each move is throughout an entire tree of game moves?
@jaanuskiipli4647 · 6 years ago
When will we see a rematch of AlphaZero against Stockfish or asmFish?
@forestpepper3621 · 6 years ago
If you look at the three graphs at 30:25, you'll notice "jumps" in all three curves. At a jump, from left to right, the curve starts to level off, and then abruptly shoots up nearly vertically again, the slope changing quite suddenly. There must be some significance to these jumps. Perhaps the algorithm has suddenly discovered a particularly effective heuristic for evaluating board positions, or the algorithm actually is developing something like human "insight" or "intuition" at these jumps.
@Dirtfire · 6 years ago
Machine Epiphany, perhaps?
@Dirtfire · 6 years ago
Haiku Shogi You should look up "3d creature evolution" on youtube and check out the end-over-end worm, among other evolved creatures. Amazing stuff.
@PHeMoX · 5 years ago
"Perhaps the algorithm has suddenly discovered a particularly effective heuristic for evaluating board positions, or the algorithm actually is developing something like human "insight" or "intuition" at these jumps." Probably merely a side effect of a certain variation of the algorithm taking longer to fail. It doesn't translate to an epiphany at all.
@yoloswaggins2161 · 5 years ago
There's an extremely simple answer to this: it's when they lower the learning rate. The LR schedule is in the paper.
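For reference, the AlphaZero paper does drop the learning rate in discrete steps during training, and each drop can produce a visible "jump" in the strength curve. A sketch of such a stepwise schedule (the step boundaries below are placeholders, not the published ones; as I recall the chess run stepped through rates of roughly 0.2, 0.02, 0.002, 0.0002):

```python
# Illustrative stepwise learning-rate schedule of the kind used to
# train AlphaZero-style networks. The boundary values are placeholders.

def step_lr(step: int,
            boundaries=(100_000, 300_000, 500_000),
            rates=(0.2, 0.02, 0.002, 0.0002)) -> float:
    """Return the learning rate in effect at a given training step."""
    for i, b in enumerate(boundaries):
        if step < b:
            return rates[i]
    return rates[-1]  # final rate after the last boundary

print(step_lr(0), step_lr(150_000), step_lr(600_000))
```

Lowering the rate lets the optimizer settle into finer-grained improvements, which shows up as the curve leveling off and then climbing again after each drop.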
@bunnygummybear9638 · 6 years ago
I hope they release the remaining 90 games of alpha zero and stockfish 8
@robiniekiller45 · 6 years ago
bunny gummybear hope they do 100 new games with stockfish 9 with proper speed and release it
@Cube930 · 6 years ago
Robinie killer AZ isn't doing chess anymore. There is a new AI called Leela Chess Zero though, and you can help it train. Still not quite AZ level, I think.
@robiniekiller45 · 6 years ago
Cube930 yes, it's nice
@MrMartinpro · 6 years ago
They released some games deepmind.com/research/alphago/alphazero-resources/
@teslathejolteon8007 · 6 years ago
bunny gummybear I also want to see those games really bad
@Larkinchance · 5 years ago
Is it available on the X-Box?
@Metacognition88 · 6 years ago
How would an Alpha program do against live players in an FPS shooter? In the beginning it would suck, but would it still improve, given that there are other variables inherent in humans involved, such as reaction time, accuracy of aiming down sights, etc., and not just calculations of where a person's next position or move would be?
@PHeMoX · 5 years ago
They could add some type of sensory system to allow the AI to understand what happens (not unlike an actual bot in any FPS game; those focus only on updating behavioural routines to make the playstyle more intelligent). Either that, or do what they've done teaching an AI to play games like Mario 'as a player', where only the visual feedback on screen is taken, with input signals sent to the regular controls. The problem is, that approach probably doesn't easily translate to a fully fledged 3D game, I'd say. Conceptually it can be done, though. To be honest, I'm not sure that approach is really a good method of making an AI learn about games in a more conceptual way.
@thezyreick4289 · 5 years ago
It is much more simplistic than you would think. A better question is: what would your goal be in having it play the FPS? Because if it is just to win, there would be no human alive who could beat the AI. Human reaction time is not fast enough to let anyone correctly aim and begin shooting at the AI before it has aimed at you and fired enough rounds to drop your HP to zero. You forget that to an AI, distance is not a factor, since eyesight is not necessary; aiming is not a factor, since the code would simply pinpoint the exact in-game coordinates your character occupies on the 3D graph the game is built on, then check whether any obstruction would prevent the bullet from damaging the player, and shoot the second there is none. So most player deaths would likely result from being shot through walls that can be shot through, etc.; or, if no such walls exist, most player deaths would happen the second there is no longer an obstruction between the AI's line of sight and the player. The only time I can think of where a player would kill the AI is with a randomly thrown grenade over a wall or across the map, but keep in mind the AI would be doing this as well, with perfect 100% accuracy, and with grenades cooked perfectly so they explode the millisecond they are in kill range.
@linhusp2349 · 4 years ago
@@thezyreick4289 Well then, let's buy a bunch of smoke grenades and free-fire at the AI
@robostain_9722 · 6 years ago
I'm waiting for the day when AI will be able to design new games from scratch instead of just learning how to play already existing ones.
@samosa9488 · 6 years ago
When such knowledge arrives, believe me, making games would be one of the least interesting things (for scientists) it would be doing.
@timothybolshaw · 6 years ago
+robostain_ This is not unprecedented, actually. Look up Angelina (used in Mike Cook's research). The games it comes up with are not board games (at least, I do not think Mike's research has gone in that direction) but the game play has been quite impressive.
@Twizzzle · 6 years ago
Dude! That's an awesome idea.
@Ryan1729 · 6 years ago
There's also Cameron Browne's Ludi, which generates board games, some of which have even been published commercially!
@jean-marcfueri6678 · 6 years ago
Well we won't even understand the rules...
@peterpetrov6522 · 5 years ago
Astonishing games from AlphaZero! Stockfish calculates some 70 million positions per second; AlphaZero about 80,000. Human champion Carlsen can probably do 7. Human intuition is ~10,000x more efficient than the AI's, but the amazing part is that the AI's intuition is ~1,000x more efficient than a brute-force approach. It seems that AI is about halfway there. BTW, all 3 players are not equal, and Stockfish would probably need a 10 or 100x increase in speed if it weren't equipped with tablebases, heuristics, openings, etc. How many years will the 2nd half of the road to AGI take? For starters, how long did the first half take?
@basteqss8859 · 6 years ago
To tell you the truth, my friends, I'm more afraid of this technology than fascinated by it. Greetings! ;)
@jaywulf · 4 years ago
"And in this interesting experiment funded by DARPA... we have uploaded AlphaZero+ Battlefield edition into Boston Robotics warframe chassis" "Hahaha... of course we have friend/foe algorithms in place... in the full release version"
@mbree3998 · 6 years ago
wow, scary and interesting, search, finance, networks, traffic, war, etc
@skierpage · 6 years ago
I'm confused whether AlphaGo Zero's initial Go knowledge only scores final positions (all stones played or both players pass), or whether it includes scoring intermediate positions. It seems impossible to learn from scratch whether a move is good based on whether it finally wins 350 random moves later! And I have the same question for AlphaZero chess: does it start out knowing that losing pieces is generally a bad sign?
@jurajhadzala951 · 5 years ago
skierpage I think no, it starts out knowing only the movement rules, nothing else.
@EORANDAY · 6 years ago
What interests me about all of this is where it's ultimately going. DeepMind is not doing this so that their AI can optimize game play. Eventually all of this could be applied to any domain: finance, engineering, etc. If this happens in the next few decades, what kind of job market will exist?
@introvertedskeptic33 · 6 years ago
Creative jobs? Though I've heard arguments that can also be taught...
@guupser · 6 years ago
Yeah, so please check the TED talk by Rutger Bregman on universal basic income: kzbin.info/www/bejne/r5WulJR_epuCZ80
@krashd · 5 years ago
Hopefully no job market so humans can actually live their lives instead of spending it as a slave to cash.
@swfsql · 6 years ago
I wonder if they will have one single network that could work for more than one type of game; that may be interesting.
@amaarquadri · 4 years ago
How does the model begin learning? If it starts with random weights making random predictions and doesn't use rollout from MCTS to get some "non-random" (statistical) data to make its decisions, wouldn't it just produce random training data?
@Michegianni · 3 years ago
Part of the value and policy system is to learn (and improve) which moves are more relevant and which (better) variations to search through, which is why the data ends up being far less random. The video explains ~80k searches instead of 70 million for Stockfish, due to this efficiency in the algorithm.
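The mechanism steering the search toward promising moves is usually written as PUCT: each simulation descends the tree picking the child that maximizes Q + U, where the exploration bonus U is proportional to the policy network's prior for that move. A rough sketch (the constant and data layout are my own simplification, not DeepMind's code):

```python
import math

# Simplified PUCT child selection, as used in AlphaZero-style MCTS.
# The policy prior P steers simulations toward promising moves, which
# is why far fewer positions need to be searched than in brute force.

def select_child(children, c_puct: float = 1.5):
    """children: list of dicts with prior P, visit count N, mean value Q."""
    total_visits = sum(ch["N"] for ch in children)

    def puct(ch):
        # Exploration bonus: large for high-prior, rarely-visited moves.
        u = c_puct * ch["P"] * math.sqrt(total_visits) / (1 + ch["N"])
        return ch["Q"] + u

    return max(children, key=puct)

kids = [{"P": 0.6, "N": 10, "Q": 0.2},
        {"P": 0.3, "N": 2,  "Q": -0.1},
        {"P": 0.1, "N": 0,  "Q": 0.0}]
best = select_child(kids)
```

In this toy example the unvisited third move wins the trade-off: its exploration bonus outweighs the well-visited, high-prior first move, which is how the search keeps probing alternatives without degenerating into brute force.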
@asink5928 · 3 years ago
I was high af watching this and I could only focus on this guy saying “uuuum”
@duskie69 · 6 years ago
Yea, but can it perform on a cold wet night in Stoke....
@doubleggamingmeruz678 · 5 years ago
But can it play dark souls?
@iman388 · 4 years ago
matthew pepperell hahahahahaha
@Banjara1345 · 4 years ago
JLingz can
@acWeishan · 4 years ago
Shall we play a game :) So with super intelligent AI and the use of more and more autonomous weapons you really can see we could advance to clone wars and then Skynet...
@MOSMASTERING · 5 years ago
Could you 'solve' Go with a quantum computer with enough qubits to represent the number of possible moves?
@FiveThings2018 · 5 years ago
Maybe quantum computers (if it's possible to build them) will be able to solve the game of Go. But as for now, to solve the game of Go you would need the combined power of all the computers in the world, running for millions of years.
@FloydMaxwell · 5 years ago
Can AlphaZero save its state? i.e. do wins and better moves get converted into usable future rules?
@TernaryM01 · 5 years ago
It's a neural network, so every lesson it has learned so far is implicitly stored in the current weights of the neurons. However, there is no obvious way to understand what is going on there; in other words, it cannot 'explain' why it makes a move in a logical way, or in any way understandable to humans.
@Michegianni · 3 years ago
It appears that's exactly what it does: it plays itself, improves, and then saves the new state, then keeps playing itself again and again, with every improvement state saved, giving it more knowledge. It's basically been coded to continue practicing against itself to keep getting better and better. The better it gets, the more it learns.
@franatrturcech8484 · 3 years ago
YES
@AvielLivay · 1 year ago
Correct me if I am wrong, but DeepMind's goal is to solve AGI. Frankly, I didn't witness here the innovation that I was expecting to see. I was under the impression that since alpha-beta tree search was prohibitively expensive, DeepMind came up with something profoundly different, a new approach that takes us closer to AGI. Something that doesn't require 80M calculations per second but is limited to the number of calculations per second that a human being can handle, and can still beat world champions. Something that thinks like us humans, or like cats, or ants… And what I got here is MCTS instead of alpha-beta, combined with start-from-scratch reinforcement learning using a neural network. Is this getting us closer to AGI? Also, I was expecting to get into the deep neural network design details. Some voodoo is going on here where you input the board state and get P and V. Surely this doesn't happen with every neural network; there's so much to say about this neural architecture, no?
@NathanOkun · 5 years ago
In Go, the curve of AlphaGo Zero started at zero and rose very steeply until a level where only small variations allowed winning, so the curve flattened out and showed only very shallow improvement from then on per game played. Is that the shape of all such learning curves? Could it instead suddenly, at some point, see a whole new way to play and jump up again, so that there is more than one flattened step in the curve? As if it suddenly found that Newton was wrong and Einstein was right, and switched its algorithms to match. This type of thing could make the results of your AI rapidly become totally unintelligible to a human evaluator: it wins but cannot even remotely explain to a mere human how it does it. Have you had this experience yet?
@AutomaticHourglass · 2 years ago
Usually in any learning algorithm the success criterion (or loss value) follows diminishing returns, because after a while you reach the constraints of the game and have to fine-tune your strategy to gain a slight advantage. Although it looks small on the Elo rating, keep in mind that Elo is logarithmic: roughly speaking, doubling your odds of winning is worth about 120 Elo points (400·log10 2), which tells you that even at the flat end of the curve AlphaZero is still multiplying its skill despite this constraint.
@NathanOkun · 2 years ago
Thank you.
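The logarithmic Elo point in the thread above can be checked with the standard Elo relations (nothing AlphaZero-specific; this is the usual logistic model):

```python
import math

# Standard Elo relations: expected score from a rating gap, and the
# rating gap equivalent to given win odds.

def expected_score(d: float) -> float:
    """Expected score for the stronger player at rating gap d."""
    return 1.0 / (1.0 + 10 ** (-d / 400.0))

def gap_from_odds(odds: float) -> float:
    """Rating gap equivalent to win odds of `odds` : 1."""
    return 400.0 * math.log10(odds)

# Doubling your win odds corresponds to roughly +120 Elo:
print(round(gap_from_odds(2.0)))
```

So a curve that looks nearly flat in Elo near the top can still represent large multiplicative gains in playing strength.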
@rogelioarguello2800 · 4 years ago
Intelligence without Desire or Affect is death
@gregh7457 · 3 years ago
but yet intelligence exists. go figure
@kimyunmi452 · 5 years ago
Let's see how this is applied to the game of financial trading.
@ipuhbamrash6708 · 4 years ago
If any game digresses from its rules, deep reinforcement learning should perform poorly. These things are still at a naive stage. We still don't have a resolution of which algorithms perform better in which scenarios. A lot more to come!
@gregh7457 · 3 years ago
Simple algorithms can and are achieving this right now. They're the ones causing the wild swings in the market. I sometimes beat the machines by betting against them, catching the falling knife they're throwing.
@Michegianni · 3 years ago
It would be unable to perform the necessary number of iterations, because the nature of the platform doesn't allow you to do millions of trades in a short time frame. Learning will therefore be limited to the timeframes allowed by the trading platform.
@aboninna · 5 years ago
Perfect. Can I get the games?
@moonboy5851 · 5 years ago
I wonder what would happen if you just left it on to keep learning...
@Innosos · 6 years ago
Is there a team working on *good* tools translating neural network behavior back to human understanding? Emphasis on *good*. Because while this is all good and well for black box operations it doesn't advance understanding of anything.
@skierpage · 6 years ago
Yes people are working on visualising neural networks' operation. And people are learning a lot from Alpha* Zero's play: the presenter mentioned it favoring certain opening patterns unknown to humans, and top players are analyzing many of its unexpected moves.
@sarainiaangelsong440 · 6 years ago
StarCraft would be a harder one for AlphaZero to do! Or heavily modded Minecraft, like my Nexympheria modpack on Minecraft.Curseforge, which can also be downloaded via the Twitch app.

The problem AlphaZero would have in both scenarios is that it has to understand resource management. In StarCraft you have to gather resources, make scouts, make an army, and strengthen your base, and then calculate exactly the right time to defeat your opponent, all while being bombarded by your opponent's scouts and army.

The same goes for Minecraft. There have been challenges where people have 10 minutes to gather resources and make armor and weapons, but there are steps: you have to find wood and make planks, then craft a crafting table, then make a pickaxe, then find ores in completely randomized terrain generation, then craft items. It would also have to know how to equip crafted armor and how to get back to safety from whatever mining situation it faces. Examples of mining dilemmas: it could mine directly below its feet and fall into lava, fall into a dungeon and get mauled by mobs, or fall into a ravine and die from fall damage. It would have to know how to use blocks and sneak while making bridges across lava, and understand that placing a block in lava or water replaces that source block with the block it just placed. It would have to know how to make makeshift stairs in a way that safely gets it back to the top! Stairs can be crafted, but it is much faster to just break and place blocks to move around, meaning the AI will have to know how to jump! When the AI goes into battle it will have to know how shields work, whether it has one, the opponent has one, or both players have one; it will also have to know how to use a sword, and defend against one, with or without a shield or sword.

If you're doing a 20-minute match, the AI will also have to know how to make a furnace: when it sees stone and coal, mined stone becomes cobblestone, which crafts into a furnace, and coal is used to smelt ores or cook food; cooked food replenishes hunger and sometimes heals some hearts. Stone can also be used to craft a stone pickaxe, which then mines iron, and iron armor or an iron sword is definitely strong in a 20-minute matchup!

In either case, StarCraft or Minecraft, there are many factors it will have to learn. StarCraft is close to an open book, where most games will be similar, though there are 3 races; Minecraft is random terrain gen. There are formulas in Minecraft, like which Y level gives the best ore-finding chances when mining, but it's still all random! One player may find better or worse items, or even find a village chest and loot some iron ingots, weapons, or armor, so the AI has to know what a chest is, open it, judge its best survival chances, and know what to take and equip. Think of Minecraft like crib: the cards dealt are all random, and it's up to the player or AI to make the best choices. Once the 10, 20, or 30 minutes are done, both player and AI (or AI and AI) must face off, and whoever wins the battle clash wins the round! :)
4 years ago
Delusions (8:20) Reinforcement: That's the magic part. Pay attention.
@goldenweave9455 · 3 years ago
So can we apply this to a program to win at blackjack?
@Michegianni · 3 years ago
Yes, that wouldn't be difficult: the program would end up card counting, and the action set is so small that it could be done very quickly and easily with brute force, instead of making it improve on itself.
@judithsixkiller5586 · 2 years ago
There are already a few humans with unique skills who tend to be summarily banned from casinos. Or who mysteriously disappear.
@williamko4751 · 3 years ago
Forget about chess and games; when will you make a Blade Runner-type Marilyn Monroe?
@WaveTreader · 6 years ago
I would want to see DeepMind play crazyhouse chess.
@cesarbrown2074 · 6 years ago
Google should get into 3d printing. It's a perfect match for their search skills and A.I capabilities.
@krashd · 5 years ago
An A.I. connected to a 3D printer? That's how it starts ;)
@JiveDadson · 5 years ago
I lobbied for years for the chess engine developers to use expected (average) value (-1, 0, 1) for training. Many argued that it made no sense. I thought it was eminently sensible. Ah, vindication.
@freidamargolis2615 · 2 years ago
After playing against itself for 1 year, Alpha Zero decided that the best move was to turn itself off.
@Jeff-cv4qn · 3 years ago
Hopefully human brains can learn also
@petrainjordan7838 · 6 years ago
Yes, fab. And what is the talk of tabula rasa supposed to mean? All the excitement about how 'we' have packed and prodded the system with various types of INPUT (which obviously does not count as any tabula rasa!)
@LetalisLatrodectus · 6 years ago
It refers to the algorithm learning by playing games against itself, not by looking at past games that humans played. Obviously humans created the system, but that's not relevant to calling it tabula rasa.
@richardfredlund3802 · 6 years ago
32:10 I thought there were no draws in Shogi?
@Biomirth · 4 years ago
"Where are the error bars?"..... David: "We don't do errors".
@henriquemarks · 4 years ago
If you have run it several times, put the results in the graphs with the error bars. If the error bar is zero, then the experiment lacks randomness, and this is an indication of error. The paper should not have been accepted with this basic scientific error.
@Michegianni · 3 years ago
@@henriquemarks The program continually improves upon itself. It is far from random; that was explained in the video. It uses a value and policy system to make sure it only selects the best possible moves with each improvement, instead of brute-force style like the other computer programs. If something keeps recursively improving on itself, the graph can only be a positive curve up to a limit; error bars do not apply.
@Jirayu.Kaewprateep · 3 years ago
🥺💬 He is right we also study about chess board game they also use of two logicals because to flavours us by give up some scores to make us came back and play it again 😃
@RaineriHakkarainen · 6 years ago
AlphaZero beat Stockfish 8 with 28 wins and 72 draws, a 64% score. The Gaussian bell curve stats book says a 64% score is about 101.8 Elo points: 3396 (Stockfish 8) + 101.8 = 3497.8. A 999-1 win rate over Stockfish 8 would be about 4270. There are a lot of youtube videos claiming that AlphaZero's rating is 4100. That is wrong; AlphaZero is close to 3500.
@naylinnhtun2006 · 6 years ago
You have to think A0 is 3500 when SF on 1 GB is 3400. Even my Galaxy S7 has 16 GB of memory, and I doubt their SF was as powerful as SF running on my phone. Meanwhile they got an approximately 2300-point algorithmic boost on 1000x more powerful hardware.
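The rating arithmetic in this thread can be reproduced with the standard logistic Elo model (this is the usual approximation; it says nothing about the hardware conditions being debated):

```python
import math

# Standard Elo: convert a match score fraction into a rating difference.
def elo_gap(score: float) -> float:
    """Rating advantage implied by an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

# AlphaZero vs Stockfish 8: 28 wins, 72 draws, 0 losses in 100 games.
score = (28 * 1.0 + 72 * 0.5) / 100  # = 0.64
print(round(elo_gap(score)))  # ≈ 100 Elo above Stockfish's rating
```

So a 64% score works out to roughly +100 Elo under the logistic model, in line with the ~100-point figure quoted above.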
@archerhosford4471 · 4 years ago
But can it play Crysis?
@Sahuagin · 6 years ago
In the early part of the 21st century, the first steps toward true AI were being taken. Little did they know...
@crave2527 · 4 years ago
Wtf.. for the only two of you..
@crave2527 · 4 years ago
Just, just to catch my eye and be like wt.. are you hippie smoking together.. plz be more knowledgeable before talking smoke out the ass..
@TheDavidlloydjones · 4 years ago
At 2:36 it seems to me a little strong to say "the policy network, which is illustrated here..." There is an illustration, or rather a graphic, and it might suggest or indicate the policy network. It doesn't tell us anything about it beyond the fact that humans use those words for a part of what they are doing. That's not illustrating anything about the policy network, is it?
@Michegianni · 3 years ago
I'm also curious about the policy and value network and how it is coded.
@TheDavidlloydjones · 3 years ago
@@Michegianni It's an in-house term among AlphaGo's very competent programmers for one of their subsystems. In retrospect, my little whine up above is accurate, sorta: the illustration does not "illustrate" the network in any meaningful way. It's simply a picture labelled "network." It doesn't even picture a network: it's a picture of a diagram. 🤣 On the other hand, I'm being petty about their petty error. The whole of their work, by contrast, is magnificent -- and in particular, Demis's original conceptualization of the project as a whole was a brilliant feat of management, comparable to, say, Henry Ford's instantiation of the assembly line.
@Michegianni · 3 years ago
@@TheDavidlloydjones Yeah I'd love to know more about it. I am familiar with policy because it will more or less be the set of rules governing pieces and their movement / legality of moves / direction / eating pieces etc (basic rules of chess) but the value network is what I'm curious about and how the algorithm decides what value is when you lose a particular piece or gain position etc.
@TheDavidlloydjones · 3 years ago
@@Michegianni David Silver, of DeepMind, the AlphaGo people, has a rather good exposition at kzbin.info/www/bejne/jabNqmqFr9uXgM0&start_radio=1&t=1. The whole question of how to explain advanced science and engineering -- hell, anything: economics, diplomacy... -- is a difficult one. Still, I think we can agree that speaking the words "policy network" does very little, and pointing at a graphic which says the words "policy network" in Roman letters is useful only for people learning English. I will give these AlphaGo documenters credit for one thing: they don't tell lies under the impression that a simple lie is better than a complicated truth, uh, "pedagogically."
@spicybaguette7706 · 5 years ago
But can it play Crysis?
@MexterO123 · 3 years ago
Alpha Zero vs Ai Hinatsuru
@yeungarthur9796 · 4 years ago
Can AlphaZero be the teacher, then? What's more, can AlphaZero be a polymath teacher and teach us anything?
@apexmaintenance461 · 4 years ago
Yea, but can it beat me in connect 4?
@sturpdog · 3 years ago
I think humans would do a pretty good job against AI at Connect Four. It's scaled down quite a bit from Go and chess, thus fewer variables and/or moves. Scaled down to a simple game such as tic-tac-toe, we would be pretty even with the AI.
@user-cs7ki5il3o · 2 years ago
AlphaZero has completely mastered geometry, you know.
@dilyan-2904 · 6 years ago
It needs to master StarCraft; that game is way more challenging for an A.I. than chess and Go.
@Thedirtycat · 5 years ago
I would love to see that ! Would be interesting to see how it played it and what strategies it would use.
@chrispugmire · 4 years ago
And now they have! :-)
@j0tt0 · 4 years ago
I hope these guys are being very cautious with this new tech. To quote Goldblum's character in Jurassic Park: they were so busy finding out whether they could that they forgot to check whether they should.
@love_pets1363 · 2 years ago
What if weaponized robots and killer drones get that AI?
@snicklesnockle7263 · 2 years ago
go is so much more fun than chess, even though I suck at both
@tofighshno4075 · 5 months ago
Hello. There is a game called Dama that is harder and more complicated than all of these games and has more moves. I hope Alpha plays a game of Dama as well.
@dragmio · 6 years ago
It's hilarious how eager the puny humans are to welcome their new master.
@couga8888 · 5 years ago
It's called evolution
@Luix · 5 years ago
If the dataset is based on strong human players, then it is not based on just the Go rules.
@VicJang · 5 years ago
Yes, the AlphaGo that defeated the human champion was trained using datasets, but "AlphaGo Zero", which defeated AlphaGo 100-0, was self-trained with nothing more than the rules. (It took 3 days.)
@menatoorus5696
@menatoorus5696 5 жыл бұрын
Conclusion: refined Monte Carlo is superior to alpha-beta.
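For context on that contrast: the "refined Monte Carlo" in AlphaZero is Monte Carlo tree search whose selection step uses a PUCT rule, trading the mean simulation value Q against a prior-weighted exploration bonus, whereas classical engines like Stockfish expand a depth-limited alpha-beta minimax tree with a handcrafted evaluation. A minimal sketch of the PUCT selection step (the constant `c_puct=1.5` and the toy statistics are illustrative choices, not the paper's tuned values):

```python
import math

# AlphaZero-style MCTS selection: descend the tree picking the child that
# maximises  Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)),
# where Q is the mean value of past simulations through (s,a), P is the
# policy network's prior, and N is the visit count.

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Exploitation term Q plus a prior-weighted exploration bonus that
    shrinks as the child accumulates visits."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, c_puct=1.5):
    """children: list of dicts with keys name, q, prior, visits."""
    total = sum(ch["visits"] for ch in children)
    return max(children,
               key=lambda ch: puct_score(ch["q"], ch["prior"],
                                         total, ch["visits"], c_puct))

children = [
    {"name": "a", "q": 0.30, "prior": 0.20, "visits": 40},
    {"name": "b", "q": 0.10, "prior": 0.60, "visits": 5},  # high prior, barely tried
    {"name": "c", "q": 0.05, "prior": 0.20, "visits": 5},
]
print(select_child(children)["name"])  # b: the prior steers search toward it
```

This is why the network matters: the prior P focuses the handful of simulations on promising moves instead of spreading them across all ~200 legal ones.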
@trewq398
@trewq398 6 жыл бұрын
Nice video! I don't like that he doesn't mention that they didn't play against a full-power Stockfish. The hardware was also pretty hard to compare, so saying AlphaZero is stronger than Stockfish is not proven yet.
@timothybolshaw
@timothybolshaw 6 жыл бұрын
The hardware used by DeepMind for development of the neural networks was insane. For the actual match against Stockfish, AlphaZero ran on modest hardware, equivalent to that used by Stockfish. In fact, AlphaZero was deliberately limited in its provided hardware so it did not take advantage of its inherent scalability. Stockfish is not capable of using very powerful hardware.

As for Stockfish playing without an opening book or endgame tablebase: as a pure chess exercise, it would be nice to see them included. Analysis I have seen to date indicates that Stockfish usually lost in the middlegame, and opening books and endgame tablebases would have made little difference. Anyway, DeepMind wanted to compare its pure artificial intelligence approach against the human knowledge combined with brute force of traditional chess engines (i.e. just compare the algorithms themselves). Arguably, opening books are allowing Stockfish to play the first few moves with the assistance of a group of grandmasters rather than using purely its own resources. This is arguable, though, as Stockfish is supposed to be an amalgam of human and computer effort, so the best test might be an unassisted AlphaZero against Stockfish with opening books, endgame tablebases, and grandmasters allowed to override Stockfish moves.

I still think AlphaZero would be superior, but it would be a valid test. I would like to see that done before AlphaZero-type approaches are allowed to override human experts in areas like medical diagnosis and parole hearings.
@trewq398
@trewq398 6 жыл бұрын
I saw some games analysed, and they said that Stockfish made mistakes in the opening and then played from behind in the middlegame, where AlphaZero was able to capitalize on that. But I also don't like the fact that Stockfish needs opening books and endgame tablebases to work properly. I would still like to see a rematch between them.
@profd65
@profd65 6 жыл бұрын
Was Stockfish even allowed to use its opening book and endgame tablebase? If it wasn't, then that seems unfair. AlphaZero in effect has its own opening book and tablebase that it created by playing itself countless times; it's not as if AlphaZero had no opening and endgame knowledge going into the games with Stockfish.
@notyou6674
@notyou6674 4 жыл бұрын
The visualisation wasn't very good, as it only visualised one line. The real search wouldn't be just one move branching out into 200; all 200 moves branch out into 200 moves each, which then also branch out into 200 moves each, and so on.
@notyou6674
@notyou6674 4 жыл бұрын
the one at 1:30 to be specific
@Michegianni
@Michegianni 3 жыл бұрын
@@notyou6674 I think we all understood that the presenter had limited screen space and time to demonstrate the basics. I don't think we would all have been prepared to visualise the entire tree - it would take forever and no screen big enough could display it.
@notyou6674
@notyou6674 3 жыл бұрын
@@Michegianni Or just zoom out continually; that's commonly used to show vast scales, even as big as the universe.
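For anyone wanting the numbers behind this thread: with roughly 200 legal moves per Go position (the figure used in the talk), a fully drawn tree has about b**d positions at depth d, which is why no screen size or zoom level would help for long:

```python
# Branching-factor arithmetic: every one of ~200 legal moves branches into
# ~200 replies, so the tree holds BRANCHING ** depth positions at depth d.
BRANCHING = 200
for depth in (1, 2, 3, 10):
    print(f"depth {depth:2d}: {BRANCHING ** depth:.3e} positions")
# Depth 3 is already 8 million; depth 10 passes 10**23, around the scale
# of Avogadro's number.
```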
@mindin2941
@mindin2941 4 ай бұрын
Either they a) don't work hard enough on chess, b) limit how well they could do it, and/or c) for some reason 'have to' play it safe. In any case, this talk didn't seem very open or forthright.
@DaDankStrafe
@DaDankStrafe 3 ай бұрын
What? Lol
@bansheee1
@bansheee1 6 жыл бұрын
You wanna create Skynet? That's how you create Skynet. Don't do it, pal... you're gonna regret it.
@sandrocavali9810
@sandrocavali9810 Ай бұрын
I'm moving to Venus. Hot but human
@klausgartenstiel4586
@klausgartenstiel4586 6 жыл бұрын
first u learn go. then you forget go.
@shawnburnham1
@shawnburnham1 9 ай бұрын
7:00
@MICKEYISLOWD
@MICKEYISLOWD 3 жыл бұрын
I just want an AGI as a homemaker who can clean and suck as she blows. This way all my problems are solved, as she can also trade the forex markets, making me $10,000 per day.
@spectator5144
@spectator5144 2 жыл бұрын
you understood the future
@MultiCharles321
@MultiCharles321 6 жыл бұрын
Iterations? How many iterations of Go did AlphaGo need to become the best player ever? Not how long in seconds, but how long in games? In the biological world, organisms learn quickly: maybe not in terms of seconds, but in terms of the number of trials. How many games did it take AlphaGo to learn to play as well as a human, versus how many games the human got? Crystallized intelligence is what you know; fluid intelligence is how fast you learn. How does AlphaGo compare to human fluid intelligence? Notice that at 30:47 they are talking about thousands of batches, that is, many thousands of games. A typical Go tournament has one game a day, so 1,000,000 games is more than a lifetime. In the real world, humans don't get that many chances to learn; there simply aren't that many trials available. AlphaGo does great in a strictly limited, well-defined game where it can run millions of trials over a few relevant variables, but how well does it do at recognizing friend from foe when it has only half a dozen trials and hundreds of potentially relevant variables?
@hohhoch3617
@hohhoch3617 6 жыл бұрын
First, the word you're looking for is trials, not trails. Also, why does its fluid intelligence matter? Even if they learn slower than us (which they do; if you ever watch an AI learn a game, it's awful), it doesn't matter: their ability to play 100,000 games a day, or run 100,000 trials a day, makes up for their lack of fluid intelligence. In the end, AlphaGo is still the best player in the world.
@autohmae
@autohmae 6 жыл бұрын
Notice the speaker says at 19:09 that 40 days is 40 million games, so 1 million games per day. This is the kind of 'brute force' method that fast computers can apply to problems.
@MultiCharles321
@MultiCharles321 6 жыл бұрын
Yes, but that limits computer intelligence to problems which are strictly defined or for which a large number of solutions already exists.
@PHeMoX
@PHeMoX 5 жыл бұрын
@@autohmae "This is kind of the 'brute force' method that fast computers can apply to problems." Which is exactly why it is not based on actual rational intelligence. A person can be taught to learn and understand a game without millions of random failed attempts. It's exactly this problem we haven't solved yet for AI.
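The arithmetic in this thread holds up, and one more step shows the scale gap: at the quoted rate, AlphaZero replays an entire human tournament career in under half an hour (the one-game-a-day, 50-year career below is this thread's assumption, not a measured figure):

```python
# Scale check on the numbers quoted in the talk and this thread:
# 40 million self-play games in 40 days.
games, days = 40_000_000, 40
per_day = games // days
per_second = per_day / 86_400    # 86,400 seconds in a day
human_career = 365 * 50          # one tournament game a day for 50 years (assumed)
career_minutes = human_career / per_second / 60
print(per_day, round(per_second, 1), round(career_minutes, 1))
```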
@DJHastingsFeverPitch
@DJHastingsFeverPitch 5 жыл бұрын
This is how you get Skynet
@gregh7457
@gregh7457 3 жыл бұрын
it already exists in china
@smfanqingwu1474
@smfanqingwu1474 6 жыл бұрын
Please consider:
1. Ke Jie was a little stronger than Lee Sedol in 2016, to the point where he might have scored 3:2 or 2:3 against the original 2016 AlphaGo (the "Lee" version).
2. Master beat Ke Jie 3:0, and Master went 60:0 against human 9-dan professional players.
3. Master (2017) could give the 2016 AlphaGo a 3-stone handicap.
4. This AlphaZero, trained without human data, can give Master a 2-stone handicap.
5. In fact, it could give Ke Jie (ranked No. 2 in the world) a 5-stone handicap or more.
In 2016, during the live TV broadcast of Lee vs AlphaGo on LeTV China, Professor Liu Zhiqing (Beijing University of Posts and Telecommunications) said we would see an AI give the best human players a handicap of 3 stones or more within 3 years. Ke Jie said, "I bet that's impossible." It was 3 stones then, and now it's 5! Even conceding black (letting you move first) is the largest gap seen between top professional players; in China the handicap between professionals and top amateurs is one stone or less. Ke Jie's 2016 claim was that a machine giving a professional a 3-stone handicap was impossible, that a pro would at most let you move first without komi; but by now Zero should be able to give Ke Jie 5 stones or more.
@wizkidd6950
@wizkidd6950 6 жыл бұрын
I am curiously skeptical of the general-purposeness of AlphaZero. Yes, it's a great accomplishment for one system to master three board games, but they are all fairly binary: win or lose in one match. Tackling what might be considered a weaker game would be more useful for establishing generality. A game with higher-order abstraction comes with many more complications, because the best actions would not always yield the best outcomes, so the system would also need a concept of self-generated misinformation. Bottom line: the network would have to hold multiple copies of any policy it has, and this goes against the less-complexity-more-generality principle built into AlphaGo. Take a card game such as Spades: it has higher-order abstraction and multiple complicating factors, since two cooperative agents face two adversarial agents, and each must also predictively model the others. I simply do not see AlphaZero doing this masterfully and without overfitting. High-order dynamic concepts just will not exist, as they will be squashed to some average expression rather than residing in a dedicated layer of the network.
@hohhoch3617
@hohhoch3617 6 жыл бұрын
The idea of an AI capable of abstract thought is still a long way off. But then, we're not (currently) interested in an AI's ability to handle abstraction; we want AI for its ability to make binary decisions. AI would be extremely useful in mathematics, science, medicine, and business. Any field that could benefit from better management would benefit from having an AI assistant on the board.
@wizkidd6950
@wizkidd6950 6 жыл бұрын
HoH hoch Thanks for your well-reasoned response. I would agree with you 100% except for one small issue: we are already deploying AI systems in self-driving cars, and at a minimum those systems need a rudimentary ability to abstract. While decisions inevitably get reduced to some binary set, the ability to predict and influence a live event is ultimately more abstract, or at a minimum iterative; that is to say, numerous decisions and reaction values are summed. Since AI systems interact with human beings, it is easy to see that our behavior is problematic and prone to irrational bouts: we slow down when we should speed up, and go when we should stop. So, to wrap this up: picture a well-reasoned AI system as a dynamic situation starts to unfold, and a human driver in a key location turns on his signal. The AI comes up with two courses of action, the first accounting for the human driver's action and the second ignoring the signaling. The point is this: if the AI system is purely heuristic, it cannot be truly interactive, and if it cannot account for humans being irrational, it has no business being among us.
@puddingosu3326
@puddingosu3326 5 жыл бұрын
huh
@IgorGabrielan
@IgorGabrielan 6 жыл бұрын
AlphaZero.ai
@blackmayb3
@blackmayb3 5 жыл бұрын
I think I can beat AlphaZero in Fortnite Playground
@thezyreick4289
@thezyreick4289 5 жыл бұрын
Depends entirely on how it gets coded for an FPS setting. They could simply code it to load the entire game state, analyze the location of everything, and then find the nearest weapon with enough ammo to drop every nearby player's health to 0, grab it, and fire every single round without missing into every player, starting with the closest one, until it is out of ammo, regardless of distance or render distance. In short, you would die within the time it takes the AI to get a weapon and a clear line for a shot to reach your character, even if all the shots cross the map through a hole barely larger than the bullet's hitbox. You likely would never see it, and the AI likely would not even be within your render distance for you to have a chance to fight back. Something so simple wouldn't be a challenge for a ruthlessly programmed AI like this. Sorry to break it to you, but most in-game AI is dumbed down or given a distinct handicap to keep it from annihilating players, so the game stays fun. Or the game devs aren't good enough at AI programming to do it effectively; that happens too.
@SatiricalStewie
@SatiricalStewie 6 жыл бұрын
You know, in many end-of-the-world movies, it starts with a brilliant scientist with a British accent talking scientific mumbo jumbo about an invention they hope will improve the world... just saying...
@PASBGR
@PASBGR 6 жыл бұрын
Don't worry, son. You are not the only one I've seen on the internet dreaming about the end of the world.
@terryhughes7196
@terryhughes7196 2 жыл бұрын
Master cancer for us
@nethbt
@nethbt 4 жыл бұрын
They cheated against Stockfish, though... the version they played against was an older one, using suboptimal settings, run on weak hardware. Boohoo.
@eddiesmurfy
@eddiesmurfy Жыл бұрын
Damn, this is boring af.
@myothersoul1953
@myothersoul1953 6 жыл бұрын
5:30 "... at the beginning of this (training) pipeline we start with a human data set ..." So much for "without human knowledge." Don't be fooled: there's a lot of human knowledge embedded in every AI.
@firebrain2991
@firebrain2991 6 жыл бұрын
That's when he was describing AlphaGo, not AlphaGo Zero or AlphaZero. Pay attention before you make comments like this.
@1man1bike1road
@1man1bike1road 6 жыл бұрын
alpha go zero was given zero lol
@myothersoul1953
@myothersoul1953 6 жыл бұрын
Whether it was the data used to train it or the experiences used to design it, there is human knowledge built into to every AI.
@gabrielfreire2935
@gabrielfreire2935 6 жыл бұрын
you didn't even finish the video before commenting -_-``
@Gregzenegair
@Gregzenegair 6 жыл бұрын
Well, there was no human data input, but the network itself was 'humanly' architected and set up. Maybe in a few years these could be built from scratch by other AIs, with AI algorithms giving birth to other AI algorithms, and so on (still, the first parent would be human-crafted).