Reinforcement Learning 10: Classic Games Case Study

42,500 views

Google DeepMind


Comments: 33
@LuisYu 6 years ago
Amazing, high-quality lectures. I especially enjoyed the attention, memory, and AlphaZero talks.
@Kingstanding23 5 years ago
A Nash equilibrium sounds like what happens on roads, where traffic evens itself out among all the roads towards some destination. When a new road is built, nothing really changes because the traffic just redistributes itself to a new equilibrium.
@LucyRockprincess 1 year ago
Great real-life analogy.
@TheGreatBlackBird 3 years ago
Shouldn't there also be a reward term in the TD error at 42:30 and 50:25? Edit: OK, the 2015 lecture explains a bit more that this version assumes no intermediate reward.
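A minimal sketch of the point raised above, not taken from the lecture slides: the general one-step TD error is reward + gamma * v(next state) - v(state). If a game gives no intermediate reward and only a final outcome, the reward term is zero everywhere except the last move, so for earlier moves (with gamma = 1) the error is just the difference between successive value estimates. The function name `td_errors` and the example numbers are assumptions made for illustration.

```python
def td_errors(values, rewards, gamma=1.0):
    """One-step TD errors for a single finished episode.

    values  -- value estimates v(s_0), ..., v(s_{T-1}) of the visited states
    rewards -- rewards r_1, ..., r_T received after each of those states
    The terminal state is assumed to have value 0.
    """
    if len(values) != len(rewards):
        raise ValueError("need exactly one reward per visited state")
    errors = []
    for t, (value, reward) in enumerate(zip(values, rewards)):
        is_last = t == len(values) - 1
        next_value = 0.0 if is_last else values[t + 1]
        # General form: delta_t = r_{t+1} + gamma * v(s_{t+1}) - v(s_t)
        errors.append(reward + gamma * next_value - value)
    return errors


# Hypothetical game: no intermediate reward, a win (+1) arrives only at the end.
values = [0.25, 0.5, 0.75]
rewards = [0.0, 0.0, 1.0]
print(td_errors(values, rewards))
```

For the first two moves the reward is zero, so the error equals v(next state) - v(state); only the final move sees the game outcome.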
@johangras3522 6 years ago
Is it possible to access the course slides?
@TuhinChattopadhyay 4 years ago
@Sigmav0 Link not working
@Sigmav0 4 years ago
@TuhinChattopadhyay The slides have been moved to www.davidsilver.uk/wp-content/uploads/2020/03/games.pdf Hope this helps!
@TuhinChattopadhyay 4 years ago
@Sigmav0 Got it... many thanks
@Sigmav0 4 years ago
@TuhinChattopadhyay No problem! 👍
@dojutsu6861 4 years ago
@Sigmav0 These slides are from an older UCLxDeepMind lecture series led primarily by David Silver. They do not include content on the newer AlphaZero models. Do you by any chance know if the updated slides are available online?
@samagrasharma7755 5 years ago
Two lectures (CNN and RNN) are missing from this series. Can anyone tell if they are available online?
@helinw 4 years ago
Did David do another RL course in 2018? Or just one lecture?
@ShortVine 4 years ago
I was thinking the same and searched a lot, but I think he did just one lecture in 2018.
@domino14 2 years ago
The level of computer play in Scrabble is not superhuman. Quackle beats Maven, and the best humans can 50-50 Quackle in a long series.
@jakubbielan4784 5 years ago
Does anyone know the exact hardware used to train AlphaGo Zero?
@luisbanegassaybe6685 5 years ago
deepmind.com/blog/alphago-zero-learning-scratch/
@stevecarson7031 3 years ago
Thank you so much for this series of lectures!
@alexanderyau6347 6 years ago
I can comment now. See you again David.
@yidingyu2739 6 years ago
Why so many empty seats?
@yoloswaggins2161 6 years ago
This stuff is not on the exam.
@matveyshishov 6 years ago
The number of people is lower with later lectures for some reason.
@markdonald4538 6 years ago
@matveyshishov stupid ppl
@mohammadkhan5430 4 years ago
I love him. How sad that the room is empty.
@KayzeeFPS 4 years ago
Here's a link to the same video but with the slides visible: kzbin.info/www/bejne/hGKvfH-Za9qZfbs
@Dina_tankar_mina_ord 5 years ago
I would love to see how DeepMind would build a city on its own in Cities: Skylines, and see how its optimization would create the best and most efficient layout in real time. Maybe we could learn a lot from that.
@julioandresgomez3201 5 years ago
Despite the success of A0 nets in several games, I feel that a better starting point is playing some number of games with humans. Only then, when it has grasped some basics (by itself, not forcibly inserted by hand), let it play against itself. This way it could accomplish in thousands of self-play games what would take millions of self-play games from scratch, due to the total randomness and cluelessness of the first games. It's not the absolute zero approach, but it has no "artificial" handcrafted parameters either. It learns from its own games all the way.
@Avandale0 4 years ago
Playing with humans takes considerably more time than running simulations, so playing millions of games by itself is actually still faster than playing 100 games against humans. Given that a game of Go takes around 1h, you'd have finished 3 games with a human in the time that it took AlphaZero to reach human-level play. Same for chess, when you realise it took AlphaZero 4 hours to reach a level higher than Stockfish... It should be clear from these examples that one of the particularities of AlphaZero is the speed at which it learns. Playing humans here both defeats the purpose of self-learning and actually wastes time.
@omarcusmafait7202 6 years ago
Why does nobody take notes?
@yoloswaggins2161 6 years ago
Not on the exam
@Sigmav0 6 years ago
@William Davis Sure... In primary school...
@vijayabhaskar-j 6 years ago
Because the slides and lectures are available online, I would listen carefully in class first.