Deep Q Learning for Video Games - The Math of Intelligence #9

233,362 views

Siraj Raval

1 day ago

Comments: 246
@ben_jamin01 1 year ago
This is a high-quality video and I'm sure a lot of people can tell you put a lot of effort into these.
@satindershergilla 7 years ago
Finally a YouTuber I always wanted to watch. Fast, cool, and great information
@jmzg1229 7 years ago
Hey Siraj, I don't think any of us can beat the Dota 2 bot that OpenAI just unveiled. Those guys really deserve a shout-out.
@aksa706 7 years ago
I can't feel my brain anymore
@SirajRaval 7 years ago
oh no
@aliciasuper7014 5 years ago
Tell me about it :/
@randomorange6807 5 years ago
Hey, did you know? The brain can't feel pain; it senses the body's pain and transmits it, but it can't feel pain itself... It can still get hurt and take brain damage, though.
@janmichaelbesinga3867 4 years ago
@@randomorange6807 But what if you punch a brain? Does it feel pain?
@giuseppeguap7250 5 years ago
Just saw this now; your jokes were KILLING IT back then
@ShaneCalderssc 7 years ago
Thanks Siraj. Can't wait for the Super Mario Bros bot. I enjoyed your videos in the deep learning ND. Cheers, your effort is appreciated.
@luckybrandon 7 years ago
One of your best vids Siraj!!
@SirajRaval 7 years ago
thanks Brandon!
@neonipun 6 years ago
At 8:08, what's the input_shape supposed to be? The challenge code and what you show are different...
@voodoocobblestone8320 7 years ago
I cannot understand your videos. How should I start learning?
@vladislavdracula1763 7 years ago
Start by learning basic calculus, statistics, and linear algebra. Once you understand the basics, learning advanced concepts is not that hard.
@xPROxSNIPExMW2xPOWER 7 years ago
No, TensorFlow and most of the other libraries handle almost all of the higher-level math. All you will need, buddy, is to learn basic object orientation and then move into ML techniques. Don't fret, most of the complex math has been solved; all you will need to do is creatively implement it. Trust me, it gets very easy once you learn the flow. If you are interested in advanced topics where you want to build your own ML algorithm, then learning linear algebra, with an emphasis on higher-dimensional linear algebra, will help greatly.
@hammadshaikhha 7 years ago
Like others have mentioned, having a math and some machine learning background helps you understand these faster-paced videos. Another thing you can do is look in the description, read some of the blogs on the topics under "learning resources", and then come back and watch the video again; it should make more sense.
@MachineLearningwithPhil 7 years ago
A great place to start is Coursera's class on machine learning. It's free and a solid intro to the core concepts. From there, there are plenty of step-by-step tutorials on YouTube. Sentdex has a great channel with lots of content; check him out.
@notapplicable7292 7 years ago
A tip if you're trying to start: don't start with Siraj. Start with someone slower (possibly the Udemy machine learning micro-degree), as Siraj is very fast and awesome for expanding your understanding, but hard to start learning with.
@insightfulgarbage 7 years ago
Very nice information and rhythm, subscribed!
@harshitagarwal5188 7 years ago
At 5:15 you say the further in the future the reward is, the more uncertain we are of it? I didn't get it. Can you explain with an example?
@rolininthemud 7 years ago
I understand that a convolutional neural network can be used to simplify the state from an array of pixels to a smaller collection of values, but how does the algorithm use a deep network to approximate the Q-function? 8:19 Thank you!
@herougo 7 years ago
Hi Siraj, could you include pseudocode for the algorithms you talk about? I think it is crucial to be able to implement the algorithms you learn about (i.e., "What I cannot code myself, I do not understand"). Explaining pseudocode is a great way to communicate algorithms in a clear, complete, and unambiguous way.
@basharjaankhan9326 7 years ago
OMG, I googled "Q Learning with Neural Network" a few months back without realising it was this important.
@SirajRaval 7 years ago
haha awesome
@shreyas707 7 years ago
I don't understand 10% of what you say but your videos are just epic! Please keep posting them often :)
@JakubRohla 7 years ago
I still don't understand how we can store these Qs. Wouldn't they contain quadrillions of states and actions for a pretty simple game? That seems pretty inefficient, so I would love to know where I'm wrong in my understanding of Q-learning. Is there some generalization in place, or what?
@poc7158 7 years ago
You can store all possible actions for all possible states in a matrix for a simple game like tic-tac-toe. However, as you say, that is impossible for more complex games, which is why we use a neural network that replaces this matrix by taking the pixels of the screen as input (the state) and then outputting an action. After training, it is supposed to give the optimal action for any state we give as input.
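A minimal sketch of the tabular case described above, in Python (all names and sizes here are illustrative, not from the video):

```python
import numpy as np

n_states, n_actions = 9, 9           # tiny, tic-tac-toe-scale numbers
alpha, gamma = 0.1, 0.95             # learning rate, discount factor
Q = np.zeros((n_states, n_actions))  # the full state-action table

def q_update(state, action, reward, next_state):
    # One-step Q-learning: nudge Q[s, a] toward the Bellman target
    # r + gamma * max_a' Q(s', a').
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])
```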
@SirajRaval 7 years ago
great answer, Pierre
@JakubRohla 7 years ago
Thanks for the reply, this clarified it for me. Much thanks ^^
@ml-squared 5 years ago
The way this works is by approximating an optimal Q function. A Q function is a function of state and action, so Q(s,a). Q*(s,a) is the optimal Q function. This is great for games with few states, but because of combinatorics, it does not scale to games with hundreds of thousands of states, such as video games. To accommodate this, we approximate Q* by using a parameterized Q function, Q(s,a,Theta), where Theta is a set of parameters that we need to optimize to bring us to approximating Q*. A type of function that's excellent at iteratively approximating functions through parameters is a neural network. So that's where Deep Q learning comes in, optimizing a neural network to approximate Q*.
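As a rough sketch of that parameterized Q(s, a, Theta): a small Keras network in the DQN style. The layer sizes and the 84x84x4 input (four stacked grayscale frames) are illustrative assumptions, not taken from the video:

```python
from tensorflow.keras import layers, models

n_actions = 4  # illustrative; one output per possible action

# Maps a state (stacked frames) to one Q-value per action, so a
# single forward pass scores every action at once.
q_net = models.Sequential([
    layers.Input(shape=(84, 84, 4)),               # 4 stacked 84x84 frames
    layers.Conv2D(32, 8, strides=4, activation="relu"),
    layers.Conv2D(64, 4, strides=2, activation="relu"),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(n_actions, activation="linear"),  # Q(s, a, Theta)
])
q_net.compile(optimizer="adam", loss="mse")
```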
@prashanttarey9902 6 years ago
Awesome and optimized explanations in all the videos! Thanks a lot!!
@look3248 7 years ago
Hey Siraj, could you expand on this topic and explain how SethBling's MarI/O program works?
@xPROxSNIPExMW2xPOWER 7 years ago
I believe Siraj already has a video on genetic-evolution decision making, if I'm not mistaken. Doesn't Seth explain it pretty in-depth though? He talks about everything from the math to how he programmed it, with Perl I think.
@SirajRaval 7 years ago
genetic algo vid coming this week (similar to what he used)
@-justyourfriendlyneighborh5898 7 years ago
Hey Siraj, in a previous stream you mentioned that learning this kind of thing (neural networks/machine learning) is best done on the internet. I was wondering, for a near-complete beginner (minor experience with Processing.JS), where would you suggest I start off? (I'm 15 and want to get into this field as soon as possible.)
@flyingsquirrel3271 7 years ago
icancto Did you read the NEAT paper? If not, I'd recommend it, because it's actually really smart and comprehensible. NEAT is not just picking the best randomly generated genomes; it uses a crossover mechanism which makes sure that only connections with a similar "purpose" inside the neural net are crossed over. It can intelligently cross over neural networks of different topologies which are created through mutation, starting with minimal neural networks. That way it improves the weights AND selects the ideal topology of the neural nets. Comparing NEAT to back-propagation doesn't make any sense, because its purpose is to be used when you can't use back-propagation. MarI/O is a good example of this. What target data would you use for back-propagation there? ;-)
@TheAnig140895 7 years ago
He used Lua.
@hangchen 5 years ago
7:46 Well, I don't think the pooling layer is used to make the network insensitive to the locations of objects in an image. The convolutional layer can already do that, since the convolution operation is a pixel window going from location to location until all locations are covered under the set stride. The pooling layer is used to semantically merge similar features into one: as in the max-pooling example used in this video, the image is partitioned into 4 parts and in each part the max value is preserved. The max value can semantically represent a feature in that region. It's more like image compression, but we have preserved the key features of the object in the image. Feeding this pooled image into the neural net can be more efficient.
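For what it's worth, a tiny NumPy illustration of the 2x2 max pooling being discussed (made-up numbers):

```python
import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 5, 4, 8]])

# 2x2 max pooling, stride 2: keep the largest value in each 2x2
# block, shrinking 4x4 -> 2x2 while keeping the peak activations.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 2]
               #  [7 9]]
```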
@UNIVERSALMINDBAND 6 years ago
And what happens to the reward functions? Are they the same for all these games?
@arthurwulfwhite8282 6 years ago
Probably the score? Did you get an answer?
@vamsikrishna-qz8rt 7 years ago
Hi Siraj, is there any way we can train a machine learning model with a raw text file and properly arranged data from that text file in a .csv file? So that when we input a new text file, it automatically converts it into the .csv format, with the columns and rows we used as training data. Is this even possible?
@Machin396 7 years ago
Your videos are amazing, thanks.
@haziqhamzah3071 5 years ago
Can you give some insight into deep Q-learning for mobile networking?
@Machinima2013 7 years ago
You should do a video comparing this with NEAT, which is popular for this same use case.
@weinansun9321 7 years ago
thank you Siraj for your awesome content, you really made learning fun and easier!
@huluvublue112 7 years ago
Question: why do pooling layers make the network spatially invariant? Don't they just compress information? I thought convolutional layers do that, and the model does have those.
@viviankeating7327 7 years ago
Max pooling compresses information, but it's lossy. On the first pooling operation you lose a pixel or two of position information. On a final pooling operation you might effectively be taking the max across the entire image.
@moonman239 7 years ago
So with a discrete Markov process, there will always be some reward function R, because getting the reward depends only on the states and the actions we take. Thus, our AI can learn Q simply by playing?
@abhinashkhare1933 7 years ago
Hey Siraj, can you help explain this? In SethBling's video, the bot learned to play a Mario level. But he didn't use the learning on new data or a new level. Isn't this overfitting? I mean, the bot just learned that level from trial and error.
@nermienkhalifa5997 6 years ago
Great, I really do love your way of explaining!! Thanks
@MavVRX 5 years ago
How would reinforcement learning work on a game with a town hub? One that requires mouse clicks to go into a dungeon, e.g. Diablo or MMOs.
@lampsonnguyen9425 4 years ago
You explained it so well. Thank you.
@Sohlstyce 3 years ago
Hey Siraj, can you make a full tutorial on reinforcement learning? Thanks, Siraj
@Chris-di3us 7 years ago
I love you man, I always wanted to do this myself
@rnparks 7 years ago
Can you show the Mario game actually running? It throws an error in my notebook. I'm using Python 3.6, so maybe it's a translation issue?
@AliAkhtarcs 6 years ago
What is the difference between a static and a dynamic dataset? Can you elaborate more?
@piyushgupta809 7 years ago
Great improvement, brother. I am sorry, but the previous videos were not good. Nice tutorial and intuition, although I do recommend watching DeepMind's reinforcement learning tutorial before jumping into practical applications.
@manojagarwal3441 4 years ago
Hey Siraj, can you please share the link to the code by the winner and runner-up of the LDA challenge? I know I am pretty late, but I would really appreciate it if you could help.
@yehorpererva6803 7 years ago
Cool video, thanks. But how do I adjust this for a certain purpose (like collecting all coins / getting the lowest score / speedrunning)?
@williamcosta6683 7 years ago
Could you guys give me any hint on how I can approach Pong to build a model where I can apply Q-learning? (I have all the necessary information, like the ball's x and y position, the player's x and y position, ball speed, etc.) I'm struggling with this :_:
@Veptis 3 years ago
I am starting a deep learning course at university this semester, and maybe I can do a homework project. There is a mobile game from my childhood: Mirror's Edge mobile, which launched on iOS and Windows Phone around 2011 but is no longer available. If I somehow find a way to emulate the game on a computer, get either frames or game-state values, and manage to give it one of four different inputs per frame, I might try to teach a network to play the game. I also want to have it beat levels really fast and explore speedrunning this way.
@boemioofworld 7 years ago
That was an awesome explanation. Thanks.
@mattgoralka3941 5 years ago
Hi, can someone please explain to me how the model is predicting in this sequence of code when it hasn't been trained yet? I'd really appreciate it. Thanks!! if np.random.rand()
@lefos99 5 years ago
Hey there, so the epsilon tells us when the agent is ready to exploit Q-values instead of exploring the map. The main idea is: 1) We specify an exploration rate "epsilon," which we set to 1 in the beginning. This is the rate of steps that we'll do randomly. In the beginning, this rate must be at its highest value, because we don't know anything about the values in the Q-table. This means we need to do a lot of exploration, by randomly choosing our actions. 2) We generate a random number. If this number > epsilon, then we will do "exploitation" (this means we use what we already know to select the best action at each step). Else, we'll do exploration. The idea is that we must have a big epsilon at the beginning of training the Q-function, then reduce it progressively as the agent becomes more confident at estimating Q-values. Here is a nice graph of this idea: cdn-media-1.freecodecamp.org/images/1*9StLEbor62FUDSoRwxyJrg.png Hope that helped! :D
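A minimal sketch of that epsilon-greedy rule (the decay schedule and the Keras-style `predict` call are assumptions; `model` stands for any Q-network):

```python
import numpy as np

epsilon, epsilon_min, decay = 1.0, 0.01, 0.995  # illustrative schedule

def choose_action(model, state, n_actions):
    # Explore with probability epsilon; otherwise exploit the network's
    # current best guess. Even before training, predict() returns values
    # from the randomly initialized weights, which is why an untrained
    # model can still "predict".
    if np.random.rand() <= epsilon:
        return np.random.randint(n_actions)
    q_values = model.predict(state[np.newaxis], verbose=0)
    return int(np.argmax(q_values[0]))

# After each training step: epsilon = max(epsilon_min, epsilon * decay)
```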
@mattgoralka3941 5 years ago
@@lefos99 Hi, thanks for helping me out! I understand that (at least I think I do), but I don't understand how the model can predict if it hasn't been trained. At what point is the model learning from the D values and able to "exploit"? I'm from more of a C background, but I don't get how it's learning until the next block of code, where it does "Experience Replay".
@lefos99 5 years ago
@@mattgoralka3941 Oh okay, now I see your question. Well, it depends on the reinforcement learning technique you use. For example, if you use simple Q-learning, you just create a matrix (a row per state and a column per action). There are plenty of concepts involved that I cannot explain in just one YouTube comment. A really good and simple tutorial is this: simoninithomas.github.io/Deep_reinforcement_learning_Course/#syllabus In this tutorial you will find not only mathematical explanations but also explanations with examples in simple games. Check it out! ;)
@claudiowalter3092 6 years ago
How do you get the computer to play the game by itself and read the screen?
@altafurrahman9404 5 years ago
Hi Siraj, I am going to do a path-planning project to navigate a robot with Q-learning, and I want to know how much hardware it will require at a minimum. Do we need a GPU? Will a Core i5 PC with only a CPU be enough?
@harshakada3374 7 years ago
Hey Siraj, I have a 4-node Raspberry Pi cluster; can I use it to train this Mario game?
@benjaminpaul3545 7 years ago
Is it possible to do what you do on Windows? I can't get the environment started even though the emulator is running. Can anyone help?
@johndoe-ug3lo 7 years ago
So I am working on an AI for a hidden-information game (for the sake of simplicity, you can think of poker). Optimal play would actually be a Nash equilibrium problem, where each action is taken some percentage of the time. Would the proper way to make an AI for this be to use a random number generator and scale the frequency of each action to its Q-value?
@fayezbayzidify 7 years ago
First! But seriously, nice vid Siraj, you are amazing at what you do!
@user-zu1ix3yq2w 7 years ago
RIP Chester.
@JazevoAudiosurf 7 years ago
So this version is without a NN; at what point do you need a NN?
@arafatullahturjoy5380 5 years ago
Can Q-learning be used for solving classification problems? If so, how? Could you explain, or make a video on this topic? If you do, it will be very helpful.
@hanyuliangchina 7 years ago
I like this interesting video better than the previous purely theoretical videos; more humor is better. For me, the most important questions now are: 1. For machine learning beginners training models, which is better: buying a GPU graphics card, or buying Amazon cloud GPU hours? 2. Tips for configuring a deep learning environment. 3. Tips for programming and development skills.
@GKS225 7 years ago
And now OpenAI beats humans in Dota 2 1v1 matchups
@SirajRaval 7 years ago
stuff is moving fast
@readingsteiner6061 7 years ago
Sir, I don't know who you are, but you totally blew me away with your comment. It is very rare to come across an individual who did us (the viewers on the Internet) a huge service by debunking certain methodologies in machine learning. I would love to see more of your writings. Folks at Vicarious are of a different breed, I believe; maybe it is because of their influence from the Redwood Neuroscience Institute. It would certainly be a privilege if you consider my request. Truly humbled. Thanks, Sir. I hop
@Belofsky1 7 years ago
I'm mostly a hardware guy; how do I go about AI or algorithms?
@TheLibertarian97 5 years ago
How do I define when to give a reward to the bot?
@jobrown04 6 years ago
Hi Siraj. Have you thought about using Capsules (CapsNet) instead of simply not having a MaxPooling layer?
@maxitube30 6 years ago
Where can I find the winner of the stock prediction challenge?
@tylersnard 5 years ago
Smart guy, talented teacher.
@Rambo9677 6 years ago
Great video Siraj, thanks. But I don't get something: how do you input 4 game screens? Do you combine them into one input?
@mankitpong5591 7 years ago
The videos of David Silver from DeepMind are worth watching; that might be the best reinforcement learning course on the web.
@rheeaa 7 years ago
Siraj, I'm a huge fan of your YouTube channel and I truly admire the way you taught yourself ML. I'm in my final year of undergrad, and I was thinking of not pursuing a master's degree right now. Any advice on what resources to use to teach myself ML, or how to get some industry-level exposure? Thanks in advance 😉
@SirajRaval 7 years ago
thanks Rhea! see the ML subreddit
@qwerty11111122 7 years ago
Hi Siraj, could you have a video mention the OpenAI bot that beat a pro gamer at Dota 2 a few days ago? It's great that you released this video so close to this current event
@thedeliverguy879 7 years ago
Thanks for the great video. I'm still confused about how this algorithm can generalize to any game. Is the generalization of the algorithm different from the generalization of a specific AI program? Since the inputs and labels (or controls/buttons, whatever) are fixed in a game, I don't think you can make an AGI with just this algorithm.
@stuartdavid 7 years ago
Very nice! Do you have a video with more detail on Q-learning? It would be interesting to see how the Q matrix evolves over the course of playing a simple game.
@tjhannover3069 7 years ago
Is it possible to do that with games like Overwatch?
@TheAIChannel 7 years ago
Hi Siraj, I am interested in stock price prediction and would like to have a glance at the second runner-up's code. Can you kindly share the GitHub link? Thanks in advance.
@karljay7473 6 years ago
Can't find the links to the winner and runner-up. Great series of videos!
@nitishravishankar6586 6 years ago
Thanks a lot Siraj! This video provided great insight into applications of Q-learning and RL. Are there any programming assignments (that include a dataset) for this?
@HuyNguyen-rt7eb 7 years ago
Hey Siraj, great job on the videos. :) What do you think of the Dota 2 AI that beat a pro player?
@MotazSaad 5 years ago
The link to the paper: web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf
@herstar9510 3 years ago
When are you forming a mid-90s boy band with machine-learning-themed ballads?
@chiranshuadik 7 years ago
Nice video! Are you coming to Pune or Mumbai?
@SirajRaval 7 years ago
Mumbai
@chiranshuadik 7 years ago
When and where? Can your fans meet you?
@egor.okhterov 7 years ago
Too fast. I need a longer video :(
@mferum77 7 years ago
Set the playback speed to 0.5 :)
@SirajRaval 7 years ago
more to come
@chicken6180 7 years ago
Been waiting so long for this! Haven't even watched it, but I know it's going to be great already. Edit: confused, but not disappointed :D
@eav300M 7 years ago
Super Siraj AI. Who do you think is correct regarding the future of AI, Elon or Zuck?
@getrasa1 7 years ago
Elon, because he's aware of the danger that AI might pose to the human race if we lose control over it.
@vijayabhaskar-j 7 years ago
If you know AI, then you won't think of AI as a danger.
@getrasa1 7 years ago
Edgar Vega As soon as its intelligence starts increasing exponentially, we won't be able to keep up with it or understand it. Everything we don't understand is dangerous at some point. (I'm referring to AGI and ASI.)
@SirajRaval 7 years ago
Elon. We do need some regulation.
@frankribery3362 5 years ago
That part where he says "Hello world, it's Siraj"... I'm replaying it again and again because it's so funny xD
@bofeng6910 7 years ago
Do I have to learn calculus to learn deep learning?
@rolininthemud 7 years ago
Pretty much
@shahzmalik 6 years ago
The only thing I am impressed by is his creativity
@masoudmasoumimoghaddam3832 7 years ago
Siraj, all your videos are awesome. Could you make a video about temporal-difference learning, which was introduced by Professor Sutton? I also ask you to make another one about general game players and Monte Carlo tree search. Thanks
@xPROxSNIPExMW2xPOWER 7 years ago
Yes, a video on TD learning would be wonderful
@masoudmasoumimoghaddam3832 7 years ago
Yeah! Especially if its differences from and similarities to reinforcement learning were pointed out.
@xPROxSNIPExMW2xPOWER 7 years ago
I think TD learning is just an extension of back-propagation. It's pretty fascinating
@koppuravuriravisankar7954 7 years ago
Hi Siraj, I love your teaching style and I am a member of Udacity's deep learning foundation program, in which you are an instructor. My doubt is: can we use deep Q-learning in other situations where image or pixel input is not available? If yes, can you tell how? I have read that for building the Q-table we can use neural networks instead of a table (states x actions). Can you explain it, or if possible do a video about this?
@srenkoch4597 6 years ago
Hey Siraj! Great stuff! It would be really cool if you combined a recurrent neural network and a deep Q-network (DRQN) in a video! Thanks!
@YaduvendraSingh 7 years ago
This is the ultimate!! A game bot!! Thanks a lot Siraj! When are you heading to India for a meet-up?
@SirajRaval 7 years ago
thanks! Sept 1, Delhi, one-way ticket. I'll figure things out from there
@_____8632 5 years ago
Wait, where my brain at?
@pinkiethesmilingcat2862 7 years ago
Siraj, you have not accepted the English subs on MoI #6 :(
@SirajRaval 7 years ago
just did, thanks
@donaldhobson8873 7 years ago
Wouldn't it work better if you trained a variational autoencoder on the screen data to capture the important patterns, then trained the deep-Q model on the encoded screen? That way the VAE can learn a lot about how the world works even when rewards are scarce. I would use a bottleneck that's about 1/4 the dimensions of the image, with say 3 layers. Leave the shrinking down from convolutional layers to dense layers for the deep-Q part.
@hammadshaikhha 7 years ago
I don't know anything about this topic yet, but why don't you submit something along these lines for this week's coding challenge?
@SirajRaval 7 years ago
hmm good thought. an autoencoder could work well
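A rough sketch of the two-stage idea in this thread, assuming a plain convolutional encoder rather than a full VAE; every size here is made up:

```python
from tensorflow.keras import layers, models

# Stage 1: an encoder compresses raw frames into a small code
# (trained as part of an autoencoder on screen data, reward-free).
encoder = models.Sequential([
    layers.Input(shape=(84, 84, 1)),
    layers.Conv2D(16, 3, strides=2, activation="relu"),
    layers.Conv2D(32, 3, strides=2, activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # the bottleneck code
])

# Stage 2: a small dense Q-head learns on the compressed code.
q_head = models.Sequential([
    layers.Input(shape=(128,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(4, activation="linear"),   # Q-values for 4 actions
])
```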
@anonco1907 7 years ago
The memes were distracting; I was too busy laughing, so I didn't learn anything.
@TheLordyyx 7 years ago
Hey Siraj, DeepMind is also working on a StarCraft 2 learning environment. I would love to see a video about it :)
@tomwojcik 7 years ago
Video uploaded Aug 2017 and it's only 9:46 long? Autolike from me :)
@albertoguerrini9761 5 years ago
"We can't be sure that we'll get the same rewards in another episode" to justify discounted rewards... There's a gap between the two that I can't seem to grasp; could anybody help?
@1992jamo 4 years ago
I think he's looking at it the wrong way. A higher discount factor means you value actions further in the future more, and a lower discount factor means you value short-term rewards. Long-term goals seem best, but long-term predictions are less accurate.
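For concreteness, here is the discounted return computed the usual way (the gamma value is illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    # G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    # A smaller gamma shrinks far-future rewards, encoding the idea
    # that distant (less certain) rewards count for less today.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1, 1, 1]))  # 1 + 0.9 + 0.81 = 2.71
```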
@BhagavatVibes 7 years ago
Hey Siraj, fantastic work. I am a Unity developer, so how can I integrate this functionality into games I have already coded? Best wishes for future videos.
@larryteslaspacexboringlawr739 7 years ago
thank you for the deep Q video game video
@sandzz 7 years ago
Bill Nye of Computer Science, Kanye of Code, Beyonce of Neural Networks, Usain Bolt of Learning, Chuck Norris of Python, Jesus Christ of Machine Learning
@SirajRaval 7 years ago
thanks Sandzz
@sandzz 7 years ago
I copied it from your channel description... I don't deserve that "thanks".
@GameCasters 6 years ago
But how do you grab input from a game on Android, for example an APK?
@anudeep168 7 years ago
Awesome video :) But it reminded me of Xin-Yang's HotDog-NotHotDog app :D
@codethings271 7 years ago
That was a classifier: SUPERVISED learning
@NaoshikuuAnimations 7 years ago
Just a piece of advice, I hope you see this: never speak while showing text! (I remember Vsauce saying this in a video too.) But really, either show text and read it, or show images / yourself while talking; displaying text while saying something different is really hard to follow. If you want to talk about a part of the text, try to darken everything but the line you're talking about; otherwise we won't know where to stop, and whether to listen to you or read. (At least that's what most "educational" YouTubers I follow do, and it works quite well.) Especially when you're talking about such complicated subjects (and at such a pace), I think that's important! Hope it'll be useful somehow; thanks for the vid'!
@SirajRaval 7 years ago
great point thanks
@yashagarwal8249 6 years ago
Excellent point
@FilipeRebollo 6 years ago
If advice were any good, it wouldn't be free...
@CausticCatastrophe 5 years ago
I just pause.
@MissFashionDesign 7 years ago
Siraj Raval is the neurotransmitter of Generation Z
@kermitthehermit9373 6 years ago
We all miss Chester!! 😢
@synetic707x 7 years ago
A video about Q-learning on actual games (without OpenAI Gym) would be great
@SirajRaval 7 years ago
will consider thanks
@synetic707x 7 years ago
Siraj Raval Awesome, thank you.
@OthmanAlikhan 5 years ago
Thanks for the video =)
@bryanbocao4906 6 years ago
Any decent papers would be great!