1:12 Outline
1:36 Approaching New Problems
2:00 When you have a new algorithm
4:50 When you have a new task
6:21 POMDP design
9:31 Run baselines
10:56 Run algorithms reproduced from paper with more samples than stated
13:00 Ongoing development and tuning
13:18 Don't be satisfied if it works
14:50 Continually benchmark your code
15:25 Always use multiple random seeds
17:10 Always be ablating
18:21 Automate experiments
19:17 Question on frameworks for tracking experiment results
19:47 General tuning strategies for RL
19:58 Standardizing data
22:17 Generally important hyperparameters
25:10 General RL Diagnostics
26:15 Policy Gradient strategies
26:21 Entropy
27:02 KL
28:07 Explained variance
29:41 Policy initialization
30:21 Q-learning strategies
31:27 Miscellaneous advice
35:00 Questions
35:21 how long to wait until deciding whether code works or not
36:18 unit tests
37:35 what algorithm to choose
39:28 recommendations on older textbooks
40:27 comment on evolution strategies and OpenAI blog post on it
43:49 favorite hyperparameter search framework
@TheAIEpiphany 3 years ago
I love John's presenting style; he's super positive and enthusiastic. Great tips, thank you!
@agarwalaksbad 7 years ago
This is a super useful lecture. Thanks, John!
@FalguniDasShuvo 2 years ago
Wow! I love how simply John conveys great ideas. Very interesting lecture!
@SinaEbrahimi-ee3fq 6 months ago
Awesome talk! Still very relevant!
@sclmath5 7 years ago
What a number to end the video, 44:44.
@ProfessionalTycoons 6 years ago
This was a great talk.
@zhenghaopeng6633 4 years ago
Hi there! Can I upload this lecture to Bilibili, a popular YouTube-like video site in China? Many students there would love access to this insightful talk! Thanks!
@piyushjaininventor 1 year ago
Maybe view it on YouTube? It's free :)
@BahriddinAbdiev 6 years ago
We (3 students) are exploring DQN and its variants, i.e. Double DQN, Dueling Double DQN, Prioritized Experience Replay, etc. There is one thing we are all facing: even if it converges, running it long enough makes it diverge again at some point. Is this normal, or should it converge and stay there (or even keep improving)? Cheers!
@alexanderyau6347 6 years ago
Hi, I think it's normal, but I don't know why it happens. Maybe the model learned too much and became stupid, LOL.
@yoloswaggins2161 6 years ago
No, this is not supposed to happen. I've seen it happen for a couple of reasons, but the most common is scaling by a standard deviation that gets very close to 0 because the data is too similar.
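For anyone hitting this, here is a minimal Python sketch of the guard yoloswaggins2161 is describing; the class name and structure are illustrative, not from the talk. The small epsilon added to the denominator keeps the normalization bounded when the incoming data is nearly constant:

```python
import math

class RunningNormalizer:
    # Tracks a running mean/variance with Welford's algorithm and
    # normalizes inputs. The eps in the denominator is the guard:
    # with near-constant data, std -> 0, and a naive
    # (x - mean) / std would blow the scaled values up.
    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        var = self.m2 / max(self.count, 1)
        return (x - self.mean) / (math.sqrt(var) + self.eps)

# Near-identical observations now normalize to values near 0
# instead of producing huge spikes:
norm = RunningNormalizer()
for obs in [1.0, 1.0, 1.0000001]:
    norm.update(obs)
print(norm.normalize(1.0))
```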
@georgeivanchyk9376 4 years ago
If you cut out all the times he said 'ah', the video would be half as long.