CS 285: Guest Lecture: Dorsa Sadigh
1:01:41
Comments
@joshuasonnen5982 1 day ago
Such a great lecture!
@Earth_Rim_Roamer 4 days ago
Great talk. Easy to follow along with the visuals and real-world demos. Sparked a lot of ideas in my head!
@forheuristiclifeksh7836 11 days ago
15:56 RL with LLMs
@zenohamers2849 24 days ago
This is big!
@corgirun7892 1 month ago
nice work
@prof_shixo 1 month ago
Interesting! How did you handle different frame sampling rates among those heterogeneous datasets?
@CODE7X 1 month ago
this seems promising
@fatemehmousavi402 2 months ago
Thanks for sharing this video. Personally, I think it could be better if the instructor spoke a bit more slowly; even though the sentences are understandable, some pauses between them would give the audience time to absorb each concept. That said, it was really helpful.
@SamuelAyanyemiAyankoso 2 months ago
Interesting work
@HangLe-ou1rm 2 months ago
Amazing lecture! Thank you!
@JiaxinLee-gy8kc 2 months ago
Why does it switch to goal-directed at 1:54? The paper doesn't mention the condition for switching.
@the_master_of_cramp 2 months ago
At 2:40, it should be the expectation where a_t, o_t ~ p_data, right?
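For context, here is a minimal sketch of the behavioral-cloning maximum-likelihood objective the comment seems to be referring to, written with the expectation under the data distribution (my notation; the slide may write it slightly differently):

```latex
% Behavioral cloning as maximum likelihood, with the expectation taken
% under the data distribution (sketch; o_t, a_t, p_data, \pi_\theta as in the usual lecture notation)
\theta^\star = \arg\max_\theta \;
  \mathbb{E}_{(o_t, a_t) \sim p_{\mathrm{data}}}\!\left[ \log \pi_\theta(a_t \mid o_t) \right]
```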
@maksymriabov1356 2 months ago
Thanks, from an AI/ML startup founder.
@yurakriachko7008 2 months ago
A very nice overview. What is a good place to learn this from, with code examples?
@ZinzinsIA 3 months ago
Fascinating!
@nileshramgolam2908 3 months ago
Is the equation for Cal-QL the same in the paper and in this video, or did it change?
@sami9323 3 months ago
Great talk!
@kevon217 3 months ago
Great lecture.
@muzesu4195 4 months ago
I have a question about the Gaussian mixture. Does it output n Gaussian distributions, or does it add them together with weights? (Although I don't think adding with weights would work.) And when we talk about multimodality, does it mean there can be different ways to reach the solution, like the example with the tree? Then how does adding degrees of freedom relate to multimodality, i.e. different ways to do something?
@browncow7113 2 months ago
If you were to add together two different Gaussian distributions, each with a different mean, then the graph/histogram of this distribution would look like "two humps". This is a probability distribution over the different actions that your agent can take. So, it is saying that there are two actions around which there is a high probability-density (where the two humps are). And those could be, for example, "turn left" (or, turn -90 degrees) and "turn right" (turn +90 degrees).
@muzesu4195 2 months ago
@@browncow7113 Thank you, I thought it through a few days ago. I was confusing the distribution of a sum of variables with simply adding the distributions together.
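For anyone else puzzling over this exchange, here is a minimal NumPy sketch (my own toy example, not code from the lecture) of a two-component Gaussian mixture over a 1-D action. It shows that the weighted sum of densities gives a valid "two humps" distribution, and that sampling picks a component first rather than averaging the actions:

```python
import numpy as np

# Toy 2-component Gaussian mixture over a 1-D action (e.g. a steering angle).
# In a mixture-density policy, weights/means/stds would come from the network
# for the current observation; here they are fixed, made-up numbers.
weights = np.array([0.5, 0.5])      # mixture weights (must sum to 1)
means   = np.array([-90.0, 90.0])   # "turn left" vs. "turn right"
stds    = np.array([10.0, 10.0])

def mixture_pdf(a):
    """Density of the mixture: a weighted sum of Gaussian densities (the 'two humps')."""
    comps = np.exp(-0.5 * ((a - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return float(np.sum(weights * comps))

def sample_action(rng):
    """Sample by first picking a component with probability equal to its weight, then sampling that Gaussian."""
    k = rng.choice(len(weights), p=weights)
    return rng.normal(means[k], stds[k])

rng = np.random.default_rng(0)
samples = [sample_action(rng) for _ in range(5)]
print("sampled actions:", [round(a, 1) for a in samples])
print("density at -90 / 0 / +90:", [round(mixture_pdf(a), 4) for a in (-90.0, 0.0, 90.0)])
```

The point is that mixing the densities with weights keeps both modes, whereas averaging the two mean actions would collapse to "go straight", which is exactly the failure the tree example above is about.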
@lemurpotatoes7988 4 months ago
Extremely interesting. The part about being less aggressive when prediction is harder kind of reminds me of some speculative thoughts I had about artificial neurotransmitters aping the locus coeruleus norepinephrine system to control error rates of subcircuits and speed up learning. That idea is still underdeveloped, though, and I was thinking more about optimizing the joint relationships of different routes into a neuron, which would have atrocious scaling done naively.
@ShanshanZhang-kw5qf 4 months ago
So happy to find this open source here!!!
@texwiller7577 4 months ago
Wow... that was really a TOP video
@zhenghaopeng6633 4 months ago
What happens with NoMaD at 2:45? It seems like it generates a batch of noisy trajectories?
@forheuristiclifeksh7836 4 months ago
1:00
@forheuristiclifeksh7836 5 months ago
3:54
@BellaSportMotoristici 5 months ago
Sergey Levine is a gem for this world!
@ArvindDevaraj1 5 months ago
nice intro to RL
@user-xn2wk9oy5j 6 months ago
I am really grateful for your great lecture.
@AJ-vx3zk 6 months ago
Around 7:00: in this case, is the probability `pi theta (a != a* | s)` the probability under a non-deterministic (stochastic) policy? Or does the probability come from the state distribution while the policy is deterministic? Or are both the policy and the states non-deterministic?
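One way to unpack that, as a sketch of the standard imitation-learning analysis (treat π_θ, a*, ε, and the quadratic bound as my notation for the usual textbook result, not necessarily the exact slide): the learned policy may be stochastic, and the assumption is that it rarely deviates from the expert action on states drawn from the training distribution, while errors compound under the policy's own state distribution.

```latex
% Assumption: on states from the training distribution, the (possibly stochastic)
% learned policy rarely picks a non-expert action
\pi_\theta(a_t \neq a^{*}_t \mid s_t) \le \epsilon
  \quad \text{for } s_t \sim p_{\mathrm{train}}(s_t)

% Under the policy's own state distribution, errors compound, giving the
% well-known worst-case bound on expected mistakes over a horizon T
\mathbb{E}\Big[\sum_{t=1}^{T} \mathbf{1}\{a_t \neq a^{*}_t\}\Big] = O(\epsilon T^{2})
```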
@user-pu1vr4he6f 6 months ago
Offline RL for language models is indeed a promising direction to explore. It's worth noting that Sergey, an expert in the field, has expressed concerns about the feasibility of online RL with language models. This reminds me how brilliant the RLHF approach is.
@user-rx5pp3hh1x 6 months ago
Good energy, mildly funny, by far the best articulation... can only come from the DPO inventors, Eric Mitchell et al.
@haiyunzhang2002 6 months ago
Excellent explanation! Thank you for the open-source lectures.
@krezn 6 months ago
Thank you!
@timanb2491 7 months ago
It's good, it's really good.
@mehrdadmoghimi 7 months ago
Great presentation! Thank you for sharing
@SantoshGupta-jn1wn 7 months ago
Thanks for posting this!
@joshuasheppard7433 7 months ago
Incredibly well explained. Thank you! Great examples at the end.
@user-to9ub5xv7o 7 months ago
Chapter 1: Introduction to Reinforcement Learning in the Real World [0:00-0:54]
- Sergey Levine discusses the importance of reinforcement learning (RL) in AI, differentiating it from generative AI techniques.
- Highlights the ability of RL to achieve results beyond human capabilities, using the example of AlphaGo's unique moves.
Chapter 2: Advantages of Real-World RL and Challenges in Simulation [0:54-2:13]
- Emphasizes the power of RL in real-world scenarios over simulation.
- Discusses the challenges in building accurate simulators, especially for complex environments involving human interaction.
Chapter 3: Reinforcement Learning in Complex Environments [2:13-3:18]
- Describes how RL is more effective in complex, variable environments.
- Explains the need for advanced learning algorithms to interact with these environments so that behaviors can emerge.
- Argues for RL's potential in optimizing policies specifically for real-world settings.
Chapter 4: Progress and Practicality of Real-World Deep RL [3:18-5:52]
- Outlines the advancements in deep RL, making it more practical and scalable in the real world.
- Details how sample complexity has significantly improved, allowing real-world locomotion skills to be learned in minutes.
- Discusses leveraging prior data in RL and overcoming challenges previously considered showstoppers.
Chapter 5: Learning Locomotion Skills in the Real World with Deep RL [5:52-11:46]
- Sergey shares examples of learning locomotion skills on real robots, progressing from simplified models to more complex ones.
- Highlights the significant reduction in training time due to improved algorithms and engineering techniques.
- Discusses the importance of regularization and taking numerous gradient steps for fast learning.
Chapter 6: Manipulation Skills Through Real-World Experience [11:46-16:03]
- Transitions to learning manipulation skills in real-world settings.
- Discusses the challenges and solutions for continuous learning without human intervention.
- Explains the integration of multiple tasks to enable autonomous learning and resetting.
Chapter 7: Leveraging Heterogeneous Prior Data in RL [16:03-21:40]
- Focuses on combining data-driven learning with RL to improve efficiency.
- Describes how pre-training with diverse data can lead to rapid adaptation to new tasks.
- Uses examples to illustrate the efficiency of RL with prior data in both locomotion and manipulation tasks.
Chapter 8: Scaling Deep RL for Practical Applications [21:40-26:30]
- Sergey discusses the scalability of deep RL in practical, real-world applications.
- Provides examples of real-world deployments, including navigation and waste-sorting robots.
- Emphasizes the importance of continuous improvement and adaptation in diverse environments.
Chapter 9: Future Directions and Potential of Real-World RL [26:30-38:20]
- Concludes with a discussion of the future potential of, and improvements needed in, real-world RL.
- Suggests areas for further research, including leveraging prior data for exploration and lifelong learning.
- Acknowledges the need for more stable, efficient, and reliable RL methods for mainstream adoption.
@omarrayyann 7 months ago
Can the rest of the lectures be uploaded? Thanks a lot!
@godwyllaikins3277 7 months ago
This was truly an amazing presentation.
@zabean 7 months ago
thank you mate
@NoNTr1v1aL 7 months ago
Absolutely brilliant talk!
@ArtOfTheProblem 7 months ago
thank you for sharing
@chillmathematician3303 7 months ago
We need to get people focused on RL again.
@rfernand2 7 months ago
Thanks - I haven't been tracking RL for the past several years, so this is a nice high-level update on things, with linked papers for details. Given this progress, are we about to see an explosion in robotics deployment? If so, will it be mostly industrial, or will there be some consumer impact as well?
@Alex-kr7zr 2 months ago
I'd bet that the consumer impact will arrive before this is picked up by "industrial" applications. Industrial applications are so stuck in 70s-style static robot programming with SPS (PLCs) and ladder logic that they will be extremely late to the game compared to startups and researchers already working with AI/ML. Also, this will have the most impact on small to medium-sized companies: big companies can already afford an expensive robot programmer to program the industrial robot, while small companies simply can't, so they will benefit most from easier (and therefore much cheaper) robot programming. The hardware might still be expensive, but the current reality is that programming a robot costs much more than the hardware (even something like a UR isn't fully end-user friendly and requires some level of expert knowledge).
@MessiahAtaey 8 months ago
This dude is a beast
@aryansoriginals 8 months ago
so flippin cool
@aryansoriginals 8 months ago
love this video!!