Reinforcement Learning with Quadruped Robot Bittle and NVIDIA Isaac Gym: Almost Walking

12,620 views

Hardware.ai

1 day ago

My experiments with Isaac Gym, a high-performance physics simulation environment from NVIDIA, and an affordable yet agile DIY quadruped robot from Petoi, the Bittle.
#robot #python #deeplearning #nvidia #dog
@sentdex - more RL with Bittle experiments here!
Github repository:
github.com/AIWintermuteAI/Isa...
Nvidia Isaac Gym Preview Release:
developer.nvidia.com/isaac-gym
Bittle (official Petoi shop):
www.petoi.com/pages/bittle-op...
Credits for music:
Labyrinth - A Chillwave Mix
• Labyrinth - A Chillwav...
whilefalse - Insight
Decisive Koala - Future Roads (ft Star Madman)

Comments: 42
@Hardwareai 3 months ago
Support my work on making tutorials and guides on Patreon! www.patreon.com/hardware_ai
@PP-fh2mk 2 years ago
I'm trying this project too. Glad to see more information about Isaac Gym. I hope your channel gets bigger!
@tastlk6351 2 years ago
Hello, I am working on this project too, and I have achieved some results so far. I hope that we can discuss things together and collaborate on this project.
@Hardwareai 2 years ago
I hope so too!
@PP-fh2mk 2 years ago
@@tastlk6351 Sorry for the late reply. I found legged_gym and tried to insert my own model, but it doesn't work well. This is a really hard project.
@TheChromePoet 2 years ago
@@Hardwareai Can you please tell me something: is it possible to fast-forward the learning process to extreme speeds, so that, for example, 30 days would be equivalent to 30,000 years of learning?
@mlcat 2 years ago
Good job and congrats on relocation!
@Hardwareai 2 years ago
Thank you so much 😀
@allanboughen 2 years ago
Creating RL is going to continue to get faster & easier. It will become thinking like a robot not writing hundreds of lines of code. The T-800 will be here very soon & we'll need experts in edging & ball fondling at the ready.
@Hardwareai 2 years ago
What is "thinking like a robot"? xD
@allanboughen 2 years ago
@@Hardwareai The task will be to set optimal reward functions. To do this you will need to think like your subject (the robot). The Gym lets you see what it is thinking (or not thinking about) and adjust your functions to suit. The way you arrived at the IMU solution.
@ericblenner-hassett3945 2 years ago
It is looking more and more like set poses, each with their own reward, should also be added to the learning.
@Hardwareai 2 years ago
Well, I think what you're saying is that imitation learning can be used - that is true, although the setup is a bit cumbersome. What I want to experiment with is RL with feedback, as per deepmind.com/blog/article/learning-through-human-feedback
@levbereggelezo 2 years ago
Very good
@Hardwareai 2 years ago
Thanks!
@mdaamir9245 1 year ago
Hello and welcome to my channel, Hardware.ai. This is my first video after trading East for West and moving to Switzerland. It is also going to be the least technical video I've made recently, and will have none of my usual distracting hand-waving, since I'm still setting up the studio and the green screen. So please sit back and relax while I tell you about reinforcement-learning-powered robots taking over the world. Nope, that's not happening anytime soon.

Since I finished my TinyML course series, I wanted to focus a bit more on robotics and publish some of last year's projects that I was quietly working on. You may remember that I made a few videos about a robotic dog from Petoi, the Bittle. I discussed how to write a custom driver for it and perform teleoperation, and also how to do mapping with LiDAR and/or cameras. Subscribers to my Twitter knew that I was exploring reinforcement learning for Otto and Bittle.

Here is where my perfectionism came into play. Reinforcement learning is notoriously hard for real-world problems, and in my humble perfectionist opinion I did not achieve stellar results, and thus did not have material to share. Well, watching a series on training Bittle from sentdex, I realized that even the path to the final project, and experiments, however unsuccessful they might be, are interesting and useful to other people. Worst case, people can just learn how not to do reinforcement learning for quadruped robots from me. Plus, being in academia now has taught me a thing or two about failures in scientific research having value of their own. Spoiler alert though: it wasn't a complete failure. First of all, if you haven't watched sentdex's videos, do watch them. He did a great job explaining many of the basic things that I won't be focusing on in this video.

I was using NVIDIA Isaac Gym as a simulation environment for the experiments. It's fresh off the development bench and, in fact, still in its beta phase. But the fact that it fully utilizes NVIDIA GPU capabilities for simulation makes it possible to keep most of the elements of your training pipeline, namely the environment, the agents, and the products of their interaction, on the GPU as tensors, which speeds up training by a lot. I tried OpenAI Gym before, and while it was the thing that possibly inspired the NVIDIA team to create Isaac Gym, by now Isaac Gym can be strongly recommended over OpenAI's Gym. Speed of training means a great deal when testing different reward functions, verifying the correctness of your environment and robot model, and so on. It could very well be the difference between success and never getting past the point where your robot just flies across the environment like a crazed chickadee.

For my first try, I adapted one of the example algorithms NVIDIA shipped with the first version of Isaac Gym, the Ant walker, to Otto. It uses the PPO algorithm, which stands for Proximal Policy Optimization, an actor-critic method. It is one of the most commonly used baselines for new reinforcement learning tasks, and its variants have also been used to train a robot hand to solve a Rubik's cube or win Dota 2 against professional players. So it's a good place to get started. Experimenting with simpler robots also allowed me to get the hang of creating somewhat complex URDF robot descriptions in Phobos, a more or less WYSIWYG (what you see is what you get) editor working as a Blender plugin. It was a success, and I was really happy to see that the virtual Otto learned a walking gait resembling the walking gait of a normal Otto.

After a slight nudge from an Aussie friend of mine, I went on to tackle a more challenging task: teaching a quadruped robot how to walk, first in simulation and then, ideally, using sim-to-real to transfer the learned knowledge to an actual physical robot. Creating a URDF for Bittle wasn't a cakewalk, but after some trial and error I was able to create a URDF with 3D models reverse-engineered by a third-party developer, and I shared it on GitHub for people to build on my work. I'm happy to see it was starred quite a few times and used by other people, including sentdex.

Reinforcement-learning-algorithm-wise, the first thing I tried was adopting the same Ant walker approach. It did not work well, or at all. What was different with Bittle, apart from the inherently higher complexity coming from having more joints, is that its default pose is not stable. Changing the initial position of the joints, a.k.a. the starting pose, however, just brought different but still not satisfying gaits, like slight jumping on the knees, walking on the knees, and a lot, a lot of falling. Among the things that I tried was also tweaking the reward function to incentivize upright movement in a specific direction and staying above a certain height. That just brought more jumping. In hindsight, it seems the model was hopelessly overfitting when the only thing it was incentivized to learn was movement in a specific direction for the longest time possible without "dying", or being reset, in this case mostly from falling or turning onto its back. I mean, in the end, as often happens with reinforcement learning algorithms, it wasn't wrong. Perhaps jumping on its knees was the best way to move in a specific direction for the longest time possible without accidents. It's just not exactly what I wanted from it.

With the second version of Isaac Gym, NVIDIA released the code for quadruped walking for ANYmal robots, and by comparing my old code with it I immediately realized what the missing piece of the puzzle was. Instead of trying to formulate the reward function as just movement in a specific direction for the longest time possible without dying, in order to avoid overfitting they formulated the reward function for ANYmal essentially as just the difference between random angular and linear velocity commands and the actual angular and linear velocities the robot was moving with after being given those commands. That would teach the quadruped how to move in different directions and avoid the pitfalls of the previous approach, simply because jumping like a wounded cricket is no longer the best way to maximize the reward function, so a more generalized gait needs to be developed and held by the algorithm. The ANYmal code could not be used as a drop-in replacement for Bittle, and I had to make quite a few tweaks with respect to initial joint positions, angular and linear velocities, and the reward function. However, in the end it worked reasonably well. I wasn't able to get a perfect walking gait, but for that, I suppose, more research is needed.

Now for some final thoughts. When making the URDF model for Bittle, I had already contemplated how it would be possible to transfer the trained algorithm to real robots, to bridge the gap between simulation and reality. The code for ANYmal, while working for robots in simulation, takes many observations that won't be accessible on Bittle. The only two sensors that are available are an accelerometer and a gyroscope, which are actually combined in a single MPU unit. I placed a virtual gyroscope and accelerometer in the center of the board when making the Bittle URDF, and this is where we can get rotation and speed values from the virtual accelerometer, and then try training an algorithm that takes these plus velocity commands and outputs angles or torques for the servos. The speed at which all of this needs to be executed means that the neural network very likely needs to run on the edge, right on the Bittle main board. The standard NyBoard will not be sufficient, since it only has an ATmega328P chip, so a BiBoard with an ESP32 needs to be used. Fortunately, I got a Bittle equipped with a BiBoard that followed me all the way to Switzerland.
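The ANYmal-style velocity-tracking reward described above can be sketched in a few lines. This is a minimal illustration, not the exact code from the repository or from NVIDIA's example; the exponential kernel shape, the `sigma` scale, and the 0.5 weight on the angular term are assumed values:

```python
import numpy as np

def velocity_tracking_reward(cmd_lin_vel, cmd_ang_vel,
                             lin_vel, ang_vel, sigma=0.25):
    """Reward tracking of commanded base velocities.

    The closer the measured linear/angular velocities are to the
    randomly sampled commands, the closer the reward is to its
    maximum -- so jumping in place no longer pays off.
    """
    lin_err = np.sum(np.square(cmd_lin_vel - lin_vel))
    ang_err = np.square(cmd_ang_vel - ang_vel)
    # Exponential kernel: each term is 1.0 at perfect tracking
    # and decays toward zero as the tracking error grows.
    return np.exp(-lin_err / sigma) + 0.5 * np.exp(-ang_err / sigma)
```

With this shaping, a perfectly tracked command yields the maximum reward of 1.5, while large tracking errors push both terms toward zero regardless of how long the robot survives.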
@Hardwareai 1 year ago
I do need to start adding subtitles :)
@olalekanisola8763 1 year ago
I quite like this video; it was very helpful. I downloaded your URDF and attempted to replicate the Petoi Bittle in Isaac Gym; however, several of the body parts started floating for no apparent reason. The robot appears to be in nearly a standing position and moves strangely even when I modify its joint positions during training.
@Hardwareai 1 year ago
Hmmm. I did publish an example code, did you have a look? Does that work normally?
@allanboughen 2 years ago
I hope bittle becomes a year long series.
@Hardwareai 2 years ago
It is likely to become a trilogy :)
@nicobohlinger7077 7 months ago
Thanks for the great video. I was wondering where you got the PD gains from. Do we know for sure that Bittle uses a PD controller with those gains on the real robot? I didn't find any documentation on this
@Hardwareai 7 months ago
Great question! No, unfortunately PD gains are incorrect - they were tweaked to make it work in simulation, as there are quite a few other things needed to make it work on a real robot. Which project are you working on?
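For context on the exchange above: the PD controller in question maps a target joint angle to a torque, and as the reply notes, the gains here were tuned for simulation rather than measured from the real servos. A minimal sketch of such a control law; the gain values in the usage example are illustrative only:

```python
def pd_torque(kp, kd, target_pos, pos, vel):
    # Classic PD servo model: the stiffness term (kp) pulls the joint
    # toward the target angle; the damping term (kd) resists joint
    # velocity to suppress oscillation.
    return kp * (target_pos - pos) - kd * vel
```

For example, with `kp=50`, `kd=1`, a joint at rest 0.5 rad away from its target receives a torque of 25, while a joint already at the target but moving gets a small opposing torque.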
@allanboughen 2 years ago
OMG, OMG, OMG! I'm one very excited Ozzy. If you and 100 followers all fail (partially succeed) and then share your results in an easy-to-observe environment (Isaac Gym), you will have a powerful and successful team.
@Hardwareai 2 years ago
I was trying to pronounce "Aussie". Did I fail? xD
@andresmendez6151 2 years ago
Could you please edit the description to contain the git repo? At the moment, it says "WIP" but I am not sure what that means. Nice video, by the way.
@Hardwareai 1 year ago
Oh-oh, I do need to upload the code then. I'll put a reminder for myself.
@maximeg3659 1 year ago
Strap an NRF24L01 on it, run the net on a big desktop GPU, and add some random input delay in the gym to account for lag. I'm curious about the results.
@Hardwareai 1 year ago
Why is an NRF24L01 necessary? The ESP32 on the BiBoard can be wirelessly connected to a PC, and it already has an accelerometer/gyro.
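The input-delay idea from this thread can be prototyped with a small action buffer wrapped around the simulation step. This is a hedged sketch with a fixed delay for simplicity (a randomized version would resample the delay per episode); the class name and interface are hypothetical, not from the video's repository:

```python
from collections import deque

class DelayedActions:
    """Deliver actions `delay` control steps late, roughly
    approximating wireless latency between policy and robot."""

    def __init__(self, delay, default_action=0.0):
        # Pre-fill the queue so the first `delay` steps execute a
        # neutral default action while real commands are in flight.
        self.buf = deque([default_action] * delay)

    def step(self, action):
        self.buf.append(action)    # newest command enters the queue
        return self.buf.popleft()  # oldest command is executed now
```

With `delay=2`, the command issued at step 1 only reaches the simulated robot at step 3, which is the kind of lag a policy would have to tolerate when running off-board.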
@pa-su6901 2 years ago
Thank you for the useful video. In my case, I am also trying hard to make my mobile robot car. Could you please give me an example Python code or a helpful reference? I want to reduce my experiment time.
@Hardwareai 1 year ago
Right. As mentioned in another comment, I'm wrapped up at the moment, but I put a reminder to clean and upload the code!
@ilhamakbar531 9 months ago
Hello sir, do you have the code for your pondo, as in the example in the video?
@Hardwareai 9 months ago
What is the pondo? There is a GitHub repository in the video description.
@abdurrahmanaliyu1512 1 year ago
Hi. Can you share the GitHub link for the URDF?
@Hardwareai 1 year ago
It's in the video description.
@VishnuVardhan-vy1ve 2 years ago
Sir, can we make a custom reinforcement learning environment with Isaac Gym?
@Hardwareai 2 years ago
Of course. That is actually the point of Isaac Gym.
@VishnuVardhan-vy1ve 2 years ago
@@Hardwareai Can you make a video on that for us, please?
@aixle3590 1 year ago
Hey lovely video, will you be sharing the code?
@aixle3590 1 year ago
I apologise, I found the GitHub. Thank you for your hard work.
@aixle3590 1 year ago
How did you actually transfer the model onto the BiBoard?
@Hardwareai 1 year ago
That was not done yet.