ViTPose: 2D Human Pose Estimation

3,894 views

Soroush Mehraban

A day ago

Comments
@wolpumba4099 a year ago
- 0:00: The video discusses the ViTPose paper, which is currently leading in 2D pose estimation on the MS COCO dataset.
- 0:13: Previous attempts to use Transformers for 2D pose estimation include TransPose and TokenPose.
- 0:26: TransPose uses a CNN backbone to extract local information from the input image and a Transformer encoder to understand the skeleton keypoints in the image.
- 0:58: TokenPose uses a similar approach but includes random tokens to represent missing or occluded keypoints.
- 1:33: Another attempt, HRFormer, combines Transformer blocks and convolutional blocks for downsampling and upsampling.
- 2:11: ViTPose simplifies the process by using only Transformers, making the problem easier to deal with.
- 2:21: ViTPose uses a Transformer encoder to create tokens from an input image.
- 3:50: ViTPose has two decoder options: a classic decoder and a simple decoder.
- 6:15: ViTPose allows multi-dataset training, using a different decoder for each dataset.
- 7:03: The video presents the ViTPose variants (base, large, huge, and gigantic), which differ in the number of layers and channel size.
- 7:27: The video discusses the simplicity and scalability of ViTPose.
- 8:33: The video discusses the influence of pre-training data on the performance of ViTPose.
- 10:11: The video discusses the influence of input resolution on the performance of ViTPose.
- 11:32: The video discusses the influence of attention type on the performance of ViTPose.
- 14:55: The video discusses the influence of partial finetuning on the performance of ViTPose.
- 16:02: The video discusses the influence of multi-dataset training on the performance of ViTPose.
- 16:21: The video discusses the use of knowledge distillation to improve the generalizability of the model.
- 21:12: The video presents the results of ViTPose in comparison with other models for 2D pose estimation on the MS COCO dataset.

Positive Learnings:
- ViTPose simplifies 2D pose estimation by using only Transformers.
- Using a Transformer encoder to create tokens from an input image has proven to be effective.
- The larger variants (base, large, huge, and gigantic) can enhance the performance of ViTPose.
- Pre-training data can improve the performance of ViTPose.
- Knowledge distillation can improve the generalizability of the model.

Negative Learnings:
- Previous attempts to use Transformers for 2D pose estimation, such as TransPose and TokenPose, had limitations.
- The use of a CNN backbone in TransPose limits its effectiveness.
- TokenPose's use of random tokens to represent missing or occluded keypoints is not the most efficient approach.
- HRFormer's combination of Transformer blocks and convolutional blocks for downsampling and upsampling makes it complicated.
- Partial finetuning can negatively affect the performance of ViTPose.
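As a rough illustration of the encoder and decoder items in the summary above, the following Python sketch works through the tensor shapes only. The 256x192 input, the patch size of 16, the 4x decoder upsampling, and the 17 COCO keypoints are assumptions based on the commonly described ViTPose setup and are not stated in the comment; `vitpose_shapes` is a hypothetical helper, not part of any library.

```python
def vitpose_shapes(height, width, patch=16, upsample=4, num_keypoints=17):
    """Return the token grid and heatmap shape for a ViTPose-style model.

    Assumptions (not from the comment): square patches of size `patch`,
    and a decoder that upsamples the token grid by `upsample` in each
    spatial dimension before predicting one heatmap per keypoint.
    """
    if height % patch or width % patch:
        raise ValueError("input size must be divisible by the patch size")

    # Encoder: the image is cut into non-overlapping patches, one token each.
    grid_h, grid_w = height // patch, width // patch
    num_tokens = grid_h * grid_w

    # Decoder: either option (two 2x deconvolutions, or one 4x bilinear
    # upsampling) is assumed to produce the same output resolution.
    heat_h, heat_w = grid_h * upsample, grid_w * upsample

    return {
        "token_grid": (grid_h, grid_w),
        "num_tokens": num_tokens,
        "heatmaps": (num_keypoints, heat_h, heat_w),
    }


if __name__ == "__main__":
    print(vitpose_shapes(256, 192))
```

Under these assumptions both decoder options end at the same heatmap resolution, so they differ in how the upsampling is done rather than in output shape.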
@amirhosseinmohammadi4731 4 months ago
It was very comprehensive, thanks a lot Soroush
@mjalali3109 a year ago
Congratulations, a perfect and neat job
@francisferri2732 a year ago
Thank you for your videos! They are a very good way to keep up with the state of the art.
@soroushmehraban a year ago
Glad you enjoyed it
@rohollahhosseyni8564 a year ago
Great job!
@alihadimoghadam8931 a year ago
nice job
@soroushmehraban a year ago
Thanks
@mrraptorious8090 8 months ago
Hey, I am wondering how to train ViTPose myself. Did you happen to train it yourself? If so, could you share your experience?
@nikhilchhabra a year ago
Thank you for this interesting video. It would be interesting to see bottom-up pose estimation using transformers, like ED-Pose. ViTPose is top-down, so (a) inference time increases with the number of people, and (b) it cannot handle overlapping-human scenarios.
@soroushmehraban a year ago
Thanks for the feedback. I didn't know about ED-Pose. I will surely read it soon.
@Fateme_Pourghasem a year ago
That was great. Thanks.
@soroushmehraban a year ago
Thanks for the feedback
@shklbor 4 months ago
How do they detect poses from heatmaps for, say, 'k' people?
@shklbor 4 months ago
Never mind, it doesn't detect multiple poses.
@shrayesraman5192 a month ago
Conceivably, you could have two stages: human detection first, then a crop of each person for pose estimation.
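The two-stage idea in this thread can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ViTPose's actual post-processing: each detected person is assumed to come with a box `(x0, y0, width, height)` and one heatmap per keypoint predicted on that person's crop, and the keypoint is taken as the plain argmax of its heatmap. The function names `argmax_2d` and `keypoints_for_person` are hypothetical.

```python
def argmax_2d(heatmap):
    """Return (x, y, score) of the highest value in a 2D list of floats."""
    best_x, best_y, best_score = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_score:
                best_x, best_y, best_score = x, y, value
    return best_x, best_y, best_score


def keypoints_for_person(heatmaps, box):
    """Map per-keypoint heatmap peaks of one crop back to image coordinates.

    heatmaps: list of 2D lists, one per keypoint, predicted on the crop.
    box: (x0, y0, width, height) of the person crop in the full image
         (assumed to come from a separate person detector).
    """
    x0, y0, box_w, box_h = box
    keypoints = []
    for heatmap in heatmaps:
        hm_h, hm_w = len(heatmap), len(heatmap[0])
        hx, hy, score = argmax_2d(heatmap)
        # Use the cell centre, then rescale from heatmap cells to image pixels.
        x = x0 + (hx + 0.5) * box_w / hm_w
        y = y0 + (hy + 0.5) * box_h / hm_h
        keypoints.append((x, y, score))
    return keypoints


if __name__ == "__main__":
    # One tiny 2x2 heatmap for a single keypoint, and one detected person box.
    toy_heatmaps = [[[0.0, 0.0], [0.0, 1.0]]]
    print(keypoints_for_person(toy_heatmaps, (10, 20, 4, 8)))
```

For 'k' people, this function would simply be called once per detected box, which is also why the cost of a top-down pipeline grows with the number of people in the image.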
@ngtiens_dat 3 months ago
Please give me the code.