[Seminar] Towards Unstructured Unlabeled Optical Mocap: A Video Helps!

121 views

강형엽 IIIXR LAB

1 day ago

Comments: 28
@IIIXRLab 2 months ago
Please review each point of feedback carefully, and respond with how you plan to address each one moving forward.
[Overall] Your seminar was commendable for its clarity and for its effort to analyze weaknesses beyond the paper. While the presentation was easy to follow at a basic level, it lacked depth in key areas. Including more of your unique insights and expertise would greatly benefit the audience and elevate the overall learning experience. Furthermore, attention to formatting is needed.
[Strength] S1: You successfully conducted the seminar in a manner that was easy to understand, even at a very basic level. S2: Your effort to analyze the weaknesses that were not explicitly mentioned in the paper was commendable.
[Feedback] F1: Formatting attention! Ensure that complete sentences, with a subject and a verb, end with a period; for phrases that do not form complete sentences, omit the period. On pages 3 and 4, the captions for the figures were not correctly attached, so more effort is needed to properly understand these figures. At the very least, each figure should have a title that clarifies its content. F2: Use more animations or indications! On page 7, it was difficult to match the figures with the explanations; instead of casually placing the figures, consider breaking the explanations into clear sections, pointing out key elements, or using animations to make your points clearer. On page 11, it was unclear what exactly you were referring to; consider using animations to reveal one element at a time, or highlighting areas with a red box to direct attention to specific points. Finally, even if time constraints prevent you from covering everything, it would benefit the audience if you could project your unique expertise and provide insights beyond the paper. This would enhance the seminar's value not just as a paper presentation, but as a learning experience where your own knowledge and perspective are shared.
[Question] Q1: I'm having difficulty understanding the marker-part matching explained on page 7. Could you clarify what "Count Groups" and "Fit Each Part" mean? I would appreciate more details in a separate response. Q2: Page 11 is challenging to follow. Could you explain it step by step? Q3: Does "UUO" imply that users are unaware of the marker configuration? Is it used in scenarios involving an arbitrary marker set?
@critbear 2 months ago
I will pay attention to sentence format and use more visual materials. A1: "Count Groups" refers to determining how many groups to classify the markers into by analyzing the marker data sequence. During this process, average-linkage clustering is used, and the number of clusters equals the number of body parts. "Fit Each Part" then refers to matching each clustered marker point cloud to a body part's vertex point cloud on a one-to-one basis, using the Chamfer distance as the criterion. A2: Page 11 shows a summary of the previous pages. First, when the video and raw marker data are received, initial SMPL parameters are extracted from the video (page 6). After that, the markers in the raw marker data are clustered according to the number of body parts and assigned to the body parts (page 7). Next, to find the precise pose, the steps of finding the global orientation (pose fitting), marker-vertex correspondence, and inverse kinematics are performed (pages 8-10). A3: "UUO" stands for Unlabeled and Unstructured Optical. This is the marker data output from the mocap system, but since it is not called "raw markers", I don't know whether it is data with the noise already removed.
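The two steps in A1 can be sketched with SciPy; this is a toy example on synthetic trajectories (the marker counts, offsets, and number of groups are made-up assumptions for illustration, not the paper's setup):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy marker trajectories: (num_markers, num_frames, 3).
# Two rigid groups whose markers stay close to each other over time.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3))  # a shared motion path
group_a = np.stack([base + offset for offset in [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0)]])
group_b = np.stack([-base + offset for offset in [(5, 0, 0), (5.1, 0, 0)]])
markers = np.concatenate([group_a, group_b])  # shape (5, 50, 3)

# Flattening each trajectory makes Euclidean distance proportional to the
# RMS marker-to-marker distance over the whole sequence.
flat = markers.reshape(len(markers), -1)
Z = linkage(flat, method="average")               # average-linkage clustering
labels = fcluster(Z, t=2, criterion="maxclust")   # fixed number of groups
print(labels)  # markers 0-2 share one label, markers 3-4 the other
```

"Fit Each Part" would then compare each labeled group against each body part's vertices using the Chamfer distance and keep the best assignment.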
@한동현Han 2 months ago
Thank you for the interesting presentation. Q1: How much does HMR2's performance affect the result quality? It seems that HMR2's performance is crucial to the quality of the results. Q2: Why can't we just use HMR2's output alone? Why are we combining HMR2 with marker solving?
@critbear 2 months ago
1. The result of HMR2.0 is mainly used to perform marker-body part matching. Unless around 1/10 of all frames are seriously mis-inferred, it won't have a big impact, because of subsequent processes like inverse kinematics. Rather, the noise in the marker data will have a bigger impact on the results. 2. Pose estimation models like HMR2.0 can't estimate precise 3D positions; the result only looks good when rendered from the captured viewpoint.
@박주현-123 2 months ago
Thank you for the presentation. 1. What is SMPL, and how does it differ from traditional methods of representing motion? Why can't we simply use the rotation values of the skeleton joints instead? 2. What criteria are used to determine the placement of markers on the human body in marker-based motion capture systems?
@critbear 2 months ago
1. SMPL is a parametric body model. "Parametric" means that you can control the whole vertex data (6890×3) with only a few parameters (10 shape + 3 global-orientation + 69 pose). Using only the rotation of the skeleton joints does not utilize the body-shape information or the body surface. So, when the body's vertices are needed, a parametric body model greatly reduces the dimension and complexity of the model. 2. To reduce noise in marker-based mocap, markers are attached at points that are affected by the rotation of only one joint and that move little due to muscle movement.
@홍성은-iiixr 2 months ago
Thank you for your presentation. Are there any studies on other modalities? In this evaluation, there are results only for markers and marker + video.
@critbear 2 months ago
This paper only includes an evaluation of those modalities. Most mocap datasets, excluding IMU, are either video-only or marker + video. However, it might be possible using the synthetic dataset BEDLAM.
@김병민_dm 2 months ago
Thanks for your presentation. I have two questions below: Q1. I want to know more about marker-part matching. How does this method select the best match from marker segments to body parts? Q2. In the evaluation, this method has shown far superior performance to the others, so what do you think is the main reason?
@critbear 2 months ago
1. They utilize a Chamfer distance loss. It measures the difference between two point clouds: one is a group of clustered markers, and the other is the vertices of a body part. 2. I think utilizing the output of a video pose estimation model is very useful. It greatly reduces the complexity of the problem.
@misong-kim 2 months ago
Thank you for your presentation. In the marker segmentation process, you mentioned that markers are grouped based on the distances between them. However, if the distances between markers are not consistent in certain body parts, such as near joints like the elbow or knee, how accurately can segmentation be performed in these cases? Are there any additional corrections or methods employed to improve the accuracy of segmentation in such scenarios?
@critbear 2 months ago
For that reason, when doing marker-based mocap, we need to place the markers on the body carefully. In the paper, they decide in advance how many groups to cluster into, which reduces errors.
@LRH_iiixrlab 2 months ago
Thank you for the presentation. The question is, what techniques are used to minimize the mean per joint position error (MPJPE) and mean per joint velocity error (MPJVE) in your method, and how do these compare to existing methods?
@critbear 2 months ago
In my method? I'm also researching mocap, and I use inverse kinematics with optimizers such as Levenberg-Marquardt as post-processing, but the result mainly depends on the performance of the neural networks.
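As a minimal sketch of Levenberg-Marquardt-style IK post-processing (a toy planar two-link arm, not the commenter's actual solver; the link lengths and target are made up), using SciPy's `least_squares` with `method="lm"`:

```python
import numpy as np
from scipy.optimize import least_squares

L1, L2 = 1.0, 0.8  # toy link lengths

def fk(thetas):
    """Forward kinematics: end-effector position of a planar 2-link arm."""
    t1, t2 = thetas
    return np.array([L1 * np.cos(t1) + L2 * np.cos(t1 + t2),
                     L1 * np.sin(t1) + L2 * np.sin(t1 + t2)])

def residuals(thetas, target):
    return fk(thetas) - target  # drive the end-effector onto the target

target = np.array([1.2, 0.6])  # reachable: 0.2 <= |target| <= 1.8
sol = least_squares(residuals, x0=[0.1, 0.1], args=(target,), method="lm")
```

After solving, `fk(sol.x)` should coincide with `target` up to numerical tolerance.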
@정승재_teclados078 3 months ago
Thank you for introducing Unstructured Unlabeled Optical Mocap. I have a question. I'm wondering what the main contribution of this paper is compared with previous works. Is improving accuracy over existing marker + video based methods the most important contribution in this field? In this paper, the proposed approach to enhance performance involves designing a marker-part matching step and a mocap solver, but these seem to solve the problem through handcrafted algorithms rather than being novel. Is the novelty of the methodology, or the heaviness of the pipeline, not a significant concern?
@critbear 2 months ago
Previously, there was almost no research on video + marker mocap. Using two different kinds of sensor data is troublesome, and video-based markerless pose-and-shape estimation models were not robust. However, the performance of pose estimation models such as HMR2.0 has recently improved significantly. So we can consider the video + marker combination, and its use in marker segmentation and pose fitting, as the contribution.
@노성래99 2 months ago
Thank you for a great presentation. I have doubts about the reliability of the Chamfer distance, even with marker segmentation. For example, can a segmented group of markers accurately find the proper position on the body if the SMPL model is either very skinny or fat? The distance between markers close to joints may vary significantly more than that of other markers. Does marker segmentation perform robustly in such cases too?
@critbear 2 months ago
HMR2.0 can also predict the actor's body shape approximately. And in current commercial mocap systems, marker placement on the body is important: to reduce noise in marker-based mocap, markers are attached at points that are affected by the rotation of only one joint and that move little due to muscle movement. (This is why I research arbitrary marker layouts.) I think marker segmentation works well because the number of groups to cluster into is predetermined, the entire frame sequence is utilized for clustering, and there is no marker noise.
@SeungWonSeo-q3s 2 months ago
Thank you for the presentation. I have some questions that I would like to ask: - I assume there is a dataset used for training HMR2.0 (since the title includes "Transformer"). Is there a dataset that matches the SMPL parameters with videos? - Regarding the Marker Part Matching section, how are the body parts in the four images for each part generated? Are the body parts taken from a dataset or are they generated? I'm curious about how this process works in detail, particularly for selecting the best match.
@critbear 2 months ago
- The Human3.6M, MPI-INF-3DHP, COCO, MPII, InstaVariety, AVA, and AI Challenger datasets were used to train HMR2.0. - The body parts used in marker matching come from the output of HMR2.0. Pose estimation models like HMR2.0 can predict the 3D body mesh approximately. Also, since the SMPL model contains skinning-weight data, we know which vertices belong to each body part.
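The last point, deriving per-vertex body-part labels from skinning weights, can be sketched as follows; the weight matrix here is random toy data (real SMPL stores a fixed skinning-weight matrix of shape (6890, 24)):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((8, 4))                   # (vertices, joints), toy sizes
weights /= weights.sum(axis=1, keepdims=True)  # skinning weights sum to 1 per vertex

# Assign each vertex to the joint that influences it most: this yields the
# per-vertex body-part label that marker-part matching needs.
part_of_vertex = weights.argmax(axis=1)
part_vertices = {p: np.flatnonzero(part_of_vertex == p) for p in range(4)}
print(part_of_vertex.shape)  # (8,): one part label per vertex
```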
@황주영-j7y 2 months ago
Hello, thank you for the seminar. 1. What exactly is "unlabeled" and "unstructured"? 2. I didn't understand "Fit Each Part" very well; there are several parts that are already set, and among them, are you looking for the best match?
@critbear 2 months ago
1. "Unlabeled" means that we don't know where each marker is on the body, and "unstructured" means that the marker clustering process has not been performed in advance. 2. The body parts are the result of the video-based markerless mocap model (HMR2.0). The clustered markers are assigned to these body parts.
@포로뤼 2 months ago
Thank you for the presentation. 1. On slide 7, "Directly fitting the SMPL model to markers by optimization => stuck in local minima," why can it get stuck in local minima? 2. On slide 9, is it sufficient to have only four cases for the initial root rotation? Couldn’t it be a value in between those cases? 3. What are the satisfaction conditions for solver refinement? 4. I am not entirely clear on the explanation in the limitations section. Does the example in the image suggest that it is difficult to track the marker positions on the other leg because the markers are only attached to one leg? Video is also used. Doesn't this help compensate for the issue? I am not sure if I have understood this correctly.
@critbear 2 months ago
1. Because it is a very difficult problem when the markers are unlabeled. The optimization may require many iterations, and it is hard to match even just the torso. 2. Of course, utilizing more initial values returns better results, but I think the difference is small compared to the computational cost. 3. "Solver refinement" means repeating Stages 3 and 4 one more time. 4. The result of video mocap doesn't involve the marker positions, so it is hard to match the video mocap result and the marker data when the two legs perform the same action.
@tjswodud-c6c 2 months ago
Thank you for your presentation. I have two questions. Q1: I understood that human pose estimation is performed based on monocular video, and then mocap solving is performed based on the predicted human pose to finally calculate the marker positions for the predicted pose. Am I understanding correctly? In short, I'm wondering if the human pose that is the target of solving in the Mocap Solver process is obtained from the Monocular Reconstruction process (although the two poses look similar in the figure presented in the paper). Q2: Are the UUO marker data used in the Marker-Part Matching process included in the dataset? If not, please explain how to obtain such data. Thank you.
@critbear 2 months ago
1. To put it simply, they get an "initial" pose and body shape from the monocular video. It is not precise data, so they refine the pose and shape in order to utilize the marker data. 2. Do you mean whether there is ground-truth (labeled and structured) marker data in the dataset? The Marker-Part Matching process is not a data-driven method, so it is not necessary to prepare labeled and structured marker data.
@RounLee0927 2 months ago
I understand IK for "poses", but I'm wondering how IK is applied to shapes and meshes during mocap solving.
@critbear 2 months ago
Just as we can reduce errors by optimizing poses in IK, we can reduce errors by optimizing the body-shape parameters (betas) in SMPL.
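A toy sketch of that idea, fitting a "shape" parameter (a limb length, standing in for a beta) together with a "pose" parameter to observed marker positions by least squares; the single-limb model and all numbers are made up, not SMPL:

```python
import numpy as np
from scipy.optimize import least_squares

def markers(params, fractions):
    """Markers along a single limb: theta is the pose, length the shape."""
    theta, length = params
    direction = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(fractions * length, direction)

fractions = np.array([0.3, 0.7, 1.0])      # where markers sit along the limb
observed = markers([0.8, 1.1], fractions)  # synthetic "ground truth"

def residuals(params):
    return (markers(params, fractions) - observed).ravel()

# Optimize pose and shape jointly, as the mocap solver does for theta and betas.
sol = least_squares(residuals, x0=[0.0, 1.0], method="lm")
print(np.round(sol.x, 3))  # recovers pose 0.8 and shape 1.1
```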