smpl_mesh_root_align #158
Thanks~ This bug is pretty easy to fix. It is caused by multiplication between different data types (Half and Float). Please check the data type where the bug occurred and set it to .float(). The easiest way is to set all outputs with .float(), which I have done in the released code. There must be some changes in your code, so please set those outputs to float.
Hey, I spent my afternoon on it and finally located the issue, OMG... It seems to be a bug in your currently released code: the output vertices of lbs are already float16, and the minus operation inside the smpl_mesh_root_align branch converts them to float32. That is why the error only occurs when smpl_mesh_root_align is False: if we skip the alignment, the output stays float16, which triggers the PyTorch3D error above. Going further, that's because inside lbs, even if A and W are both float32, T is still float16, resulting in float16 verts as well. Pretty weird. Anyway, I have fixed this now. But the other questions still confuse me: do you have any suggestions about the questions above (the fine-tuning config and ROMP_HRNet_32.pkl use different smpl_mesh_root_align settings, which is not consistent)? Maybe we can further improve it after fixing the bug? All in all, I think there are some bugs in the currently released code:
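For reference, a minimal sketch of the kind of mixed-precision behaviour described above; the tensor names and shapes are illustrative, not ROMP's actual lbs code:

```python
import torch

# Under autocast, the matmul that builds the per-vertex transforms T runs in
# reduced precision even when W and A are float32, so the resulting verts come
# out as half precision and PyTorch3D later complains about mixed Half/Float.
device = "cuda" if torch.cuda.is_available() else "cpu"
low_prec = torch.float16 if device == "cuda" else torch.bfloat16

W = torch.rand(1, 6890, 24, device=device)   # blend weights, float32
A = torch.rand(1, 24, 16, device=device)     # flattened joint transforms, float32

with torch.autocast(device_type=device, dtype=low_prec):
    T = torch.matmul(W, A)                   # autocast runs this matmul in reduced precision

print(T.dtype)                               # float16 on GPU (bfloat16 on CPU)
verts_fixed = T.float()                      # the fix: cast back to float32 before PyTorch3D
```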
Thanks for your great work.
You're welcome. I have one more question: since we're going to do pelvis keypoint alignment before evaluation anyway, why don't we directly align the pelvis keypoints to begin with, instead of doing root joint alignment?
You might notice that in different evaluations we have to align to different roots, because different evaluation benchmarks define different root joints. They may share the same name (Pelvis), but they are really at different positions.
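For illustration, a minimal sketch of this evaluation-time alignment, assuming simple (B, J, 3) joint tensors; the root index is whatever the benchmark at hand defines:

```python
import torch

# Both prediction and GT are shifted so the benchmark's root joint sits at the
# origin before the per-joint error is averaged.
def root_aligned_mpjpe(pred, gt, root_idx):
    pred = pred - pred[:, root_idx:root_idx + 1]   # (B, J, 3) minus (B, 1, 3)
    gt = gt - gt[:, root_idx:root_idx + 1]
    return (pred - gt).norm(dim=-1).mean()

pred, gt = torch.rand(2, 24, 3), torch.rand(2, 24, 3)
print(root_aligned_mpjpe(pred, gt, root_idx=0))    # each benchmark would pass its own root index
```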
@Arthur151 Great! Thanks for your great help! And I have another question about your result_parser.py (ROMP/romp/lib/maps_utils/result_parser.py, lines 44 to 67 at commit 20d440f):
As I understand it, if len(batch_ids)==0, that means there are no matched results. But why do you set batch_ids, person_ids, flat_inds to 0, 0, 1 if no results are matched (line 58)? This makes outputs['detection_flag'] permanently True, because len(batch_ids) will always be at least 1. It means that if no person was detected, we still sample a meaningless parameter from batch 0, flat_inds 1, and compute a loss against batch 0, person 0. As a result, param_loss and keypoints_loss are always calculated since outputs['detection_flag'] is always True, which I think doesn't make any sense. Am I missing something? Is it a feature or a bug?
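For clarity, a hedged sketch paraphrasing the fallback being questioned here (not copied from result_parser.py; the map size and the fixed index are illustrative):

```python
import torch

# When CenterMap matching returns no person, a single dummy sample from
# batch 0 / person 0 at a fixed flat index is kept, so detection_flag stays
# True and the parameter/keypoint losses are still computed against it.
map_size = 64                                     # illustrative value
batch_ids = torch.empty(0, dtype=torch.long)      # no matched detections

if len(batch_ids) == 0:
    batch_ids = torch.zeros(1, dtype=torch.long)              # pretend batch 0
    person_ids = torch.zeros(1, dtype=torch.long)             # pretend person 0
    flat_inds = (torch.ones(1) * map_size**2 / 2.).long()     # fixed map position

detection_flag = len(batch_ids) > 0               # always True after the fallback
```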
However, why is model_return_loss True for the fine-tuning config but False for v1.yml? The same goes for the argument "new_training". Are these two features of a newer version? It seems that only when "new_training" is True and the number of detected persons is 0 can the detection flag be False. But I think there is another problem: it's not consistent with the comment on the cfg 'new_training': 'learning centermap only in first few iterations for stable training.'. Because the number of detected persons could still be larger than 0, detection_flag will be True again, and by that time the param loss will still be calculated... Anyway, I'm still trying to fully understand your code. Maybe the fused multi-version features make some parts a bit hard to understand now.
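For what it's worth, a hedged sketch of the gating described in this comment, paraphrased from the discussion rather than copied from ROMP (function and argument names are illustrative):

```python
# Only when new_training is on AND nothing is detected does detection_flag stay
# False and skip the parameter and keypoint losses; as soon as any person is
# matched, those losses are computed again, which is the inconsistency with the
# 'learning centermap only in first few iterations' comment.
def gather_losses(num_detected, new_training, centermap_loss, param_loss, kp_loss):
    detection_flag = not (new_training and num_detected == 0)
    losses = {"centermap": centermap_loss}
    if detection_flag:
        losses["params"] = param_loss
        losses["keypoints"] = kp_loss
    return losses

print(gather_losses(0, True, 1.0, 2.0, 3.0))   # only the centermap loss survives
print(gather_losses(2, True, 1.0, 2.0, 3.0))   # param/keypoint losses come back
```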
Thanks for your prompt reply. About 1 and 3: thanks, this makes sense. But the problem still exists: why don't you use the corresponding flat_inds of batch 0 and person 0 to sample from param_map, instead of always using (torch.ones(1)*self.map_size**2/2.)? Won't this wrong position information affect training (if the actual flat_inds is far from the sampling position)? This is essentially sampling parameters from where they shouldn't be. Also, this operation conflicts with the case where persons are detected:
Why not keep the same protocol and use the GT position for sampling instead of a meaningless position flat_inds=(torch.ones(1)*self.map_size**2/2.)? About 2: does this mean that the camera map is very easy to learn? Even if we use a wrong projection at first, all the pj_2d will be shifted from where they should be; would it adjust to the new alignment quickly?
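For illustration, a hedged sketch of this suggestion; the coordinate convention (GT centers in [-1, 1]) is an assumption, not necessarily ROMP's:

```python
import torch

# Derive flat_inds for the dummy batch-0 / person-0 sample from the GT 2D
# center, so parameters are sampled from where the person should be rather
# than from a fixed position on the map.
def gt_center_to_flat_ind(center_xy, map_size=64):
    # center_xy: (2,) tensor in [-1, 1] -> flat cell index on the map
    cell = ((center_xy + 1) / 2 * map_size).long().clamp(0, map_size - 1)
    return cell[1] * map_size + cell[0]            # row-major: y * width + x

flat_ind = gt_center_to_flat_ind(torch.tensor([0.1, -0.3]))
print(flat_ind)
```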
Thanks for your suggestion about directly using flat_inds. I will try it later. It is indeed easy to learn to shift to the new alignment.
@ZhengdiYu
Thank you for your compliments. What I've done is trivial compared to you and the other contributors. I'm just reading the code and asking questions all the time, not even helpful yet.
I'm new to this area, so I have more questions than others. Hope you don't mind, and thank you for your help all the time. Here comes another one, about the coordinate system:

Q1. Why do we need an alignment like smpl_mesh_root_align, and why sometimes not? If we set it to False, I think we're essentially learning the real 3D position in the camera coordinate system instead of the root-relative position. In that case, I think we don't even need to predict a weak camera parameter; we only need the camera intrinsics to convert to u, v coordinates (the same goes for root-relative 3D pose). Why am I asking this? If I want to put the predicted 3D meshes directly into MeshLab for visualization and smpl_mesh_root_align is True (so we predict root-relative 3D coordinates), the pelvises of all meshes are centered together. Is there a way to put them into their correct positions in 3D space? Your visualization seems to achieve this, so I have just looked into your visualization code. You seem to convert the weak camera parameters to perspective camera parameters in order to recover the original 3D positions, so we can actually get the relative positions in 3D space even without GT root information. Am I understanding correctly? I have also just tried your ROMP_HRNet_32.pkl to get some meshes with your image.py demo. This model should have been trained with smpl_mesh_root_align=False, but the meshes still all take the pelvis as (0,0,0) and are merged together; I don't know why and am still trying to figure it out.

I also think the current evaluation metrics in this area are not fully thought through for multi-person evaluation:

Q2.1 When it comes to multi-person evaluation, why don't we recover all the people to their real 3D positions and evaluate them in a common space at once, instead of aligning and evaluating each person separately as in the single-person case (each person aligned by their own pelvis and evaluated against their corresponding GT)? That kind of evaluation doesn't take the relative position information into account: the people could be at random positions in space and still get good quantitative results, since they're all aligned to their own GT anyway.

Q2.2 Regarding the evaluation, how did you deal with missed persons? I think the quantitative results are currently only calculated for successful detections, so a method that misses a lot of people might still get better quantitative results.

Q3. By the way, I suddenly noticed that in our previous discussion, when you spent one night verifying 'loading the pre-trained backbone or not', your MPJPE in the training loss looked strange: it's too small compared to your previous log, and to his and my logs:
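For reference, a hedged sketch of the weak-perspective to camera-space translation conversion described in Q1; the focal length and image size here are illustrative assumptions, not ROMP's exact constants:

```python
import torch

# Given a predicted weak camera (s, tx, ty) in normalized image coordinates,
# the root translation in camera space can be recovered as
# (tx, ty, 2 * f / (s * img_size)), which is how root-relative meshes can be
# pushed back to approximate positions in a shared 3D space for visualization.
def weak_cam_to_translation(cam, focal_length=443.4, img_size=512):
    s, tx, ty = cam[0], cam[1], cam[2]
    tz = 2 * focal_length / (s * img_size + 1e-9)   # small epsilon avoids division by zero
    return torch.stack([tx, ty, tz])

trans = weak_cam_to_translation(torch.tensor([0.9, 0.05, -0.1]))
print(trans)
```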
About Q1, please note that our input is random internet images without camera parameters. This un-calibrated setting forces us to use a weak-perspective camera model. But recently, some efforts have been made to use a perspective camera, like SPEC.
About Q1,
About Q3, yes, it's your log. I was just wondering why your training MPJPE (47, 48) is nearly 1/3 of the validation MPJPE (140, 150). In your previous log, his log, and my log, this ratio is roughly 1:1 (e.g. 90:95), not 1:3. Did you use a different loss calculation?
Oh, sorry, I haven't figured out the reason. I edit the code every day, and I can't remember which change caused this difference.
Sorry for my late response:
For example, if we have two GT person annotations in the image, namely θ1, β1 and θ2, β2, and we put them into the SMPL layer and output two meshes without any alignment operation, will they be in the correct relative 3D positions, or simply placed together around the pelvis (or root)?
I suggest that you go deeper into the SMPL model, e.g. by understanding the code. Then you would not have such confusions.
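In the meantime, a minimal coordinate-level sketch of the behaviour being asked about; the SMPL forward pass is stubbed out with random vertices, so this only illustrates the role of the per-person translation, not the actual SMPL layer:

```python
import torch

# Without each person's translation, meshes produced by the SMPL layer sit in
# its own model space near the template root, so two people overlap around the
# root; adding the per-person trans restores their relative 3D layout.
def place_in_camera_space(verts_model_space, trans):
    return verts_model_space + trans.view(1, 3)

verts1 = torch.rand(6890, 3) - 0.5          # stand-in for SMPL(theta1, beta1)
verts2 = torch.rand(6890, 3) - 0.5          # stand-in for SMPL(theta2, beta2)
verts2_cam = place_in_camera_space(verts2, torch.tensor([1.0, 0.0, 2.0]))  # 1 m right, 2 m deeper
```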
Thanks for your quick reply and your suggestion; I will look into it tomorrow. By the way, one last thing to confirm: did you preprocess the H36M theta and beta to be root-centered at (0, 0, 0)? I think that without trans, the root position is not originally (0, 0, 0) and is in the world coordinate system. So just to confirm: I guess that for H36M, you only transform theta and beta to the camera coordinate system and use them directly without trans, and you didn't set the root to (0, 0, 0), right?
Hi, I notice that your ROMP_HRNet_32.pkl was trained with smpl_mesh_root_align=False. But in v1.yml, smpl_mesh_root_align is not set, so it takes its default value True. So my questions are:
1. (Solved✔) At first I found that my model had the same issue as ResNet (mesh shift), then I found the reason: image.yml is initially designed for ROMP_HRNet_32.pkl, which was trained with smpl_mesh_root_align=False. If we want to test on images using a model trained from the pre-trained HRNet model with v1.yml, smpl_mesh_root_align in image.yml should also be set to True, just like ResNet (Question about the released Resnet-50 trained models #106). So this was solved.
2. When should smpl_mesh_root_align be True or False (see the sketch at the end of this issue)? Why did you set it to True for v1.yml and ResNet, although it's False for ROMP_HRNet_32.pkl? I think for the 3D joints loss it doesn't matter, as long as we do another alignment before calculating MPJPE/PA-MPJPE. And for the 2D part, the weak camera parameters will automatically be learnt to project those 3D joints to align with GT_2d, as long as the setting stays consistent. So the last question is:
3. During fine-tuning from your model ROMP_HRNet_32.pkl using v1_hrnet_3dpw_ft.yml, smpl_mesh_root_align also takes the default value True; however, ROMP_HRNet_32.pkl was trained with smpl_mesh_root_align=False. As we know from question 1, if we use different settings of smpl_mesh_root_align, the visualization will be shifted, so I think this could be a problem for training and fine-tuning.
And I tried to train with smpl_mesh_root_align from scratch, but it ended up with the error below:
I'm still debugging anyway.
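For reference, a hedged sketch of what the smpl_mesh_root_align switch amounts to, paraphrased from the questions above rather than copied from ROMP:

```python
import torch

# When True, the root/pelvis joint is subtracted from the SMPL outputs so
# vertices and joints become root-relative; when False, the raw SMPL-space
# coordinates are kept. Mixing the two settings between training and testing
# is what shifts the visualized meshes.
def maybe_root_align(verts, joints, root_idx=0, align=True):
    if align:
        root = joints[:, root_idx:root_idx + 1]    # (B, 1, 3) root joint position
        verts = verts - root
        joints = joints - root
    return verts, joints

verts, joints = torch.rand(1, 6890, 3), torch.rand(1, 24, 3)
aligned_verts, aligned_joints = maybe_root_align(verts, joints, align=True)
```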