
To get cam_intrinsics and cam_extrinsics from .npz files #300

Open · MoyGcc opened this issue Jul 7, 2022 · 12 comments

@MoyGcc

MoyGcc commented Jul 7, 2022

Hi Yu, thanks for your work and such an organized repo!

I'm now using ROMP to get SMPL poses and would like to visualize the meshes via a perspective camera. I usually use an approach similar to chungyiweng/humannerf#1 to convert s, t_x, and t_y, along with a human bbox, into pinhole camera parameters, and it does work on VIBE output. However, it seems I cannot easily get these parameters from ROMP's .npz outputs (I can get a rough bbox from pj2d_org). I also found that the scaling factor s is quite different between the VIBE and ROMP estimates for the same input image (~1.14 in VIBE vs. ~0.58 in ROMP). Could you please point out how I can quickly obtain (estimate) the camera intrinsics and extrinsics? Thanks!
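
For context, a minimal sketch of the weak-perspective-to-translation conversion referenced above (one common variant, as used in SPIN/VIBE-style code; weak_to_full_translation is a hypothetical helper name, not part of either repo):

    import numpy as np

    # One common weak-perspective -> perspective conversion (SPIN/VIBE-style):
    # the predicted camera (s, t_x, t_y) becomes a full 3D translation whose
    # depth t_z follows from the focal length, the crop size, and the scale s.
    def weak_to_full_translation(s, tx, ty, focal_length, crop_size):
        tz = 2. * focal_length / (crop_size * s + 1e-9)  # depth implied by scale
        return np.array([tx, ty, tz])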

@Arthur151
Owner

Hi, @MoyGcc
Thanks for your kind words!
You can use this function to achieve that:

def estimate_translation_cv2(joints_3d, joints_2d, focal_length=600, img_size=np.array([512.,512.]), proj_mat=None, cam_dist=None):

It takes the estimated 3D joints, the 2D joints pj2d_org, the image size, and the focal length, and estimates the corresponding 3D translation in the camera space defined by these intrinsic parameters.
In BEV, we calculate the focal length like this: BEV takes a square 512 x 512 input, and we assume FOV = 60 degrees, so
focal_length = H/2 * 1/tan(FOV/2) = 512/2. * 1./np.tan(np.radians(30)) = 443.4
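
A minimal sketch of that calculation and the pinhole intrinsics it implies (assuming BEV's square 512 x 512 input and FOV = 60 degrees, per the comment above):

    import numpy as np

    # Focal length for BEV's square 512 x 512 input, assuming FOV = 60 degrees
    H = 512.
    fov_deg = 60.
    focal_length = H / 2. / np.tan(np.radians(fov_deg / 2.))  # ~443.4

    # Pinhole intrinsics implied by this focal length and image size,
    # with the principal point at the image center
    K = np.array([[focal_length, 0., H / 2.],
                  [0., focal_length, H / 2.],
                  [0., 0., 1.]])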

@hongsiyu

hongsiyu commented Jul 8, 2022

> I'm now using ROMP to get SMPL poses and would like to visualize the meshes via a perspective camera. […] Could you please point out how I can quickly obtain (estimate) the camera intrinsics and extrinsics?

Have you solved this problem? I met the same issue.

@MoyGcc
Author

MoyGcc commented Jul 8, 2022

Hi Yu @Arthur151,
Thanks so much for the quick reply and for pointing out the correct way to do this. In the end, I followed the way that you applied for evaluation on AGORA:

def save_agora_predictions_v6(image_path, outputs, save_dir):

and now the projected SMPL mesh aligns well with the image. There is still a slight difference in the projection (below, the mesh with the normal color is my projected result), but I think it's okay. @hongsiyu, you could also refer to the evaluation code for the AGORA dataset for doing this.

[two images: the input frame with the projected SMPL mesh overlaid, comparing the normal-colored projection against the reference render]
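
For reference, a rough sketch of the projection check described above (project_points is a hypothetical helper; verts and cam_trans stand in for ROMP's per-person vertices and the estimated translation):

    import numpy as np

    def project_points(points_3d, K):
        # Perspective-project (N, 3) camera-space points with 3x3 intrinsics K
        proj = points_3d @ K.T
        return proj[:, :2] / proj[:, 2:3]  # divide by depth -> (N, 2) pixels

    # verts: (N, 3) SMPL vertices, cam_trans: (3,) translation from the PnP solve
    # pixels = project_points(verts + cam_trans, K)
    # Overlaying `pixels` on the image gives the alignment check shown above.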

@Arthur151
Owner

That's clever. Glad to hear that.

@Andyen512

So the intrinsics are [[443.4, 0, 512//2], [0, 443.4, 512//2], [0, 0, 1]], and extrinsics[:3, 3] = cam_trans, right? @MoyGcc

@hongsiyu

hongsiyu commented Jul 8, 2022

> In the end, I followed the way that you applied for evaluation on AGORA […] and now the projected SMPL mesh aligns well with the image.

I followed the way you mentioned with my own video, but the progress images in humannerf don't seem correct. Did you succeed in training humannerf with the AGORA dataset?

@Arthur151
Owner

Arthur151 commented Jul 8, 2022

@Andyen512
No, the image size should be the original size of the input image, not the size of BEV's resized input map.
It is fine to directly use humannerf's camera intrinsics when calculating the 3D translation with estimate_translation:

"cam_intrinsics": [
            [23043.9, 0.0,940.19],
            [0.0, 23043.9, 539.23],
            [0.0, 0.0, 1.0]
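
As a sketch of what that translation estimate can look like with OpenCV (this mirrors the idea of estimate_translation_cv2; the exact signature and solver flags in the repo may differ):

    import cv2
    import numpy as np

    def estimate_translation(joints_3d, joints_2d, K):
        # Solve for the camera-space translation of a person whose global
        # rotation is already baked into joints_3d, with fixed intrinsics K
        ok, rvec, tvec = cv2.solvePnP(
            joints_3d.astype(np.float64),   # (N, 3) estimated 3D joints
            joints_2d.astype(np.float64),   # (N, 2) pixel coords, e.g. pj2d_org
            K.astype(np.float64),           # 3x3 camera intrinsics
            None,                           # assume no lens distortion
            flags=cv2.SOLVEPNP_EPNP)
        return tvec.reshape(3)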

@hongsiyu

hongsiyu commented Jul 8, 2022

> It is fine to directly use humannerf's camera intrinsics when calculating the 3D translation with estimate_translation […]

Thank you very much; with this focal length I succeeded in training humannerf.

@Andyen512

Andyen512 commented Jul 8, 2022

@Arthur151 Sorry, why use humannerf's cam_intrinsics? I was using romp --mode=video --calc_smpl --render_mesh -i=/path/to/video.mp4 -o=/path/to/output/folder/results.mp4 --save_video to run inference on my own video, and I see that args.focal_length in

V6_group.add_argument('--focal_length', type=float, default=443.4, help='Default focal length, adopted from JTA dataset')

defaults to 443.4. Also, the original size of the input image is 1920*1080, so why not cam_intrinsics[0][2] = 960 and cam_intrinsics[1][2] = 540?
I'm confused.

@Arthur151
Owner

@Andyen512
That focal length (23043.9) and those image-center coordinates (940.19, 539.23) are just the values humannerf used for training, taken from their camera intrinsics.

To run inference on your own video, you can re-calculate the focal length:
when FOV = 60 deg, focal_length = W/2 * 1/tan(FOV/2) = 1920/2. * 1./np.tan(np.radians(30)) = 1662.768
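
Putting that together for a 1920x1080 video, a minimal sketch (assuming identity rotation for the extrinsics, since the global orientation is already in the SMPL pose, and cam_trans as the per-person translation estimated above):

    import numpy as np

    W, H = 1920., 1080.
    fov_deg = 60.
    focal_length = W / 2. / np.tan(np.radians(fov_deg / 2.))  # ~1662.768

    # 3x3 intrinsics with the principal point at the image center
    cam_intrinsics = np.array([[focal_length, 0., W / 2.],
                               [0., focal_length, H / 2.],
                               [0., 0., 1.]])

    # 4x4 extrinsics: identity rotation (global orient lives in the SMPL pose),
    # translation filled in from the estimated cam_trans
    cam_extrinsics = np.eye(4)
    # cam_extrinsics[:3, 3] = cam_trans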

@Andyen512

OK, thanks, I'll try.

@mch0dmin


Hi @hongsiyu, can you tell me how to use ROMP to obtain the 3x3 cam_intrinsics and the 4x4 cam_extrinsics? Thanks.
