Predicted parameters of the weak perspective projection #142

Open
longbowzhang opened this issue Jun 9, 2020 · 2 comments
Comments

longbowzhang commented Jun 9, 2020

Hi @akanazawa, sorry to bother you.
I am confused about the predicted parameters of the weak perspective projection.

  1. You mentioned that the scale s that HMR recovers is essentially focal_length/z, but the following line

         tz = flength / (0.5 * img_size * cam_s)

     suggests that 0.5 * img_size also comes into play. Why?

  2. This line of code

         vert_shifted = verts + trans

     suggests that verts and trans (where trans = np.hstack([cam_pos, tz])) are in the same space, but which space is that?

Thus, could you elaborate a little bit on the parameters of this weak perspective projection?

Thanks in advance.

jszgz commented Sep 5, 2020

Hello, do you know how to use mpi_inf_3dhp_to_tfrecords.py to convert the mpi_inf_3dhp dataset? It failed for me because the code expects jpg images as input, but the dataset I downloaded consists of videos. Do I need to use ffmpeg and write code to convert the avi files to jpg?

nnop commented Jun 29, 2024

In case someone comes across this issue:
For the first question: the keypoints are normalized to [-1, 1] during data preprocessing.

hmr/src/data_loader.py

Lines 320 to 325 in f149abe

    # Normalize kp output to [-1, 1]
    final_vis = tf.cast(crop_kp[2, :] > 0, tf.float32)
    final_label = tf.stack([
        2.0 * (crop_kp[0, :] / self.output_size) - 1.0,
        2.0 * (crop_kp[1, :] / self.output_size) - 1.0, final_vis
    ])

So the predicted s should be rescaled by 0.5 * img_size to map back to image pixels.
That gives tz = f / (0.5 * img_size * cam_s). This is a subtle detail.
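
The rescaling can be written out as a short derivation. This is only a sketch under two assumptions that are not stated in this thread: the network's projection has the form u_norm = s (X + t_x), and the principal point sits at the image centre. Here w stands for img_size, f for flength and s for cam_s.

```latex
% Assumptions: u_norm = s (X + t_x), principal point c = w / 2.
\begin{align*}
u_{\text{norm}} &= 2\,\frac{u_{\text{px}}}{w} - 1
  \;\Longrightarrow\; u_{\text{px}} - \frac{w}{2} = \frac{w}{2}\,u_{\text{norm}}
  && \text{(undo the $[-1, 1]$ normalization)} \\
u_{\text{px}} - \frac{w}{2} &= \frac{w}{2}\, s\,(X + t_x)
  && \text{(weak perspective, in pixels)} \\
u_{\text{px}} - \frac{w}{2} &= f\,\frac{X + t_x}{Z + t_z} \approx \frac{f}{t_z}\,(X + t_x)
  && \text{(pinhole camera, } |Z| \ll t_z\text{)} \\
\frac{f}{t_z} &= \frac{w}{2}\, s
  \;\Longrightarrow\; t_z = \frac{f}{0.5\, w\, s}
\end{align*}
```

So s equals focal_length/z only once it is expressed in pixel units; in the normalized [-1, 1] units the factor 0.5 * img_size has to be carried along.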

For the second question: they are in the camera frame, which is not consistent with the equation in the paper.
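
A small numerical check of the camera-frame reading, as a sketch rather than the repository's code. It assumes the normalized projection has the form s * (X + t) and that the principal point is the image centre; the function names and all sample values (image size, focal length, cam, verts) are made up for illustration.

```python
import numpy as np

def weak_perspective_px(verts, cam, img_size):
    """Project with the predicted [s, tx, ty], then undo the [-1, 1] normalization."""
    s, t = cam[0], cam[1:]
    kp_norm = s * (verts[:, :2] + t)          # assumed form of the weak-perspective projection
    return 0.5 * img_size * (kp_norm + 1.0)   # inverse of 2 * (kp / img_size) - 1

def pinhole_px(verts, cam, img_size, flength):
    """Shift the points into the camera frame and apply a pinhole projection."""
    tz = flength / (0.5 * img_size * cam[0])
    trans = np.hstack([cam[1:], tz])          # same construction as in the question
    vert_shifted = verts + trans              # camera-frame coordinates
    principal_pt = 0.5 * img_size             # assumed: image centre
    return flength * vert_shifted[:, :2] / vert_shifted[:, 2:3] + principal_pt

# Hypothetical values for illustration only.
img_size, flength = 224, 500.0
cam = np.array([0.9, 0.05, -0.1])             # [s, tx, ty]
verts = np.array([[0.2, -0.3, 0.0],
                  [-0.4, 0.5, 0.0]])          # points on the z = 0 plane

weak = weak_perspective_px(verts, cam, img_size)
persp = pinhole_px(verts, cam, img_size, flength)
print(np.allclose(weak, persp))
```

For points with z = 0 the two projections coincide. For non-zero depth they differ by a factor tz / (tz + z), which stays close to 1 as long as the body is far from the camera relative to its own depth extent.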
