
Use different input size on train and inference #17

Open
nebuladream opened this issue Dec 11, 2020 · 7 comments
@nebuladream

We fed 256×256 input images to your model trained with 512×512 inputs and adjusted the coordconv size accordingly, but the results look wrong. Does your network require the same input size at the train and test stages?

@Arthur151
Owner

Could you please upload some demo images, so I can figure out the reason?

@nebuladream
Author

[attached screenshot of the incorrect result]
My input size is 256 and the coordconv size is 64.

@Arthur151
Owner

From what I can see, the demo image you uploaded fails at estimating the person's 2D location, right?
The released demo code resizes input images of arbitrary size to 512×512 (see this line). So to get correct results, I think you need to keep the original setting: input size 512 and coordconv size 128.
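The pre-processing described above (bringing an arbitrary-size image to 512×512) can be sketched roughly as pad-to-square followed by a resize. This is a minimal illustration, not the repo's actual code: the function name `pad_and_resize` is made up, and nearest-neighbor interpolation is used only to keep the sketch dependency-free; the real demo may interpolate differently.

```python
import numpy as np

def pad_and_resize(img, target=512):
    """Zero-pad an H x W (x C) image to a centered square, then
    nearest-neighbor resize it to target x target.
    Hypothetical helper sketching the demo's pre-processing."""
    h, w = img.shape[:2]
    side = max(h, w)
    canvas = np.zeros((side, side) + img.shape[2:], dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    # Nearest-neighbor resampling via index mapping.
    idx = np.arange(target) * side // target
    return canvas[idx][:, idx]
```

With this kind of pre-processing, the network always sees a 512×512 tensor regardless of the original image size, which is why the released settings (coordconv 128) match.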

@nebuladream
Author

That confuses me. The network is a fully convolutional network, so why does it fail when the input size changes? Is it a bug in post-processing? The mesh pose looks right, but the camera pose looks wrong. If the center map prediction fails, we cannot get the right human pose, since the mesh parameters are extracted at the locations predicted by the center map.

@Arthur151
Owner

If you change the CoordConv size from 128 to 64, the input to the head network also changes. During training, the head network only ever saw the constant 128-index coordinate input from CoordConv, and I believe the camera parameters are regressed from that input. I can't guarantee that the camera estimation stays stable when its input changes from 128 to 64. Changing the size also changes the feature resolution, and after all, the features were learned at a different resolution.
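To see why shrinking the coordinate map shifts the head network's input distribution, here is a minimal CoordConv-style sketch. Assuming the coordinate channels carry raw pixel indices (as the "128-index input" wording above suggests), a 128×128 map feeds the head values in 0..127 while a 64×64 map only reaches 0..63; the function below is a hypothetical illustration, not the repo's implementation.

```python
import numpy as np

def add_coord_channels(feat):
    """Append raw-index x/y coordinate channels to a (C, H, W)
    feature map, CoordConv-style. With raw indices the value range
    depends on the map size, so a head trained at 128 sees an
    unfamiliar range at 64. Sketch only; the repo may differ."""
    c, h, w = feat.shape
    yy, xx = np.meshgrid(np.arange(h, dtype=float),
                         np.arange(w, dtype=float), indexing="ij")
    return np.concatenate([feat, xx[None], yy[None]], axis=0)
```

If the coordinate channels were instead normalized to a fixed range such as [-1, 1], the value range would be size-independent, though the feature resolution argument above would still apply.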

@nebuladream
Author

[attached screenshot of the corrected result]
We tried a lower-resolution 224×128 image as input, but padded it to 512×512, and the result looks right.
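The padding trick that worked here is different from resizing: the small image is placed unscaled on a zero-filled 512×512 canvas, so the network still receives the resolution and coordinate range it was trained on. A minimal sketch, with a hypothetical helper name:

```python
import numpy as np

def pad_to_canvas(img, size=512):
    """Place a small image, unscaled, in the top-left corner of a
    zero-filled size x size canvas. The rest stays black padding.
    Hypothetical helper mirroring the padding trick described above."""
    h, w = img.shape[:2]
    canvas = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    canvas[:h, :w] = img
    return canvas
```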

@Arthur151
Owner

If you want to make full use of the resources, you could tile 4–8 of the 224×128 images into a single 512×512 input.
Glad to see it works with padding.
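The batching trick suggested above could be sketched with a simple shelf layout: fill tiles left to right and start a new row when one no longer fits. This is an assumption about how the tiling might be done, not code from the repo; `tile_images` is a made-up name.

```python
import numpy as np

def tile_images(imgs, size=512):
    """Pack several small images into one size x size canvas using a
    shelf layout: place left-to-right, open a new row when a tile
    no longer fits, stop when the canvas is full. Sketch only."""
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    x = y = row_h = 0
    for img in imgs:
        h, w = img.shape[:2]
        if x + w > size:            # start a new row (shelf)
            x, y, row_h = 0, y + row_h, 0
        if y + h > size:            # canvas full; drop the rest
            break
        canvas[y:y + h, x:x + w] = img
        x += w
        row_h = max(row_h, h)
    return canvas
```

Eight 224×128 tiles fit exactly in two rows of four on a 512×512 canvas; the detector would then see all of them in one forward pass, though person centers near tile borders might need care.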
