pretrained weights #1
Comments
Hi! I started from the weights provided for Keras by this gist. Those weights were ported from the original Caffe repo. In the Keras porting gist, a simple prediction script is presented along with its results, which I copy-paste here:
Now, if I run the Keras code loading the weights provided, I obtain
So, something in the Keras weights drifted a bit. My PyTorch port yields these last results. This means my Keras->PyTorch port is correct, but the discrepancy between my Keras and the gist's Keras propagates to PyTorch as well :( I hope I made the issue clear. I tried downgrading Keras to many older versions (back to 1.0.2), but still obtained the same results. I'll leave this issue open since I'm still trying to align perfectly with the Keras gist. Best,
I saw your predict.py, but I didn't find the mean subtraction step. Maybe this is the reason!
Has this been resolved by @happygds's solution?
@DavideA I used your code with the 'c3d.pickle' weights in PyTorch 0.4.0 to predict the action of Roger Federer, but I got results like this:
@BarryBA I just ran the prediction script (without removing the mean) with PyTorch 0.4.0 and it gives me the correct results. Really weird.
I made a mistake in the stride and padding sizes of the pool5 layer. Now the results are correct!
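For reference, here is a sketch of why the pool5 padding matters, assuming the standard C3D geometry (conv5b emitting 512 x 2 x 7 x 7 for a 3 x 16 x 112 x 112 clip; these shapes are my assumptions for illustration, not code taken from this thread):

```python
import torch
import torch.nn as nn

# Without spatial padding, pool5 shrinks 7x7 to 3x3: 512*1*3*3 = 4608
# features, which cannot feed an fc6 weight of shape 8192 x 4096.
pool5_wrong = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

# With padding (0, 1, 1), 7x7 pools to 4x4: 512*1*4*4 = 8192 features.
pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1))

x = torch.randn(1, 512, 2, 7, 7)  # assumed conv5b output for a 16-frame clip
print(pool5_wrong(x).flatten(1).shape)  # torch.Size([1, 4608])
print(pool5(x).flatten(1).shape)        # torch.Size([1, 8192])
```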
Hi,
Hi, and thanks for your interest. I can share the snippet to save parameters from Keras (v1.2.2) to file:
Then, as you build the PyTorch model, you can load them into it. Concerning the mean: if you train subtracting the mean, you test subtracting the mean; otherwise you don't. I am not sure whether this step was performed in the original Caffe implementation; I just wanted to reproduce the Keras gist mentioned above. Hope this helps,
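To make the transfer concrete, here is a hedged sketch of the axis reshuffling typically involved. This is a hypothetical illustration: the exact kernel layout depends on the Keras backend and dim_ordering, so treat the transposes below as assumptions to verify against your own weights.

```python
import numpy as np
import torch

def keras_conv3d_to_pytorch(kernel, bias):
    """Assumes a TF-ordered Keras kernel (D, H, W, C_in, C_out);
    PyTorch Conv3d expects (C_out, C_in, D, H, W)."""
    w = np.ascontiguousarray(np.transpose(kernel, (4, 3, 0, 1, 2)))
    return torch.from_numpy(w), torch.from_numpy(bias)

def keras_dense_to_pytorch(kernel, bias):
    """Keras Dense kernels are (in, out); nn.Linear expects (out, in)."""
    return torch.from_numpy(np.ascontiguousarray(kernel.T)), torch.from_numpy(bias)

# Dummy arrays standing in for weights exported with model.get_weights():
conv_w, conv_b = keras_conv3d_to_pytorch(
    np.random.randn(3, 3, 3, 3, 64).astype(np.float32),
    np.zeros(64, dtype=np.float32),
)
fc6_w, fc6_b = keras_dense_to_pytorch(
    np.random.randn(8192, 4096).astype(np.float32),
    np.zeros(4096, dtype=np.float32),
)
print(conv_w.shape, fc6_w.shape)
```

The resulting tensors can then be copied into the PyTorch model's parameters (e.g. via its state dict).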
Thank you. I know how to transfer the weights now. I thought there might be a function to transfer weights from Keras to PyTorch directly, but apparently there is no such function.
Hi,
@EMCL The prediction discussed in this thread is not about that video; they are talking about the video used here: https://gist.github.com/albertomontesg/d8b21a179c1e6cca0480ebdf292c34d2 I tested it on PyTorch 1.0 + CUDA 9.2 + Python 3.6.5, and it still works! I was able to get the following.
Also, after applying the mean subtraction mentioned in #4, I was able to get the following.
Hope this helps!
OK, I noticed the mean provided in #4 is actually wrong: that mean is for ImageNet, but the mean should come from Sports1M. Luckily, I found the mean file here: https://github.com/albertomontesg/keras-model-zoo/blob/master/kerasmodelzoo/data/c3d_mean.npy This should originally come from https://github.com/facebook/C3D/blob/master/C3D-v1.0/examples/c3d_feature_extraction/sport1m_train16_128_mean.binaryproto. I checked it. In short, I guess we should add:
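A hedged sketch of what that addition might look like, assuming the Sports1M mean file has shape (1, 3, 16, 128, 171) (channels x frames x height x width, matching the Caffe binaryproto) and that frames are already resized to 128x171; the dummy mean array below is a stand-in for the actual file:

```python
import numpy as np

# In practice: mean = np.load('c3d_mean.npy')
mean = np.zeros((1, 3, 16, 128, 171), dtype=np.float32)  # stand-in for the file

def preprocess_clip(clip_rgb):
    """clip_rgb: float32 array of shape (16, 128, 171, 3), RGB, values 0-255."""
    clip = clip_rgb[:, :, :, ::-1]           # RGB -> BGR, matching Caffe
    clip = np.transpose(clip, (3, 0, 1, 2))  # -> (3, 16, 128, 171)
    clip = clip - mean[0]                    # subtract the per-pixel mean
    clip = clip[:, :, 8:120, 30:142]         # center-crop to 112 x 112
    return clip

out = preprocess_clip(np.random.rand(16, 128, 171, 3).astype(np.float32) * 255)
print(out.shape)  # (3, 16, 112, 112)
```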
After all these changes, I am getting
for the images in this repo, and
for the video dM06AMFLsrc.mp4. P.S. I checked several other C3D repos imported from Caffe, but it seems most do not correctly handle the mean subtraction and BGR ordering...
Hi everybody, The last post of @apple2373 is helpful. The original mean file, computed on Sports1M and provided in many C3D Caffe repos, is of size 3x16x128x171 (channels x frames x height x width). In these repos, a video volume is preprocessed by 1) resizing every frame to 128x171, 2) subtracting the mean from the video volume, and 3) center-cropping the volume to 112x112 by keeping the pixels [8:120, 30:142].
This seems like a fairer reproduction of the original C3D Caffe repo, as it does not compute a single channel-wise mean value across all spatial locations and frames. However, it requires resizing the frames to 128x171 first. One thing that needs to be clarified is the order of the channels in the mean file and the order of the channels in the image expected by C3D. I got the aforementioned results by loading an image with RGB-ordered channels, subtracting the mean file as is, and feeding the image as is. When reordering the first and third channels of the mean, the results (see below) weren't disappointing, so I wouldn't exclude the possibility that the mean file has BGR-ordered channels. Any ideas? Here is a list of these tests (all of them including resizing to 128x171 and center-cropping to 112x112): Reorder the image from RGB to BGR, then subtract the mean as is:
Reorder the first and third mean channels, then subtract the mean:
Subtract the mean, then reorder the cropped image's channels from RGB to BGR:
Thank you all for your comments. I guess the only way to validate the preprocessing is to measure test-set accuracy on Sports1M. Taking a quick peek at the dataset, it seems like a non-trivial task. I will hopefully get some time to do it in the near future. D
@BarryBA Could you please let me know your corresponding stride and padding sizes for the pool5 layer?
Firstly, thank you for this C3D implementation in PyTorch! net = C3D() I end up with an error as follows: RuntimeError: size mismatch, m1: [2048 x 4], m2: [8192 x 4096] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:266 Am I doing something wrong? Any help would be appreciated.
It seems like you cropped the network at fc6. That means you dropped the model's classifier, so you cannot make predictions. One way you can fine-tune up to fc6 is to keep the full model but stop at fc6. Instead of doing this:
you do this:
or something equivalent. Hope this helps,
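As an illustration of that pattern, here is a toy sketch: keep the full model so pretrained weights still load, and simply stop the forward pass at fc6. The layer sizes below are simplified stand-ins, not the repo's exact C3D definition.

```python
import torch
import torch.nn as nn

class TinyC3D(nn.Module):
    """Simplified stand-in for C3D: a conv backbone plus fc6/fc7/fc8."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc6 = nn.Linear(8, 4096)
        self.fc7 = nn.Linear(4096, 4096)
        self.fc8 = nn.Linear(4096, 487)

    def forward(self, x, until_fc6=False):
        x = self.features(x).flatten(1)
        x = torch.relu(self.fc6(x))
        if until_fc6:
            return x            # 4096-d feature vector
        x = torch.relu(self.fc7(x))
        return self.fc8(x)      # class scores

model = TinyC3D()
feats = model(torch.randn(1, 3, 16, 112, 112), until_fc6=True)
print(feats.shape)  # torch.Size([1, 4096])
```

Because no layers are deleted, a pretrained state dict still loads cleanly, and the classifier stays available if you need it later.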
Thank you for your reply, you are right: I do not want the classification part of the model. I just want the 4096-d vector output by the fc6 layer, which will then serve as the input to my network. So I just need the weights up to the fc6 layer, applied to the video input, to get the 4096-d feature vector... Here is my entire code: from C3D_Model_RTA import C3D class C3D_Model(nn.Module):
def c3Dfeatures(vector):
data_reshaped = np.load('pickle file') # load the pickle file of the video no_of_groups = data_reshaped.shape[1] I get the same error as mentioned in my previous comment, i.e. RuntimeError: size mismatch, m1: [2048 x 4], m2: [8192 x 4096] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:266 My intuition is that the error occurs somewhere while flattening/reshaping the pool5 output into the 8192-d vector.
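As a debugging aid (my own sketch, not code from this thread): in such errors, m1 is the shape actually fed to fc6, so checking the flattened pool5 output against fc6's expected 8192 inputs usually reveals whether the clip size or the reshape is at fault. A shape-safe flatten keeps the batch dimension and fails loudly:

```python
import torch

def flatten_for_fc6(pool5_out, expected=8192):
    """Flatten pool5 output to (batch, features), checking fc6's input size."""
    flat = pool5_out.reshape(pool5_out.size(0), -1)  # keep the batch dim
    if flat.size(1) != expected:
        raise ValueError(
            f"fc6 expects {expected} features but got {flat.size(1)}; "
            "check that the input clip is 3 x 16 x 112 x 112"
        )
    return flat

# A correctly shaped pool5 output: 512 * 1 * 4 * 4 = 8192 features.
ok = flatten_for_fc6(torch.randn(2, 512, 1, 4, 4))
print(ok.shape)  # torch.Size([2, 8192])
```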
Hi!
Hi guys, thanks for the explanations about the normalization. I tried the following normalization method, and the result is pretty good. If you are interested, you can definitely give it a try, too! :) With normalization & channel swap:
Results:
Only with normalization, no channel swap:
Results:
Any conclusions on how to properly feed the model with RGB clips? What are the correct normalization and cropping steps?
Hi,
First, thank you very much for contributing this C3D implementation in PyTorch!
I had a question about the origin of the pretrained weights: did you obtain them by converting them from another source, or by training the network yourself?