
VideoDataset has a bug about data normalization #23

Open

leftthomas opened this issue Apr 5, 2019 · 6 comments

Comments


leftthomas commented Apr 5, 2019

There is a bug in the normalize(self, buffer) function in dataset.py: it does not scale the data to [0, 1], which is the usual practice when training deep learning models with PyTorch.
I also tested this. Without the scaling, training failed completely when I used the official train/test split of UCF101: after 54 epochs, the test accuracy was only around 5%.
With the scaling, training went fine: after only 5 epochs it already reached 8.2% test accuracy.

def normalize(self, buffer):
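
For reference, a minimal sketch of what normalize() appears to do, reconstructed from the discussion below rather than copied from the repo; the per-channel mean values are illustrative placeholders, and buffer is assumed to be a float NumPy array of frames:

import numpy as np

def normalize(self, buffer):
    # Per-channel mean subtraction on each frame; note there is no
    # division by 255 afterwards, so pixel values stay roughly in
    # [-128, 255] rather than being scaled to [0, 1].
    for i, frame in enumerate(buffer):
        frame -= np.array([[[90.0, 98.0, 102.0]]])  # illustrative per-channel means
        buffer[i] = frame
    return buffer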

leftthomas changed the title from "VideoDataset have a bug about data normalization" to "VideoDataset has a bug about data normalization" on Apr 5, 2019
@wave-transmitter

Why do you think this is a bug? Scaling data to [0, 1] is not the only option. Subtracting the mean RGB values of the dataset used for the backbone's pre-training (usually ImageNet) is also common, and the normalize() function follows this approach.

If you want to prove that scaling data to [0, 1] leads to higher performance, you have to elaborate more on this. The results you provided are not comparable to each other. You could validate your claim by training your model once with each of the two normalization approaches and reporting the results for the same number of epochs.


leftthomas commented Apr 5, 2019

@wave-transmitter The common practice is that normalization is done after the data has been scaled to [0, 1]: in PyTorch we usually call ToTensor(), which converts the data to [0, 1], and only then apply Normalize ops. But this repo defines its own totensor() and normalize() functions, and they never scale the data to [0, 1], unlike the official PyTorch examples.
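
For context, the usual torchvision pipeline looks like the sketch below; the ImageNet mean/std values are the ones commonly used in PyTorch examples, not something taken from this repo:

from torchvision import transforms

# ToTensor() converts a uint8 HxWxC image in [0, 255] to a float
# CxHxW tensor in [0, 1]; Normalize() is applied only after that.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet means
                         std=[0.229, 0.224, 0.225]),  # ImageNet stds
])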

I have tested this with UCF101 split1: without scaling the data to [0, 1], the test accuracy is around 5% at epoch 15; with the scaling, it is around 25% at epoch 15. If you don't believe me, you can try it with UCF101 split1 yourself (not with the sklearn random split provided by this repo) and you will see the same result.

@wave-transmitter

It's not that I don't believe you; I am just trying to understand whether you are making a fair comparison between the two normalization methods. You should give more details about your set-up; you haven't even mentioned which model you are trying to train...

In my opinion, if you want to evaluate both methods, you should compare the results after a number of epochs at which both models have converged. E.g. you can apply early stopping once 99.9% accuracy is reached on the training set, or just train for a higher number of epochs. I have also trained the C3D model (without any changes) on the official split1 of UCF101 and posted the results in #14. The 5% accuracy at 15 epochs that you reported is not consistent with those results in #14.


leftthomas commented Apr 8, 2019

@wave-transmitter I trained C3D on the official split1 from scratch, without the pre-trained model. You can test the from-scratch C3D yourself by changing one line of code in the normalize function to frame = frame / 255.0, and you will see the result.
In this repo, the input tensor values are large, e.g. 233.7 or -45.2. This is uncommon in deep learning training and easily causes value-overflow problems, because convolution ops are essentially matrix multiplications. This is why people have reported issues with NaN loss values, as mentioned in #17. If you scale the data to [0, 1], the NaN problem goes away.
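
A minimal sketch of the suggested one-line change, assuming normalize() loops over the frames in buffer as above (the surrounding structure is paraphrased, not copied from the repo):

def normalize(self, buffer):
    for i, frame in enumerate(buffer):
        # The one-line change: scale pixel values from [0, 255] to
        # [0, 1], matching what torchvision's ToTensor() would do.
        frame = frame / 255.0
        buffer[i] = frame
    return buffer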

@jamshaidwarraich

Could you share the paper link?

@shanchao0906

How should the code be modified? My training loss is always NaN.
