Added GRU to achieve video consistency #14

Open: wants to merge 1 commit into main

Jerry-Master

First of all, your work is amazing! I absolutely love these results, especially in combination with Matte Anything. However, the main obstacle to using this type of model in real applications is temporal inconsistency: because the model is applied frame by frame, it cannot achieve temporal consistency on videos. This pull request attempts to bring in all the features that made RobustVideoMatting temporally consistent, so that you can easily retrain and check whether they solve the temporal inconsistency problem.

The main change is the addition of convolutional GRUs (ConvGRU) to the detail capture module. To make it possible to reuse already-trained models, I add the ConvGRU layers the way ControlNet adds its branches: they are initialized at zero and attached through residual connections, so at the start of training the network behaves like the pretrained per-frame model. A hidden state is then shared across frames, which lets the model learn temporal consistency. Since the architecture change alone is not enough, I have also added a loss function that explicitly guides the model toward temporal consistency.
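A minimal PyTorch sketch of the idea above, assuming a ConvGRU cell in the style of RobustVideoMatting and a ControlNet-style zero-initialized 1x1 output projection on a residual branch. The class names `ConvGRU` and `ZeroInitTemporalBlock`, the kernel size, and where the block sits in the decoder are hypothetical illustrations, not the code in this PR.

```python
from typing import Optional

import torch
from torch import nn


class ConvGRU(nn.Module):
    """Convolutional GRU cell operating on (B, C, H, W) feature maps."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.channels = channels
        padding = kernel_size // 2
        # Reset (r) and update (z) gates, computed jointly from [x, h].
        self.gates = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size, padding=padding),
            nn.Sigmoid(),
        )
        # Candidate state, computed from [x, r * h].
        self.candidate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size, padding=padding),
            nn.Tanh(),
        )

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        r, z = self.gates(torch.cat([x, h], dim=1)).split(self.channels, dim=1)
        c = self.candidate(torch.cat([x, r * h], dim=1))
        return (1 - z) * h + z * c  # blend previous state and candidate


class ZeroInitTemporalBlock(nn.Module):
    """Residual ConvGRU branch whose output projection starts at zero.

    At initialization the block is an identity map on x, so the
    pretrained per-frame layers around it see unchanged features.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.gru = ConvGRU(channels)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, x: torch.Tensor, h: Optional[torch.Tensor] = None):
        # x: (B, T, C, H, W) clip; h: (B, C, H, W) state carried between clips.
        B, T, C, H, W = x.shape
        if h is None:
            h = x.new_zeros(B, C, H, W)
        states = []
        for t in range(T):  # recurrent pass over the time axis
            h = self.gru(x[:, t], h)
            states.append(h)
        seq = torch.stack(states, dim=1)  # (B, T, C, H, W)
        delta = self.proj(seq.reshape(B * T, C, H, W)).reshape(B, T, C, H, W)
        return x + delta, h


# Usage: process a clip, then pass h on to continue with the next clip.
block = ZeroInitTemporalBlock(channels=16)
clip = torch.randn(1, 4, 16, 32, 32)
out, state = block(clip)          # equals clip at initialization
out_next, state = block(clip, state)
```

Because only the projection starts at zero while the GRU states are nonzero, the projection receives gradients from the first step, so the temporal branch starts learning without disturbing the pretrained path.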

Most of the code is adapted from the RobustVideoMatting repository. To avoid breaking anything, I duplicated the affected files and added a '_video' suffix. The code should be backward compatible, except that it works with 5D tensors instead of 4D. I tried to integrate it as closely as possible so that you can try this idea quickly. However, I am aware that the difficult part, managing the data, is not included in this pull request: you would need to download the RobustVideoMatting datasets and train on them.
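Since the video variant consumes 5D clips of shape (B, T, C, H, W), here is a hedged PyTorch sketch of two pieces this implies: a hypothetical `TimeDistributed` wrapper that folds the time axis into the batch so existing 4D layers keep working, and a hypothetical `temporal_coherence_loss` modeled on the temporal coherence term described in the RobustVideoMatting paper (matching frame-to-frame alpha differences). The names are illustrative; the PR's actual files and loss weighting may differ.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TimeDistributed(nn.Module):
    """Apply a per-frame (4D) module to a clip of shape (B, T, C, H, W).

    Folding time into the batch dimension lets existing image layers be
    reused unchanged; plain 4D inputs are passed straight through.
    """

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 4:
            return self.module(x)
        B, T = x.shape[:2]
        y = self.module(x.flatten(0, 1))      # (B*T, C', H', W')
        return y.reshape(B, T, *y.shape[1:])  # back to (B, T, C', H', W')


def temporal_coherence_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE between frame-to-frame changes of predicted and true alpha.

    pred, target: (B, T, 1, H, W) with T >= 2. Penalizes flicker that the
    ground truth does not contain while still allowing genuine motion.
    """
    if pred.shape != target.shape or pred.dim() != 5 or pred.shape[1] < 2:
        raise ValueError("expected matching (B, T, C, H, W) tensors with T >= 2")
    pred_dt = pred[:, 1:] - pred[:, :-1]
    target_dt = target[:, 1:] - target[:, :-1]
    return F.mse_loss(pred_dt, target_dt)
```

In training, a term like this would be added to the existing per-frame matting losses; a constant per-pixel error across frames is not penalized by it, so it complements rather than replaces them.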

I would be glad to help with any questions or contribute further if you give me directions on the hardware or environment you use. I really want this model to be temporally consistent so that it can be used in real-world applications.


skyler14 commented Oct 4, 2023


I was wondering whether you went ahead and trained a model with this, or whether anything further happened?
