A novel image harmonization method based on Implicit Neural Representation.

WindVChen/INR-Harmonization

Give us a ⭐ if this repo helps!

This repository is the official implementation of HINet (or INR-Harmonization), which achieves image harmonization at arbitrary aspect ratios and arbitrary resolutions. The newest version of the paper can be accessed in [IEEE]; the previous version can be accessed in [arXiv]. (Accepted by TCSVT 2023)

If you have any questions, please feel free to contact us. You can create an issue or just send an email to me at [email protected]. Ideas and discussion are also welcome.

Updates

[10/15/2023] Our method now also generalizes well to video harmonization! We have released the training and inference code for video harmonization, along with the weights pretrained on the HYouTube dataset! 🚀🚀 For more details, please refer to the Requirements, Training, Evaluation, and Results sections. Feel free to play with it! 🥳 What's more, this paper has finally been accepted by TCSVT. 👋 You can find the newest version of the paper (with more new results and experiments) here. For the previous version, please refer to here.

[07/21/2023] We made it! 🎉🎉 All TODOs are complete! Try our Huggingface Demo here!! You can also download this repository and run the GUI locally (refer to the command here)! 🥳🥳

[07/19/2023] Hi everyone! We have added two new inference scripts: efficient_inference_for_square_image.py, which harmonizes a square image quite fast, and inference_for_arbitrary_resolution_image.py, which can harmonize an image of any resolution (2K, 4K, 8K, JUST WHATEVER YOU WANT!!). Please check them out! 😉😉

A summary of the features of the different inference strategies (for more information, please refer to Inference):

| Features | efficient_inference_for_square_image.py | inference_for_arbitrary_resolution_image.py |
|---|---|---|
| Support Arbitrary Image | ❌ (Only square images) | ✅ (Arbitrary aspect ratio, arbitrary resolution!!!) |
| Speed | 🚀 (Quite fast) | 🚌 (Relatively slower than the left one) |
| Memory cost | 🌲 (Quite low) | 🏭 (Relatively higher than the left one for the same resolution) |

[07/18/2023] Check out our new work Diff-Harmonization, which is a Zero-Shot Harmonization method based on Diffusion Models! 😊

[07/17/2023] Pretrained weights have been released. Feel free to try them! 👋👋

[07/16/2023] The code is initially public. 🥳

[03/06/2023] Source code and pretrained models will be publicly accessible.

TODO

  • Initial code release.
  • Add pretrained model weights.
  • Add the efficient splitting strategy for inferencing on original resolution images.
  • Add Gradio demo.

Visualization GUI

We provide a GUI based on Gradio for visualizing the intermediate results of our method. You can run the following command to start it locally, or make use of our provided Huggingface Space.

python app.py

Abstract

HINet's framework

High-resolution (HR) image harmonization is of great significance in real-world applications such as image synthesis and image editing. However, due to the high memory costs, existing dense pixel-to-pixel harmonization methods are mainly focusing on processing low-resolution (LR) images. Some recent works resort to combining with color-to-color transformations but are either limited to certain resolutions or heavily depend on hand-crafted image filters. In this work, we explore leveraging the implicit neural representation (INR) and propose a novel image Harmonization method based on Implicit neural Networks (HINet), which to the best of our knowledge, is the first dense pixel-to-pixel method applicable to HR images without any hand-crafted filter design. Inspired by the Retinex theory, we decouple the MLPs into two parts to respectively capture the content and environment of composite images. A Low-Resolution Image Prior (LRIP) network is designed to alleviate the Boundary Inconsistency problem, and we also propose new designs for the training and inference process. Extensive experiments have demonstrated the effectiveness of our method compared with state-of-the-art methods. Furthermore, some interesting and practical applications of the proposed method are explored.

Requirements

  1. Software Requirements

    • Python: 3.8
    • CUDA: 11.3
    • cuDNN: 8.4.1

    To install other requirements:

    pip install -r requirements.txt
    
  2. Datasets

    • We train and evaluate on the iHarmony4 dataset. Please download the dataset in advance, and arrange them into the following structure:
    ├── dataset_path
       ├── HAdobe5k
          ├── composite_images
          ├── masks
          ├── real_images
       ├── HCOCO
       ├── Hday2night
       ├── HFlickr
       IHD_test.txt
       IHD_train.txt
    
    • Before training, we resize the HAdobe5k subdataset so that each side is smaller than 1024. This speeds up data loading. For the resizing script, refer to resize_Adobe.py.

    • For training or evaluating on the original resolution of the iHarmony4 dataset, please create a new HAdobe5kori directory containing the original HAdobe5k images.

    • If you want to train and evaluate only on the HAdobe5k subdataset (see Table 1 in the paper), you can modify the list files referenced in train.py (IHD_train.txt and IHD_test.txt) so that they contain only the HAdobe5k images.

    • (Advanced) We also support training on the video harmonization dataset. For this, download the video harmonization dataset from HYouTube, then change IHD_train.txt and IHD_test.txt in train.py to train_list.txt and test_list.txt.
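The dataset layout and the HAdobe5k-only list files described above can be checked and prepared with a few lines of Python. This is a minimal sketch and not part of the repository: `check_layout` and `filter_list` are hypothetical helpers. It assumes that every subdataset mirrors the `HAdobe5k` layout shown in the tree, and that each line of `IHD_train.txt` / `IHD_test.txt` is a relative path beginning with the subdataset name (e.g. `HAdobe5k/...`). Please verify both against your copy of the data.

```python
from pathlib import Path

SUBDATASETS = ["HAdobe5k", "HCOCO", "Hday2night", "HFlickr"]
SUBDIRS = ["composite_images", "masks", "real_images"]
LIST_FILES = ["IHD_train.txt", "IHD_test.txt"]


def check_layout(dataset_path):
    """Return the expected entries that are missing under dataset_path."""
    root = Path(dataset_path)
    missing = []
    for name in SUBDATASETS:
        for sub in SUBDIRS:
            # Assumption: every subdataset has the same three folders as HAdobe5k.
            if not (root / name / sub).is_dir():
                missing.append(f"{name}/{sub}")
    for list_file in LIST_FILES:
        if not (root / list_file).is_file():
            missing.append(list_file)
    return missing


def filter_list(src, dst, subdataset="HAdobe5k"):
    """Write to dst only the entries of src that belong to one subdataset.

    Assumption: each line is a relative path starting with "<subdataset>/".
    Returns the number of entries kept.
    """
    lines = Path(src).read_text().splitlines()
    kept = [ln.strip() for ln in lines if ln.strip().startswith(subdataset + "/")]
    Path(dst).write_text("\n".join(kept) + ("\n" if kept else ""))
    return len(kept)
```

Writing the filtered entries to a new file (rather than overwriting `IHD_train.txt`) keeps the full lists available for later experiments.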

  3. Pre-trained Models

    • We adopt HRNetV2 as our encoder. You can download the weights from here and save them in the pretrained_models directory.
    • In the following table, we provide several model weights pretrained under different resolutions (corresponding to Table 1 in the paper). We also provide the weights pretrained on the HYouTube video harmonization dataset:

(Note that the provided weight files also contain other information, such as the optimizer state and the epoch number. Thus, they are larger than the actual model size.)

| Download Link | Model Descriptions |
|---|---|
| Resolution_RAW_iHarmony4.pth | Trained with the RSC strategy on the original-resolution iHarmony4 dataset |
| Resolution_256_iHarmony4.pth | Trained on the 256*256-resolution iHarmony4 dataset |
| Resolution_RAW_HAdobe5K.pth | Trained with the RSC strategy on the original-resolution HAdobe5k subdataset |
| Resolution_2048_HAdobe5K.pth | Trained with the RSC strategy on the 2048*2048-resolution HAdobe5k subdataset |
| Resolution_1024_HAdobe5K.pth | Trained with the RSC strategy on the 1024*1024-resolution HAdobe5k subdataset |
| Video_HYouTube_256.pth | Trained on the HYouTube video harmonization dataset at 256*256 resolution |

Training

The intermediate output (including checkpoint, visualization, log.txt) will be saved in directory logs/exp.

Train in low resolution (LR) mode

python train.py --dataset_path {dataset_path} --base_size 256 --input_size 256 --INR_input_size 256
  • dataset_path: the path of the iHarmony4 dataset.

  • base_size: the size of the input image to encoder.

  • input_size: the size of the target resolution.

  • INR_input_size: the size of the input image to the INR decoder.

  • hr_train: whether to train in high resolution (HR) mode, i.e., using RSC strategy (See Section 3.4 in the paper).

  • isFullRes: whether to train in full/original resolution mode.

  • (More information on the parameters can be found in the code ...)

(Advanced) If you want to train on the HYouTube dataset, you need to replace processing.py and build_dataset.py with the same-name files in the Video_Harmonization directory. Other settings are the same.

Train in high resolution (HR) mode (e.g., 2048x2048)

Without the RSC strategy, the training command is as follows: (On a single RTX 3090, this leads to out-of-memory errors even when batch_size is set to 2.)

python train.py --dataset_path {dataset_path} --base_size 256 --input_size 2048 --INR_input_size 2048

With the RSC strategy, the training command is as follows: (On a single RTX 3090, batch_size can be set up to 6.)

python train.py --dataset_path {dataset_path} --base_size 256 --input_size 2048 --INR_input_size 2048 --hr_train

Train in original resolution mode

python train.py --dataset_path {dataset_path} --base_size 256 --hr_train --isFullRes
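The three training modes differ only in a few flags, which the following sketch makes explicit. `build_train_command` is a hypothetical helper (not part of the repository) that assembles exactly the commands listed above; it uses only the flags documented in this section and assumes that `--hr_train` and `--isFullRes` are plain switches, as they appear in those commands.

```python
import shlex


def build_train_command(dataset_path, mode, size=2048, use_rsc=True):
    """Assemble the train.py command for one of the modes documented above.

    mode: "lr"   -> 256x256 training
          "hr"   -> fixed high resolution (`size`), optionally with RSC
          "full" -> original resolution (always uses --hr_train --isFullRes)
    """
    cmd = ["python", "train.py", "--dataset_path", dataset_path, "--base_size", "256"]
    if mode == "lr":
        cmd += ["--input_size", "256", "--INR_input_size", "256"]
    elif mode == "hr":
        cmd += ["--input_size", str(size), "--INR_input_size", str(size)]
        if use_rsc:
            # RSC strategy (Section 3.4 in the paper) is enabled via --hr_train.
            cmd.append("--hr_train")
    elif mode == "full":
        cmd += ["--hr_train", "--isFullRes"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cmd


if __name__ == "__main__":
    # Print the HR command with RSC for a placeholder dataset path.
    print(shlex.join(build_train_command("/path/to/iHarmony4", "hr")))
```

The returned list can be passed directly to `subprocess.run` if you prefer to launch training from a script.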

Evaluation

The intermediate output (including visualizations, log.txt) will be saved in directory logs/test.

Notice: Due to the resolution-agnostic characteristic of INR, you can evaluate the dataset at any resolution, no matter which resolution the model was trained on. Please refer to Table 4 and Table 5 in the paper.
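As a rough illustration of why an INR is resolution-agnostic: the decoder is queried at continuous coordinates rather than at fixed pixel indices, so the same function can be sampled on a grid of any size. The sketch below is illustrative only and is not the repository's code. `make_coords` and `toy_inr` are hypothetical, and normalizing pixel centres to [-1, 1] is a common INR convention assumed here, not something confirmed by this README.

```python
def make_coords(height, width):
    """Return normalized (y, x) pixel-centre coordinates in [-1, 1], row-major."""

    def centres(n):
        # n cells of width 2/n cover [-1, 1]; take the centre of each cell.
        return [-1.0 + (2 * i + 1) / n for i in range(n)]

    ys, xs = centres(height), centres(width)
    return [(y, x) for y in ys for x in xs]


def toy_inr(y, x):
    """Stand-in for a trained decoder: any function of continuous coordinates."""
    return 0.5 + 0.25 * (x + y)


if __name__ == "__main__":
    # The same function is sampled on two different grids without retraining.
    low_res = [toy_inr(y, x) for y, x in make_coords(2, 2)]
    high_res = [toy_inr(y, x) for y, x in make_coords(4, 8)]
    print(len(low_res), len(high_res))
```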

Evaluation in low resolution (LR) mode

python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --input_size 256 --INR_input_size 256

(Advanced) If you want to evaluate on the HYouTube dataset, you need to replace inference.py with either inference_normal.py or inference_3D_LUT_Interpolation.py from the Video_Harmonization directory. Other settings are the same. inference_3D_LUT_Interpolation.py is a faster version of inference_normal.py that leverages our proposed 3D LUT interpolation strategy. You can refer to Section V-C for more details.

Evaluation in high resolution (HR) mode (e.g., 2048x2048)

python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --input_size 2048 --INR_input_size 2048

Evaluation in original resolution mode

python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --hr_train --isFullRes

Inference

We have provided demo images (2K and 6K) in demo. Feel free to play around with them.

Notice: Due to the resolution-agnostic characteristic of INR, you can run inference on images at any resolution, no matter which resolution the model was trained on. Please refer to Table 4 and Table 5 in the paper.

Inference on square images (fast & low cost)

If you want to run inference on square images, please use the command below. Note that this code only supports square images whose resolution is a multiple of 256. Some other requirements will be listed in the command-line output (on error) when you run the code.

python efficient_inference.py --split_resolution {split_resolution} --composite_image {composite_image_path} --mask {mask_path} --save_path {save_path} --pretrained {pretrained_weight}
  • split_resolution: the resolution of the split patches. (E.g., 512 means the input image will be split into 512x512 patches.) These patches will finally be assembled back to the resolution of the original image.
  • composite_image: the path of the composite image. You can try with the provided images in demo.
  • mask: the path of the mask. You can try with the provided masks in demo.
  • save_path: the path of the output image.
  • pretrained: the path of the pretrained weight.
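To make the meaning of split_resolution concrete, the sketch below computes the patch boxes for a square image. It is not the repository's implementation: `square_patch_boxes` is a hypothetical helper, and the requirement that split_resolution divides the image side evenly is an assumption of this sketch (the actual script reports its own requirements when it runs).

```python
def square_patch_boxes(side, split_resolution):
    """Return (top, left, bottom, right) boxes tiling a side x side image."""
    if side <= 0 or side % 256 != 0:
        raise ValueError("image side must be a positive multiple of 256")
    if split_resolution <= 0 or side % split_resolution != 0:
        # Assumption of this sketch; the real script may impose other rules.
        raise ValueError("split_resolution must divide the image side")
    n = side // split_resolution  # patches per row and per column
    return [
        (r * split_resolution, c * split_resolution,
         (r + 1) * split_resolution, (c + 1) * split_resolution)
        for r in range(n)
        for c in range(n)
    ]


if __name__ == "__main__":
    boxes = square_patch_boxes(2048, 512)
    print(len(boxes), boxes[0], boxes[-1])
```

Harmonized patches written back into their boxes reassemble an output with the same resolution as the input.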

Inference on arbitrary resolution images (slow, high cost, but support any resolution)

If the former inference script cannot meet your needs and you want to run inference on arbitrary resolution images, please use the command below. Note that this script is slower and uses more memory for the same resolution (but it supports arbitrary resolutions).

If you encounter an out-of-memory error, please try to reduce the split_num parameter below. (Our script also prints messages that can guide you in doing this.)

python inference_for_arbitrary_resolution.py --split_num {split_num} --composite_image {composite_image_path} --mask {mask_path} --save_path {save_path} --pretrained {pretrained_weight}
  • split_num: the number of splits for the input image. (E.g., 4 means the input image will be split into 4x4=16 patches.)
  • composite_image: the path of the composite image. You can try with the provided images in demo.
  • mask: the path of the mask. You can try with the provided masks in demo.
  • save_path: the path of the output image.
  • pretrained: the path of the pretrained weight.
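The split_num grid can be pictured with the following sketch, which divides an image of any height and width into split_num x split_num boxes. `grid_patch_boxes` is a hypothetical helper, and spreading the remainder pixels by integer division is an assumption of this sketch; the repository's script may place its cuts differently.

```python
def grid_patch_boxes(height, width, split_num):
    """Return split_num * split_num (top, left, bottom, right) boxes.

    The boxes tile the full height x width image without gaps or overlap,
    even when the sides are not divisible by split_num.
    """
    if split_num < 1 or split_num > min(height, width):
        raise ValueError("split_num must be between 1 and the shorter side")
    # Cut positions along each axis; the first is 0 and the last is the full size.
    rows = [i * height // split_num for i in range(split_num + 1)]
    cols = [i * width // split_num for i in range(split_num + 1)]
    return [
        (rows[r], cols[c], rows[r + 1], cols[c + 1])
        for r in range(split_num)
        for c in range(split_num)
    ]


if __name__ == "__main__":
    # A 1080x1920 image with split_num=4 gives 4x4=16 patches.
    boxes = grid_patch_boxes(1080, 1920, 4)
    print(len(boxes), boxes[0])
```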

Results

[Images: Metrics (three figures), Visual comparisons (two figures)]

Citation & Acknowledgments

If you find this paper useful in your research, please consider citing:

@article{10285123,
  author={Chen, Jianqi and Zhang, Yilan and Zou, Zhengxia and Chen, Keyan and Shi, Zhenwei},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Dense Pixel-to-Pixel Harmonization via Continuous Image Representation}, 
  year={2023},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TCSVT.2023.3324591}}

License

This project is licensed under the Apache-2.0 license. See LICENSE for details.