Visual Reasoning Skill Evaluation on PaintSkills

teaser image

Dataset Setup

  1. Create the $paintskills_dir directory.
  2. From the Google Drive link, download metadata.json and the three skill directories (object/, count/, spatial/) into $paintskills_dir.
  • The $paintskills_dir directory should then have the following hierarchy:
$paintskills_dir/
    # skill name (i.e., object, count, and spatial)
    {skill}/

        # Scene configuration
        scenes/
            {skill}_train.json
            {skill}_val.json

        # GT Images (from {skill}/images.zip)
        images/

        # Bounding box annotations (for DETR finetuning)
        {skill}_train_bounding_boxes.json
        {skill}_val_bounding_boxes.json

    # metadata for all skills.
    metadata.json
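
After downloading, it can be useful to confirm that the layout matches the hierarchy above before running anything else. The following is a minimal sketch, not part of the repository: the helper names `expected_paths` and `missing_paths` are hypothetical, and the sketch assumes that `images/` has already been extracted from `{skill}/images.zip`.

```python
from pathlib import Path

SKILLS = ("object", "count", "spatial")
SPLITS = ("train", "val")


def expected_paths(paintskills_dir):
    """Return the files and directories listed in the hierarchy above."""
    root = Path(paintskills_dir)
    paths = [root / "metadata.json"]
    for skill in SKILLS:
        skill_dir = root / skill
        # Ground-truth images, assumed to be already unzipped.
        paths.append(skill_dir / "images")
        for split in SPLITS:
            paths.append(skill_dir / "scenes" / f"{skill}_{split}.json")
            paths.append(skill_dir / f"{skill}_{split}_bounding_boxes.json")
    return paths


def missing_paths(paintskills_dir):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_paths(paintskills_dir) if not p.exists()]
```

Calling `missing_paths("/path/to/paintskills")` returns an empty list when every expected file and directory is present, and otherwise lists what still needs to be downloaded or extracted.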

Scene Configuration

The scene configuration files (scenes/{skill}_{split}.json) have the following structure, where skill is one of object, count, or spatial, and split is either train or val.

e.g., count_val.json

{
    "data": [
        {
            "id": "count_val_00000",
            "scene": "HDR-KirbyCove",
            "text": "1 person",
            "skill": "count",
            "split": "val",
            "objects": [
                {
                    "id": 0,
                    "shape": "humanJosh",
                    "coconame": "person",
                    "color": "plain",
                    "relation": null,
                    "scale": 14.114588410729079,
                    "texture": "plain",
                    "rotation": null,
                    "state": "sitting"
                }
            ]
        },
        ...
    ]
}
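
As an illustration of how this structure can be read, the sketch below loads one scene file and pulls out the fields shown in the example. It relies only on the keys visible above (`data`, `id`, `text`, `objects`, `coconame`); the function names `load_scenes` and `summarize` are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def load_scenes(paintskills_dir, skill, split):
    """Load the list of scene entries for one skill and split."""
    path = Path(paintskills_dir) / skill / "scenes" / f"{skill}_{split}.json"
    with open(path, encoding="utf-8") as f:
        return json.load(f)["data"]


def summarize(datum):
    """Return (id, caption, COCO names of the objects) for one scene entry."""
    coconames = [obj["coconame"] for obj in datum["objects"]]
    return datum["id"], datum["text"], coconames
```

For the entry shown above, `summarize` would give the id `count_val_00000`, the caption `1 person`, and the object list `["person"]`.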

Evaluation of Text2Img models with DETR

  1. Generate the skill-specific images in $image_dir from the captions (the text field in the scene data) with your text-to-image generation model (finetuned on PaintSkills). The evaluation script expects the generated images to have filenames of the form image_{datum['id']}.png. For example, if datum['id'] is count_val_00000, the filename should be image_count_val_00000.png.

  2. Run the evaluation script:

skill='object' # switch to other skills (choices=['object', 'count', 'spatial'])
image_dir='/path/to/generated/images'
bash scripts/evaluate_skill_FT_DETR-R101-DC5.sh \
    --skill_name $skill \
    --paintskills_dir $paintskills_dir \
    --image_dir $image_dir
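
Because the evaluation depends on the image_{datum['id']}.png naming convention from step 1, a quick check of the generated images can catch naming mistakes before the script is run. This is a minimal sketch under that assumption only; `expected_filename` and `find_missing_images` are hypothetical helpers, not functions from the repository.

```python
from pathlib import Path


def expected_filename(datum_id):
    """Build the filename the evaluation expects for one scene id."""
    return f"image_{datum_id}.png"


def find_missing_images(scene_ids, image_dir):
    """Return the scene ids that have no generated image in image_dir."""
    image_dir = Path(image_dir)
    return [
        scene_id
        for scene_id in scene_ids
        if not (image_dir / expected_filename(scene_id)).is_file()
    ]
```

The scene ids can be taken from the `id` field of each entry in the corresponding scenes/{skill}_{split}.json file. An empty result means every caption has a matching image.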

(Optional) 3D simulator

Please see https://github.com/aszala/PaintSkills-Simulator for our 3D Simulator implementation.


(Optional) Evaluation on GT images

skill='object' # count, spatial
bash scripts/evaluate_skill_FT_DETR-R101-DC5.sh \
    --skill_name $skill \
    --gt_data_eval \
    --paintskills_dir $paintskills_dir