There seems to be a bug in benchmark.py. #9

Open
Johnsonj0308 opened this issue Mar 12, 2024 · 2 comments

@Johnsonj0308

Issue Description

Hello, I encountered an anomaly while using benchmark.py: evaluation during testing ran unusually fast. Upon further investigation of benchmark.py, I identified a bug.

In the benchmark() function, the BATCH_SIZE parameter defaults to 32, but BATCH_SIZE is not passed explicitly when benchmark() is called.
As a result, the dataset is built with the global BATCH_SIZE of 1, while model.evaluate(test_dataset, steps=steps_per_epoch) uses steps = len_data // 32 instead of len_data // 1.
Consequently, only a small fraction of the test data is actually evaluated, and because build_dataset is not called with shuffle=False, the reported performance varies from run to run.
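
For illustration, here is a small sketch of the mismatch; the dataset size is a made-up number, and the variable names only mirror those in benchmark.py:

# Hypothetical illustration of the steps mismatch (len_data is an assumed count)
len_data = 1000               # number of test images in a dataset split (assumed)
dataset_batch_size = 1        # bsize actually passed to build_dataset (the global BATCH_SIZE)
default_batch_size = 32       # BATCH_SIZE default of benchmark() before the fix

steps_buggy = len_data // default_batch_size   # 31 steps -> only 31 of 1000 images evaluated
steps_fixed = len_data // dataset_batch_size   # 1000 steps -> every image evaluated
print(steps_buggy, steps_fixed)                # 31 1000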

Fix

  1. Set BATCH_SIZE to 1 in the benchmark() function.
  2. Pass shuffle=False when calling build_dataset (both changes are shown in the sketch below).
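
A minimal sketch of the two changes, reusing the build_decoder/build_dataset calls from the example script below; the image and mask paths here are placeholders only:

# Sketch of the fixed setup; paths are placeholders, signatures taken from the example below
from dataloader.dataloader import build_dataset, build_decoder

img_size = 256
BATCH_SIZE = 1   # fix 1: batch size of 1, so steps_per_epoch = len_data // 1

X_test = ["./TestDataset/Kvasir/images/0001.jpg"]   # placeholder image path
Y_test = ["./TestDataset/Kvasir/masks/0001.jpg"]    # placeholder mask path

test_decoder = build_decoder(with_labels=False, target_size=(img_size, img_size),
                             ext='jpg', segment=True, ext2='jpg')
test_dataset = build_dataset(X_test, Y_test, bsize=BATCH_SIZE, decode_fn=test_decoder,
                             augmentAdv=False, augment=False, augmentAdvSeg=False,
                             shuffle=False,   # fix 2: no shuffling, deterministic evaluation
                             cache=False)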

Model Weights

I experimented with three sets of model weights:

  1. Pretrained weights provided by you.
  2. Training MetaPolyp from scratch.
  3. Using Pretrained weights and training for 350 epochs.

Among these, option 1 (using your provided Pretrained weights) performed the best.

Test Results Comparison (Kvasir)

Metric               Before Fix    After Fix
dice_coeff           0.9572        0.9049
bce_dice_loss        0.2784        0.3448
IoU                  0.9183        0.8481
zero_IoU             0.9748        0.9700
mean_squared_error   0.0184        0.0222

Example Usage of benchmark.py

# from save_model.pvt_CAM_channel_att_upscale import build_model
import os
import tensorflow as tf
# from metrics.metrics_last import  iou_metric, MAE, WFbetaMetric, SMeasure, Emeasure,  dice_coef, iou_metric
from metrics.segmentation_metrics import dice_coeff, bce_dice_loss, IoU, zero_IoU, dice_loss
from dataloader.dataloader import build_augmenter, build_dataset, build_decoder
from tensorflow.keras.utils import get_custom_objects
from model import build_model #### from model_research import build_model

os.environ["CUDA_VISIBLE_DEVICES"]="0"

def load_dataset(route):
    # img_size and BATCH_SIZE are module-level globals defined in __main__ below
    X_path = '{}/images/'.format(route)
    Y_path = '{}/masks/'.format(route)
    X_full = sorted(os.listdir(f'{route}/images'))
    Y_full = sorted(os.listdir(f'{route}/masks'))

    X_train = [X_path + x for x in X_full]
    Y_train = [Y_path + x for x in Y_full]

    test_decoder = build_decoder(with_labels=False, target_size=(img_size, img_size), ext='jpg', 
                                segment=True, ext2='jpg')
    test_dataset = build_dataset(X_train, Y_train, bsize=BATCH_SIZE, decode_fn=test_decoder, 
                                augmentAdv=False, augment=False, augmentAdvSeg=False, shuffle=False, cache=False)
    return test_dataset, len(X_train)

def benchmark(route, model, BATCH_SIZE = 1, save_file_name = "benchmark_result.txt"):
    list_of_datasets = os.listdir(route)
    f = open(save_file_name, "a")
    f.write("\n")
    for datasets in list_of_datasets:
        print(datasets, ":")
        test_dataset, len_data = load_dataset(os.path.join(route, datasets))
        steps_per_epoch = len_data // BATCH_SIZE  # BATCH_SIZE = 1, so every test image is evaluated
        loss, dice_coeff, bce_dice_loss, IoU, zero_IoU, mae = model.evaluate(test_dataset, steps=steps_per_epoch)
        f.write("{}:".format(datasets))
        f.write("dice_coeff: {}, bce_dice_loss: {}, IoU: {}, zero_IoU: {}, mae: {}".format(dice_coeff, bce_dice_loss, IoU, zero_IoU, mae))
        f.write('\n')
    f.close()

if __name__ == "__main__":
    img_size = 256
    BATCH_SIZE = 1
    SEED = 1024
    save_path = "pretrained_model.h5"
    route_data = "./TestDataset/"
    path_to_test_dataset = "./MetaPolyp_Dataset/TestDataset/" 
    model = build_model(img_size)
    model.load_weights(save_path)

    model.compile(metrics=[dice_coeff, bce_dice_loss, IoU, zero_IoU, tf.keras.metrics.MeanSquaredError()])
    
    benchmark(path_to_test_dataset, model)

@huyquoctrinh (Owner)

Hi @Johnsonj0308, thank you so much for your help in fixing the evaluation. I will look into this issue and post an update.

@wwweeeeiii

Hello, I found on Papers with Code that your Dice coefficient currently ranks first, but I also found these problems in benchmark.py. I hope you can correct them soon and provide relevant answers, thank you!
