There seems to be a bug in benchmark.py. #9
Comments
Hi @Johnsonj0308, thank you so much for your help in fixing the evaluation. I will look into this issue and post an update.
Hello, I found on Papers with Code that your Dice coefficient currently ranks first, but I also found these problems in benchmark.py. I hope you can correct them in time and give a relevant answer, thank you!
Issue Description
Hello, I encountered an anomaly while using benchmark.py: execution during testing was unusually fast. On inspecting benchmark.py, I identified a bug.
In the def benchmark() function, BATCH_SIZE defaults to 32, but it is never reset to 1 when the benchmark function is called. As a result, the test dataset is built with a batch size of 1, while model.evaluate(test_dataset, steps=steps_per_epoch) uses steps = len_data // 32 instead of len_data // 1.
Consequently, only a small fraction of the test data is read during testing, and because build_dataset is not called with shuffle=False, the reported performance varies from run to run.
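The arithmetic behind the mismatch can be sketched as follows (len_data here is a hypothetical test-set size for illustration; the real value depends on the dataset being benchmarked):

```python
# Sketch of the bug: the dataset is batched with size 1, but the number of
# evaluation steps is derived from the default BATCH_SIZE of 32.

DEFAULT_BATCH_SIZE = 32   # default of BATCH_SIZE in def benchmark()
dataset_batch_size = 1    # batch size actually used when building the dataset

len_data = 1000           # hypothetical number of test samples

# steps passed to model.evaluate(test_dataset, steps=steps_per_epoch):
steps_per_epoch = len_data // DEFAULT_BATCH_SIZE        # 1000 // 32 = 31

# each evaluate step consumes one batch of size 1, so:
samples_evaluated = steps_per_epoch * dataset_batch_size
print(samples_evaluated)  # 31 of the 1000 test samples are read
```

With only 31 of 1000 samples evaluated, and those samples drawn in shuffled order, both the speed-up and the run-to-run variance described above follow directly.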
Fix
Set BATCH_SIZE to 1 in the def benchmark() function.
Set shuffle=False during the build_dataset step.
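The two fixes above can be sketched as follows (the function shape and names are assumptions based on the issue text, not the repository's exact code):

```python
def benchmark_steps(len_data, batch_size=1):
    # Fix 1: BATCH_SIZE is forced to 1, matching the batch size the test
    # dataset is actually built with, so evaluation covers every sample.
    return len_data // batch_size

# Fix 2 (shown as a comment because build_dataset's real signature lives in
# the repository): pass shuffle=False so the evaluation order, and hence the
# reported metrics, are deterministic, e.g.
#   test_dataset = build_dataset(test_files, batch_size=1, shuffle=False)

print(benchmark_steps(1000))  # 1000 steps instead of 1000 // 32 = 31
```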
Model Weights
I experimented with three sets of model weights. Among these, option 1 (using your provided pretrained weights) performed the best.
Test Results Comparison (Kvasir)
| Metric | Before Fix | After Fix |
| --- | --- | --- |
| dice_coeff | 0.9572 | 0.9049 |
| bce_dice_loss | 0.2784 | 0.3448 |
| IoU | 0.9183 | 0.8481 |
| zero_IoU | 0.9748 | 0.9700 |
| mean_squared_error | 0.0184 | 0.0222 |
Example Usage of benchmark.py
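The usage example appears to have been an image that does not survive in this text. As a self-contained stand-in, the sketch below shows how a batch size of 1 with shuffling disabled visits every test sample exactly once; all names here are hypothetical illustrations, not the repository's real API:

```python
def build_dataset(files, batch_size=1, shuffle=False):
    # Stand-in for the repository's build_dataset: yields fixed-order batches.
    assert not shuffle, "keep shuffle=False for reproducible benchmark runs"
    for i in range(0, len(files), batch_size):
        yield files[i:i + batch_size]

test_files = [f"img_{i:04d}.png" for i in range(100)]  # hypothetical test set

BATCH_SIZE = 1
steps = len(test_files) // BATCH_SIZE   # 100 steps, one sample per step

batches = list(build_dataset(test_files, batch_size=BATCH_SIZE))
print(len(batches))  # 100: every sample is evaluated exactly once
```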