There seems to be a bug in benchmark.py. #9
Comments
Hi @Johnsonj0308, thank you so much for your help in fixing the evaluation. I will look into this issue and post an update.
Hello, I found on Papers with Code that your Dice coefficient currently ranks first, but I also found these problems in benchmark.py. I hope you can correct them in time and give a relevant answer, thank you!
Issue Description
Hello, I encountered an anomaly while using benchmark.py: execution during testing was unusually fast. On inspecting benchmark.py, I identified a bug.
In the def benchmark() function, BATCH_SIZE defaults to 32, but it is never reset to 1 when the benchmark function is called. As a result, the test dataset is built with a batch size of 1, while model.evaluate(test_dataset, steps=steps_per_epoch) uses steps = len_data // 32 instead of len_data // 1.
Consequently, only a small fraction of the test data is read during testing, and because build_dataset is not called with shuffle=False, the reported performance varies from run to run.
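The arithmetic behind the mismatch can be sketched as follows (len_data here is a hypothetical test-set size for illustration; the real value depends on the dataset being benchmarked):

```python
# Sketch of the bug: the dataset is batched with size 1, but the number of
# evaluation steps is derived from the default BATCH_SIZE of 32.

DEFAULT_BATCH_SIZE = 32   # default of BATCH_SIZE in def benchmark()
dataset_batch_size = 1    # batch size actually used when building the dataset

len_data = 1000           # hypothetical number of test samples

# steps passed to model.evaluate(test_dataset, steps=steps_per_epoch):
steps_per_epoch = len_data // DEFAULT_BATCH_SIZE        # 1000 // 32 = 31

# each evaluate step consumes one batch of size 1, so:
samples_evaluated = steps_per_epoch * dataset_batch_size
print(samples_evaluated)  # 31 of the 1000 test samples are read
```

With only 31 of 1000 samples evaluated, and those samples drawn in shuffled order, both the speed-up and the run-to-run variance described above follow directly.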
Fix
Set BATCH_SIZE to 1 in the def benchmark() function.
Set shuffle=False during the build_dataset step.
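The two fixes above can be sketched as follows (the function shape and names are assumptions based on the issue text, not the repository's exact code):

```python
def benchmark_steps(len_data, batch_size=1):
    # Fix 1: BATCH_SIZE is forced to 1, matching the batch size the test
    # dataset is actually built with, so evaluation covers every sample.
    return len_data // batch_size

# Fix 2 (shown as a comment because build_dataset's real signature lives in
# the repository): pass shuffle=False so the evaluation order, and hence the
# reported metrics, are deterministic, e.g.
#   test_dataset = build_dataset(test_files, batch_size=1, shuffle=False)

print(benchmark_steps(1000))  # 1000 steps instead of 1000 // 32 = 31
```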
Model Weights
I experimented with three sets of model weights. Among these, option 1 (using your provided pretrained weights) performed the best.
Test Results Comparison (Kvasir)
| Metric | Before Fix | After Fix |
| --- | --- | --- |
| dice_coeff | 0.9572 | 0.9049 |
| bce_dice_loss | 0.2784 | 0.3448 |
| IoU | 0.9183 | 0.8481 |
| zero_IoU | 0.9748 | 0.9700 |
| mean_squared_error | 0.0184 | 0.0222 |
Example Usage of benchmark.py
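The usage example appears to have been an image that does not survive in this text. As a self-contained stand-in, the sketch below shows how a batch size of 1 with shuffling disabled visits every test sample exactly once; all names here are hypothetical illustrations, not the repository's real API:

```python
def build_dataset(files, batch_size=1, shuffle=False):
    # Stand-in for the repository's build_dataset: yields fixed-order batches.
    assert not shuffle, "keep shuffle=False for reproducible benchmark runs"
    for i in range(0, len(files), batch_size):
        yield files[i:i + batch_size]

test_files = [f"img_{i:04d}.png" for i in range(100)]  # hypothetical test set

BATCH_SIZE = 1
steps = len(test_files) // BATCH_SIZE   # 100 steps, one sample per step

batches = list(build_dataset(test_files, batch_size=BATCH_SIZE))
print(len(batches))  # 100: every sample is evaluated exactly once
```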