binning

Revisiting Stage 2 (Creating Data Splits) -- Sec 3.2

This folder contains the method used to create the data splits. The procedure partitions the count range into balanced strata (bins) using a Bayesian optimality criterion (Sec 3). It is summarised in the algorithm blocks further below.

Optimal Bins for different datasets are:

| Dataset name | Bins |
| --- | --- |
| NWPU | [0.0, 0.5, 2.5, 5.5, 10.5, 18.5, 31.5, 53.5, 90.5, 152.5, 256.5, 429.5, 717.5, 1197.5, 1997.5, 3330.5, 5552.5, 9255.5, 15426.0] |
| UCF | [49.0, 49.5, 51.5, 55.5, 62.5, 73.5, 92.5, 123.5, 175.5, 261.5, 405.5, 644.5, 1043.5, 1708.5, 2816.5, 4662.5, 7738.5, 12865.0] |
| STA | [33.0, 33.5, 35.5, 38.5, 43.5, 51.5, 64.5, 85.5, 120.5, 178.5, 275.5, 436.5, 704.5, 1151.5, 1897.5, 3139.0] |
| STB | [13.0, 13.5, 15.5, 19.5, 25.5, 36.5, 54.5, 83.5, 132.5, 214.5, 351.5, 578.0] |

[Figure: Algorithm Block 1] [Figure: Algorithm Block 2]
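
To illustrate how the bin edges listed above might be used, here is a minimal sketch that maps an image count to its bin and tallies how many images land in each bin. The function names `bin_index` and `bin_histogram` are hypothetical and not part of this repository. The sketch assumes each bin is half-open, `[edge_i, edge_i+1)`, with the last bin also including its right edge; the repository code may use a different convention.

```python
from bisect import bisect_right
from collections import Counter

# NWPU bin edges copied from the table above.
NWPU_BINS = [0.0, 0.5, 2.5, 5.5, 10.5, 18.5, 31.5, 53.5, 90.5, 152.5,
             256.5, 429.5, 717.5, 1197.5, 1997.5, 3330.5, 5552.5,
             9255.5, 15426.0]


def bin_index(count, edges):
    """Return the index of the bin that contains `count`.

    Assumption: bins are half-open [edges[i], edges[i + 1]), except the
    last bin, which also includes its right edge.
    """
    if not edges[0] <= count <= edges[-1]:
        raise ValueError(f"count {count} lies outside [{edges[0]}, {edges[-1]}]")
    # bisect_right gives the number of edges <= count; clamp so that the
    # final edge maps to the last bin instead of one past it.
    return min(bisect_right(edges, count) - 1, len(edges) - 2)


def bin_histogram(counts, edges):
    """Count how many images fall into each bin (hypothetical helper)."""
    hist = Counter(bin_index(c, edges) for c in counts)
    return [hist.get(i, 0) for i in range(len(edges) - 1)]
```

A roughly flat result from `bin_histogram` on a training split would indicate that the strata are balanced in the sense described above.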

Procedure for generating bins:

Step 1

First, store your dataset information in a .txt file in the form image_name,image_count. For a demo, see the dataset_txt folder, where we provide the txt files for the NWPU, UCF-QNRF, STA and STB datasets.
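
As a rough illustration of the image_name,image_count format, the sketch below reads such a file into a list of (name, count) pairs. It assumes one entry per line with no header row, which the provided txt files may or may not follow exactly. The helper names `parse_dataset_lines` and `load_dataset_txt` are hypothetical and unrelated to the repository's own load_data.py.

```python
def parse_dataset_lines(lines):
    """Parse 'image_name,image_count' lines into (name, count) tuples.

    Assumption: one entry per line, no header; blank lines are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so that a comma inside the image name
        # does not break parsing.
        name, count = line.rsplit(",", 1)
        records.append((name.strip(), float(count)))
    return records


def load_dataset_txt(path):
    """Read a dataset txt file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_dataset_lines(handle)
```

For example, `parse_dataset_lines(["img_0001.jpg,23"])` returns `[("img_0001.jpg", 23.0)]`; the file name here is made up for illustration.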

Step 2

To start training, run:

python train.py -d ./dataset_txt/Train_nwpu.txt -f multinomial

To get bins for your own dataset, change the -d argument to the path of the .txt file that holds your dataset's information. The -f argument selects the fitness function (multinomial or poisson).

Step 3

After training (Step 2), the best files are generated in the select_best folder. To print the top two binning configurations, run:

python generate_bins.py -d ./dataset_txt/Train_nwpu.txt -f multinomial

Here -d corresponds to dataset and -f corresponds to fitness function.
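
Steps 2 and 3 take the same -d and -f arguments, so they can be chained from a small driver script. The following is a sketch only: `build_commands` and `run_pipeline` are hypothetical helpers, and the sketch assumes the scripts are launched with `python` from inside this folder, as in the commands above.

```python
import subprocess

# The two fitness functions named in Step 2.
FITNESS_FUNCTIONS = ("multinomial", "poisson")


def build_commands(dataset_txt, fitness="multinomial"):
    """Return the Step 2 and Step 3 command lines as argument lists."""
    if fitness not in FITNESS_FUNCTIONS:
        raise ValueError(f"fitness must be one of {FITNESS_FUNCTIONS}, got {fitness!r}")
    shared = ["-d", dataset_txt, "-f", fitness]
    return [
        ["python", "train.py", *shared],          # Step 2: training
        ["python", "generate_bins.py", *shared],  # Step 3: print the best bins
    ]


def run_pipeline(dataset_txt, fitness="multinomial"):
    """Run Step 2, then Step 3, stopping at the first failing command."""
    for command in build_commands(dataset_txt, fitness):
        subprocess.run(command, check=True)
```

Calling `run_pipeline("./dataset_txt/Train_nwpu.txt", "poisson")` would then repeat the NWPU example with the other fitness function.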

The structure of this folder is:

.
├── dataset_txt             # All dataset files are in this folder.
│   ├── *.txt               # Per-dataset txt files
│   └── ...
├── select_best             # After train.py runs, the best configs are stored here (per dataset split).
├── test_jsons              # Files generated while running are stored here.
├── train_jsons             # Files generated while running are stored here.
├── bayesian_blocks.py      # Modified version of Bayesian blocks to fit our use case.
├── generate_bins.py        # Generates the bins after training is over.
├── load_data.py            # For loading data
├── multinomial.py          # For the multinomial distribution
├── poisson.py              # For the Poisson distribution
├── seeds_10.txt            # Fixes the seeds of the data splits.
├── select_best.py          # Selects the best bins; called in generate_bins.py.
└── train.py                # For generating the bins split-wise.