Allow redefinition of DATASET_CSV_FILE_NAME #403

kh296 · 2021-02-22T11:37:41Z

The scans and structures to be used in a training run are specified in an index file, the name of which is given by the variable InnerEye.ML.common.DATASET_CSV_FILE_NAME. This is hardcoded to dataset.csv.

When trying to improve model performance, it can sometimes be useful to consider only a subset of patients and/or structures. This can be achieved by overwriting dataset.csv, or by adding a differently named index file and hacking InnerEye/ML/common.py. It would be nice instead to be able to set the value of DATASET_CSV_FILE_NAME in the model definition.
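For illustration, a differently named index file restricted to a subset of patients and structures could be produced along the following lines. This is only a sketch: the helper name write_subset_index is hypothetical, and the column names subject and channel are assumptions about the index file layout, not a statement of the InnerEye schema.

```python
import csv
from pathlib import Path
from typing import Set


def write_subset_index(source: Path, target: Path,
                       subjects: Set[str], channels: Set[str]) -> int:
    """Copy only the rows for the chosen subjects and channels.

    Hypothetical helper: assumes the index file has "subject" and
    "channel" columns. Returns the number of data rows written.
    """
    kept = 0
    with source.open(newline="") as src, target.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        # Reuse the original header so the subset file has the same layout.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            if row["subject"] in subjects and row["channel"] in channels:
                writer.writerow(row)
                kept += 1
    return kept
```

The subset file can then sit next to dataset.csv in the dataset folder, which is why a configurable file name would avoid overwriting the original.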

AB#3785

ant0nsc · 2021-03-06T23:34:21Z

def test_custom_dataset_name(test_output_dirs: OutputFolderForTests) -> None:
    # Write an index file under a non-default name.
    filename = "foo.csv"
    (test_output_dirs.root_dir / filename).write_text("""subject,channel,...
    """)
    config = SegmentationModelBase()
    config.local_dataset = test_output_dirs.root_dir
    # Point the config at the custom index file instead of dataset.csv.
    config.custom_dataset_name = filename
    dataframe = config.read_dataset_if_needed()
    assert dataframe is not None

kh296 · 2021-03-09T12:33:33Z

I've enabled user definition of the dataset csv file name by allowing the parameter dataset_csv to be set as part of model configuration. The parameter is initialised to DATASET_CSV_FILE_NAME in InnerEye/ML/config.py, then is used to locate the dataset csv file in InnerEye/ML/config.py, in InnerEye/ML/run_ml.py and in InnerEye/ML/utils/ml_util.py. I've added a unit test to Tests/ML/test_config_helpers.py.
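The pattern described above (a configuration parameter that defaults to the module-level constant and is then used to locate the file) can be sketched roughly as follows. This is a standalone illustration, not the InnerEye code: the class SketchConfig and the method dataset_csv_path are hypothetical names, and only dataset_csv, local_dataset and DATASET_CSV_FILE_NAME come from this thread.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Mirrors the constant named in this issue; the value is the hardcoded default.
DATASET_CSV_FILE_NAME = "dataset.csv"


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a model configuration class."""
    local_dataset: Optional[Path] = None
    # Falls back to the constant unless the model definition overrides it.
    dataset_csv: str = DATASET_CSV_FILE_NAME

    def dataset_csv_path(self) -> Path:
        """Return the full path of the index file inside the dataset folder."""
        if self.local_dataset is None:
            raise ValueError("local_dataset must be set before locating the index file")
        return self.local_dataset / self.dataset_csv
```

With this shape, a model definition that sets dataset_csv="subset.csv" picks up the alternative index file, while existing models keep reading dataset.csv.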

The above seems to be all that's needed to set custom dataset csv file names for model training. However, I've seen that DATASET_CSV_FILE_NAME is used for file location in a number of other places, for example InnerEye/ML/baselines_util.py and InnerEye/ML/normalize_and_visualize_dataset.py.

Should I submit a pull request for the changes made, in that they provide the functionality requested in the Issue submission, or should I first try to avoid DATASET_CSV_FILE_NAME being used anywhere, except as a fallback value?

ant0nsc · 2021-03-09T13:48:06Z

@kh296 please do a pull request with the changes you described. As far as I can see now, the other uses of DATASET_CSV_FILE_NAME are fine. Once I see the PR, I'll be able to see more clearly anyway.

ant0nsc added the triaged label Mar 9, 2021

kh296 mentioned this issue Mar 9, 2021

Added possibility to set name of dataset csv as part of model configu… #412

Merged

ant0nsc linked a pull request Mar 10, 2021 that will close this issue

Added possibility to set name of dataset csv as part of model configu… #412

Merged

ant0nsc closed this as completed in #412 Mar 16, 2021
