Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MapsManager uses a different CAPS folder in predict than the one provided by the user #591

Open
NicolasGensollen opened this issue May 23, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@NicolasGensollen
Copy link
Member

Working on #581 I'm getting the following kind of failures on the predict tests:

E           clinicadl.utils.exceptions.ClinicaDLCAPSError: The CAPS directory you gave is not a folder.
E           Error explanations:
E           	- Clinica expected the following path to be a folder: /mnt/data/clinicadl_data_ci/data_ci/predict/in/caps_random
E           	- If you gave relative path, did you run Clinica on the good folder?

I was very surprised to see the folder /mnt/data/clinicadl_data_ci/data_ci/ popping up in the error message since I never provided such a folder. Indeed, the command that the test executes is:

poetry run pytest --verbose \
          --junitxml=./test-reports/test_predict_report.xml \
          --disable-warnings \
          --verbose \
          --basetemp=$HOME/tmp/predict \
          --input_data_directory=/mnt/data/data_ci \
          test_predict.py

So, the input data directory is /mnt/data/data_ci (which is a symlink to /builds/GitRepos/clinicadl_data_ci/data_ci where I have put the CI data on the new Ubuntu VM that we use for the CI).

After a little bit of digging, the problem seems to happen in the predict method of the MapsManager. This method receives the caps_directory as input:

def predict(
self,
data_group: str,
caps_directory: Path = None,

But later on, it uses a different variable when calling return_dataset:

data_test = return_dataset(
group_parameters["caps_directory"],
group_df,

This group_parameters variable is created here:

group_df, group_parameters = self.get_group_info(data_group, split)

In this get_group_info function, this piece of code is loading the value of the CAPS directory to use from a JSON file:

df = pd.read_csv(group_path / "data.tsv", sep="\t")
json_path = group_path / "maps.json"
from clinicadl.utils.preprocessing import path_decoder
with json_path.open(mode="r") as f:
parameters = json.load(f, object_hook=path_decoder)
return df, parameters

In my case, the JSON file getting loaded is /builds/GitRepos/clinicadl_data_ci/data_ci/predict/in/maps_roi_ae/groups/test-RANDOM/maps.json and it contains:

{
    'caps_directory': PosixPath('/mnt/data/clinicadl_data_ci/data_ci/predict/in/caps_random'),
    'multi_cohort': False
}

In other words, this JSON file is in the CI data themselves, and it hardcodes a CAPS folder which is not the one I provided but rather one that existed on a different machine when this data was generated.

I don't mind creating a symlink /mnt/data/clinicadl_data_ci which would point to the same location as /mnt/data/data_ci since this should theoretically make the tests pass, but I believe this behavior is unexpected and very confusing (at least to me...).

Is there a reason why a different input folder than the one the user provides is used in the end ?

@NicolasGensollen NicolasGensollen added the bug Something isn't working label May 23, 2024
@thibaultdvx
Copy link
Collaborator

thibaultdvx commented May 23, 2024

Hi @NicolasGensollen, thanks for reporting.

Sorry you had trouble with CI data. Indeed, it is very confusing:

A MapsManager is created with a path of a MAPS folder. In the test, the MAPS path is input_data_directory/predict/in/maps_{some suffix} (so in your case /mnt/data/data_ci/predict/in/maps_{some suffix}). Then MapsManager will read the maps.json file inside this folder to get the path of the CAPS folder (to have an example, look at /mnt/datadata_ci/predict/in/maps_image_cnn/maps.json).

And here comes the issue: since the paths inside maps.json are hard coded, you must change them if you use a folder organization different from the one on the CI machine.

To sum up, you can either put CI data in /mnt/data/clinicadl_data_ci/data_ci rather than in /mnt/data/data_ci, or change every occurence of /mnt/data/clinicadl_data_ci/data_ci to /mnt/data/data_ci in CI data...

@thibaultdvx
Copy link
Collaborator

thibaultdvx commented May 23, 2024

I agree that it is really annoying because you must change the data if you run the tests on another device...
And relative paths won't solve the issue because it depends on the relative organization of your data_ci folder and your clinicadl folder (or the folder you run the tests from).

We are looking for a solution. If you have any idea, please feel free to suggest!

@thibaultdvx
Copy link
Collaborator

And please tell me if it is not clear!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants