layout.get not returning subject directory paths for datasets with no session layer #978

Open
alyssadai opened this issue Apr 10, 2023 · 3 comments

@alyssadai

Hi there,

I would like to use layout.get to get subject-level directory paths for BIDS datasets, but am getting unexpected results for datasets that do not have a session layer (but still have imaging data), e.g. bids-examples dataset "ds003":

ds003/
├── sub-01/
│   ├── anat/
│   └── func/
└── ...

Commands used to load the dataset and try to fetch the directory path for a specific subject:

>>> from bids import BIDSLayout
>>> layout = BIDSLayout("bids-examples/ds003", validate=True)
>>> layout
BIDS Layout: ...\bagelbids\bids-examples\ds003 | Subjects: 13 | Sessions: 0 | Runs: 0
>>> layout.get(return_type="id", target="subject")  # confirm that subject IDs are parsed
['12', '08', '10', '13', '04', '05', '07', '11', '03', '02', '06', '09', '01']
>>> layout.get(subject="01", target="subject", return_type="dir")  # ISSUE: returns an empty list
[]

In the last line, using target="subject", return_type="dir" returns an empty list rather than a list of paths, even though pybids appears to recognize the subject-level data. The issue persists even when subject isn't specified in layout.get.

Strangely, not all of the bids-examples datasets that lack a session layer show this behaviour. For example, I've noticed it for ds003 and eeg_ds000117, but eeg_cbm returns the subject paths as expected:

>>> layout = BIDSLayout("bids-examples/eeg_cbm", validate=True)
>>> layout.get(subject="cbm001", target="subject", return_type="dir")
['D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm001']
>>> layout.get(target="subject", return_type="dir")
['D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm001', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm002', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm003', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm004', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm005', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm006', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm007', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm008', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm009', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm010', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm011', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm012', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm013', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm014', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm015', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm016', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm017', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm018', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm019', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm020']

Not sure if this is a bug or if the inconsistent behavior is due to specific differences in the dataset structure. Any help on this would be much appreciated!

@effigies
Collaborator

effigies commented Apr 11, 2023

To be honest, I'm surprised anybody uses return_type='dir'. Looking at the code, it's extracted in a pretty baroque way because we index files, not directories.

I'm not sure that we want to support this long-term, so it might be best to take another approach. What about:

from pathlib import Path
subject_dirs = [Path(layout.root) / f'sub-{subject}' for subject in layout.get_subjects()]
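
A slightly fuller sketch of the same idea, assuming only that subject directories follow the standard `sub-<label>` naming directly under the dataset root. The helper name `subject_dirs` and its `must_exist` flag are hypothetical and not part of pybids; the flag simply drops any constructed path that is not a directory on disk.

```python
from pathlib import Path


def subject_dirs(root, subjects, must_exist=True):
    """Build sub-<label> directory paths under root for the given subject labels.

    Illustrative helper only; this is not a pybids function.
    """
    paths = [Path(root) / f"sub-{label}" for label in sorted(subjects)]
    if must_exist:
        # Keep only the paths that are real directories on disk.
        paths = [p for p in paths if p.is_dir()]
    return paths


# With pybids, the inputs would come from the layout, e.g.:
#   layout = BIDSLayout("bids-examples/ds003")
#   subject_dirs(layout.root, layout.get_subjects())
```

Because the paths are built from the subject labels rather than from indexed files, this does not depend on how `return_type='dir'` is derived internally.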

@alyssadai
Author

alyssadai commented Jun 19, 2023

Hi @effigies, thanks again for your advice on this. We ended up going with your suggested method to extract session / subject directories.

Just letting you know that we also noticed while experimenting that layout.get(..., return_type="dir") fails to return the path of a given session when a subject has exactly one session:

import bids
layout = bids.BIDSLayout("bids-examples/ieeg_motorMiller2007")
layout.get(subject="cc", session="01", target="session", return_type="dir")
Out[]: []

# BUT:
layout.get_sessions(subject="cc")
Out[]: ['01']

This does seem to reinforce your statement that return_type='dir' isn't the most reliable for these use cases.
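
For the session case, the same path-building approach can be extended. This is a minimal sketch, assuming the standard `sub-<label>/ses-<label>` nesting; `session_dirs` is a hypothetical helper (not a pybids function), and the session labels would come from `layout.get_sessions(subject=...)` as in the snippet above.

```python
from pathlib import Path


def session_dirs(root, subject, sessions):
    """Build sub-<subject>/ses-<label> paths for each session label.

    Illustrative helper only; returns an empty list when there are no sessions.
    """
    subject_dir = Path(root) / f"sub-{subject}"
    return [subject_dir / f"ses-{label}" for label in sorted(sessions)]


# With pybids, e.g.:
#   session_dirs(layout.root, "cc", layout.get_sessions(subject="cc"))
```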

In light of this, would it make sense to update the docs (https://bids-standard.github.io/pybids/examples/pybids_tutorial.html#other-return-type-values) to either remove reference to this parameter, or warn about its usage?

I can also open a separate issue for the docs update if that would be helpful.

@effigies
Collaborator

Yes, I think it would be a good idea to discourage use of this option.
