Derivatives datasets with the raw data in sourcedata produce double results #579

Open
effigies opened this issue Jan 30, 2020 · 4 comments

@effigies
Collaborator

This is basically #557 again. The problem is that the contents of sourcedata/ are indexed.

@tyarkoni
Collaborator

sourcedata/ is included in the _default_ignore class variable, so in principle it shouldn't be indexed. The problem is that _default_ignore is only used if the user doesn't pass ignore explicitly, and I imagine maybe you're trying to use it here (if not, then it's a straight-up bug).
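The fallback described above can be sketched in plain Python. This is only an illustration of the described behavior, not the pybids source: `DEFAULT_IGNORE` and `resolve_ignore` are hypothetical names, and the four entries are the ones listed later in this thread.

```python
# Hypothetical stand-in for the class-level defaults named in the thread.
DEFAULT_IGNORE = ("code", "stimuli", "sourcedata", "models")


def resolve_ignore(ignore=None):
    """Return the ignore list that would take effect (illustrative only)."""
    if ignore is None:
        # No explicit value was passed: fall back to the defaults.
        return list(DEFAULT_IGNORE)
    # An explicit value replaces the defaults; it does not extend them.
    return list(ignore)


print(resolve_ignore())           # defaults apply
print(resolve_ignore(["extra"]))  # defaults are dropped entirely
```

Under this model, passing any explicit `ignore` silently drops `sourcedata` from the exclusions unless the caller adds it back.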

It's not immediately obvious to me what a sensible way to deal with this is. We don't want to always apply the default ignores, but we also don't want to force the user to exclude all of them manually just to get the same behavior. We could introduce a special string value, or add a flag for this, but there are already so many arguments... WDYT?

@tyarkoni
Collaborator

Actually, I don't think it's a huge deal for the user to manually add ["code", "stimuli", "sourcedata", "models"] to ignore... is there a reason that won't work here? The ignore arg is passed through to derivative initialization, so if the issue is that it should apply only to the base project but not to derivatives, I don't really see a good way around that. In that kind of situation, I think the solution is not to rely on automatic derivative ingestion, and instead to call add_derivatives explicitly.
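A minimal sketch of that workaround follows. It assumes a pybids version in which `BIDSLayout` accepts `ignore` as a constructor argument and `add_derivatives` forwards extra keyword arguments to the derivative layouts; both are assumptions to check against the installed version. `ignore_with_defaults` and `build_layout` are hypothetical helper names.

```python
# The four default entries listed in the comment above.
DEFAULT_IGNORE = ["code", "stimuli", "sourcedata", "models"]


def ignore_with_defaults(extra=()):
    """Defaults plus any extra patterns, order preserved, no duplicates."""
    merged = list(DEFAULT_IGNORE)
    for pattern in extra:
        if pattern not in merged:
            merged.append(pattern)
    return merged


def build_layout(root, derivatives_path, extra_ignore=()):
    """Index the base dataset, then attach derivatives explicitly."""
    # Imported lazily so the helper above works without pybids installed.
    from bids import BIDSLayout

    ignore = ignore_with_defaults(extra_ignore)
    layout = BIDSLayout(root, ignore=ignore)
    # Assumption: keyword arguments are passed on to each derivative layout.
    layout.add_derivatives(derivatives_path, ignore=ignore)
    return layout
```

Calling `add_derivatives` separately leaves room to pass a different `ignore` to the derivative datasets than to the base project, which automatic ingestion through the `derivatives` argument does not offer.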

@effigies
Collaborator Author

No, I'm not passing an ignore variable. I do the following:

layout = BIDSLayout('/data/bids/ds003_fmriprep/sourcedata', derivatives='/data/bids/ds003_fmriprep')

I can add sourcedata to ignore, which seems fine. It just seems weird that we avoid indexing derivatives/ but not sourcedata/.

@tyarkoni
Collaborator

Ah, then it looks like a bug. Not clear to me what's happening, as ignore is passed directly through to add_derivatives. I initially thought maybe the paths were being expanded before add_derivatives() was called, but the expanded paths are stored in self.ignore, so ignore shouldn't mutate as it passes through.

The reason we still manage to avoid indexing derivatives/ is that the check for that is hardcoded into the indexing routine, because you never want to index derivatives (from a base project, I mean). The sourcedata exclusion is evaluated separately, and is presumably failing somehow.
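The difference between the two checks can be illustrated with a small stand-alone sketch. It is not the pybids indexing routine: `should_index` is a hypothetical function, and it compares only the top-level directory name, whereas the real matching may work differently.

```python
from pathlib import PurePosixPath


def should_index(rel_path, ignore, is_base_dataset=True):
    """Decide whether a dataset-relative path gets indexed (illustrative only)."""
    parts = PurePosixPath(rel_path).parts
    top = parts[0] if parts else ""

    # Check 1: hardcoded, so it holds no matter what `ignore` contains.
    if is_base_dataset and top == "derivatives":
        return False

    # Check 2: driven entirely by the ignore list. If the list is empty
    # or incomplete when it arrives here, the path is indexed.
    return top not in ignore


# derivatives/ is skipped even with an empty ignore list...
print(should_index("derivatives/fmriprep/sub-01/func/bold.nii.gz", ignore=[]))
# ...but sourcedata/ is only skipped when the list actually names it.
print(should_index("sourcedata/sub-01/anat/T1w.nii.gz", ignore=[]))
print(should_index("sourcedata/sub-01/anat/T1w.nii.gz", ignore=["sourcedata"]))
```

In this model, a duplicate-results symptom like the one reported would appear whenever the second check receives a list without `sourcedata`, while the first check keeps working.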

@tyarkoni added the bug label Jan 30, 2020