Workflow to access datasets hosted on NDA #3710
Note that "metadata", even file names in NDA, might leak sensitive information (dates, subject IDs such as GUIDs, etc.), so such a dataset might not be shareable openly, only within the group that was granted the initial permissions by NDA. I am not sure whether NDA grants permissions to "wide" groups such as an entire research center.
Can you please clarify what would prevent me from "getting" the S3 URLs? They seem to be contained in a metadata table that is left behind by ndatool.
IIRC those would be short-lived (either the URL itself or a "bundle" bucket)... once again, I might be wrong. I haven't tried it myself, since I haven't been granted any access to NDA recently.
OK, thx. It wasn't clear from your original post that any S3 URL is temporary.
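The S3 URLs live in the metadata table that `ndatool` leaves behind, so pulling them out is a small parsing job. Here is a minimal Python sketch. It assumes an NDA-style tab-separated layout where the first row holds variable names and the second row holds human-readable descriptions. The function names and the example columns are hypothetical. Two caveats from this thread still apply: the URLs may be short-lived, and the records can contain subject identifiers, so the output should stay within the group holding NDA permissions.

```python
import csv
import io
from typing import Dict, List


def read_nda_table(text: str) -> List[Dict[str, str]]:
    """Parse an NDA-style tab-separated metadata table (assumed layout).

    Assumption: row 1 holds variable names, row 2 holds descriptions
    (skipped here), and every later non-empty row is one record.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if len(rows) < 2:
        return []
    header = rows[0]
    # rows[1] is the description row; rows[2:] are the actual records
    return [dict(zip(header, row)) for row in rows[2:] if row]


def s3_urls(records: List[Dict[str, str]]) -> List[str]:
    """Collect every cell value that looks like an S3 URL, in table order."""
    urls = []
    for record in records:
        for value in record.values():
            if value.startswith("s3://"):
                urls.append(value)
    return urls
```

For example, `s3_urls(read_nda_table(open("some_table.txt").read()))` would list the URLs found in a (hypothetical) table file. Because the URLs may expire, they are better treated as provenance than as permanent download locations.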
I think this can be closed.
@yarikoptic What would be a sensible workflow to access a dataset hosted on NDA as a DataLad dataset? In particular, access to datasets for which dedicated data usage permission has been (or has to be) obtained, and that consist of more than just imaging data hosted on S3 (e.g., clinical assessments coming from some other dataset).
What about this?

Run `ndatool` (https://github.com/NDAR/nda-tools) through `datalad run` with the request number obtained through the standard NDA application process. This will download all files from S3 and also make the necessary requests to obtain all other data files. The outcome is a dataset that represents any NDA dataset in its raw form (defined as whatever `ndatool` is doing).

This dataset can subsequently be normalized with tools like https://github.com/psychoinformatics-de/datalad-hirni, either by adding more required metadata or by using additional helpers to extract this information from the NDA-provided metadata. ZIP files with DICOMs tracked in the dataset after the initial `ndatool` run could be fed to `datalad import-dcm`. It would make sense to me to implement a metadata extractor for the NDA metadata that ends up in a dataset in this way and format. Tools like datalad hirni could then query for that metadata and do their job better, with less manual effort.

Ping @loj @bpoldrack
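The proposed first step records the `ndatool` download with `datalad run`. Here is a minimal Python sketch of building that command line. It uses only the documented `datalad run` options `-d` (dataset), `-m` (commit message) and `-o` (output path). The wrapped download command is a placeholder argument list, because the exact nda-tools command and flags depend on the installed version (check its `--help`). The function name `datalad_run_argv` and the tool name in the example are hypothetical.

```python
import shlex
from typing import List, Optional


def datalad_run_argv(tool_argv: List[str], message: str,
                     outputs: Optional[List[str]] = None,
                     dataset: str = ".") -> List[str]:
    """Build a `datalad run` command line that records `tool_argv`.

    `tool_argv` is the download command (e.g. an nda-tools invocation
    with the NDA request number); its exact form is an assumption left
    to the caller.
    """
    argv = ["datalad", "run", "-d", dataset, "-m", message]
    # Declare each output path so datalad run knows what the command produces
    for path in outputs or []:
        argv += ["-o", path]
    # Pass the wrapped command as one shell-quoted string so that its own
    # options are not interpreted as options of `datalad run`
    argv.append(shlex.join(tool_argv))
    return argv
```

The resulting list could be executed with `subprocess.run(argv, check=True)` inside the target dataset. Passing the wrapped command as a single string keeps the recorded command readable in the commit message and lets `datalad rerun` replay it later.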