
NDA crawler fails with error #56

Open
loj opened this issue Sep 24, 2019 · 8 comments
loj commented Sep 24, 2019

I'm attempting to use DataLad's NDA crawler for a dataset I'm trying to download, but I'm running into problems. Following the instructions in the datalad crawler docs, I ran the following:

$ datalad create -c text2git nda_crawler
[INFO   ] Creating a new annex repo at /data/BnB_USER/loj/downloads/nda_crawler 
[INFO   ] Running procedure cfg_text2git                                                                                                                                                       
[INFO   ] == Command start (output follows) ===== 
[INFO   ] == Command exit (modification check follows) ===== 
create(ok): /data/BnB_USER/loj/downloads/nda_crawler (dataset)
$ datalad crawl-init --save --template nda collection=2274
[INFO   ] Creating a pipeline for the NDA bucket

However, the crawl fails. :-(

$ datalad crawl
[INFO   ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg 
[INFO   ] Creating a pipeline for the NDA bucket 
[INFO   ] Running pipeline [[assign(assignments=<<{'filename': 'collecti...>>, interpolate=False), <datalad_crawler.nodes.annex.Annexificator object at 0x7f5dafec9320>], [crawl_mindar_images03(collection='2274'), continue_if(negate=False, re=True, values=<<{'url': 's3:https://(?P<buck...>>), <datalad_crawler.nodes.annex.Annexificator object at 0x7f5dafec9320>]] 
[ERROR  ] Failed to create the collection: Prompt dismissed.. [SecretService.py:get_preferred_collection:58] (InitError) 

I'm running datalad version 0.12.0rc5 and the latest master of datalad crawler.
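
For what it's worth, the `Prompt dismissed` error in the traceback comes from the Python `keyring` library's SecretService (D-Bus) backend, which DataLad uses for credential storage; it typically shows up in headless or non-desktop sessions where no keyring unlock prompt can be displayed. A possible workaround, sketched below under the assumption that the issue is only the keyring prompt (it won't fix the outdated NDA authentication itself), is to point `keyring` at a file-based backend via its `PYTHON_KEYRING_BACKEND` environment variable. This requires the `keyrings.alt` package, and note that the plaintext backend stores credentials unencrypted:

```shell
# Assumes: pip install keyrings.alt
# Caution: the plaintext backend stores credentials unencrypted on disk.
export PYTHON_KEYRING_BACKEND=keyrings.alt.file.PlaintextKeyring

datalad crawl
```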

One of my concerns is whether I'm using the correct information for the "collection". NDA requires that the user create a "package" for any downloads, so I've created a package for this dataset and have its package identifier. But my understanding of this crawler is that it wants the collection ID, not the package identifier (I tried the package identifier too, and it failed with the same error)... The point is, I'm unsure whether I'm doing the right thing here. Thoughts?

Thanks!
--Laura

@yarikoptic

Well, the nda crawler was pretty much a prototype from years back, and the "NDA ways" of delivering content have changed since then ... even the NDA authentication adapter is no longer working: datalad/datalad#3674 . We had some initial dialog with @obenshaindw (and @agt24) on how datalad could (in a future refactoring) interface with NDA, but so far nobody has had the juice/time and a concrete use case to move forward. It sounds like you have a use case? Or was it just an example of no particular interest/need?


agt24 commented Sep 24, 2019

It'd be good to revisit this. @yarikoptic do you have a record of the ticket number at https://ndar.zendesk.com ?

I can't find it for some reason

@yarikoptic

I can't find any email of mine that relates to datalad on ndar.zendesk.


loj commented Sep 25, 2019

Thanks for the response. :-)

I feel like you have a use case? or it was just an example of no particular interest/need?

@yarikoptic Yeah, this is for a dataset I'm downloading at work. Over the next couple of months, I'll be downloading 2-4 datasets from the NDA. If you need more information about what we're doing, I can explain further.

Using the crawler to achieve this isn't critical, my fallback is to use NDAR/nda-tools to download the data.
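
In case it helps others, the nda-tools fallback mentioned above might look roughly like this. This is a hypothetical sketch: the package ID and target directory are placeholders, and the exact `downloadcmd` flags have varied between nda-tools releases, so `downloadcmd --help` should be consulted for the installed version:

```shell
# Assumed setup; flags and package ID below are illustrative, not verified.
pip install nda-tools

# Download the contents of an NDA package into a target directory.
# Replace 1234567 with your own package ID from the NDA dashboard.
downloadcmd -dp 1234567 -d /data/nda_download
```

Unlike the crawler, this does not produce a DataLad dataset; the result would still need to be saved into one (e.g. with `datalad save`) to get version tracking.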

@yarikoptic

OK, I guess just fall back for now.


agt24 commented Sep 25, 2019 via email

@yarikoptic

@loj Did you establish some workflow to fetch datasets from NDA? One way (fixing up datalad and/or datalad-crawler) or another (a custom extension or set of scripts, like for ukbiobank), it would be nice to have it available to a wider audience.


loj commented Aug 3, 2020

Unfortunately I haven't yet, but this is still on my to-do list. I hope to get to it soon, and will definitely share once I have something. :-)
