Unable to download Huge corpus of papers #31

Open
UddeshyaPandey opened this issue Jan 12, 2022 · 2 comments

UddeshyaPandey commented Jan 12, 2022

Describe the bug
I was downloading XML and CSV files for all papers published in 2021 that match the query "Transcription factors". The limit was set to 100k papers and the query returned about 99k hits. Ideally, the download should start, with a warning if needed, but instead it fails with this error:
TypeError: 'NoneType' object is not subscriptable

To Reproduce
Steps to reproduce the behaviour:

  1. In the Windows command prompt, type
    pygetpapers -q "Transcription factors" -x -c -o TF_database_2021 -k 100000 --startdate 2021-01-01 --enddate 2021-12-31
  2. Press Enter
  3. Scroll down to the end of the output
  4. See an error like
    TypeError: 'NoneType' object is not subscriptable
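
For readers unfamiliar with this message: Python raises it whenever code indexes into a value that turned out to be `None`. The sketch below reproduces the message in isolation and shows a guard against it. It is not pygetpapers code; `fetch_page`, the `resultList` key, and both helper functions are hypothetical names used only for illustration, and the thread does not establish where in pygetpapers the `None` actually comes from.

```python
def fetch_page(succeed):
    """Hypothetical stand-in for a request that may come back with no payload."""
    if succeed:
        return {"resultList": ["paper-1", "paper-2"]}
    return None


def results_unguarded(response):
    # Indexing None raises: TypeError: 'NoneType' object is not subscriptable
    return response["resultList"]


def results_guarded(response):
    # Check for a missing payload before indexing into it.
    if response is None:
        return []
    return response["resultList"]


try:
    results_unguarded(fetch_page(succeed=False))
except TypeError as exc:
    message = str(exc)
    print(f"TypeError: {message}")

print(results_guarded(fetch_page(succeed=False)))
print(results_guarded(fetch_page(succeed=True)))
```

A guard like this only turns the crash into an empty result; it would not explain why the payload was missing in the first place.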

Expected behaviour

It should start downloading all the available XML and CSV files that match the query.

Screenshots
image

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser : Firefox
  • Version : Firefox 95.0

Additional context
It usually works for a small corpus of about 100 to 1000 papers. For example, pygetpapers ran the above query smoothly for the year 2022 with the limit set to 1000 papers; there were only 458 actual hits, and it downloaded a corpus of 458 papers with CSV and XML files.
But for a huge corpus, usually more than 1k papers, it shows the above error message.
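
Since smaller requests reportedly succeed, one possible workaround is to split the year into monthly date ranges and run one command per month. The sketch below only builds the argument lists, reusing the flags from the command in the reproduction steps. It is untested against pygetpapers, and nothing in this thread confirms that smaller batches avoid the error. The function name `monthly_commands`, the per-month output directory names, and the per-month limit of 10000 are assumptions made for illustration.

```python
import calendar


def monthly_commands(query, year, output_prefix, limit):
    """Build one pygetpapers argument list per calendar month of `year`."""
    commands = []
    for month in range(1, 13):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        start = f"{year}-{month:02d}-01"
        end = f"{year}-{month:02d}-{last_day:02d}"
        commands.append([
            "pygetpapers",
            "-q", query,
            "-x", "-c",
            "-o", f"{output_prefix}_{month:02d}",  # hypothetical naming scheme
            "-k", str(limit),
            "--startdate", start,
            "--enddate", end,
        ])
    return commands


commands = monthly_commands("Transcription factors", 2021, "TF_database_2021", 10000)
for args in commands:
    print(args)
```

Each argument list could then be passed to `subprocess.run(args, check=True)`. A month with more hits than the chosen limit would be truncated, so the limit would need to be checked against the reported hit count.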

@ayush4921
Collaborator

Can you check whether the same command works in version 1.1.5?

@petermr
Owner

petermr commented Feb 24, 2022 via email
