slow in my environment #15

Closed
TinoPlayStuff opened this issue Jun 29, 2022 · 4 comments


@TinoPlayStuff

hi,

I tried this API. It's very convenient but relatively slow.
So I modified it to use a session:

  1. Add one line under "import requests":
    import requests
    re_session = requests.Session()

  2. Change the line that sends the request from
    response: requests.models.Response = getattr(requests, method)(
    to
    response: requests.models.Response = getattr(re_session, method)(


It's much faster now, but still slower than using subprocess.Popen to call curl for the same thing.

Is this a limitation of the Python requests module, or did I do something wrong?
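
For reference, the change described above can be sketched as a small wrapper. This is only a minimal sketch, not joppy's actual code: the class name `SessionApi`, its `_request` method, and the default URL are assumptions for illustration. A `requests.Session` pools connections, so repeated calls to the same host can reuse an open connection instead of setting up a new one each time.

```python
import requests


class SessionApi:
    """Hypothetical wrapper for illustration; not joppy's real class."""

    def __init__(self, token: str, url: str = "http://localhost:41184") -> None:
        self.token = token
        self.url = url.rstrip("/")
        # One session for all calls, so connections are pooled and reused.
        self.session = requests.Session()

    def _request(self, method: str, path: str, **kwargs) -> requests.models.Response:
        # Dispatch on the session instead of the requests module,
        # e.g. method="get" resolves to self.session.get.
        params = {"token": self.token, **kwargs.pop("params", {})}
        response = getattr(self.session, method)(
            f"{self.url}{path}", params=params, **kwargs
        )
        response.raise_for_status()
        return response
```

Keeping the session on the instance rather than as a module-level global means each API object owns its own connection pool.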

@marph91
Owner

marph91 commented Jun 29, 2022

Hi @TinoPlayStuff,

thanks for the feedback. I haven't thought about speed so far. In general I would expect requests to be almost as fast as curl, so this looks like an issue/limitation of joppy.

Could you add a reproducer for your issue? I.e. some sample requests and how fast they are on your machine with curl and with joppy? I couldn't observe a big difference when using sessions in the testsuite, but this could be due to the testsuite structure.
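
A reproducer along these lines could be as small as a timing helper. The sketch below is an assumption about what such a measurement might look like; the function name `time_calls` is hypothetical and it is not part of joppy.

```python
import time
from typing import Callable


def time_calls(func: Callable[[], object], repeat: int = 20) -> float:
    """Return the average wall-clock seconds per call of func."""
    start = time.perf_counter()
    for _ in range(repeat):
        func()
    return (time.perf_counter() - start) / repeat
```

Passing one callable that performs a request through joppy and another that performs the same request through curl gives two comparable averages.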

@TinoPlayStuff
Author

TinoPlayStuff commented Jun 29, 2022

In run_py.zip there are two Python scripts, run.py and run_requests.py.
They do basically the same thing. The main difference is that run.py is based entirely on curl, while in run_requests.py most of the HTTP calls have been replaced with the joppy API.

To run them, put your Joplin token in run.py.tok
and edit the settings in run.py or run_requests.py:

# <- setting
TOK_FILE = "run.py.tok"  # file contains joplin token
PUBTAG = "published"  # note with this tag will be extracted
TAGHIDE = {"published", "publishedx"} # test tag
N_FDR = "./_posts"  # where to put the exported posts
R_FDR = "./_resources"  # where to put resource files (.jpg, .png, ...)
URL = "http://localhost:41184/"
# -> setting

With these settings, all notes with the tag "published" (defined as PUBTAG) and their related resource files are put into ./_posts and ./_resources (defined as N_FDR and R_FDR respectively).
The scripts report their start time and end time.

The joppy in this zip is modified to use requests.Session.
In my environment, with a test set of forty notes, the curl version is 20% faster.
With the original joppy it took too long, and I didn't wait for it to finish.

run_py.zip

@marph91
Owner

marph91 commented Jun 30, 2022

Unfortunately I couldn't run your script out of the box. It's recommended to pass the Popen arguments as a sequence instead of a string (https://docs.python.org/3/library/subprocess.html#subprocess.Popen) to avoid OS-dependent problems.
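
As a rough illustration of the sequence form, here is a minimal sketch. The helper names `build_curl_args` and `run_command` are hypothetical, and the URL layout is an assumption based on the settings posted above.

```python
import subprocess
from typing import List


def build_curl_args(base_url: str, path: str, token: str) -> List[str]:
    # Each argument is its own list element, so no shell quoting is needed.
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}?token={token}"
    return ["curl", "--silent", url]


def run_command(args: List[str]) -> str:
    # Without shell=True the list is handed over without shell parsing,
    # which avoids differences between cmd.exe and POSIX shells.
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout
```

Calling `run_command(build_curl_args(URL, "notes", token))` would then replace a hand-assembled command string.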

However, I did a "dry" look at the script. Notes:

  • At first I thought it was about the pagination. But you resolved the pagination manually most of the time. Since your test data is only 40 notes, it shouldn't make a difference anyway. Are these plain notes or do they have tags and resources attached?
  • A 20 % speed difference between curl and requests seems to be in the expected range: https://stackoverflow.com/a/32899936/7410886
  • It seems reasonable that using a session is faster. However, I can't see a significant speedup in my local tests. Will try further.
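
For readers unfamiliar with the pagination mentioned in the first point: the Joplin Data API returns list results in pages, each with an `items` list and a `has_more` flag. Resolving it manually might look like the sketch below. The names `fetch_all` and `fetch_page` are hypothetical; `fetch_page` stands in for whatever performs the actual request for a given page number.

```python
from typing import Any, Callable, Dict, List


def fetch_all(fetch_page: Callable[[int], Dict[str, Any]]) -> List[Any]:
    """Collect items from every page until the server reports no more."""
    items: List[Any] = []
    page = 1
    while True:
        data = fetch_page(page)
        items.extend(data["items"])
        if not data.get("has_more"):
            return items
        page += 1
```

Each page costs one request, so with only 40 notes the number of round trips stays small either way.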

@TinoPlayStuff
Author

Thanks for your explanation. Today I made some modifications, and
now run_requests.py (with joppy using a session) is a little faster than run.py.

The main modification is that I now use shutil.copy2() to copy resource files directly from the Joplin profile folder. So the original poor performance may have come from my using your get_resource_file function in an improper way.
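
A minimal sketch of such a direct copy is shown below. It assumes that resource files sit in a `resources` subfolder of the profile and are named `<id>.<extension>`; the function name `copy_resource` and its parameters are hypothetical and not taken from run_requests.py.

```python
import shutil
from pathlib import Path
from typing import Optional


def copy_resource(profile_dir: Path, resource_id: str, dest_dir: Path) -> Optional[Path]:
    """Copy the file belonging to resource_id into dest_dir, if it exists."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: resource files live in <profile>/resources as <id>.<extension>.
    for source in (profile_dir / "resources").glob(f"{resource_id}.*"):
        # copy2 also tries to preserve metadata such as the modification time.
        return Path(shutil.copy2(source, dest_dir / source.name))
    return None
```

This bypasses HTTP entirely, so it only works when the script runs on the same machine as the Joplin profile.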

FYI, I ran these scripts on a Windows 11 machine with Python 3.7. Without a session, it takes more than one second to process one note and its related resource files.

Thanks again for your convenient API.
