Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize CLI startup time #696

Merged
merged 7 commits into from
May 11, 2023
Merged

Optimize CLI startup time #696

merged 7 commits into from
May 11, 2023

Conversation

juhoinkinen
Copy link
Member

@juhoinkinen juhoinkinen commented Apr 24, 2023

Optimizes CLI startup time by avoiding unnecessary imports for basic commands, mainly to improve the CLI command completion response times in #693.

Changes:

  • Imports of joblib and numpy are simply moved inside the functions/methods where they are used.

  • Import of connexion is avoided for all CLI commands but annif run and annif routes by using separate creation scripts for plain-Flask and Connexion-Flask applications. When started via gunicorn "annif:create_app() the Connexion-Flask app is created as previously.

This is a continuation of the optimizations in #514, #544 and #563.

I checked the import time of the pre-release of connexion 3, and it was even slower (~0.35 s) than of the current v2.14.* (0.12 s), so avoiding connexion import is even more valuable in the future.

Improvement compared to current main
The improvement seems to be some 0.3 seconds (~0.1 s coming from each of joblib, numpy, and connexion).

Before (main):

$ time annif --version
1.0.0.dev0

real	0m0.610s
user	0m0.948s
sys	0m0.666s
$ time annif --help
Usage: annif [OPTIONS] COMMAND [ARGS]...
...
real	0m0.689s
user	0m1.079s
sys	0m0.626s
$ time annif list-vocabs 
Vocabulary ID       Languages                 Size  Loaded
----------------------------------------------------------
yso                 en,fi,se,sv              38179  True  
ykl                 -                            -  False 
thema-fi            -                            -  False 

real	0m1.214s
user	0m1.500s
sys	0m0.665s

After (PR):

$ time annif --version
1.0.0.dev0

real	0m0.293s
user	0m0.265s
sys	0m0.029s
$ time annif --help
Usage: annif [OPTIONS] COMMAND [ARGS]...
...
real	0m0.305s
user	0m0.266s
sys	0m0.038s
$ time annif list-vocabs 
Vocabulary ID       Languages                 Size  Loaded
----------------------------------------------------------
yso                 en,fi,se,sv              38179  True  
ykl                 -                            -  False 
thema-fi            -                            -  False 

real	0m0.945s
user	0m0.880s
sys	0m0.065s

The CLI command completion response times seemed to follow times of annif --help, so they are improved similarly.

Timing imports for completion responses does not show any easy targets for further optimization. The rdflib is used via multiple modules and in nearly all methods of corpus.subject.SubjectFileSKOS, and I don't know how rdflib import could be done only when needed in there in a nice way.

Tuna output for the timing of list-* completion response time:

image

@juhoinkinen juhoinkinen added this to the 1.0 milestone Apr 24, 2023
@codecov
Copy link

codecov bot commented Apr 24, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.03 🎉

Comparison is base (3451bd3) 99.63% compared to head (e6e06b8) 99.66%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #696      +/-   ##
==========================================
+ Coverage   99.63%   99.66%   +0.03%     
==========================================
  Files          89       89              
  Lines        6222     6257      +35     
==========================================
+ Hits         6199     6236      +37     
+ Misses         23       21       -2     
Impacted Files Coverage Δ
annif/__init__.py 100.00% <100.00%> (+3.22%) ⬆️
annif/cli.py 100.00% <100.00%> (+0.35%) ⬆️
annif/corpus/skos.py 100.00% <100.00%> (ø)
annif/corpus/subject.py 100.00% <100.00%> (ø)
tests/test_cli.py 100.00% <100.00%> (ø)

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@juhoinkinen
Copy link
Member Author

The rdflib import could be turned to lazy, but the code is not super clean and the time saving is only ~0.05 s.

However I would still add it. For the command completions even such a tiny improvement in the response time is noticeable I think.

@juhoinkinen juhoinkinen marked this pull request as ready for review April 26, 2023 07:19
@juhoinkinen
Copy link
Member Author

Seems that upgrading to soon-to-be-released Connexion 3 requires some changes to the way application is created, see #689 (comment). Hopefully the changes can be incorporated to this.

Copy link
Member

@osma osma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks OK, although making more and more local imports makes the code messier IMHO. But it seems worth it in this case, as the startup speed improves noticeably (I also tested this on my laptop and the speedup was around 0.2 seconds). Maybe someday we could switch to real PEP 690 lazy imports, but those are scheduled for Python 3.12, which should be released in a few days, but it will take years until we can drop support for 3.11.

I see that Codecov complains about cli.py line 31 not being covered by tests, which I found puzzling, since you added a test that runs annif run --help which should execute this line. But then I added some debugging code around that line and found out that the test doesn't actually exercise that line at all. I think that's because when using CliRunner (e.g. runner.invoke(annif.cli.cli, ["run", "--help"])) that doesn't change sys.argv at all, so it also won't execute line 31 of cli.py. Is there anything we could do here to make sure that the test also exercises this path?

@juhoinkinen
Copy link
Member Author

I see that Codecov complains about cli.py line 31 not being covered by tests, which I found puzzling, since you added a test that runs annif run --help which should execute this line. But then I added some debugging code around that line and found out that the test doesn't actually exercise that line at all. I think that's because when using CliRunner (e.g. runner.invoke(annif.cli.cli, ["run", "--help"])) that doesn't change sys.argv at all, so it also won't execute line 31 of cli.py. Is there anything we could do here to make sure that the test also exercises this path?

The line could definitely be run in a test such as

def test_run_separate_process():
    result = os.popen("python annif/cli.py run --help").read()
    assert "Run a local development server." in result

But I don't know how to ensure this this really starts Flask with Connexion, and not just Flask... And codecov does not take this kind of test into account in coverage counting (at least locally run).

@osma
Copy link
Member

osma commented May 5, 2023

The os.popen solution doesn't seem very useful, for the reasons you mention.

How about using some other information than just sys.argv[1] in cli.py when deciding whether to choose Connexion or plain Flask?

For testing parallel.py using different spawn methods, I added a constant definition like this:

Annif/annif/parallel.py

Lines 7 to 10 in 3451bd3

# Start method for processes created by the multiprocessing module.
# A value of None means using the platform-specific default.
# Intended to be overridden in unit tests.
MP_START_METHOD = None

The unit test that exercises this code overrides the value using monkeypatching:

Annif/tests/test_cli.py

Lines 887 to 900 in 3451bd3

def test_eval_two_jobs_spawn(tmpdir, monkeypatch):
tmpdir.join("doc1.txt").write("doc1")
tmpdir.join("doc1.key").write("dummy")
tmpdir.join("doc2.txt").write("doc2")
tmpdir.join("doc2.key").write("none")
tmpdir.join("doc3.txt").write("doc3")
# use spawn method for starting multiprocessing worker processes
monkeypatch.setattr(annif.parallel, "MP_START_METHOD", "spawn")
result = runner.invoke(
annif.cli.cli, ["eval", "--jobs", "2", "dummy-en", str(tmpdir)]
)
assert not result.exception
assert result.exit_code == 0

Something similar could perhaps be done here - define a constant in cli.py which uses sys.argv and then use it in the if clause:

USE_CONNEXION = len(sys.argv) > 1 and sys.argv[1] == "run"

if USE_CONNEXION:
    create_app = annif.create_app  # Use Flask with Connexion
else:
    # Connexion is not needed for most CLI commands, use plain Flask
    create_app = annif.create_flask_app

Then you can monkeypatch USE_CONNEXION in the unit test to force it to be True in any case.

@osma
Copy link
Member

osma commented May 5, 2023

...except now that I think of it, I'm not sure if that will work, because monkeypatching is probably applied too late for this to have an effect. Anyway, something similar could work here.

@sonarcloud
Copy link

sonarcloud bot commented May 11, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@juhoinkinen
Copy link
Member Author

I see that Codecov complains about cli.py line 31 not being covered by tests, which I found puzzling, since you added a test that runs annif run --help which should execute this line. But then I added some debugging code around that line and found out that the test doesn't actually exercise that line at all. I think that's because when using CliRunner (e.g. runner.invoke(annif.cli.cli, ["run", "--help"])) that doesn't change sys.argv at all, so it also won't execute line 31 of cli.py. Is there anything we could do here to make sure that the test also exercises this path?

The line could definitely be run in a test such as

def test_run_separate_process():
    result = os.popen("python annif/cli.py run --help").read()
    assert "Run a local development server." in result

But I don't know how to ensure this this really starts Flask with Connexion, and not just Flask... And codecov does not take this kind of test into account in coverage counting (at least locally run).

Actually using the os.popen solution does get the line 31 covered in codecov (when run in CI). When I locally ran coverage run -m pytest tests/test_cli.py; coverage report -m | grep annif/cli.py that line was reported as missed.

And I made also annif routes command to start the Connexion app, which registers all REST API endpoints. The list of the registered endpoints from the output of routes command is now tested to confirm that Connexion is started.

I think this can be merged. When upgrading to Connexion 3 (#702) this functionality can or may need to be changed again.

@juhoinkinen juhoinkinen merged commit 6d3a834 into main May 11, 2023
12 checks passed
@juhoinkinen juhoinkinen deleted the lazy-imports branch May 11, 2023 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants