Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dependencies v0.56 #549

Merged
merged 15 commits into from
Jan 17, 2022
Merged

Update dependencies v0.56 #549

merged 15 commits into from
Jan 17, 2022

Conversation

juhoinkinen
Copy link
Member

@juhoinkinen juhoinkinen commented Jan 7, 2022

Dependency updates for Annif v0.56.

Updates to (edited):

  • Omikuji 0.4.x
  • TensorFlow 2.7.0 & lmbd 1.3.0
  • Gensim 4.1.x
  • joblib 1.1.0
  • Optuna 2.10.x
  • SciPy 1.7.x
  • Scikit-learn 1.0.2
  • NumPy 1.21.x
  • Click 8.0.x
  • Flask >=1.0.4,<3
  • Connexion to use connexion2 PyPI project

Below is a table of Omikuji 0.3.4 and 0.4.1 models comparison. Training was performed on full Finnish Finna dataset (real and user times include preprocessing), and evaluation in parallel.

v0.3 v0.4
train time
real 190m 174m
user 2061m 1789m
reported by Omikuji 7776s 6783s
eval time
real 2m59s 2m3s
user 53m55s 14m33s
sys 3m18s 1m47s
model size on disk 3.8G 3.4G
eval results
JYU-theses 0.4716 0.4759
kirjaesittelyt2021 0.4511 0.4496
kirjastonhoitaja 0.2759 0.2824
satakunnan-kansa 0.2808 0.2524
vapaakappaleet-orig 0.4648 0.4613
average 0.3888 0.3843

@codecov
Copy link

codecov bot commented Jan 7, 2022

Codecov Report

Merging #549 (f6cf926) into master (586adba) will decrease coverage by 0.03%.
The diff coverage is 84.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #549      +/-   ##
==========================================
- Coverage   99.53%   99.49%   -0.04%     
==========================================
  Files          80       80              
  Lines        5341     5340       -1     
==========================================
- Hits         5316     5313       -3     
- Misses         25       27       +2     
Impacted Files Coverage Δ
annif/backend/omikuji.py 96.42% <60.00%> (-2.34%) ⬇️
annif/backend/pav.py 98.90% <100.00%> (ø)
annif/backend/tfidf.py 98.80% <100.00%> (-0.05%) ⬇️
annif/eval.py 100.00% <100.00%> (ø)
annif/lexical/mllm.py 100.00% <100.00%> (ø)
annif/lexical/util.py 100.00% <100.00%> (ø)
annif/suggestion.py 99.21% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 586adba...f6cf926. Read the comment docs.

@juhoinkinen
Copy link
Member Author

Other points:

  • Omikuji models trained with previous version do not work on the new Omikuji version. Others models stay working, also NN ensemble and TFIDF models despite TensorFlow and Gensim version updates.
  • Python 3.6 & 3.7 tests show now a RuntimeWarning raised by test_english_analyzer_normalize_word, but it has been around since Optimize startup time using local & lazy imports (take 2) #544.
  • Python 3.6 tests show also many DeprecationWarnings.

@juhoinkinen
Copy link
Member Author

juhoinkinen commented Jan 10, 2022

Some dependencies were not updated to newer versions because they do not support Python 3.6; some latest versions do not support even 3.7:

  • SciPy 1.7.x and 1.6.x require 3.7+ (to-be-released 1.8.0 will require Python 3.8+)
  • NumPy 1.20 requires Python 3.7-3.9; 1.21 requires 3.7-3.10; 1.22 requires 3.8-3.10
  • Scikit-learn 1.0 requires Python 3.7-3.9 (stwfsa has scikit-learn pinned to 0.24.x >= 0.24.x (but to ==0.24.2 in Pipfile?))
  • TensorFlow 2.7 has wheels and is tested for Python 3.7-3.9

Should we keep Python 3.6 support for Annif?

@osma
Copy link
Member

osma commented Jan 10, 2022

We've tried to support three consecutive versions of Python. But we already currently have support for 3.7, 3.8 and 3.9, so dropping 3.6 wouldn't violate that policy. Python 3.6 has been EOL'd and the final release 3.6.15 was released on 2021-09-04.

Python 3.6 is the default version in Ubuntu 18.04, but it's possible to install newer versions of Python from PPAs.

So I think dropping 3.6 support would be OK. Should that be done in a separate PR or is it easier to just do it in this one? (I guess that would mainly involve reorganizing the CI setup, and updating README.md)

Next steps would then be adding support for 3.10 and then we can consider when to drop 3.7.

@juhoinkinen
Copy link
Member Author

Sounds good, I make a separate PR for dropping Python 3.6 support.

@osma
Copy link
Member

osma commented Jan 10, 2022

Omikuji eval time has improved a lot in 0.4! The other numbers look good as well.

What happens when you try to use an old model with omikuji 0.4? Is there some kind of exception? Should we catch that and display a more user friendly error message?

@juhoinkinen
Copy link
Member Author

Rebased on master after dropping Python 3.6 support (#550) and force-pushed.

@juhoinkinen
Copy link
Member Author

What happens when you try to use an old model with omikuji 0.4? Is there some kind of exception? Should we catch that and display a more user friendly error message?

Omikuji shows first Failed to load model: Unable to deserialize tree: invalid type: map, expected variant identifier, then about one-screen length of traceback and RuntimeError: Failed to load model from data/projects/yso-bonsai-old-fi/omikuji-model.

This already hints something being wrong with the model, so I'm not sure if it is worth processing it within Annif...?

@osma
Copy link
Member

osma commented Jan 10, 2022

[...] about one-screen length of traceback and RuntimeError: Failed to load model from data/projects/yso-bonsai-old-fi/omikuji-model
This already hints something being wrong with the model, so I'm not sure if it is worth processing it within Annif...?

I think that showing tracebacks to the user is not a recipe for good UX... It would be better to show a more informative message such as "Omikuji models trained on Annif versions older than 0.56 cannot be loaded. Please retrain your project."

@osma
Copy link
Member

osma commented Jan 11, 2022

@juhoinkinen Since you've upgraded (again) Click to 8.0.*, we need to take care of also upgrading Connexion and Flask to avoid hitting the version mismatch problem reported in #533 once more (which was fixed by downgrading Click back to 7.1.2 in PR #534).

There's a little problem with Connexion releases though - PyPI currently only has 2.9.0 which doesn't support Flask 2. Version 2.10.0 has been released on GitHub but on the PyPI side it has a different name connexion2. See here for details. So we should probably switch to connexion2 at least for now.

@juhoinkinen
Copy link
Member Author

juhoinkinen commented Jan 11, 2022

Yes, I already switched to connexion2 in Annif's requirements, but now I noticed some dependency is still using connexion, so both connexion 2.7.0 and connexion2 2.10.0 end up to be installed.

As we don't actually need connexion2 or Click 8, maybe it's best to just go back to Click 8 (again)...? Edit: to Click 7

@juhoinkinen
Copy link
Member Author

swagger-tester being the dependency requiring connexion.

@osma
Copy link
Member

osma commented Jan 11, 2022

Ah, right. The dependency on connexion seems to come from swagger-tester which is a dev dependency for Annif. Its last release 0.2.12 was made in May 2018 and the project seems pretty dead. Maybe we should try to get rid of this dependency - it's used in test_swagger.py for some basic exercise of the REST API methods.

@juhoinkinen
Copy link
Member Author

I checked the compatibility of models trained with Annif 0.55 after upgrading also SciPy, Numpy and Scikit-learn, and TensorFlow again/further.

The usual warnings like UserWarning: Trying to unpickle estimator TfidfTransformer from version 0.24.2 when using version 1.0.2. [...] are shown but models seem to work (suggestions and scores remain the same for a short text input).

@sonarcloud
Copy link

sonarcloud bot commented Jan 17, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@juhoinkinen
Copy link
Member Author

Rebased on master after removing swagger-tester dependency (#551) and force-pushed.

@juhoinkinen juhoinkinen merged commit b894218 into master Jan 17, 2022
@juhoinkinen juhoinkinen deleted the update-dependencies-v0.56 branch January 17, 2022 10:29
@juhoinkinen juhoinkinen mentioned this pull request Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants