Hugging Face integration #760
Comments
Very excited to see this! Feel free to ping me if you need any support with anything on the HF side :)
The 🤗 Hugging Face Hub facilitates the hosting and sharing of AI models and datasets (as well as demo applications), and NatLibFi now also has an organization account in the Hugging Face Hub.
The data (models and datasets) in the HF Hub live in git repositories, and git can be used to handle the data (commit, push, pull, ...). However, applications can also integrate directly with the HF Hub using the `huggingface_hub` Python library, which is also usable as a CLI tool.

Annif could have the functionality to push (and pull) projects or project sets to (and from) the HF Hub. It should be able to operate on project sets, both because an ensemble project requires its base projects to be available and for convenience.
There could be the following CLI command to push a set of projects to HF Hub:
For example
would upload the specified projects to NatLibFi/FintoAI-data-YSO repository.
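As a rough illustration of what such a push could do internally, the sketch below uploads a list of local files with `HfApi.upload_file` from `huggingface_hub`. The helper names `repo_path` and `push_files`, the default commit message, and the choice to mirror the local directory layout in the repo are assumptions made for this example, not part of the proposal.

```python
from pathlib import Path


def repo_path(local_path, base_dir="."):
    """Return local_path relative to base_dir in POSIX form, for use as the path in the repo."""
    return Path(local_path).resolve().relative_to(Path(base_dir).resolve()).as_posix()


def push_files(local_paths, repo_id, base_dir=".",
               commit_message="Upload Annif projects", token=None):
    """Hypothetical helper: upload each file to repo_id, keeping the local layout."""
    # Imported lazily so that repo_path() works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)  # token=None falls back to the locally stored login
    for local_path in local_paths:
        # Note: each upload_file call creates its own commit in the repo.
        api.upload_file(
            path_or_fileobj=str(local_path),
            path_in_repo=repo_path(local_path, base_dir),
            repo_id=repo_id,
            commit_message=commit_message,
        )
```

A call such as `push_files(["projects.cfg"], "NatLibFi/FintoAI-data-YSO")` would need write access to the repository and a stored token.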
The files and directories that need to be uploaded are:
- `data/projects/project-id`: the project directories
- `data/vocabs/vocab-id`: vocabularies of the projects
- `projects.{cfg,toml,d}`: configurations of the projects

Options for bundling and uploading
1. Single file
Bundle all files into one zip named `yso-fi.zip` (possibly including only the configs of the selected projects). Upload it to the root of the repo.

The filename could be derived from the glob pattern of the projects, or it could be a required argument for the upload command (as a 2nd argument, to be added to the above example).

This option would be the easiest for downloads: just `wget` one file and unzip.

2. One file for projects and vocab, and one for projects configs
Bundle the projects and vocabulary directories into one zip and leave the projects config file uncompressed.
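Options 1 and 2 differ only in whether the config file goes into the zip, so both can be sketched with one helper built on the standard `zipfile` and `fnmatch` modules. The function name `bundle`, its parameters, and the requirement that the caller passes the vocabulary ids explicitly (instead of reading them from the project configs) are assumptions for illustration only.

```python
import fnmatch
import zipfile
from pathlib import Path


def bundle(data_dir, project_pattern, vocab_ids, zip_path, config_file=None):
    """Zip the matching project dirs and the given vocab dirs.

    If config_file is given (a single file, not a projects.d directory),
    it is added to the zip as well (option 1); otherwise it is left out
    to be uploaded separately (option 2).
    Returns the ids of the bundled projects.
    """
    data_dir = Path(data_dir)
    selected = sorted(
        p for p in (data_dir / "projects").iterdir()
        if p.is_dir() and fnmatch.fnmatch(p.name, project_pattern)
    )
    dirs = selected + [data_dir / "vocabs" / vocab_id for vocab_id in vocab_ids]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for directory in dirs:
            for path in sorted(directory.rglob("*")):
                if path.is_file():
                    # Store entries as data/projects/<id>/... and data/vocabs/<id>/...
                    arcname = Path("data") / path.relative_to(data_dir)
                    zf.write(path, arcname.as_posix())
        if config_file is not None:
            zf.write(config_file, Path(config_file).name)
    return [p.name for p in selected]
```

For example, `bundle("data", "yso-*-fi", ["yso"], "yso-fi.zip", config_file="projects.cfg")` would correspond to option 1, and the same call without `config_file` to option 2.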
3. One file for projects, one for vocab, and one for projects configs
Bundle the selected projects into one zip (`yso-fi.zip`) and the vocabularies into another (`yso.zip`), and leave the projects config file uncompressed. Upload the projects zip to the `data/projects` directory and the vocab zip to `data/vocabs`.

4. Separate files for each project, vocab, and projects configs
Compress each project directory into its own zip (`<project-id>.zip`).

With this option, downloads would need e.g. `wget --accept yso*-fi.zip` for the projects.

Some details and ideas:
- The `upload_file` method in the Python client library could be used for this.
- The `ModelHubMixin` class could help with the integration.
- The `huggingface-cli login` command.
- The `huggingface-cli upload` CLI command (for commit message etc.).

Downloading projects
We could also implement a feature to fetch projects from the HF Hub, for example:
But this is probably best implemented only after the upload functionality; downloading from the HF Hub can also be done simply with wget or curl. However, if the download function is known to be coming, the hierarchy and structure of the data files in the repo should be designed with it in mind.
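To show how the repo layout affects downloading, here is a minimal sketch that lists the files of a repo with `list_repo_files`, picks those whose file name matches a glob pattern, and fetches them with `hf_hub_download` (both from `huggingface_hub`). The helper names `select_files` and `fetch` and the matching on the bare file name are assumptions for this example; the layout with one zip per project corresponds to option 4 above.

```python
import fnmatch


def select_files(repo_files, pattern):
    """Return the repo paths whose file name (last path component) matches pattern."""
    return sorted(
        f for f in repo_files
        if fnmatch.fnmatch(f.rsplit("/", 1)[-1], pattern)
    )


def fetch(repo_id, pattern, local_dir="."):
    """Hypothetical helper: download the matching files from repo_id into local_dir."""
    # Imported lazily so that select_files() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    downloaded = []
    for filename in select_files(list_repo_files(repo_id), pattern):
        downloaded.append(
            hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
        )
    return downloaded
```

With a single bundled zip (option 1) the selection step is unnecessary, which is one reason the bundling choice should be settled before the download feature is designed.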