Exclude dvc related files from source distribution #1086

Closed
seisman opened this issue Mar 20, 2021 · 4 comments · Fixed by #1095
Labels
maintenance Boring but important stuff for the core devs

@seisman
Member

seisman commented Mar 20, 2021

The new workflow in #1036 adds more files into the git repository and some files are included in the source distribution by default.

Run make package and check the files in the tarball dist/pygmt-X.Y.Z.tar.gz.

I see some new files in the source distribution:

  • .dvc
  • .dvcignore
  • pygmt/tests/baseline/*.dvc

Some DVC images (e.g., pygmt/tests/baseline/test_logo.png) are also included in the tarball although they should be ignored by git and excluded from the source distribution by default.
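
A quick way to repeat this check, sketched in Python: the snippet below lists the members of a built sdist and flags the DVC-related ones. The helper names `is_dvc_related` and `dvc_members` are made up for illustration, and the `dist/pygmt-*.tar.gz` glob assumes `make package` writes the tarball into `dist/` as described above.

```python
import glob
import tarfile
from pathlib import PurePosixPath


def is_dvc_related(name):
    """Return True if an archive member name looks DVC-related (hypothetical helper)."""
    parts = PurePosixPath(name).parts
    if not parts:
        return False
    last = parts[-1]
    # Anything inside a .dvc/ directory, the .dvcignore file, or a *.dvc pointer file.
    return ".dvc" in parts or last == ".dvcignore" or last.endswith(".dvc")


def dvc_members(tarball_path):
    """List the DVC-related members of a source tarball."""
    with tarfile.open(tarball_path) as tar:
        return [name for name in tar.getnames() if is_dvc_related(name)]


if __name__ == "__main__":
    # Assumes `make package` has already produced dist/pygmt-X.Y.Z.tar.gz.
    for tarball in glob.glob("dist/pygmt-*.tar.gz"):
        for name in dvc_members(tarball):
            print(name)
```

An empty output would mean that no DVC-related files made it into the tarball. Note that this does not flag the PNG images themselves, only the DVC bookkeeping files.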

A related question is: do we want to include the baseline images in the source distribution?

@seisman seisman added maintenance Boring but important stuff for the core devs question Further information is requested labels Mar 20, 2021
@weiji14
Member

weiji14 commented Mar 20, 2021

> A related question is: do we want to include the baseline images in the source distribution?

As you mentioned in #999 (comment):

> FYI, the tarball size can be reduced from 6.2 MB to 600 K if we can remove all baseline images. 😄

I think we should move towards trimming the size of the PyPI sdist (.tar.gz) and bdist (.whl) down to less than 1 MB. This would mean that pygmt.test(), as described at https://www.pygmt.org/v0.3.1/install.html#full-test-optional, won't work out of the box. But really, what other Python package ships with a full test suite and tells users to run those tests anyway?

> I see some new files in the source distribution:
>
>   • .dvc
>   • .dvcignore
>   • pygmt/tests/baseline/*.dvc
>
> Some DVC images (e.g., pygmt/tests/baseline/test_logo.png) are also included in the tarball although they should be ignored by git and excluded from the source distribution by default.

Yep, we should update the MANIFEST.in file to exclude these *.dvc* files, and also the PNG files. If we really want to have those PNG baseline images somewhere, we can think about storing them on the Zenodo DOI for archival purposes (though might need a bit of work).

@seisman
Member Author

seisman commented Mar 21, 2021

> what other Python package ships with a full test suite and tells users to run those tests anyway?

Matplotlib includes all baseline images in its source tarballs (~40 MB) (https://pypi.org/project/matplotlib/#files), but not in its wheels (~10 MB). They recommend that users download the source tarball and extract the test code and baseline images before running the tests (https://matplotlib.org/stable/users/installing.html#test-data).

> If we really want to have those PNG baseline images somewhere, we can think about storing them on the Zenodo DOI for archival purposes (though might need a bit of work).

Perhaps it's simpler to just upload the baseline images as GitHub release assets.

@seisman
Member Author

seisman commented Mar 27, 2021

> Some DVC images (e.g., pygmt/tests/baseline/test_logo.png) are also included in the tarball although they should be ignored by git and excluded from the source distribution by default.

These images are included in the source distribution because of the following line:

pygmt/setup.py

Line 21 in 74c9366

PACKAGE_DATA = {"pygmt.tests": ["data/*", "baseline/*"]}
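
To see why that line pulls in untracked images, here is a small sketch of how a `"baseline/*"` pattern behaves. It assumes the patterns are expanded as plain globs relative to the package directory, which is my understanding of how setuptools treats `package_data`; `match_package_data` is a hypothetical helper, not a setuptools function, and the file names are placeholders.

```python
import glob
import os
import tempfile


def match_package_data(package_dir, patterns):
    """Expand glob patterns relative to package_dir (hypothetical helper)."""
    matches = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):
                rel = os.path.relpath(path, package_dir)
                matches.append(rel.replace(os.sep, "/"))
    return sorted(matches)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as package_dir:
        baseline = os.path.join(package_dir, "baseline")
        os.makedirs(baseline)
        # A DVC pointer file (tracked by git) and the image it points to
        # (ignored by git, present only after `dvc pull`).
        for name in ("test_logo.png.dvc", "test_logo.png"):
            open(os.path.join(baseline, name), "w").close()
        print(match_package_data(package_dir, ["data/*", "baseline/*"]))
```

The glob only looks at what is on disk, so it has no way to know that the PNG is ignored by git.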

@seisman
Member Author

seisman commented Mar 27, 2021

Here are the rules about which files are included in the source distributions:

  • All files tracked by git are included
  • Any files in the baseline directory are included (whether or not they are tracked by git)
  • Any files listed for exclusion in the MANIFEST.in file are excluded.

In the "Publish to PyPI" workflow, we don't run the dvc pull command, so the dvc-tracked baseline images are not in the baseline directory, and are not included in the source distributions. If you download the tarball from TestPyPI (https://test.pypi.org/project/pygmt/0.3.2.dev38/#files), you will see that dvc-tracked images (e.g., test_logo.png) are not included.

In the "Tests" workflow, we run the dvc pull command, so the dvc-tracked images are downloaded from the remote repository. When we build the source distribution, the dvc-tracked images are also included and installed. So we can still run the tests.
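
The three rules and the two workflows can be put together in a toy model. This is only a sketch of the behaviour described above, not of how setuptools actually builds an sdist: `sdist_files` is a hypothetical function, the file names are illustrative, and the exclusion patterns stand in for whatever ends up in MANIFEST.in.

```python
from fnmatch import fnmatch

# Illustrative file lists; the real repository contains many more files.
GIT_TRACKED = [
    ".dvc/config",
    ".dvcignore",
    "setup.py",
    "pygmt/tests/baseline/test_logo.png.dvc",
]
PACKAGE_DATA = ["pygmt/tests/data/*", "pygmt/tests/baseline/*"]
EXCLUDED = [".dvc/*", ".dvcignore"]  # assumed MANIFEST.in exclusions


def sdist_files(git_tracked, on_disk, package_data, excluded):
    """Toy model of the three inclusion rules (hypothetical function)."""
    # Rule 1: everything tracked by git is included.
    included = set(git_tracked)
    # Rule 2: anything on disk matching a package_data pattern is included.
    included |= {
        path for path in on_disk
        if any(fnmatch(path, pattern) for pattern in package_data)
    }
    # Rule 3: anything matching an exclusion pattern is dropped.
    return sorted(
        path for path in included
        if not any(fnmatch(path, pattern) for pattern in excluded)
    )


if __name__ == "__main__":
    # "Publish to PyPI": no `dvc pull`, so only git-tracked files are on disk.
    print(sdist_files(GIT_TRACKED, GIT_TRACKED, PACKAGE_DATA, EXCLUDED))
    # "Tests": `dvc pull` has fetched the image into the baseline directory.
    pulled = GIT_TRACKED + ["pygmt/tests/baseline/test_logo.png"]
    print(sdist_files(GIT_TRACKED, pulled, PACKAGE_DATA, EXCLUDED))
```

In this model the PNG shows up only in the second case, which matches the difference between the two workflows.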


So, the only thing we need to do is to add .dvc and .dvcignore to the MANIFEST.in file as exclusions. See PR #1095.

After we finish the migration of all baseline images, the source distribution will only contain the .dvc files, but not the .png files.

Ideally, we should also exclude the .dvc files, but that would break our "Tests" workflow.
