Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize data version control for managing test images #5888

Merged
merged 17 commits into from
Jan 10, 2022

Conversation

maxrjones
Copy link
Member

Description of proposed changes

This PR implements the solution for storing baseline test images proposed in #5724. Only a file describing the contents of a directory with baseline images are stored in the git repository; the actual PS file will be stored in the DAGsHub repository (e.g., https://dagshub.com/GenericMappingTools/gmt/src/manage-tests-dvc-refactor/test/baseline/api). While by default DAGsHub will store all versions of the images that are pushed to the remote storage, these can be easily garbage collected from either the user's local cache and the remote storage (if we reach the 10 GB limit) without breaking the git history.

These are the specific changes in this PR:

  • Initialize dvc in the repository
  • Add dvc as a dependency for developers
  • Add dvc to the CI test workflow
  • Migrate test/api and test/blockmean .PS files to dvc
  • Add instructions for migrating baseline images to dvc to the contributing guide

Addresses #5724

Reminders

  • Make sure that your code follows our style. Use the other functions/files as a basis.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Describe changes to function behavior and arguments in a comment below the function declaration.
  • If adding new functionality, add a detailed description to the documentation and/or an example.

Copy link
Member

@PaulWessel PaulWessel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let others do the initial approval on this...

@maxrjones maxrjones added the maintenance Boring but important stuff for the core devs label Nov 8, 2021
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay! I only have a few minor comments for now, the dvc setup and instructions seem mostly ok for me, but I'll need to run a proper git clone of this branch and do some proper tests in the coming days. Disk space on my 10 year old laptop isn't great for big repos, so I'll need to use some university resources 🙂

doc/rst/source/devdocs/contributing.rst Outdated Show resolved Hide resolved
doc/rst/source/devdocs/contributing.rst Outdated Show resolved Hide resolved
.github/workflows/tests.yml Outdated Show resolved Hide resolved
@maxrjones
Copy link
Member Author

Sorry for the long delay! I only have a few minor comments for now, the dvc setup and instructions seem mostly ok for me, but I'll need to run a proper git clone of this branch and do some proper tests in the coming days. Disk space on my 10 year old laptop isn't great for big repos, so I'll need to use some university resources 🙂

No worries on the timing - I really appreciate your willingness to give this a review!

@maxrjones maxrjones added this to In Progress in GMT testing improvements via automation Nov 29, 2021
@maxrjones
Copy link
Member Author

Would anyone be willing to give this a review shortly? I think it would be great to make a decision about whether to go with dvc before we update the baseline images for the ~25 failing tests.

Here are some generally instructions for testing out the changes:

  1. Install dvc (https://dvc.org/doc/install)
  2. Checkout branch: git checkout manage-tests-dvc-refactor
  3. Pull baseline images stored in dvc: dvc pull
  4. Run the tests in your regular way (e.g., https://docs.generic-mapping-tools.org/dev/devdocs/contributing.html#running-tests).

Afterwards, you would need to do some cleanup (this wouldn't be necessary when switching branches if this PR gets merged in):

  1. Checkout the other branch (e.g., git checkout master)
  2. Navigate to the base of the repository (e.g., gtop if using aliases)
  3. Delete dvc related files (rm -rf .dvc and rm -rf test/baseline)

One note: migrating all the tests files to dvc would initially add ~100 MB to the repository size for people who run GMT tests locally (see #5724 for details) if we don't also delete git history (which I would prefer avoiding). For anyone who doesn't run dvc pull, this change would have a negligible impact on repository size. To avoid this step increase in size for those who run the tests, I think we can only migrate baseline images when tests fail due to code updates or when new tests are added.

@PaulWessel
Copy link
Member

I will give it a go following the instructions - will slack if running into troubles!

@maxrjones
Copy link
Member Author

Thanks Paul!

@PaulWessel
Copy link
Member

Was able to successfully do all 7 steps.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't found the time to run the full tests, but the documented dvc steps look ok to me, so won't hold this back any longer. 🚀

@maxrjones
Copy link
Member Author

@PaulWessel do you approve of merging this in now? If so, I could use dvc for the postscript files in #6199 and I could update the reference images for the failing tests using dvc.

@PaulWessel
Copy link
Member

Yes, I think that would be great. Go ahead.

@maxrjones maxrjones merged commit 8d318f2 into master Jan 10, 2022
@maxrjones maxrjones deleted the manage-tests-dvc-refactor branch January 10, 2022 21:26
GMT testing improvements automation moved this from In Progress to Done Jan 10, 2022
@maxrjones maxrjones mentioned this pull request Jan 10, 2022
4 tasks
@maxrjones maxrjones added the add-changelog Add PR to the changelog label Jun 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
add-changelog Add PR to the changelog maintenance Boring but important stuff for the core devs
Development

Successfully merging this pull request may close these issues.

None yet

3 participants