Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pre-commit hooks #2819

Merged
merged 44 commits into from
Jul 26, 2022
Merged

pre-commit hooks #2819

merged 44 commits into from
Jul 26, 2022

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented Jul 15, 2022

Overview:

  • Currently, code formatting and code generation happens fully on GH. We have a workflow called autoformat.yml that runs on pushes, executes some scripts, and pushes back the changes to the branch where the push was just performed.
  • This causes some issues with forks, external contributors, and permissions.
  • It also messes with branch protection rules, because bot-issued commits don't trigger workflows and forces PR authors to do tricks to make the CI run on the latest commit to be allowed to merge.

Proposed changes:

  • This PR introduces pre-commit hooks for Haystack. Pre-commit hooks are a series of scripts that will run on the contributor's machine right before a git action (normally a commit) to perform checks and other automated operations.
  • At first, pre-commit hooks seems not viable as the code formatting and generation tasks take about 1 minute to run from beginning to end.
  • However, by making them run on push instead of commit, the overhead is greatly reduced.
  • Another drawback is that contributors need a full installation of Haystack to be able to make these hooks work. If not, they won't be able to push. This can quickly become frustrating, so we need to keep an eye on this and prevent a hook to dissuade people from contributing.
  • We will still need the bot to check on GH side if the hook has run (contributors might simply not have installed it)

Details:

  • The hook will be installed with pre-commit install --hook-type pre-push
    • Using the pre-push hook is not necessary but it's convenient, given that on my machine the hooks take 1 full minute to run through
  • The hook requires the all directive and a bunch of system dependencies 🥲
  • The hook currently runs the following checks:
    • 🟢 syntax check for Python, JSON, YAML and TOML (very fast)
    • 🟢 EOF, EOL, trailing spaces, shebanhgs, merge conflicts, executable permissions, protected master (very fast)
    • 🟢 black (fast enough)
    • 🟡 tutorials conversion (medium)
    • 🔴 openapi specs generation (slow)
    • 🔴 json schemas update (slow)
    • 🟡 pydoc-markdown (medium)
  • The current setup is unlikely to run on Windows due to missing dependencies. This might become problematic later.

To do:

  • Implement the hooks
  • Update GH workflows
  • Update the contributing guidelines
  • Update the docs
  • Investigate if we can make the hook output some useful error messages to prevent some frustration to newcomers
  • Test on mac and windows 😨

Remember:

  • Make the new workflows a requirement after merge to master

Closes #2841

@ZanSara ZanSara added action:needs documentation type:feature New feature or request ignore-for-release-notes PRs with this flag won't be included in the release notes. labels Jul 15, 2022
@ZanSara ZanSara requested a review from masci July 15, 2022 09:28
@ZanSara ZanSara marked this pull request as ready for review July 15, 2022 11:35
- id: black

# These can fail if some dependencies are missing. Also untested on Windows.
- repo: local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I avoided the local one repo types as some people (e.g. @ArzelaAscoIi ) use dev machines/containers which means they might not have the dependencies installed in the environment where they commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I have considered this. However I think we can't avoid it in our situation. I'd rather ask these people not to install the hook in environments where the deps can't be installed.

I also considered having two hooks that can be installed separately, but haven't looked into that yet. I think this can already be tried out for us as it is, if it causes too much trouble of course we'll revisit 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense 👍🏻

@masci
Copy link
Contributor

masci commented Jul 15, 2022

Thanks for this PR, hooks are going to be very useful for Haystack contributors!

2 things I would like to change:

  • The hook shouldn't be required, the goal is to save CI time and help with tasks that are easy to forget like formatters and linters but CI should fail for specific reasons (e.g. the code was badly formatted) not because the hook didn't run.
  • I think we should remove the "slow checks" except for docs API: they check parts of the codebase that rarely change so I would rather take the risk of failing the CI if I forget to update the OpenAPI spec then having the hook always checking for that.

I tried the hook with the basic checks + the API docs check on each commit and the experience isn't bad. It would be nice to add pylint (or whatever we decide should be our linting strategy) since that's a check in the CI that can often fail if you forget to run it locally before pushing.

@masci
Copy link
Contributor

masci commented Jul 18, 2022

  • The hook runs some scripts and applies changes. It will push only when running the scripts did not cause any change to the files.

You mean It will fail only when...?

  • The CI runs the exact same scripts on the commits that get pushed.
  • The CI then checks if the scripts caused some change to the files on the commit, and if it detects any, it fails.

here's the point I want to change: I want to see specific jobs checking specific features rather than having the CI "emulate" the pre-commit hook. I know some projects use the hook itself in the CI for this purpose but I think the two things should stay separated (I can expand on why I think so but the bottom line is in the long run I see the risk of CI requirements leaking into the pre-commit scripts)

This is a nice idea, but I wonder if that won't confuse external contributors a bit. You would need to explain them that they should go in the .github/utils folder and run that specific script in that specific way... Doable for sure, but it adds friction imho. Maybe there's another way to do that?

The friction would come from lack of documentation in that case. If a contributor can't find how to update the openapi spec (or even know about its existence) having the hook run them would still be confusing. In my opinion the contributing guidelines should be crystal clear about what you're supposed to do when contributing to the "special parts" of Haystack (like "here's what you have to do when working with [tutorials, openapi, pipelines]")

Also true, but beware... Pylint takes about 4 minutes to complete 🙈 I'd add Mypy, but it needs to be installed in a really weird way to work (just check the comments in the CI)... so overall I decided to skip on that part.

Those two are two things we need to fix, they shouldn't affect how we design CI/DevX but I think at this stage it's ok to just skip them, we'll tackle this later.

@ZanSara
Copy link
Contributor Author

ZanSara commented Jul 18, 2022

  • The hook runs some scripts and applies changes. It will push only when running the scripts did not cause any change to the files.

You mean It will fail only when...?

Ok that was unclear. I meant It will only let the users push when....

The hook fails if the scripts result in unstaged changes. So if Black re-formats something, then the hook will fail and prevent you from pushing until you have committed the changes (and therefore, until Black will run on your latest commit without changing anything).

here's the point I want to change: I want to see specific jobs checking specific features rather than having the CI "emulate" the pre-commit hook.

I see! It's clear now 👍

The friction would come from lack of documentation in that case. If a contributor can't find how to update the openapi spec (or even know about its existence) having the hook run them would still be confusing. In my opinion the contributing guidelines should be crystal clear about what you're supposed to do when contributing to the "special parts" of Haystack (like "here's what you have to do when working with [tutorials, openapi, pipelines]")

I agree for the OpenAPI specs. I don't for the JSON schema. The schema changes every time a signature has the slightest change (even a typing change), so that's relatively frequent (it's not a "special part" of Haystack) and also not obvious. As a contributor I would not expect that a JSON file needs to be generated for every signature change.

However, we can try it and see how it goes 😁 Not a blocker.

Those two are two things we need to fix, they shouldn't affect how we design CI/DevX but I think at this stage it's ok to just skip them, we'll tackle this later.

Agree... This is somehow related to the dependency hell we have, so probably it will be fixed together with it. I will add a TODO as a reminder.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job - especially on the docs for the contributors!

.github/workflows/black.yml Outdated Show resolved Hide resolved
- id: black

# These can fail if some dependencies are missing. Also untested on Windows.
- repo: local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense 👍🏻

.pre-commit-config.yaml Outdated Show resolved Hide resolved
always_run: true


# TODO we can make mypy and pylint run at this stage too, once their execution gets normalized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by "once their execution gets normalized"? pyliny is too slow in my opinion to make a nice commit hook. Mypy works fine for DC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy is in a very bad situation right now for OSS. Just have a look here:

- name: Install dependencies
run: |
# FIXME installing the packages before running mypy raises
# a lot of errors which were never detected before!
# pip install .
# pip install rest_api/
# pip install ui/
# FIXME --install-types does not work properly yet, see https://github.com/python/mypy/issues/10600
# Hotfixing by installing type packages explicitly.
# Run mypy --install-types haystack locally to ensure the list is still up to date
# mypy --install-types --non-interactive .
pip install mypy pydantic types-Markdown types-PyYAML types-requests types-setuptools types-six types-tabulate types-chardet types-emoji types-protobuf

When this will get fixed it will be reasonable to add it as a hook too

CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall, only left a couple of questions

.github/workflows/black.yml Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@ZanSara ZanSara merged commit 2d65c38 into deepset-ai:master Jul 26, 2022
Krak91 pushed a commit to Krak91/haystack that referenced this pull request Jul 26, 2022
* Add pre-commit config

* update contributing guidelines

* try failing the workflow

* add pre-commit to the deps

* updating uninstall instructions

* separate jobs in CI

* make tutorials check fail

* make black check fail

* make openapi check fail

* make yaml schema and api docs checks fail

* highlight the instructions

* Update .pre-commit-config.yaml

Co-authored-by: Tobias Wochinger <[email protected]>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <[email protected]>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <[email protected]>

* Use black --check

* Add images of the CI

* title level

* feedback

Co-authored-by: Tobias Wochinger <[email protected]>
@ZanSara ZanSara mentioned this pull request Jul 26, 2022
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ignore-for-release-notes PRs with this flag won't be included in the release notes. type:feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PR stuck after approval and automated "Update Documentation & Code Style" commit
3 participants