
Create the build in an async manner to improve response time to webhooks #2870

Closed
wants to merge 2 commits

Conversation

francoispqt

Hi,

Let me give some context for this PR:

We have been working on a configuration plugin to handle our monorepo and the solution we came up with does the following steps:

  • clones the repo based on the build event, in the same way https://github.com/drone/drone-git does it
  • determines which files are affected by the current build (the diff)
  • for each affected file, traverses the repo upward until it finds a drone config file
  • parses these drone config files and passes them to a template engine to compute the root drone file of the repository, then sends back the final config

Leveraging the multi-pipeline feature we have been able to come up with a solution which handles each microservice individually, erasing the drawbacks of monorepos in classic CI environments.

A root drone file using this can look like the following:

kind: pipeline
type: docker
name: "lint"

steps:
- name: lint
  image: golang:1.12
  commands:
  - lint

{{ if .Pipelines }}
{{ .Pipelines | toYaml }}
---
kind: pipeline
type: docker
name: "deploy"

steps:
- name: deploy
  image: debian
  commands:
  - deploy

depends_on:
{{ .PipelinesName | toYaml }}
{{- end }}

What we have done works great and we are planning to make it available to the community.

Yet there is one problem: our monorepo is pretty big, and even though we built a cache mechanism using a mirror repo as a reference, it still takes 5 to 8 seconds to fetch the repository in the configuration plugin. This leads to timeouts in the GitHub webhooks, which have a 10-second limit.

Following the best practices advised by GitHub (https://developer.github.com/v3/guides/best-practices-for-integrators/), we came up with this PR, which makes the handling of a webhook more asynchronous.

With this PR, the triggerer starts a goroutine to do most of the heavy lifting of creating the build; if an error occurs, it creates the build with an error message.

Thanks for the feedback

… with error if something goes wrong - the goal is to improve response time to webhook
@CLAassistant

CLAassistant commented Nov 8, 2019

CLA assistant check
All committers have signed the CLA.

@bradrydzewski

bradrydzewski commented Nov 8, 2019

First, thanks for the pull request 🙂

One concern I have is that by always returning a build, we give the impression a build was created, which will not always be the case. I can see this causing confusion and resulting in a support burden. For example, if the trigger condition is not satisfied the build is ignored, but a build is still returned in the hook body.

For this reason, we would probably be better off always returning a 200 OK with an empty body to avoid any confusion. This would simplify the pull request and would only require changing a few lines of code in the hook handler.

	go triggerer.Trigger(ctx, repo, hook)
	writeStatus(w, 200)

Before making this change we also need to understand whether or not anything useful is being sent back in the hook response on error, because we would be taking that away from users who may be using it to debug today. My gut tells me nothing useful is being returned, but we should verify anyway.

Lastly I had hoped one day to return more details in the response, including reasons a build may have been skipped. This will no longer be an option if we process the build async.

The timeout errors in GitHub can be unsightly, however, they do not prevent the trigger routine from executing and scheduling a build.

For the record I am not opposing this change, and I do think it is probably the right thing to do, but I wanted to document all the pros and cons and things we will need to consider so that others can weigh in.

@francoispqt
Author

francoispqt commented Nov 8, 2019

Thanks for the quick response.

Calling the whole trigger async definitely works for our issue, but I'm not sure it's the best solution.

When writing the PR, I was thinking that if we wanted the response to be useful, we might as well include the correct build number in it; that should be enough to debug, or at least enough to call the Drone API build info endpoint. But I missed the case where the build is skipped, in which case the response sent is invalid. What we could do, though, is create the build with the status "skipped" and a message about why it was skipped.

As for debugging from the webhook responses: we do it, but only for issues with our configuration plugin, because that's the only way. I think it's easier to debug from Drone itself, since not every contributor has access to the webhook responses, but they do have access to their builds in the Drone UI.

@bradrydzewski

bradrydzewski commented Nov 8, 2019

What we could do though is create the build with the status "skipped" and with a message about why it was skipped.

This is not something I would want to do. It would create a lot of noise in the system, create large gaps in build numbers, and increase database size.

@tboerger any thoughts?

Maybe we store the last N webhooks sent to the system in a separate table / S3 and periodically purge them?

@francoispqt
Author

this is not something I would want to do. This would create a lot of noise in the system and would create large gaps in build numbers.

I agree.

Trying to find another solution 🤔

@bradrydzewski

bradrydzewski commented Nov 8, 2019

I think the optimal solution probably looks something like this:

  1. run the trigger async and return a 200 OK with an empty body
  2. store the full hook in a new table that is periodically pruned so that it does not grow too big (with an option to store the full hook payload in S3)
  3. store the build id, error or skip reason inside the same record as the hook
  4. provide an option for the user to view hooks received per-repository

Not all systems provide access to webhook responses (Bitbucket, for example), and it often requires admin access, so hook responses were never a great way to debug. Storing the data in the system and making it accessible via an endpoint should make it much easier for users to debug.

@francoispqt
Author

I like the idea. I think we can still make the response from the webhook useful by doing something simple. What about this?

  1. Store the hook in a table right away and return 200 OK with a payload like:
{
  "hook_id": 1,
  "link": "https://drone.com/api/hooks/1"
}
  2. start a goroutine to process the build; alternatively, a worker could poll the hook table for unprocessed hooks and process them (this adds a delivery guarantee)
  3. update the hook entry along the way with metadata such as build id, error, skip reason, etc.
  4. provide a hooks resource in the Drone API and a hooks tab for each repo in the Drone UI to see hook history
  5. add a repo setting for the hook retention period and have a worker clear each repo's hooks once they are older than the retention period

@bradrydzewski

bradrydzewski commented Nov 9, 2019

yes, we'll have to come up with a more formal RFC (we don't want to rush into this), but I feel pretty good about the ideas discussed so far. Interested to hear from @ashwilliams1 and @tboerger

@tboerger

tboerger commented Nov 9, 2019

I like the idea of storing the webhooks. It's always a pain to find the right webhook within GitHub for debugging.

I also like listing the hooks in the UI: we could show, directly in the list, the attributes that currently require toggling the hook view within GitHub, and we could even filter them.

@francoispqt
Author

francoispqt commented Nov 9, 2019

I don't know what the decision-making process is, but if we go ahead at some point, I'd be happy to work on the implementation :)

@tboerger

tboerger commented Nov 9, 2019

Brad is the project owner, so he is the one who decides on the final solution, but he asked us, the other maintainers and contributors, for our opinions to reach a decision.

@hitesharinga hitesharinga changed the base branch from master to drone October 4, 2023 02:47
@bot2-harness
Collaborator

This PR has been automatically closed due to inactivity.
