
[RFC] Make archival asynchronous #11296

Merged (52 commits) on Nov 7, 2020
Conversation

@kevans91 (Contributor) commented May 4, 2020

The primary benefit sought here is that large archives no longer clog up the rendering process and cause unsightly proxy timeouts. As a secondary benefit, an in-progress archive is moved out of the way into a /tmp file, so that new archival requests for the same commit will not be fulfilled from an archive that isn't yet finished.

Fixes #11265

This decidedly needs, at a minimum:

  • [ ] Some tests to go with it
  • [ ] Perhaps an actual "loading" icon... right now it slaps a loading class on the download icon, sending that sucker spinning.

I've deployed this to my production Git instance; the benefit can be seen at https://git.kevans.dev/kevans/freebsd/branches/ . Any of these branches will probably take 2–3 minutes to finish archiving, so the user experience is slightly improved by making this async. A cron job on that instance wipes out any generated archives every 15 minutes, so it should be easy to reproduce.

Ultimately, all archival requests get pushed into a queue. The queue handler asynchronously processes all of the requests within it and periodically purges the local archiveInProgress slice. In hindsight, this probably didn't need to be a queue, as we ended up not using the queue very effectively; a later refactor may simply move the queue handler into ArchiveRepository().
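For illustration, here is a minimal sketch of the shape described above. The names ArchiveRequest, archiveInProgress, queueMutex, and ArchiveRepository echo the discussion in this PR, but the bodies are assumptions, not Gitea's actual implementation:

```go
package archiver

import "sync"

// ArchiveRequest describes one pending archival; illustrative fields only.
type ArchiveRequest struct {
	RepoPath string
	CommitID string
	Ext      string // "zip", "tar.gz", ...
}

var (
	queueMutex        sync.Mutex
	archiveInProgress []*ArchiveRequest
)

// ArchiveRepository enqueues a request unless an identical one is already
// in flight, then processes it asynchronously.
func ArchiveRepository(req *ArchiveRequest) {
	queueMutex.Lock()
	defer queueMutex.Unlock()
	for _, r := range archiveInProgress {
		if *r == *req {
			return // this commit is already being archived
		}
	}
	archiveInProgress = append(archiveInProgress, req)
	go func() {
		// ... run the actual archival (see later sketches) ...
		queueMutex.Lock()
		defer queueMutex.Unlock()
		// Purge the finished request from the in-progress slice.
		for i, r := range archiveInProgress {
			if r == req {
				archiveInProgress = append(archiveInProgress[:i], archiveInProgress[i+1:]...)
				break
			}
		}
	}()
}
```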

@kevans91 force-pushed the archival-service branch 14 times, most recently from d468ebb to 7fbc4d2 on May 5, 2020
@lafriks (Member) commented May 5, 2020

Why delete them every 15 minutes? IMHO that is way too fast.

@GiteaBot added the lgtm/need 2 label (This PR needs two approvals by maintainers to be considered for merging.) on May 5, 2020
@kevans91 (Contributor, Author) commented May 5, 2020

> Why delete them every 15 minutes? IMHO that is way too fast.

Ah, this is just for testing: I only have a handful of branches on this repo, which makes it easy to check the behavior when a branch takes too long to archive. I can relax this, but the main point was just to invalidate them quickly enough that I don't need to purge manually. =)

@kevans91 force-pushed the archival-service branch 3 times, most recently from e0ce4af to ee1f8da on May 5, 2020
Commit messages from the branch:

The prime benefit being sought here is for large archives not to clog up the rendering process and cause unsightly proxy timeouts. As a secondary benefit, the archive-in-progress is moved out of the way into a /tmp file so that new archival requests for the same commit will not be fulfilled from an archive that isn't yet finished.

This asynchronous system is fairly primitive: a request comes in, we spawn off a new goroutine to handle it, then we mark it as done. Status requests check whether the file exists in the final location, and report the archival as done when it exists.

Fixes go-gitea#11265
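A sketch of the two mechanisms this commit message describes: archive into a temporary file and rename into place, so a half-written archive is never visible; and answer status requests by checking whether the final file exists. The function names and the exact git invocation here are assumptions, not the PR's actual code:

```go
package archiver

import (
	"os"
	"os/exec"
	"path/filepath"
)

// doArchive writes the archive to a temporary file first, then renames it
// into its final location, so readers never see a partial archive.
func doArchive(repoPath, commitID, outPath string) error {
	tmp, err := os.CreateTemp("", "archive-*.tar.gz")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	tmp.Close()              // git will open and write the file itself
	defer os.Remove(tmpName) // no-op once the rename succeeds

	cmd := exec.Command("git", "-C", repoPath, "archive",
		"--format=tar.gz", "-o", tmpName, commitID)
	if err := cmd.Run(); err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(outPath), 0o755); err != nil {
		return err
	}
	// NB: a real implementation must handle a cross-filesystem rename,
	// since /tmp may live on a different device than the archive dir.
	return os.Rename(tmpName, outPath)
}

// complete is what a status request consults: the archive is done exactly
// when the file exists at its final location.
func complete(outPath string) bool {
	_, err := os.Stat(outPath)
	return err == nil
}
```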
Some, or perhaps even most, archives will not take all that long to create. The archive process starts as soon as the download button is first clicked, so in theory they could be done quite quickly. Drop the initial delay down to three-quarters of a second to make the common case of a quickly created archive feel more responsive.
This introduces two sync.Cond pointers to the archiver package. If they're non-nil when we go to process a request, we'll wait until signalled before proceeding. The tests then create the sync.Cond so that they can signal at will and sanity-check the state of the queue at different phases.

The author believes that nil-checking these two sync.Cond pointers for every archive request introduces minimal overhead with no impact on maintainability.
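A sketch of the test-hook pattern this describes: the sync.Cond pointers are nil in production (one cheap nil check per request) and are only installed by tests, which can then signal at will. All names here are illustrative:

```go
package archiver

import "sync"

var (
	testHookMutex sync.Mutex
	// Set only by tests; when non-nil, workers park here until signalled,
	// letting the test inspect queue state between phases.
	startCond   *sync.Cond
	releaseCond *sync.Cond
)

func maybeWait(c *sync.Cond) {
	if c == nil {
		return // production path: a single nil check, negligible overhead
	}
	c.L.Lock()
	c.Wait() // park until the test calls Signal or Broadcast
	c.L.Unlock()
}

func processRequest( /* req *ArchiveRequest */ ) {
	maybeWait(startCond) // tests can hold workers at the entry point
	// ... perform the archival ...
	maybeWait(releaseCond) // ... and again before the request is purged
}
```

A test would install the hooks during setup, e.g. startCond = sync.NewCond(&testHookMutex), and then call Signal() or Broadcast() to step workers forward one phase at a time.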
Locking/unlocking the queueMutex is allowed, but not required, for Cond.Signal() and Cond.Broadcast(). The magic at play here is just a little too much for golangci-lint: we take the address of queueMutex, and because it's mostly used in archiver.go, the variable still gets flagged as unused.

Once we've signalled a cond var, it may take a small amount of time for the released goroutines to reach the spot we want them at. Give them an appropriate amount of time.
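Test-side usage corresponding to the two commit messages above: signalling does not require holding the mutex, and the released goroutine needs a grace period before assertions run. A hedged sketch; the test name and timings are assumptions:

```go
package archiver

import (
	"sync"
	"testing"
	"time"
)

var startCond *sync.Cond // production-side hook (declared here so this sketch stands alone)

func TestArchiveStepping(t *testing.T) {
	var mu sync.Mutex
	startCond = sync.NewCond(&mu)

	// ... enqueue a request; the worker parks on startCond ...

	startCond.Signal() // locking mu here is allowed but not required

	// The released goroutine takes a moment to reach its next checkpoint;
	// give it one before asserting on queue state.
	time.Sleep(100 * time.Millisecond)

	// ... assert on the state of the queue ...
}
```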
We must set up the mutex/cond variables at the beginning of any test that's going to use them, or else they will be nil when the test is actually run.

Things got shuffled around such that we carefully build up and release requests from the queue, so we can validate the state of the queue at each step. Fix some assertions that no longer hold true as fallout.
The technique established launches a goroutine to do the wait, which closes a wait channel upon termination. For the timeout case, we also send back a value indicating whether the timeout was hit.

The timeouts are expected to be relatively small, but even a multi-second delay to shutdown because of this could be unfortunate.
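A sketch of the wait-channel technique this message describes: a goroutine performs the blocking wait and closes a channel on termination, and the timed variant reports whether the timeout was hit. Names and signatures are illustrative:

```go
package archiver

import (
	"sync"
	"time"
)

// waitForCompletion launches a goroutine that blocks until done() reports
// true, closing the returned channel upon termination.
func waitForCompletion(cond *sync.Cond, done func() bool) <-chan struct{} {
	waitCh := make(chan struct{})
	go func() {
		cond.L.Lock()
		for !done() {
			cond.Wait()
		}
		cond.L.Unlock()
		close(waitCh)
	}()
	return waitCh
}

// timedWaitForCompletion reports true if the archive completed and false
// if the timeout was hit first. On timeout the waiting goroutine stays
// parked in this sketch, which is exactly the shutdown-delay concern
// raised above; the real code must eventually unblock it.
func timedWaitForCompletion(cond *sync.Cond, done func() bool, timeout time.Duration) bool {
	select {
	case <-waitForCompletion(cond, done):
		return true
	case <-time.After(timeout):
		return false
	}
}
```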
We can just grab the shutdown channel from the graceful manager instead of constructing a channel to halt the caller and/or pass a result back.
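The simplification, sketched: select directly on the graceful manager's shutdown channel instead of building a bespoke channel/goroutine pair. The graceful.GetManager().IsShutdown() call reflects Gitea's graceful manager API as I understand it from this era; treat the exact signature as an assumption:

```go
package archiver

import "code.gitea.io/gitea/modules/graceful"

// waitOrShutdown blocks until the archive completes or the process begins
// shutting down, whichever comes first; it reports whether the archive
// actually finished.
func waitOrShutdown(completed <-chan struct{}) bool {
	select {
	case <-completed:
		return true
	case <-graceful.GetManager().IsShutdown():
		return false // shutting down; stop holding the caller
	}
}
```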
@kevans91 (Contributor, Author) commented:

Following discussion with @zeripath on Discord (thanks!), I've simplified the logic that ensures we aren't blocking shutdown, removing the new channels and extra goroutines.

@zeripath (Contributor) commented:

Hi @kevans91, how's this PR progressing?

@kevans91 (Contributor, Author) commented Aug 31, 2020

> Hi @kevans91, how's this PR progressing?

As far as I'm aware, I've addressed all of the feedback and am awaiting further review. I'd left one thread unresolved because I wasn't sure whether the solution was satisfactory. We discussed it a bit; now I'm just unsure whether I need to change those calls to use a different context (re: #12555).

@GiteaBot added the lgtm/need 1 label (This PR needs approval from one additional maintainer to be merged.) and removed the lgtm/need 2 label on Sep 10, 2020
@lafriks (Member) commented Sep 11, 2020

Please resolve conflicts.

@kevans91 (Contributor, Author) commented:

Heh, a minute after I pushed a version with conflicts resolved, another set of conflicts popped up. =D

I think I've reconciled the differences correctly; WaitForCompletion/TimedWaitForCompletion now use Done() channels on the request context, and the service itself passes the shutdown context to CreateArchive().
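A sketch of that reconciled shape: the wait helpers select on the request context's Done() channel so a cancelled request stops the wait. The exported names come from the comment above; the signatures here are assumptions:

```go
package archiver

import (
	"context"
	"time"
)

// WaitForCompletion blocks until the archive completes or the request's
// context is cancelled.
func WaitForCompletion(ctx context.Context, completed <-chan struct{}) error {
	select {
	case <-completed:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// TimedWaitForCompletion additionally gives up after timeout, reporting
// whether the archive completed.
func TimedWaitForCompletion(ctx context.Context, completed <-chan struct{}, timeout time.Duration) (bool, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-completed:
		return true, nil
	case <-timer.C:
		return false, nil
	case <-ctx.Done():
		return false, ctx.Err()
	}
}
```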

@lafriks added this to the 1.14.0 milestone on Nov 7, 2020
@GiteaBot added the lgtm/done label (This PR has enough approvals to get merged. There are no important open reservations anymore.) and removed the lgtm/need 1 label on Nov 7, 2020
@lafriks (Member) commented Nov 7, 2020

🚀

@codecov-io commented:

Codecov Report

Merging #11296 (c93c520) into master (1b65536) will increase coverage by 0.16%.
The diff coverage is 67.06%.


```diff
@@            Coverage Diff             @@
##           master   #11296      +/-   ##
==========================================
+ Coverage   42.01%   42.17%   +0.16%
==========================================
  Files         692      693       +1
  Lines       75985    76098     +113
==========================================
+ Hits        31924    32095     +171
+ Misses      38839    38746      -93
- Partials     5222     5257      +35
```
| Impacted Files | Coverage Δ |
|---|---|
| routers/repo/repo.go | 32.63% <0.00%> (+3.63%) ⬆️ |
| services/archiver/archiver.go | 77.14% <77.14%> (ø) |
| routers/routes/routes.go | 84.41% <100.00%> (+0.05%) ⬆️ |
| models/repo_mirror.go | 2.38% <0.00%> (-11.91%) ⬇️ |
| modules/secret/secret.go | 66.66% <0.00%> (-7.41%) ⬇️ |
| modules/cron/tasks_basic.go | 87.35% <0.00%> (-3.45%) ⬇️ |
| modules/git/utils.go | 73.77% <0.00%> (-3.28%) ⬇️ |
| modules/git/blame.go | 75.43% <0.00%> (-1.76%) ⬇️ |
| modules/queue/unique_queue_disk_channel.go | 53.84% <0.00%> (-1.54%) ⬇️ |
| services/pull/check.go | 49.63% <0.00%> (-1.46%) ⬇️ |

... and 15 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1b65536...c93c520.

@lafriks merged commit e461f08 into go-gitea:master on Nov 7, 2020
@go-gitea locked and limited conversation to collaborators on Dec 14, 2020
Labels

  • issue/confirmed: Issue has been reviewed and confirmed to be present or accepted to be implemented
  • lgtm/done: This PR has enough approvals to get merged. There are no important open reservations anymore.
  • type/enhancement: An improvement of existing functionality
Development

Successfully merging this pull request may close these issues:

  • Timeout while downloading branches

8 participants