Skip to content
This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Output extraction performance tweaks #45

Merged
merged 4 commits into from
Sep 1, 2020
Merged

Conversation

skovhus
Copy link
Contributor

@skovhus skovhus commented Aug 27, 2020

(Pulled out from #44). Fixes #42. Closes #2

The output extraction done during brick build is rather slow. I noticed a comment in the code to use get_archive instead of mounting volumes, and doing so improves the performance a bit.

A brick -r build on tmrow on my local machine when all images are already build: from 198-228 to 156-164 seconds (two measurements).

I also did some other minor clean up while poking around and preparing for cache optimizations.

Future steps

If we simply remove all output extraction we get down to 122 seconds (compared to around ~210 seconds from above). This shows that there is still a lot to gain from optimizing the extraction.

@madsnedergaard
Copy link
Member

Nice, I've triggered a bunch of builds for testing the impact of this:

Screenshot 2020-08-27 at 23 05 45

Will give an update tomorrow on how it went :)

@madsnedergaard
Copy link
Member

madsnedergaard commented Aug 28, 2020

Don't seem like it changed much, will investigate if it was actually working as intended :)
Screenshot 2020-08-28 at 09 05 22

@skovhus
Copy link
Contributor Author

skovhus commented Aug 28, 2020

Don't seem like it changed much, will investigate if it was actually working as intended :)

Sad. What is the usual running time for a CI job where nothing changed? It is likely that get_archive will mostly be a performance boost on OS X while the Docker file system is different. Note that this PR shouldn't change the behaviour when nothing changed (that will be #46), but it should give a slight speed up (25% locally).

I hope that #46 will show a serious performance increase. That PR also includes this branch, so you can test both if you want to. Note that #46 will only work after the build has run once (as images needs to be updated with the dependency hash as part of their Dockerfile).

@madsnedergaard
Copy link
Member

Don't seem like it changed much, will investigate if it was actually working as intended :)

Sad. What is the usual running time for a CI job where nothing changed?

Approximately 8 minutes. So definitely room for improvements! 😅

It is likely that get_archive will mostly be a performance boost on OS X while the Docker file system is different. Note that this PR shouldn't change the behaviour when nothing changed (that will be #46), but it should give a slight speed up (25% locally).

I guess that could be, there's not a huge improvements at least - but it's kind of hard to actually see changes as the speeds differ quite a lot. Below is a sample of two builds with brick@master and brick@optimize-output-extraction:

# ON MASTER BRANCH
12.45s - build of experimental/whatsthecarbonfootprintof.com
13.57s - build of experimental/whatsthecarbonfootprintof.com

# ON OPTIMIZE-OUTPUT-EXTRACTION BRANCH
13.44s - build of experimental/whatsthecarbonfootprintof.com
12.85s - build of experimental/whatsthecarbonfootprintof.com

I hope that #46 will show a serious performance increase. That PR also includes this branch, so you can test both if you want to. Note that #46 will only work after the build has run once (as images needs to be updated with the dependency hash as part of their Dockerfile).

I'll test with #46 :)

Copy link
Member

@madsnedergaard madsnedergaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's still room for improving the output extraction - for one of our packages (bloom) it still takes 4s on a noop run, but this seems like a good first step and then we can more easily measure improvements of it going forward (due to #46 removing all the "noise"). So LGTM! 💪

@skovhus skovhus merged commit fb0eb9b into master Sep 1, 2020
@skovhus skovhus deleted the optimize-output-extraction branch September 1, 2020 08:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use non-root user when outputting from docker containers update brick to use --output flag from docker
2 participants