Pytorch nightly docker image invalidated layers #125862
Comments
We recently introduced this workflow that runs smoke tests: https://github.com/pytorch/builder/actions/runs/9023482505/job/24795381627. I see we had an issue last night; looking into it.
Do we have a self-check to verify that layers are not invalidated on every build?
That one is a different bug: #125879. Here I am talking about invalidating the common layers on every nightly rebuild/push/pull.
@bhack can you please describe your idea in more detail, with links? Are you referring to https://docs.docker.com/build/cache/invalidation/? Maybe you have a particular tool in mind we could look into?
Yes, it is mainly this:
https://docs.docker.com/guides/docker-concepts/building-images/using-the-build-cache/
If we have a single layer in the Dockerfile that is different every night, that layer and every layer after it are invalidated in the image (and in the build cache as well).
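As a minimal sketch of that cascade (the base image, package set, and install commands below are illustrative assumptions, not the actual pytorch/builder Dockerfile): an instruction that changes nightly forces every later instruction to be rebuilt, so ordering the stable layers before the nightly-changing one keeps them cached.

# Hypothetical example, not the real pytorch/builder Dockerfile
FROM nvidia/cuda:12.4.1-base-ubuntu22.04
# Stable layers: cached across nightly builds as long as nothing above them changes
RUN apt-get update && apt-get install -y --no-install-recommends git curl ca-certificates && rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh && bash /tmp/miniconda.sh -b -p /opt/conda
# This layer differs every night, so it is rebuilt and re-pushed daily
RUN /opt/conda/bin/conda install -y -c pytorch-nightly pytorch torchvision torchaudio
# Any layer placed after the nightly install is also invalidated every night
COPY ./smoke_test.py /workspace/smoke_test.py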
Yeah, unfortunately the cache-ability of Python packages within a Docker container doesn't really make this an easy task. However, I'd happily review a PR if you feel there's an opportunity to make this better! Also, I think this might be a more specialized workflow that doesn't represent what our average user sees, so it might be hard for us to prioritize this right now.
If we take e.g. today's image: are we going to produce a fresh new ~5GB layer every night? Beyond the download time and the bootstrap/provisioning lag, this is really going to quickly grow any registry that stores derived images.
@atalman I think the main issue is this monolithic layer. What do you think about separating it, so that it is no longer 5GB to re-download, to keep in the build cache, and to store in the artifact registry every day? E.g. the split below could be a little too atomic, but I think we could find a balance. What do you think?
COPY --from=build /opt/conda/bin /opt/conda/bin
COPY --from=build /opt/conda/envs /opt/conda/envs
COPY --from=build /opt/conda/lib /opt/conda/lib
COPY --from=build /opt/conda/include /opt/conda/include
COPY --from=build /opt/conda/etc /opt/conda/etc
COPY --from=build /opt/conda/conda-meta /opt/conda/conda-meta
COPY --from=build /opt/conda/pkgs /opt/conda/pkgs
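For context, a rough sketch of the multi-stage layout those COPY lines imply (the base images, the contents of the build stage, and the install command are my assumptions, not the current builder Dockerfile): the environment is produced in a build stage and copied into the runtime image in pieces, with the hope that directories that did not change keep their cached layers.

# Hypothetical multi-stage layout, not the current pytorch/builder Dockerfile
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh && bash /tmp/miniconda.sh -b -p /opt/conda
RUN /opt/conda/bin/conda install -y -c pytorch-nightly pytorch torchvision torchaudio

FROM nvidia/cuda:12.4.1-base-ubuntu22.04
# Copy the environment directory by directory instead of as one ~5GB layer
COPY --from=build /opt/conda/bin /opt/conda/bin
COPY --from=build /opt/conda/lib /opt/conda/lib
COPY --from=build /opt/conda/etc /opt/conda/etc
ENV PATH=/opt/conda/bin:$PATH

One caveat: a nightly upgrade still touches several of these directories, so the actual saving depends on how much of /opt/conda stays byte-identical between nights.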
If you have a local setup to build PyTorch images, you can easily analyze the conda layer with https://github.com/wagoodman/dive.
An average conda nightly upgrade is about 1.36 GB, so I think that invalidating a 5GB layer every day is a lot of overhead:

package                        | build
-------------------------------|------------------------------------------------
certifi-2024.6.2               | pyhd8ed1ab_0                    157 KB  conda-forge
pytorch-2.4.0.dev20240611      | py3.11_cuda12.4_cudnn9.1.0_0   1.34 GB  pytorch-nightly
torchaudio-2.4.0.dev20240611   | py311_cu124                     6.4 MB  pytorch-nightly
torchvision-0.19.0.dev20240611 | py311_cu124                     8.6 MB  pytorch-nightly
------------------------------------------------------------
                                                       Total:   1.36 GB
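One way to act on that number (a sketch under assumptions; the dependency list and channels here are illustrative, not the real build recipe) is to install the slow-moving dependencies in one layer and the nightly-moving packages in a later layer, so that only the ~1.36 GB layer changes each night instead of the full ~5GB one.

# Hypothetical split, not the actual pytorch/builder recipe
# Layer 1: dependencies that rarely change; stays cached across nightlies
RUN /opt/conda/bin/conda install -y numpy pyyaml typing_extensions
# Layer 2: only the nightly-moving packages (~1.36 GB per the table above)
RUN /opt/conda/bin/conda install -y -c pytorch-nightly pytorch torchvision torchaudio

In practice the second install can still upgrade packages laid down by the first layer, so the first layer would likely need pinned versions for the split to pay off.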
🐛 Describe the bug
Is there something that is invalidating Docker nightly image layers?
It seems that every time I pull a new nightly image, all of the heavy layers are downloaded again.
Versions
pytorch nightly
cc @seemethere @malfet @osalpekar @atalman