This issue was moved to a discussion.

docker datasource registryUrls #10135

Closed
rarkins opened this issue May 25, 2021 · 18 comments
Labels
datasource:docker priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others status:requirements Full requirements are not yet known, so implementation should not be started type:feature Feature (new functionality)

Comments

@rarkins
Collaborator

rarkins commented May 25, 2021

What would you like Renovate to be able to do?

Improved registryUrls support for changing the lookup of images.

Did you already have any implementation ideas?

Knowing what is the image name and what is the path is challenging.

DockerHub has a two-part namespace, e.g. <org>/<image>. However for library/<image>, it is also allowed to have simply <image>.

I thought about indexing the list of library images (there are fewer than 200), but that doesn't really help: even though node is a library image, it's also possible to have foo/node.

I think the solution for this is:

  • If you want Renovate to be able to change registryUrls for a docker library image then you need to use the full library/<image> in the path. e.g. registry.mygitlab.test/jobs/library/node:14 and not registry.mygitlab.test/jobs/node:14
  • We can add a special rule to set depName=image,lookupName=library/image if the extracted depName starts with library/ (e.g. library/node)

The next thing we need to do is within the docker datasource: join the registryUrl with lookupName, then treat the origin + /v2 as the "Docker registry" and the rest as the image name. e.g. the Docker registry above would be https://registry.mygitlab.test and the image name would be jobs/library/node.
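The join-then-split step proposed above can be sketched as follows. This is a minimal illustration in Python, not Renovate's implementation; the function names `split_registry_and_image` and `apply_library_rule` are hypothetical and exist only for this example.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_registry_and_image(registry_url: str, lookup_name: str) -> Tuple[str, str]:
    """Join registry_url and lookup_name, then split at the URL origin.

    Hypothetical helper: the origin becomes the "Docker registry" and any
    path on the registry URL becomes a prefix of the image name.
    """
    parts = urlsplit(registry_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    prefix = parts.path.strip("/")
    image = f"{prefix}/{lookup_name}" if prefix else lookup_name
    return origin, image


def apply_library_rule(extracted: str) -> Tuple[str, str]:
    """Return (dep_name, lookup_name) for the proposed library/ special rule."""
    if extracted.startswith("library/"):
        return extracted[len("library/"):], extracted
    return extracted, extracted


registry, image = split_registry_and_image(
    "https://registry.mygitlab.test/jobs", "library/node"
)
# The registry API is then addressed under the origin plus /v2.
print(f"{registry}/v2/{image}/tags/list")
```

With the example from the proposal, the registry comes out as https://registry.mygitlab.test and the image name as jobs/library/node.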

Once agreement is reached, I will resume and update #10118

@rarkins rarkins added type:feature Feature (new functionality) datasource:docker status:requirements Full requirements are not yet known, so implementation should not be started priority-5-triage labels May 25, 2021
@viceice
Member

viceice commented May 25, 2021

GitLab supports multiple nested groups, so an image repo can have more than three parts, e.g. registry.gitlab.test/group/subgroup/subsubgroup/image.

I think we should treat this image as registryUrl=https://registry.gitlab.test and depName=group/subgroup/subsubgroup/image.
These are normal images stored in the GitLab container registry.

GitLab also supports a dependency proxy for docker hub images.
They look like gitlab.example.com/groupname/dependency_proxy/containers/alpine:latest or ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/alpine.
ref #9958

If I configure a Docker Hub proxy within Harbor, an image looks like harbor.domain.com/docker-hub/renovate/renovate:23.84.6-slim, where harbor.domain.com is the host and docker-hub is the proxy project.

Those registries are pull-through proxies, so you can pull any known tag or digest, but tags/list will only list tags that have already been pulled. They are only updated when you explicitly pull a tag; the proxy then does a HEAD request to check whether the tag you pulled is up to date.
ref #9004

So we need a way to allow users to override the registry host and prefix, so renovate will lookup:

  • harbor.domain.com/docker-hub/renovate/renovate:latest as https://index.docker.io/v2/renovate/renovate/manifests/latest
  • gitlab.example.com/groupname/dependency_proxy/containers/alpine:latest as https://index.docker.io/v2/library/alpine/manifests/latest

But we need to make sure renovate can also lookup (private or public) images from:

  • registry.mygitlab.test/jobs/docker/base:latest as https://registry.mygitlab.test/v2/jobs/docker/base/manifests/latest
  • harbor.domain.com/project/docker/image:latest as https://harbor.domain.com/v2/project/docker/image/manifests/latest
  • ghcr.io/containerbase/buildpack:1.10.0 as https://ghcr.io/v2/containerbase/buildpack/manifests/1.10.0
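The five mappings listed above can be expressed as a prefix-override table with a fallback to a direct lookup. The sketch below is only an illustration of that requirement, written in Python. `PROXY_PREFIXES` and `manifest_url` are hypothetical names, and adding `library/` automatically for single-segment Docker Hub names is an assumption of this example. Digest references (`@sha256:...`) are not handled.

```python
from typing import Dict, Tuple

DOCKER_HUB = "https://index.docker.io"

# Hypothetical user-supplied table: proxy prefix -> upstream registry.
PROXY_PREFIXES: Dict[str, str] = {
    "harbor.domain.com/docker-hub": DOCKER_HUB,
    "gitlab.example.com/groupname/dependency_proxy/containers": DOCKER_HUB,
}


def split_tag(image_ref: str) -> Tuple[str, str]:
    """Split name and tag; a colon before the last slash is a port, not a tag."""
    colon = image_ref.rfind(":")
    if colon > image_ref.rfind("/"):
        return image_ref[:colon], image_ref[colon + 1:]
    return image_ref, "latest"


def manifest_url(image_ref: str, proxy_prefixes: Dict[str, str]) -> str:
    """Build the manifest URL, rewriting known proxy prefixes to their upstream."""
    name, tag = split_tag(image_ref)
    for prefix, upstream in proxy_prefixes.items():
        if name.startswith(prefix + "/"):
            repo = name[len(prefix) + 1:]
            if upstream == DOCKER_HUB and "/" not in repo:
                repo = f"library/{repo}"  # assumption: official images live under library/
            return f"{upstream}/v2/{repo}/manifests/{tag}"
    # No override matched: treat the first segment as the registry host.
    host, _, repo = name.partition("/")
    return f"https://{host}/v2/{repo}/manifests/{tag}"


print(manifest_url("harbor.domain.com/docker-hub/renovate/renovate:latest", PROXY_PREFIXES))
print(manifest_url("ghcr.io/containerbase/buildpack:1.10.0", PROXY_PREFIXES))
```

Note that harbor.domain.com/project/docker/image is left alone because only the docker-hub project is listed as a proxy prefix.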

@rarkins
Collaborator Author

rarkins commented May 25, 2021

harbor.domain.com/docker-hub/renovate/renovate:latest as https://index.docker.io/v2/renovate/renovate/manifests/latest will work with my proposal.

gitlab.example.com/groupname/dependency_proxy/containers/alpine:latest as https://index.docker.io/v2/library/alpine/manifests/latest will work if it's instead gitlab.example.com/groupname/dependency_proxy/containers/library/alpine:latest.

And the others should work "as is" so long as the packageRules to change registryUrls do not apply to them.

@viceice
Member

viceice commented May 25, 2021

OK, then I'm fine with it. So we extract gitlab.example.com/groupname/dependency_proxy/containers/library/alpine:latest as

  • registryUrls=https://gitlab.example.com/groupname/dependency_proxy/container
  • depName=alpine
  • lookupName=library/alpine

And when combined with a registryUrls: [ "https://some.registry.com/project" ] package rule, Renovate would resolve to:
https://some.registry.com/v2/project/library/alpine ?


What about ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/alpine 🤔 How should we handle it? We will find it more and more in GitLab repos.
Maybe we can simply remove the specific ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/ while extracting?

@rarkins
Collaborator Author

rarkins commented May 25, 2021

Close, not sure if it was a typo but it's:

gitlab.example.com/groupname/dependency_proxy/containers/library/alpine:latest as

  • registryUrls=https://gitlab.example.com/groupname/dependency_proxy/container
  • depName=alpine
  • lookupName=library/alpine

And it would resolve the same as you expect.

I think we can add a special rule for ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX} considering how common it likely is.

@viceice
Member

viceice commented May 25, 2021

Yes, was a typo, fixed it

@rarkins
Collaborator Author

rarkins commented May 25, 2021

Something I still need to resolve is:

  • We know some users want to override their dependency proxy and use Docker Hub instead, BUT
  • There will be regex manager scenarios where only the depName/lookupName is provided without registryUrls and we don't want to override those to use the default Docker Hub registryUrls

I think the following will work: if the lookupName contains a host then ignore registryUrls (regardless of whether they're default or not).
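The "contains a host" rule could be sketched like this. The host test below borrows the usual Docker reference convention (the first path segment is a registry host if it contains a dot or a colon, or is `localhost`). Using that convention here is an assumption of this example, not something settled in this thread, and both function names are hypothetical.

```python
from typing import List

DEFAULT_REGISTRY_URLS = ["https://index.docker.io"]


def has_registry_host(lookup_name: str) -> bool:
    """Heuristic: does the first path segment look like a registry host?"""
    first, sep, _ = lookup_name.partition("/")
    if not sep:
        return False  # single segment such as "node" is never a host
    return "." in first or ":" in first or first == "localhost"


def effective_registry_urls(lookup_name: str, registry_urls: List[str]) -> List[str]:
    """Ignore registryUrls (default or custom) when the name carries its own host."""
    if has_registry_host(lookup_name):
        return []
    return registry_urls or DEFAULT_REGISTRY_URLS


print(effective_registry_urls("quay.io/foo/bar", ["https://my.registry.com"]))
print(effective_registry_urls("node", []))
```

An empty list here means "use the host embedded in the lookup name", which keeps regex manager results that already include a host from being redirected to Docker Hub.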

@HonkingGoose HonkingGoose added priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others and removed priority-5-triage labels Jun 3, 2021
@rv2673
Contributor

rv2673 commented Jan 6, 2022

@rarkins Any progress on this issue?
I don't know enough about the internals of Renovate, only what I've read in the linked issues and PRs.

Would the following also work to make the override scenarios possible?:

  • With package rules, make it possible to match lookupName with a regex, together with something similar to extractVersion and/or extractVersionTemplate. This would allow overriding or modifying the lookup name (and the registry, by adding or replacing it) before it is passed to the docker (or other) datasource.

This would fix this issue and would be opt-in.

  • Make it possible for a manager to allow and pass through variables in lookup names (with opt-in configuration for the manager or similar), so as not to ignore ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/ or some other variable. Together with the previous change and configuration, this would allow stripping or replacing the variable.

This would fix the linked issue #9958 and also would be opt-in.

@rarkins
Collaborator Author

rarkins commented Jan 14, 2022

I got a bit slowed up on this one and started doubting my design decision. I'll try to summarize here.

Ultimately, we want maximum flexibility when it comes to controlling which docker registry we look image versions up from. I've seen for example both these cases:

  • People wanting to use a Docker Hub mirror or proxy, to avoid rate limiting
  • People wanting to use Docker Hub instead of their registry, because their registry doesn't fully support the API e.g. full listing of tags

Therefore we can't make any assumptions other than:

  • Images may or may not have a host prefixed
  • The host needs to be changeable via registryUrls

The way registryUrls works today is that it applies only to short image names such as node or whitesource/renovate. Even that way it has to do some hacky try/fail/retry to check whether library/ needs prepending. It does not apply the registryUrls to fully qualified images like quay.io/foo/bar.
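The try/fail/retry for library/ mentioned above might look roughly like the sketch below. It is not Renovate's code: `lookup_tags` and the injected `fetch_tags` callable are hypothetical, and the order of the two attempts is an assumption. A dictionary stands in for the registry so the example runs without network access.

```python
from typing import Callable, Dict, List, Optional, Tuple

FetchTags = Callable[[str, str], Optional[List[str]]]


def lookup_tags(name: str, registry_url: str, fetch_tags: FetchTags) -> Optional[List[str]]:
    """Try the name as given, then retry short names with a library/ prefix."""
    tags = fetch_tags(registry_url, name)
    if tags is None and "/" not in name:
        # Short name such as "node": the registry may store it as library/node.
        tags = fetch_tags(registry_url, f"library/{name}")
    return tags


# Stand-in for a registry: (registry URL, image name) -> tags.
FAKE_REGISTRY: Dict[Tuple[str, str], List[str]] = {
    ("https://my.registry.com", "library/node"): ["14", "16"],
}


def fake_fetch(registry_url: str, name: str) -> Optional[List[str]]:
    return FAKE_REGISTRY.get((registry_url, name))


print(lookup_tags("node", "https://my.registry.com", fake_fetch))
```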

One challenge we have - and I would appreciate help solving it - is knowing how to split up fully qualified images into the registry and the image name. e.g. quay.io and foo/bar is easy, but it's not easy once the path has 3 or more sections. @viceice pointed out examples above including registry.gitlab.test/group/subgroup/subsubgroup/image

So here's the challenge:

  • Can we look at an image like registry.gitlab.test/group/subgroup/subsubgroup/image and automatically split it between registry and image?
  • If not.. can we elegantly determine the registry part vs image part?
  • If we are to "guess and retry", should it be done in the manager or the datasource?

We should also keep in mind that Docker images are not the only type of dependency with this format, e.g. Go Modules are like foo.bar/alpha/beta. However we don't have the same need to change the host for them.

An advantage to guessing the split in the manager extract process is enabling the user to be able to "fix" it using packageRules if we get it wrong. But we may need the ability to change lookupName too, which we don't today support.

Stepping back, what if we simply made a breaking change to how our docker datasource handles registryUrls? Instead of ignoring them for fully qualified ones, we use them in all scenarios. It would be the user's responsibility to control which ones it applies to by using matchPackageNames or more likely matchPackagePatterns. It might mean that we need to guess and retry in the datasource, but that's perhaps ok?

In this scenario we'd never extract registryUrl from an image string.

@viceice
Member

viceice commented Jan 14, 2022

In this scenario we'd never extract registryUrl from an image string.

I don't like that, as I have a lot of images which are not available on Docker Hub, so configuring hostRules for all of them is painful.

So i would suggest

  • extract the hostname only as registry by default and leave the rest as depName / lookupName
  • prepend lookupName with library for short images like node

If the user overrides the registry URL, it gets prepended to lookupName.

  • renovate/node + https://my.registry.com/some/sub/path = https://my.registry.com/v2/some/sub/path/renovate/node/tags/list
  • node + https://my.registry.com/some/sub/path = https://my.registry.com/v2/some/sub/path/library/node/tags/list
  • quay.io/renovate/node + https://my.registry.com/some/sub/path = https://my.registry.com/v2/some/sub/path/renovate/node/tags/list
  • my.registry.com/some/sub/path/renovate/node + https://docker.io = https://index.docker.io/v2/some/sub/path/renovate/node/tags/list

So this works for most cases.
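The four examples above follow from three rules: take only the hostname as the registry, add library/ to short images, and prepend the path of an overriding registry URL to the lookup name. A minimal Python sketch of those rules follows, with some assumptions: the function name `tags_list_url` is hypothetical, the host detection is a heuristic, and mapping docker.io to index.docker.io is inferred from the fourth example.

```python
from typing import Optional
from urllib.parse import urlsplit


def tags_list_url(image: str, override_url: Optional[str] = None) -> str:
    """Resolve the tags/list URL for an image, optionally overriding the registry."""
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        host, name = first, rest  # hostname only; the rest stays in the name
    else:
        host = "index.docker.io"
        # Short images such as "node" live under library/ on Docker Hub.
        name = image if "/" in image else f"library/{image}"

    if override_url is None:
        return f"https://{host}/v2/{name}/tags/list"

    parts = urlsplit(override_url)
    new_host = "index.docker.io" if parts.netloc == "docker.io" else parts.netloc
    prefix = parts.path.strip("/")
    full_name = f"{prefix}/{name}" if prefix else name
    return f"{parts.scheme}://{new_host}/v2/{full_name}/tags/list"


for img in ("renovate/node", "node", "quay.io/renovate/node"):
    print(tags_list_url(img, "https://my.registry.com/some/sub/path"))
print(tags_list_url("my.registry.com/some/sub/path/renovate/node", "https://docker.io"))
```

The last call shows why the existing prefix sometimes needs changing too: the old path some/sub/path is carried over to the new registry.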

If someone needs to change the existing prefix too, they can use a regex manager.

    {
      "fileMatch": ["^Dockerfile$"],
      "matchStrings": [
        "FROM my\\.registry\\.com\\/some\\/sub\\/path\\/(?<depName>[a-z0-9.\\/-]+)(?::(?<currentValue>[a-z0-9.-]+))?(?:@(?<currentDigest>sha256:[a-f0-9]+))?"
      ],
      "datasourceTemplate": "docker",
      "versioningTemplate": "docker",
      "registryUrlTemplate": "https://docker.io"
    }

This handles the special case:

  • my.registry.com/some/sub/path/renovate/node = https://index.docker.io/v2/renovate/node/tags/list

@rarkins
Collaborator Author

rarkins commented Jan 14, 2022

In this scenario we'd never extract registryUrl from an image string.

I don't like that, as I have a lot of images which are not available on Docker Hub, so configuring hostRules for all of them is painful.

I didn't phrase that well. What I meant was that we'd never split registryUrl from image strings as part of the manager (essentially like we do today).

@viceice
Member

viceice commented Jan 14, 2022

Ok

@LeoniePhiline
Contributor

LeoniePhiline commented May 26, 2022

Would it be possible to:

  • fetch <image-name>/blobs and <image-name>/manifests (getConfigDigest() / getManifestResponse()) through GitLab dependency proxy,
  • but at the same time fetch <image-name>/tags/list (getDockerApiTags()) through docker hub at index.docker.io?

My motivation here is that requests for <image-name>/manifests are counted towards the docker hub rate limit, so it makes sense to use GitLab dependency proxy.

However, the proxy does not support <image-name>/tags/list, as seen at: https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/path_regex.rb#L265

Result:
Requesting e.g. GET https://gitlab.<self-hosted>.io/v2/<group>/dependency_proxy/containers/<image-name>/tags/list?n=10000 results in HTTP 404.

Therefore, setting the dependency proxy URL as registryHost currently does not work:

  • <image-name>/tags/list are pulled through registryHost - which needs to be docker hub, as GitLab does not provide this endpoint
  • therefore, all requests for <image-name>/manifests (which would be supported by the GitLab proxy) also need to go through docker hub (because we can currently only have one single registryHost), causing HTTP 429 Too Many Requests from docker hub.

Is it too hacky to bypass GitLab dependency proxy (= override registryHost with 'index.docker.io') in getDockerApiTags() while keeping the proxy for all other requests? Would it be reasonable to implement such an override?
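The override asked about here amounts to choosing the base URL per endpoint. The following is a hypothetical sketch in Python of that routing only: `endpoint_url`, `HUB_BASE`, and the example proxy URL are made up for illustration and do not reflect how Renovate's docker datasource is structured.

```python
HUB_BASE = "https://index.docker.io/v2"

# Example proxy base; the host and group name are placeholders.
PROXY_BASE = "https://gitlab.example.com/v2/mygroup/dependency_proxy/containers"


def endpoint_url(image_name: str, endpoint: str, proxy_base: str, reference: str = "") -> str:
    """Send tags/list to Docker Hub and every other endpoint to the proxy."""
    base = HUB_BASE if endpoint == "tags/list" else proxy_base
    url = f"{base}/{image_name}/{endpoint}"
    return f"{url}/{reference}" if reference else url


print(endpoint_url("renovate/renovate", "tags/list", PROXY_BASE))
print(endpoint_url("renovate/renovate", "manifests", PROXY_BASE, "latest"))
```

Tag listing would still hit Docker Hub once per image, while manifest and blob requests would go through the proxy.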

Ideally, GitLab dependency proxy should add support for tags/list, but renovate is a much more dynamic project.

@rarkins
Collaborator Author

rarkins commented May 28, 2022

I think whether it's too hacky or not depends on both:

  • How complicated would it need to make our config to allow it (e.g. we probably can't do this with registryUrls alone)
  • How complicated would it make our internal code to implement it

Right now I'm thinking it's probably too complicated to be justified, but I'm definitely not sure about it.

@rarkins
Collaborator Author

rarkins commented May 28, 2022

I've been wanting to revisit this topic for a while.

The use cases we're hoping to cover include:

  • non-qualified docker images (e.g. node) which would typically point to Docker Hub, but the user needs us to query their private registry instead (e.g. an Artifactory)
  • fully-qualified images (e.g. internal.registry.company/node) where the user needs us to query Docker Hub instead, e.g. because the internal registry can't support tag listing

We could extract Docker dependencies as either:

  • Separate name (packageName) / registry (registryUrl)
  • Fully qualified with combined registry + name where available
  • Allowing either type above

The challenge with supporting only the separate approach is that we don't always know where the registry stops and the package name begins. For example, with registry.gitlab.test/group/subgroup/subsubgroup/image we don't know if the package name should be image, subsubgroup/image, subgroup/subsubgroup/image, etc. It only matters to us in the scenario where someone changes the registry using packageRules, so in theory we could force them to fix the splitting too if they need to.

Let's say the extracted value is registry.gitlab.test/group/subgroup/subsubgroup/node and someone configures registryUrls=[https://index.docker.io]. We should map that to index.docker.io + node. But if the extracted value is registry.gitlab.test/group/subgroup/subsubgroup/renovate/renovate then we should map it to index.docker.io + renovate/renovate. We can't magically know this on our own, so we need to either:

  • try/fail/try in the datasource, one chunk at a time, or
  • try only the first one (e.g. node or renovate) then give up (it would be an unsupported scenario)
  • reuse aliases, e.g. to map each registry.gitlab.test/group/subgroup/subsubgroup to index.docker.io

If we went with the aliases approach then we'd be keeping our existing datasource logic, i.e. allowing either type of fully qualified or name + registry extraction, and ignoring custom registryUrls if the packageName is fully qualified.
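For comparison, the "try/fail/try, one chunk at a time" option could be sketched as below. This is a rough Python illustration under several assumptions: the extracted value has a host as its first segment and no tag, candidates are tried from the shortest suffix outwards, and single-segment names are looked up under library/. `candidate_names`, `resolve_on_new_registry`, and the injected `exists` check are hypothetical.

```python
from typing import Callable, Iterator, Optional


def candidate_names(extracted: str) -> Iterator[str]:
    """Yield path suffixes of the extracted image, shortest first."""
    segments = extracted.split("/")[1:]  # drop the host segment
    for start in range(len(segments) - 1, -1, -1):
        yield "/".join(segments[start:])


def resolve_on_new_registry(extracted: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate the new registry knows about, else None."""
    for name in candidate_names(extracted):
        lookup = name if "/" in name else f"library/{name}"
        if exists(lookup):
            return lookup
    return None


# Stand-in for "does this image exist on index.docker.io?".
KNOWN_ON_HUB = {"library/node", "renovate/renovate"}
prefix = "registry.gitlab.test/group/subgroup/subsubgroup"

print(resolve_on_new_registry(f"{prefix}/node", KNOWN_ON_HUB.__contains__))
print(resolve_on_new_registry(f"{prefix}/renovate/renovate", KNOWN_ON_HUB.__contains__))
```

Each failed candidate costs one registry request, which is the trade-off against the aliases approach, where the prefix is declared up front.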

@LeoniePhiline
Contributor

It does sound like a valuable feature. Is there a way to know how many users would benefit from that?

@sclorng

sclorng commented Aug 5, 2022

Hi,

I think it should be assumed that deriving the package name from the registry name is not possible. If using aliases, it would be better to iterate over each one and replace it where it matches.

In our case, we are using JFrog Artifactory as an on-premise registry mirror. It does mirroring with DNS only (so it preserves the original path). But we have mirrors of these mirrors that are only accessible from the deployment environment. So, in our config files, we must set the registry name of the deployment environment, but Renovate should replace it with the first one.

I saw in the docker manager that when the dep comes from the gitlabci manager, the registryAliases are iterated to do a simple replace when the image starts with an alias.
We don't need anything more complex, but the behavior should be the same regardless of which manager extracts the dependency.

@piccit

piccit commented Jun 17, 2023

I saw in the docker manager that when the dep comes from the gitlabci manager, the registryAliases are iterated to do a simple replace when the image starts with an alias.
We don't need anything more complex, but the behavior should be the same regardless of which manager extracts the dependency.

Just to add to this, I've run into a similar situation where I need to set a registryAlias for a docker registry. This mostly works for my use case except that it doesn't work in helm-values b/c helm-values manager doesn't pass along any registryAliases in the call to getDep (whereas other managers do)

@viceice
Member

viceice commented Jul 2, 2023

I saw in the docker manager that when the dep comes from the gitlabci manager, the registryAliases are iterated to do a simple replace when the image starts with an alias.
We don't need anything more complex, but the behavior should be the same regardless of which manager extracts the dependency.

Just to add to this, I've run into a similar situation where I need to set a registryAlias for a docker registry. This mostly works for my use case except that it doesn't work in helm-values b/c helm-values manager doesn't pass along any registryAliases in the call to getDep (whereas other managers do)

Please open a new feature request discussion for this. I'll probably add that next week.

@renovatebot renovatebot locked and limited conversation to collaborators Oct 1, 2023
@rarkins rarkins converted this issue into discussion #24894 Oct 1, 2023

7 participants