
Add deletion validation for IPPool webhook #137

Closed


haijianyang

The webhook should reject IPPool deletion if any IP has been allocated

Fixes #135
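
A minimal sketch of the kind of check this PR describes, assuming the webhook follows the usual controller-runtime Validator pattern and that allocated addresses are tracked in the IPPool's Status.Allocations map; the exact wiring in the actual change may differ:

// Sketch only: reject IPPool deletion while any IP from the pool is still allocated.
package v1alpha1

import (
	"errors"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

func (c *IPPool) ValidateDelete() error {
	if len(c.Status.Allocations) > 0 {
		// Deny the request with a Forbidden error.
		return apierrors.NewForbidden(
			schema.GroupResource{Group: "ipam.metal3.io", Resource: "IPPool"},
			c.Name,
			errors.New("IPPool cannot be deleted because it is in use"),
		)
	}
	return nil
}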

@metal3-io-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign furkatgofurov7 after the PR has been reviewed.
You can assign the PR to them by writing /assign @furkatgofurov7 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 15, 2022
@metal3-io-bot
Contributor

Hi @haijianyang. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 15, 2022
@haijianyang
Author

@furkatgofurov7

@mboukhalfa
Member

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 16, 2022
@haijianyang
Author

haijianyang commented Sep 21, 2022

@furkatgofurov7
Member

/test-ubuntu-integration-main

@jessehu

jessehu commented Sep 23, 2022

The test case needs to be updated as well.
admission webhook \\\"validation.ippool.ipam.metal3.io\\\" denied the request: IPPool.ipam.metal3.io \\\"provisioning-pool\\\" is forbidden: IPPool cannot be deleted because it is in use\"", "Deleting IPPool=\"provisioning-pool\" Namespace=\"metal3\"", "Deleting Metal3MachineTemplate=\"test1-controlplane\" Namespace=\"metal3\"", "Deleting Metal3Cluster=\"test1\" Namespace=\"metal3\"", "Deleting Metal3DataTemplate=\"test1-controlplane-template\" Namespace=\"metal3\"", "Deleting Metal3MachineTemplate=\"test1-workers\" Namespace=\"metal3\"", "Deleting KubeadmControlPlane=\"test1\" Namespace=\"metal3\"", "Deleting Metal3DataTemplate=\"test1-workers-template\" Namespace=\"metal3\"", "Error: [action failed after 10 attempts: error deleting \"ipam.metal3.io/v1alpha1, Kind=IPPool\" metal3/baremetalv4-pool: admission webhook \"validation.ippool.ipam.metal3.io\" denied the request: IPPool.ipam.metal3.io \"baremetalv4-pool\" is forbidden: IPPool cannot be deleted because it is in use, action failed after 10 attempts: error deleting \"ipam.metal3.io/v1alpha1, Kind=IPPool\" metal3/provisioning-pool: admission webhook \"validation.ippool.ipam.metal3.io\" denied the request: IPPool.ipam.metal3.io \"provisioning-pool\" is forbidden: IPPool cannot be deleted because it is in use]"], "stdout": "", "stdout_lines": []}

@haijianyang
Author

The e2e tests of cluster-api-provider-metal3 use clusterName to bind the IPPool, and IPPoolManager sets the CAPI Cluster as the owner of the IPPool. So when the CAPI Cluster is deleted, Kubernetes tries to delete the bound IPPools (provisioning-pool and baremetalv4-pool). However, the resources associated with the IPPools have not been released yet, so the deletion is blocked by the ValidateDelete webhook.

The use of clusterName can make the CAPI Cluster and the IPPool depend on each other. Do we have any solutions?

# IPPools of cluster-api-provider-metal3 e2e test
---
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: provisioning-pool
  namespace: ${NAMESPACE}
spec:
  clusterName: ${CLUSTER_NAME}
  namePrefix: ${CLUSTER_NAME}-prov
  pools:
    - start: ${PROVISIONING_POOL_RANGE_START}
      end: ${PROVISIONING_POOL_RANGE_END}
  prefix: ${PROVISIONING_CIDR}
---
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: baremetalv4-pool
  namespace: ${NAMESPACE}
spec:
  clusterName: ${CLUSTER_NAME}
  namePrefix: ${CLUSTER_NAME}-bmv4
  pools:
    - start: ${BAREMETALV4_POOL_RANGE_START}
      end: ${BAREMETALV4_POOL_RANGE_END}
  prefix: ${EXTERNAL_SUBNET_V4_PREFIX}
  gateway: ${EXTERNAL_SUBNET_V4_HOST}
// ip-address-manager
func (m *IPPoolManager) SetClusterOwnerRef(cluster *clusterv1.Cluster) error {
	if cluster == nil {
		return errors.New("Missing cluster")
	}
	// Verify that the owner reference is there, if not add it and update object,
	// if error requeue.
	_, err := findOwnerRefFromList(m.IPPool.OwnerReferences,
		cluster.TypeMeta, cluster.ObjectMeta)
	if err != nil {
		if ok := errors.As(err, &notFoundErr); !ok {
			return err
		}
		m.IPPool.OwnerReferences, err = setOwnerRefInList(
			m.IPPool.OwnerReferences, false, cluster.TypeMeta,
			cluster.ObjectMeta,
		)
		if err != nil {
			return err
		}
	}
	return nil
}

@metal3-io-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2022
@jessehu

jessehu commented Jan 19, 2023

@furkatgofurov7 https://jenkins.nordix.org/job/metal3_ipam_main_integration_test_ubuntu/32/ returns 404. Could you help check the cause please?

@furkatgofurov7
Member

/test-ubuntu-integration-main

@furkatgofurov7
Member

@furkatgofurov7 https://jenkins.nordix.org/job/metal3_ipam_main_integration_test_ubuntu/32/ returns 404. Could you help check the cause please?

hey, looks like we have some problems reaching Jenkins in the CI, let me see if I can find out the cause

@Rozzii
Member

Rozzii commented Jan 19, 2023

/test-ubuntu-integration-main

@Rozzii
Member

Rozzii commented Jan 19, 2023

/retest

@Rozzii
Member

Rozzii commented Jan 19, 2023

@jessehu build 32 doesn't exist anymore, which is why you get the 404 error; the oldest build log is build 65 at the moment.
@furkatgofurov7 the bigger issue is that I can't retrigger /test-ubuntu-integration-main for this PR.

@furkatgofurov7
Member

@jessehu build 32 doesn't exist anymore, which is why you get the 404 error; the oldest build log is build 65 at the moment. @furkatgofurov7 the bigger issue is that I can't retrigger /test-ubuntu-integration-main for this PR.

I believe the Jenkins workers are back up now.

/test-ubuntu-integration-main

@andrew-est

/test-ubuntu-integration-main

2 similar comments
@andrew-est

/test-ubuntu-integration-main

@andrew-est

/test-ubuntu-integration-main

@furkatgofurov7
Member

@haijianyang can you please rebase the PR?

@furkatgofurov7 furkatgofurov7 removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2023
@furkatgofurov7
Member

/test-ubuntu-integration-main

1 similar comment
@furkatgofurov7
Member

/test-ubuntu-integration-main

@mboukhalfa
Member

/test-ubuntu-integration-main

@haijianyang
Author

haijianyang commented Feb 1, 2023

logs-jenkins-metal3_ipam_main_integration_test_ubuntu-79/k8s_management_cluster/kube-system/kube-apiserver-kind-control-plane/kube-apiserver/stdout.log

W0201 07:16:11.110250       1 dispatcher.go:195] rejected by webhook "validation.ippool.ipam.metal3.io": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"admission webhook \"validation.ippool.ipam.metal3.io\" denied the request: IPPool.ipam.metal3.io \"provisioning-pool\" is forbidden: IPPool cannot be deleted because it is in use", Reason:"Forbidden", Details:(*v1.StatusDetails)(0xc0165419e0), Code:403}}
W0201 07:16:11.798602       1 dispatcher.go:195] rejected by webhook "validation.ippool.ipam.metal3.io": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"admission webhook \"validation.ippool.ipam.metal3.io\" denied the request: IPPool.ipam.metal3.io \"provisioning-pool\" is forbidden: IPPool cannot be deleted because it is in use", Reason:"Forbidden", Details:(*v1.StatusDetails)(0xc016c382a0), Code:403}}
W0201 07:16:12.725203       1 dispatcher.go:195] rejected by webhook "validation.ippool.ipam.metal3.io": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"admission webhook \"validation.ippool.ipam.metal3.io\" denied the request: IPPool.ipam.metal3.io \"provisioning-pool\" is forbidden: IPPool cannot be deleted because it is in use", Reason:"Forbidden", Details:(*v1.StatusDetails)(0xc0163e7680), Code:403}}

https://jenkins.nordix.org/job/metal3_ipam_main_integration_test_ubuntu/79/consoleFull

Kind=IPPool\\\" metal3/provisioning-pool: admission webhook \\\"validation.ippool.ipam.metal3.io\\\" denied the request: IPPool.ipam.metal3.io \\\"provisioning-pool\\\" is forbidden: IPPool cannot be deleted because it is in use\"\nDeleting IPPool=\"provisioning-pool\" Namespace=\"metal3\"\nRetrying with backoff Cause=\"error deleting \\\"ipam.metal3.io/v1alpha1

@mboukhalfa
Member

logs-jenkins-metal3_ipam_main_integration_test_ubuntu-79/k8s_management_cluster/kube-system/kube-apiserver-kind-control-plane/kube-apiserver/stdout.log

W0201 07:16:11.110250       1 dispatcher.go:195] rejected by webhook "validation.ippool.ipam.metal3.io": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"admission webhook \"validation.ippool.ipam.metal3.io\" denied the request: IPPool.ipam.metal3.io \"provisioning-pool\" is forbidden: IPPool cannot be deleted because it is in use", Reason:"Forbidden", Details:(*v1.StatusDetails)(0xc0165419e0), Code:403}}
W0201 07:16:11.798602       1 dispatcher.go:195] rejected by webhook "validation.ippool.ipam.metal3.io": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"admission webhook \"validation.ippool.ipam.metal3.io\" denied the request: IPPool.ipam.metal3.io \"provisioning-pool\" is forbidden: IPPool cannot be deleted because it is in use", Reason:"Forbidden", Details:(*v1.StatusDetails)(0xc016c382a0), Code:403}}
W0201 07:16:12.725203       1 dispatcher.go:195] rejected by webhook "validation.ippool.ipam.metal3.io": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"admission webhook \"validation.ippool.ipam.metal3.io\" denied the request: IPPool.ipam.metal3.io \"provisioning-pool\" is forbidden: IPPool cannot be deleted because it is in use", Reason:"Forbidden", Details:(*v1.StatusDetails)(0xc0163e7680), Code:403}}

It seems to be failing at the move step: https://github.com/metal3-io/metal3-dev-env/blob/df300e37653cc29e94c20dd06debd014b6a1854c/tests/roles/run_tests/tasks/move.yml#L171

@haijianyang
Author

haijianyang commented Feb 1, 2023

clusterctl move tries to delete IPPools whose IP resources have not been released yet.
Can we release the IP resources before clusterctl move?

@furkatgofurov7
Member

furkatgofurov7 commented Feb 1, 2023

clusterctl move tries to delete IPPools whose IP resources have not been released yet. Can we release the IP resources before clusterctl move?

cc @kashifest

@haijianyang the move operation is a special case, and we check that pivoting is successful in every CI run. Technically, when a move is performed, all provider-specific objects are deleted from the source cluster and recreated in the target cluster, so previously we were always able to delete the object (there was no webhook validation check). Now that the validation is in place, the move can no longer be performed properly, hence the failure.

Two options I could think of:

  1. using just finalizers instead of the webhook (a rough sketch follows below)
  2. adding custom logic to the validation webhook that checks whether a move operation is being performed and allows the deletion in that case.
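
A rough, purely illustrative sketch of what option 1 could look like, assuming a standard controller-runtime reconciler; the names (IPPoolReconciler.reconcileDelete, ipPoolProtectionFinalizer) and the use of Status.Allocations as the in-use signal are assumptions, not the actual ip-address-manager code:

// Hypothetical finalizer-based protection (option 1), illustrative only.
package controllers

import (
	"context"
	"time"

	ipamv1 "github.com/metal3-io/ip-address-manager/api/v1alpha1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Assumed finalizer name, for illustration only.
const ipPoolProtectionFinalizer = "ipam.metal3.io/ippool-protection"

type IPPoolReconciler struct {
	client.Client
}

func (r *IPPoolReconciler) reconcileDelete(ctx context.Context, pool *ipamv1.IPPool) (ctrl.Result, error) {
	// While any address from the pool is still allocated, keep the finalizer
	// so the IPPool stays in Terminating instead of being removed.
	if len(pool.Status.Allocations) > 0 {
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}
	// Nothing is allocated anymore: drop the finalizer and let deletion proceed.
	controllerutil.RemoveFinalizer(pool, ipPoolProtectionFinalizer)
	return ctrl.Result{}, r.Update(ctx, pool)
}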

@mboukhalfa
Member

mboukhalfa commented Feb 1, 2023

Before deletion during clusterctl move, resources get paused using the cluster.x-k8s.io/paused annotation. You could check for that annotation in the webhook and simply skip the restriction in that case.
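
A hedged sketch of that suggestion, extending the in-use check; the annotation key is the one named in the comment above, while the webhook shape and the Status.Allocations field are assumptions rather than the actual change in this PR:

// Illustrative only: skip the in-use check when the IPPool carries the
// cluster.x-k8s.io/paused annotation, which (per the suggestion above)
// clusterctl move applies before deleting objects from the source cluster.
package v1alpha1

import (
	"errors"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

func (c *IPPool) ValidateDelete() error {
	if _, paused := c.GetAnnotations()["cluster.x-k8s.io/paused"]; paused {
		// Paused (e.g. during clusterctl move): allow the deletion to go through.
		return nil
	}
	if len(c.Status.Allocations) > 0 {
		return apierrors.NewForbidden(
			schema.GroupResource{Group: "ipam.metal3.io", Resource: "IPPool"},
			c.Name,
			errors.New("IPPool cannot be deleted because it is in use"),
		)
	}
	return nil
}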

@haijianyang
Author

clusterctl move sets Cluster.Spec.Paused = true, but a paused cluster is not necessarily the result of clusterctl move.
If the user pauses the cluster manually, should the IPPool then be deletable without releasing its IP resources?

Would it be possible to add a field (spec, annotation, or label) to IPPool that describes whether the IPPool may be deleted without releasing IP resources?

For example:
By default, deleting an IPPool whose IP resources have not been released is not allowed.
When allow-force-delete: true is set, an IPPool can be deleted even if its IP resources have not been released.

apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: ip-pool-cluster
  namespace: default
  annotations:
    ipclaim.ipam.metal3.io/allow-force-delete: 'true'
spec:
  pools:
    - start: 10.255.160.1
      end: 10.255.160.1
  prefix: 16
  gateway: 10.255.0.1
  namePrefix: "ip-pool-cluster"
---

@furkatgofurov7
Member

/test-centos-e2e-integration-main

@metal3-io-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 5, 2023
@metal3-io-bot
Contributor

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

@metal3-io-bot
Contributor

@metal3-io-bot: Closed this PR.

In response to this:

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot
Contributor

Can one of the admins verify this patch?

1 similar comment
@metal3-io-bot
Contributor

Can one of the admins verify this patch?
