Actively monitor nodes status for DaemonSet deployments #881
Conversation
0be5d4c to 7b6ba59
I too prefer this over #878 -- Nice work finding a solution that makes Krane success detection more robust instead of less!
def find_nodes(cache)
  all_nodes = cache.get_all(Node.kind)
  all_nodes = all_nodes.select { |node_data| node_data.dig('spec', 'unschedulable').to_s.downcase != 'true' }
We're looping over the same data three times here. Can we do it just once instead? My ruby might be rusty, but something like:
def find_nodes(cache)
  all_nodes = cache.get_all(Node.kind)
  all_nodes.each_with_object([]) do |node_data, relevant_nodes|
    cond = node_data.dig('status', 'conditions').find { |c| c['type'].downcase == 'ready' }
    # If no Ready condition is reported, treat the node as ready
    ready = cond.nil? ? true : cond['status'].downcase == 'true'
    schedulable = node_data.dig('spec', 'unschedulable').to_s.downcase != 'true'
    if schedulable && ready
      relevant_nodes << Node.new(definition: node_data)
    end
  end
end
Your ruby is less rusty than mine, apparently.
We definitely can.
ds = build_synced_ds(ds_template: ds_template, pod_templates: pod_templates, node_templates: node_templates)
refute_predicate(ds, :deploy_succeeded?)

node_added_status = {
This is the same as above and we're not adding a node in this section. Unneeded/misnamed var?
That's expected: we have the same Pod status, but now one of the nodes has become unschedulable, so it should not block anymore.
node_templates[2]['spec']['unschedulable'] = 'true'
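To spell out the test step being described, a minimal sketch using the helpers quoted in this diff (build_synced_ds, deploy_succeeded?); the final assert_predicate call is my assumption of how the earlier refute_predicate check flips, not a quote from the PR:

# Sketch only: mark one node unschedulable, rebuild the synced DaemonSet, and
# expect the rollout to count as succeeded even though that node's pod is not ready.
node_templates[2]['spec']['unschedulable'] = 'true'
ds = build_synced_ds(ds_template: ds_template, pod_templates: pod_templates, node_templates: node_templates)
assert_predicate(ds, :deploy_succeeded?)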
"numberReady": 2, | ||
} | ||
ds_template = build_ds_template(filename: 'daemon_set.yml', status: node_added_status) | ||
pod_templates = load_fixtures(filenames: ['daemon_set_pods.yml']) |
Similarly, do we need to reload these for some reason, or could we just use the ones above?
"numberReady": 2, | ||
} | ||
ds_template = build_ds_template(filename: 'daemon_set.yml', status: node_added_status) | ||
pod_templates = load_fixtures(filenames: ['daemon_set_pods.yml']) |
Same question as above about these three variables/loads.
@KnVerey thanks for your feedback, I've updated the PR accordingly. I'm still tracking down the failing test; not sure yet if it's related or not.
LGTM. Test failure looks like an unrelated flake.
I tested it locally with a
This PR is a completion of the original implementation in #580.
In the original implementation, the list of nodes is fetched when the deployment starts and is then fixed. If one of the nodes goes away or becomes unschedulable while DaemonSets are rolling out, the deploy keeps waiting indefinitely.
This PR aims to make the node list dynamic.
It also now filters out unschedulable / not-ready nodes.
This should replace the required-rollout PR #878.
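As a rough sketch of what "dynamic" means here, based on the find_nodes snippet quoted in the review; the sync method shape below is an assumption about where find_nodes gets called, not the PR's exact code:

# Sketch only: re-read the node list from the sync cache on every poll, so nodes
# that disappear or become unschedulable/not ready drop out of the success check.
def sync(cache)
  super
  # find_nodes filters unschedulable and not-ready nodes, as shown above
  @nodes = find_nodes(cache)
end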