Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More validation for "istioctl install" before beginning #20730

Closed
esnible opened this issue Jan 31, 2020 · 23 comments
Closed

More validation for "istioctl install" before beginning #20730

esnible opened this issue Jan 31, 2020 · 23 comments
Labels
area/environments/operator Issues related to Operator or installation area/user experience kind/enhancement lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed

Comments

@esnible
Copy link
Contributor

esnible commented Jan 31, 2020

If I do ~/go/src/istio.io/istio/out/darwin_amd64/istioctl manifest apply --set values.global.sds.enabled=false --set values.global.imagePullPolicy=Occassionally the install proceeds but eventually fails with

The Deployment "istio-ingressgateway" is invalid: spec.template.spec.containers[0].imagePullPolicy: Unsupported value: "Occassionally": supported values: "Always", "IfNotPresent", "Never" (repeated 1 times)

(etc)

Perhaps the operator should validate Kubernetes settings. Perhaps use kubectl --dry-run on everything it intents to install before starting the real install?

@howardjohn
Copy link
Member

Perhaps use kubectl --dry-run on everything it intents to install before starting the real install?

I suggest against that, its going to be super slow. We can do static validation inside the operator codebase if feasible, but doubling the calls to api server isn't great. Install is already pretty slow

@shamsher31
Copy link
Member

We can do static validation inside the operator codebase if feasible

@esnible I have started looking into static validation for manifest apply. I will put my thoughts here by tomorrow.

@esnible
Copy link
Contributor Author

esnible commented Feb 14, 2020

Yesterday I found istioctl manifest generate --set installPackagePath=/tmp which is an example of an invalid field that is detected very differently from the problems that aren't detected until generated config is sent to Kubernetes.

@shamsher31
Copy link
Member

Here are my thoughts on how static validation will work.

I want to keep this as simple as possible to identify if the provided values in --set is available in valid values.

"imagePullPolicy": []string{"Always", "IfNotPresent", "Never"}

Here imagePullPolicy will have possible values "Always", "IfNotPresent", "Never". Anything else provided in imagePullPolicy will through invalid error and stop further execution.

I think we should perform validation as early as possible, probably just before yamlFromSetFlags function.

Keeping all the possible combinations in a simple string map.

        setValues := map[string]interface{}{
		"sds": map[string][]bool{
			"enabled":  []bool{true, false},
		},
		"imagePullPolicy": []string{"Always", "IfNotPresent", "Never"},
		"k8sIngress": map[string]interface{}{
			"enabled": []bool{true, false},
			"enableHttps": []bool{true, false},
			"gatewayName": []string{"ingressgateway"},
		},
		"mtls": map[string]interface{}{
			"auto": []bool{true, false},
			"enabled": []bool{true, false}, 
		},
		"controlPlaneSecurityEnabled": []bool{true, false},
		"telemetry": map[string]interface{}{
			"enabled": []bool{true, false},
		},
		"security": map[string]interface{}{
			"components": map[string]interface{}{
				"nodeAgent": map[string]interface{}{
					"enabled": []bool{true, false},
				},
			},
		},
	}
	
	fmt.Printf("%+v", setValues["k8sIngress"].(map[string]interface{})["enableHttps"])

By using map lookup to identify if the values are present in the map, thorws error if it is not present and stop execution to avoid K8 REST client calls.

@howardjohn @esnible Let me know what do you think.

@howardjohn
Copy link
Member

I think the bool ones we already have validation for? but other ones it makes sense to me

@esnible
Copy link
Contributor Author

esnible commented Feb 20, 2020

I did some quick-and-dirty performance measurements, passing the output of manifest generate to kubectl --dry-run and kubectl --server-dry-run. I was surprised that having the server validate added 30 seconds of realtime. Having kubectl validate added only 6 seconds, but it didn't catch my silly imagePullPolicy example.

Checking for valid values in fields such as imagePullPolicy isn't particularly helpful because these errors will be caught and helpful messages will expose them. I am more concerned about the example in #21048 about conflicting telemetry settings.

validate.CheckIstioOperatorSpec() should be extended to check for conflicting settings.

Checking should possibly receive as an additional parameter the overlay yaml, because if a user has explicitly specified an override it may need additional checking. (For example, if the user overrides the logging format, but logging isn't turned on, might deserve a warning).

@shamsher31
Copy link
Member

shamsher31 commented Feb 21, 2020

I was surprised that having the server validate added 30 seconds of realtime. Having kubectl validate added only 6 seconds

Interesting, I did try running static validation with initial draft version and it throws error almost instantly within 10ms if I pass invalid values.

but it didn't catch my silly imagePullPolicy example

I'm a big fan of catching bug as early as possible, That is when static validation will come into rescue.

Checking for valid values in fields such as imagePullPolicy isn't particularly helpful because these errors will be caught and helpful messages will expose them.

I agree we can only provide simple messages for invalid values. More validation with conflicting settings need to be handled separately with existing validations.

@esnible
Copy link
Contributor Author

esnible commented Feb 28, 2020

Related: #21473 and its PR #21644

@esnible
Copy link
Contributor Author

esnible commented Feb 28, 2020

Related: #21635

@esnible esnible mentioned this issue Mar 2, 2020
5 tasks
@ostromart
Copy link
Contributor

Adding my 2c to all this...
To validate k8s fields like imagePullPolicy, we shouldn't be mirroring k8s logic. There are just too many validations for us to be exactly 1:1 - at best we could cover a subset of validations at the cost of creating a significant maintenance burden going forward.
However, doing the validation before applying would be very useful. We could accomplish this in different ways:

  1. call kubectl --dry run before applying
  2. call kubectl --server-dry-run before applying
  3. integrate some of k8s library validations into our code.
    Option 3 would be nice if there were a good entry point into these libraries, but I have a feeling that these are structured to be used internally. It would be interesting to investigate this path. At the very least we can try to unmarshal into the k8s generated Go structs rather than our mirrored ones using the k8s unmarshaler, which would call at least some of their validations.

@esnible
Copy link
Contributor Author

esnible commented Mar 3, 2020

Question. https://preliminary.istio.io/docs/tasks/observability/distributed-tracing/overview/ says values.pilot.traceSampling valid values "are from 0.0 to 100.0". This is not checked -- the install just works. Should it be checked?

Does Pilot have an internal code API for the operator to ask about values.pilot.* settings?

@shamsher31
Copy link
Member

values.pilot.traceSampling valid values "are from 0.0 to 100.0". This is not checked -- the install just works. Should it be checked?

Make sense. We should verify that the user is not trying to set beyond 100.

@shamsher31
Copy link
Member

Hey @esnible ,
I'm wondering what is the flag name for inboundClusterStatName and outboundClusterStatName for MeshConfig because I'm unable to find any example to set these values in manifest apply.

Following is incorrect, inboundClusterStatName is not available under v1alpha1.GlobalConfig because it's a part of MeshConfig

$ istioctl manifest apply
   --set values.global.inboundClusterStatName="%SERVICE_FQDN%_%SERVICE_PORT%" 

Also as per the doc, I assume, following are the possible values for both.

[]string{"SERVICE", "SERVICE_FQDN", "SERVICE_PORT", "SERVICE_PORT_NAME", "SUBSET_NAME"}

Let me know if I'm missing something.

@shamsher31
Copy link
Member

@esnible @howardjohn Is there any example available for using Locality based load balancer settings using istioctl manifest apply --set command.

https://preliminary.istio.io/docs/reference/config/networking/destination-rule.html#LocalityLoadBalancerSetting

I tried but couldn't find any example in istio and istio.io repo.

There is validation is written for LocalityLoadBalancerSetting and I want to utilize this in static validation for manifest apply command.

@esnible
Copy link
Contributor Author

esnible commented Apr 3, 2020

@shamsher31 I see many values in MeshConfig that don't seem to be used during install. Checking.

@ostromart MANY of the fields in https://preliminary.istio.io/docs/reference/config/istio.mesh.v1alpha1.html#MeshConfig don't seem to make it into Istio MeshConfig. I searched for ALL of the fields in manifests/istio-control/istio-discovery/templates/configmap.yaml and did spot checks elsewhere in the code.

The following documented MeshConfig fields seem unknown to the operator: proxyListenPort, proxyHttpPort, defaultConfig.statsdUdpAddress, defaultConfig.customConfigFile,
defaultConfig.statNameLength, defaultConfig.proxyBootstrapTemplatePath, defaultConfig.interceptionMode,
defaultConfig.sds.*, defaultConfig.statusPort, defaultConfig.zipkinAddress, enableClientSidePolicyCheck,
configSources, defaultServiceExportTo, defaultVirtualServiceExportTo, defaultDestinationRuleExportTo,
rootNamespace, dnsRefreshRate, disableReportBatch, h2UpgradePolicy, inboundClusterStatName,
outboundClusterStatName, thriftConfig

If these fields are deprecated, please get them marked as DEPRECATED in the docs, similar to how defaultConfig.zipkinAddress is marked.

The following fields are present but hard-coded and thus are settable via IstioOperator: connectTimeout AKA defaultConfig.connectTimeout, defaultConfig.configPath, defaultConfig.binaryPath,
defaultConfig.serviceCluster, defaultConfig.drainDuration, defaultConfig.parentShutdownDuration,
defaultConfig.proxyAdminPort, defaultConfig.controlPlaneAuthPolicy

If these should not be settable perhaps the doc should flag them read-only?

@esnible
Copy link
Contributor Author

esnible commented Apr 3, 2020

@shamsher31 I also did not find any examples but I was able to get it working. Tested under prerelease 1.6; it might work in 1.5.1, probably won't in 1.5.0.

istioctl manifest generate \
   --set values.global.localityLbSetting.failover[0].from=us-east \
   --set values.global.localityLbSetting.failover[0].to=eu-west

and

istioctl manifest generate \
   --set values.global.localityLbSetting.distribute[0].from="us-central/*" \
   --set values.global.localityLbSetting.distribute[0].to."us-central1/*"=80 \
   --set values.global.localityLbSetting.distribute[0].to."us-central2/*"=20

@howardjohn
Copy link
Member

@esnible use --set meshConfig.FOO to set arbitrary values. I have deprecated all the special values that attempt to map to mesh config: #22626. The meshconfig values you have listed are NOT deprecated, we just had no way to configure them via the helm api. Now we just pass everything through so you can configure anything.

You may need to do --set values.meshConfig.FOO though for now due to another bug

@esnible
Copy link
Contributor Author

esnible commented Apr 3, 2020

@shamsher31 I checked and @howardjohn is correct.

Using pre-1.6.x, I was able to do istioctl manifest generate --set meshConfig.inboundClusterStatName=alice and also --set values.meshConfig.inboundClusterStatName=alice worked the same.

As John points out, due to another bug not all of the fields work without values..

@howardjohn
Copy link
Member

howardjohn commented Apr 3, 2020 via email

@shamsher31
Copy link
Member

@esnible @howardjohn I have opened PR to get initial feedback. It will be difficult to cover validation for each and every field at this moment. We can always iterate over and cover specific flags in more detail in subsequent PR's. Let me know what you think.

@howardjohn howardjohn added area/environments/operator Issues related to Operator or installation and removed area/environments/installer labels Jul 31, 2020
@howardjohn howardjohn added this to P2 in Prioritization Sep 4, 2020
@esnible esnible changed the title More validation for "istioctl manifest apply" before beginning More validation for "istioctl install" before beginning Oct 1, 2020
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Oct 5, 2020
@esnible esnible added the lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed label Oct 5, 2020
@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Oct 5, 2020
@esnible
Copy link
Contributor Author

esnible commented Oct 5, 2020

Today I tried to install using Master with

istioctl install \
  --revision Canary \
  --manifests manifests/ \
  --set hub=gcr.io/istio-testing \
  --set tag=latest \
   && say installed || say failed

It failed (and got stuck) with

failed to create "Service/istio-system/istiod-Canary": Service "istiod-Canary" is invalid: metadata.name: Invalid value: "istiod-Canary": a DNS-1035 label must consist of lower case alphanumeric characters or '-', start with an alphabetic character, and end with an alphanumeric character (e.g. 'my-name',  or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?')
- Processing resources for Ingress gateways. Waiting for Deployment/istio-system/istio-ingressgateway                             

@esnible
Copy link
Contributor Author

esnible commented Mar 23, 2021

Another validation needed: --imagePullSecrets can be initialised with an empty string

@howardjohn
Copy link
Member

I think largely we have validation. Its an ever-present issue to add more and better validation, but no need to track this at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/environments/operator Issues related to Operator or installation area/user experience kind/enhancement lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants