Better load balancing of Envoys across Pilot instances #11181

Open
elevran opened this issue Jan 23, 2019 · 22 comments
Labels
area/networking area/perf and scalability kind/enhancement lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed

Comments

@elevran
Contributor

elevran commented Jan 23, 2019

Describe the feature request

This is a continuation of #7878.
Envoys maintain long lived connections to Pilot. In HA scenarios, instances are ready at different times and thus earlier instances receive a disproportionate number of connections. This imbalance is exacerbated during rolling upgrades.
The request is to create a more balanced split of connections between Envoys and Pilots, one that has some intelligence to balance the load among all Pilot replicas.

Describe alternatives you've considered

#10838, #10870 and #11126 provide a short-term fix by capping the maximum connection age, allowing load to rebalance over time. This solution is not ideal:

  • when maximum age is set high, the system responds slowly to imbalance and does not actually respond to changing load in an online manner.
  • when maximum age is set low, it introduces a lot of churn.
  • connections are disconnected even when load is correctly balanced.

Additional context

grpc-lb has been suggested as an alternative solution. Client-side LB distributes logic to all Envoys. In addition, it may not solve this problem, since it relies on name resolution, which would replicate the imbalance as instances come and go (see discussion here).
It may be preferable if we could encapsulate the implementation server-side, entirely in Pilot.
This comment on the original issue provides some additional context.

@elevran
Contributor Author

elevran commented Feb 28, 2019

The solution needs to consider:

  • minimize global coordination needed and still achieve fairness
  • rely only on the little "global coordination" available out of the box (e.g., minimal communication, limited global state, ...)
  • we have no control over client selection of servers (randomized by k8s and service VIP)
  • expect pilots to come and go causing imbalance (e.g., during startup and upgrade, having small number of pilots)

Following Slack conversation with @Stono, we suggest the following.
For each Pilot:

  • keep track of number of connected envoys (e.g., in grpc middleware or from fronting envoy)
  • keep track of total number of pilots (count and IPs are in the service endpoints)
  • get count of envoys connected to other pilots (e.g., existing envoy stat or new pilot API)
  • determine "fair share" (e.g., total envoys / total pilots * allowed imbalance)
  • if over "fair share", randomly drop some of the connections until reaching fair share (can be throttled)
  • reject new connections while over fair share (some dropped connections would reconnect to the pilot)

The above can be run periodically or on event (change in pilot count).

Feedback welcomed!
CC people on original issue: @costin @rshriram @duderino @Stono @mandarjog @linsun @ja30278 @louiscryan @morvencao

@morvencao
Member

morvencao commented Mar 4, 2019

@elevran
I have tested the keepaliveMaxServerConnectionAge parameter for pilot with Istio 1.1.0.RC2, and it looks like it doesn't work well.
The test scenario is:

  1. Deploy pilot (with keepaliveMaxServerConnectionAge set to 30s) with 1 instance
  2. Deploy 80 pods with sidecars
  3. All sidecars connect to the pilot instance
  4. Then scale the pilot to 2 instances

After a few minutes, all sidecars are still connected to the old pilot instance.

[root@master istio-1.1.0-rc.2]# kubectl -n istio-system get pod | grep pilot
istio-pilot-8f9b7bf96-7p424                   2/2       Running     0          24m
istio-pilot-8f9b7bf96-8gpdn                   2/2       Running     0          51m
[root@master istio-1.1.0-rc.2]# ./bin/istioctl ps
NAME                                                   CDS        LDS        EDS               RDS          PILOT                           VERSION
details-v1-5fbc6bc87c-5lzb2.default                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
istio-egressgateway-769d84965b-vznzw.istio-system      SYNCED     SYNCED     SYNCED (100%)     NOT SENT     istio-pilot-8f9b7bf96-8gpdn     1.1.0
istio-ingressgateway-757b68b569-hv8gl.istio-system     SYNCED     SYNCED     SYNCED (100%)     SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-0-7f97b55584-chn6s.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-1-5d898645d5-vjv5h.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-10-98c578d6c-zdmrn.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-11-7cd867684d-thrl9.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-12-86b7f54977-vcqpq.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-13-649545d66b-rvkd9.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-14-5b8cf9cc5f-djcg7.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-15-586d9bdc4d-7v5hc.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-16-68bff68d5c-xwz7h.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-17-7dc9c687bb-pd5mw.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-18-5c4fbb845f-v92gk.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-19-576fd7fd68-stsk6.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-2-6d8d667774-54fl6.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-20-c966759dd-bst6q.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-21-845f7fbb67-x5j76.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-22-cd7bc8f66-5djft.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-23-67b575c857-ntnzb.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-24-59945bb9cb-tbfxh.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-25-8b7d4f8c-48qnw.istio-apps                     SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-26-5dd5687458-wmrm5.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-27-79696f8c57-ktc8d.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-28-5b68c879d4-tsjfz.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-29-d945447f5-9cw4g.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-3-58846c4cdc-rntxm.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-30-5759c44689-s9rq5.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-31-cff88c88b-vftns.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-32-bc7f94f99-zptz5.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-33-7bbdcf54cb-h6x86.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-34-6d985b9dc7-fn84h.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-35-69747f77f8-f66kd.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-36-567846799f-lw6s5.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-37-68b5c9cb76-v4d2s.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-38-78c7785cdf-c8bfm.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-39-6dbf6fff5d-b6255.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-4-5f99f95748-pnxz4.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-40-76c6cdcd74-962jp.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-41-69b5588c5c-xlcz2.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-42-7ccb6d6ff6-6sgql.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-43-64996dfd8c-gz28f.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-44-579848cfcd-xrg6h.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-45-7ddd4666fc-qzxvw.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-46-67ccf7946d-f49kr.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-47-d6f4fc8c9-pz5w2.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-48-86b9b6d4cd-w29pg.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-49-599fb887df-mk686.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-5-84c5564b77-g5rbp.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-50-844fcd57ff-mq79w.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-51-55644d7dd-fl29b.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-52-6bbd444d5-qkhl2.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-53-589d998946-rtkvp.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-54-bff448b47-ptn7m.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-55-6b6c45bb44-9xzv5.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-56-5dc65c9c77-c9dph.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-57-69d865c4-7qwm8.istio-apps                     SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-58-68687bc4dc-bkpxz.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-59-58949cdd8-gdrd9.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-6-85688f5f9d-lz2lw.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-60-766c6b4484-8bhrc.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-61-86bd4db7f9-b8bdt.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-62-7d464978c4-9jh5j.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-63-5854579b69-x7xvs.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-64-6d97f7b4cb-zlnbs.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-65-67d6b59b99-4sjld.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-66-78ccbb749d-kw47t.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-67-756975c8f7-wzr9s.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-68-746f77c9bd-9h2v2.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-69-669764b9f-tvgsv.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-7-5bf57b7c95-rf769.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-70-68c8b54bc7-vb7hz.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-71-7d6c57fdc8-pnhg4.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-72-85965f7667-4r9s7.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-73-ff8879f88-tbfkr.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-74-655fc6dffc-7xp4z.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-75-6655554c9d-cxv8b.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-76-6fdbc5f6f5-fzfc8.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-77-6c84c997dd-krhbr.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-78-5cb67df669-slwl5.istio-apps                   SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-79-954bc8bd6-sn5ms.istio-apps                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-8-5c6968bd8-7g88r.istio-apps                     SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
nginx-9-c8db4c77-f2l4r.istio-apps                      SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
productpage-v1-7656c67555-f7vj2.default                SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
ratings-v1-54d9644bfc-zlrsj.default                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
reviews-v1-6d78f9fc98-2pk27.default                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
reviews-v2-79f48686f-vhz2r.default                     SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0
reviews-v3-74bfd9bdd8-44bhf.default                    SYNCED     SYNCED     SYNCED (50%)      SYNCED       istio-pilot-8f9b7bf96-8gpdn     1.1.0

Am I missing anything? Or is there a recommended value for the parameter?

@duderino duderino modified the milestones: 1.2, 1.1 Mar 21, 2019
@Stono
Contributor

Stono commented Mar 22, 2019

Seeing as this is enabled by default in 1.1.0 at 30 minutes, I'd really like to understand:

  1. Is it even working?
  2. Is it safe to use?

I'm half inclined to edit the injector config and remove it because the potential increased churn worries me.

Also, can anyone confirm if keepaliveMaxServerConnectionAge has some jitter? Otherwise we'll get all pilots reconnecting 30 minutes after a rolling deployment, which is pretty bad.

@hzxuzhonghu
Member

can anyone confirm if keepaliveMaxServerConnectionAge has some jitter?

I think it does, but not much: at most MaxServerConnectionAgeGrace, which defaults to 10s. IMO, this option should be disabled by default.

@rmichela

grpc-lb has been suggested as an alternative solution.

grpc-lb is now considered deprecated.

@elevran
Contributor Author

elevran commented Mar 26, 2019

Am I missing anything? Or is there a recommended value for the parameter?

@morvencao what was the maximum age configured? Rebalance won't happen before the expiration of the maximum age. The default (if unspecified) is infinity.

Also, can anyone confirm if keepaliveMaxServerConnectionAge has some jitter? Otherwise we'll get all pilots reconnecting 30 minutes after a rolling deployment, which is pretty bad.

@Stono according to grpc-go/keepalive.go there is a +/- 10% jitter on the configured value to avoid connection storms. So a 30 min maximum age will spread reconnects over 6 minutes.

	// The current default value is infinity.
	// MaxConnectionAge is a duration for the maximum amount of time a
	// connection may exist before it will be closed by sending a GoAway. A
	// random jitter of +/-10% will be added to MaxConnectionAge to spread out
	// connection storms.

@morvencao
Member

@elevran keepaliveMaxServerConnectionAge set to 30s

@hzxuzhonghu
Member

Is affinity set for istio-pilot service?

@hzxuzhonghu
Member

sessionAffinity

@morvencao
Member

@hzxuzhonghu No sessionAffinity set for pilot.

@stale

stale bot commented Jun 24, 2019

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 24, 2019
@hzxuzhonghu
Member

I've tested with 3 pilots and tens of pods in total, and keepaliveMaxServerConnectionAge works for me.

@stale stale bot removed the stale label Jun 25, 2019
@howardjohn
Member

Another related problem: if you are on the border of needing 1 or 2 pilots, you get this really bad behavior where we keep flipping between 1 and 2 replicas and the pilot takes 30 minutes to fully shed the load.

https://snapshot.raintank.io/dashboard/snapshot/SceOCrNpdOr4qmTUk1UHF20xMiNqGk6K?panelId=4&fullscreen&orgId=2

@howardjohn howardjohn modified the milestones: 1.1, Nebulous Future Aug 5, 2019
@howardjohn howardjohn self-assigned this Oct 6, 2019
@geeknoid geeknoid added the lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed label Oct 28, 2019
@howardjohn
Member

This is pretty broken now, even the max connection age.

  1. If on 15010, max conn age seems to work
  2. If on 15011 in 1.4 and earlier, max conn age is broken. The pilot sidecar -> pilot connection is closed, but the envoy -> pilot sidecar connection remains, so no balancing occurs
  3. If on 15011 in 1.5, or 15012, max conn age is broken. We use gRPC ServeHTTP which seems to ignore this setting

Setting max_requests_per_connection on the envoy (client) seems to fix this in case (2)

@hzxuzhonghu
Member

One question: what does one request mean? Is an xDS request one request? Will the connection always break when any new xDS request comes in?

@howardjohn
Member

A request is one gRPC stream; it's an HTTP-level setting, not XDS.

@hzxuzhonghu
Member

Got it.

istio-testing pushed a commit that referenced this issue Feb 14, 2020
* Fix load balancing of pilot connections

Context:
#11181 (comment)

* fix repetitive code

* Update goldens
istio-testing pushed a commit to istio-testing/istio that referenced this issue Feb 14, 2020
istio-testing added a commit that referenced this issue Feb 15, 2020
* Fix load balancing of pilot connections

Context:
#11181 (comment)

* fix repetitive code

* Update goldens

Co-authored-by: John Howard <[email protected]>
sdake pushed a commit to sdake/istio that referenced this issue Feb 21, 2020
* Fix load balancing of pilot connections

Context:
istio#11181 (comment)

* fix repetitive code

* Update goldens
@hzxuzhonghu
Member

Just to sync up: it seems to work for me:

@hzxuzhonghu
Member

istioctl ps
NAME                                                   CDS        LDS        EDS        RDS          PILOT                       VERSION
istio-egressgateway-c968fc7d6-sz844.istio-system       SYNCED     SYNCED     SYNCED     NOT SENT     istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
istio-ingressgateway-5f77578484-scp7n.istio-system     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
sleep-8f795f47d-2srqg.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
sleep-8f795f47d-6r8lf.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
sleep-8f795f47d-7fms8.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-2df96     1.7.0-alpha.0
sleep-8f795f47d-j9v9b.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-gk28j     1.7.0-alpha.0
sleep-8f795f47d-ksf2j.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
sleep-8f795f47d-nj8sz.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
sleep-8f795f47d-qz8gf.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-gk28j     1.7.0-alpha.0
sleep-8f795f47d-s2tv2.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-2df96     1.7.0-alpha.0
sleep-8f795f47d-sxnbd.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-gk28j     1.7.0-alpha.0
sleep-8f795f47d-wqq46.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-gk28j     1.7.0-alpha.0

And hours later:

istioctl ps
NAME                                                   CDS        LDS        EDS        RDS          PILOT                       VERSION
istio-egressgateway-c968fc7d6-sz844.istio-system       SYNCED     SYNCED     SYNCED     NOT SENT     istiod-774bbfdf5d-2df96     1.7.0-alpha.0
istio-ingressgateway-5f77578484-scp7n.istio-system     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
sleep-8f795f47d-2srqg.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-2df96     1.7.0-alpha.0
sleep-8f795f47d-6r8lf.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-2df96     1.7.0-alpha.0
sleep-8f795f47d-7fms8.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-gk28j     1.7.0-alpha.0
sleep-8f795f47d-j9v9b.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-gk28j     1.7.0-alpha.0
sleep-8f795f47d-ksf2j.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-2df96     1.7.0-alpha.0
sleep-8f795f47d-nj8sz.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-gk28j     1.7.0-alpha.0
sleep-8f795f47d-qz8gf.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
sleep-8f795f47d-s2tv2.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-2df96     1.7.0-alpha.0
sleep-8f795f47d-sxnbd.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-xcpwz     1.7.0-alpha.0
sleep-8f795f47d-wqq46.default                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-774bbfdf5d-2df96     1.7.0-alpha.0

And no restart

k get pod
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-6b474476c4-9vc42   1/1     Running   0          17d
sleep-8f795f47d-2srqg               2/2     Running   0          16h
sleep-8f795f47d-6r8lf               2/2     Running   0          16h
sleep-8f795f47d-7fms8               2/2     Running   0          16h
sleep-8f795f47d-j9v9b               2/2     Running   0          11d
sleep-8f795f47d-ksf2j               2/2     Running   0          16h
sleep-8f795f47d-nj8sz               2/2     Running   0          16h
sleep-8f795f47d-qz8gf               2/2     Running   0          16h
sleep-8f795f47d-s2tv2               2/2     Running   0          16h
sleep-8f795f47d-sxnbd               2/2     Running   0          16h
sleep-8f795f47d-wqq46               2/2     Running   0          16h

@hzxuzhonghu
Member

istioctl version
client version: 1.7.0-alpha.0
control plane version: 1.7.0-alpha.0-37119973c952151e269110170f2fda8c6a34fb5e
data plane version: 1.7.0-alpha.0 (12 proxies)

@srmars

srmars commented Nov 9, 2022

@howardjohn I can see that the default keepaliveMaxServerConnectionAge value is 30m in the istiod deployment. I have a couple of questions about this. Could you please check the points below? Thank you.

  1. Is there a recommended value for production?
  2. Is there any disadvantage to setting this value to 24h?
  3. The Istio documentation shows the default below. Is that correct? Based on the first point, I think it should be 30m, right?

https://istio.io/latest/docs/reference/commands/pilot-discovery/

Maximum duration a connection will be kept open on the server before a graceful close. (default 2562047h47m16.854775807s)

@howardjohn
Member

  1. 30min :-) that is why it's the default...
  2. Yes, load will not balance across istiod instances for >24hrs
  3. That is the default in the binary, the helm/istioctl install overrides it


10 participants