Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All Hops Encrypted: TLS between activator and queue-Proxy #12815

Merged
merged 5 commits into from
Apr 19, 2022

Conversation

nak3
Copy link
Contributor

@nak3 nak3 commented Apr 5, 2022

This patch fixes the following issues:
Fix #12502
Fix #12503

TODO:

@knative-prow knative-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/autoscale area/networking area/test-and-release It flags unit/e2e/conformance/perf test issues for product features labels Apr 5, 2022
@nak3 nak3 force-pushed the add-queue-proxy-tls-main branch from 440ed18 to 4a79434 Compare April 5, 2022 10:33
@nak3 nak3 force-pushed the add-queue-proxy-tls-main branch from a52ab2a to 87f3d49 Compare April 6, 2022 08:54
@nak3 nak3 force-pushed the add-queue-proxy-tls-main branch from 252faa3 to 4cfda88 Compare April 7, 2022 04:52
@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2022
knative-prow bot pushed a commit to knative/pkg that referenced this pull request Apr 11, 2022
…S proxy (#2479)

* Add `NewProxyAutoTLSTransport` and `DialTLSWithBackOff` to support TLS proxy

Part of: knative/serving#12503
PoC: knative/serving#12815

This patch `NewProxyAutoTLSTransport` which is `NewProxyAutoTransport + TLS config.
Current proxy does not support TLS but it needs for knative/serving#12503.

`DialTLSWithBackOff` is also `DialWithBackOff` + TLS config. It needs
`newH2Transport` which handles HTTP2 with TLS.

* Fix lint

* Fix review comments
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 11, 2022
@nak3 nak3 force-pushed the add-queue-proxy-tls-main branch 4 times, most recently from 986115d to 685a942 Compare April 12, 2022 02:18
@codecov
Copy link

codecov bot commented Apr 12, 2022

Codecov Report

Merging #12815 (a377984) into main (ad15a85) will decrease coverage by 0.42%.
The diff coverage is 50.86%.

@@            Coverage Diff             @@
##             main   #12815      +/-   ##
==========================================
- Coverage   87.44%   87.01%   -0.43%     
==========================================
  Files         196      197       +1     
  Lines        9782    14426    +4644     
==========================================
+ Hits         8554    12553    +3999     
- Misses        938     1578     +640     
- Partials      290      295       +5     
Impacted Files Coverage Δ
cmd/activator/main.go 0.00% <0.00%> (ø)
cmd/queue/main.go 0.65% <0.00%> (+0.11%) ⬆️
pkg/reconciler/revision/resources/deploy.go 90.10% <0.00%> (-6.89%) ⬇️
pkg/reconciler/domainmapping/reconciler.go 93.62% <33.33%> (+1.52%) ⬆️
pkg/activator/handler/handler.go 87.67% <45.45%> (-8.08%) ⬇️
pkg/networking/util.go 88.00% <88.00%> (ø)
pkg/reconciler/autoscaling/hpa/hpa.go 91.66% <88.88%> (+0.49%) ⬆️
pkg/http/proxy.go 100.00% <100.00%> (ø)
pkg/reconciler/autoscaling/config/store.go 100.00% <100.00%> (ø)
pkg/reconciler/autoscaling/kpa/kpa.go 95.18% <100.00%> (+0.06%) ⬆️
... and 199 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2de1474...a377984. Read the comment docs.

knative-prow bot pushed a commit to knative/networking that referenced this pull request Apr 12, 2022
* Add certificates keys in config-network

This patch adds the following certificate variables:

- `activator-server-certs`
- `queue-proxy-ca`
- `queue-proxy-san`
- `queue-proxy-server-certs`

It is similar to #608.

knative/serving#12815 and knative/serving#12770
verifeid the change.

* Rename -server-certs with -cert-secret

* Bump checksum

* Fix lint
@nak3 nak3 changed the title [WIP] All Hops Encrypted: TLS between activator and queue-Proxy All Hops Encrypted: TLS between activator and queue-Proxy Apr 12, 2022
@knative-prow knative-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2022
@nak3
Copy link
Contributor Author

nak3 commented Apr 12, 2022

/cc @evankanderson @skonto @rhuss

@evankanderson
Copy link
Member

I'll take a look at this tonight; I'm on my way to sitting break right now.

@knative-prow
Copy link

knative-prow bot commented Apr 13, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nak3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

cmd/activator/main.go Outdated Show resolved Hide resolved
cmd/activator/main.go Outdated Show resolved Hide resolved
@@ -123,7 +123,7 @@ func TestActivationHandler(t *testing.T) {

ctx, cancel, _ := rtesting.SetupFakeContextWithCancel(t)
defer cancel()
handler := New(ctx, test.throttler, rt, false /*usePassthroughLb*/, logging.FromContext(ctx))
handler := New(ctx, test.throttler, rt, false /*usePassthroughLb*/, logging.FromContext(ctx), false /* TLS */)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test where this is true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not possible to add the HTTPs proxy in this test code. TestNewHeaderPruningProxyHTTPS covered the part instead.

@@ -69,7 +69,7 @@ func BenchmarkHandlerChain(b *testing.B) {
})

// Make sure to update this if the activator's main file changes.
ah := New(ctx, fakeThrottler{}, rt, false, logger)
ah := New(ctx, fakeThrottler{}, rt, false, logger, false /* TLS */)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the overhead if tls is true, does it make sense to have run a benchmark with that too?

@skonto
Copy link
Contributor

skonto commented Apr 15, 2022

/hold for @evankanderson

@knative-prow knative-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 15, 2022
@skonto
Copy link
Contributor

skonto commented Apr 15, 2022

Afaik this is also protected from accidental misuse in this phase so let's get it in.

Copy link
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight concern that this PR will break namespaces which don't have the correct certificate and when the ConfigMap is set (mandatory volume mount in the new Deployment of a secret which doesn't exist, and activator only requesting https for those containers), but if the current e2e tests pass, I'm okay with this being a pre-alpha release.

CC @dprotaso just in case this ends up causing problems for the release in 2 business days.

// Enable TLS client when queue-proxy-ca is specified.
// At this moment activator with TLS does not disable HTTP.
// See also https://github.com/knative/serving/issues/12808.
if networkConfig.QueueProxyCA != "" && networkConfig.QueueProxySAN != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires a restart of activator to pick up the new network config.

For pre-alpha, this seems okay, but we should aim to make this configurable without a restart by beta.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: if the secret name wasn't configurable, you could use a volume mount to achieve this addition to the system cert pool.

pool = x509.NewCertPool()
}

if ok := pool.AppendCertsFromPEM(caSecret.Data["ca.crt"]); !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the secret exists but doesn't have the right data, this will fail.

Do we need to undo the base64 encoding here before AppendCertsFromPEM? (I don't recall what the client-go libraries assist with here.)

@@ -186,9 +213,12 @@ func main() {
concurrencyReporter := activatorhandler.NewConcurrencyReporter(ctx, env.PodName, statCh)
go concurrencyReporter.Run(ctx.Done())

// Enable TLS against queue-proxy when the CA and SA are specified.
tlsEnabled := networkConfig.QueueProxyCA != "" && networkConfig.QueueProxySAN != ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeating 160, it looks like. Can we set this there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or make it part of the transport)?

cmd/queue/main.go Outdated Show resolved Hide resolved
mainTLSServer, drain := buildServer(ctx, env, probe, stats, logger, concurrencyendpoint, true /* enable TLS */)
tlsServers := map[string]*http.Server{
"tlsMain": mainTLSServer,
"tlsAdmin": buildAdminServer(logger, drain),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like moving the admin server to TLS is spread all over this function. What about moving this before the loop on 190 and explicitly moving the server from one map to the other. (It would add another tlsEnabled condition to the function, but that feels simpler than the other logic.)

i.e. build a tlsServer and httpServer map, and then iterate through one or both after building the maps?

cmd/queue/main.go Show resolved Hide resolved
cmd/queue/main.go Outdated Show resolved Hide resolved
pkg/http/proxy.go Outdated Show resolved Hide resolved
@@ -122,6 +139,11 @@ func makePodSpec(rev *v1.Revision, cfg *config.Config) (*corev1.PodSpec, error)
extraVolumes = append(extraVolumes, varTokenVolume)
}

if cfg.Network.QueueProxyCertSecret != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means changing the QueueProxtCertSecret will cause all Knative Serving user deployments to get updated and rolled out, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes't that's correct.

Copy link
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel

Still have a few pre-alpha and post -1.4 concerns, but I think this should be safe to go in tomorrow.

// See also https://github.com/knative/serving/issues/12808.
var tlsServers map[string]*http.Server
if tlsEnabled {
mainTLSServer, drain := buildServer(ctx, env, probe, stats, logger, concurrencyendpoint, true /* enable TLS */)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the drain from line 176 is lost?

Again, okay for pre-alpha, but it would be nice to unify the drains post 1.4


target := net.JoinHostPort("127.0.0.1", env.UserPort)

httpProxy := pkghttp.NewHeaderPruningReverseProxy(target, pkghttp.NoHostOverride, activator.RevisionHeaders)
httpProxy := pkghttp.NewHeaderPruningReverseProxy(target, pkghttp.NoHostOverride, activator.RevisionHeaders, false /* use HTTP */)
httpProxy.Transport = buildTransport(env, logger)
httpProxy.ErrorHandler = pkghandler.Error(logger)
httpProxy.BufferPool = network.NewBufferPool()
httpProxy.FlushInterval = network.FlushInterval

breaker := buildBreaker(logger, env)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we won't be counting concurrency properly across HTTP and HTTPS while the activator change rolls out (but again, a post-1.4 note, just worth adding a TODO here).

@knative-prow knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2022
@nak3
Copy link
Contributor Author

nak3 commented Apr 18, 2022

Thank you! Updated.

@evankanderson
Copy link
Member

/lgtm

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Apr 18, 2022
@nak3
Copy link
Contributor Author

nak3 commented Apr 18, 2022

/retest

TestActivatorOverload failed. Not only this PR but also other PRs failed with TestActivatorOverload last night.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/autoscale area/networking area/test-and-release It flags unit/e2e/conformance/perf test issues for product features lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

All Hops Encrypted: Activator uses queue-proxy TLS if available All Hops Encrypted: Queue-Proxy serving TLS
4 participants