Ability to seal/unseal own TLS certificate #17
\o hey @bnevis-i --- some thoughts here. I think this might be hard to achieve as written. Plugins cannot load until the underlying storage is decrypted, and this can require manual involvement (Shamir's unseal, which is done remotely via API -- needing certificates in the HTTPS secure case), and generally requires the core + listeners to already be started. This bootstrapping leads to a chicken-and-egg problem, and breaking that cycle might be impossible without local unseal (or strictly requiring auto-unseal).

However, if you have an existing certificate, this could be reused and refreshed periodically more easily... though this also requires Bao to be able to issue potentially privileged tokens to itself (bypassing authentication plugins entirely), or tightly scoped tokens for single API endpoints, using authentication information it can present to itself. This is all very hard, IMO, as written.

However, more broadly applicable (and doing 90% of what you want -- the steady-state renewal), I think, is perhaps using ACME similar to how Caddy works. Let the listener point at an ACME server to fetch certificates from, allow local caching, and perhaps have a temporary self-signed certificate mode when necessary. Luckily the version Bao is based on supports ACME in the PKI plugin, so this should be achievable much more easily. Some detection (for pointing at ourselves) would be necessary -- so we opportunistically fetch via ACME but fall back to self-signing if there's no locally cached cert and no ACME responder is (yet) available -- but otherwise this should be much more achievable. This would also benefit people not using Bao's PKI capabilities.

Let me know your thoughts and if you'd like to formalize that into a proper RFC.
@cipherboy I can't figure out how to create a new ticket with the RFC template. I can't commit to technical exploration, prototyping, or a PR, but the least I can do is put down a use case. I like your ideas above. SummaryThis feature would allow OpenBao to self-manage its TLS serving certificate. Problem StatementToday, the TLS serving certificate and key must be provided via an external file and referenced by a configuration file. OpenBao's ability to keep secrets means uniquely positions it to be the root of trust (for lack of a better term) of a microservice ecosystem built on top of it. This is especially true of bare-metal and Docker-based architectures where there is no platform-provided capabilities for identity or secret storage. Given that the main value proposition of OpenBao is its ability to keep secrets, the requirement to have a TLS private key unencrypted on disk for OpenBao to talk confidentially over the network is somewhat ironic. Moreover, it means that OpenBao then relies on some OTHER root of trust to bootstrap it, and practically speaking, an entirely manual process. User-facing descriptionFrom the perspective of an OpenBao adopter, OpenBao would offer a mechanism to come up in a TLS-enabled mode by default. Technical descriptionTBD Rationale and alternativesDownsidesSecurity ImplicationsUser/Developer ExperienceUnresolved QuestionsProof of Concept |
The security of ACME depends on the security of DNS, which isn't ideal. For publicly trusted certificates it is the best one can do, but OpenBao can do better. Instead of trying to mess with self-issued tokens, I think a simpler approach would be to assume that if OpenBao can rotate its own cert, it should. Specifically, whenever OpenBao is set up as a certification authority, OpenBao can check if the issuing certificate it will be using is the one that issued its own TLS certificate. If so, OpenBao can issue itself a new certificate, and it should. Users should be able to opt out of this behavior, but for most, I suspect it is exactly what they want.
\o Hey @DemiMarie, great to see you around OpenBao as well! Happy to review PRs for some of your other issues we didn't get to upstream. :-)
While not ideal, as you point out, it is widely trusted by the CA/B Forum, and Let's Encrypt has on the order of 300M active certificates. Automation protocols like ACME are widely deployed internally, and ACME is generally regarded as much better than previous iterations (SCEP, EST, or CMPv2), as those protocols lack real-time client path validation (DNS or direct connect) and rely on external authorization (PSK or established key material...) and CA policies for security. Things like multi-point validation (which admittedly isn't present in Bao, especially due to the lack of clustering from upstream's Enterprise offering) and DoH/DNSSEC (where Go seems to have taken the position that the system resolver should be doing this on its behalf, or that the application should use a custom library) make DNS more palatable. Though, admittedly, there is still substantial concern over DNSSEC in particular.

However, this also doesn't reflect the reality of what I've heard from larger users. Often their networking (including DNS) is one of the few channels they trust, perhaps because it's internal to datacenters or otherwise tightly controlled. Plus, this helps our interoperability with third-party private CA solutions. Outside of AD/CS, nearly every other major and minor private CA software supports ACME. Some do provide Bao plugins, though I think this is the exception, not the rule.
I'm sorry, but I think you've lost me here. :-) I think on the surface, this part of the comment makes sense from an ideal UX perspective as a bystander, but misses the reality of the project. Bao and its upstream aren't a tightly coupled monolith despite being shipped as such. Outside of the Core API routing layer, the "physical" storage backend, the authentication mechanism, and any provided secrets management are wholly plugin oriented. The authorization and routing model rely on this token concept in a way that is hard to disentangle. The Core (which holds the listener and thus needs the certificate) has no more knowledge of what is being run or its purpose than a simple routing table. One PKI plugin may differ substantially from the next, if only because API design was done at different companies, both wishing to provide a plugin for this platform. IMO, this needs to be solved by making the issuance path explicit (in configuration perhaps) and cannot be automatically inferred (which of N PKI plugins of M types is the best to use?).

In a clustering product (which admittedly Bao is not -- though I do think it has inherited the HA / hot-failover "Standby" mode -- not "Performance Standby" -- of upstream), the backend plugin may not even be loaded and thus can't issue the certificate without loading and taking leadership. Furthermore, in a proper clustering environment ("Performance Standby nodes" in upstream's terminology), the plugin may be loaded, but the operator's specified role to issue against may require a storage write that a local node can't perform -- in upstream's model, this forwards the entire API request via gRPC up to the active cluster node to service from scratch. This makes restarts hard to handle and, if secondaries are brought up before primaries, may result in a delay in issuance (just like ACME pointing at itself -- though ACME at least could use multiple fallback DNS / RR directories, perhaps allowing it to bootstrap from some other already-running cluster copy).
This is where standardization has been missed by the ecosystem. While nominally some attributes and API routes may apply across providers of PKI plugins, specialized issuance parameters may not (IP SANs? SPIFFE?). And while certainly not perfect or complete, ACME helps alleviate most of this.

But perhaps there's some angle I haven't considered... Could you perhaps expand on the exact use case and operating model for Bao you're expecting here? How would different layers of Bao know whether it is operating as a CA? How would you handle encrypted storage bootstrapping with manual unseal? With clustering/HA Standby nodes? How would you (safely) bypass authorization layers? Can we show this ultimately doesn't rely on DNS again (e.g., for forwarding between nodes, if necessary)? &c. Maybe there is an alternative here with better UX.
@cipherboy Great discussion and I learned a lot. Since you bring it up, is a SPIFFE auth plugin impossible? The Vault OIDC auth method thing makes my head spin.
@bnevis-i Hmm, I'd move that particular thread to a new issue too if you don't mind. ;-) The short of it is that I have no idea what "SPIFFE authentication" actually means (like you and OIDC, it seemed a bit complicated in my mind -- but I'll admit I didn't spend a lot of time looking at it). One of the original requesters of it seemed to point at an external (non-stdlib) TLS stack being required for it. While possible, this isn't appealing for a number of maintainability and supportability reasons. What this could mean is that the auth plugin might need to expose a non-standard port to strictly talk SPIFFE auth over, if replacing the entire TLS stack isn't ideal (similar to upstream's KMIP support, a protocol which doesn't transit on HTTP at all). But without knowing more about SPIFFE and the differences between SPIFFE and other auth methods, I'm most likely wrong. :-)
Hello! It’s rather funny, considering I have never used Vault or OpenBao myself, mostly since I have not yet been in a position to need either. But I do care about security, so I like to see security-related stuff work well.
I’m not at all surprised here. The main problem then becomes the challenge verification, if a non-DNS challenge is used.
Yup.
No surprise there.
Another option is for OpenBao to make an internal ACME request (to itself) that is treated as having already passed the challenge.
\o hello @DemiMarie, sorry about the delay in getting back :-)
:D No worries! Qubes is a great project!
Indeed. I wonder if we'd additionally need specialized recovery tools. In the simplest form, perhaps allowing fallback to self-signed certificates (and perhaps more tightly scoped listeners) with explicit pinning might work... This would allow initial setup and recovery operations, but perhaps require a restart of the daemon to fully reset into normal mode. Localhost-only might not be sufficient though, since in general this is clustering software.

Related in particular to this conversation, with the multitude of storage backends, we might also need to allow offline issuance of certificates (likely not stored, though I'm not quite sure how to force auditing) to allow revitalizing infrastructure in the event of an outage. If, say, an intermediate CA hosted in Bao expires that is used for, e.g., securing communication with a backing PostgreSQL data store, this might get complicated to revive. Self-signed wouldn't necessarily be sufficient or desired, so we'd definitely need issuance (against multiple mounts, roles, and types of certificates, potentially). Limiting to Raft here could help prevent this dependency in this particular case.

I'm not quite sure how the manual tool would handle this case, though. Even if manual unseal keys were provided, it'd still need to access the backing data store, which might involve trusting expired certificates (which could be done manually). Maybe it is possible, but we'd have to test it in a lot of various scenarios.
I think this is where ACME is looking more appealing. It might be possible to avoid bypassing issuance authorization in most cases (e.g., by avoiding DNS challenge types and preferring ALPN/HTTP instead). Generating a root token would be possible in general... though generally this requires using recovery keys to authorize manually and isn't typically done automatically (so it is usually a privileged operation limited to recovery in the event of failure or initial setup). However, you're definitely correct that a root token would let us successfully issue a cert using the non-ACME path.

Agreed, though I don't know that we need this as much with ACME. But one trick is that OpenBao/our upstream don't cache and attribute requests in that way, so it's hard in general to redo an operation. It would be up to the client (i.e., the daemon trying to self-request a certificate) to re-issue the request with the same parameters, which means you still need a way of handling the first time.

I'll need to think more about clustering w.r.t. HA mode in OpenBao and whether this is sufficient. It might be, but we might still need to handle forwarding, as I'm not sure HA mode does request forwarding transparently to the calling client. Hmm...
Thank you! So is OpenBao.
My mental model is that OpenBao does something like (in pseudo-Go):

```go
func handleRequest() {
	payload, err := unmarshalRequest()
	if err != nil {
		sendBadRequestResponse(err)
		return
	}
	credentials, err := unmarshalToken(payload.token)
	if err != nil {
		sendBadRequestResponse(err)
		return
	}
	dispatchRequest(payload.path, payload.body, credentials)
}
```

OpenBao could instead create a synthetic
I was thinking of extracting them from OpenBao's own TLS certificate.
Just dropping some notes here.
I think we can do 1 fairly easily, and reuse the existing paths transparently: if they exist, use them and refresh via ACME. Otherwise, point it at a temporary directory. I think we can thus wrap the returned

I think this then makes the entire change relatively small and self-contained. Maybe:

```diff
+ TLSACMEDirectory string `hcl:"tls_acme_server"`
+ TLSACMERoot      string `hcl:"tls_acme_ca_root"`
+ TLSACMEKeyType   string `hcl:"tls_acme_key_type"`
+ TLSACMEEABId     string `hcl:"tls_acme_eab_id"`
+ TLSACMEEABKey    string `hcl:"tls_acme_eab_key"`
+ TLSACMEEmail     string `hcl:"tls_acme_email"`
```

for config parameters? This doesn't yet handle DNS challenges, but we can deal with those later if/when necessary. Or should we nest it in an inner layer?

```hcl
listener "tcp" {
  acme {
    server = ""
  }
}
```

IMO the latter is probably more extensible when it comes time for DNS challenges, so maybe that'd be my preference...

Thinking about it more, if we want to identify alternative domains (e.g., bind to

For choice of library, I'd probably use the

Another challenge I've not yet figured out is how to deal with multiple listeners. Ideally these would be only a single ACME call (a multiple-SAN certificate), but we might have to figure out a loading procedure to make sure all listeners inform the ACME stack of the required domains and listen addresses, and then make sure only one actually issues the ACME challenges... Today, you could use multiple listeners with the same path, so adding an ACME directive to one should probably get you ACME for all (and maybe we'll want to validate that duplicate ACME directives for the same paths in different listeners don't exist). This might warrant hoisting ACME up into a global directive then:

```hcl
acme {
  name    = <...>
  server  = <...>
  domains = <...>
  [cert_file = <...>]
}

listener "tcp" {
  tls_cert_file = "<...>"
}

listener "tcp" {
  tls_acme = "<name>"
}
```

Here, multiple listeners could refer to the same ACME config either by file path (which would be unique) or by using a
@DemiMarie said:
I had forgotten that we added this logic to the CLI: `openbao/command/pki_reissue_intermediate.go`, lines 149 to 186 at `8dead56`.
This might need a little modification for leaf certificates, but should provide a good basis if we wanted to take this approach later.
**Summary**

Allow OpenBao to self-manage TLS certificates for its listener via the ACME protocol, similar to Caddy's automated certificate management. This would align OpenBao with server projects like Caddy, Apache httpd, nginx, and others that can acquire and rotate their listener certificates automatically via ACME.

**Problem Statement**

Presently, OpenBao's TLS server certificate and key must be provided via external files (

OpenBao's ability to keep secrets and keys uniquely positions it to be the X.509 root of trust of an ecosystem built on top of it. This is especially true of bare-metal and Docker-based architectures where there are no platform-provided capabilities for identity or secret storage and every application has its own method of acquiring and using certificates.

Given that the main value proposition of OpenBao is its ability to rotate secrets, the inability to reuse an internal PKI service for its own leaf certificate, transparently handling its rotation, is a notable shortcoming. Moreover, it means that OpenBao then relies on some other root of trust to bootstrap it, which may be a manual process if that root does not itself support ACME (or if no ACME client is present in the environment). Building an ACME client into OpenBao allows it to interact via open protocols to acquire certificates even when the environment lacks such tools, regardless of whether that source is internal or external.

With Google's push for 90-day public TLS certificates, any certificate management process other than an automated one is practically unimplementable for certificates from a public CA. When OpenBao is connected to the public internet, it is thus doubly important to have ACME support built in.

**User-facing Description**

ACME is the most widely adopted certificate management protocol for TLS certificates.
While EST, CMPv2, and SCEP exist for different niches (seemingly IoT, telco, and device attestation respectively), none are as widely adopted as ACME, in either the public CA or private CA space. Further, ACME was the first protocol to bridge the disparate public CA APIs and domain validation processes into a single, automated protocol.

OpenBao defaults to a TLS-enabled listener because secrets management fundamentally isn't secure unless that connection is secure: even over private, internal networks, HTTPS is preferable to prevent accidental logging or packet captures from containing sensitive data. However, this listener lacks certificates by default; using ACME can fix this.

ACME aims to automate certificate issuance: when an ACME directory is known (usually defaulting to Let's Encrypt by convention otherwise), an ACME client can authenticate itself to a certificate authority and request certificates for particular domain name(s). The ACME protocol currently defines three types of requests: an HTTP request, hitting a

By utilizing ACME in conjunction with a

By utilizing "on-demand" ACME certificates (whereby requested SNIs can be automatically requested from the CA), no additional service name configuration is required. Note that reusing the bind address may not work, as it may not be to a particular domain but instead to a global listen address such as

**Technical Description**

The Caddy server uses

Notable complexity still remains, however: in addition to more attributes to configure TLS with ACME (including choice of directory, an optional root CA for connecting to said directory, and any external account bindings), for instances wishing to use DNS providers, additional support is necessary for provisioning a
When this bound, we'll solve challenges through OpenBao's Further, Fraser's IETF draft for ACME service discovery appears not to have been accepted by the ACME WG; a replacement standard with reliance on Certificates and keys are stored in-memory. This means that each restart of the service will request new certificates, but also prevents them from being stored on disk unencrypted. When provisioned as the root of trust, this shouldn't cause issues as OpenBao may not have quota information and thus won't ratelimit itself. However, when running against a public CA, too frequent restarts may trigger global rate limiting. This may warrant an option in the future to cache certificates on-disk, if also to help restarts and avoid a CA outage from affecting the ability to become operational. This would also allow unsealing OpenBao externally via the TLS listener on subsequent startups. The following parameters will be added to the TCP listener configuration:
**Rationale and Alternatives**

Making TLS infrastructure easier to manage is a core use case of OpenBao; this feature brings much-needed UX improvements to this side of instance management.

One alternative would be to implement native support for OpenBao's PKI secrets engine's APIs. This could be done by utilizing support for parsing existing certificates back into API parameters, but it requires an authentication bypass, as there's no concept of a "self" token for listener-to-plugin requests. Indeed, OpenBao may not be unsealed yet and thus may not even have access to unencrypted storage. Without solving the self-authentication approach, ACME support is preferable, as a localhost-only listener can be used to fetch certificates, assuming .

Regardless of approach (ACME vs. native PKI APIs), unless some persistent caching of certificates and keys is utilized (likely unencrypted on disk), an auto-unseal mechanism must be used in order to automatically retrieve certificates on startup. Additionally, while native PKI APIs make sense for cross-instance communication in a situation where DNS could not be provisioned or necessarily trusted, DNS must still be trusted for the API request, and thus there's little benefit over ACME unless running on non-standard ports.

**Downsides**

The approach above for two listeners (HTTP on localhost, HTTPS bound preferably) stipulates that all requested SNIs be self-reachable unless DNS challenges are used; this may not be the case in, e.g., dual-homed virtualization environments with validated IP addresses (wherein a passthrough listener is used in conjunction with a VM-network or local listener; for DNS hostnames this can be avoided by configuring a local

**Security Implications**

**User/Developer Experience**

When deploying publicly routable, production-grade instances, this would make the experience much smoother, assuming Let's Encrypt was the preferred certificate provider. Otherwise, this requires setting fewer parameters and less provisioning when an organization can use ACME (at minimum the

**Unresolved Questions**

The current Proof of Concept lacks ALPN plumbing and libdns support; both should be achievable, though the latter will require building a wrapper library. Additionally,
as they don't use our formatter. While we could update the formatting of the log message to use the new format, it might be hard to ensure the log locations are reused and adequately locked to allow two disparate writers.

**Proof of Concept**

See code: https://github.com/cipherboy/openbao/pull/new/auto-tls-listener. Presently this is buildable, but requires

Assuming that's alright: use
This node is pre-provisioned with ACME support; we save the root CA for later. Modify the configuration file as so:

Namely, we change the second listener to use a custom ACME directory pointed at the first listener. I've also updated

Now we can restart

and in a new terminal, unseal

(ignoring the warning about the node not running, since it wasn't started through

Finally, the new address can be used:

Note that changing
Although there is a built-in PKI secrets engine, OpenBao can't bootstrap its own TLS certificate. A handy feature would be the ability to use the PKI secrets engine to create OpenBao's own TLS serving certificate, include it as part of the seal data, and deal with certificate rotation as well. While this would only be doable with a self-unsealing mechanism that doesn't require the API to be up to perform the unseal, it would resolve what today is a difficult chicken-and-egg problem. (Though this would be less of a problem if unsealing could be done over a named pipe or other secure non-network mechanism.)
Cloned from #16