
RFE: Avoid host networking for Ironic #21

Open
dtantsur opened this issue Mar 27, 2024 · 13 comments

@dtantsur
Member

Exposing Ironic on host networking is far from ideal. For instance, if we do so, we're going to expose JSON RPC. There may be other internal traffic that we don't want everyone to see. In theory, only dnsmasq actually needs to be on host networking.

So, why does Metal3 use host networking at all?

  • dnsmasq serves as the DHCP and TFTP server. Both protocols are UDP-based and hard to route, and DHCP additionally involves broadcasts.
  • When booting over iPXE, hosts need to download iPXE scripts and the kernel/initramfs from an HTTP server. This server is local to the Ironic instance that handles the host. Since the host is not yet part of the cluster network, it cannot use the cluster service DNS name or IP.
  • IPA needs to reach back to the Ironic API (any of the running instances, optimally the one handling the host). There is still no cluster networking at this point.

One complication is supporting the bootstrap scenario. While most Metal3 consumers bootstrap their production clusters by establishing a temporary cluster with Metal3 and then pivoting, OpenShift has an important limitation: the bootstrap cluster only provisions control plane hosts. Thus, it cannot rely on any components that won't come up without workers, including e.g. Ingress.

@metal3-io-bot added the needs-triage label Mar 27, 2024
@dtantsur
Member Author

/triage accepted
/lifecycle frozen

@metal3-io-bot added the lifecycle/frozen and triage/accepted labels and removed the needs-triage label Mar 27, 2024
@dtantsur
Member Author

Currently discussed solution: using Ingress with a fall-back to a simple httpd-based proxy (probably derived from OpenShift's ironic-proxy) for edge cases like OpenShift.

The Ironic API part is relatively simple and could even be fixed with a load balancer like MetalLB. Ironic-standalone-operator could even create an IPAddressPool for the provisioning network if it's present (otherwise, just expect the human operator to create one). dnsmasq will refer the booting hosts to the load-balanced IP address, which will reach any Ironic instance.
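For illustration only, a minimal sketch of what that could look like with MetalLB. The namespace, Service name, pod label and address range below are assumptions, not something prescribed in this issue, and the address-pool annotation name varies between MetalLB versions.

```yaml
# Hypothetical address pool carved out of the provisioning network.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: provisioning-pool
  namespace: metallb-system
spec:
  addresses:
    - 172.22.0.100-172.22.0.110   # assumed provisioning network range
---
# Announce the pool on L2 so hosts on the provisioning network can reach the IP.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: provisioning-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - provisioning-pool
---
# LoadBalancer Service in front of the Ironic API; dnsmasq would hand out this IP.
apiVersion: v1
kind: Service
metadata:
  name: ironic-api
  namespace: metal3
  annotations:
    metallb.universe.tf/address-pool: provisioning-pool
spec:
  type: LoadBalancer
  selector:
    app: ironic           # assumed pod label
  ports:
    - name: api
      port: 6385          # default Ironic API port
      targetPort: 6385
```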

iPXE configuration is harder. The boot configuration API will be required to make sure Ironic serves the iPXE scripts correctly regardless of whether it handles this host. But we still need to serve kernel/initramfs/ISO images, and these should not be proxied through Ironic.

The issue with images can be handled by using Ingress. Since each Ironic is aware of its host name (and thus the cluster DNS name), it can compose an image URL with a sub-path that refers to the right Ironic. So, the Ironic instance with the name ironic-1 will serve images from http(s)://<ingress IP>/images/ironic-1/..., which will be redirected to http(s)://ironic-1.<service>.<namespace>:6183/....
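A rough sketch of that routing with a plain Ingress resource, assuming hypothetical per-instance Services named ironic-1 and ironic-2 in a metal3 namespace. The rewrite annotation is specific to ingress-nginx and is only one way to strip the sub-path.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ironic-images
  namespace: metal3
  annotations:
    # ingress-nginx specific: strip the /images/ironic-N prefix before proxying
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /images/ironic-1(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: ironic-1        # assumed per-instance Service
                port:
                  number: 6183        # image httpd port from the comment above
          - path: /images/ironic-2(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: ironic-2
                port:
                  number: 6183
```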

Open questions:

  • Can we have Ingress routes without HTTPS? Plain HTTP is required by iPXE in the general case and by virtual media in some rare cases.
  • Can we have Ingress IPs on control plane nodes? We do not want normal workloads to cross paths in any way with either the provisioning network or the exposed Ironic API.

@lentzi90
Member

> Can we have Ingress routes without HTTPS? Plain HTTP is required by iPXE in the general case and by virtual media in some rare cases.

Ingress can handle both HTTP and HTTPS traffic. In general, though, TLS termination is expected to happen in the ingress controller, so the traffic reaching the "backend" would be plain HTTP. There are solutions to work around this when the traffic needs to be encrypted all the way to the backend, but in that case it may be better to consider LoadBalancers that deal with TCP instead.
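As one example of such a workaround, ingress-nginx can pass the TLS connection through to the backend untouched. This is only a hedged sketch: it assumes the controller was started with --enable-ssl-passthrough, and the hostname, Service name and namespace are made up.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ironic-api-passthrough
  namespace: metal3
  annotations:
    # ingress-nginx forwards the raw TLS stream; the Ironic pod terminates TLS itself
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: ironic.example.com        # passthrough routing is SNI-based, so a host is required
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ironic-api      # assumed Service whose pods terminate TLS
                port:
                  number: 6385
```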

> Can we have Ingress IPs on control plane nodes? We do not want normal workloads to cross paths in any way with either the provisioning network or the exposed Ironic API.

I'm not sure I understand the question here, but I will try to clarify what I do know. Ingress controllers are usually exposed through LoadBalancers. The exact implementation differs between clusters, but it is quite common to exclude control-plane nodes, since you would not normally run the ingress controller there. Traffic can still be forwarded to any node in the cluster. That said, it is definitely possible to configure things so that the ingress controller runs on control-plane nodes and the LoadBalancer targets them.
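A minimal sketch of that last point, pinning a controller Deployment to control-plane nodes. The label and taint key are the upstream Kubernetes defaults; the namespace, names and image tag are illustrative assumptions.

```yaml
# Fragment of an ingress controller Deployment pinned to control-plane nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-controller-provisioning
  namespace: ingress-system                    # assumed namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ingress-controller-provisioning
  template:
    metadata:
      labels:
        app: ingress-controller-provisioning
    spec:
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: controller
          image: registry.k8s.io/ingress-nginx/controller:v1.10.0   # example image/tag
```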

@hardys
Member

hardys commented Mar 28, 2024

> Can we have Ingress IPs on control plane nodes? We do not want normal workloads to cross paths in any way with either the provisioning network or the exposed Ironic API.

As @lentzi90 says, in situations where dedicated compute hosts exist the application Ingress endpoint would normally be configured so it cannot connect to the control-plane hosts.

But IIUC the question here is actually: can we run an additional ingress endpoint with a special configuration that targets the provisioning network? I think that probably is possible by running an additional Ingress Controller and something like an IngressClass. We'd also need to consider how to restrict access to that Ingress endpoint so regular users can't connect to the provisioning network.
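For what it's worth, a dedicated IngressClass could look roughly like this. The class name, controller value, Service and namespace are all made up for illustration; the controller value has to match whatever the additional ingress controller is configured to watch.

```yaml
# A second IngressClass so that provisioning-related Ingresses are only picked up
# by the dedicated controller, not by the default one serving user workloads.
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: provisioning
spec:
  controller: k8s.io/ingress-nginx     # must match the extra controller's configuration
---
# Ironic-related Ingresses would then opt in explicitly.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ironic-provisioning
  namespace: metal3
spec:
  ingressClassName: provisioning
  defaultBackend:
    service:
      name: ironic-api                 # assumed Service name
      port:
        number: 6385
```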

@dtantsur
Member Author

This sounds like a lot of complexity to me. I'm starting to see writing our own simple load balancer based on httpd as a viable solution.

@dtantsur
Member Author

"Fun" addition: I've recently learned that some BMCs severely restrict the URL length for virtual media. So if we start using longer URLs, we may see more issues.

@Rozzii
Member

Rozzii commented Jun 28, 2024

@zaneb
Member

zaneb commented Jul 4, 2024

I think you missed a key reason why we can't just use a NodePort Service to expose the pod network (as in @mboukhalfa's PoC): node ports are constrained to a particular range (30000-32767) and available to Services on a first-come, first-served basis. That means users in any namespace can squat on a port and steal traffic intended for Ironic, which is a significant security vulnerability. (For existing deployments it also means requiring all users to change the settings of any external firewall they have, to account for the Ironic port changing.)
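To illustrate the squatting concern: nothing stops a workload in an unrelated namespace from claiming a specific port in that range. All names below are hypothetical.

```yaml
# A Service in an arbitrary namespace pinning a specific node port. Whichever
# Service claims the port first wins, and kube-proxy answers on that port on
# every node, so traffic meant for an Ironic NodePort could be captured here.
apiVersion: v1
kind: Service
metadata:
  name: port-squatter
  namespace: user-sandbox
spec:
  type: NodePort
  selector:
    app: not-ironic
  ports:
    - port: 6385
      targetPort: 6385
      nodePort: 30685      # explicitly requested; must fall within 30000-32767
```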

This could perhaps be mitigated by only running Ironic on the control plane nodes and never allowing user workloads on those nodes. But OpenShift at least has topologies that allow running user workloads on the control plane, so this would be a non-starter for us. Although, actually, I think kube-proxy will forward traffic arriving on that port on any node to the Service, so even separating the workloads doesn't help.

If this actually worked it would have made many things sooo much easier. So it is not for want of motivation that we haven't tried it.

I don't believe there is a viable alternative to host networking.

> I'm starting to see writing our own simple load balancer based on httpd as a viable solution.

The ironic-proxy that you implemented in OpenShift is exactly that, isn't it?

@dtantsur
Member Author

dtantsur commented Jul 4, 2024

> The ironic-proxy that you implemented in OpenShift is exactly that, isn't it?

Yes. Some community members are not fond of using an alternative to an existing solution, but I actually believe you're right.

@mboukhalfa
Member

@zaneb, good point. We are trying to encourage people to raise concerns, in whatever form, in the discussion https://github.com/orgs/metal3-io/discussions/1739; that's the reason behind having these PoCs. The current showcase is very limited, and it doesn't even consider the dnsmasq case. We foresee that the final design and implementation for Ironic and Metal3 networks will not be easy or quick. It is a long-term process.

Our plan is to start with the following ideas:

I would like to get your feedback on the LoadBalancer and Multus use cases in this discussion: https://github.com/orgs/metal3-io/discussions/1739. I am not an expert in network security within Kubernetes, so that's something we should document along the way.

@Rozzii
Member

Rozzii commented Jul 11, 2024

This is not really frozen since @mboukhalfa is working on investigating this very topic.
/remove lifecycle frozen

@Rozzii
Member

Rozzii commented Jul 11, 2024

/remove lifecycle-frozen

@Rozzii
Member

Rozzii commented Jul 11, 2024

/remove lifecycle frozen

@Rozzii removed the lifecycle/frozen label Jul 11, 2024
Labels: triage/accepted
Projects: Ironic-image (status: WIP)
7 participants