Realtime kernels and spectre mitigations #141

Open
dtaht opened this issue Oct 23, 2022 · 2 comments
@dtaht
Collaborator

dtaht commented Oct 23, 2022

I generally run realtime kernels wherever I can (via Ubuntu Studio, in my case). My principal application, Ardour, requires one for audio mixing to flow well; with RT I can usually hold 2.7 ms of latency reliably. RT has historically been used for many device-control applications, and I've sometimes worried that my packet-processing data was skewed flatter because I'm always testing on an RT kernel. Anyway, testing on an RT kernel on bare metal might show an improvement in IRQ handling and other long-tail p99 latencies, and I do highly recommend using one on your desktop.

Somewhat related: the plethora of Spectre vulnerability mitigations is not needed on a bare-metal system; some can be disabled at boot, others compiled out. Spectre is primarily a virtual-machine-breaching problem. One of the most recent vulnerabilities cut performance by over 30% with the initial round of mitigations.

So it does seem possible to produce a kernel better tuned for what LibreQoS is doing. But it would help to measure more first; this is just a note for future use.
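As a starting point for that measurement, it helps to record which mitigations the running kernel actually reports as active. The following is a minimal sketch, not part of LibreQoS, that reads the status files Linux exposes under `/sys/devices/system/cpu/vulnerabilities`. The helper names `read_mitigation_status` and `summarize` are made up for illustration, and the grouping by string prefix is an assumption about the usual status wording ("Mitigation: ...", "Vulnerable...", "Not affected").

```python
from pathlib import Path

# Standard sysfs directory where Linux reports per-vulnerability status.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_string} for every entry found."""
    status = {}
    if not vuln_dir.is_dir():
        return status  # not Linux, or a kernel that does not expose this
    for entry in sorted(vuln_dir.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "unreadable"
    return status


def summarize(status):
    """Group vulnerability names by the prefix of the reported status.

    Assumption: status strings start with "Mitigation", "Vulnerable" or
    "Not affected"; anything else lands in "other".
    """
    groups = {"mitigated": [], "vulnerable": [], "not_affected": [], "other": []}
    for name, text in status.items():
        if text.startswith("Mitigation"):
            groups["mitigated"].append(name)
        elif text.startswith("Vulnerable"):
            groups["vulnerable"].append(name)
        elif text.startswith("Not affected"):
            groups["not_affected"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    for name, text in read_mitigation_status().items():
        print(f"{name}: {text}")
```

Saving this output alongside each benchmark run makes it easier to tell later which kernel configuration a given result belongs to.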

@interduo
Collaborator

Remember to set:
noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off tsx=on tsx_async_abort=off mitigations=off (not only mitigations=off)

If you use virtualization, set these kernel options on both the hypervisor and the guest VM.
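To confirm the options actually reached the running kernel, one can compare the parameter list above against `/proc/cmdline`. This is a minimal sketch under stated assumptions: the `WANTED` list is copied from this comment, `missing_params` is a hypothetical helper, and whether a given kernel still recognizes each parameter depends on its version and architecture. Run it on the hypervisor and inside the guest separately.

```python
from pathlib import Path

# Parameters copied from the comment above. Whether each one is still
# recognized depends on the kernel version and architecture.
WANTED = [
    "noibrs", "noibpb", "nopti", "nospectre_v2", "nospectre_v1", "l1tf=off",
    "nospec_store_bypass_disable", "no_stf_barrier", "mds=off", "tsx=on",
    "tsx_async_abort=off", "mitigations=off",
]


def missing_params(cmdline, wanted=WANTED):
    """Return the wanted parameters that do not appear on the command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    cmdline = Path("/proc/cmdline").read_text()
    missing = missing_params(cmdline)
    if missing:
        print("Not set on this kernel:", " ".join(missing))
    else:
        print("All listed parameters are present.")
```

A parameter appearing on the command line only shows that it was passed at boot; it does not prove the kernel honored it.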

The penalty of leaving them on depends on the CPU generation: the newer the CPU, the smaller the penalty. We ran a performance check, and in weekly CPU usage (average, not max) there was a ~3% difference (on a Proxmox VM), though the network throughput was slightly different between the two measurements.

@rchac maybe you could add this to the wiki under the performance tips?

@rchac
Member

rchac commented Oct 28, 2022

As mentioned by @tohojo here, it's important that we consider the security implications before recommending disabling mitigations by default. @interduo true: newer BIOS and CPU firmware often have the mitigations baked into hardware, to the point where disabling them in the kernel either has no impact or actually reduces performance. A 3% difference may not be enough to justify the risk. That said, I agree there are cases where turning them off could dramatically improve performance on VM hosts where LibreQoS is the only guest.

We should probably keep doing measurements before/after on additional recent CPUs to see what impact it has, and evaluate the threat model. InfluxDB and the Flask API seem like the main potential targets - but these are usually only accessible to ISP employees anyway. I just want to be careful in prescribing turning off mitigations.
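Since the earlier comparison noted that throughput differed between runs, one way to make before/after numbers comparable is to normalize average CPU usage by average throughput. The sketch below is only an illustration of that arithmetic: `cpu_per_gbps` and `relative_change` are hypothetical helpers, and the figures in the usage section are placeholders, not measurements from this thread.

```python
def cpu_per_gbps(avg_cpu_percent, avg_throughput_gbps):
    """Average CPU usage (percent) spent per Gbit/s of shaped traffic."""
    if avg_throughput_gbps <= 0:
        raise ValueError("throughput must be positive")
    return avg_cpu_percent / avg_throughput_gbps


def relative_change(before, after):
    """Percentage change from `before` to `after` (negative means lower)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) * 100.0 / before


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute real averages.
    with_mitigations = cpu_per_gbps(avg_cpu_percent=40.0, avg_throughput_gbps=8.0)
    without_mitigations = cpu_per_gbps(avg_cpu_percent=36.0, avg_throughput_gbps=8.0)
    change = relative_change(with_mitigations, without_mitigations)
    print(f"CPU per Gbit/s changed by {change:.1f}%")
```

This assumes CPU cost scales roughly linearly with throughput over the measured range, which is itself worth checking before drawing conclusions.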

@dtaht dtaht added this to the v1.4 milestone Nov 9, 2022
@dtaht dtaht modified the milestones: v1.4, v1.5 Jan 12, 2023
@rchac rchac modified the milestones: v1.5 Beta, v1.6 May 30, 2024

3 participants