CHDR control endpoint consumes a CPU core polling a socket with no timeout #514
Comments
Actually, after going over the code once more, I believe this problem affects all RFNoC devices (see `uhd/host/lib/rfnoc/link_stream_manager.cpp`, line 115 in 748162e).
This makes more sense too, because on the Ubuntu system I mentioned I have seen this issue while streaming two channels with 160 MHz of bandwidth from an N320 (XG firmware). It also leads me to believe that my guess is right that the control […]
Issue Description
The X300 uses a thread owned by the `chdr_ctrl_endpoint` class to poll for control ACKs and asynchronous command responses (`uhd/host/lib/rfnoc/chdr_ctrl_endpoint.cpp`, line 121 in 748162e).
Calls to receive UDP packets eventually reach a function that first uses `recv(..., MSG_DONTWAIT)` to check for packets and then, if none were available, uses `poll(..., timeout_ms)` to wait for them (`uhd/host/lib/include/uhdlib/transport/udp_common.hpp`, line 101 in 748162e).
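For readers who do not have the source open, here is a minimal sketch of the check-then-wait pattern described above. It is not a copy of the UHD function: the name `recv_with_timeout` and its signature are made up for illustration, and error handling is reduced to "return 0".

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>

// Hypothetical helper (NOT the actual UHD function): try a non-blocking read
// first; if nothing is queued, wait up to timeout_ms for the socket to become
// readable and then read again. Returns the number of bytes received, or 0 on
// timeout or error.
std::size_t recv_with_timeout(int fd, void* buf, std::size_t len, int timeout_ms)
{
    assert(buf != nullptr);

    // Fast path: a packet may already be waiting in the socket buffer.
    ssize_t n = ::recv(fd, buf, len, MSG_DONTWAIT);
    if (n > 0) {
        return static_cast<std::size_t>(n);
    }

    // Slow path: let the kernel wait for data. With timeout_ms == 0 this
    // returns immediately, which is the case discussed in this issue.
    pollfd pfd{};
    pfd.fd     = fd;
    pfd.events = POLLIN;
    if (::poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN)) {
        n = ::recv(fd, buf, len, MSG_DONTWAIT);
        if (n > 0) {
            return static_cast<std::size_t>(n);
        }
    }
    return 0;
}
```

Called with a timeout of `0`, the helper never blocks, so any waiting has to happen in the caller.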
However, the thread always passes a timeout of `0`, which causes `poll` to return as quickly as possible. If no packets are received, the thread attempts to sleep (`uhd/host/lib/rfnoc/chdr_ctrl_endpoint.cpp`, line 150 in 748162e).
If the system is under load, the kernel may not be able to sleep this thread in the time allotted (I'm guessing; see Additional Information). That leaves this thread tying up a CPU core while polling for UDP packets that arrive relatively infrequently.
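To make the cost of a zero timeout concrete, the following sketch counts how often `poll` returns on an idle socket within a fixed time window. It is a standalone experiment, not UHD code; `count_idle_polls` is a hypothetical name, and the real thread also tries to sleep between iterations, which this sketch leaves out.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <chrono>

// Hypothetical experiment (not UHD code): count how many times poll() returns
// within `window` when no data ever arrives on `fd`.
long count_idle_polls(int fd, int timeout_ms, std::chrono::milliseconds window)
{
    assert(timeout_ms >= 0);

    pollfd pfd{};
    pfd.fd     = fd;
    pfd.events = POLLIN;

    long calls          = 0;
    const auto deadline = std::chrono::steady_clock::now() + window;
    while (std::chrono::steady_clock::now() < deadline) {
        // With timeout_ms == 0 this returns right away and the loop spins;
        // with a non-zero timeout the kernel parks the thread in between.
        ::poll(&pfd, 1, timeout_ms);
        ++calls;
    }
    return calls;
}
```

Comparing `count_idle_polls(fd, 0, ...)` with `count_idle_polls(fd, 20, ...)` over the same window shows the difference: the zero-timeout loop is limited only by how fast the core can issue system calls, while the blocking variant wakes up a handful of times.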
The "right" solution probably consists of passing a non-zero timeout to `poll`, which would let the kernel block this thread until data arrives or the timeout expires. But currently, `poll` is called while holding a `mutex` shared with other threads that need to communicate with the device, including those sending commands the receiving thread needs to respond to. Thus, passing a non-zero timeout causes device initialization, for example, to take several minutes. The mutex is owned by this class.
Comments throughout the code refer to a "threaded_io_service" that needs to be developed to solve this problem. Right now, the only workaround I have found is to patch UHD to try to sleep this thread for a longer amount of time. A sleep time of `100us` has worked for me and doesn't seem to affect functional behavior, though I'm not 100% sure this has no side effects.
Setup Details
Expected Behavior
The thread named `uhd_ctrl_ep_<id>` should consume very little CPU time when executing the `benchmark_rate` example program.
Actual Behaviour
The thread named `uhd_ctrl_ep_<id>` consumes up to around 99% of CPU time when executing the `benchmark_rate` example program (and other applications).
Steps to reproduce the problem
top -H
benchmark_rate --args "addr=192.168.40.2" --rx_rate 200e6 --duration 60
Additional Information
I was able to reproduce the problem with lower sampling rates as well on the system described. But on another system (Ubuntu 20, Linux 5.11, Intel i9-9880H CPU, 16 cores) I was unable to reproduce the issue. I gave my guess above that the kernel is unable to sleep the thread in time on the "smaller" system when it is under load.
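On the locking concern from the issue description, one possible direction is to block in `poll` without holding the shared mutex and take the lock only for the non-blocking read. The sketch below is purely illustrative: `endpoint_sketch`, `io_mutex` and `recv_once` are hypothetical names, and whether UHD's transport can safely be polled outside its mutex is an assumption I have not verified.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for an endpoint (NOT the real chdr_ctrl_endpoint).
struct endpoint_sketch
{
    int fd = -1;
    std::mutex io_mutex; // stands in for the mutex shared with sender threads

    // Wait for readability with no lock held, then lock briefly to read.
    // Returns the number of bytes received, or 0 on timeout or error.
    std::size_t recv_once(void* buf, std::size_t len, int timeout_ms)
    {
        assert(buf != nullptr);

        pollfd pfd{};
        pfd.fd     = fd;
        pfd.events = POLLIN;

        // Senders can acquire io_mutex while this thread is parked here.
        if (::poll(&pfd, 1, timeout_ms) <= 0) {
            return 0;
        }

        // Critical section covers only the non-blocking read.
        std::lock_guard<std::mutex> lock(io_mutex);
        const ssize_t n = ::recv(fd, buf, len, MSG_DONTWAIT);
        return n > 0 ? static_cast<std::size_t>(n) : 0;
    }
};
```

With this shape, a non-zero timeout would no longer stall threads that need the mutex to send commands, because the lock is held only for the duration of a single `recv` call.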
Questions