B210 uhd::usb_error exception on exit with certain --args on Mac OS #68
Comments
@jblackaby we're not able to reproduce this here -- can you please post this to the mailing list? We'll be able to help more effectively there.
@jblackaby Looks like we're (intermittently) able to reproduce this. Thanks for bringing it up! We don't have a good path forward yet for fixing this, but we can now look into it.
@jblackaby I'm running macOS Sierra (10.12.0), Xcode 8.0.0, MacPorts latest from SVN/GIT, latest UHD from GIT maint/master, latest LIBUSB from GIT master. I can replicate the issue about 50% of the time. There have been changes to UHD from 3.8 to 3.9, and again from 3.9 to 3.10, which will affect the actual printed error -- but the gist is the same. When this happens, the next time I use the B210, both FW images have to be reloaded, so the issue here is really that the device isn't closing out USB robustly. Do you see the same behavior?
@michaelld, Yeah. I just tried it again and I am seeing the same problem. My setup is pretty much identical to yours.
This is the same error you get when you unplug the USB cable when the device is running, so it appears that something is causing the device to drop off the bus during the destruction of the multi_usrp object. It could be something in UHD, but it may be a USB controller issue. I seem to recall other users claiming that they cannot set the num_recv_frames above 128 or they see errors. The num_recv_frames is the number of bulk receive USB transfers that are queued up. It is a tunable parameter, so it is intended to be adjusted to suit the hardware. Some controllers may just not be able to handle that many transfers. I am currently looking into all things B2xx/USB, so I will add this to the list. Since the default values work and lowering the value of the num_recv_frames seems to resolve the issue, this will be a lower priority. Increasing the num_recv_frames in this context is trying to use the USB transport to buffer up data because the disk I/O cannot sustain the rate. I'm not sure that is a good idea. Adding intermediate buffering between the USB I/O and disk I/O is probably a better option.
@michael-west, Thank you for taking a look at this. I understand that the parameters can be set in a way that does not work with the hardware, but these parameters work in the 3.8.x series of UHD, so it seems to be something that changed in UHD between 3.8.5 and 3.9.0. In my actual application I am performing a large amount of buffering right after receiving the data from UHD. I basically have a thread that is only reading from uhd and copying the data to a circular buffer in memory. I then pull data out of that circular buffer in a separate thread and do some processing on the samples. In this configuration without raising
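The two-thread arrangement described above (one thread that only reads from UHD and writes into a circular buffer, and a second thread that drains it) could be sketched roughly as follows. This is a minimal single-producer / single-consumer ring buffer, not the poster's actual code; the class name spsc_ring and its interface are hypothetical, and a real receiver would likely move whole blocks of samples rather than one element at a time.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical single-producer / single-consumer ring buffer.
// Not part of UHD; names are illustrative only.
template <typename T>
class spsc_ring {
public:
    // One slot is kept empty so that "full" and "empty" are distinguishable.
    explicit spsc_ring(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {}

    // Called only by the receive thread. Returns false when the ring is full,
    // which the caller would treat as an overrun.
    bool push(const T& v) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buf_[head] = v;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Called only by the processing thread. Returns false when the ring is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_;  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_;  // next slot to read (consumer-owned)
};
```

With a buffer like this sized in host memory, the amount of buffering no longer depends on how many USB transfers the controller can keep queued.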
Good information. Thanks! Since there is a change from 3.8.5 to 3.9.0, it does look like a regression. I will look into it further.
An obvious suspect is the file "host/lib/transport/libusb1_zero_copy.cpp", which is where the error is actually generated. 3.8.5 was released on July 21 (2015), and 3.9.0 on August 31 (2015). This file was modified in b08352f on July 29. The log for this commit even sounds about right for this issue: "Unhandled exceptions during destruction of multi_usrp object cause application to terminate".
After some more debugging, here's my take-away: More likely the issue is with IOKit internally. It might be that UHD's USB interface could be tweaked to work correctly, but it's not clear what that tweaking might be because the UHD USB programming is so non-obvious. The best solution is probably to not cancel LUTs in ~libusb_zero_copy_single on macOS (or Mac OS X), but that causes other issues to be fixed elsewhere (which, honestly, should probably be fixed anyway so maybe this is the better way to go). Details:
@michaelld, thank you for looking into this. It sounds complicated.
@michael-west Any update on this? The big problem with this issue is that it causes a hard termination of the client application. When the error occurs, an exception gets thrown at one of the upper layers of the UHD code. That exception is thrown up to the client, but in the process of unwinding the stack from the frame that threw the exception, a second exception is thrown. Per C++'s defined behavior in this instance, std::terminate() is called, which terminates the user application. There's no good way from the outside of UHD to prevent this.
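To illustrate the failure mode described above: when cleanup code throws while another exception is already unwinding the stack, the runtime calls std::terminate(). Since C++11, destructors are also implicitly noexcept, so an exception escaping one terminates the program even with no other exception in flight. The usual defence is to catch inside the destructor. The sketch below uses a hypothetical stand-in type (guarded_transport) and does not reflect how UHD itself is written.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a transport object whose cleanup can fail.
struct guarded_transport {
    bool fail_on_close;
    std::string* log;  // where the swallowed cleanup error is recorded

    ~guarded_transport() {
        try {
            if (fail_on_close) {
                throw std::runtime_error("usb error during cancel");
            }
        } catch (const std::exception& e) {
            // Swallow the error here. If it escaped the destructor,
            // std::terminate() would be called.
            if (log) *log = e.what();
        }
    }
};

// Throws while a guarded_transport is alive, so its destructor runs
// during stack unwinding for the primary exception.
std::string run_and_fail() {
    std::string cleanup_log;
    try {
        guarded_transport t{true, &cleanup_log};
        throw std::runtime_error("primary failure");
    } catch (const std::runtime_error& e) {
        return std::string(e.what()) + " | " + cleanup_log;
    }
}
```

Here the primary exception still reaches the caller, and the cleanup failure is recorded instead of ending the process. Removing the try/catch in the destructor would turn the same scenario into a hard termination.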
Would love to see if there is an update on this as well.
We were able to find a workaround for this problem. First, do not issue a command to stop the stream with
streamer.reset();
this_thread::sleep_for(chrono::milliseconds(500));
usrp.reset();
I'm not sure why this works. Maybe it allows all of the USB transfers to cleanly exit before destroying the
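The ordering in this workaround (release the streamer, wait, then release the device) can be tried out in isolation with stand-in types. The sketch below does not call UHD: fake_device and fake_streamer are hypothetical placeholders for the multi_usrp and streamer objects, which the workaround likewise releases through reset(). It only demonstrates the destruction order.

```cpp
#include <chrono>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins; each records when it is destroyed.
struct fake_device {
    explicit fake_device(std::vector<std::string>& ev) : events(ev) {}
    ~fake_device() { events.push_back("device destroyed"); }
    std::vector<std::string>& events;
};

struct fake_streamer {
    explicit fake_streamer(std::vector<std::string>& ev) : events(ev) {}
    ~fake_streamer() { events.push_back("streamer destroyed"); }
    std::vector<std::string>& events;
};

// Applies the shutdown order from the workaround and returns the event log.
std::vector<std::string> shutdown_in_order(std::chrono::milliseconds drain) {
    std::vector<std::string> events;
    auto usrp = std::make_shared<fake_device>(events);
    auto streamer = std::make_shared<fake_streamer>(events);

    streamer.reset();                    // 1. release the streamer first
    events.push_back("drain wait");
    std::this_thread::sleep_for(drain);  // 2. give queued transfers time to finish
    usrp.reset();                        // 3. release the device last
    return events;
}
```

Calling shutdown_in_order(std::chrono::milliseconds(500)) mirrors the 500 ms delay quoted in the workaround. Whether that delay is long enough depends on the stream configuration.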
@jblackaby Thanks for the follow up. That is good information. That approach will cause all libusb transfers to complete so none have to be canceled. The sleep time is a function of sample rate, number of frames, and frame size (sleep = # of frames * frame size / sample rate). It's probably a good workaround until the root issue is solved. There is clearly an issue with canceling transfers on OSX that is not seen on Linux or Windows. Apologies to all affected. This one has slipped through the cracks for quite a while now and deserves some attention. I will see what I can do to raise the priority.
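As a rough worked example of the sleep formula above, the helper below estimates how much stream time the queued receive frames represent. It assumes the frame size is given in bytes and converts it to samples with a bytes-per-sample factor; that conversion, the function name drain_time, and the 10 MS/s rate used in the usage note are assumptions for illustration, not values from this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: time represented by the queued USB receive frames.
//   samples = num_frames * frame_size_bytes / bytes_per_sample
//   time    = samples / sample_rate_hz
std::chrono::duration<double> drain_time(std::size_t num_frames,
                                         std::size_t frame_size_bytes,
                                         std::size_t bytes_per_sample,
                                         double sample_rate_hz) {
    assert(bytes_per_sample > 0 && sample_rate_hz > 0.0);
    const double samples = static_cast<double>(num_frames)
                         * static_cast<double>(frame_size_bytes)
                         / static_cast<double>(bytes_per_sample);
    return std::chrono::duration<double>(samples / sample_rate_hz);
}
```

For the arguments reported in this issue (num_recv_frames=256, recv_frame_size=16376), assuming 4 bytes per complex 16-bit sample and an assumed 10 MS/s rate, drain_time(256, 16376, 4, 10e6) comes to about 0.105 seconds. Adding some margin on top of such an estimate seems prudent, since it ignores scheduling and controller latency.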
I have experienced a similar issue. It is similar in that I am getting LIBUSB_TRANSFER_CANCELLED, but it is happening on rx6, not rx8. Also I am not getting this issue on exit but rather when I try to I am using the B200Mini. I have compiled a debug version of the UHD 3.9.7 driver on Windows to try and find out why this is happening. The stack trace is below:
It is throwing in the code below:
The timeout is set to I have no idea why this is happening ...
If I move the thread the Seems like a timing issue of some sort. Please help if possible!
This still happens on UHD 4.0 / MacOS 11 when I adjust num_recv_frames.
@aholtzma-am thanks for the update. In your UHD-based application, at closing, have you tried first telling UHD to shut down and then waiting 5 seconds (or 2, or some reasonably short time) to allow USB transfers to complete? That did the trick for some other users, and is worth a try here.
I haven't tested B2x0 with UHD 4.1 or UHD 4.2, nor with current libusb (1.0.26) -- which I know contains some fixes for interfacing with Darwin IOKit from prior releases. I doubt this issue is fixed, given the workarounds and my research showing some of the issues are in IOKit. Is anyone still experiencing this issue? I will add testing this to my work queue ... no timeline yet since it doesn't seem urgent.
This exception also happens on Android, where it crashes the Android app as well, so it is a critical issue in that case.
I am getting an unhandled exception on exit when running the example rx_samples_to_file when I increase the num_recv_frames parameter. For example, I get the following:
The capture runs to completion and the samples in the file seem to be complete. I have also seen this problem when linking to libuhd in my own code and the exception is occurring on shutdown (in the multi_usrp destructor). During runtime, everything appears to be working properly.
I am building libuhd with libusb 1.0.20 and boost 1.61.0 and running on Mac OS 10.11.6. I am using the 3.9.4 release version of UHD. This issue did not occur on 3.8.x releases, but has been a problem ever since 3.9.0.
The recv_frame_size parameter also seems to have an effect. If I run with --args "num_recv_frames=256" it does not throw the exception, but if I run with --args "recv_frame_size=16376, num_recv_frames=256" it does throw the exception on exit.
It appears that the total amount of buffering is related in some way. I am trying to increase the amount of buffering to the maximum possible to avoid overruns during real-time processing. One of the major problems caused by this crash is that the firmware has to be reloaded on startup after it occurs, which takes time.