B210 uhd::usb_error exception on exit with certain --args on Mac OS #68

jblackaby · 2016-07-28T16:35:02Z

I am getting an unhandled exception on exit when running the example rx_samples_to_file when I increase the num_recv_frames parameter. For example, I get the following:

./rx_samples_to_file --args "num_recv_frames=512" --file samples.dat --nsamps 10000000 --rate 25000000
Mac OS; Clang version 7.3.0 (clang-703.0.31); Boost_106100; UHD_003.009.004-1143-g2efc2bc5


Creating the usrp device with: num_recv_frames=512...
-- Loading firmware image: /<redacted>/uhd/share/uhd/images/usrp_b200_fw.hex...
-- Detected Device: B210
-- Loading FPGA image: /<redacted>/uhd/share/uhd/images/usrp_b210_fpga.bin... done
-- Operating over USB 3.
-- Detecting internal GPSDO.... No GPSDO found
-- Initialize CODEC control...
-- Initialize Radio control...
-- Performing register loopback test... pass
-- Performing register loopback test... pass
-- Performing CODEC loopback test... pass
-- Performing CODEC loopback test... pass
-- Asking for clock rate 16.000000 MHz... 
-- Actually got clock rate 16.000000 MHz.
-- Performing timer loopback test... pass
-- Performing timer loopback test... pass
-- Setting master clock rate selection to 'automatic'.
Using Device: Single USRP:
  Device: B-Series Device
  Mboard 0: B210
  RX Channel: 0
    RX DSP: 0
    RX Dboard: A
    RX Subdev: FE-RX2
  RX Channel: 1
    RX DSP: 1
    RX Dboard: A
    RX Subdev: FE-RX1
  TX Channel: 0
    TX DSP: 0
    TX Dboard: A
    TX Subdev: FE-TX2
  TX Channel: 1
    TX DSP: 1
    TX Dboard: A
    TX Subdev: FE-TX1

Setting RX Rate: 25.000000 Msps...
-- Asking for clock rate 25.000000 MHz... 
-- Actually got clock rate 25.000000 MHz.
-- Performing timer loopback test... pass
-- Performing timer loopback test... pass
Actual RX Rate: 25.000000 Msps...

Setting RX Freq: 0.000000 MHz...
Actual RX Freq: 37.500000 MHz...

Waiting for "lo_locked": ++++++++++ locked.


Done!


UHD Error:
    An unexpected exception was caught in a task loop.
    The task loop will now exit, things may not work.
    EnvironmentError: IOError: usb rx8 transfer status: LIBUSB_TRANSFER_CANCELLED
libc++abi.dylib: terminating with uncaught exception of type uhd::usb_error: RuntimeError: USBError -4: usb tx4 submit failed: LIBUSB_ERROR_NO_DEVICE
Abort trap: 6

The capture runs to completion and the samples in the file seem to be complete. I have also seen this problem when linking to libuhd in my own code and the exception is occurring on shutdown (in the multi_usrp destructor). During runtime, everything appears to be working properly.

I am building libuhd with libusb 1.0.20 and boost 1.61.0 and running on Mac OS 10.11.6. I am using the 3.9.4 release version of UHD. This issue did not occur on 3.8.x releases, but has been a problem ever since 3.9.0.

The recv_frame_size parameter also seems to have an effect. If I run with --args "num_recv_frames=256" it does not throw the exception, but if I run with --args "recv_frame_size=16376, num_recv_frames=256" does throw the exception on exit.

It appears that the total amount of buffering is related in some way. I am trying to increase the amount of buffering to the maximum possible to avoid overruns during real-time processing. One of the major problems caused by this crash is that the firmware has to be reloaded on startup after it occurs, which takes time.

The text was updated successfully, but these errors were encountered:

mbr0wn · 2016-08-02T16:39:51Z

@jblackaby we're not able to reproduce this here -- can you please post this to the mailing list? We'll be able to help more effectively there.

mbr0wn · 2016-08-15T17:51:01Z

@jblackaby Looks like we're (intermittently) able to reproduce this. Thanks for bringing it up! We don't have a good path forward yet for fixing this, but we can now look into it.

michaelld · 2016-10-06T12:24:45Z

@jblackaby I'm running macOS Sierra (10.12.0), Xcode 8.0.0, MacPorts latest from SVN/GIT, latest UHD from GIT maint/master, latest LIBUSB from GIT master. I can replicate the issue about 50% of the time. There have been changes to UHD between 3.8 to 3.9, and again from 3.9 to 3.10, which will affect the actual printed error -- but, the gist is the same. When this happens, the next time I use the B210, the FW images have to both be reloaded, so the issue here is really that the device isn't closing out USB robustly. Do you see this similarly?

jblackaby · 2016-10-06T16:07:42Z

@michaelld, Yeah. I just tried it again and I am seeing the same problem. My setup is pretty much identical to yours.

michael-west · 2016-10-06T16:54:54Z

This is the same error you get when you unplug the USB cable when the device is running, so it appears that something is causing the device to drop off the bus during the destruction of the multi_usrp object. It could be something in UHD, but it may be a USB controller issue. I seem to recall other users claiming that they cannot set the num_recv_frames above 128 or they see errors. The num_recv_frames is the number of bulk receive USB transfers that are queued up. It is a tunable parameter, so it is intended to be adjusted to suit the hardware. Some controllers may just not be able to handle that many transfers. I am currently looking into all things B2xx/USB, so I will add this to the list. Since the default values work and lowering the value of the num_recv_frames seems to resolve the issue, this will be a lower priority.

Increasing the num_recv_frames in this context is trying to use the USB transport to buffer up data because the disk I/O cannot sustain the rate. I'm not sure that is a good idea. Adding intermediate buffering between the USB I/O and disk I/O is probably a better option.

jblackaby · 2016-10-06T17:50:45Z

@michael-west, Thank you for taking a look at this.

I understand that the parameters can be set in a way that does not work with the hardware, but these parameters work in the 3.8.x series of UHD, so it seems to be something that changed in UHD between 3.8.5 and 3.9.0.

In my actual application I am performing a large amount of buffering right after receiving the data from UHD. I basically have a thread that is only reading from uhd and copying the data to a circular buffer in memory. I then pull data out of that circular buffer in a separate thread and do some processing on the samples.

In this configuration without raising num_recv_frames and recv_frame_size, data is still often dropped before I even get it from UHD without my buffer ever filling up. When I raise num_recv_frames and recv_frame_size, it greatly reduces the chance that data is dropped. This indicates to me that the USB buffer is overflowing before I even get the data. The processing I am performing is only moderate and is only be using around one CPU core on a 4 core machine, so I don't think the thread reading from UHD is CPU bound in any way. I feel like that thread is about as tight as it can be (basically just continually reading from UHD and copying the data to a circular buffer), yet I still can't service it fast enough not to drop data without those parameters raised.

michael-west · 2016-10-06T18:20:18Z

Good information. Thanks! Since there is a change from 3.8.5 to 3.9.0, it does look like a regression. I will look into it further.

michaelld · 2016-10-31T16:35:27Z

An obvious suspect is the file "host/lib/transport/libusb1_zero_copy.cpp", which is where the error is actually generated. 3.8.5 was released on July 21 (2015), and 3.9.0 on August 31 (2015). This file was modified in b08352f on July 29. The log for this commit even sounds about right for this issue: "Unhandled exceptions during destruction of multi_usrp object cause application to terminate".

michaelld · 2016-11-15T14:52:08Z

After some more debugging, here's my take-away: More likely the issue is with IOKit internally. It might be that UHD's USB interface could be tweaked to work correctly, but it's not clear what that tweaking might be because the UHD USB programming is so non-obvious. The best solution is probably to not cancel LUTs in ~libusb_zero_copy_single on macOS (or Mac OS X), but that causes other issues to be fixed elsewhere (which, honestly, should probably be fixed anyway so maybe this is the better way to go).

Details:

Apple's USB implementation in IOKit handles cancellations on a pipe-by-pipe basis, not a transfer-by-transfer basis. In ~libusb_zero_copy_single, the looping through to cancel all transfers (if one were to look at the value returned by "libusb_cancel_transfer") actually errors out about 1/2 way through, which I think is that since the IOKit call is non-blocking means that IOKit finally catches up to the cancel request & returns "yeah: cancel is in progress stop bugging me to keep cancelling!".
libusb seems to be doing the right thing internally with the LUT cancel. The issue seems to be IOKit itself -- somehow (maybe) the extraneous cancellations spill over and coerce IOKit into aborting the other UHD USB pipe. The debug printouts are very clear that rx8 transfers are aborted -before- the first rx8 LUT is cancelled in ~ libusb_zero_copy_single. Thus, the UHD USB code is doing the right thing here: it detects an error in transfer and throws a runtime error, when there is no other evidence that something might be going on (e.g., UHD is terminating).
At least in theory, cancelling LUTs is not necessary. In my testing, I remove the cancelling loop and add in a new _status of "STATUS_TERMINATING" that stops all new activity once set (sort of like STATUS_ERROR). This works until USB is closed, and then the close fails due to a lack of thread safety between the top-level libusb task and "libusb_close".
With (3) in place, it also looks like there are still transfers in place even after "wait_for_completion" is called. I think this is due to the way UHD keeps track of transfers ... the variable naming is kinda not obvious & I'm not sure the transfers are really kept track of for which are in use and which are not. There has -got- to be a better way.
IMHO, the whole USB interface is non-obvious how it works, nor does variable naming help, nor the use of variables. Maybe it's very C++ oriented, but it's a real PITA to debug. There has -got- to be a better way.

jblackaby · 2016-11-29T20:29:26Z

@michaelld, thank you for looking into this. It sounds complicated.

otherjason · 2017-02-03T13:23:56Z

@michael-west Any update on this? The big problem with this issue is that it causes a hard termination of the client application. When the error occurs, an exception gets thrown at one of the upper layers of the UHD code. That exception is thrown up to the client, but in the process of unwinding the stack from the frame that threw the exception, a second exception is thrown. Per C++'s defined behavior in this instance, std::terminate() is called, which terminates the user application. There's no good way from the outside of UHD to prevent this.

kyeshmz · 2018-02-27T11:08:22Z

Would love to see if there is an update on this as well.

jblackaby · 2018-02-27T15:19:15Z

We were able to find a workaround for this problem. First, do not issue a command to stop the stream with streamer->issue_stream_cmd(uhd::stream_cmd_t::STREAM_MODE_STOP_CONTINUOUS). Then add a 0.5 second pause between destroying the rx_streamer object and the multi_usrp object.

streamer.reset();
this_thread::sleep_for(chrono::milliseconds(500));
usrp.reset();

I'm not sure why this works. Maybe it allows all of the USB transfers to cleanly exit before destroying the multi_usrp object. This allows us to set the receive buffer size to 16200 and the number of receive frames to 1000 without crashing on shutdown. This provides around 16MB of buffering, which is enough to continuously pull data reliably at up to 50Msps.

michael-west · 2018-02-28T19:43:22Z

@jblackaby Thanks for the follow up. That is good information. That approach will cause all libusb transfers to complete so none have to be canceled. The sleep time is a function of sample rate, number of frames, and frame size (sleep = # of frames * frame size / sample rate). It's probably a good workaround until the root issue is solved. There is clearly an issue with canceling transfers on OSX that is not seen on Linux or Windows.

Apologies to all affected. This one has slipped through the cracks for quite a while now and deserves some attention. I will see what I can do to raise the priority.

Brandon2255p · 2018-03-01T13:49:38Z

I have experienced a similar issue. It is similar because I am getting the LIBUSB_TRANSFER_CANCELLED but it is happening on rx6 not rx8. Also I am not getting this issue on exit but rather when I try to get_rx_stream

I am using the B200Mini. I have compiled a debug version of the UHD 3.9.7 driver on Windows to try and find out why this is happening.. The stack trace is below:

>	uhdd.dll!libusb_zero_copy_mb::get_new<uhd::transport::managed_recv_buffer>(const double timeout) Line 152	C++
 	uhdd.dll!libusb_zero_copy_single::get_buff<uhd::transport::managed_recv_buffer>(double timeout) Line 309	C++
 	uhdd.dll!libusb_zero_copy_impl::get_recv_buff(double timeout) Line 394	C++
 	uhdd.dll!uhd::usrp::recv_packet_demuxer_3000::_internal_get_recv_buff(const unsigned int sid, const double timeout) Line 99	C++
 	uhdd.dll!uhd::usrp::recv_packet_demuxer_3000::get_recv_buff(const unsigned int sid, const double timeout) Line 51	C++
 	uhdd.dll!boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>::call<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000>,unsigned int,double>(boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> & u, const void * __formal, unsigned int & b1, double & b2) Line 271	C++
 	uhdd.dll!boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>::operator()<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> >(boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> & u, unsigned int a1, double a2) Line 286	C++
 	uhdd.dll!boost::_bi::list3<boost::_bi::value<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> >,boost::_bi::value<unsigned int>,boost::arg<1> >::operator()<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>,boost::_bi::rrlist1<double> >(boost::_bi::type<boost::intrusive_ptr<uhd::transport::managed_recv_buffer> > __formal, boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double> & f, boost::_bi::rrlist1<double> & a, long __formal) Line 388	C++
 	uhdd.dll!boost::_bi::bind_t<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>,boost::_bi::list3<boost::_bi::value<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> >,boost::_bi::value<unsigned int>,boost::arg<1> > >::operator()<double>(double && a1) Line 1306	C++
 	uhdd.dll!boost::detail::function::function_obj_invoker1<boost::_bi::bind_t<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>,boost::_bi::list3<boost::_bi::value<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> >,boost::_bi::value<unsigned int>,boost::arg<1> > >,boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,double>::invoke(boost::detail::function::function_buffer & function_obj_ptr, double a0) Line 138	C++
 	uhdd.dll!boost::function1<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,double>::operator()(double a0) Line 770	C++
 	uhdd.dll!uhd::transport::sph::recv_packet_handler::set_xport_chan_get_buff(const unsigned __int64 xport_chan, const boost::function<boost::intrusive_ptr<uhd::transport::managed_recv_buffer> __cdecl(double)> & get_buff, const bool flush) Line 150	C++
 	uhdd.dll!b200_impl::get_rx_stream(const uhd::stream_args_t & args_) Line 442	C++
 	uhdd.dll!multi_usrp_impl::get_rx_stream(const uhd::stream_args_t & args) Line 743	C++

It is throwing in the code below:

    template <typename buffer_type>
    UHD_INLINE typename buffer_type::sptr get_new(const double timeout)
    {
        if (wait_for_completion(timeout))
        {
            if (result.status != LIBUSB_TRANSFER_COMPLETED)
                throw uhd::io_error(str(boost::format("usb %s transfer status: %d")
                                        % _name % libusb_error_name(result.status)));

The timeout is set to 0.00000000000000000

I have no idea why this is happening ...
In my application I do the multi_usrp::make and then allow the user to change sample rate, bandwidth, gain and frequency parameters in a GUI before they start the rx stream. Is this extended time causing issues? How can I prevent it from happening (sorry I did not understand the workaround - I do not know how to cancel the LUTs)?

Brandon2255p · 2018-03-01T14:23:59Z

If I move the thread the get_rx_stream to directly after the multi_usrp::make then the error does not occur on get_rx_stream.
The exact same error will then occur on the first call to m_pRxStreamer->recv(..)

Seems like a timing issue of some sort. Please help if possible!

aholtzma-am · 2021-03-30T14:29:27Z

This still happens on UHD 4.0 / MacOS 11 when I adjust num_recv_frames.

michaelld · 2021-03-30T18:25:11Z

@aholtzma-am thanks for the update. in your UHD-based application, at closing have you tried first telling UHD to shut down and then waiting 5 seconds (or 2, or some reasonably short time) to allow USB transfers to complete? That did the trick for some other users, and is worth a try here.

michaelld · 2022-06-02T16:08:23Z

I haven't tested B2x0 with UHD 4.1 or UHD 4.2, nor current libusb (1.0.26) -- which I know contains some fixes for interfacing with Darwin IOKit from prior releases. I doubt this issue is fixed, given the workarounds & my research showing some of the issues are in IOKit. Is anyone still experiencing this issue? I will add doing testing here for my work queue ... no timeline yet since it doesn't seem urgent.

ernestp · 2023-11-23T03:23:32Z

This exception also happens on Android and it is crashing Android app as well, so it is critical issue in that case.

mbr0wn added the bug label Sep 26, 2016

MattEttus assigned michaelld Sep 26, 2016

natetemple added the Tracking Tracking issue on internal bug repo label Jul 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

B210 uhd::usb_error exception on exit with certain --args on Mac OS #68

B210 uhd::usb_error exception on exit with certain --args on Mac OS #68

jblackaby commented Jul 28, 2016

mbr0wn commented Aug 2, 2016

mbr0wn commented Aug 15, 2016

michaelld commented Oct 6, 2016

jblackaby commented Oct 6, 2016

michael-west commented Oct 6, 2016

jblackaby commented Oct 6, 2016 •

edited

michael-west commented Oct 6, 2016

michaelld commented Oct 31, 2016

michaelld commented Nov 15, 2016

jblackaby commented Nov 29, 2016

otherjason commented Feb 3, 2017

kyeshmz commented Feb 27, 2018

jblackaby commented Feb 27, 2018 •

edited

michael-west commented Feb 28, 2018

Brandon2255p commented Mar 1, 2018 •

edited

Brandon2255p commented Mar 1, 2018

aholtzma-am commented Mar 30, 2021

michaelld commented Mar 30, 2021

michaelld commented Jun 2, 2022

ernestp commented Nov 23, 2023

B210 uhd::usb_error exception on exit with certain --args on Mac OS #68

B210 uhd::usb_error exception on exit with certain --args on Mac OS #68

Comments

jblackaby commented Jul 28, 2016

mbr0wn commented Aug 2, 2016

mbr0wn commented Aug 15, 2016

michaelld commented Oct 6, 2016

jblackaby commented Oct 6, 2016

michael-west commented Oct 6, 2016

jblackaby commented Oct 6, 2016 • edited

michael-west commented Oct 6, 2016

michaelld commented Oct 31, 2016

michaelld commented Nov 15, 2016

jblackaby commented Nov 29, 2016

otherjason commented Feb 3, 2017

kyeshmz commented Feb 27, 2018

jblackaby commented Feb 27, 2018 • edited

michael-west commented Feb 28, 2018

Brandon2255p commented Mar 1, 2018 • edited

Brandon2255p commented Mar 1, 2018

aholtzma-am commented Mar 30, 2021

michaelld commented Mar 30, 2021

michaelld commented Jun 2, 2022

ernestp commented Nov 23, 2023

jblackaby commented Oct 6, 2016 •

edited

jblackaby commented Feb 27, 2018 •

edited

Brandon2255p commented Mar 1, 2018 •

edited