Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B210 uhd::usb_error exception on exit with certain --args on Mac OS #68

Open
jblackaby opened this issue Jul 28, 2016 · 20 comments
Open
Assignees
Labels
bug Tracking Tracking issue on internal bug repo

Comments

@jblackaby
Copy link

I am getting an unhandled exception on exit when running the example rx_samples_to_file when I increase the num_recv_frames parameter. For example, I get the following:

./rx_samples_to_file --args "num_recv_frames=512" --file samples.dat --nsamps 10000000 --rate 25000000
Mac OS; Clang version 7.3.0 (clang-703.0.31); Boost_106100; UHD_003.009.004-1143-g2efc2bc5


Creating the usrp device with: num_recv_frames=512...
-- Loading firmware image: /<redacted>/uhd/share/uhd/images/usrp_b200_fw.hex...
-- Detected Device: B210
-- Loading FPGA image: /<redacted>/uhd/share/uhd/images/usrp_b210_fpga.bin... done
-- Operating over USB 3.
-- Detecting internal GPSDO.... No GPSDO found
-- Initialize CODEC control...
-- Initialize Radio control...
-- Performing register loopback test... pass
-- Performing register loopback test... pass
-- Performing CODEC loopback test... pass
-- Performing CODEC loopback test... pass
-- Asking for clock rate 16.000000 MHz... 
-- Actually got clock rate 16.000000 MHz.
-- Performing timer loopback test... pass
-- Performing timer loopback test... pass
-- Setting master clock rate selection to 'automatic'.
Using Device: Single USRP:
  Device: B-Series Device
  Mboard 0: B210
  RX Channel: 0
    RX DSP: 0
    RX Dboard: A
    RX Subdev: FE-RX2
  RX Channel: 1
    RX DSP: 1
    RX Dboard: A
    RX Subdev: FE-RX1
  TX Channel: 0
    TX DSP: 0
    TX Dboard: A
    TX Subdev: FE-TX2
  TX Channel: 1
    TX DSP: 1
    TX Dboard: A
    TX Subdev: FE-TX1

Setting RX Rate: 25.000000 Msps...
-- Asking for clock rate 25.000000 MHz... 
-- Actually got clock rate 25.000000 MHz.
-- Performing timer loopback test... pass
-- Performing timer loopback test... pass
Actual RX Rate: 25.000000 Msps...

Setting RX Freq: 0.000000 MHz...
Actual RX Freq: 37.500000 MHz...

Waiting for "lo_locked": ++++++++++ locked.


Done!


UHD Error:
    An unexpected exception was caught in a task loop.
    The task loop will now exit, things may not work.
    EnvironmentError: IOError: usb rx8 transfer status: LIBUSB_TRANSFER_CANCELLED
libc++abi.dylib: terminating with uncaught exception of type uhd::usb_error: RuntimeError: USBError -4: usb tx4 submit failed: LIBUSB_ERROR_NO_DEVICE
Abort trap: 6

The capture runs to completion and the samples in the file seem to be complete. I have also seen this problem when linking to libuhd in my own code and the exception is occurring on shutdown (in the multi_usrp destructor). During runtime, everything appears to be working properly.

I am building libuhd with libusb 1.0.20 and boost 1.61.0 and running on Mac OS 10.11.6. I am using the 3.9.4 release version of UHD. This issue did not occur on 3.8.x releases, but has been a problem ever since 3.9.0.

The recv_frame_size parameter also seems to have an effect. If I run with --args "num_recv_frames=256" it does not throw the exception, but if I run with --args "recv_frame_size=16376, num_recv_frames=256" does throw the exception on exit.

It appears that the total amount of buffering is related in some way. I am trying to increase the amount of buffering to the maximum possible to avoid overruns during real-time processing. One of the major problems caused by this crash is that the firmware has to be reloaded on startup after it occurs, which takes time.

@mbr0wn
Copy link
Contributor

mbr0wn commented Aug 2, 2016

@jblackaby we're not able to reproduce this here -- can you please post this to the mailing list? We'll be able to help more effectively there.

@mbr0wn
Copy link
Contributor

mbr0wn commented Aug 15, 2016

@jblackaby Looks like we're (intermittently) able to reproduce this. Thanks for bringing it up! We don't have a good path forward yet for fixing this, but we can now look into it.

@michaelld
Copy link
Collaborator

@jblackaby I'm running macOS Sierra (10.12.0), Xcode 8.0.0, MacPorts latest from SVN/GIT, latest UHD from GIT maint/master, latest LIBUSB from GIT master. I can replicate the issue about 50% of the time. There have been changes to UHD between 3.8 to 3.9, and again from 3.9 to 3.10, which will affect the actual printed error -- but, the gist is the same. When this happens, the next time I use the B210, the FW images have to both be reloaded, so the issue here is really that the device isn't closing out USB robustly. Do you see this similarly?

@jblackaby
Copy link
Author

@michaelld, Yeah. I just tried it again and I am seeing the same problem. My setup is pretty much identical to yours.

@michael-west
Copy link
Contributor

This is the same error you get when you unplug the USB cable when the device is running, so it appears that something is causing the device to drop off the bus during the destruction of the multi_usrp object. It could be something in UHD, but it may be a USB controller issue. I seem to recall other users claiming that they cannot set the num_recv_frames above 128 or they see errors. The num_recv_frames is the number of bulk receive USB transfers that are queued up. It is a tunable parameter, so it is intended to be adjusted to suit the hardware. Some controllers may just not be able to handle that many transfers. I am currently looking into all things B2xx/USB, so I will add this to the list. Since the default values work and lowering the value of the num_recv_frames seems to resolve the issue, this will be a lower priority.

Increasing the num_recv_frames in this context is trying to use the USB transport to buffer up data because the disk I/O cannot sustain the rate. I'm not sure that is a good idea. Adding intermediate buffering between the USB I/O and disk I/O is probably a better option.

@jblackaby
Copy link
Author

jblackaby commented Oct 6, 2016

@michael-west, Thank you for taking a look at this.

I understand that the parameters can be set in a way that does not work with the hardware, but these parameters work in the 3.8.x series of UHD, so it seems to be something that changed in UHD between 3.8.5 and 3.9.0.

In my actual application I am performing a large amount of buffering right after receiving the data from UHD. I basically have a thread that is only reading from uhd and copying the data to a circular buffer in memory. I then pull data out of that circular buffer in a separate thread and do some processing on the samples.

In this configuration without raising num_recv_frames and recv_frame_size, data is still often dropped before I even get it from UHD without my buffer ever filling up. When I raise num_recv_frames and recv_frame_size, it greatly reduces the chance that data is dropped. This indicates to me that the USB buffer is overflowing before I even get the data. The processing I am performing is only moderate and is only be using around one CPU core on a 4 core machine, so I don't think the thread reading from UHD is CPU bound in any way. I feel like that thread is about as tight as it can be (basically just continually reading from UHD and copying the data to a circular buffer), yet I still can't service it fast enough not to drop data without those parameters raised.

@michael-west
Copy link
Contributor

Good information. Thanks! Since there is a change from 3.8.5 to 3.9.0, it does look like a regression. I will look into it further.

@michaelld
Copy link
Collaborator

An obvious suspect is the file "host/lib/transport/libusb1_zero_copy.cpp", which is where the error is actually generated. 3.8.5 was released on July 21 (2015), and 3.9.0 on August 31 (2015). This file was modified in b08352f on July 29. The log for this commit even sounds about right for this issue: "Unhandled exceptions during destruction of multi_usrp object cause application to terminate".

@michaelld
Copy link
Collaborator

After some more debugging, here's my take-away: More likely the issue is with IOKit internally. It might be that UHD's USB interface could be tweaked to work correctly, but it's not clear what that tweaking might be because the UHD USB programming is so non-obvious. The best solution is probably to not cancel LUTs in ~libusb_zero_copy_single on macOS (or Mac OS X), but that causes other issues to be fixed elsewhere (which, honestly, should probably be fixed anyway so maybe this is the better way to go).

Details:

  1. Apple's USB implementation in IOKit handles cancellations on a pipe-by-pipe basis, not a transfer-by-transfer basis. In ~libusb_zero_copy_single, the looping through to cancel all transfers (if one were to look at the value returned by "libusb_cancel_transfer") actually errors out about 1/2 way through, which I think is that since the IOKit call is non-blocking means that IOKit finally catches up to the cancel request & returns "yeah: cancel is in progress stop bugging me to keep cancelling!".

  2. libusb seems to be doing the right thing internally with the LUT cancel. The issue seems to be IOKit itself -- somehow (maybe) the extraneous cancellations spill over and coerce IOKit into aborting the other UHD USB pipe. The debug printouts are very clear that rx8 transfers are aborted -before- the first rx8 LUT is cancelled in ~ libusb_zero_copy_single. Thus, the UHD USB code is doing the right thing here: it detects an error in transfer and throws a runtime error, when there is no other evidence that something might be going on (e.g., UHD is terminating).

  3. At least in theory, cancelling LUTs is not necessary. In my testing, I remove the cancelling loop and add in a new _status of "STATUS_TERMINATING" that stops all new activity once set (sort of like STATUS_ERROR). This works until USB is closed, and then the close fails due to a lack of thread safety between the top-level libusb task and "libusb_close".

  4. With (3) in place, it also looks like there are still transfers in place even after "wait_for_completion" is called. I think this is due to the way UHD keeps track of transfers ... the variable naming is kinda not obvious & I'm not sure the transfers are really kept track of for which are in use and which are not. There has -got- to be a better way.

  5. IMHO, the whole USB interface is non-obvious how it works, nor does variable naming help, nor the use of variables. Maybe it's very C++ oriented, but it's a real PITA to debug. There has -got- to be a better way.

@jblackaby
Copy link
Author

@michaelld, thank you for looking into this. It sounds complicated.

@otherjason
Copy link

@michael-west Any update on this? The big problem with this issue is that it causes a hard termination of the client application. When the error occurs, an exception gets thrown at one of the upper layers of the UHD code. That exception is thrown up to the client, but in the process of unwinding the stack from the frame that threw the exception, a second exception is thrown. Per C++'s defined behavior in this instance, std::terminate() is called, which terminates the user application. There's no good way from the outside of UHD to prevent this.

@kyeshmz
Copy link

kyeshmz commented Feb 27, 2018

Would love to see if there is an update on this as well.

@jblackaby
Copy link
Author

jblackaby commented Feb 27, 2018

We were able to find a workaround for this problem. First, do not issue a command to stop the stream with streamer->issue_stream_cmd(uhd::stream_cmd_t::STREAM_MODE_STOP_CONTINUOUS). Then add a 0.5 second pause between destroying the rx_streamer object and the multi_usrp object.

streamer.reset();
this_thread::sleep_for(chrono::milliseconds(500));
usrp.reset();

I'm not sure why this works. Maybe it allows all of the USB transfers to cleanly exit before destroying the multi_usrp object. This allows us to set the receive buffer size to 16200 and the number of receive frames to 1000 without crashing on shutdown. This provides around 16MB of buffering, which is enough to continuously pull data reliably at up to 50Msps.

@michael-west
Copy link
Contributor

@jblackaby Thanks for the follow up. That is good information. That approach will cause all libusb transfers to complete so none have to be canceled. The sleep time is a function of sample rate, number of frames, and frame size (sleep = # of frames * frame size / sample rate). It's probably a good workaround until the root issue is solved. There is clearly an issue with canceling transfers on OSX that is not seen on Linux or Windows.

Apologies to all affected. This one has slipped through the cracks for quite a while now and deserves some attention. I will see what I can do to raise the priority.

@Brandon2255p
Copy link

Brandon2255p commented Mar 1, 2018

I have experienced a similar issue. It is similar because I am getting the LIBUSB_TRANSFER_CANCELLED but it is happening on rx6 not rx8. Also I am not getting this issue on exit but rather when I try to get_rx_stream

I am using the B200Mini. I have compiled a debug version of the UHD 3.9.7 driver on Windows to try and find out why this is happening.. The stack trace is below:

>	uhdd.dll!libusb_zero_copy_mb::get_new<uhd::transport::managed_recv_buffer>(const double timeout) Line 152	C++
 	uhdd.dll!libusb_zero_copy_single::get_buff<uhd::transport::managed_recv_buffer>(double timeout) Line 309	C++
 	uhdd.dll!libusb_zero_copy_impl::get_recv_buff(double timeout) Line 394	C++
 	uhdd.dll!uhd::usrp::recv_packet_demuxer_3000::_internal_get_recv_buff(const unsigned int sid, const double timeout) Line 99	C++
 	uhdd.dll!uhd::usrp::recv_packet_demuxer_3000::get_recv_buff(const unsigned int sid, const double timeout) Line 51	C++
 	uhdd.dll!boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>::call<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000>,unsigned int,double>(boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> & u, const void * __formal, unsigned int & b1, double & b2) Line 271	C++
 	uhdd.dll!boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>::operator()<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> >(boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> & u, unsigned int a1, double a2) Line 286	C++
 	uhdd.dll!boost::_bi::list3<boost::_bi::value<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> >,boost::_bi::value<unsigned int>,boost::arg<1> >::operator()<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>,boost::_bi::rrlist1<double> >(boost::_bi::type<boost::intrusive_ptr<uhd::transport::managed_recv_buffer> > __formal, boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double> & f, boost::_bi::rrlist1<double> & a, long __formal) Line 388	C++
 	uhdd.dll!boost::_bi::bind_t<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>,boost::_bi::list3<boost::_bi::value<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> >,boost::_bi::value<unsigned int>,boost::arg<1> > >::operator()<double>(double && a1) Line 1306	C++
 	uhdd.dll!boost::detail::function::function_obj_invoker1<boost::_bi::bind_t<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,boost::_mfi::mf2<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,uhd::usrp::recv_packet_demuxer_3000,unsigned int,double>,boost::_bi::list3<boost::_bi::value<boost::shared_ptr<uhd::usrp::recv_packet_demuxer_3000> >,boost::_bi::value<unsigned int>,boost::arg<1> > >,boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,double>::invoke(boost::detail::function::function_buffer & function_obj_ptr, double a0) Line 138	C++
 	uhdd.dll!boost::function1<boost::intrusive_ptr<uhd::transport::managed_recv_buffer>,double>::operator()(double a0) Line 770	C++
 	uhdd.dll!uhd::transport::sph::recv_packet_handler::set_xport_chan_get_buff(const unsigned __int64 xport_chan, const boost::function<boost::intrusive_ptr<uhd::transport::managed_recv_buffer> __cdecl(double)> & get_buff, const bool flush) Line 150	C++
 	uhdd.dll!b200_impl::get_rx_stream(const uhd::stream_args_t & args_) Line 442	C++
 	uhdd.dll!multi_usrp_impl::get_rx_stream(const uhd::stream_args_t & args) Line 743	C++

It is throwing in the code below:

    template <typename buffer_type>
    UHD_INLINE typename buffer_type::sptr get_new(const double timeout)
    {
        if (wait_for_completion(timeout))
        {
            if (result.status != LIBUSB_TRANSFER_COMPLETED)
                throw uhd::io_error(str(boost::format("usb %s transfer status: %d")
                                        % _name % libusb_error_name(result.status)));

The timeout is set to 0.00000000000000000

I have no idea why this is happening ...
In my application I do the multi_usrp::make and then allow the user to change sample rate, bandwidth, gain and frequency parameters in a GUI before they start the rx stream. Is this extended time causing issues? How can I prevent it from happening (sorry I did not understand the workaround - I do not know how to cancel the LUTs)?

@Brandon2255p
Copy link

If I move the thread the get_rx_stream to directly after the multi_usrp::make then the error does not occur on get_rx_stream.
The exact same error will then occur on the first call to m_pRxStreamer->recv(..)

Seems like a timing issue of some sort. Please help if possible!

@natetemple natetemple added the Tracking Tracking issue on internal bug repo label Jul 26, 2019
@aholtzma-am
Copy link

This still happens on UHD 4.0 / MacOS 11 when I adjust num_recv_frames.

@michaelld
Copy link
Collaborator

@aholtzma-am thanks for the update. in your UHD-based application, at closing have you tried first telling UHD to shut down and then waiting 5 seconds (or 2, or some reasonably short time) to allow USB transfers to complete? That did the trick for some other users, and is worth a try here.

@michaelld
Copy link
Collaborator

I haven't tested B2x0 with UHD 4.1 or UHD 4.2, nor current libusb (1.0.26) -- which I know contains some fixes for interfacing with Darwin IOKit from prior releases. I doubt this issue is fixed, given the workarounds & my research showing some of the issues are in IOKit. Is anyone still experiencing this issue? I will add doing testing here for my work queue ... no timeline yet since it doesn't seem urgent.

@ernestp
Copy link

ernestp commented Nov 23, 2023

This exception also happens on Android and it is crashing Android app as well, so it is critical issue in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Tracking Tracking issue on internal bug repo
Projects
None yet
Development

No branches or pull requests

10 participants