Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sr_write does not propperly prevent from overwriting send-ethernet-buffer when writing too fast #258

Closed
Kuurusch opened this issue Mar 4, 2019 · 17 comments
Labels
Tracking Tracking issue on internal bug repo

Comments

@Kuurusch
Copy link

Kuurusch commented Mar 4, 2019

Issue Description

We have encountered, that when we write too many values too fast one after each other with sr_write() on a device3 (RFNoC), we get en error where it says, some packet-number is before an other. But this comes, because we overwrite a full buffer. When we write slower (with more time between each sr_write()-call, it works without problem.

But I would expect, that this is handled by the API and it should check, before writing to the ethernet-send-buffer, if it is not full!!

@mbr0wn
Copy link
Contributor

mbr0wn commented Mar 4, 2019

@Kuurusch On which device?

@mbr0wn
Copy link
Contributor

mbr0wn commented Mar 4, 2019

Also, is this on a custom RFNoC block? If so, did you set CMD_FIFO_SIZE to a custom value?

@mbr0wn
Copy link
Contributor

mbr0wn commented Mar 5, 2019

If it is a custom RFNoC Block, try setting CMD_FIFO_SIZE to 8 (see noc_block_radio_core.v).

@mbr0wn
Copy link
Contributor

mbr0wn commented Mar 5, 2019

(If that works, it's still a bug on our side, but it would narrow down the source).

@Kuurusch
Copy link
Author

@Kuurusch On which device?

on a x310 with ubx-160 daughterboards.

@Kuurusch
Copy link
Author

If it is a custom RFNoC Block, try setting CMD_FIFO_SIZE to 8 (see noc_block_radio_core.v).

Yes, it is a custome RFNoC-block. We will try, thank you very much!

@Kuurusch
Copy link
Author

The error we get is the following:
terminate called after throwing an instance of 'uhd::io_error'
what(): EnvironmentError: IOError: [0/AGCCalibration_1] sr_write() failed: EnvironmentError: IOError: Block ctrl (CE_10_Port_D0) packet parse error - EnvironmentError: IOError: Expected packet index: 1129 Received index: 1147

@Kuurusch
Copy link
Author

So we've resized the fifo, but no difference!

@mbr0wn
Copy link
Contributor

mbr0wn commented Mar 14, 2019

@Kuurusch Huh, I was not expecting that. We have a fix for a (maybe? hopefully?) related but in the pipeline, give us a few days to push it out and then we'll try again.

@Kuurusch
Copy link
Author

@mbr0wn Thank you very much!

@mbr0wn
Copy link
Contributor

mbr0wn commented Mar 20, 2019

@Kuurusch You can apply this in the meantime and see if it helps:

From e9dc02d796e8a3206d3847c7d24fc39e78dff472 Mon Sep 17 00:00:00 2001
From: Sugandha Gupta <[email protected]>
Date: Tue, 12 Mar 2019 10:27:23 -0700
Subject: [PATCH] rfnoc: noc_shell: Change default address for the reg
 readbacks

The rb_stb signal is used in the user logic (e.g. simple_spi_core
in the radio) as a means to back pressure and prevent spi setting
register writes during ongoing spi transactions. While in the
noc_shell register it is used more as a "valid" or "done" signal
for register reads. Without this fix, the default rb_addr is the
last used rb address and can affect the behavior of rb_stb/back
pressure in the spi core, which eventually results in spi reg
writes during a spi transaction. Changing the default to RB_USER,
allows the user to control the behavior of rb_stb.
---
 usrp3/lib/rfnoc/cmd_pkt_proc.v | 6 +++++-
 usrp3/lib/rfnoc/noc_shell.v    | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/usrp3/lib/rfnoc/cmd_pkt_proc.v b/usrp3/lib/rfnoc/cmd_pkt_proc.v
index 1c1f422db..8506a89bf 100644
--- a/usrp3/lib/rfnoc/cmd_pkt_proc.v
+++ b/usrp3/lib/rfnoc/cmd_pkt_proc.v
@@ -52,6 +52,7 @@ module cmd_pkt_proc #(
   // TODO: Eliminate extra readback address output once NoC Shell / user register readback address spaces are merged
   parameter SR_RB_ADDR = 0,         // Settings bus address to set NoC Shell readback address register
   parameter SR_RB_ADDR_USER = 1,    // Settings bus address to set user readback address register
+  parameter DEFAULT_RB_ADDR = 0,    // Default address for the readback address bus
   parameter FIFO_SIZE = 5           // Depth of command FIFO
 )(
   input clk, input reset, input clear,
@@ -150,7 +151,7 @@ module cmd_pkt_proc #(
       resp_tvalid         <= 1'b0;
       set_stb             <= 1'b0;
       set_has_time        <= 1'b0;
-      rb_addr             <= 'd0;
+      rb_addr             <= DEFAULT_RB_ADDR;
       rb_addr_user        <= 'd0;
     end else begin
       case (state)
@@ -212,6 +213,9 @@ module cmd_pkt_proc #(
               rb_addr       <= int_tdata[RB_AWIDTH-1:0];
             end else if (set_rb_addr_user) begin
               rb_addr_user  <= int_tdata[RB_USER_AWIDTH-1:0];
+              rb_addr       <= DEFAULT_RB_ADDR;
+            end else begin
+              rb_addr       <= DEFAULT_RB_ADDR;
             end
             set_stb         <= 1'b1;
             if (int_tlast) begin
diff --git a/usrp3/lib/rfnoc/noc_shell.v b/usrp3/lib/rfnoc/noc_shell.v
index b66688aed..aef26a3b2 100644
--- a/usrp3/lib/rfnoc/noc_shell.v
+++ b/usrp3/lib/rfnoc/noc_shell.v
@@ -249,6 +249,7 @@ module noc_shell
          .USE_TIME(USE_TIMED_CMDS),
          .SR_RB_ADDR(SR_RB_ADDR),
          .SR_RB_ADDR_USER(SR_RB_ADDR_USER),
+         .DEFAULT_RB_ADDR(RB_USER_RB_DATA),
          .FIFO_SIZE(CMD_FIFO_SIZE[8*k+7:8*k]))
        cmd_pkt_proc (
          .clk(clk), .reset(reset), .clear(1'b0),
-- 
2.20.1

@Kuurusch
Copy link
Author

@mbr0wn Hi, we hadn't time to test this out but we just moved from UHD 3.14.0.0-rc1 to the final release 3.14.0.0. But we saw, that nothing has changed in the FPGA-files. Where have you introduced your changes?

@mbr0wn
Copy link
Contributor

mbr0wn commented Apr 23, 2019

Good question; they haven't been merged yet. I will ping the team and see what's up with that changeset.

@rigoorozco
Copy link

I'm having a similar issue:
IOError: Block ctrl (CE_02_Port_30) packet parse error - EnvironmentError: IOError: Expected packet index: 48 Received index: 49

I changed CMD_FIFO_SIZE to 8 and applied the patch above but it did not have any effect.

I noticed the radio block also sets RESP_FIFO_SIZE to 5. This buffers response packets coming out of the command packet processor. Could this help? I'm currently building this now..

@natetemple natetemple added the Tracking Tracking issue on internal bug repo label Jul 26, 2019
@patientCat
Copy link

patientCat commented Dec 26, 2019

hi current
I'm having a similar issue problem in X310 in UHD-3.14.1.1-release

when I do this

usrp_->set_rx_rate(sampRate);

the sampeRate is 50e6, it's very easy to throw this exception when I recevie huge data.
uhd::io_error EnvironmentError: IOError: [0/Radio_0] sr_write() failed: EnvironmentError: IOError: Block ctrl (CE_01_Port_40) packet parse error - EnvironmentError: IOError: Expected packet index: 2360 Received index: 2361
I just use the function RecevieNSymbol some times, then UHD will throw exception.
the recvSymbol is about 10e6.

For testing this, I download gnuradio and use `/usr/local/bin/uhd_fft -f800e6 -s50e6' , it also will throw this exception when it's running for about 1minutes. When I change the sample rate to 20e6, it seems work well.

void UsrpObj::ReceiveNSymbol(ComplexType *hostBuff, size_t nSymbol)
{
	uhd::stream_cmd_t stream_cmd(uhd::stream_cmd_t::STREAM_MODE_START_CONTINUOUS);
	stream_cmd.num_samps = nSymbol;
	stream_cmd.stream_now = true;
	stream_cmd.time_spec = uhd::time_spec_t();
	//stream_cmd.time_spec = usrp_->get_time_now();

	uhd::stream_args_t stream_args("fc32", "sc16");
	std::vector<size_t> channel_nums;
	channel_nums.push_back(boost::lexical_cast<size_t>(0));
	stream_args.channels = channel_nums;
	uhd::rx_streamer::sptr rx_stream = usrp_->get_rx_stream(stream_args);

	rx_stream->issue_stream_cmd(stream_cmd);

	uhd::rx_metadata_t md;

	size_t samplePerBuff = 0;
	size_t toWrite = 0;
	size_t nSampsLeft = nSymbol;
	const int kSAMPLE_PER_BUFF = rx_stream->get_max_num_samps();
	cout << boost::format("sample_per_buff = %d\n") % kSAMPLE_PER_BUFF;
	size_t numRxSamps;
	const auto start = std::chrono::steady_clock::now();
	while (nSampsLeft != 0)
	{
		if (nSampsLeft >= kSAMPLE_PER_BUFF)
			samplePerBuff = kSAMPLE_PER_BUFF;
		else
			samplePerBuff = nSampsLeft;
		// std::cout << nSampsLeft << std::endl;
		try {
			numRxSamps = rx_stream->recv(hostBuff + toWrite, samplePerBuff, md, 3.0, false);
		}
		catch (uhd::exception &e) {
			cout << boost::format("uhd::io_error %s") % e.what() << endl;
			continue;
		}
		if (md.error_code == uhd::rx_metadata_t::ERROR_CODE_TIMEOUT)
		{
			//printf("Timeout while streaming......\n");
			break;
		}
		if (md.error_code == uhd::rx_metadata_t::ERROR_CODE_OVERFLOW)
		{
			//printf("Overflowing while stream......\n");
			continue;
		}
		if (md.error_code != uhd::rx_metadata_t::ERROR_CODE_NONE)
		{
			//printf("Receive error: %s \n", md.strerror());
			continue;
		}
		
		//std::copy(buff.begin(), buff.end(), hostBuff + toWrite);
		toWrite += numRxSamps;
		nSampsLeft -= numRxSamps;
	}
	const auto end = std::chrono::steady_clock::now();
	const auto time_diff = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
	cout << boost::format("recv cost %d milliseconds") % time_diff.count() << endl;

	stream_cmd.stream_mode = uhd::stream_cmd_t::STREAM_MODE_STOP_CONTINUOUS;  
	stream_cmd.stream_now = false;
	rx_stream->issue_stream_cmd(stream_cmd);  //to end continuous
}

@Kuurusch
Copy link
Author

Does someone tested this problem? And the fix? We hadent time so fare but run into the problem again at an other part of our CPP software. If not necessary, I would like to prevent to powering up all the environment to develop on the FPGA again to figure out, that the fix didn't work...

@mbr0wn
Copy link
Contributor

mbr0wn commented Mar 19, 2024

We are no longer fixing issues on 3.15, so I'm closing this. As a workaround, you can try splitting command and data traffic, and reducing the poke speed.

@mbr0wn mbr0wn closed this as not planned Won't fix, can't repro, duplicate, stale Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Tracking Tracking issue on internal bug repo
Projects
None yet
Development

No branches or pull requests

5 participants