CDC ECM packet pool drop #120

Open
ZerAtaii opened this issue Oct 18, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@ZerAtaii

ZerAtaii commented Oct 18, 2023

What target device are you using?
LPC55S69
Which version of Azure RTOS?
6.1
What toolchain and environment?
arm-none-eabi-* + WSL

Hi,
I am developing an embedded application on an MCU that runs Azure RTOS, with the NetX Duo and USBX layers managing TCP/IP sockets and packets.
This MCU communicates with a laptop through an Ethernet over USB protocol (USB CDC-ECM). The laptop runs Ubuntu and can open TCP ports with the netcat command and send a range of commands.

Everything runs fine most of the time, but we sometimes hit a random problem where all the open TCP ports freeze.
It is then no longer possible to open sockets on these ports, or even to send commands on ports that are already open.

I suspected a packet pool leak in NetX/USBX, so I added a debug trace to _nx_packet_allocate() and _nx_packet_release() to find out which thread takes a packet from which packet pool and how many packets are left.
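A trace like the one described can be reduced to a small per-thread ledger. The following is only a minimal, self-contained sketch of that idea, not the code used in this issue: `packet_ledger`, `ledger_on_allocate`, `ledger_on_release` and `ledger_top_holder` are hypothetical names that do not exist in NetX Duo or USBX, and the thread names are passed in as plain strings. It assumes each packet can be tagged with the name of the thread that allocated it, because the release often happens on a different thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OWNERS 8

/* Hypothetical ledger; none of these names exist in NetX Duo or USBX. */
typedef struct {
    const char *name;  /* allocating thread's name (must outlive the ledger) */
    int held;          /* packets allocated by it and not yet released */
} owner_entry;

typedef struct {
    owner_entry owners[MAX_OWNERS];
    int owner_count;
    int pool_free;     /* mirror of the pool's free-packet count */
} packet_ledger;

void ledger_init(packet_ledger *l, int pool_size)
{
    assert(l != NULL && pool_size >= 0);
    memset(l, 0, sizeof *l);
    l->pool_free = pool_size;
}

/* Find the entry for a thread name, creating it if there is room. */
static owner_entry *ledger_find(packet_ledger *l, const char *name)
{
    for (int i = 0; i < l->owner_count; i++) {
        if (strcmp(l->owners[i].name, name) == 0)
            return &l->owners[i];
    }
    if (l->owner_count == MAX_OWNERS)
        return NULL;
    l->owners[l->owner_count].name = name;
    l->owners[l->owner_count].held = 0;
    return &l->owners[l->owner_count++];
}

/* Call where a packet is allocated. Returns 0, or -1 if the pool is
 * empty or the owner table is full. */
int ledger_on_allocate(packet_ledger *l, const char *thread)
{
    owner_entry *e = ledger_find(l, thread);
    if (e == NULL || l->pool_free == 0)
        return -1;
    l->pool_free--;
    e->held++;
    printf("alloc   by %s: held=%d free=%d\n", thread, e->held, l->pool_free);
    return 0;
}

/* Call where a packet is released; `allocator` is the thread that
 * originally allocated it, not necessarily the one releasing it. */
int ledger_on_release(packet_ledger *l, const char *allocator)
{
    owner_entry *e = ledger_find(l, allocator);
    if (e == NULL || e->held == 0)
        return -1;
    e->held--;
    l->pool_free++;
    printf("release of %s packet: held=%d free=%d\n", allocator, e->held, l->pool_free);
    return 0;
}

/* Name of the thread with the most unreleased packets, or NULL if none. */
const char *ledger_top_holder(const packet_ledger *l)
{
    const owner_entry *top = NULL;
    for (int i = 0; i < l->owner_count; i++) {
        if (l->owners[i].held > 0 && (top == NULL || l->owners[i].held > top->held))
            top = &l->owners[i];
    }
    return top ? top->name : NULL;
}
```

Tracking by allocating thread rather than by releasing thread makes the thread whose packets are never returned stand out, which is the pattern reported for the bulk OUT thread here.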

Thanks to this, I can see that when the problem occurs, the "ux_slave_class_cdc_ecm_bulkout_thread" thread requests one packet per second without ever releasing it, as shown in the attached screenshot. As soon as the packet pool drops to 0, my application is stuck (no TCP port available, everything seems to be frozen).

After several minutes (5/10/15?), all packets are released at once, but I can no longer communicate with the MCU. It's as if the USB link had been removed. I have to reboot my MCU to make it work again.

Such behavior is not acceptable, as it will not be possible to unplug/replug in the final product.

Do you have any idea of the cause and how to solve this bug?

Thanks in advance,
Best regards,
Antoine

host_ip_service
packet_drop
teraterm3.log

@ZerAtaii ZerAtaii added the bug Something isn't working label Oct 18, 2023
@xiaocq2001
Contributor

ux_slave_class_cdc_ecm_bulkout_thread keeps polling the USB bulk OUT endpoint for CDC-ECM Ethernet packets. When packets are received, they are passed to NX for handling (and released in NX). From your description, it seems real Ethernet traffic is arriving over USB, but more input packets arrive than NX can process and release in time. You could consider increasing the pool size to buffer more packets for processing.

@ZerAtaii
Author

Thanks for the answer.
It also happens under light use (very few exchanges), and not necessarily under heavy use (curl downloads for 12 hours in a row, for example). It seems purely random.
We tried increasing the number of packets, but the problem remained; it just appeared a little later (the time needed to empty the packet pool).
We are already at the RAM limit...

@xiaocq2001
Contributor

On the USB side, packets are received and passed to the upper layer (maybe the application), and the upper layer takes ownership and is responsible for freeing them. So I think the application may need to optimize its Ethernet packet handling, while we check whether something can be done on the USB side.

BTW, it seems CDC-ECM is only recognized by Linux. Could you share how you get CDC-ECM recognized on Windows for WSL, so it is easier for us to reproduce the issue?

@ZerAtaii
Author

Thanks!
What do you have in mind when you say "application"? In NetX, or in our application even higher up?
Attached is a tutorial for using our application with WSL. It will probably help you.

CM connect-3-5.pdf

@xiaocq2001
Contributor

Thanks for sharing.

When I say "application", I mean your application or anything higher up. The packets allocated and filled in USBX are passed to the upper layers along with their ownership, so the upper layers should process and free them in time.

@xiaocq2001
Contributor

A possible improvement to Ethernet packet handling in USBX: in ux_device_class_cdc_ecm_bulkout_thread.c, host bulk OUT transfers are currently NAKed when no free packet is available. Blocking host bulk OUT transfers like this may cause the host to reset the device (just a guess based on your observation of a deactivate followed by an activate; this is host-specific behavior). You could try allocating the NX packet after the USB bulk OUT transfer instead; if no packet is free, the frame is dropped by discarding the data. That way the host does not reset the device, but network packets are dropped until a free packet is available.

Note that the logic change above does not help with packet handling and release; the upper layers still need to be checked to find the real issue (why packets are not handled and released).
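The receive-then-allocate idea can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not a patch for ux_device_class_cdc_ecm_bulkout_thread.c: `sim_pool`, `sim_packet`, `rx_stats` and `rx_deliver_or_drop` are hypothetical names, and the frame is assumed to have already been read from the bulk OUT endpoint into a scratch buffer before a pool packet is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_MAX 1514  /* largest Ethernet frame without FCS */

/* Hypothetical stand-ins for a packet and a packet pool. */
typedef struct {
    unsigned char data[FRAME_MAX];
    size_t length;
    int in_use;
} sim_packet;

typedef struct {
    sim_packet *packets;
    int count;
} sim_pool;

typedef struct {
    unsigned long delivered;  /* frames copied into a pool packet */
    unsigned long dropped;    /* frames discarded (pool empty or oversized) */
} rx_stats;

/* Return a free packet, or NULL without blocking if the pool is empty. */
sim_packet *sim_pool_allocate(sim_pool *pool)
{
    for (int i = 0; i < pool->count; i++) {
        if (!pool->packets[i].in_use) {
            pool->packets[i].in_use = 1;
            return &pool->packets[i];
        }
    }
    return NULL;
}

void sim_pool_release(sim_packet *pkt)
{
    pkt->in_use = 0;
}

/* The frame has already been received from the host. Only now is a
 * packet requested; if none is free, the frame is counted and dropped,
 * so the host transfer is never held up by an empty pool. */
sim_packet *rx_deliver_or_drop(sim_pool *pool, const unsigned char *frame,
                               size_t len, rx_stats *stats)
{
    assert(pool != NULL && frame != NULL && stats != NULL);
    if (len > FRAME_MAX) {
        stats->dropped++;
        return NULL;
    }
    sim_packet *pkt = sim_pool_allocate(pool);
    if (pkt == NULL) {
        stats->dropped++;
        return NULL;
    }
    memcpy(pkt->data, frame, len);
    pkt->length = len;
    stats->delivered++;
    return pkt;  /* ownership passes to the caller (the upper layer) */
}
```

The trade-off in this model is an extra copy and silent frame loss while the pool is empty; TCP would retransmit, but a `dropped` counter that keeps growing still points at packets not being released upstream.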

@ZerAtaii
Author

ZerAtaii commented Nov 8, 2023

We clearly understand that packet pool creation takes place in the application. However, when all goes well (for a curl over Wi-Fi, for example), the entire packet release mechanism stays within the NetX layer. At least, that's what the stack frame suggests...
MicrosoftTeams-image (7)

Some packets are indeed released by the application layer, but not in some very specific cases of heavy network exchange.

@TiejunMS
Contributor

TiejunMS commented Nov 9, 2023

For TCP packets, some are indeed queued by the TCP control block.

  • Incoming SYN packets for a TCP server. The maximum count is set when listen is called. The application should call accept to consume them.
  • Incoming data packets. The application should call receive to consume them.
  • Outgoing data packets. As long as the network is still alive, all the packets will be released within 1 second. If the connection is dead, call disconnect to release them all.
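The effect of these queues on the pool can be shown with a toy model. This is only an illustrative sketch: `model_pool`, `model_socket` and the `model_*` functions are hypothetical and are not NetX Duo types or calls. It assumes, as described above, that queued receive packets stay out of the pool until the application receives and releases them, and that disconnect flushes the transmit queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; these are not NetX Duo types or functions. */
typedef struct { int free_packets; } model_pool;
typedef struct { int rx_queued; int tx_queued; } model_socket;

/* An incoming data segment takes a packet and waits on the receive queue.
 * Returns 1 if queued, 0 if the pool was empty. */
int model_on_segment(model_pool *p, model_socket *s)
{
    assert(p != NULL && s != NULL);
    if (p->free_packets == 0)
        return 0;
    p->free_packets--;
    s->rx_queued++;
    return 1;
}

/* The application receives one queued packet and releases it to the pool. */
int model_receive_and_release(model_pool *p, model_socket *s)
{
    assert(p != NULL && s != NULL);
    if (s->rx_queued == 0)
        return 0;
    s->rx_queued--;
    p->free_packets++;
    return 1;
}

/* Outgoing data takes a packet that is held until it is acknowledged. */
int model_send(model_pool *p, model_socket *s)
{
    assert(p != NULL && s != NULL);
    if (p->free_packets == 0)
        return 0;
    p->free_packets--;
    s->tx_queued++;
    return 1;
}

/* The peer acknowledges one outgoing packet, which returns to the pool. */
int model_on_ack(model_pool *p, model_socket *s)
{
    assert(p != NULL && s != NULL);
    if (s->tx_queued == 0)
        return 0;
    s->tx_queued--;
    p->free_packets++;
    return 1;
}

/* Disconnect releases every packet still on the transmit queue. */
void model_disconnect(model_pool *p, model_socket *s)
{
    assert(p != NULL && s != NULL);
    p->free_packets += s->tx_queued;
    s->tx_queued = 0;
}
```

In this model, a socket whose receive queue is never drained, or a dead connection that is never disconnected, keeps its packets out of the pool indefinitely, which would look like a leak in an allocate/release trace.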


3 participants