Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STM32H7 SPI DMA: Invalidate Dcache Timing Wrong #11594

Closed
dmammolo opened this issue Jan 24, 2024 · 9 comments
Closed

STM32H7 SPI DMA: Invalidate Dcache Timing Wrong #11594

dmammolo opened this issue Jan 24, 2024 · 9 comments

Comments

@dmammolo
Copy link

I am observing issues on the SPI interface using IMUs (BMI055 and ICM40688-P) on a pix32v6 (STM32H7 with both IMUs on SPI1).
All IMUs use the spi_exchange() function to send a command and read back the FIFO data of the IMU.

static void spi_exchange(struct spi_dev_s *dev, const void *txbuffer,

The issue I am observing is corrupted RX data, basically the data that is received is not the one that is expected. After some debugging and reading I suspect that the dcache invalidation of the rx dma buffer is done too early, i.e. before the SPI transfer and read even started. Currently the invalidation is done here:

if (rxbuffer)
{
up_invalidate_dcache((uintptr_t)rxbuffer,
(uintptr_t)rxbuffer + nbytes);
}

But IMO it should happen here:

if (orig_rxbuffer && priv->rxbuf)
{
memcpy(orig_rxbuffer, priv->rxbuf, nbytes);
}
}

Before we copy the dma buffer on the local buffer(orig_buffer). Debugging with gdb I observed that the copying is successful (verified with memcmp) but when comparing the content of the dma buffer priv->rxbuf (I assume by printing the content of the buffer(using it's address) using gdb it does not use Dcache) with the orig_buffer it sometimes differs. The issue is hard to reproduce and probably just an edge case. I.e. By changing a random line in my code the issue can start popping up or disappears again. Thus fixing this issue is not so easy.

After some investigation I found this document from STM (AN4839), from which I extract that we have to do following:

  • Dcache clean before TX transmit
  • Dcache invalidate before reading the DMA updated memory region, i.e. the dma buffer. And this has to happen after the SPI transmission/reading!

Here some citations from the document making that clear:

4 Mistakes to avoid and tips
...
if the software is using cacheable memory regions for the DMA source/or destination buffers. The software
must trigger a cache clean before starting a DMA operation to ensure that all the data are committed to the
subsystem memory. After the DMA transfer complete, when reading the data from the peripheral, the
software must perform a cache invalidate before reading the DMA updated memory region.
Another case is when the DMA is writing to the SRAM1 and the CPU is going to read data from the SRAM1. To
ensure the data coherency between the cache and the SRAM1, the software must perform a cache invalidate
before reading the updated data from the SRAM1.

Note, that the STM32F7 implementation follows above suggestions. the STM32H7 implementation did that too until this PR changed that logic: #1040

@dmammolo
Copy link
Author

@davids5 interested to hear from you what you think?

@davids5
Copy link
Contributor

davids5 commented Jan 24, 2024

@dmammolo I can tell you that the version of the H7 spi driver in PX4 is working (the diff is minimal: semaphore to mutex).
I am not sure of your use case and if it differs from the PX4 wrapper but: If after the invalidate (and before the DMA completion) the CPU reads any data from the a cache line overlapping (or within) the buffer address that can give you corrupted results. But this should not be happening.

@alandeassis
Copy link

Pinging @jlaitine for awareness!

@dmammolo thank you for your deep investigation! Is it possible to reproduce the issue using the spitool app? Maybe just connecting MOSI to MISO pin and trying to write something and read it back and checking the result. It test also could be easier to let more people to investigate the issue.

@davids5 is STM32H7 already used in many drones powered by PX4? As Dario commented this issue could be trick to notice.

@dmammolo
Copy link
Author

@davids5 regarding the diff to the PX4 version (mutex instead of semaphore) are you talking about this part of the code:
https://github.com/PX4/PX4-Autopilot/blob/da28d9a7f2c9d664631c3ee536fa6f1c116d65de/src/lib/drivers/device/nuttx/SPI.cpp#L130-L163
?
My code should be based on v1.13.3 and not touch this part of the logic. For me the _locking_mode is LOCK_NONE. To have a mutex I assume LOCK_PREEMPTION would be desired?

If I understand the Dcache functionalities, invalidating triggers a sync of the the SRAM to Dcache memory. So IMO it does not make sense to do this if the SPI transfer/read and DMA completion is not done yet.

@alandeassis this hardware change is not so easy to achieve with a pix32v6 and as mentioned this issue pops up only with specific code changes (which I can't share unfortunately).

@davids5
Copy link
Contributor

davids5 commented Jan 24, 2024

invalidating triggers a sync of the the SRAM to Dcache memory

No it is not like that. The invalidate marks the cache lines invalid so the next read will be out to memory (not the cache) and bring in the lines to the cache.

If the code is obeying the use case it should not be reading memory before the DMA completion. I suspect you have some overlapping accesses with specific code changes which you can't share.

@alandeassis
Copy link

@dmammolo I understand, but you don't need to share all the modifications, just the minimum to help people to identify the issue. Also if you can reproduce the issue in some common STM32H7 evaluation board is easier because most people doesn't have pix32v6 hw to test it.

@dmammolo
Copy link
Author

@davids5 I think you are right and I have a suspicion what could cause the issue that I am observing. It is probably only an issue in combination with the PX4 SPI wrapper/stack.

The issue can only occur if you have multiple devices on the same SPI and/or an IMU such as BMI055 or BMI088 which require requesting data for gyro and accel separately.
E.g. if we take the bmi055 there is a chance that the gyro and accel spi transfer happens one after each other. During the spi_dmarxwait(priv) there can be a context switch from accel/gyro which will start a new dma transfer. We will then again be in spi_dmarxwait(priv). On some point the dma transfer of the first spi dma transfer will be completed and proceed with the code, but in the background the dma will be executing the last dma transfer and update the memory. This update might happen during the memcpy(xbuffer, priv->rxbuf, nbytes). In my case I observed that roughly the first half of the bytes were not as expected, e.g. the accel data did contain gyro data in the first half.

I think that the px4 layer is not protecting against this, see my comment and link above.

If you agree I can create an issue on PX4-Autopilot.

@davids5
Copy link
Contributor

davids5 commented Jan 25, 2024

@dmammolo I would expect that with 2 device on the same bus _locking_mode needs to be used and should be set to LOCK_PREEMPTION or LOCK_THREADS depending on the drivers and what their thread of execution is. LOCK_THREADS if a thread (Where the ISR: hrt or direct off of drdy, sets a semaphore ) and LOCK_PREEMPTION if off an ISR (hrt or direct off of drdy)

@dmammolo
Copy link
Author

I agree and I just verified that on px4 release 1.13.3 and 1.14.0 the locking mode for fmu-v6c is always LOCK_NONE. I will create an issue on the PX4 repository and close it here. Thanks for your inputs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants