STM32H7 SPI DMA: Invalidate Dcache Timing Wrong #11594

dmammolo · 2024-01-24T10:16:06Z

I am observing issues on the SPI interface using IMUs (BMI055 and ICM40688-P) on a pix32v6 (STM32H7 with both IMUs on SPI1).
All IMUs use the spi_exchange() function to send a command and read back the FIFO data of the IMU.

nuttx/arch/arm/src/stm32h7/stm32_spi.c

Line 2047 in 55ec92e

static void spi_exchange(struct spi_dev_s *dev, const void *txbuffer,

The issue I am observing is corrupted RX data, basically the data that is received is not the one that is expected. After some debugging and reading I suspect that the dcache invalidation of the rx dma buffer is done too early, i.e. before the SPI transfer and read even started. Currently the invalidation is done here:

nuttx/arch/arm/src/stm32h7/stm32_spi.c

Lines 2170 to 2174 in 55ec92e

 if (rxbuffer) 

 { 

 up_invalidate_dcache((uintptr_t)rxbuffer, 

 (uintptr_t)rxbuffer + nbytes); 

 }

But IMO it should happen here:

nuttx/arch/arm/src/stm32h7/stm32_spi.c

Lines 2253 to 2257 in 55ec92e

 if (orig_rxbuffer && priv->rxbuf) 

 { 

 memcpy(orig_rxbuffer, priv->rxbuf, nbytes); 

 } 

 }

Before we copy the dma buffer on the local buffer(orig_buffer). Debugging with gdb I observed that the copying is successful (verified with memcmp) but when comparing the content of the dma buffer priv->rxbuf (I assume by printing the content of the buffer(using it's address) using gdb it does not use Dcache) with the orig_buffer it sometimes differs. The issue is hard to reproduce and probably just an edge case. I.e. By changing a random line in my code the issue can start popping up or disappears again. Thus fixing this issue is not so easy.

After some investigation I found this document from STM (AN4839), from which I extract that we have to do following:

Dcache clean before TX transmit
Dcache invalidate before reading the DMA updated memory region, i.e. the dma buffer. And this has to happen after the SPI transmission/reading!

Here some citations from the document making that clear:

4 Mistakes to avoid and tips
...
if the software is using cacheable memory regions for the DMA source/or destination buffers. The software
must trigger a cache clean before starting a DMA operation to ensure that all the data are committed to the
subsystem memory. After the DMA transfer complete, when reading the data from the peripheral, the
software must perform a cache invalidate before reading the DMA updated memory region.

Another case is when the DMA is writing to the SRAM1 and the CPU is going to read data from the SRAM1. To
ensure the data coherency between the cache and the SRAM1, the software must perform a cache invalidate
before reading the updated data from the SRAM1.

Note, that the STM32F7 implementation follows above suggestions. the STM32H7 implementation did that too until this PR changed that logic: #1040

The text was updated successfully, but these errors were encountered:

dmammolo · 2024-01-24T10:17:15Z

@davids5 interested to hear from you what you think?

davids5 · 2024-01-24T13:46:22Z

@dmammolo I can tell you that the version of the H7 spi driver in PX4 is working (the diff is minimal: semaphore to mutex).
I am not sure of your use case and if it differs from the PX4 wrapper but: If after the invalidate (and before the DMA completion) the CPU reads any data from the a cache line overlapping (or within) the buffer address that can give you corrupted results. But this should not be happening.

alandeassis · 2024-01-24T14:04:04Z

Pinging @jlaitine for awareness!

@dmammolo thank you for your deep investigation! Is it possible to reproduce the issue using the spitool app? Maybe just connecting MOSI to MISO pin and trying to write something and read it back and checking the result. It test also could be easier to let more people to investigate the issue.

@davids5 is STM32H7 already used in many drones powered by PX4? As Dario commented this issue could be trick to notice.

dmammolo · 2024-01-24T14:31:01Z

@davids5 regarding the diff to the PX4 version (mutex instead of semaphore) are you talking about this part of the code:
https://github.com/PX4/PX4-Autopilot/blob/da28d9a7f2c9d664631c3ee536fa6f1c116d65de/src/lib/drivers/device/nuttx/SPI.cpp#L130-L163
?
My code should be based on v1.13.3 and not touch this part of the logic. For me the _locking_mode is LOCK_NONE. To have a mutex I assume LOCK_PREEMPTION would be desired?

If I understand the Dcache functionalities, invalidating triggers a sync of the the SRAM to Dcache memory. So IMO it does not make sense to do this if the SPI transfer/read and DMA completion is not done yet.

@alandeassis this hardware change is not so easy to achieve with a pix32v6 and as mentioned this issue pops up only with specific code changes (which I can't share unfortunately).

davids5 · 2024-01-24T14:44:39Z

invalidating triggers a sync of the the SRAM to Dcache memory

No it is not like that. The invalidate marks the cache lines invalid so the next read will be out to memory (not the cache) and bring in the lines to the cache.

If the code is obeying the use case it should not be reading memory before the DMA completion. I suspect you have some overlapping accesses with specific code changes which you can't share.

alandeassis · 2024-01-24T16:12:50Z

@dmammolo I understand, but you don't need to share all the modifications, just the minimum to help people to identify the issue. Also if you can reproduce the issue in some common STM32H7 evaluation board is easier because most people doesn't have pix32v6 hw to test it.

dmammolo · 2024-01-25T13:36:14Z

@davids5 I think you are right and I have a suspicion what could cause the issue that I am observing. It is probably only an issue in combination with the PX4 SPI wrapper/stack.

The issue can only occur if you have multiple devices on the same SPI and/or an IMU such as BMI055 or BMI088 which require requesting data for gyro and accel separately.
E.g. if we take the bmi055 there is a chance that the gyro and accel spi transfer happens one after each other. During the spi_dmarxwait(priv) there can be a context switch from accel/gyro which will start a new dma transfer. We will then again be in spi_dmarxwait(priv). On some point the dma transfer of the first spi dma transfer will be completed and proceed with the code, but in the background the dma will be executing the last dma transfer and update the memory. This update might happen during the memcpy(xbuffer, priv->rxbuf, nbytes). In my case I observed that roughly the first half of the bytes were not as expected, e.g. the accel data did contain gyro data in the first half.

I think that the px4 layer is not protecting against this, see my comment and link above.

If you agree I can create an issue on PX4-Autopilot.

davids5 · 2024-01-25T13:54:12Z

@dmammolo I would expect that with 2 device on the same bus _locking_mode needs to be used and should be set to LOCK_PREEMPTION or LOCK_THREADS depending on the drivers and what their thread of execution is. LOCK_THREADS if a thread (Where the ISR: hrt or direct off of drdy, sets a semaphore ) and LOCK_PREEMPTION if off an ISR (hrt or direct off of drdy)

dmammolo · 2024-01-25T14:42:58Z

I agree and I just verified that on px4 release 1.13.3 and 1.14.0 the locking mode for fmu-v6c is always LOCK_NONE. I will create an issue on the PX4 repository and close it here. Thanks for your inputs!

dmammolo closed this as completed Jan 25, 2024

dmammolo mentioned this issue Jan 25, 2024

[Bug] SPI DMA transfer is not locked with multiple devices PX4/PX4-Autopilot#22684

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

STM32H7 SPI DMA: Invalidate Dcache Timing Wrong #11594

STM32H7 SPI DMA: Invalidate Dcache Timing Wrong #11594

dmammolo commented Jan 24, 2024

dmammolo commented Jan 24, 2024

davids5 commented Jan 24, 2024

alandeassis commented Jan 24, 2024

dmammolo commented Jan 24, 2024

davids5 commented Jan 24, 2024 •

edited

Loading

alandeassis commented Jan 24, 2024

dmammolo commented Jan 25, 2024

davids5 commented Jan 25, 2024

dmammolo commented Jan 25, 2024

STM32H7 SPI DMA: Invalidate Dcache Timing Wrong #11594

STM32H7 SPI DMA: Invalidate Dcache Timing Wrong #11594

Comments

dmammolo commented Jan 24, 2024

dmammolo commented Jan 24, 2024

davids5 commented Jan 24, 2024

alandeassis commented Jan 24, 2024

dmammolo commented Jan 24, 2024

davids5 commented Jan 24, 2024 • edited Loading

alandeassis commented Jan 24, 2024

dmammolo commented Jan 25, 2024

davids5 commented Jan 25, 2024

dmammolo commented Jan 25, 2024

davids5 commented Jan 24, 2024 •

edited

Loading