Using TRICE_RING_BUFFER still crashes if "Tricing a lot" #462
Comments
There is no protection against overflow. If you produce more data on average than can be transmitted, or have a data burst that exceeds the provided buffer size, you will end up with a crash. Adding overflow protection raises the question of which data to throw away or how to slow down the code. It would also increase the code size and decrease the performance. The recommended way is to first make the buffer reasonably big and, after running for a while, to check the |
But if TRICE_RING_BUFFER is circular, what makes it overflow? |
A ring buffer overflow does not mean that other memory regions are destroyed. Inside the ring buffer, older messages that have not been transmitted yet are overwritten. This can lead to incomplete or simply missed messages. The trice tool complains about missed messages with a cycle counter mismatch, and incomplete messages also cause error messages. |
To clarify my original question: my firmware crashes when I flood it with Trices (e.g. calling Trice() inside a regular loop that works fine without the Trice line). If the ring buffer does not corrupt other memory regions, could there be another cause? |
Please have a look at issue #294. Could that be the reason? What is the buffer size, and what does the max depth value say? How many data bytes do you produce per time unit, and how many can you transmit (what is your baud rate)? A trice without time stamp and without values is only 4 bytes. But a loop of 1000, for example, produces 4 KB of data in nearly no time. Can you experimentally reduce the data amount to 1/10 or 1/100, just to understand what happens? You could also check the buffer borders (maybe there is a bug in the code). That is a bit tricky because the linker is free to arrange the memory regions. Use your debug tool for that. AND: Do not forget about your interrupts! |
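The produce-versus-transmit question above can be put into numbers with a small sketch. It assumes a UART with 8N1 framing (10 bits on the wire per byte); the helper names and the 115200 baud figure used in the note below are illustrative assumptions, not values from this thread or from the trice sources.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second a UART can carry, assuming 8N1 framing
 * (1 start + 8 data + 1 stop = 10 bits on the wire per byte). */
uint32_t uart_bytes_per_second(uint32_t baud)
{
    assert(baud >= 10u); /* keeps the result non-zero */
    return baud / 10u;
}

/* Bytes produced by a burst of `count` messages of `bytes_each` bytes. */
uint32_t burst_bytes(uint32_t count, uint32_t bytes_each)
{
    return count * bytes_each;
}

/* Milliseconds needed to drain `bytes` at the given baud rate, rounded up. */
uint32_t drain_time_ms(uint32_t bytes, uint32_t baud)
{
    uint32_t bps = uart_bytes_per_second(baud);
    return (uint32_t)(((uint64_t)bytes * 1000u + bps - 1u) / bps);
}
```

With these assumptions, a loop of 1000 four-byte trices produces 4000 bytes, and `drain_time_ms(4000, 115200)` evaluates to 348, so a burst generated in well under a millisecond needs roughly a third of a second to leave the device.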
I don't think that is the case here.
In this test I'm deliberately flooding, i.e. with the purpose of exceeding the buffer size. Apart from this deliberate test, Trice is working OK. I will debug for a buffer overflow. |
I like that you do such harsh tests. Maybe the trice tool |
Just to explain this for now:
TriceBufferWritePosition = (TriceBufferWritePosition + (TRICE_BUFFER_SIZE>>2)) <= triceRingBufferLimit ? TriceBufferWritePosition : TriceRingBuffer; \
This explanation is now added to the source code. |
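One way to read the conditional expression quoted above: the write position stays where it is as long as a maximum-size message still fits before the end of the ring, and otherwise jumps back to the start. The sketch below models that with indices instead of pointers. `RING_WORDS`, `MAX_TRICE_WORDS` and `next_write_index` are hypothetical names chosen for illustration; they stand in for the ring size and `TRICE_BUFFER_SIZE>>2` and are not taken from the trice sources.

```c
#include <assert.h>
#include <stddef.h>

#define RING_WORDS      16u /* hypothetical ring size in 32-bit words */
#define MAX_TRICE_WORDS 4u  /* hypothetical largest single message, in words */

/* Return the index where the next message may start: keep `idx` if a
 * maximum-size message still fits before the end of the ring, otherwise
 * wrap around to index 0. */
size_t next_write_index(size_t idx)
{
    assert(idx <= RING_WORDS);
    return (idx + MAX_TRICE_WORDS) <= RING_WORDS ? idx : 0u;
}
```

Note that this decision only looks at the end of the ring. It does not look at the read position, so by itself it says nothing about whether the words at the chosen position have already been transmitted.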
Unfortunately I have no time right now to look deeper, and tomorrow I am mostly offline. The problem you found could be part of the TCOBS code. So one thing to try is to switch to COBS for the investigation. Just change the setting in triceConfig.h and apply the appropriate trice tool CLI switch. |
Could it be, that you mixed
I would do it like this: Change
uint32_t TriceRingBuffer[TRICE_DEFERRED_BUFFER_SIZE>>2];
into
uint32_t TriceRingBuffer[(TRICE_DEFERRED_BUFFER_SIZE>>2)+4];
and initialize
TriceRingBuffer[(TRICE_DEFERRED_BUFFER_SIZE>>2)+0] = 0xdeadb33f;
TriceRingBuffer[(TRICE_DEFERRED_BUFFER_SIZE>>2)+1] = 0xdeadb33f;
TriceRingBuffer[(TRICE_DEFERRED_BUFFER_SIZE>>2)+2] = 0xdeadb33f;
TriceRingBuffer[(TRICE_DEFERRED_BUFFER_SIZE>>2)+3] = 0xdeadb33f;
and watch if any of these change. To make the check complete, you need to do the same at the ring buffer start. Maybe like this:
uint32_t checkedTriceRingBuffer[4+(TRICE_DEFERRED_BUFFER_SIZE>>2)+4];
uint32_t* TriceRingBuffer = &checkedTriceRingBuffer[4];
and initialize
checkedTriceRingBuffer[0] = 0xdeadb33f;
checkedTriceRingBuffer[1] = 0xdeadb33f;
checkedTriceRingBuffer[2] = 0xdeadb33f;
checkedTriceRingBuffer[3] = 0xdeadb33f;
checkedTriceRingBuffer[4+(TRICE_DEFERRED_BUFFER_SIZE>>2)+0] = 0xdeadb33f;
checkedTriceRingBuffer[4+(TRICE_DEFERRED_BUFFER_SIZE>>2)+1] = 0xdeadb33f;
checkedTriceRingBuffer[4+(TRICE_DEFERRED_BUFFER_SIZE>>2)+2] = 0xdeadb33f;
checkedTriceRingBuffer[4+(TRICE_DEFERRED_BUFFER_SIZE>>2)+3] = 0xdeadb33f;
and watch if any of these change. THIS CODE IS COMPLETELY UNTESTED! IT IS JUST AN IDEA. |
You wrote: "Please note that (TRICE_DEFERRED_BUFFER_SIZE>>2) is 1 position out of the buffer size (0 indexes array)". I cannot clearly understand what you mean. Please explain it a bit more. |
First, sorry, I mixed up TRICE_BUFFER_SIZE and TRICE_DEFERRED_BUFFER_SIZE; they have completely different meanings :) Now I changed it to the correct one for debugging:
|
This looks good. Maybe define the dummy value as a global variable, initialize it with 0, and set it to 1 inside the check when memory has been overwritten. This prevents the compiler from optimising it out. You could also run the test without debug for a long time. |
I'm debugging with TRICE_FRAMING_NONE and I see it writing (way) outside the buffer on this
so the issue must come before this |
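The guard-word idea from the comments above, combined with the global flag, could look roughly like the following sketch. It is a standalone model, not the trice code: `RING_WORDS` stands in for `TRICE_DEFERRED_BUFFER_SIZE>>2`, and `checked_ring`, `ring`, `ring_guard_violated`, `ring_guards_ok` and `ring_guards_init` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define RING_WORDS  64u         /* hypothetical, stands in for TRICE_DEFERRED_BUFFER_SIZE>>2 */
#define GUARD_WORDS 4u
#define GUARD_VALUE 0xdeadb33fu

/* Layout: guard words, then the ring itself, then guard words again. */
uint32_t checked_ring[GUARD_WORDS + RING_WORDS + GUARD_WORDS];
uint32_t *const ring = &checked_ring[GUARD_WORDS];

/* Global and volatile so the compiler cannot optimise the check away. */
volatile int ring_guard_violated = 0;

/* Returns 1 if all guard words are intact; otherwise sets the flag and returns 0. */
int ring_guards_ok(void)
{
    for (uint32_t i = 0; i < GUARD_WORDS; i++) {
        if (checked_ring[i] != GUARD_VALUE ||
            checked_ring[GUARD_WORDS + RING_WORDS + i] != GUARD_VALUE) {
            ring_guard_violated = 1;
            return 0;
        }
    }
    return 1;
}

/* Write the guard pattern in front of and behind the ring. */
void ring_guards_init(void)
{
    for (uint32_t i = 0; i < GUARD_WORDS; i++) {
        checked_ring[i] = GUARD_VALUE;                            /* before the ring */
        checked_ring[GUARD_WORDS + RING_WORDS + i] = GUARD_VALUE; /* after the ring */
    }
    assert(ring_guards_ok()); /* sanity check right after initialisation */
}
```

Calling `ring_guards_ok()` periodically, or setting a data watchpoint on the guard words, narrows down when a write leaves the ring. The flag can also be inspected after a long run without a debugger attached.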
Some more experiments,
My code doesn't crash and I get an overflow message when using the diagnostic:
4294967268 == FFFFFFE4 |
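The equality above is plain 32-bit arithmetic: 0xFFFFFFE4 is what -28 looks like when it is stored in an unsigned 32-bit variable. Whether the diagnostic value in this test really came from such a negative difference is an assumption; the sketch below only shows how a value like this can arise, using hypothetical helper names and made-up operands.

```c
#include <assert.h>
#include <stdint.h>

/* Free space computed with unsigned arithmetic: if `used` exceeds
 * `capacity`, the result wraps around instead of going negative. */
uint32_t free_bytes_unsigned(uint32_t capacity, uint32_t used)
{
    return capacity - used;
}

/* The same difference computed as a signed value, for comparison. */
int32_t free_bytes_signed(uint32_t capacity, uint32_t used)
{
    assert(capacity <= (uint32_t)INT32_MAX && used <= (uint32_t)INT32_MAX);
    return (int32_t)capacity - (int32_t)used;
}
```

For example, `free_bytes_unsigned(100, 128)` yields 4294967268, while `free_bytes_signed(100, 128)` yields -28.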
Two questions arise:
I believe that once the buffer gets corrupted, it can't recover anymore? |
Hopefully I will find time to look a bit deeper into this within the next 2 days. Just out of curiosity: there is a note in https://github.com/rokath/trice/blob/master/docs/TriceOverRTT.md#10-possible-issues. When performing excessive duration tests (within the transfer capabilities), some of the STM32 evaluation boards flashed with OB J-Link stopped working properly over RTT, even after re-flashing the firmware image. I could get them back to normal operation only with a power cycle. My assumption is that the OB J-Link software became unstable, probably due to internal buffer overruns, which required an OB J-Link reset. This is probably not related to this issue, as I understood that you use a UART. But maybe some other software problem exists. |
Thanks for having a look at this. |
I found another issue to address:
|
My very first assumption is that the code seems to work correctly if and only if less data is produced than can be transmitted. In the case you investigate, more data is produced than transmitted. That finally produces data garbage inside the ring buffer and the results of function |
Does this make sense? |
I assume this will not work. The check needs to be done before |
In any case, the check would need to be thread-safe (this is why I thought of putting it after the critical section, after the trice enter) |
There is now a branch experimentalProtectOption containing a first idea of how to add a protection option. I have been unable to test it so far but will do so asap. I think you can already do your own tests with it. To activate the fix, please add |
Thanks for the update. I tested only the ring buffer,
it must be < (instead of <=) and it must leave room for the current ongoing transmission. I also added this for diagnostics:
For the double buffer (which I didn't test) the comparison should be < too |
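The point about `<` versus `<=` can be illustrated with a standalone sketch. It is not the code from the experimentalProtectOption branch: `RING_WORDS`, `MAX_TRICE_WORDS`, `ring_may_write` and the idea of tracking a `used_words` count (including the message currently being transmitted) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RING_WORDS      16u /* hypothetical ring capacity in 32-bit words */
#define MAX_TRICE_WORDS 4u  /* hypothetical largest single message */

/* `used_words` counts everything written but not yet fully transmitted,
 * including the message that is currently going out.
 * Returns 1 if one more maximum-size message may be written, else 0.
 * The comparison is strict (<), so the ring is never filled completely. */
int ring_may_write(uint32_t used_words)
{
    assert(used_words <= RING_WORDS);
    return (used_words + MAX_TRICE_WORDS) < RING_WORDS;
}
```

With `<=`, a write would be accepted that fills the ring to the last word. In many ring buffer designs a completely full ring is indistinguishable from an empty one once the write position catches up with the read position, which is a common reason to insist on the strict comparison.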
Probably the branch implementation is correct. |
In my tests the Trice log was showing errors (meaning that the trice buffer had become corrupted). |
Concerning the |
Your proposed check will probably work and you can relax it a bit also. Please refer to the comment in #468 or directly to the comments in the |
About the If the device runs a long, long time without the trice tool connected, most Trices vanish into thin air, and who cares then about potentially missed ones? If the trice tool is connected, it will report and count cycle counter errors. If you mistrust the overflow protection, you can add such a counter to your test code. Or are there any other arguments? EDIT: Yes, |
In the experimentalProtectOption branch, the example project trice/examples/vsCode_NucleoF030R8_instrumented now contains successfully tested overflow code including an |
TriceOverflowCounter would be used for diagnostics, so we know that we need to slow down or do something else to fix it. It should be a conditional (#if) build. |
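A conditionally built diagnostic counter, as requested above, might be structured like this sketch. The switch `OVERFLOW_DIAGNOSTICS`, the macro `COUNT_OVERFLOW`, the variable `overflow_counter` and the function `try_write` are hypothetical names; the actual option and counter names in trice may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch; define as 0 to compile the counter out completely. */
#ifndef OVERFLOW_DIAGNOSTICS
#define OVERFLOW_DIAGNOSTICS 1
#endif

#if OVERFLOW_DIAGNOSTICS
volatile uint32_t overflow_counter = 0;
#define COUNT_OVERFLOW() do { overflow_counter++; } while (0)
#else
#define COUNT_OVERFLOW() do { } while (0)
#endif

/* Example producer: returns 1 if the caller may write `needed_words`,
 * otherwise counts the dropped message (if enabled) and returns 0. */
int try_write(uint32_t free_words, uint32_t needed_words)
{
    assert(needed_words > 0u);
    if (needed_words < free_words) {
        return 1;
    }
    COUNT_OVERFLOW();
    return 0;
}
```

When the switch is 0, the macro expands to an empty statement, so the counter costs neither RAM nor code. When it is 1, a non-zero counter after a test run signals that the buffer is too small or the producer too fast.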
Solved in main branch. |
I'm using TRICE_RING_BUFFER.
However, I notice that if I flood it with Trices my software crashes.
I was assuming TRICE_RING_BUFFER was buffer overflow safe (as opposed to TRICE_DOUBLE_BUFFER), or am I mistaken?
What could be wrong?
Edit: I found that if I use TRICE_FRAMING_COBS instead of TRICE_FRAMING_TCOBS it takes longer until I get a crash.