I'm experiencing a surprisingly common corruption issue with the persistent database. It looks a bit like the previously reported and fixed #189 and #424, but isn't the same.
Context
This is happening on 2.0.14. I'm aware that there have been some changes to the persistence mechanism in 2.0.15, but unfortunately I can't try that version due to #2634. Based on the commit history, I don't think those changes affect the behavior described here.
We have around 100 Raspberry Pi-based dataloggers running in the field. A couple of weeks ago we installed Mosquitto on them (bridged to a central broker) with the following persistence settings:
persistence true
autosave_interval 300
Since then, this corruption issue has cropped up on 5-10% of them, which seems quite high. These dataloggers do experience occasional power outages, but in most cases no more than once or twice per day. So I find it unlikely that, over such a short period, that many outages would have struck exactly within the few seconds per day (cumulatively) during which persistence is taking place. But that's veering into speculation.
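To put a rough number on that argument: with autosave_interval 300 there are 86400 / 300 = 288 saves per day, so if each save takes t seconds the broker spends 288t seconds per day writing, and a power cut at a uniformly random time lands inside a write with probability 288t / 86400 = t / 300. The sketch below assumes a write time of 50 ms and two outages per day; both numbers, and the uniform-timing assumption, are illustrative guesses rather than measurements.

```python
# Rough estimate of how often a random power cut would land inside a
# persistence write. The write duration and outage rate are ASSUMED values.

SECONDS_PER_DAY = 86_400


def exposure(autosave_interval_s: float, write_s: float, outages_per_day: float):
    """Return (saves per day, seconds per day spent writing,
    expected outages per day that hit a write in progress)."""
    saves_per_day = SECONDS_PER_DAY / autosave_interval_s
    writing_s_per_day = saves_per_day * write_s
    # Assumes outages are uniformly distributed over the day, so each one
    # lands inside a write with probability writing_s_per_day / SECONDS_PER_DAY.
    hit_rate = outages_per_day * writing_s_per_day / SECONDS_PER_DAY
    return saves_per_day, writing_s_per_day, hit_rate


if __name__ == "__main__":
    WRITE_SECONDS = 0.05  # assumption: 50 ms per save of a few-kB database
    saves, busy, hits = exposure(300, WRITE_SECONDS, outages_per_day=2)
    print(f"{saves:.0f} saves/day, {busy:.1f} s/day writing, "
          f"{hits:.2e} expected hits/day per device")
```

With those assumptions this gives 288 saves and 14.4 s of writing per day, or about 3.3e-4 expected hits per device per day. Across 100 devices over 14 days that is roughly 0.47 expected hits, well below the 5-10 corrupted devices observed; even a tenfold longer write time keeps the estimate under 5.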
The issue itself
The symptoms of database corruption that I'm experiencing are consistently the following:
On startup, Mosquitto logs the following messages and then exits:
Error: Unable to restore persistent database. Unrecognised file format.
Error: Couldn't open database.
A mosquitto.db.new file exists and contains a valid, uncorrupted database. Simply renaming mosquitto.db.new to mosquitto.db allows Mosquitto to start successfully.
A mosquitto.db file also exists and has the same size as mosquitto.db.new, but it is blank, i.e. it contains nothing but NULL (0x00) bytes.
The two files (obviously) have different inodes.
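A small diagnostic script can check a device for exactly these symptoms and apply the rename workaround. This is a sketch: DB_PATH is a hypothetical default (use whatever persistence_location / persistence_file is actually configured), the helper names are made up for illustration, and it does not parse or validate the database format; it only checks that mosquitto.db.new is non-empty and not itself all null bytes. The recovery step should only be run while Mosquitto is stopped.

```python
import os

# Hypothetical path: adjust to the persistence_location / persistence_file
# values actually configured for the broker.
DB_PATH = "/var/lib/mosquitto/mosquitto.db"


def is_all_nul(path: str, chunk_size: int = 65536) -> bool:
    """Return True if the file is non-empty and every byte is 0x00."""
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            size += len(chunk)
            if chunk.count(0) != len(chunk):
                return False
    return size > 0


def diagnose(db_path: str) -> dict:
    """Collect the symptoms above for db_path and db_path + '.new'."""
    new_path = db_path + ".new"
    info = {"db_exists": os.path.exists(db_path),
            "new_exists": os.path.exists(new_path)}
    if info["db_exists"]:
        info["db_all_nul"] = is_all_nul(db_path)
    if info["new_exists"]:
        info["new_all_nul"] = is_all_nul(new_path)
        info["new_size"] = os.path.getsize(new_path)
    if info["db_exists"] and info["new_exists"]:
        st_db, st_new = os.stat(db_path), os.stat(new_path)
        info["same_size"] = st_db.st_size == st_new.st_size
        info["same_inode"] = st_db.st_ino == st_new.st_ino
    return info


def recover(db_path: str) -> bool:
    """If mosquitto.db is all null bytes and mosquitto.db.new is non-empty
    and not, rename the .new file over it. Returns True if it renamed."""
    info = diagnose(db_path)
    if (info.get("db_all_nul") and info["new_exists"]
            and info["new_size"] > 0 and not info["new_all_nul"]):
        os.replace(db_path + ".new", db_path)
        return True
    return False


if __name__ == "__main__":
    # Report only; call recover(DB_PATH) explicitly once the broker is stopped.
    print(diagnose(DB_PATH))
```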
In most cases the persistence database is only a few kilobytes, so writing it shouldn't take a meaningful amount of time (a long write would increase the chance of a power outage hitting while it is in progress).
I suppose it could be device- or OS-specific behavior (or behavior that's specific to writing to SD cards), but I haven't found any clues there either. I also haven't been able to reproduce it in a controlled environment yet.
I briefly looked at https://github.com/eclipse/mosquitto/blob/v2.0.14/src/persist_write.c but can't figure out what could be causing the behavior we're observing, specifically a mosquitto.db full of NULL bytes. Any ideas?
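For comparison when reading persist_write.c, the sketch below shows the generic crash-safe replace pattern on POSIX systems: write the new contents to a temporary file, fsync it, rename it over the target, then fsync the containing directory. It is not Mosquitto's code; the function name atomic_replace is hypothetical, and the guarantees depend on the filesystem and on the storage device honouring flushes, which some SD cards may not do reliably. One failure mode often associated with correctly sized but zero-filled files after a power cut is file metadata (size, rename) reaching storage before the file's data blocks, which is what the fsync before the rename is meant to prevent.

```python
import os


def atomic_replace(path: str, data: bytes) -> None:
    """Replace path with data so that, on a POSIX filesystem that honours
    fsync, a crash leaves either the old or the new contents in place."""
    tmp = path + ".new"
    # Start from a fresh temporary file rather than truncating a stale one.
    try:
        os.unlink(tmp)
    except FileNotFoundError:
        pass
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Flush the data blocks to storage before the rename is issued.
        os.fsync(fd)
    finally:
        os.close(fd)
    # os.replace() maps to rename(), which swaps the file in atomically on POSIX.
    os.replace(tmp, path)
    # fsync the directory so the rename itself survives a power cut.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    import tempfile
    target = os.path.join(tempfile.mkdtemp(), "example.db")
    atomic_replace(target, b"example contents")
    with open(target, "rb") as f:
        print(f.read())
```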