Corruption of persistent database file caused by sudden loss of power #189

Closed
thanhvtruong opened this issue Jun 20, 2016 · 3 comments

Comments

@thanhvtruong

It looks like there is a brief window while the persistent database file is being saved during which a sudden power loss will corrupt it.

After thousands of sudden power losses on our system, we noticed the following:

  • Mosquitto starts up with "invalid argument" and "database" read errors.
  • mosquitto.db and mosquitto.db.new both exist in /var/lib/mosquitto.
  • Both mosquitto.db and mosquitto.db.new have the same inode (suggesting that a rename has occurred).

We had a similar problem with another application that we built. Reading the Linux documentation, we found that flushing or closing a file is not enough to get its contents written to disk; an fsync needs to be performed to confirm that the contents have been written.

You can see this in the man page on Fedora 23 (man close, second paragraph of the notes):

"A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored, use fsync(2). (It will depend on the disk hardware at this point.)"

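To make the point from the man page concrete, here is a minimal sketch in C of a write that is only reported as successful once fsync() has returned. The function name write_and_sync and the file mode are hypothetical and chosen for illustration; this is not code from Mosquitto.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and do not report success
 * until fsync() has returned. Returns 0 on success, -1 on any failure. */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* may write fewer bytes than asked */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* close() alone could leave the data sitting in the kernel's cache. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

As the quoted note says, what happens after fsync() returns still depends on the disk hardware, so this narrows the window rather than removing every risk.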
@ralight ralight added this to the Fixes-next milestone Jun 26, 2016
ralight added a commit that referenced this issue Jun 26, 2016
To ensure it is correctly written. Closes #189.

Thanks to thanhvtruong.

Bug: #189
@ralight
Contributor

ralight commented Jun 26, 2016

Thanks for the report. I've added code to call fsync() on the file and on its directory. Could you please confirm whether this fixes the problem for you?
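For readers unfamiliar with syncing a directory, the idea can be sketched roughly as follows. This is an illustrative assumption about the general technique, not the code from the referenced commit, and the helper name fsync_dir is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fsync() a directory so that entries created or
 * renamed inside it are themselves flushed to disk.
 * Returns 0 on success, -1 on failure. */
int fsync_dir(const char *dir)
{
    assert(dir != NULL);

    int fd = open(dir, O_RDONLY);   /* a directory can be opened read-only */
    if (fd < 0) {
        return -1;
    }
    int rc = fsync(fd);             /* flush the directory's own metadata */
    close(fd);
    return rc;
}
```

A caller would typically invoke this on the directory holding the database file (here /var/lib/mosquitto) after the file itself has been synced.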

@kcallin
Contributor

kcallin commented Jun 27, 2016

I recommend calling fflush before fsync so that the application's buffers are completely flushed to the kernel's buffers before those are flushed to disk. This is awfully hard to reproduce, but thanhvtruong and I have started a long-term test series to verify it mechanically.

I do not believe the directory sync is required; the rename logic should work as-is.

I opened a pull request for these changes and will update as the long-term tests progress.
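The ordering described in this comment (fflush, then fsync, then rename) might look something like the sketch below. It is not the contents of the pull request; the function name save_atomically and the fixed-size path buffer are hypothetical, and only the ".new" suffix is taken from the file names reported in this issue. It leaves out the directory sync, in line with the opinion expressed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: save buf to "<path>.new", force it to disk, then
 * rename it over path. Returns 0 on success, -1 on failure. */
int save_atomically(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        return -1;                       /* path too long for the buffer */
    }

    FILE *fp = fopen(tmp, "wb");
    if (fp == NULL) {
        return -1;
    }

    int ok = fwrite(buf, 1, len, fp) == len
          && fflush(fp) == 0             /* stdio buffer -> kernel */
          && fsync(fileno(fp)) == 0;     /* kernel cache -> disk   */

    if (fclose(fp) != 0) {
        ok = 0;
    }
    if (!ok) {
        remove(tmp);                     /* do not leave a partial file */
        return -1;
    }

    /* rename() swaps the name in one step, so a reader of path sees the
     * old file or the new one, not a partially written mix. */
    return rename(tmp, path) == 0 ? 0 : -1;
}
```

Because the old file is only replaced after the new one has been synced, a power loss before the rename should leave the previous database intact.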

kcallin added a commit to kcallin/mosquitto that referenced this issue Jul 6, 2016
Mosquitto database writes are not atomic and if power is lost during
a write the file will be permanently lost.  This commit makes writes as
atomic as possible.

Signed-off-by: Keegan Callin <[email protected]>
Bug: eclipse#189
ralight pushed a commit that referenced this issue Aug 16, 2016
Mosquitto database writes are not atomic and if power is lost during
a write the file will be permanently lost.  This commit makes writes as
atomic as possible.

Signed-off-by: Keegan Callin <[email protected]>
Bug: #189
@ralight
Contributor

ralight commented Aug 16, 2016

Thanks very much for your work on this. I'm closing this now based on your pull request.

@ralight ralight closed this as completed Aug 16, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Aug 8, 2019