better documentation of what happens with Remove Snapshot? #875

Open
wolftune opened this issue Feb 28, 2018 · 13 comments
Labels
Documentation, GOOD FIRST ISSUE, HELP-WANTED, Medium

Comments

@wolftune

I've searched around but feel unsure still. I'd like to understand what happens if I remove snapshots 2, 4, and 6 from a list of snapshots 1 through 7. Somehow, all the links and whatever are readjusted fully so that each snapshot is guaranteed to be as if it really were done over the remaining next-older one? And I still have an effectively complete backup if I do that sort of removal and then make a new snapshot?

@glyndon

glyndon commented Feb 28, 2018

I believe the links will continue to behave as you describe. In a traditional Unix filesystem, files can be pointed-to by multiple 'links' (aka 'directory entries', filenames), as long as there's at least one. When the number goes to zero, the file becomes 'deleted' (nothing points to the inode, so it's freed).
For a file that doesn't change, the link to it in each snapshot points to the same inode, which holds the file's content. Until the last referencing link is deleted, the file remains accessible via any of them. No 'adjustment' of any links needs to occur. Only the links in the removed snapshot are removed; the file and its other links are unaffected.
Note that (again, for a file that doesn't change) the links aren't pointing to each other in a chain across snapshots, but are all pointing to the same inode, like spokes of a wheel.
For files that do change across snapshots (let's say it changes between each one), there'll only be one link to the inode of each version (each being a separate copy) of the file, so when a snapshot containing one of those links is removed, that copy's inode is freed (that version of the file is gone).
I hope that makes sense, addresses your concern, explains it usefully (that it's the file system that's doing the work in this scenario, not BiT), and I especially hope it's accurate and that someone more knowledgeable will corroborate or correct me.
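For anyone who wants to check the 'spokes of a wheel' picture on their own machine, here is a minimal Python sketch. It assumes a POSIX-style filesystem that supports hard links; the directory and file names (`snapshot1`, `unchanged.txt`, `changed.txt`) are made up for illustration and are not what BiT actually creates.

```python
import os
import tempfile


def spokes_demo():
    """Build three toy 'snapshots' and report inode and link-count facts."""
    with tempfile.TemporaryDirectory() as root:
        snaps = [os.path.join(root, f"snapshot{i}") for i in (1, 2, 3)]
        for snap in snaps:
            os.mkdir(snap)

        # Unchanged file: written once, then hard-linked into later snapshots.
        first = os.path.join(snaps[0], "unchanged.txt")
        with open(first, "w") as f:
            f.write("same content\n")
        for snap in snaps[1:]:
            os.link(first, os.path.join(snap, "unchanged.txt"))

        # Changed file: every snapshot gets its own separate copy.
        for i, snap in enumerate(snaps, 1):
            with open(os.path.join(snap, "changed.txt"), "w") as f:
                f.write(f"version {i}\n")

        unchanged = [os.stat(os.path.join(s, "unchanged.txt")) for s in snaps]
        changed = [os.stat(os.path.join(s, "changed.txt")) for s in snaps]
        return {
            "unchanged_inodes": len({st.st_ino for st in unchanged}),
            "unchanged_nlink": unchanged[0].st_nlink,
            "changed_inodes": len({st.st_ino for st in changed}),
            "changed_nlink": changed[0].st_nlink,
        }


if __name__ == "__main__":
    print(spokes_demo())
```

The three names for the unchanged file share one inode whose link count is 3, while each version of the changed file has its own inode with a link count of 1.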

@wolftune
Author

For files that haven't changed, that makes sense. But, for the simplest case, what happens if there are two snapshots and I delete the first one? I assume in that case that all links from the second to the first are replaced by actually copying the files. Otherwise the second snapshot would be left with almost nothing!

@colinl

colinl commented Feb 28, 2018

When a file is initially created it is effectively created as a hard link to the actual file contents. When another hard link to that file is created then it is just another hard link to the same contents. There is no difference between the first link and the second. So you can delete either one of them and the other will still be a link to the original contents. The contents will not be removed until both links have been deleted.

@wolftune
Author

@colinl so you are saying that the very first snapshot is itself made of hard links and the file contents are effectively separate? And that when any snapshot is deleted, BIT knows to check whether each deleted link is present in any other snapshot, and if so leave the contents alone but if not, delete the contents?

@colinl

colinl commented Feb 28, 2018

@wolftune no, that is not really what I am saying, though the result is the same. What I described is how the Linux file system works; it has nothing to do with BIT. You can try it yourself. Create a file, make a hard link to the file, and delete the original, and the second one will still be there. All that happens when you delete the original is that you delete the original link to the file. In fact, when you create a hard link to an existing file there is absolutely no difference between the original link and the new one (apart from being in different places, of course, and possibly having different names). There is no way that you can tell which one was created first, for example (as far as I know).
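The experiment described above can be sketched in a few lines of Python. This is only an illustration under the assumption of a filesystem with hard-link support; the file names `original.txt` and `second.txt` are arbitrary.

```python
import os
import tempfile


def delete_original_keep_link():
    """Create a file, hard-link it, delete the first name, read the second."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        second = os.path.join(d, "second.txt")

        with open(original, "w") as f:
            f.write("hello\n")

        os.link(original, second)   # second name for the same contents
        os.remove(original)         # removes only the first name

        with open(second) as f:
            content = f.read()
        return os.path.exists(original), content


if __name__ == "__main__":
    print(delete_original_keep_link())
```

The first name is gone, yet the contents are still readable through the second name, with no copying involved.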

@wolftune
Author

So, the point being that data is on a storage device regardless of whether there are hard links, but the existence of any hard links within any particular media device makes sure that the file system will not overwrite the data, but if there are no remaining hard links, then the file system allows that data to be overwritten?

@colinl

colinl commented Feb 28, 2018

@wolftune, I think you still have not fully got it. There is no such thing as a file without any hard links. When we talk about a file, what we really mean is a hard link to the file contents. When a file is initially created, the file contents are written and a hard link is made to those contents. Talking about deleting a file is, strictly, a misuse of words. What we really mean by deleting a file is deleting a link to the file contents. If there is only one link to the file (as is normally the case) then the file contents will also be deleted (well, released actually: the area of disc is not overwritten, it is just made available for reuse). If there is more than one link, as is the case if a file is created and then a hard link is made (a second snapshot, for example), then the second link will still exist, so the contents will not be deleted.

@wolftune
Author

Right, you're confirming what I wrote above.

> There is no such thing as a file without any hard links.

I didn't say that. I referenced "data" as in the binary bits on some storage media in some non-random state that captures the "contents" (to use your wording).

So, to clarify finally (I hope): all the redundancy in BIT snapshots is in the form of duplicated hard links. The file system recognizes them all and knows not to release the relevant area of the storage medium unless all of the hard links are deleted.

To reiterate: after a new snapshot is completed, it doesn't rely at all on any previous snapshot. I could even manually remove older snapshots outside of BIT. It's the file system that knows that some of the deletions of hard-links do not release the storage area for rewrite because the file system is aware of the remaining hard-links in the new snapshot.
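The original question (remove snapshots 2, 4, and 6 out of 1 through 7, then take a new one) can be tried with a toy model. The sketch below is not BiT's code: `make_snapshot`, `content_for`, and the file names are hypothetical stand-ins for an incremental backup that hard-links unchanged files to the previous snapshot and copies changed ones. The snapshots are removed with a plain `shutil.rmtree`, that is, outside of any backup tool.

```python
import os
import shutil
import tempfile

FILES = ("stable.txt", "notes.txt")


def content_for(name, snap):
    """Hypothetical source data as it looked when snapshot `snap` was taken."""
    if name == "stable.txt":
        return "never changes\n"
    if snap < 3:
        return "notes v1\n"
    if snap < 5:
        return "notes v2\n"
    return "notes v3\n"


def read(path):
    with open(path) as f:
        return f.read()


def make_snapshot(root, snap, prev):
    """Link unchanged files to the previous snapshot, copy changed ones."""
    dest = os.path.join(root, str(snap))
    os.mkdir(dest)
    for name in FILES:
        wanted = content_for(name, snap)
        target = os.path.join(dest, name)
        prev_path = os.path.join(root, str(prev), name) if prev else None
        if prev_path and read(prev_path) == wanted:
            os.link(prev_path, target)      # unchanged: another name, same inode
        else:
            with open(target, "w") as f:    # changed or first snapshot: new copy
                f.write(wanted)


def is_complete(root, snap):
    """True if every file in the snapshot has the content expected for it."""
    return all(
        read(os.path.join(root, str(snap), name)) == content_for(name, snap)
        for name in FILES
    )


def remove_even_snapshots():
    with tempfile.TemporaryDirectory() as root:
        for snap in range(1, 8):
            make_snapshot(root, snap, snap - 1)
        for snap in (2, 4, 6):
            shutil.rmtree(os.path.join(root, str(snap)))
        make_snapshot(root, 8, 7)           # new snapshot after the removals
        return {snap: is_complete(root, snap) for snap in (1, 3, 5, 7, 8)}


if __name__ == "__main__":
    print(remove_even_snapshots())
```

In this model every remaining snapshot, and the new one made afterwards, still contains every file with the right content, because removing a snapshot directory only removes that directory's names for the data.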

Do I get it now?

@glyndon

glyndon commented Feb 28, 2018 via email

@wolftune
Author

> If/when the last card pointing to a book is removed, then the librarian (the filesystem algorithms) also discards the book

That is actually intuitive to me, and that's where I get worried. The librarian has to be a superhuman computer to know, when removing a card, that it's the last card. There's nothing on the card itself to tell you that. I'm sure this is where something could go wrong, like a hard link to data across drives or something. In this case, the conclusion I draw is that the file system is that superhuman computer that actually does the intensive process of knowing whether a removed link is the last one.

@glyndon

glyndon commented Feb 28, 2018

No analogy is perfect, and mine is no exception.
We just have to trust that the filesystem does the right thing and knows what it's doing.
Every time we "delete a file" (a simple file with only one link to it) we're removing the last link to it, and the filesystem handler knows it's taking the last link (because the link count in the file's metadata goes to 0), so it knows to also release the file's inode for reuse.

Trust, but verify:
One of the things the fsck utility does is confirm that every occupied inode has at least one link to it, and that every link points to a valid occupied inode. If a link is missing, the utility creates a new one in the 'lost+found' folder so you can reach a formerly 'orphan' file whose last link may have been damaged (e.g. by a read error on the sector containing the directory).

This behaviour is hinted at in the POSIX API, where the function call to delete a file is called 'unlink()'
:D
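To see the link count change as names are removed, here is a small Python sketch. Python exposes the same call as `os.unlink` (for files it behaves the same as `os.remove`), and `os.stat(...).st_nlink` reports the counter kept in the file's metadata. The file names `a`, `b`, and `c` are arbitrary, and a filesystem with hard-link support is assumed.

```python
import os
import tempfile


def link_counts():
    """Give one file three names, then unlink two and record the counter."""
    with tempfile.TemporaryDirectory() as d:
        a, b, c = (os.path.join(d, name) for name in ("a", "b", "c"))
        with open(a, "w") as f:
            f.write("data\n")
        os.link(a, b)
        os.link(a, c)

        counts = [os.stat(c).st_nlink]      # three names exist
        os.unlink(a)
        counts.append(os.stat(c).st_nlink)  # two names left
        os.unlink(b)
        counts.append(os.stat(c).st_nlink)  # one name left
        return counts


if __name__ == "__main__":
    print(link_counts())   # [3, 2, 1]
```

Each `unlink` lowers the counter by one; the data is only released once the last name is removed.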

@wolftune
Author

Okay, so besides my edification, it seems reasonable to ask if the BIT docs clarify that even though each snapshot is incremental (fast backups, only needs to copy changed files) each snapshot has no reliance on any other (or some similar clarification about the situation). It would be nice to avoid the worry I started with about wondering about such dependencies.

@DonEdwards

It's easy to make the mistake of thinking that a file's identity is its path and name. Particularly because that is actually true on FAT file systems.

But on Linux/Unix file systems (also on NTFS but you have to dig a lot to verify it), a file's identity is its inode. Not its path and name. They are just tools (called hard links) for finding the inode. Just like a screw doesn't care which of your two compatible screwdrivers you use on it, an inode doesn't care - or, for that matter, know - which hard link you use to find it. Nor do the hard links know anything about each other, not even that they exist, so they have no precedence.

The inode does, however, know how many hard links point at it. That's a counter in the inode that is updated whenever a hard link is added or deleted.
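A toy model may help with the earlier worry that the filesystem has to search everywhere to learn whether a link was the last one. The class below is purely illustrative (`ToyFilesystem` and its methods are invented here and do not reflect how any real filesystem is implemented), but it shows why a counter stored with the inode makes the decision a simple check rather than a scan.

```python
class ToyFilesystem:
    """Names map to inode numbers; each inode carries its own link counter."""

    def __init__(self):
        self.inodes = {}    # inode number -> {"nlink": int, "data": bytes}
        self.names = {}     # path -> inode number
        self._next_ino = 1

    def create(self, path, data):
        ino = self._next_ino
        self._next_ino += 1
        self.inodes[ino] = {"nlink": 1, "data": data}
        self.names[path] = ino
        return ino

    def link(self, existing, new):
        ino = self.names[existing]
        self.names[new] = ino
        self.inodes[ino]["nlink"] += 1      # one more name points here

    def unlink(self, path):
        ino = self.names.pop(path)
        inode = self.inodes[ino]
        inode["nlink"] -= 1                 # no search through other names
        if inode["nlink"] == 0:
            del self.inodes[ino]            # last name gone: release the data

    def read(self, path):
        return self.inodes[self.names[path]]["data"]


if __name__ == "__main__":
    fs = ToyFilesystem()
    fs.create("snapshot1/file", b"contents")
    fs.link("snapshot1/file", "snapshot2/file")
    fs.unlink("snapshot1/file")
    print(fs.read("snapshot2/file"))        # data still reachable
    fs.unlink("snapshot2/file")
    print(fs.inodes)                        # nothing left
```

Note that `unlink` never looks at the other names: it only decrements the counter and checks whether it reached zero.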

Note: symbolic links are different. I won't go into them here.

Now when looking at what BIT does, you can choose to look at either the process (an incremental backup) or the result (a full backup). Both are correct, but if the backups will run at a time of low or no other activity - or run very fast because they'll have very little to do - then I think the latter is more useful.

6 participants