better documentation of what happens with Remove Snapshot? #875
I believe the links will continue to behave as you describe. In a traditional Unix filesystem, a file can be pointed to by multiple 'links' (a.k.a. 'directory entries', i.e. filenames), as long as there is at least one. When the number goes to zero, the file becomes 'deleted' (nothing points to the inode, so it is freed). |
For files that haven't changed, that makes sense. But, for the simplest case, what happens if there are two snapshots and I delete the first one? I assume in that case that all links from the second to the first are replaced by actually copying the files. Otherwise the second snapshot would be left with almost nothing! |
When a file is initially created it is effectively created as a hard link to the actual file contents. When another hard link to that file is created then it is just another hard link to the same contents. There is no difference between the first link and the second. So you can delete either one of them and the other will still be a link to the original contents. The contents will not be removed until both links have been deleted. |
@colinl so you are saying that the very first snapshot is itself made of hard links and the file contents are effectively separate? And that when any snapshot is deleted, BIT knows to check whether each deleted link is present in any other snapshot, and if so leave the contents alone but if not, delete the contents? |
@wolftune no, that is not really what I am saying, though the result is the same. What I described is how the Linux file system works; it has nothing to do with BIT. You can try it yourself: create a file, make a hard link to it, and delete the original, and the second one will still be there. All that happens when you delete the original is that you delete the original link to the file. In fact, when you create a hard link to an existing file, there is absolutely no difference between the original link and the new one (apart from being in different places, of course, and possibly having different names). There is no way to tell which one was created first, for example (as far as I know). |
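The "try it yourself" experiment above can be sketched in Python (whose `os.link` and `os.remove` wrap the underlying `link()` and `unlink()` system calls); the file names are invented for the example:

```python
import os
import tempfile

# Deleting the original name of a file does not touch its contents
# as long as another hard link remains.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.txt")
hardlink = os.path.join(workdir, "hardlink.txt")

with open(original, "w") as f:
    f.write("file contents")

os.link(original, hardlink)  # a second directory entry for the same inode
os.remove(original)          # delete the *first* name

with open(hardlink) as f:
    survived = f.read()      # contents are still fully readable
```

The same result is obtained if the roles are swapped and `hardlink` is deleted instead: neither name is privileged.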
So the point is that the data sits on a storage device regardless of whether there are hard links, but the existence of any hard link on that device ensures that the file system will not overwrite the data, while once no hard links remain, the file system allows that data to be overwritten? |
@wolftune, I think you still have not fully got it. There is no such thing as a file without any hard links. When we talk about a file, what we really mean is a hard link to the file contents. When a file is initially created, the file contents are written and a hard link is made to those contents. When we talk about deleting a file, that is strictly a misuse of words: what we really mean is deleting a link to the file contents. If there is only one link to the file (as is normally the case), then the file contents will also be deleted (well, released actually: the area of disc is not overwritten, it is just made available for reuse). If there is more than one link, as is the case if a file is created and then a hard link made (a second snapshot, for example), then the second link will still exist, so the contents will not be deleted. |
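The bookkeeping described here is visible as the per-inode link count, exposed as `st_nlink` in `stat` results; a minimal sketch, with invented file names:

```python
import os
import tempfile

# Watch the filesystem's per-inode link counter as links come and go.
workdir = tempfile.mkdtemp()
a = os.path.join(workdir, "a")
b = os.path.join(workdir, "b")

with open(a, "w") as f:
    f.write("contents")

count_after_create = os.stat(a).st_nlink  # just the original name
os.link(a, b)
count_after_link = os.stat(a).st_nlink    # two names, one inode
os.remove(a)
count_after_delete = os.stat(b).st_nlink  # back down; contents still live
```

Only when that counter reaches zero does the filesystem release the contents for reuse.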
Right, you're confirming what I wrote above.
I didn't say that. I referenced "data" as in the binary bits on some storage media in some non-random state that captures the "contents" (to use your wording). So, to clarify finally (I hope): all the redundancy in BIT snapshots is in the form of duplicated hard links. The file system recognizes them all and knows not to release the relevant area of the storage medium unless all of the hard links are deleted. To reiterate: after a new snapshot is completed, it doesn't rely at all on any previous snapshot. I could even manually remove older snapshots outside of BIT. It's the file system that knows that some deletions of hard links do not release the storage area for rewrite, because the file system is aware of the remaining hard links in the new snapshot. Do I get it now? |
Sounds like you *do* get it.
An analogy I have found helpful is that of a classic (real-world) library,
with books on shelves by number, and indexed by a card catalog.
There can be any number of cards in the catalog (analogous to filenames in
directories) which 'point to' the same book on a shelf. e.g., one card is
alphabetic by author, another card may be in a subject-matter catalog,
another in a different subject matter catalog. All pointing to the same
book on the shelf.
As long as one of those cards remains, the book remains also. If/when the
last card pointing to a book is removed, then the librarian (the filesystem
algorithms) also discards the book, making the shelf space it occupied
available for use when a new book needs to be stored.
For index cards which point to the same book, they have no effect on one
another. New index cards can be added (e.g. if someone wanted to build a
card index based on the books' cover colors), and others removed (e.g. when
the book is deemed irrelevant to one of its subject indices), and doing so
alters neither the book nor the other index cards which point to it.
So, each time BiT asks rsync to add a new snapshot, rsync checks whether
each file has changed since the last snapshot. If there is no change, rsync
creates a new hard link to the existing copy (a very efficient action),
rather than transferring a new copy (and redundantly using new space for
it). After a while, there are links to that same file in a number of
snapshots made over time. (Again, we are presuming the file is unchanged
in this example.) So, removing the *first* link to the file (i.e. deleting
the original snapshot) doesn't affect the stored file at all; it only
reduces the count of links pointing to it.
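This link-if-unchanged strategy (what rsync's `--link-dest` option does) can be sketched in Python. This is a hypothetical illustration, not BIT's or rsync's actual code; the `make_snapshot` helper and all file names are invented for the example:

```python
import filecmp
import os
import shutil
import tempfile

def make_snapshot(source, new_snap, prev_snap=None):
    """Toy sketch of hard-link-based snapshots (flat directory only).

    For each source file: if an identical copy exists in the previous
    snapshot, hard-link to it (costing no extra space); otherwise
    copy the file for real.
    """
    os.makedirs(new_snap, exist_ok=True)
    for name in os.listdir(source):
        src = os.path.join(source, name)
        if not os.path.isfile(src):
            continue  # subdirectories omitted for brevity
        dst = os.path.join(new_snap, name)
        prev = os.path.join(prev_snap, name) if prev_snap else None
        if prev and os.path.isfile(prev) and filecmp.cmp(src, prev, shallow=False):
            os.link(prev, dst)      # unchanged: just another hard link
        else:
            shutil.copy2(src, dst)  # new or changed: a real copy

# Tiny usage example:
root = tempfile.mkdtemp()
source = os.path.join(root, "source")
os.makedirs(source)
with open(os.path.join(source, "doc.txt"), "w") as f:
    f.write("unchanged data")

snap1 = os.path.join(root, "snap1")
snap2 = os.path.join(root, "snap2")
make_snapshot(source, snap1)
make_snapshot(source, snap2, prev_snap=snap1)

# Both snapshots name the same inode for the unchanged file.
same_inode = (os.stat(os.path.join(snap1, "doc.txt")).st_ino
              == os.stat(os.path.join(snap2, "doc.txt")).st_ino)

shutil.rmtree(snap1)  # delete the *first* snapshot entirely...
with open(os.path.join(snap2, "doc.txt")) as f:
    survived = f.read()  # ...and the second snapshot is still complete
```

The last two steps illustrate the point of the paragraph above: removing the first snapshot only removes links, and the later snapshot remains a complete backup.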
Intuitively, we tend to think of the file as "living" in the folder of the
first snapshot (it took me a long time to get my head around this). But
once a second snapshot links to the same file, those links exist as
equals - neither one having any more 'ownership' of the file than the
other. The chronological order in which the links were created does not
matter for the continued presence of the file; all that is needed is that
at least one link to it remains in existence.
|
That is actually intuitive to me, and that's where I get worried. The librarian would have to be a superhuman computer to know, when removing a card, that it's the last card; there's nothing on the card itself to tell you that. I'm sure this is where something could go wrong, like a hard link to data across drives or something. In this case, the conclusion I draw is that the file system is that superhuman computer that actually does the intensive work of knowing whether a removed link is the last one. |
No analogy is perfect, and mine is no exception. Trust, but verify: this behaviour is hinted at in the POSIX API, where the function call to delete a file is called unlink(). |
Okay, so besides my edification, it seems reasonable to ask whether the BIT docs clarify that even though each snapshot is incremental (fast backups, only changed files need to be copied), each snapshot has no reliance on any other (or some similar clarification of the situation). It would be nice to avoid the worry I started with about such dependencies. |
It's easy to make the mistake of thinking that a file's identity is its path and name, particularly because that is actually true on FAT file systems. But on Linux/Unix file systems (also on NTFS, though you have to dig a lot to verify it), a file's identity is its inode, not its path and name. Those are just tools (called hard links) for finding the inode. Just as a screw doesn't care which of your two compatible screwdrivers you use on it, an inode doesn't care - or, for that matter, know - which hard link you use to find it. Nor do the hard links know anything about each other, not even that they exist, so they have no precedence. The inode does, however, know how many hard links point at it: a counter in the inode that is updated whenever a hard link is added or deleted. Note: symbolic links are different; I won't go into them here.
Now, when looking at what BIT does, you can choose to look at either the process (an incremental backup) or the result (a full backup). Both are correct, but if the backups run at a time of low or no other activity - or run very fast because they have very little to do - then I think the latter is more useful. |
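That a file's identity is its inode, not its name, can be observed directly: both hard links report the same inode number (`st_ino`), and the per-inode link counter (`st_nlink`) is visible through either name. A minimal sketch, with invented file names:

```python
import os
import tempfile

# Two hard links to the same contents share one inode; neither is
# "the original".
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "first")
second = os.path.join(workdir, "second")

with open(first, "w") as f:
    f.write("payload")

os.link(first, second)
inode_first = os.stat(first).st_ino    # the file's real identity
inode_second = os.stat(second).st_ino  # same inode, different name
links = os.stat(second).st_nlink       # the counter kept in the inode
```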
I've searched around but feel unsure still. I'd like to understand what happens if I remove snapshots 2, 4, and 6 from a list of snapshots 1 through 7. Somehow, all the links and whatever are readjusted fully so that each snapshot is guaranteed to be as if it really were done over the remaining next-older one? And I still have an effectively complete backup if I do that sort of removal and then make a new snapshot?