Object Versioning for posix/scoutfs #678

Open
benmcclelland opened this issue Jul 16, 2024 · 7 comments · May be fixed by #708
Assignees
Labels
enhancement New feature or request

Comments

@benmcclelland
Member

Describe the solution you'd like
We would like to optionally support object versioning compatible with AWS S3. The following requirements/behaviors are expected:

  • Only enabled when specifically configured
  • Storing object versions will consume filesystem capacity for each version of an object. Some filesystems may be able to de-duplicate the data extents, but that is outside the scope of the gateway, and the gateway won't do anything specific to enable this.
  • Versioning can be enabled/disabled per bucket once configured for the gateway, defaulting to disabled

Objectives
Versioning behavior compatible with AWS S3 when enabled.
AWS documentation can be found here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html

Design
To enable versioning, a directory must be configured to store the non-current object versions. Older object versions should not be stored within the gateway root namespace, to prevent confusion when the namespace is accessed outside of S3. When an existing object is deleted or overwritten by an upload, the older version can be moved to the version directory. If the version directory is within the same filesystem, the move will likely be fast because the file data does not need to be rewritten. If it is not within the same filesystem, the move will have to copy all file data to the new location. This is handled automatically in file renaming.
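A minimal sketch of the move step, in Python for illustration only (the gateway itself is not written in Python). The function name `archive_version` and its parameters are hypothetical. It relies on `shutil.move`, which renames when source and destination share a filesystem and otherwise falls back to copying the data and removing the source.

```python
import shutil
from pathlib import Path


def archive_version(current: Path, version_path: Path) -> Path:
    """Move the current object file into the version directory.

    shutil.move renames the file when both paths are on the same
    filesystem, and otherwise copies the data and removes the source.
    """
    # Create the intermediate directories of the version namespace on demand.
    version_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(current), str(version_path))
    return version_path
```

In a real gateway the move would also need to be coordinated with the upload or delete that triggers it, which this sketch does not attempt.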

Version Namespace
The directory structure for the older object versions does not need to be a posix-compatible namespace of filenames the way the primary namespace is. The easiest namespace for these would be based on a sha256 hash of the object name, with a small directory structure created from that hash. The top level directory will still need to be the bucket to prevent collisions across buckets. To be nicer to posix filesystems and not have all objects in the same directory, we can split the object name hash into directories based on the first few bytes of the hash. This is a common tactic in other projects.
For example,

bucket: mybucket object: dir1/dir2/myobject
sha256("dir1/dir2/myobject") = cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100

location of version "1":

<version directory>/mybucket/ce/fc/88/cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100/1
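The layout above can be sketched as a small path helper. This is an illustration in Python, not the gateway's code: the function name `version_path` and its parameters are hypothetical, and it assumes the hash is taken over the UTF-8 bytes of the object name and that each of the three directory levels uses one byte (two hex characters) of the digest.

```python
import hashlib
import posixpath


def version_path(version_dir: str, bucket: str, key: str, version_id: str) -> str:
    """Build the location of one object version in the version namespace."""
    # Hex digest of the object name; assumed to be hashed as UTF-8 bytes.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return posixpath.join(
        version_dir,
        bucket,        # bucket stays at the top level to avoid cross-bucket collisions
        digest[0:2],   # first byte of the hash
        digest[2:4],   # second byte
        digest[4:6],   # third byte
        digest,        # full hash identifies the object
        version_id,    # one entry per stored version
    )
```

Calling `version_path("/versions", "mybucket", "dir1/dir2/myobject", "1")` yields a path of the same shape as the example above, with the three short directories taken from the start of the full digest.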

Version IDs
Each object version in the version namespace has an ID associated with it in AWS that uniquely identifies that object version. We can explore a few options here:

  • timestamp
  • counter
  • random string
  • uuid
Ordering needs to be considered when listing results, since list-object-versions probably expects results in time order.

Delete Markers
When an object is deleted, the current object gets moved to the version directory and a new empty object gets placed in the primary namespace with a delete marker attribute indicating that this object shouldn't be listed or retrieved (as it was deleted). Older versions can still be restored to replace the delete marker object.
We will likely just add a new xattr to signify that the file is a delete marker, and handle this accordingly in the listing walks.
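As a rough sketch of the xattr idea, in Python for illustration: the attribute name `user.delete-marker` and all function names here are hypothetical, not names chosen by the project. `os.setxattr` and `os.getxattr` are Linux-only and require a filesystem that supports user extended attributes.

```python
import os

# Hypothetical attribute name, used only for this illustration.
DELETE_MARKER_XATTR = "user.delete-marker"


def mark_deleted(path):
    """Flag an (empty) placeholder file as a delete marker."""
    os.setxattr(path, DELETE_MARKER_XATTR, b"true")


def is_delete_marker(path):
    """Return True if the file carries the delete marker attribute."""
    try:
        os.getxattr(path, DELETE_MARKER_XATTR)
        return True
    except OSError:
        # Attribute missing, or xattrs unsupported on this filesystem.
        return False


def visible_objects(paths, is_marker=is_delete_marker):
    """Filter a listing walk so that delete markers are not returned."""
    return [p for p in paths if not is_marker(p)]
```

The predicate is passed in as a parameter so the listing filter can be exercised without touching real extended attributes.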

list-object-versions
We need to enable listing of the object versions as well as objects when list-object-versions is called. This can be handled in the listing walk function by looking into the version namespace for each object visited. The walk function results may need to be modified to handle versioning.

RFC
This is intended to be RFC style open to comments. Any requirement changes or design change proposals can be discussed in issue comments.

@benmcclelland benmcclelland added the enhancement New feature or request label Jul 16, 2024
@benmcclelland
Member Author

We probably need to consider the best naming conventions for the versions. In the above example I listed the version file name as "1", but maybe a timestamp, uuid, or something else should also be considered. I assume it is expected that list versions would list the versions in mtime order? So we may want something sortable in the way the API response expects.

@benmcclelland
Member Author

I think it only makes sense to version file-objects, not directory-objects, since versioning directory contents wouldn't really be possible.

@jonaustin09
Contributor

We probably need to prevent the versioning directory from being the same as, or inside, the root directory of the posix/scoutfs backend, as there would otherwise be bucket/object collisions.

@jonaustin09
Contributor

jonaustin09 commented Jul 18, 2024

As object versions are ordered by the modification date in ListObjectVersions, a reasonable solution would be to choose version names as the Unix nanoseconds of the last modification date.

For example:

bucket: mybucket object: dir1/dir2/myobject
sha256("dir1/dir2/myobject") = cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100

// Object versions location
<version directory>/mybucket/ce/fc/88/cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100/<version_1creation_nano_seconds>
<version directory>/mybucket/ce/fc/88/cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100/<version_2creation_nano_seconds>
...

However, the downside of this solution is that the GET by VersionID operation becomes expensive, because in the worst case the exact location of the VersionID can only be determined after listing all the versions.
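The trade-off can be sketched as follows, in Python for illustration. It assumes version files are named by their Unix nanosecond timestamp and that the VersionID is stored separately from the file name; `read_version_id` is a hypothetical callback standing in for however the ID would be read (for example from an xattr). All function names are hypothetical.

```python
import os


def list_versions_newest_first(object_version_dir):
    """Return version file names sorted by timestamp, newest first."""
    names = [n for n in os.listdir(object_version_dir) if n.isdigit()]
    # Sort numerically: plain string sorting would misorder names of different lengths.
    return sorted(names, key=int, reverse=True)


def find_version(object_version_dir, version_id, read_version_id):
    """Locate a version by ID; in the worst case this visits every version."""
    for name in list_versions_newest_first(object_version_dir):
        path = os.path.join(object_version_dir, name)
        if read_version_id(path) == version_id:
            return name
    return None
```

Listing comes almost for free with this naming, while the lookup by ID is linear in the number of versions of the object.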

@jonaustin09
Contributor

The VersionID could simply be a UUID, as it doesn't affect the sorting of object versions. The only requirement is that it must be unique.

The important thing to note is that the VersionID uniquely identifies a version of an object within a bucket. VersionIDs may vary in length but generally look like the following:

3/L4kqtJlcpXroDTDmJ+rmspuAmTeQhF3

@benmcclelland
Member Author

@jonaustin09
Contributor

Another solution, which would best match our use case, is to use lexicographically sortable, timestamp-dependent uuids:
https://github.com/oklog/ulid
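To illustrate why such identifiers sort by time, here is a standard-library approximation of the ULID layout in Python: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. This is a sketch, not the API of the linked library, and the function name `new_ulid` is hypothetical. It makes no attempt to keep IDs generated within the same millisecond in order.

```python
import os
import time

# Crockford base32 alphabet; its characters are in ascending ASCII order,
# so string comparison matches numeric comparison for fixed-width values.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None):
    """Return a 26-character, lexicographically sortable identifier."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    # The timestamp occupies the high 48 bits, randomness the low 80 bits.
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

Because the timestamp fills the leading characters, an ID created in a later millisecond always compares greater than an earlier one, so a directory listing sorted by name is also sorted by creation time.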

@jonaustin09 jonaustin09 linked a pull request Jul 31, 2024 that will close this issue
@jonaustin09 jonaustin09 closed this as completed by moving to Done in VersityGW Project Aug 8, 2024
@benmcclelland benmcclelland reopened this Aug 8, 2024
Projects
Status: In Review

2 participants