gitoxide 0.3.0

A command-line application for interacting with git repositories

gix is a command-line interface (CLI) to access git repositories. It's written to optimize the user experience, and to perform as well as or better than the native implementation.

Furthermore, it provides an easy-to-use and safe API in the form of various small crates, so you can implement your own tools with little effort. Please see 'Development Status' for a listing of all crates and their capabilities.

Development Status

  • gitoxide (CLI)
    • please note that all functionality comes from the gitoxide-core library, which mirrors these capabilities and itself relies on all git-* crates.
    • limit the number of threads used in operations that support it.
    • repository
      • init
    • plumbing
      • pack verify
      • pack index verify including each object sha1 and statistics
      • pack explode, useful for transforming packs into loose objects for inspection or restoration
        • verify written objects (by reading them back from disk)
      • index from pack - create an index file by streaming a pack file as done during clone
        • support for thin packs (as needed for fetch/pull)
  • git-object
    • decode (zero-copy) borrowed objects
      • commit
      • tree
      • tag
    • encode owned objects
      • commit
      • tree
      • tag
    • transform borrowed to owned objects
    • API documentation with examples
  • git-odb
    • loose objects
      • traverse
      • read
        • into memory
        • streaming
        • verify checksum
      • streaming write for blobs
      • buffer write for small in-memory objects/non-blobs
    • packs
      • traverse pack index
      • 'object' abstraction
        • decode (zero copy)
        • verify checksum
      • simple and fast pack traversal
      • decode
        • full objects
        • deltified objects
      • streaming
        • decode a pack from Read input
        • Read to Iterator
          • read as is, verify hash, and restore partial packs
        • create index from pack alone
          • various memory options allow trading off speed for lower memory consumption
          • resolve 'thin' packs
      • encode
        • create new pack
        • create 'thin' pack
      • verify pack with statistics
        • brute force - less memory
        • indexed - faster, but more memory
      • advanced
        • Multi-Pack index file (MIDX)
        • 'bitmap' file
    • API documentation with examples
    • sink
      • write objects and obtain id
    • alternates
      • a database that acts as a link to other known ODB types on disk
      • handles cycles
      • handles recursive configurations
    • multi-odb
      • an ODB for object lookup from multiple lower-level ODBs at once
    • promisor
      • It's vague, but these seem to be like index files that allow fetching objects from a server on demand.
  • git-repository
    • initialize
      • Proper configuration depending on platform (e.g. ignorecase, filemode, …)
    • read and write all data types
    • rev-parsing and ref history
    • remotes with push and pull
    • configuration
    • merging
    • stashing
    • API documentation with examples
    • Commit Graph - split and unsplit
  • git-config
    • read and write git configuration files
    • API documentation with examples
  • git-ref
    • Handle symbolic references and packed references
    • discover them in typical folder structures
    • name validation
    • API documentation with examples
  • git-index
    • read and write a git-index file
    • add and remove entries
    • API documentation with examples
  • git-diff
    • diffing of git-object::Tree structures
    • diffing, merging, working with hunks of data
    • find differences between various states, i.e. index, working tree, commit-tree
    • API documentation with examples
  • git-protocol
    • client side
    • server side
  • git-transport
    • client side
    • server side
    • via ssh
      • push
      • pull
    • via https
      • push
      • pull
    • API documentation with examples
  • git-features
    • parallel feature toggle
      • When on…
        • in_parallel
        • join
      • When off all functions execute serially
  • git-tui
    • a terminal user interface seeking to replace and improve on tig
  • Stress Testing
    • Verify huge packs
    • Explode a pack to disk
    • Generate a huge pack from a lot of loose objects
  • Ideas for Demos
    • A simple git-hours clone
    • Open up SQL for git using sqlite virtual tables. Check out gitqlite as well. What would an MVP look like? Maybe even something that could ship with gitoxide.
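
To make 'decode a pack from Read input' a little more concrete, here is a minimal sketch that reads only the fixed 12-byte pack header (the "PACK" signature, a big-endian version and a big-endian object count) from any Read. It uses nothing but the standard library. The names PackHeader and read_pack_header are made up for this illustration and are not the git-odb API.

```rust
use std::io::{self, Read};

/// The fixed 12-byte header at the start of a git pack file.
/// Hypothetical type for illustration, not taken from git-odb.
#[derive(Debug, PartialEq, Eq)]
struct PackHeader {
    version: u32,
    num_objects: u32,
}

/// Read and validate a pack header from any `Read` input.
fn read_pack_header(mut input: impl Read) -> io::Result<PackHeader> {
    let mut buf = [0u8; 12];
    input.read_exact(&mut buf)?;
    if &buf[0..4] != b"PACK" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "missing PACK signature"));
    }
    // Both numbers are stored in network byte order (big endian).
    let version = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let num_objects = u32::from_be_bytes([buf[8], buf[9], buf[10], buf[11]]);
    if version != 2 && version != 3 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unsupported pack version"));
    }
    Ok(PackHeader { version, num_objects })
}

fn main() -> io::Result<()> {
    // A hand-written header: version 2, 42 objects.
    let bytes = b"PACK\x00\x00\x00\x02\x00\x00\x00\x2a";
    let header = read_pack_header(&bytes[..])?;
    println!("{:?}", header);
    Ok(())
}
```

Since the function takes any Read, the same code works for a file, a network stream or an in-memory slice, which is the property the streaming bullets above rely on.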

Installation

Binary Release

curl -LSfs https://raw.githubusercontent.com/byron/git-oxide/master/ci/install.sh | \
    sh -s -- --git byron/git-oxide --crate gix-max-termion

See the releases section for manual installation and various alternative, smaller builds, depending on your needs, for Linux, MacOS and Windows.

Cargo

cargo is the Rust package manager which can easily be obtained through rustup. With it, you can build your own binary effortlessly and for your particular CPU for additional performance gains.

# The default installation, 'max'
cargo install gitoxide

# On Linux, it's a little faster to compile the termion version, which also results in slightly smaller binaries
cargo install gitoxide --no-default-features --features max-termion

# For smaller binaries and even faster build times that are traded for a less fancy CLI implementation, use `lean`
# or `lean-termion` respectively.
cargo install gitoxide --no-default-features --features lean

Usage

Once installed, there are two binaries:

  • gix
    • high level commands, porcelain, for every-day use, optimized for a pleasant user experience
  • gixp
    • low level commands, plumbing, for use in more specialized cases

Project Goals

  • a pure-rust implementation of git
    • including transport, object database, references and cli
    • a simple command-line interface is provided for the most common git operations, optimized for user experience. A simple-git, if you will.
    • be the go-to implementation for anyone who wants to solve problems around git, and become the alternative to GitPython in the process.
    • become the foundation for a free distributed alternative to github.
  • learn from the best to write the best possible idiomatic Rust
    • libgit2 is a fantastic resource to see what abstractions work, we will use them
    • use Rust's type system to make misuse impossible
  • be the best performing implementation
    • use Rust's type system to optimize for work not done without being hard to use
    • make use of parallelism from the get go
  • assure on-disk consistency
    • assure reads never interfere with concurrent writes
    • assure multiple concurrent writes don't cause trouble
  • take shortcuts, but not in quality
    • binaries may use anyhow::Error exhaustively, knowing these errors are solely user-facing.
    • libraries use light-weight custom errors implemented using quick-error.
    • internationalization is nothing we are concerned with right now.
    • IO errors due to insufficient amount of open file handles don't always lead to operation failure
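
The split between library errors and binary errors described above can be sketched with the standard library alone. This is only a stand-in: the hand-written enum plays the role of what quick-error would generate, and Box<dyn Error> plays the role of anyhow::Error. DecodeError, check_signature and run are hypothetical names for this example.

```rust
use std::error::Error;
use std::fmt;

/// A small library-style error, similar in spirit to a quick-error enum.
#[derive(Debug)]
enum DecodeError {
    TooShort { expected: usize, actual: usize },
    BadSignature,
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::TooShort { expected, actual } => {
                write!(f, "need {} bytes, got {}", expected, actual)
            }
            DecodeError::BadSignature => write!(f, "signature mismatch"),
        }
    }
}

impl Error for DecodeError {}

/// Library side: a precise error type that callers can match on.
fn check_signature(data: &[u8]) -> Result<(), DecodeError> {
    if data.len() < 4 {
        return Err(DecodeError::TooShort { expected: 4, actual: data.len() });
    }
    if &data[..4] != b"PACK" {
        return Err(DecodeError::BadSignature);
    }
    Ok(())
}

/// Binary side: the concrete type is erased, as the error is only shown to the user.
fn run(data: &[u8]) -> Result<(), Box<dyn Error>> {
    check_signature(data)?;
    Ok(())
}

fn main() {
    if let Err(err) = run(b"PA") {
        eprintln!("error: {}", err);
    }
}
```

The library keeps its errors cheap and matchable, while the binary only needs something it can print.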

Non-Goals

  • replicate git command functionality perfectly
    • git is git, and there is no reason to not use it. Our path is the one of simplicity to make getting started with git easy.
  • be incompatible with git
    • the on-disk format must remain compatible, and we will never contend with it.
  • use async IO everywhere
    • for the most part, git operations rely heavily on memory-mapped IO as well as on the CPU to decompress data, which doesn't lend itself well to async IO out of the box.
    • Use blocking as well as git-features::interruptible to bring operations into the async world and to control long-running operations.
    • When connecting or streaming over TCP connections, especially when receiving on the server, async seems like a must, however. It should be possible to put it behind a feature flag.

Roadmap to Future

Roadmap to 1.0

Provide a CLI for the most basic user journey:

  • initialize a repository
  • clone a repository
    • http(s) (or ssh, whatever is easier)
  • create a commit
  • add a remote
  • push
    • create (thin) pack

Cargo features guide

Cargo uses feature toggles to control which dependencies are pulled in, allowing users to specialize crates to fit their usage. Ideally, these should be additive. This guide documents which features are available for each of the crates provided here and how they function.
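
As a rough illustration of how such a toggle can work inside a crate, the sketch below provides two implementations of a join function and lets a feature named parallel pick one at compile time. The signature is an assumption made for this example and is not copied from git-features. Callers use the same function either way, which is what keeps the feature additive.

```rust
/// With the `parallel` feature enabled, run `left` on a scoped thread
/// while `right` runs on the current one.
#[cfg(feature = "parallel")]
fn join<A, B, RA, RB>(left: A, right: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    std::thread::scope(|scope| {
        let handle = scope.spawn(left);
        let right_result = right();
        (handle.join().expect("left closure panicked"), right_result)
    })
}

/// Without the feature, the same signature simply runs both closures serially.
#[cfg(not(feature = "parallel"))]
fn join<A, B, RA, RB>(left: A, right: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    (left(), right())
}

fn main() {
    let data = [1u64, 2, 3, 4, 5, 6];
    let (low, high) = data.split_at(3);
    // The call site is identical with or without the feature.
    let (a, b) = join(|| low.iter().sum::<u64>(), || high.iter().sum::<u64>());
    println!("sum of halves: {} and {}", a, b);
}
```

Keeping the Send bounds on the serial version as well means code that compiles with the feature off will also compile with it on.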

gitoxide

The top-level command-line interface.

  • fast
    • Makes the crate execute as fast as possible by supporting parallel computation of otherwise long-running functions as well as fast, hardware accelerated hashing.
    • If disabled, the binary will be visibly smaller.
  • (mutually exclusive)
    • pretty-cli
      • Use clap 3.0 to build the prettiest, best documented and most user-friendly CLI at the expense of file size.
      • provides a terminal user interface for detailed and exhaustive progress.
      • provides a line renderer for log-like progress
    • lean-cli
      • Use argh to produce a usable binary with decent documentation that is smallest in size, usually 300kb less than pretty-cli.
      • If pretty-cli is enabled as well, lean-cli will take precedence, and you pay for building unnecessary dependencies.
      • provides a line renderer for log-like progress
  • prodash-render-line-crossterm or prodash-render-line-termion (mutually exclusive)
    • The --verbose flag will be powered by a progress mechanism that serves both as a log and as an interactive progress display, which appears after a short duration.

There are convenience features, which combine common choices of the above into one name

  • max = pretty-cli + fast + prodash-render-tui-crossterm
    • default, for unix and windows
  • max-termion = pretty-cli + fast + prodash-render-tui-termion
    • for unix only, faster compile times, a little smaller
  • lean = lean-cli + fast + prodash-render-line-crossterm
    • for unix and windows, significantly smaller than max, but without --progress terminal user interface.
  • lean-termion = lean-cli + fast + prodash-render-line-termion
    • for unix only, faster compile times, a little smaller
  • light = lean-cli + fast
    • crossplatform by nature as this comes with simplified log based progress
  • small = lean-cli
    • As small as it can possibly be, no threading, no fast sha1, log based progress only, no cleanup of temporary files on interrupt

git-features

A crate to help control which capabilities are available from the top-level crate that uses gitoxide-core or any other gitoxide crate that uses git-features. All feature toggles are additive.

  • parallel
    • Use scoped threads and channels to parallelize common workloads on multiple objects. If enabled, it is used wherever it makes sense.
    • As caches are likely to be used and instantiated per thread, more memory will be used on top of the costs for threads.
  • fast-sha1
    • a multi-crate implementation that can use hardware acceleration, thus bearing the potential for up to 2Gb/s throughput on CPUs that support it, like AMD Ryzen or Intel Core i3.
  • interrupt-handler
    • Listen to interrupts and termination requests, and provide tooling for long-running operations that allows aborting the input stream.
      • Note that git_features::interruptible::init_interrupt_handler() must be called at the start of the application.
    • If unset, these utilities will be a no-op which may lead to leaking temporary files when interrupted.
    • If the application already sets a handler, this handler will have no effect.
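
The idea of aborting a long-running operation through its input stream can be sketched as a Read wrapper that checks a shared flag before every read. This is an illustration under assumptions: InterruptibleRead is a hypothetical type, not the one from git_features::interruptible, and the flag is set by hand here where a real application would set it from a signal handler.

```rust
use std::io::{self, Read};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical wrapper: every read fails once `flag` has been set.
struct InterruptibleRead<'a, R> {
    inner: R,
    flag: &'a AtomicBool,
}

impl<'a, R: Read> Read for InterruptibleRead<'a, R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.flag.load(Ordering::Relaxed) {
            // `ErrorKind::Other` instead of `Interrupted`, as the standard
            // library's read helpers silently retry on `Interrupted`.
            return Err(io::Error::new(io::ErrorKind::Other, "interrupted by user"));
        }
        self.inner.read(buf)
    }
}

fn main() {
    let flag = AtomicBool::new(false);
    let mut reader = InterruptibleRead { inner: &b"hello world"[..], flag: &flag };

    let mut buf = [0u8; 5];
    reader.read_exact(&mut buf).expect("the flag is not set yet");
    println!("read {:?}", String::from_utf8_lossy(&buf));

    // Simulate an interrupt request arriving in the middle of the operation.
    flag.store(true, Ordering::Relaxed);
    println!("after interrupt: {:?}", reader.read(&mut buf).map_err(|e| e.to_string()));
}
```

Because the operation sees an ordinary IO error, it can unwind through its usual error path and clean up temporary files on the way out.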

Serialization Support

What follows are feature toggles to control serialization of all public facing simple data types.

  • serde1
    • Data structures implement serde::Serialize and serde::Deserialize

The feature above is provided by the crates:

  • git-object
  • git-odb
  • gitoxide-core
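
A common way to wire up such a toggle is a conditional derive, sketched below. The Signature type is hypothetical and only serves as an example of a simple data type. The sketch further assumes that the crate's Cargo.toml declares serde as an optional dependency with its derive feature enabled, and maps a serde1 feature to it. With the feature off, the attribute expands to nothing and serde is not needed at all.

```rust
/// A simple public data type (hypothetical, for illustration only).
/// The serde derives are only applied when the `serde1` feature is enabled.
#[derive(Debug, Clone, PartialEq, Eq)]
#[cfg_attr(feature = "serde1", derive(serde::Serialize, serde::Deserialize))]
struct Signature {
    name: String,
    email: String,
}

fn main() {
    let author = Signature {
        name: "Jane".to_string(),
        email: "jane@example.com".to_string(),
    };
    println!("{:?}", author);
}
```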

Plumbing vs Porcelain

Both terms come from the git implementation itself, even though git won't necessarily point out which commands are plumbing and which are porcelain. The term plumbing refers to lower-level, more rarely used commands that complement porcelain, either by being invoked by it or by serving certain use cases. The term porcelain refers to commands with a decent user experience that are primarily intended for use by humans.

In any case, both types of programs must self-document their capabilities through the --help flag.

From there, we can derive a few rules to try to adhere to:

Plumbing

  • does not show any progress or logging output by default
  • if supported and logging is enabled, it will show timestamps in UTC
  • it does not need a git repository, but instead takes all variables via the command-line

Porcelain

  • Provides output to stderr by default to provide progress information. There is no need to allow disabling it, but it shouldn't show up unless the operation takes some time.
  • If timestamps are shown, they are in localtime.
  • Non-progress information goes to stdout.
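
The porcelain rule about output streams can be sketched as a function that receives both sinks explicitly: one for results, one for progress. The function name and messages are invented for this example. Passing the writers in, instead of printing directly, keeps the two kinds of output apart and makes the behaviour easy to test.

```rust
use std::io::{self, Write};

/// Hypothetical porcelain-style operation: progress goes to `progress`,
/// the actual result goes to `out`.
fn check_objects(
    objects: &[&str],
    mut out: impl Write,
    mut progress: impl Write,
) -> io::Result<()> {
    for (index, name) in objects.iter().enumerate() {
        writeln!(progress, "checked {}/{}: {}", index + 1, objects.len(), name)?;
    }
    writeln!(out, "{} objects ok", objects.len())
}

fn main() -> io::Result<()> {
    // In a real binary, results go to stdout and progress goes to stderr.
    check_objects(&["commit", "tree", "blob"], io::stdout(), io::stderr())
}
```

With this shape, piping the program's stdout into another tool yields only the result line, while progress stays visible on the terminal.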

Shortcomings

  • lean, light and small builds don't support non-UTF-8 paths
  • Packfiles use memory maps
    • Even though they are comfortable to use and fast, they squelch IO errors.
    • potential remedy: We could generalize the Pack to make it possible to work on in-memory buffers directly. That way, one would initialize a Pack by reading the whole file into memory, thus not squelching IO errors at the expense of latency as well as memory efficiency.
  • Packfiles cannot load files bigger than 2^31 or 2^32 on 32-bit systems
    • As these systems cannot address more memory than that.
    • potential remedy: implement a sliding window to map and unmap portions of the file as needed.
  • CRC32 implementation doesn't use SIMD
    • Probably at no cost one could upgrade to the crc32fast crate, but it looks unmaintained and has more code.
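
The sliding window remedy mentioned above could look roughly like the following sketch. To stay within the standard library, it loads each window with seek and read instead of mapping it, so it is a stand-in for the memory-mapped version. SlidingWindow and byte_at are hypothetical names. Only one window of window_size bytes is held in memory at any time, regardless of how large the underlying file is.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Keeps a single fixed-size window of the underlying file in memory.
struct SlidingWindow<F> {
    file: F,
    window_size: u64,
    window_start: Option<u64>,
    buf: Vec<u8>,
}

impl<F: Read + Seek> SlidingWindow<F> {
    fn new(file: F, window_size: u64) -> Self {
        assert!(window_size > 0, "window size must be non-zero");
        SlidingWindow { file, window_size, window_start: None, buf: Vec::new() }
    }

    /// Return the byte at `offset`, or `None` past the end of the file.
    /// The window containing `offset` is loaded if it is not the current one.
    fn byte_at(&mut self, offset: u64) -> io::Result<Option<u8>> {
        // Windows are aligned to multiples of the window size.
        let start = offset - offset % self.window_size;
        if self.window_start != Some(start) {
            self.file.seek(SeekFrom::Start(start))?;
            self.buf.clear();
            // `take` limits the read to one window; the last one may be shorter.
            self.file.by_ref().take(self.window_size).read_to_end(&mut self.buf)?;
            self.window_start = Some(start);
        }
        Ok(self.buf.get((offset - start) as usize).copied())
    }
}

fn main() -> io::Result<()> {
    // An in-memory stand-in for a large pack file: 256 bytes, windows of 64.
    let data: Vec<u8> = (0..=255u8).collect();
    let mut window = SlidingWindow::new(Cursor::new(data), 64);
    println!("{:?}", window.byte_at(70)?);
    println!("{:?}", window.byte_at(300)?);
    Ok(())
}
```

Offsets are u64 throughout, so files larger than the address space of a 32-bit system remain addressable while each window stays small.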

Credits

  • itertools (MIT Licensed)
    • We use the izip! macro in code
  • deflate2 (MIT Licensed)
    • We use various abstractions to implement decompression and compression directly on top of the rather low-level miniz_oxide crate

Unused Performance Optimizations

  • miniz-oxide
    • unnecessary buffer reset
      • In the InflateState struct, there is a big 32kb buffer which gets zeroed for every decompression attempt.
      • This costs ~4s for 7.5 million objects.
    • reuse of state between decompressions could be faster
      • Similar to the above, there are several occasions when we decompress 'all at once', which also requires recreating a 32kb buffer filled with zeroes. If most of that state could be reused, we would save time when handling millions of objects, both during pack lookup and during pack streaming.

Fun facts

  • Originally I was really fascinated by this problem and believe that with gitoxide it will be possible to provide the fastest solution for it.
  • I have been absolutely blown away by git from the first time I experienced git more than 13 years ago, and have tried to implement it in various shapes and forms multiple times. Now with Rust, I finally feel I have found the right tool for the job!