Versatile Data Kit is a data engineering framework that enables Data Engineers to develop, troubleshoot, deploy, run, and manage data processing workloads.

ctmcisco/versatile-data-kit


Versatile Data Kit

Overview

Versatile Data Kit is a data engineering framework that enables Data Engineers to develop, troubleshoot, deploy, run, and manage data processing workloads (referred to as "Data Jobs"). A "Data Job" enables Data Engineers to implement automated pull ingestion (E in ELT) and batch data transformation (T in ELT) into a database.
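To make the ingestion-then-transformation pattern concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It does not use the Versatile Data Kit API: the source data, the table names (`raw_payments`, `payments_per_user`), and the functions `fetch_source_rows`, `ingest`, `transform`, and `run_job` are all hypothetical and exist only for illustration.

```python
import sqlite3


def fetch_source_rows():
    # Hypothetical stand-in for a remote source (an API, a file, another database).
    return [
        {"user_id": 1, "amount": 10.0},
        {"user_id": 1, "amount": 5.5},
        {"user_id": 2, "amount": 7.0},
    ]


def ingest(conn, rows):
    # Pull ingestion: store the source rows unchanged in a raw table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_payments (user_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO raw_payments (user_id, amount) VALUES (:user_id, :amount)",
        rows,
    )
    conn.commit()


def transform(conn):
    # Batch transformation: aggregate the raw rows inside the database.
    conn.execute("DROP TABLE IF EXISTS payments_per_user")
    conn.execute(
        "CREATE TABLE payments_per_user AS "
        "SELECT user_id, SUM(amount) AS total FROM raw_payments GROUP BY user_id"
    )
    conn.commit()


def run_job(conn):
    # Run both steps in order and return the transformed rows.
    ingest(conn, fetch_source_rows())
    transform(conn)
    query = "SELECT user_id, total FROM payments_per_user ORDER BY user_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_job(connection))
    connection.close()
```

The sketch keeps the raw table separate from the derived table, so the transformation can be rerun without pulling the data again.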

About Versatile Data Kit

Versatile Data Kit provides an abstraction layer that helps solve common data engineering problems. A workflow engine can call it, and its goal is to make data engineers more efficient: for example, it ensures data applications are packaged, versioned, and deployed correctly, while handling credentials, retries, reconnects, and similar concerns. Everything exposed by Versatile Data Kit comes with built-in monitoring, troubleshooting, and smart notification capabilities. For example, tracking both code and data modifications and the relations between them lets engineers troubleshoot more quickly and revert easily to a stable version.

Versatile Data Kit consists of:

  • A Control Service, which enables creating, deploying, managing, and executing Data Jobs in a Kubernetes runtime environment. It offers multitenancy support, SSO, access control, and auditing capabilities, and it exposes a CLI.
  • A development kit (SDK) to develop, test, and run Data Jobs on your machine. It comes with common functionality for data ingestion and processing.

Installation and Getting Started

Install Versatile Data Kit SDK

pip install -U pip setuptools wheel
pip install quickstart-vdk

Note that Versatile Data Kit requires Python 3.7+.
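If you want to confirm the interpreter version before installing, a small check like the following may help. This is a sketch only: `MIN_PYTHON` and `python_is_supported` are hypothetical names, not part of Versatile Data Kit.

```python
import sys

# Minimum version stated in this README (assumed here to mean 3.7 or newer).
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=None):
    # Compare only (major, minor); default to the running interpreter.
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        found = "{}.{}".format(*sys.version_info[:2])
        sys.exit("Python 3.7+ is required, found " + found)
    print("Python version is supported")
```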

See the Installation page for more details.

Use

# show the help text to see what you can do
vdk --help

Check out the Getting Started page to create and run your first Data Job.

Documentation

Official documentation for Versatile Data Kit can be found here.

Contributing

If you are interested in contributing as a developer, visit CONTRIBUTING.md.

Contacts

You can join our public Slack workspace by clicking here or request to join our mailing list by emailing here.

Code of Conduct

Everyone involved in working on the project's source code, or engaging in any issue trackers, Slack channels and mailing lists is expected to follow the Code of Conduct.
