Versatile Data Kit is a data engineering framework that enables Data Engineers to develop, troubleshoot, deploy, run, and manage data processing workloads.

Versatile Data Kit

Overview

Versatile Data Kit is a data engineering framework that enables Data Engineers to develop, troubleshoot, deploy, run, and manage data processing workloads (referred to as "Data Jobs"). A "Data Job" enables Data Engineers to implement automated pull ingestion (E in ELT) and batch data transformation (T in ELT) into a database.

About Versatile Data Kit

Versatile Data Kit provides an abstraction layer that helps solve common data engineering problems. It can be called by the workflow engine with the goal of making data engineers more efficient (for example, it ensures data applications are packaged, versioned and deployed correctly, while dealing with credentials, retries, reconnects, etc.). Everything exposed by Versatile Data Kit provides built-in monitoring, troubleshooting, and smart notification capabilities. For example, tracking both code and data modifications and the relations between them enables engineers to troubleshoot more quickly and provides an easy revert to a stable version.

Versatile Data Kit consists of:

  • A Control Service, which enables creating, deploying, managing and executing Data Jobs in a Kubernetes runtime environment. It offers multitenancy support, SSO, access control and auditing capabilities, and it exposes a CLI.
  • A development kit (SDK) to develop, test and run Data Jobs on your machine. It comes with common functionality for data ingestion and processing.

Installation and Getting Started

Install Versatile Data Kit SDK

pip install -U pip setuptools wheel
pip install quickstart-vdk

Note that Versatile Data Kit requires Python 3.7+.
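
If you are unsure which interpreter `pip` will install into, a quick check like the one below can confirm the Python 3.7+ requirement before installing. This is only a minimal sketch: the helper name `meets_minimum` is hypothetical and is not part of Versatile Data Kit.

```python
import sys

# Minimum version taken from the note above (Python 3.7+).
MINIMUM = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough for Versatile Data Kit")
    else:
        print(f"Python {current} is too old; 3.7 or newer is required")
```

Run it with the same interpreter you intend to use for `pip install quickstart-vdk`.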

See the Installation page for more details.

Use

# show the available commands and options
vdk --help

Check out the Getting Started page to create and run your first Data Job.
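
To give a rough idea of what a Data Job step can look like, here is a minimal sketch of a Python step that ingests one record and then runs a query. It assumes the SDK's convention of a `run(job_input)` entry point and the `send_object_for_ingestion` and `execute_query` methods on the job input; treat those names as recalled from the SDK and check them against the official documentation. The table name `hello_world` and the `FakeJobInput` class are hypothetical; the fake exists only so the step can be tried without the SDK installed.

```python
def run(job_input):
    """Entry point of a Python step (assumed VDK convention)."""
    # Pull ingestion: send one record to a destination table.
    job_input.send_object_for_ingestion(
        payload={"id": 1, "name": "example"},
        destination_table="hello_world",  # hypothetical table name
    )
    # Batch transformation: run SQL against the configured database.
    job_input.execute_query("SELECT COUNT(*) FROM hello_world")


class FakeJobInput:
    """Stand-in for the real job input; NOT part of Versatile Data Kit.

    It only records calls so the step logic can be exercised locally.
    """

    def __init__(self):
        self.ingested = []
        self.queries = []

    def send_object_for_ingestion(self, payload, destination_table):
        self.ingested.append((destination_table, payload))

    def execute_query(self, sql):
        self.queries.append(sql)
        return []


if __name__ == "__main__":
    fake = FakeJobInput()
    run(fake)
    print(fake.ingested)
    print(fake.queries)
```

In a real Data Job the SDK supplies the job input object when the job is run, so the fake class is not needed there.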

Documentation

Official documentation for Versatile Data Kit can be found here.

Contributing

If you are interested in contributing as a developer, visit CONTRIBUTING.md.

Contacts

Feedback is very welcome on GitHub, as issues or pull requests.

Join our mailing list or follow us on Twitter. Subscribe to the Versatile Data Kit YouTube channel. Join our dedicated Slack channel in the CNCF Slack workspace: search for #versatile-data-kit.

Code of Conduct

Everyone who works on the project's source code or takes part in its issue trackers, Slack channels, or mailing lists is expected to follow the Code of Conduct.
