CDF (Continuous Data Framework)

Craft end-to-end data pipelines and manage them continuously


📖 Table of Contents

  • 📍 Overview
  • Features
  • Getting Started
  • Documentation
  • 🧪 Tests
  • 🛣 Project Roadmap
  • 🤝 Contributing
  • 📄 License
  • 👏 Acknowledgments

📍 Overview

CDF (Continuous Data Framework) is an integrated framework designed to manage data across the entire lifecycle, from ingestion through transformation to publishing. It is built on top of two open-source projects, sqlmesh and dlt, providing a unified interface for complex data operations. CDF simplifies data engineering workflows, offering scalable solutions from small to large projects through an opinionated project structure that supports both multi-workspace and single-workspace layouts.

Sources are consumed by pipelines. A pipeline procedurally describes the extraction of one or more sources into a single dataset. The pipeline script, together with some static configuration, makes up what is called a specification. A pipeline is executed against a sink that is injected externally. A sink describes a single logical destination, which means pipelines are parameterized and the same specification can be reused across different sinks. Sinks are scripts that export an ingest variable used by pipelines and a transform variable used by models. Models are transformations of data within a sink.
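
To make these terms concrete, here is a minimal sketch of a pipeline script and a sink script in the spirit of the description above. It relies only on dlt's public API; the file names, the cities resource, and the shape of the transform value are illustrative assumptions, not cdf's actual contract.

    # pipelines/cities.py -- hypothetical pipeline script: extracts one source into a dataset
    import dlt

    @dlt.resource(table_name="cities", write_disposition="replace")
    def cities():
        # A static payload keeps the sketch self-contained; a real source would call an API or database.
        yield from [
            {"name": "Austin", "population": 961_855},
            {"name": "Denver", "population": 715_522},
        ]

    # sinks/local.py -- hypothetical sink script: a single logical destination, injected into pipelines
    import dlt

    # "ingest" is the destination pipelines load into; DuckDB is used here purely for illustration.
    ingest = dlt.destinations.duckdb("local.duckdb")

    # "transform" carries connection details for models; this dict shape is illustrative only.
    transform = {"connection": {"type": "duckdb", "database": "local.duckdb"}}

    # Running the pipeline against the injected sink (normally handled by `cdf pipeline ...`):
    # dlt.pipeline(pipeline_name="cities", destination=ingest, dataset_name="raw").run(cities())

Because the destination lives in the sink script rather than in the pipeline, the same specification can point at a production warehouse in one run and a local DuckDB file in another.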

Features

  • Unified Data Management: Seamlessly manage data pipelines, transformations, and publishing within a single framework.
  • Opinionated Project Structure: Adopt a scalable project structure that grows with your data needs, from single to multiple workspaces (an illustrative layout is sketched after this list).
  • Automated Environment Management: Automatically manage virtual environments to isolate and manage dependencies.
  • Automated Component Discoverability: Automatically discover pipelines, models, publishers, and other components within your workspace.
  • Enhanced Configuration Management: Leverage automated configuration management for streamlined setup and deployment.
  • Extensible and Scalable: Designed to scale from small to large data projects, providing extensible components for custom operations.
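
The opinionated structure mentioned above can be pictured roughly as follows for a single workspace. The directory and file names here are illustrative assumptions rather than names the framework mandates, and a multi-workspace project would simply nest several such workspaces under one root.

    my_workspace/
    ├── pipelines/    # pipeline scripts (extract sources into datasets)
    ├── sinks/        # sink scripts exporting ingest and transform
    ├── models/       # SQLMesh transformations run within a sink
    ├── publishers/   # scripts that publish data to external systems
    ├── scripts/      # ad hoc scripts and notebooks
    └── cdf.toml      # workspace configuration (file name hypothetical)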

Getting Started

  1. Installation:

    CDF requires Python 3.8 or newer. Install CDF using pip:

    pip install cdf
  2. Initialize a Workspace or Project:

    Create a new workspace or project in your desired directory:

    cdf init-workspace /path/to/workspace
    # or
    cdf init-project /path/to/project
  3. Run Pipelines and Scripts:

    Execute data pipelines, scripts, or notebooks within your workspace:

    cdf pipeline workspace_name.pipeline_name
    cdf execute-script workspace_name.script_name
  4. Publish Data:

    Publish transformed data to external systems or sinks (a hypothetical publisher script is sketched after this list):

    cdf publish workspace_name.publisher_name
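
For orientation, a publisher is simply another discoverable script. The sketch below is hypothetical: it assumes data landed in the local DuckDB sink from the earlier sketch, and the table name and HTTP endpoint are placeholders.

    # publishers/cities_webhook.py -- hypothetical publisher script
    import json
    import urllib.request

    import duckdb

    # Read transformed data out of the sink (the local DuckDB file from the earlier sketch).
    rows = duckdb.connect("local.duckdb").execute(
        "SELECT name, population FROM raw.cities"
    ).fetchall()

    # Push each row to an external system; the endpoint below is a placeholder.
    for name, population in rows:
        payload = json.dumps({"name": name, "population": population}).encode()
        request = urllib.request.Request(
            "https://example.com/webhook",
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(request)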

Documentation

For detailed documentation, including API references and tutorials, visit CDF Documentation.


🧪 Tests

Run the tests with pytest:

pytest tests

🛣 Project Roadmap

TODO: Add a roadmap for the project.

🤝 Contributing

Contributions are welcome! To submit a pull request, follow the guidelines below.

Contributing Guidelines

  1. Fork the Repository: Start by forking the project repository to your GitHub account.
  2. Clone Locally: Clone the forked repository to your local machine using a Git client.
    git clone <your-forked-repo-url>
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear and concise message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to GitHub: Push the changes to your forked repository.
    git push origin new-feature-x

  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.

Once your PR is reviewed and approved, it will be merged into the main branch.


📄 License

This project is distributed under the Apache 2.0 License. For more details, refer to the LICENSE file.


👏 Acknowledgments

  • Harness (https://harness.io/) for being the proving grounds in which the initial concept of this project was born.
  • SQLMesh (https://sqlmesh.com) for being a foundational pillar of this project as well as the team for their support, advice, and guidance.
  • DLT (https://dlthub.com) for being the other foundational pillar of this project as well as the team for their support, advice, and guidance.
