The core function of dbt is SQL compilation and execution. Users create projects of dbt resources (models, tests, seeds, snapshots, ...), defined in SQL and YAML files, and they invoke dbt to create, update, or query associated views and tables. Today, dbt makes heavy use of Jinja2 to enable the templating of SQL, and to construct a DAG (Directed Acyclic Graph) from all of the resources in a project. Users can also extend their projects by installing resources (including Jinja macros) from other projects, called "packages."
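To make the "templated SQL plus DAG" idea concrete, here is a minimal sketch using only the Python standard library. It is not how dbt is implemented: dbt uses Jinja2 for rendering and networkx for the graph, while this sketch substitutes a regular expression and `graphlib`. The model names, the `analytics` schema, and the helper functions are all hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical project: model name -> templated SQL (not real dbt files).
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
}

# Matches {{ ref('name') }} or {{ ref("name") }}; a stand-in for real Jinja parsing.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the names of the models that a piece of templated SQL refers to."""
    return set(REF_PATTERN.findall(sql))


def build_dag(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the set of models it depends on."""
    return {name: find_refs(sql) for name, sql in models.items()}


def compile_sql(sql: str, schema: str = "analytics") -> str:
    """Replace each ref() call with a schema-qualified relation name."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


dag = build_dag(MODELS)
# static_order() yields every node after all of its dependencies.
run_order = list(TopologicalSorter(dag).static_order())
```

The same `ref()` call does double duty here: it tells the compiler what relation name to substitute, and it tells the graph builder which edge to add.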
Most of the Python code in the repository is within the core/dbt directory.
single python files
: A number of individual files, such as 'compilation.py' and 'exceptions.py'
The main subdirectories of core/dbt:
adapters
: Define base classes for behavior that is likely to differ across databases
clients
: Interface with dependencies (agate, jinja) or across operating systems
config
: Reconcile user-supplied configuration from connection profiles, project files, and Jinja macros
context
: Build and expose dbt-specific Jinja functionality
contracts
: Define Python objects (dataclasses) that dbt expects to create and validate
deps
: Package installation and dependency resolution
events
: Logging events
graph
: Produce a networkx DAG of project resources, and select those resources given user-supplied criteria
include
: The dbt "global project," which defines default implementations of Jinja2 macros
parser
: Read project files, validate, construct Python objects
task
: Set forth the actions that dbt can perform when invoked
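As a rough illustration of what "selecting resources given user-supplied criteria" can mean for the graph subdirectory, the sketch below selects a node plus everything upstream of it. The graph, node names, and function names are hypothetical, and a plain dictionary stands in for the networkx DAG that dbt actually builds.

```python
# Hypothetical DAG: each node maps to the nodes it depends on (its parents).
PARENTS = {
    "stg_orders": set(),
    "stg_customers": set(),
    "customer_orders": {"stg_orders", "stg_customers"},
    "revenue_report": {"customer_orders"},
}


def ancestors(node: str, parents: dict[str, set[str]]) -> set[str]:
    """Collect every node reachable by walking parent edges from `node`."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def select_with_ancestors(node: str, parents: dict[str, set[str]]) -> set[str]:
    """Select a node together with all of its upstream dependencies."""
    return {node} | ancestors(node, parents)
```

For example, `select_with_ancestors("customer_orders", PARENTS)` picks up both staging models but leaves out `revenue_report`, which sits downstream.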
Legacy tests are found in the 'test' directory:
unit tests
: Unit tests
integration tests
: Integration tests
The "tasks" map to top-level dbt commands, so dbt run => task.run.RunTask, etc. Some are more like abstract base classes (GraphRunnableTask, for example), but all the concrete types outside of task should map to tasks. Currently, one task executes at a time. Each task kicks off its "Runners", and those do execute in parallel. The parallelism is managed via a thread pool in GraphRunnableTask.
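The task/runner split can be pictured with a small sketch: one task at a time, many runners inside it, fanned out over a thread pool. This is an assumption-laden illustration, not the code in GraphRunnableTask. The `Runner` class, `run_task` function, and result strings are hypothetical, and the sketch ignores dependency ordering, which a real graph-aware task would have to respect before handing a node to the pool.

```python
from concurrent.futures import ThreadPoolExecutor


class Runner:
    """Hypothetical runner: executes one node and reports a result."""

    def __init__(self, node_name: str) -> None:
        self.node_name = node_name

    def run(self) -> str:
        # A real runner would compile and execute SQL here.
        return f"OK {self.node_name}"


def run_task(node_names: list[str], threads: int = 4) -> list[str]:
    """Run one runner per node on a thread pool and collect the results."""
    runners = [Runner(name) for name in node_names]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in input order even though runners execute concurrently.
        return list(pool.map(lambda runner: runner.run(), runners))
```

Threads (rather than processes) suit this kind of workload because the runners spend most of their time waiting on the database, not computing.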
core/dbt/task/docs/index.html: This is the docs website code. It comes from the dbt-docs repository and is generated when a release is packaged.
dbt uses an adapter-plugin pattern to extend support to different databases, warehouses, query engines, etc. Note: dbt-postgres used to exist in dbt-core but is now in its own repo.
Each adapter is a mix of Python, Jinja2, and SQL. The adapter code also makes heavy use of Jinja2 to wrap modular chunks of SQL functionality, define default implementations, and allow plugins to override them.
Each adapter plugin is a standalone python package that includes:
dbt/include/[name]
: A "sub-global" dbt project, of YAML and SQL files, that reimplements Jinja macros to use the adapter's supported SQL syntax
dbt/adapters/[name]
: Python modules that inherit, and optionally reimplement, the base adapter classes defined in dbt-core
setup.py
The Postgres adapter code is the most central, and many of its implementations are used as the default defined in the dbt-core global project. The greater the distance of a data technology from Postgres, the more its adapter plugin may need to reimplement.
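The "inherit, and optionally reimplement" relationship can be sketched in plain Python. The class and method names below are hypothetical and are not the base adapter API in dbt-core. The point is only that a plugin overrides the pieces where its SQL dialect departs from the Postgres-flavored defaults and inherits the rest.

```python
class BaseAdapter:
    """Hypothetical base class; its defaults follow Postgres-style SQL."""

    def quote(self, identifier: str) -> str:
        return f'"{identifier}"'

    def current_timestamp(self) -> str:
        return "now()"

    def select_timestamp_sql(self, column_alias: str) -> str:
        # Built from the overridable pieces above, so plugins rarely need to touch it.
        return f"select {self.current_timestamp()} as {self.quote(column_alias)}"


class BacktickAdapter(BaseAdapter):
    """Hypothetical plugin for a dialect that quotes identifiers with backticks."""

    def quote(self, identifier: str) -> str:
        return f"`{identifier}`"

    def current_timestamp(self) -> str:
        return "current_timestamp()"
```

A dialect close to Postgres might override nothing at all, while a more distant one ends up reimplementing most of the small building blocks.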
The test/ subdirectory includes unit and integration tests that run as continuous integration checks against open pull requests. Unit tests check mock inputs and outputs of specific Python functions. Integration tests perform end-to-end dbt invocations against real adapters (Postgres, Redshift, Snowflake, BigQuery) and assert that the results match expectations. See the contributing guide for a step-by-step walkthrough of setting up a local development and testing environment.
- docker: All dbt versions are published as Docker images on DockerHub. This subfolder contains the Dockerfile (constant) and requirements.txt (one for each version).
- etc: Images for README
- scripts: Helper scripts for testing, releasing, and producing JSON schemas. These are not included in distributions of dbt, nor are they rigorously tested; they're just handy tools for the dbt maintainers :)