PeerDB is a Postgres-compatible SQL interface to seamlessly integrate multiple data-stores. It enables you to sync, transform and query data across your stores using simple SQL commands. PeerDB takes a data-store-native approach to engineering, enabling a 10x faster and highly reliable ETL experience.
We are starting with Postgres, Snowflake and BigQuery as the supported data-stores and plan to expand to others based on user feedback.
You can use PeerDB for any of the following use cases:
- Real-time sync (CDC) across stores.
- Customized ETL across data-stores using SQL.
- Federated query workloads: query multiple data-stores through a common SQL interface.
To get started, clone the repository and bring up the stack with Docker Compose:

```bash
git clone --recursive git@github.com:PeerDB-io/peerdb.git
cd peerdb

# Run docker containers: peerdb-server, postgres as catalog, temporal
export COMPOSE_PROJECT_NAME=peerdb-stack
docker compose up

# In a separate terminal, connect to peerdb and query away
psql "port=9900 host=localhost password=peerdb"
```
- More details on adding PEERs are available here.
- More details on creating MIRRORs are available here.
- Detailed documentation is available here.
Existing ETL tools primarily focus on supporting a wide range of data-stores. However, they fall short of providing a rich experience for any two specific data-stores. This becomes evident when your workloads need scale or have demanding feature requirements. Such users commonly try out these tools and find that they either miss their performance and reliability SLAs or lack the required features. They then resort to developing in-house solutions, investing a lot of time and resources.
PeerDB takes a data-store-first approach to ETL. It supports a focused set of widely adopted stores and implements multiple infrastructure-level and data-store-native optimizations, providing a highly scalable and feature-rich ETL experience. For example, in a sync from Postgres to BigQuery or Snowflake, PeerDB is 10 times faster than other tools. We are database experts and believe that an ETL tool should be data-store-centric rather than a hodge-podge of too many connectors.
The Postgres-compatible SQL interface for ETL is unique to PeerDB and enables you to operate in a language you are familiar with. You can do ETL the same way you work with your databases.
You can use the Postgres ecosystem to manage your ETL:
- Client tools like pgAdmin and psql to run SQL commands.
- BI tools like Grafana and Tableau to visually monitor syncs and transforms.
- Database migration and versioning tools like Flyway to manage your ETL.
- Any language (Python, Go, Node.js, etc.) and scheduler (Airflow) for development.
- And many more.
PeerDB is currently in the development phase and has not launched yet. The tables below capture the different features and their status.
Query supported data-stores with a Postgres-compatible SQL interface:
Data-store | Support | Status |
---|---|---|
BigQuery | SELECT commands | Stable |
Snowflake | SELECT commands | Beta |
PostgreSQL | DML + SELECT commands | Beta |
Sync and transform data from one store to another using the CREATE MIRROR SQL command.
Change data capture (CDC): real-time syncing of data from source to target based on the change feed (logical decoding in the Postgres world).
Feature | Source | Target | Status |
---|---|---|---|
CDC | PostgreSQL | BigQuery | Beta |
CDC | PostgreSQL | Snowflake | Beta |
CDC | PostgreSQL | Kafka | Beta |
Initial Load | PostgreSQL | BigQuery | Coming Soon! |
Initial Load | PostgreSQL | Snowflake | Coming Soon! |
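To make the CDC model concrete, here is a toy Python sketch of how a target stays in sync by replaying a change feed in log order. It is not PeerDB's implementation: the `ChangeEvent` shape and the `apply_changes` helper are hypothetical simplifications, and real Postgres logical decoding output carries much more (transaction boundaries, schemas, column types).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """Hypothetical, simplified change record; real logical decoding output is richer."""
    lsn: int                              # position in the source's change log
    op: str                               # "insert", "update" or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image (None for deletes)


def apply_changes(target: Dict[int, Dict[str, Any]],
                  events: List[ChangeEvent],
                  last_applied_lsn: int) -> int:
    """Hypothetical helper: apply events in LSN order and return the new checkpoint.

    Events at or below last_applied_lsn are skipped, so re-delivering a batch
    (e.g. after a crash) does not apply the same change twice.
    """
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= last_applied_lsn:
            continue  # already applied in an earlier batch
        if ev.op in ("insert", "update"):
            target[ev.key] = dict(ev.row)  # upsert by primary key
        elif ev.op == "delete":
            target.pop(ev.key, None)       # deleting a missing row is a no-op
        else:
            raise ValueError(f"unknown op: {ev.op}")
        last_applied_lsn = ev.lsn
    return last_applied_lsn


target: Dict[int, Dict[str, Any]] = {}
feed = [
    ChangeEvent(1, "insert", 1, {"id": 1, "name": "alice"}),
    ChangeEvent(2, "insert", 2, {"id": 2, "name": "bob"}),
    ChangeEvent(3, "update", 1, {"id": 1, "name": "alicia"}),
    ChangeEvent(4, "delete", 2),
]
lsn = apply_changes(target, feed, last_applied_lsn=0)
lsn = apply_changes(target, feed, last_applied_lsn=lsn)  # replaying the batch is a no-op
print(target)  # {1: {'id': 1, 'name': 'alicia'}}
print(lsn)     # 4
```

Checkpointing the last applied log position (LSN) makes re-delivered batches harmless, which is one common way to tolerate at-least-once delivery without duplicating rows; whether PeerDB checkpoints exactly this way is an assumption.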
Continuous syncing of data from source to target based on any SELECT query on the source. This is effectively a pre-transform: data is transformed on the source before it is synced to the target.
Source | Target | Status |
---|---|---|
PostgreSQL | BigQuery | Beta |
PostgreSQL | Snowflake | Beta |
PostgreSQL | S3 | Under development |
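The query-based mode can be illustrated with a minimal Python sketch of the general watermark-plus-upsert technique. It assumes a monotonically increasing `updated_at` column as the watermark and uses in-memory SQLite databases as stand-ins for the source and target stores. The `orders`/`orders_clean` tables, the transform, and `sync_once` are all hypothetical; this is not how PeerDB implements its mirrors.

```python
import sqlite3

# Hypothetical transforming query. It runs on the *source*, so only the
# transformed result (cents -> dollars, uppercased region) is shipped.
TRANSFORM_QUERY = """
    SELECT id, UPPER(region) AS region, amount_cents / 100.0 AS amount_usd, updated_at
    FROM orders
    WHERE updated_at > ?
    ORDER BY updated_at
"""


def sync_once(source: sqlite3.Connection, target: sqlite3.Connection, watermark: int) -> int:
    """Hypothetical helper: run one sync cycle and return the new watermark."""
    rows = source.execute(TRANSFORM_QUERY, (watermark,)).fetchall()
    # Upsert by primary key so re-synced rows overwrite their older versions.
    target.executemany(
        "INSERT OR REPLACE INTO orders_clean (id, region, amount_usd, updated_at) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    target.commit()
    return max((r[3] for r in rows), default=watermark)


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, "
               "amount_cents INTEGER, updated_at INTEGER)")
target.execute("CREATE TABLE orders_clean (id INTEGER PRIMARY KEY, region TEXT, "
               "amount_usd REAL, updated_at INTEGER)")

source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                   [(1, "eu", 1250, 1), (2, "us", 990, 2)])
wm = sync_once(source, target, watermark=0)  # first cycle copies both rows

source.execute("UPDATE orders SET amount_cents = 1500, updated_at = 3 WHERE id = 1")
wm = sync_once(source, target, wm)           # second cycle picks up only the change

print(target.execute("SELECT * FROM orders_clean ORDER BY id").fetchall())
# [(1, 'EU', 15.0, 3), (2, 'US', 9.9, 2)]
```

Because the SELECT runs on the source, only transformed rows reach the target. A plain timestamp watermark can miss rows that commit late with an older `updated_at`, which is one reason change-feed-based CDC is preferred when exact replication matters.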
PeerDB is licensed under the Elastic License 2.0 (ELv2). Please see the LICENSE file for additional information. If you have any licensing questions, please email [email protected]