
wanlipu/insight-de


Synchronize Databases with Kafka Connect

Real-time synchronization from PostgreSQL to Amazon Redshift using the Kafka Connect API in distributed mode. The same approach can be applied to other database systems.

Link to my presentation.
Link to my live demo on YouTube.

Introduction

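The Kafka Connect API is a core component of the Apache Kafka platform. It is a framework for streaming data between Kafka and external systems, such as databases, through pluggable source and sink connectors, and it provides a scalable, fault-tolerant way to keep different database systems synchronized.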

In this project, a streaming data pipeline was built with the Kafka Connect API to continuously capture changes in a PostgreSQL database and replicate them into an Amazon Redshift data warehouse.
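The project runs Kafka Connect in distributed mode. In that mode, workers that share a `group.id` form one Connect cluster, and connector configs, source offsets, and task status live in Kafka topics, which is what lets a surviving worker pick up tasks when another fails. The sketch below renders such a worker config; the broker hostnames, group id, and topic names are hypothetical placeholders, not values taken from this project.

```python
"""Sketch: render a Kafka Connect distributed-mode worker config.

Assumption: broker hostnames, group id, and topic names are hypothetical
placeholders, not values taken from this project.
"""


def render_worker_properties(bootstrap_servers, group_id="connect-cluster"):
    props = {
        "bootstrap.servers": ",".join(bootstrap_servers),
        # Workers sharing this group.id join the same Connect cluster and
        # rebalance connector tasks among themselves.
        "group.id": group_id,
        # Connector configs, source offsets, and task status are stored in
        # Kafka topics, so any surviving worker can resume the work.
        "config.storage.topic": "connect-configs",
        "offset.storage.topic": "connect-offsets",
        "status.storage.topic": "connect-status",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    }
    # One key=value pair per line, as in a .properties file.
    return "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"


if __name__ == "__main__":
    print(render_worker_properties(["broker1:9092", "broker2:9092"]))
```

Each worker would be started with the same file (for example via Kafka's `connect-distributed.sh`), so adding capacity means starting another worker with an identical `group.id`.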

Architecture

Once the streaming pipeline is running, a snapshot of the source PostgreSQL database is captured, and all of its data is streamed through a Kafka source connector, a Kafka broker cluster, and a Kafka sink connector into the Amazon Redshift data warehouse.

[Figure: architecture]
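This README does not say which connector plugins the source and sink scripts use. Assuming the Debezium PostgreSQL connector as the source (its `snapshot.mode=initial` matches the snapshot-then-stream behavior described above) and the Confluent JDBC sink connector pointed at a Redshift JDBC URL, the two configurations might look roughly like this. The hostnames, credentials, the `pg` topic prefix, and the table names are hypothetical placeholders.

```python
"""Sketch of source and sink connector configs for this pipeline.

Assumptions (not stated in this README): the source is the Debezium
PostgreSQL connector and the sink is the Confluent JDBC sink connector.
All hosts, credentials, the "pg" prefix, and table names are hypothetical.
"""


def source_config(host, port, db, user, password, table):
    return {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": host,
        "database.port": str(port),
        "database.user": user,
        "database.password": password,
        "database.dbname": db,
        "topic.prefix": "pg",  # Debezium 2.x; 1.x used database.server.name
        "table.include.list": f"public.{table}",
        "plugin.name": "pgoutput",  # logical decoding plugin built into PostgreSQL 10+
        # "initial": snapshot existing rows first, then stream new changes.
        "snapshot.mode": "initial",
    }


def sink_config(redshift_url, user, password, table):
    return {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        # e.g. jdbc:redshift://<cluster-endpoint>:5439/<db>; the Redshift JDBC
        # driver must be available to the Connect worker.
        "connection.url": redshift_url,
        "connection.user": user,
        "connection.password": password,
        # Debezium names topics <prefix>.<schema>.<table>.
        "topics": f"pg.public.{table}",
        # Flatten Debezium's before/after envelope into a plain row; this
        # transform needs the Debezium jars on the sink worker as well.
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "table.name.format": table,
    }
```

Keeping the configs as plain dictionaries makes it easy to serialize them to JSON for the Kafka Connect REST API.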

When data in the source database changes, the streaming system captures those changes and replicates them into the Amazon Redshift data warehouse.

[Figure: new_data]
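The replication step can be illustrated with a toy model. Assuming Debezium-style change events (an `op` code of `r` for snapshot reads, `c` for inserts, `u` for updates, `d` for deletes, plus `before`/`after` row images), a replica stays in sync by applying each event by primary key. The in-memory dictionary stands in for the Redshift table and the `id` key column is a hypothetical choice; the real sink connector issues SQL instead.

```python
"""Toy model of change-data-capture replication.

Assumptions: Debezium-style events with "op", "before", and "after";
a dict keyed by a hypothetical "id" column stands in for the target table.
"""


def apply_change(replica, event, key="id"):
    op = event["op"]
    if op in ("r", "c", "u"):
        # Snapshot rows, inserts, and updates all write the new row image.
        row = event["after"]
        replica[row[key]] = dict(row)
    elif op == "d":
        # Deletes carry only the old row image; remove it by key.
        replica.pop(event["before"][key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return replica


if __name__ == "__main__":
    events = [
        {"op": "r", "before": None, "after": {"id": 1, "qty": 5}},
        {"op": "c", "before": None, "after": {"id": 2, "qty": 3}},
        {"op": "u", "before": {"id": 1, "qty": 5}, "after": {"id": 1, "qty": 7}},
        {"op": "d", "before": {"id": 2, "qty": 3}, "after": None},
    ]
    replica = {}
    for e in events:
        apply_change(replica, e)
    print(replica)  # {1: {'id': 1, 'qty': 7}}
```

Because events are applied in order per key, replaying the same ordered stream always produces the same final state, which is why the snapshot followed by the change stream yields a consistent copy.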

System Setup

Deploy the Kafka broker cluster and the Kafka Connect cluster on AWS EC2 instances with an Ansible playbook.

Postgres Node

  • Install PostgreSQL:
    • sudo apt update
    • sudo apt upgrade -y
    • sudo apt install postgresql postgresql-contrib
    • sudo service postgresql start

Amazon Redshift

Run Demo

  • Create tables in the PostgreSQL database
    • python3 postgres/create_tables.py <db-name> <user> <password> <server-address> <port> <table-name> <sample-file>
  • Create tables on the Amazon Redshift cluster
    • python3 redshift/create_tables.py <db-name> <user> <password> <server-address> <port> <table-name>
  • Run the Kafka source connector
    • bash postgresql_source.sh
  • Run the Kafka sink connector
    • bash redshift_sink.sh
  • Load more data into the PostgreSQL database
    • python3 postgres/create_tables.py <db-name> <user> <password> <server-address> <port> <table-name> <data-file>
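Connectors in a distributed Connect cluster are created through the Kafka Connect REST API by sending `POST /connectors` with a JSON body of the form `{"name": ..., "config": {...}}`; any worker can accept the request. This README does not show what `postgresql_source.sh` or `redshift_sink.sh` actually send, so the sketch below is an assumption: it uses the default REST port 8083, and the connector name and config are hypothetical placeholders.

```python
"""Sketch: register a connector through the Kafka Connect REST API.

Assumptions: a Connect worker listens on the default REST port 8083;
the connector name and config are hypothetical placeholders.
"""
import json
import urllib.request


def build_register_request(connect_url, name, config):
    # POST /connectors with {"name": ..., "config": {...}} creates a connector.
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(connect_url, name, config):
    req = build_register_request(connect_url, name, config)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # e.g. 409 if a connector with this name already exists.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    req = build_register_request(
        "http://localhost:8083",
        "postgres-source",
        {"connector.class": "io.debezium.connector.postgresql.PostgresConnector"},
    )
    print(req.get_method(), req.full_url)  # POST http://localhost:8083/connectors
```

Separating request construction from sending keeps the payload easy to inspect before calling `register` against a live cluster.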
