kafka-connect-jdbc is a Kafka Connector for loading data to and from any JDBC-compatible database.
Documentation for this connector can be found here.
The original project can be found here. The enhancements made in this fork are primarily for Postgres. They are as follows:
Added support for anonymizing data at a column-specific level for any database table. SHA-256 hashing is used for anonymization. Columns with the following data types are supported: Text, TextArray, and Json. To anonymize a column, add the following to the SourceConnector configuration:
<table-name>.anonymize.column.name = <column-name>
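A minimal sketch of what SHA-256 column anonymization looks like in Java, using the standard `java.security.MessageDigest` API. The class and method names here are illustrative assumptions, not the connector's actual internals:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch: hash a column value with SHA-256 and hex-encode it,
// so the anonymized value is deterministic but not reversible.
public class Anonymizer {
    public static String sha256Hex(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is guaranteed to be present in every JVM
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Because the hash is deterministic, the same input always maps to the same anonymized value, so joins and deduplication on the anonymized column still work.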
Schemas without an incrementing/timestamp column in one or more tables must use the inefficient bulk mode for polling every table. This enhancement allows setting polling modes in the SourceConnector at a table-specific level, so a different polling mode can be configured for each table in the SourceConnector configuration.
<table-name>.mode = incrementing
<table-name>.incrementing.column.name = <column-name>
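For example, a single SourceConnector configuration might mix modes across tables (the table and column names below are hypothetical):

```properties
# Table "events" has an auto-incrementing id, so it can use incrementing mode;
# legacy table "settings" has no such column and falls back to bulk.
events.mode = incrementing
events.incrementing.column.name = id
settings.mode = bulk
```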
Built-in support for data deduplication is already provided via upsert mode: an update is performed instead of an insert in the Sink database using a record's primary key. However, primary keys are not passed from Source to Sink. We pass primary keys from Source to Sink by creating a KeySchema for each record and use upsert mode for data deduplication:
insert.mode = upsert
pk.fields = record_key
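A fuller hypothetical sink connector fragment combining these settings (the `pk.mode` key comes from the upstream sink connector's configuration and tells it to read the primary key from the record's key, i.e. the KeySchema this fork creates):

```properties
# Deduplicate by upserting on the primary key carried in the record key.
insert.mode = upsert
pk.mode = record_key
pk.fields = record_key
```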
To build a development version you'll need a recent version of Kafka. You can build kafka-connect-jdbc with Maven using the standard lifecycle phases.
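The standard Maven lifecycle build looks like this (assumes Maven and a JDK are installed):

```shell
# Compile, run tests, and produce the connector jar under target/
mvn clean package
```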
- Source Code: https://github.com/confluentinc/kafka-connect-jdbc
- Issue Tracker: https://github.com/confluentinc/kafka-connect-jdbc/issues
The project is licensed under the Apache 2 license.