ksqlDB

The database purpose-built for stream processing applications

Overview

ksqlDB is a database for building stream processing applications on top of Apache Kafka. It is distributed, scalable, reliable, and real-time. ksqlDB combines the power of real-time stream processing with the approachable feel of a relational database through a familiar, lightweight SQL syntax. ksqlDB offers these core primitives:

Streams and tables - Create relations with schemas over your Apache Kafka topic data
Materialized views - Define real-time, incrementally updated materialized views over streams using SQL
Push queries - Continuous queries that push incremental results to clients in real time
Pull queries - Query materialized views on demand, much like with a traditional database
Connect - Integrate with any Kafka Connect data source or sink, entirely from within ksqlDB

Composing these primitives lets you build a complete streaming application with only SQL statements, minimizing complexity and operational overhead. ksqlDB supports a wide range of operations, including aggregations, joins, windowing, sessionization, and more. You can find more ksqlDB tutorials and resources here.
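To make the stream/table distinction concrete, here is a plain-Python model (not ksqlDB code) that folds a stream of change events into a table. It assumes the usual ksqlDB semantics: a stream keeps every event, a table keeps only the latest value for each key, and a record whose value is null (a tombstone) deletes its key. The user IDs and levels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical events: (user id, membership level). None is a tombstone (delete).
events: List[Tuple[str, Optional[str]]] = [
    ("alice", "Gold"),
    ("bob", "Silver"),
    ("alice", "Platinum"),  # a later event for an existing key
    ("carol", "Gold"),
    ("carol", None),        # tombstone: removes carol from the table
]


def materialize_table(stream: List[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Fold an append-only stream into a table holding the latest value per key."""
    table: Dict[str, str] = {}
    for key, value in stream:
        if value is None:
            table.pop(key, None)  # a null value deletes the key
        else:
            table[key] = value    # a newer value replaces the older one
    return table


table = materialize_table(events)
print(len(events))  # 5: the stream keeps every event
print(table)        # {'alice': 'Platinum', 'bob': 'Silver'}
```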

Getting Started

Follow the ksqlDB quickstart to get started in just a few minutes.
Read through the ksqlDB documentation.
Take a look at some ksqlDB tutorials for examples of common patterns.

Documentation

See the ksqlDB documentation for the latest stable release.

Use Cases and Examples

Materialized views

ksqlDB allows you to define materialized views over your streams and tables. A materialized view is defined by what is known as a "persistent query". These queries are called persistent because they run continuously and maintain their incrementally updated results in a table.

CREATE TABLE hourly_metrics AS
  SELECT url, COUNT(*)
  FROM page_views
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY url EMIT CHANGES;
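The following plain-Python model sketches what the hourly_metrics query maintains. It is not how ksqlDB works internally (ksqlDB runs on Kafka Streams); it assumes epoch-aligned, non-overlapping one-hour windows and UTC timestamps, and the helpers to_ms, to_iso, and on_page_view are hypothetical. Each page view updates one (url, window start) count in place and returns the changed row, which is what "incrementally updated" means here. It also shows why the pull query below filters on WINDOWSTART: views at 19:05 and 19:42 both land in the window that starts at 19:00.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import DefaultDict, List, Tuple

HOUR_MS = 60 * 60 * 1000  # WINDOW TUMBLING (SIZE 1 HOUR), in milliseconds


def window_start(ts_ms: int, size_ms: int) -> int:
    """Start of the tumbling window containing ts_ms (windows align to the epoch)."""
    return ts_ms - (ts_ms % size_ms)


def to_ms(iso: str) -> int:
    """Parse an ISO-8601 timestamp, treated as UTC, into epoch milliseconds."""
    return int(datetime.fromisoformat(iso).replace(tzinfo=timezone.utc).timestamp()) * 1000


def to_iso(ms: int) -> str:
    """Format epoch milliseconds as a UTC minute-resolution timestamp."""
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")


# Stand-in for the hourly_metrics table: (url, window start in ms) -> count.
hourly_metrics: DefaultDict[Tuple[str, int], int] = defaultdict(int)


def on_page_view(url: str, ts_ms: int) -> Tuple[str, str, int]:
    """Update one window's count incrementally and return the changed row."""
    key = (url, window_start(ts_ms, HOUR_MS))
    hourly_metrics[key] += 1
    return url, to_iso(key[1]), hourly_metrics[key]


changes: List[Tuple[str, str, int]] = [
    on_page_view("https://myurl.com", to_ms(ts))
    for ts in ["2019-11-20T19:05:00", "2019-11-20T19:42:00", "2019-11-20T20:01:00"]
]
for row in changes:
    print(row)
# ('https://myurl.com', '2019-11-20T19:00', 1)
# ('https://myurl.com', '2019-11-20T19:00', 2)
# ('https://myurl.com', '2019-11-20T20:00', 1)
```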

Results can be "pulled" from materialized views on demand with SELECT queries. The following pull query returns a single row (the count for one URL in one hourly window):

SELECT * FROM hourly_metrics
  WHERE url = 'https://myurl.com' AND WINDOWSTART = '2019-11-20T19:00';

Results can also be continuously "pushed" to clients with streaming SELECT queries. The following push query streams every incremental change made to the materialized view to the client:

SELECT * FROM hourly_metrics EMIT CHANGES;

Streaming queries will run perpetually until they are explicitly terminated.
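The difference between the two query styles can be sketched in plain Python. This is a toy in-memory model, not ksqlDB's client API, and the class and method names are hypothetical: pull answers once with the current value, subscribe keeps receiving every later change, and unsubscribe stands in for terminating the streaming query.

```python
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]                  # (url, window start)
Listener = Callable[[Key, int], None]  # receives (key, new count)


class MaterializedView:
    """Toy table that serves point lookups and pushes every change to listeners."""

    def __init__(self) -> None:
        self._rows: Dict[Key, int] = {}
        self._listeners: List[Listener] = []

    def apply(self, key: Key, count: int) -> None:
        """Apply one incremental update, then push it to every active listener."""
        self._rows[key] = count
        for listener in list(self._listeners):
            listener(key, count)

    def pull(self, key: Key) -> Optional[int]:
        """Pull query: return the current value for a key once, then finish."""
        return self._rows.get(key)

    def subscribe(self, listener: Listener) -> None:
        """Push query: receive every future change until unsubscribed."""
        self._listeners.append(listener)

    def unsubscribe(self, listener: Listener) -> None:
        """Terminate a push query; no further changes are delivered."""
        self._listeners.remove(listener)


view = MaterializedView()
pushed: List[Tuple[Key, int]] = []
listener: Listener = lambda key, count: pushed.append((key, count))
view.subscribe(listener)

key = ("https://myurl.com", "2019-11-20T19:00")
view.apply(key, 1)
view.apply(key, 2)
view.unsubscribe(listener)
view.apply(key, 3)     # not pushed: the subscription was terminated

print(view.pull(key))  # 3
print(len(pushed))     # 2
```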

Streaming ETL

Apache Kafka is a popular choice for powering data pipelines. ksqlDB makes it simple to transform data within the pipeline, readying messages to cleanly land in another system.

CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum' EMIT CHANGES;
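The join-then-filter logic of vip_actions can be modeled in plain Python as a sketch, assuming stream-table join semantics where each click looks up the current row in the users table. The user IDs, pages, and actions are hypothetical. The model highlights one detail: a LEFT JOIN keeps unmatched clicks with NULL user columns, but the WHERE clause on u.level then discards them, because NULL never equals 'Platinum'.

```python
from typing import Dict, List, Optional

# Hypothetical users table, keyed by user_id (the table's primary key).
users: Dict[str, Dict[str, str]] = {
    "u1": {"level": "Platinum"},
    "u2": {"level": "Gold"},
}

# Hypothetical clickstream events; u3 has no row in the users table.
clickstream: List[Dict[str, str]] = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "click"},
    {"userid": "u3", "page": "/home", "action": "view"},
]


def vip_actions(events: List[Dict[str, str]],
                users_table: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    out: List[Dict[str, str]] = []
    for c in events:
        # LEFT JOIN users u ON c.userid = u.user_id: look up the current user row.
        u = users_table.get(c["userid"])
        # An unmatched event keeps its row, but every u.* column is NULL (None).
        level: Optional[str] = u["level"] if u is not None else None
        # WHERE u.level = 'Platinum': NULL never equals a value, so the row is dropped.
        if level == "Platinum":
            out.append({"userid": c["userid"], "page": c["page"], "action": c["action"]})
    return out


print(vip_actions(clickstream, users))
# [{'userid': 'u1', 'page': '/home', 'action': 'view'}]
```

Because of that filter, this query returns the same rows an inner join would.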

Anomaly Detection

ksqlDB is a good fit for identifying patterns or anomalies in real-time data. By processing the stream as data arrives, you can identify and surface out-of-the-ordinary events with millisecond latency.

CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3 EMIT CHANGES;
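A plain-Python sketch of the possible_fraud logic, assuming epoch-aligned five-second tumbling windows; the card numbers and millisecond timestamps are hypothetical. HAVING is applied after counting, so a card is flagged only once its count inside a single window exceeds 3. Card 5500 makes four attempts in total, but they fall into two different windows, so it is never flagged. This model emits one row per update above the threshold; ksqlDB may coalesce intermediate updates, so treat the exact number of emitted rows as an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW_MS = 5_000  # WINDOW TUMBLING (SIZE 5 SECONDS)
THRESHOLD = 3      # HAVING count(*) > 3


def possible_fraud(attempts: Iterable[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (card_number, window_start_ms, count) for every update over THRESHOLD."""
    counts: Counter = Counter()
    flagged: List[Tuple[str, int, int]] = []
    for card, ts_ms in attempts:
        window = ts_ms - ts_ms % WINDOW_MS       # 5-second tumbling window
        counts[(card, window)] += 1              # GROUP BY card_number, per window
        if counts[(card, window)] > THRESHOLD:   # HAVING filters after counting
            flagged.append((card, window, counts[(card, window)]))
    return flagged


attempts = [
    ("4111", 0), ("4111", 1_000), ("4111", 2_000), ("4111", 3_000), ("4111", 4_000),
    ("5500", 0), ("5500", 1_000), ("5500", 6_000), ("5500", 7_000),
]
print(possible_fraud(attempts))  # [('4111', 0, 4), ('4111', 0, 5)]
```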

Monitoring

Kafka's ability to provide scalable, ordered records, combined with stream processing, makes it a common solution for log data monitoring and alerting. ksqlDB lends a familiar syntax for tracking, understanding, and managing alerts.

CREATE TABLE error_counts AS
  SELECT error_code, count(*)
  FROM monitoring_stream
  WINDOW TUMBLING (SIZE 1 MINUTE)
  WHERE type = 'ERROR'
  GROUP BY error_code EMIT CHANGES;
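The error_counts query can be modeled the same way in plain Python, assuming epoch-aligned one-minute tumbling windows; the record fields and values are hypothetical. Unlike the HAVING clause in the possible_fraud query above, WHERE removes individual rows before they are grouped, so non-error records never contribute to any count.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

MINUTE_MS = 60_000  # WINDOW TUMBLING (SIZE 1 MINUTE)

Record = Dict[str, Union[str, int]]


def error_counts(records: List[Record]) -> Dict[Tuple[str, int], int]:
    """Count ERROR records per (error_code, window start in ms)."""
    counts: Counter = Counter()
    for rec in records:
        if rec["type"] != "ERROR":  # WHERE type = 'ERROR' drops rows before grouping
            continue
        ts = int(rec["ts"])
        window = ts - ts % MINUTE_MS
        counts[(str(rec["error_code"]), window)] += 1  # GROUP BY error_code
    return dict(counts)


records: List[Record] = [
    {"type": "ERROR", "error_code": "500", "ts": 0},
    {"type": "ERROR", "error_code": "500", "ts": 10_000},
    {"type": "WARN", "error_code": "500", "ts": 20_000},   # filtered out by WHERE
    {"type": "ERROR", "error_code": "404", "ts": 30_000},
    {"type": "ERROR", "error_code": "500", "ts": 61_000},  # falls in the next window
]
print(error_counts(records))
# {('500', 0): 2, ('404', 0): 1, ('500', 60000): 1}
```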

Integration with External Data Sources and Sinks

ksqlDB includes native integration with Kafka Connect data sources and sinks, effectively providing a unified SQL interface over a broad variety of external systems.

The following statement is a simple persistent streaming query that writes all of its output to a topic named clicks_transformed:

CREATE STREAM clicks_transformed AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id EMIT CHANGES;

Rather than leaving continuous query output only in a Kafka topic, it is often useful to route it into another datastore. ksqlDB's Kafka Connect integration makes this pattern easy.

The following statement creates a Kafka Connect sink connector that continuously sends all output from the clicks_transformed query above directly into Elasticsearch:

CREATE SINK CONNECTOR es_sink WITH (
  'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
  'key.converter'   = 'org.apache.kafka.connect.storage.StringConverter',
  'topics'          = 'clicks_transformed',
  'key.ignore'      = 'true',
  'schema.ignore'   = 'true',
  'type.name'       = '',
  'connection.url'  = 'https://elasticsearch:9200');

Join the Community

For user help, questions, or queries about ksqlDB, please use our user Google Group or our public Slack channel #ksqldb in Confluent Community Slack. Everyone is welcome!

You can get help, learn how to contribute to ksqlDB, and find the latest news by connecting with the Confluent community.

For more general questions about the Confluent Platform, please post in the Confluent Google group.

Contributing

Contributions to the code, examples, documentation, etc. are very much appreciated.

Report issues and bugs directly in this GitHub project.
Learn how to work with the ksqlDB source code, including building and testing ksqlDB as well as contributing code changes to ksqlDB by reading our Development and Contribution guidelines.
One good way to get started is by tackling a newbie issue.

License

The project is licensed under the Confluent Community License.

Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation.
