Archival, search, and replay of Amazon Kinesis streams

CrossfireCurt/watershed

 
 

Watershed

Archival-replay suite designed for AWS Kinesis and S3

### Watershed is a system of tools designed to:

  • Launch an EMR Cluster with Apache Drill and Pump
  • Configure Apache Drill and Hive to recognize S3 Buckets and Kinesis Streams
  • Expose an HTTP REST service that lets clients run queries against Drill
  • Replay SQL query results back into a Kinesis stream
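
As a rough illustration of the last step, the sketch below batches rows (for example, the rows returned by a Drill query) into Kinesis `PutRecords` calls. It is not Pump's implementation: `replay_rows`, `key_field`, and the dict-shaped rows are assumptions made for this example, and `client` is only expected to behave like a boto3 Kinesis client.

```python
import json

MAX_BATCH = 500  # PutRecords accepts at most 500 records per request


def _flush(client, stream_name, batch):
    """Send one batch and return how many records Kinesis rejected."""
    response = client.put_records(StreamName=stream_name, Records=batch)
    return response.get("FailedRecordCount", 0)


def replay_rows(client, stream_name, rows, key_field):
    """Send rows (dicts) to a Kinesis stream in batches.

    key_field (hypothetical) names the column used as the partition key.
    Returns the total number of records reported as failed.
    """
    failed = 0
    batch = []
    for row in rows:
        batch.append({
            "Data": json.dumps(row).encode("utf-8"),
            "PartitionKey": str(row[key_field]),
        })
        if len(batch) == MAX_BATCH:
            failed += _flush(client, stream_name, batch)
            batch = []
    if batch:
        failed += _flush(client, stream_name, batch)
    return failed
```

With boto3 installed, `client` could be `boto3.client("kinesis")`. A real replay would also need to retry the records that `PutRecords` reports as failed; this sketch only counts them.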

#### Watershed includes the Pump REST Service, which:

  • supports key-value compaction, so only the last record for each key is replayed.
  • supports bounded replay (there is no need to replay the full archive).
  • supports filtered replay (only records matching given criteria are replayed).
  • supports annotating records as they are replayed in order to alter consumer behavior, for example to force an overwrite.
  • with consumer cooperation, provides a form of eventual consistency for records that arrive on a stream concurrently with a replay operation, without requiring Watershed to mediate the flow of the stream.
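
The replay options listed above can be pictured with a small in-memory sketch. This is not how Pump is implemented; the `plan_replay` function, its parameters, and the record shape (`key`, `timestamp`, `data`) are all hypothetical and chosen only to show how bounding, filtering, compaction, and annotation could combine.

```python
def plan_replay(records, start=None, end=None, predicate=None,
                compact=False, annotations=None):
    """Select and annotate archived records for replay.

    Each record is a dict with "key", "timestamp", and "data" fields
    (a hypothetical shape used only in this example).
    """
    selected = []
    for rec in records:
        # Bounded replay: keep only records inside [start, end).
        if start is not None and rec["timestamp"] < start:
            continue
        if end is not None and rec["timestamp"] >= end:
            continue
        # Filtered replay: keep only records matching the criteria.
        if predicate is not None and not predicate(rec):
            continue
        selected.append(rec)

    if compact:
        # Key-value compaction: keep only the last record for each key.
        latest = {}
        for rec in sorted(selected, key=lambda r: r["timestamp"]):
            latest[rec["key"]] = rec
        selected = sorted(latest.values(), key=lambda r: r["timestamp"])

    # Annotation: attach hints for consumers without mutating the input.
    extra = annotations or {}
    return [{**rec, **extra} for rec in selected]
```

For example, `plan_replay(records, start=1, end=5, compact=True, annotations={"force_overwrite": True})` would return at most one record per key from the chosen window, each carrying a `force_overwrite` flag for consumers to act on. Compaction is applied after bounding and filtering here, which is one possible ordering among several.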

### Prerequisites

  • Python 3
  • pip3
  • Java 7 (for development of Pump)

### Wiki

Any information you need can be found in our wiki.

#### Getting Started

#### Using Watershed

#### More Information
