GitHub - tigerquoll/datacollector: StreamSets DataCollector - Continuous big data ingest infrastructure

What is StreamSets Data Collector?

StreamSets Data Collector is an enterprise-grade, open source infrastructure for continuous big data ingestion. Its easy-to-use user interface lets data scientists, developers, and data infrastructure teams build data pipelines in a fraction of the time that complex ingest scenarios typically require. Out of the box, StreamSets Data Collector reads from and writes to a large number of endpoints, including S3, JDBC, Hadoop, Kafka, Cassandra, and many others. In addition to a large number of pre-built stages, you can use Python, JavaScript, and Java Expression Language to transform and process data on the fly. For fault tolerance and scale-out, you can set up data pipelines in cluster mode and perform fine-grained monitoring at every stage of the pipeline.

To learn more, check out https://streamsets.com

License

StreamSets Data Collector is built on open source technologies. Our code is licensed under the Apache License 2.0.

Getting Help

A good place to start is https://streamsets.com/community. That page lists all the ways you can reach us and the channels our team monitors. You can post questions on the sdc-user Google Group or on StackExchange using the tag #StreamSets. Post bugs at https://issues.streamsets.com or tweet at us with #StreamSets.

If you need help with production systems, see the support options listed on our support page.

Contributing code

We welcome contributors. Please check out our guidelines to get started.

Changelog

See the latest changelog

Folders and files

History: 3,296 commits

apache-kafka_0_8_1-lib
apache-kafka_0_8_2-lib
apache-kafka_0_9-lib
apache-kudu-0_7-lib
aws-lib
basic-lib-fake-java7
basic-lib
bootstrap
cassandra-protolib
cassandra_2-lib
cdh_5_2-lib
cdh_5_3-lib
cdh_5_4-cluster-cdh_kafka_1_2-lib
cdh_5_4-cluster-cdh_kafka_1_3-lib
cdh_5_4-lib
cdh_5_5-cluster-cdh_kafka_1_3-lib
cdh_5_5-lib
cdh_kafka_1_2-lib
cdh_kafka_1_3-lib
cli
cloudera-integration
cluster-common
cluster-hdfs-protolib
cluster-kafka-protolib
common-ui
common
commonlib
container-common
container
datacollector-ui
dev-lib
dev-support
dist
docs
e2e-tests
elasticsearch-protolib
elasticsearch_1_4-lib
elasticsearch_1_5-lib
elasticsearch_1_6-lib
elasticsearch_1_7-lib
elasticsearch_2_0-lib
elasticsearch_2_1-lib
elasticsearch_2_2-lib
elasticsearch_2_3-lib
flume-protolib
groovy-protolib
groovy_2_4-lib
hbase-protolib
hdfs-protolib
hdp_2_2-lib
hdp_2_3-lib
hive-protolib
influxdb_0_9-lib
integration-testing
jdbc-lib
jms-lib
json-dto
jython-protolib
jython_2_7-lib
kafka-common
kafka_source-protolib
kafka_target-protolib
kudu-protolib
mapr_5_0-lib
mapr_5_1-lib
maprfs-protolib
maprstreams-common
maprstreams-source-protolib
maprstreams-target-protolib
mesos-bootstrap
messaging-client
miniIT
miniSDC
mongodb-protolib
mongodb_3-lib
omniture-lib
python/sdc-cli
rabbitmq-lib
rbgen-maven-plugin
release
root-lib
root-proto
root
rpm
sdc-elasticsearch-api
sdc-elasticsearch_1
sdc-elasticsearch_2
sdc-kafka-api
sdc-kafka_0_8
sdc-kafka_0_9-common
sdc-kafka_0_9
sdc-kafka_0_9_mapr_5_1
sdk
solr-protolib
spark-bootstrap
sso
stage-lib-archetype
stats-lib
utils
.gitignore
