Golang based Snowplow collector which forwards events to an SNS topic

13scoobie/snowblower-1

Snowblower

A lightweight, high-performance Golang Snowplow collector and enricher. Besides the language choice, Snowblower differs from the official Snowplow implementations in the following ways:

  • Snowblower supports SNS/SQS as the intermediate data store between stages.
  • Snowblower uses a JSON serialization for CollectorPayloads instead of Thrift.
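To illustrate what a JSON-serialized collector payload could look like, here is a minimal sketch. The `CollectorPayload` struct below is hypothetical: its fields and JSON names are assumptions chosen for illustration and are not taken from Snowblower's source, so check the repository for the real schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CollectorPayload is a hypothetical, trimmed-down payload. The field
// names are assumptions for illustration, not Snowblower's actual schema.
type CollectorPayload struct {
	IPAddress   string `json:"ipAddress"`
	Timestamp   int64  `json:"timestamp"` // milliseconds since the Unix epoch
	Collector   string `json:"collector"`
	UserAgent   string `json:"userAgent,omitempty"`
	Path        string `json:"path"`
	QueryString string `json:"querystring,omitempty"`
	Body        string `json:"body,omitempty"`
}

// encodePayload serializes a payload to a JSON string that could be
// used as a message body.
func encodePayload(p CollectorPayload) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodePayload parses a JSON message body back into a payload.
func decodePayload(s string) (CollectorPayload, error) {
	var p CollectorPayload
	err := json.Unmarshal([]byte(s), &p)
	return p, err
}

func main() {
	p := CollectorPayload{
		IPAddress:   "203.0.113.7",
		Timestamp:   1700000000000,
		Collector:   "snowblower",
		Path:        "/i",
		QueryString: "e=pv&p=web",
	}
	msg, err := encodePayload(p)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(msg)
}
```

A JSON body like this can be inspected directly in the SQS console or with any JSON tool, which is one practical difference from a binary Thrift encoding.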

It would be fairly simple to add a Kinesis stream as a collector destination and to support Thrift, at which point Snowblower would be a complete drop-in replacement for the Snowplow Scala Kinesis Collector. However, for our needs, SQS provides a pretty compelling solution.

Performance and Cost

In initial testing, the collector service requires between 10 and 20 times fewer front-end compute resources than the Scala-based Snowplow Kinesis collector, based on the observation that we scaled down from 24 c3.xlarge machines to 2 on our initial deployment. There are likely many reasons other than the language choice, including:

  • Snowblower only ships collected payloads that have data. It ignores the large number of empty data requests generated by Snowplow trackers.
  • The Scala-based Kinesis collector is clearly marked as beta and likely not optimized.
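The first point above, dropping requests that carry no data, can be sketched as follows. This is not Snowblower's actual logic: the `hasData` rule (a request counts as having data when its query string or body is non-empty) and the `request` type are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// request is a hypothetical, simplified view of an incoming tracker call.
type request struct {
	rawQuery string
	body     string
}

// hasData reports whether a request carries anything worth forwarding.
// Assumption: "has data" means a non-blank query string or body.
func hasData(r request) bool {
	return strings.TrimSpace(r.rawQuery) != "" || strings.TrimSpace(r.body) != ""
}

// filterNonEmpty keeps only the requests that carry data, so empty
// requests never reach the downstream queue.
func filterNonEmpty(reqs []request) []request {
	kept := make([]request, 0, len(reqs))
	for _, r := range reqs {
		if hasData(r) {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	reqs := []request{
		{rawQuery: "e=pv&p=web"},
		{}, // empty request, dropped
		{body: `{"schema":"example"}`},
	}
	fmt.Println(len(filterNonEmpty(reqs)), "of", len(reqs), "requests forwarded")
}
```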

Even so, the two c3.xlarge instances that replaced the Scala cluster handle a peak of over 350,000 requests per minute, with an average latency at our load balancer of ~15ms and a CPU load of around 20%. We could scale back to one server, but we’ll likely experiment with smaller instances first.

On using SQS instead of Kinesis

One advantage of using SNS/SQS instead of Kinesis is that SQS scales transparently, with no explicit capacity provisioning.

Running

Snowblower has three commands:

  • collect Runs the collector, sending events to SNS or SQS, acting as the second stage in a Snowplow pipeline.
  • etl Pulls events from SQS, enriches them, and stores them in Postgres or Redshift, acting as the third stage in a Snowplow pipeline.
  • precipitate Pulls events from CloudFront logs recorded on S3 and sends them to SNS for later enrichment. See: Setting up the Cloudfront collector.

Configuration

The following environment variables configure the operation of Snowblower when running the collector:

  • SNS_TOPIC Must contain the ARN of the SNS topic to send events to. REQUIRED
  • PORT Optionally sets the port that the server listens on. Defaults to 8080.
  • AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION Amazon Web Services credentials and region. If not set, Snowblower will attempt to use IAM Roles.
  • MONGO_URI The MongoDB connection string.
  • MONGO_DB The MongoDB database to use.
  • MONGO_COLLECTION The MongoDB collection to save events to.
  • COOKIE_DOMAIN Optionally sets the domain on the session cookie. If not set, no domain is set on the cookie.
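As a rough sketch of how these variables might be read at startup, the example below loads them into a struct, enforces the required SNS_TOPIC, and applies the default port. The `Config` type, the `loadConfig` helper, and the numeric check on PORT are assumptions for illustration, not Snowblower's actual code. AWS credentials are left out of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config is a hypothetical container for the collector settings.
type Config struct {
	SNSTopic        string
	Port            string
	MongoURI        string
	MongoDB         string
	MongoCollection string
	CookieDomain    string // empty means no domain is set on the cookie
}

// loadConfig reads settings through getenv (normally os.Getenv), which
// keeps the function easy to exercise without touching the real environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		SNSTopic:        getenv("SNS_TOPIC"),
		Port:            getenv("PORT"),
		MongoURI:        getenv("MONGO_URI"),
		MongoDB:         getenv("MONGO_DB"),
		MongoCollection: getenv("MONGO_COLLECTION"),
		CookieDomain:    getenv("COOKIE_DOMAIN"),
	}
	if cfg.SNSTopic == "" {
		return Config{}, errors.New("SNS_TOPIC is required")
	}
	if cfg.Port == "" {
		cfg.Port = "8080" // documented default
	} else if _, err := strconv.Atoi(cfg.Port); err != nil {
		// Assumption: reject non-numeric ports early.
		return Config{}, fmt.Errorf("PORT must be numeric, got %q", cfg.Port)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("publishing to %s, listening on :%s\n", cfg.SNSTopic, cfg.Port)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, is a design choice that lets the same code be driven by a map in tests.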

Installation

Quick install reference:

  • Install godep (see github.com/tools/godep).
  • godep restore installs the package versions specified in Godeps/Godeps.json to your $GOPATH.
  • godep go install compiles Snowblower and places the snowblower binary in the bin directory.
