This guide outlines how to use Helm to deploy and manage the Analysis Engine (AE) on Kubernetes (tested on 1.13.3).
It requires the following prerequisites before getting started:
- Access to a running Kubernetes cluster
- Helm is installed
- A valid account for IEX Cloud
- A valid account for Tradier
- Optional - Install Ceph Cluster for Persistent Storage Support
- Optional - Install the Stock Analysis Engine for Local Development Outside of Kubernetes
AE builds multiple helm charts that are hosted on a local helm repository, and everything runs within the ae kubernetes namespace.
Please change to the ./helm directory:
cd helm
This will build all the AE charts, download stable/redis and stable/minio, and ensure the local helm server is running:
./build.sh
Each AE chart supports attributes for connecting to the stack's backing services (such as Redis and Minio). Depending on your environment, these services may require you to edit the associated helm chart's values.yaml file(s) before deploying AE with the start.sh script.
Below are some common integration questions on how to configure each one (hopefully) for your environment:
The start.sh script installs the stable/redis chart with the included ./redis/values.yaml, which can be configured as needed before the start script boots up the included Bitnami Redis cluster.
The start.sh script installs the stable/minio chart with the included ./minio/values.yaml, which can be configured as needed before the start script boots up the included Minio.
Each of the AE charts can be configured prior to running the stack's core AE chart found in:
Please set your AWS credentials (which will be installed as kubernetes secrets) in the file:
Data collection is broken up into three categories of jobs: intraday, daily, and weekly. Intraday data collection is built to be fast and pull data that changes often, while weekly data is mostly static and expensive for IEX Cloud users to pull. These chart jobs are intended to be used with cron jobs that fire work into the AE workers, which compress + cache the pricing data for algorithms and backtesting.
Set your IEX Cloud account up in each chart.

Supported IEX Cloud Attributes:

# IEX Cloud
# https://iexcloud.io/docs/api/
iex:
  addToSecrets: true
  secretName: ae.k8.iex.<intraday|daily|weekly>
  # Publishable Token:
  token: ""
  # Secret Token:
  secretToken: ""
  apiVersion: beta
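As a concrete sketch for one chart (the token values here are placeholders, not real credentials), the intraday chart's iex section could be filled in like this:

```yaml
# Hypothetical example for ae-intraday/values.yaml;
# replace the placeholder tokens with your IEX Cloud tokens
iex:
  addToSecrets: true
  secretName: ae.k8.iex.intraday
  # Publishable Token:
  token: "YOUR_PUBLISHABLE_TOKEN"
  # Secret Token:
  secretToken: "YOUR_SECRET_TOKEN"
  apiVersion: beta
```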
Set your Tradier account up in each chart.

Supported Tradier Attributes:

# Tradier
# https://developer.tradier.com/documentation
tradier:
  addToSecrets: true
  secretName: ae.k8.tradier.<intraday|daily|weekly>
  token: ""
  apiFQDN: api.tradier.com
  dataFQDN: sandbox.tradier.com
  streamFQDN: sandbox.tradier.com
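A minimal sketch for the intraday chart, keeping the default sandbox endpoints shown above (the token is a placeholder, not a real credential):

```yaml
# Hypothetical example for ae-intraday/values.yaml;
# replace the placeholder token with your Tradier token
tradier:
  addToSecrets: true
  secretName: ae.k8.tradier.intraday
  token: "YOUR_TRADIER_TOKEN"
  apiFQDN: api.tradier.com
  dataFQDN: sandbox.tradier.com
  streamFQDN: sandbox.tradier.com
```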
- Set the intraday.tickers to a comma-delimited list of tickers to pull per minute.
- Set the daily.tickers to a comma-delimited list of tickers to pull at the end of each trading day.
- Set the weekly.tickers to a comma-delimited list of tickers to pull every week. This is used for pulling "quota-expensive" data that does not change often, like IEX Financials or Earnings data.
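For instance, the three settings in the respective values.yaml files could look like the sketch below (the ticker choices are arbitrary examples, not recommendations):

```yaml
# Example only - pick the tickers you actually need
# (weekly data is quota-expensive on IEX Cloud)
intraday:
  tickers: "SPY,AAPL,TSLA"
daily:
  tickers: "SPY,AAPL,TSLA"
weekly:
  tickers: "SPY,AAPL"
```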
Please set your Jupyter login password that works with a browser:

jupyter:
  password: admin
By default, Jupyter is hosted with nginx-ingress with TLS encryption at:
The default login password is:
- password: admin
By default, Minio is hosted with nginx-ingress with TLS encryption at:
Default login credentials are:
- Access Key: trexaccesskey
- Secret Key: trex123321
The AE pods use a Distributed Ceph Cluster for persisting data outside kubernetes with ~300 GB of disk space.
To set your kubernetes cluster's StorageClass to use ceph-rbd, use the script:
./set-storage-class.sh ceph-rbd
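For reference, a ceph-rbd StorageClass typically looks something like the sketch below; the monitor address, pool, and secret names are assumptions for a generic Ceph cluster, not values from this repository:

```yaml
# Hypothetical ceph-rbd StorageClass using the in-tree rbd provisioner
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.0.0.1:6789           # assumed Ceph monitor address
  pool: rbd                         # assumed pool name
  adminId: admin
  adminSecretName: ceph-secret      # assumed kubernetes secret holding the Ceph admin key
  adminSecretNamespace: kube-system
```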
By default, the AE charts use the Stock Analysis Engine container. Here is how to set up each AE component chart to use a private docker image in a private docker registry (for building your own algos in-house).
Each of the AE charts values.yaml files contain two required sections for deploying from a private docker registry.
Set the Private Docker Registry Authentication values in each chart.

Please set the registry address, secret name, and docker config JSON for authentication using this format.

Note

The imagePullSecrets attribute uses a naming convention format: <base key>.<component name>. The base key is ae.docker.creds., and this approach allows different docker images for each component (for testing), like intraday data collection vs. running a backup job or even hosting Jupyter.

Supported Private Docker Registry Authentication Attributes:

registry:
  addToSecrets: true
  address: <FQDN to docker registry>:<PORT - the registry uses a default port of 5000>
  imagePullSecrets: ae.docker.creds.<core|backtester|backup|intraday|daily|weekly|jupyter>
  dockerConfigJSON: '{"auths":{"<FQDN>:<PORT>":{"Username":"username","Password":"password","Email":""}}}'
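A small sketch showing how the dockerConfigJSON string and the per-component imagePullSecrets names line up; the registry address and credentials are placeholders:

```shell
#!/bin/sh
# Placeholders - substitute your registry FQDN:PORT and credentials
REGISTRY="registry.example.com:5000"
USER="username"
PASS="password"

# Build the dockerConfigJSON value for the chart's registry section
JSON="{\"auths\":{\"${REGISTRY}\":{\"Username\":\"${USER}\",\"Password\":\"${PASS}\",\"Email\":\"\"}}}"
echo "dockerConfigJSON: '${JSON}'"

# One imagePullSecrets name per AE component, following ae.docker.creds.<component>
for component in core backtester backup intraday daily weekly jupyter; do
  echo "ae.docker.creds.${component}"
done
```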
Set the AE Component's docker image name, tag, pullPolicy, and private flag.

Supported Private Docker Image Attributes per AE Component:

image:
  private: true
  name: YOUR_IMAGE_NAME_HERE
  tag: latest
  pullPolicy: Always
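Putting the two sections together, one component's values.yaml for a private registry might look like this sketch (all addresses, credentials, and image names are placeholders):

```yaml
# Hypothetical private-registry configuration for a single AE component
registry:
  addToSecrets: true
  address: registry.example.com:5000
  imagePullSecrets: ae.docker.creds.intraday
  dockerConfigJSON: '{"auths":{"registry.example.com:5000":{"Username":"username","Password":"password","Email":""}}}'

image:
  private: true
  name: my-org/stock-analysis-engine
  tag: latest
  pullPolicy: Always
```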
This command can take a few minutes to download and start all the components:
./start.sh
If you do not want to use start.sh, you can start the charts with helm using:
helm install \
    --name=ae \
    ./ae \
    --namespace=ae \
    -f ./ae/values.yaml

helm install \
    --name=ae-redis \
    stable/redis \
    --namespace=ae \
    -f ./redis/values.yaml

helm install \
    --name=ae-minio \
    stable/minio \
    --namespace=ae \
    -f ./minio/values.yaml

helm install \
    --name=ae-jupyter \
    ./ae-jupyter \
    --namespace=ae \
    -f ./ae-jupyter/values.yaml

helm install \
    --name=ae-backup \
    ./ae-backup \
    --namespace=ae \
    -f ./ae-backup/values.yaml

helm install \
    --name=ae-intraday \
    ./ae-intraday \
    --namespace=ae \
    -f ./ae-intraday/values.yaml

helm install \
    --name=ae-daily \
    ./ae-daily \
    --namespace=ae \
    -f ./ae-daily/values.yaml

helm install \
    --name=ae-weekly \
    ./ae-weekly \
    --namespace=ae \
    -f ./ae-weekly/values.yaml
./show-pods.sh
------------------------------------
getting pods in ae:
kubectl get pods -n ae
NAME                              READY   STATUS    RESTARTS   AGE
ae-minio-55d56cf646-87znm         1/1     Running   0          3h30m
ae-redis-master-0                 1/1     Running   0          3h30m
ae-redis-slave-68fd99b688-sn875   1/1     Running   0          3h30m
backtester-5c9687c645-n6mmr       1/1     Running   0          4m22s
engine-6bc677fc8f-8c65v           1/1     Running   0          4m22s
engine-6bc677fc8f-mdmcw           1/1     Running   0          4m22s
jupyter-64cf988d59-7s7hs          1/1     Running   0          4m21s
Once your ae-intraday/values.yaml is ready, you can automate intraday data collection by using the helper script to start the helm release for ae-intraday:
./run-intraday-job.sh <PATH_TO_VALUES_YAML>
And for a cron job, include the -r argument to ensure the job is recreated:
./run-intraday-job.sh -r <PATH_TO_VALUES_YAML>
After data collection, you can view compressed data for a ticker within the redis cluster with:
./view-ticker-data-in-redis.sh TICKER
Once your ae-daily/values.yaml is ready, you can automate daily data collection by using the helper script to start the helm release for ae-daily:
./run-daily-job.sh <PATH_TO_VALUES_YAML>
And for a cron job, include the -r argument to ensure the job is recreated:
./run-daily-job.sh -r <PATH_TO_VALUES_YAML>
Once your ae-weekly/values.yaml is ready, you can automate weekly data collection by using the helper script to start the helm release for ae-weekly:
./run-weekly-job.sh <PATH_TO_VALUES_YAML>
And for a cron job, include the -r argument to ensure the job is recreated:
./run-weekly-job.sh -r <PATH_TO_VALUES_YAML>
Once your ae-backup/values.yaml is ready, you can automate backing up your collected + compressed pricing data from within the redis cluster and publishing it to AWS S3 with the helper script:
Warning
Please remember AWS S3 has usage costs. Please set only the tickers you need to back up before running the ae-backup job.
./run-backup-job.sh <PATH_TO_VALUES_YAML>
And for a cron job, include the -r argument to ensure the job is recreated:
./run-backup-job.sh -r <PATH_TO_VALUES_YAML>
Add these lines to your cron with crontab -e for automating data collection:
Pull Data Per Minute of each Trading Day
Note
This will also pull data on holidays and closed trading days, but PRs are welcome!
Every minute, Monday through Friday, between 9 AM and 5 PM (assuming system time is EST):

# intraday job:
# min  hour  day  month  dayofweek  job script path               job    KUBECONFIG
*      9-17  *    *      1,2,3,4,5  /opt/sa/helm/cron/run-job.sh  intra  /opt/k8/config
Monday through Friday at 6:01 PM (assuming system time is EST):

# daily job:
# min  hour  day  month  dayofweek  job script path               job    KUBECONFIG
1      18    *    *      1,2,3,4,5  /opt/sa/helm/cron/run-job.sh  daily  /opt/k8/config
Friday at 7:01 PM (assuming system time is EST):

# weekly job:
# min  hour  day  month  dayofweek  job script path               job     KUBECONFIG
1      19    *    *      5          /opt/sa/helm/cron/run-job.sh  weekly  /opt/k8/config
Monday through Friday at 8:01 PM (assuming system time is EST):

# backup job:
# min  hour  day  month  dayofweek  job script path               job     KUBECONFIG
1      20    *    *      1,2,3,4,5  /opt/sa/helm/cron/run-job.sh  backup  /opt/k8/config
On a server reboot (assuming your kubernetes cluster is running on just one host):

# restore job:
@reboot /opt/sa/helm/cron/run-job.sh restore /opt/k8/config
Describe:
./describe-engine.sh
View Logs:
./logs-engine.sh
Describe:
./describe-intraday.sh
View Logs:
./logs-job-intraday.sh
Describe:
./describe-daily.sh
View Logs:
./logs-job-daily.sh
Describe:
./describe-weekly.sh
View Logs:
./logs-job-weekly.sh
Describe Pod:
./describe-jupyter.sh
View Logs:
./logs-jupyter.sh
View Service:
./describe-service-jupyter.sh
Jupyter uses the backtester pod to perform asynchronous processing like running an algo backtest. To debug this, run:
Describe:
./describe-backtester.sh
View Logs:
./logs-backtester.sh
Describe:
./describe-minio.sh
Describe Service:
./describe-service-minio.sh
Describe Ingress:
./describe-ingress-minio.sh
Describe:
./describe-redis.sh
To stop AE run:
./stop.sh
And if you really, really want to permanently delete ae-minio and ae-redis, run:
Warning
Running this can delete cached pricing data. Please be careful.
./stop.sh -f