gke/airflow walkthrough

A guide for using CNDI to deploy a GitOps-enabled Airflow cluster on Kubernetes in Google Cloud Platform

overview 🔭

This walkthrough uses cndi to customize and deploy our gke/airflow Template. In just a few minutes we will be able to deploy a new Kubernetes cluster to GKE that has been optimally configured for Airflow, including GitOps with Secrets management, TLS and High Availability right out of the box. This framework will enable quick iteration of infrastructure, applications and manifests in a GitHub workflow you are already comfortable with.

prerequisites ✅

You will need the following things to get up and running with cndi successfully:

A GCP account and a GCP project: cndi will deploy infrastructure into a Google Cloud Project connected to a valid billing account.
Your GCP service account credentials: cndi will leverage a Google Cloud Service Account using a service-account-key.json credentials file to deploy resources.
A Domain Name: Because the gke/airflow template sets up TLS certificates, we need to have a domain on which to apply them. We also need access to the domain registrar so we can add a couple of A records there for our cluster Ingresses.
(Optional if you don't have a domain name) Here's a guide on how to connect to your Google Kubernetes cluster once it's deployed and port forward ArgoCD and the Airflow web server
A GitHub account: cndi helps you manage the state of your infrastructure using a GitOps workflow, so you'll need a GitHub account with a valid GitHub Personal Access Token.
Here's a guide on how to set up your Google Cloud account, including roles and permissions
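Before moving on, it can help to confirm that the tools used later in this walkthrough are installed and that the credentials file is where we expect it. The following is a minimal sketch, not part of cndi itself: the helper names missing_tools and key_file_ok are hypothetical, and the key file path is only an example.

```shell
#!/bin/sh
# Hypothetical helpers for checking prerequisites; not provided by cndi.

# Print the name of every tool passed as an argument that is NOT on PATH.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed only if the file at $1 exists and is not empty.
key_file_ok() {
  [ -s "$1" ]
}
```

For example, running missing_tools git gh curl lists whichever of those commands still need to be installed, and key_file_ok ./service-account-key.json succeeds only if that (example) path points to a non-empty file.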

download cndi ⬇️

Run the following command within your terminal to download and install cndi:

# this will download the correct binary for your OS
curl -fsSL https://raw.githubusercontent.com/polyseam/cndi/main/install.sh | sh

create your cndi repository 📂

CNDI is designed around a GitOps workflow, so all of your cluster configuration and infrastructure will be stored as code within a git repo. Let's create that now!

gh repo create my-cndi-cluster --private --clone && cd my-cndi-cluster

creating cluster config with cndi templates using the interactive cli 🛠️

Now that we have a repo, let's use cndi to generate all of our Infrastructure as Code and Cluster Configuration:

cndi init --interactive

You will get an interactive prompt where you'll name your project, then one to specify the CNDI template you want.

For this project select the gke/airflow Template.

? Pick a template
   ec2/basic
   gke/basic
   avm/basic
   ec2/airflow
   avm/airflow
 ❯ gke/airflow

Below is the list of all of the interactive prompt values that should be supplied for this project:

Cndi Project Name: name of project
Template: list of templates to choose from

GitHub Username: a user's handle on GitHub.
GitHub Repository URL: the url for the GitHub repository that will hold all cluster configuration
GitHub Personal Access Token: the access token CNDI will use to access your repo for cluster creation and synchronization

GCP Region: region where the infrastructure is being created
Path to GCP service account key json: path to JSON credentials file for GCP Service Account

Git Username for Airflow DAG Storage: a user's handle on GitHub used to synchronize Airflow DAGs
Git Password for Airflow DAG Storage: a personal access token used to synchronize Airflow DAGs
Git Repo for Airflow DAG Storage: url for repo where your Airflow DAGs will be stored

Domain name you want ArgoCD to be accessible on: domain where ArgoCD will be hosted
Domain name you want Airflow to be accessible on: domain where Airflow will be hosted

Email address you want to use for lets encrypt: an email for Let's Encrypt to use when generating certificates
Username you want to use for airflow cnpg database: username you want to use for airflow database
Password you want to use for airflow cnpg database: password you want to use for airflow database
Name of the postgresql database you want to use for airflow cnpg database: name of the postgresql database you want to use for airflow cnpg database

This process will generate a cndi_config.yaml file and a cndi directory at the root of your repository containing all the necessary cluster and infrastructure resources. It will also generate a .env file that will be used to store sensitive information that we don't want to commit to our repository as source code.

The structure of the generated CNDI project will be as follows:

├── 📁 cndi
│   ├── 📁 cluster_manifests
│   │   ├── 📁 applications
│   │   │   └── airflow.application.yaml
│   │   ├── argo-ingress.yaml
│   │   ├── cert-manager-cluster-issuer.yaml
│   │   └── git-credentials-secret.yaml
│   └── 📁 terraform
│       ├── aks_cluster_airflow_nodes.tf.json
│       └── etc 
├── cndi_config.yaml
├── .env
├── .gitignore
├── .github
└── README.md

For a breakdown of all of these files, check out the outputs section of the repo's main README.
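Since the .env file holds secrets, it is worth confirming that it is listed in .gitignore before anything is committed. The sketch below is a simple exact-line check and the helper name env_is_ignored is hypothetical; it does not understand wildcard patterns.

```shell
#!/bin/sh
# Hypothetical helper; a simple exact-match check, not a full gitignore parser.

# Succeed if the ignore file contains a line that is exactly ".env".
# $1 = path to the ignore file (defaults to ./.gitignore)
env_is_ignored() {
  grep -qxF '.env' "${1:-.gitignore}" 2>/dev/null
}
```

From the root of the repository, env_is_ignored || echo ".env is not ignored" prints a warning if the entry is missing. Inside a git repository, git check-ignore -q .env is a more thorough alternative because it applies git's own pattern rules.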

upload environment variables to GitHub ⬆️

GitHub Actions is responsible for calling the cndi run command to deploy our cluster, so it is important that our secrets are available in the actions runtime. However, we don't want these to be visible in our source code, so we will use GitHub Actions Secrets to store them. The gh CLI makes this very easy.

gh secret set -f .env
# if this does not complete the first time, try running it again!
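To double-check the upload, we can list the variable names in .env (never the values) and compare them with what GitHub reports. This is a rough sketch: the helper name env_keys is hypothetical, and it assumes the .env file uses plain NAME=value lines.

```shell
#!/bin/sh
# Hypothetical helper; assumes .env consists of NAME=value lines.

# Print only the variable names defined in an env file, one per line.
# Comments and blank lines are skipped; values are never printed.
# $1 = path to the env file (defaults to ./.env)
env_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "${1:-.env}" | cut -d= -f1
}
```

Running env_keys prints the names that gh secret set -f .env should have created, and gh secret list shows the secrets that actually exist in the repository, so the two lists can be compared by eye.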

deploy your templated cluster configuration 🚀

Once all the config is created and environment variables are uploaded to GitHub, add, commit and push the config to your GitHub repository:

git add .
git status # take a quick look and make sure these are all files you want to push
git commit -m "initial commit"
git push --set-upstream origin main

You should now see that the cluster configuration has been uploaded to GitHub.

Now, open your web browser and navigate to your project on GitHub. Click on the Actions tab, then click on the job that was triggered from your latest commit.

The job page should show that GitHub has successfully run the workflow.

It is common for cndi run to take a fair amount of time, as is the case with most Terraform and cloud infrastructure deployments.

Once cndi run has completed, the end of the run output will include a link to the resource group, where you can view the resources deployed by CNDI for this project.

attach the load balancer to your domain 🌐

At the end of the cndi run output there is also a value called public host, which is the IP address (for an A record) of the load balancer that's attached to your GKE instances.

Copy the public host value.
Go to the registrar for your custom domain.
Create an A record for each of the Airflow and ArgoCD domains you provided, routing traffic to the load balancer IP address from public host.
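Once the A records are saved, we can check whether they have propagated by comparing what DNS returns with the public host value. The following is a sketch under assumptions: the helper name has_ip is hypothetical, and the domain and IP address in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper for checking a DNS answer against an expected IP.

# Read a list of IP addresses on stdin (one per line) and succeed if
# the expected address $1 is among them.
has_ip() {
  grep -qxF "$1"
}
```

For example, dig +short A airflow.example.com | has_ip 203.0.113.10 succeeds once the record for that placeholder domain resolves to that placeholder address. DNS changes can take a while to propagate, so a failure right after saving the record is not necessarily a mistake.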

(Optional if you don't have a domain name) Here's a guide on how to connect to your Google Kubernetes cluster once it's deployed and port forward ArgoCD and the Airflow web server

Wait 2 to 5 minutes, then open the domain name you've assigned for ArgoCD in your browser to see the ArgoCD UI login page.

To log in, use the username admin and the password stored as the value of ARGOCD_ADMIN_PASSWORD in the .env file located in your CNDI project folder.
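If you would rather not open the .env file in an editor, the value can be pulled out from the terminal. This is a minimal sketch: the helper name env_value is hypothetical, and it assumes the variable sits on a single NAME=value line, optionally wrapped in single or double quotes.

```shell
#!/bin/sh
# Hypothetical helper; assumes one NAME=value line per variable.

# Print the value of variable $1 from an env file, stripping one pair of
# surrounding single or double quotes if present.
# $2 = path to the env file (defaults to ./.env)
env_value() {
  sed -n "s/^$1=//p" "${2:-.env}" | head -n 1 |
    sed -e "s/^'\(.*\)'\$/\1/" -e 's/^"\(.*\)"$/\1/'
}
```

Running env_value ARGOCD_ADMIN_PASSWORD from the project folder prints the password to the terminal, so take care when sharing your screen or saving shell output.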

Notice that the cluster_manifests in the GitHub repository match the config in the ArgoCD UI

└── 📁 cndi
    └── 📁 cluster_manifests
        ├── 📁 applications
        │   ├── cnpg.application.yaml
        │   └── airflow.application.yaml
        ├── git-credentials-secret.yaml
        ├── cert-manager-cluster-issuer.yaml
        └── argo-ingress.yaml

Verify that all applications and manifests in the GitHub repository are present and that their status is healthy in the ArgoCD UI.

verify that Airflow is accessible on the chosen domain 🧐

With the A record for Airflow in place, verify that Airflow is accessible: open the domain you chose for Airflow in your browser and check that Airflow's login page appears. The default username is admin and the password is admin. If the page loads, you can log in and begin using Airflow. If not, go back and make sure the previous steps were completed correctly.
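The same check can be done from the terminal by looking at the HTTP status code the domain returns. This is only a sketch: the helper names http_status and is_reachable_status are hypothetical, the URL in the usage note is a placeholder, and treating any 2xx or 3xx response as "reachable" is an assumption made here for simplicity.

```shell
#!/bin/sh
# Hypothetical helpers; the 2xx/3xx rule is an assumption for this sketch.

# Print the HTTP status code returned for URL $1 (000 if the request fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed if status code $1 is a 2xx success or a 3xx redirect.
is_reachable_status() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_reachable_status "$(http_status https://airflow.example.com)" succeeds when the placeholder domain answers with a page or a redirect to the login page. It only shows that something is responding, so it is still worth confirming in the browser that the page is Airflow's login page.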

Verify Airflow is connected to the private DAG repository 🧐

Verify that Airflow is connected to the private DAG repository. If it is, the private DAGs should be visible in the Airflow UI. If not, go back and make sure that the private DAG repository is properly connected to Airflow with the correct credentials.

and you are done! ⚡️

You now have a fully-configured 3-node Kubernetes cluster with TLS-enabled Airflow and ArgoCD.

modifying the cluster! 🛠️

To add another node to the cluster:

Open cndi_config.yaml
In the infrastructure.cndi.nodes section, add a new airflow node and save the file
Run cndi ow
Commit changes
Push your code changes to the repository
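After cndi_config.yaml has been edited and saved, the remaining steps above can be sketched as one small shell function. The helper names run and apply_config_change are hypothetical, as is the DRY_RUN switch, which prints each command instead of executing it so the sequence can be reviewed first. The git add step is an assumption carried over from the initial deployment.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the documented steps; not part of cndi.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Regenerate the cndi outputs, then commit and push the result.
# $1 = commit message (optional)
apply_config_change() {
  msg=${1:-"update cndi_config.yaml"}
  run cndi ow &&
    run git add . &&
    run git commit -m "$msg" &&
    run git push
}
```

Setting DRY_RUN=1 and calling apply_config_change "add airflow node" prints the four commands without running them. Without DRY_RUN, the function stops at the first command that fails, so nothing is pushed if cndi ow reports an error.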

destroying resources in the cluster! 💣

If you just want to take down any of your individual applications:

Delete that application or manifest from your cndi_config.yaml
Run cndi ow
Commit changes
Push your code changes to the repository

If you want to take down the entire cluster, run:

cndi destroy
