In this repo, you will learn about some of the AKS features that make it easier for developers to deploy that code without having to worry too much about infrastructure. Begin by cloning the repository.
⚠️ You will need to fork this repo and clone your fork in order to complete the steps requiring updating the GitHub(GH) repository and using GH Actions for deployment. I recommend you do this and clone your repo instead of the repo used in the command below.
git clone --recurse-submodules https://github.com/mosabami/SuperAKS-App
cd SuperAKS-App
In this workshop, you are a developer who created the POC of a promotional app for your organization that showcases one of their ML API. The app includes a ML model that identifies the location of faces in pictures provided to it via URL. You have written the code and would love to deploy it online. You have chosen to deploy it to a k8s cluster. You have heard AKS is the best place for kubernetes. You decide to try it out yourself.
AKS has a lot of amazing features that makes software development and delivery very easy. It takes care of a lot of the infrastructure security related heavy lifting for you. In this workshop we will discuss the following features of AKS:
- Bicep IaC to make it easier and quicker to deploy AKS and its supporting resources in a reproducible way
- Azure AD for authentication so you don't have to manage that yourself
- Azure RBAC integration with AKS for authorization
- Azure Key vault integration and the CSI driver for easy secrets management
- AKS persistent volume & persistent volume claim provisioning Azure file resources dynamically
- Bridge to Kubernetes which allows you to test & debug individual microservices against other services already running on AKS
- Developer tools for AKS and the Draft tool to automatically create Kubernetes manifest files
- Automated Deployments feature on AKS to automatically create GitHub Action pipelines and provide federated identity to the repo so that it can be used to deploy changes to AKS
- Azure container registry integration for storage in a High availability registry as well as image security
- AKS workload identity (preview) which makes it easy to assign identities to individual pods in your cluster for better security. It can be integrated with various identity providers
- AKS CNI overlay (preview) so you don't have to worry about pod IP exhaustion
- Horizontal pod autoscaler
- Cluster autoscaler for easy and quick scaling of your application
- Azure load testing (preview) to test scalability of your application
If there are other AKS features you'd like to see here that help with developer productivity, please create an issue. PRs are also welcome!
Check out the Prerequisites page for information on this.
Introduce a bug in the code so that we can correct it later
⚠️ If you are using a mac you will need to change the command tosed -i '' "s/bug = false/bug = true/" smartbrain/smartbrain/smartbrainapi/server.js
.
💡 If these sed commands don't work for any reason or if you don't have sed installed, you will need to update these files manually by replacing the placeholders in the files mentioned below.
sed -i "s/bug = false/bug = true/" smartbrain/smartbrain/smartbrainapi/server.js
If you have docker desktop install and started on your computer, and you have some experience with docker-compose you can run the application on your local computer.
Run the application using docker-compose.
cd smartbrain/smartbrain
docker-compose up
You can access the website at port 3050 on your local computer using NGINX as an ingress controller. Check out the docker-compose.yaml file for more details. You can test the app by clicking on register and providing the required information.
You will notice that even though the app is able to detect the face, there is an error in the detection count. This is the error we intentionally added earlier. We fill fix this at a later step. Here is what the architecture of the app looks like
- The Nginx service acts as a load balancer routing traffic to the appropriate service based on config
- The Client Server is a React app that renders the front end of the app on a browser
- The API server receives requests from the front end and updates the postgres database with the user information including their email, password and how many times they have used to app to detect a face
- The ML service includes a machine learning model. It checks redis to see if the requested URL has been requested in the past. If so, it gets the coordinates of where the face is and returns it to the client. If it doesn't it will run the image through the model to detect coordinates of where the face is, store the coordinates in redis and returns the coordinates to the client
- Redis microservice runs a redis instance and stores the coordinates of where faces are previously detected pictures so that the ML service doesnt have to calculate it every time a request is made to it
- Postgres microservice runs a postgres instance that is used to store the user information
CD out of the app's folder.
cd ../..
Now that we have seen the app running locally, it is time to deploy it to AKS. There are preview features being used including workload identity and CNI overlay. You will need to ensure these features are enabled in your subscription before proceeding with the deployment.
AKS Construction (AKSC) is part of the AKS landing zone accelerator program and allows rapid development and deployment of secure AKS clusters and its supporting resources using IaC (mostly Bicep), Azure CLI and/or GitHub Actions. AKS Landing Zone Accelerator is a set of tools, resources and guidance that helps deploy and operationalize secure and scalable AKS and supporting services rapidly, AKS Construction helper being one of them. Check out the official docs for more information.
⚠️ It is very important to note that the AKS-LZA can be used to develop secure and compliant AKS clusters that are (almost) ready for production. However, many of the best practice guidance are not used in this implementation to facilitate easy learning and deployment in this workshop. Do not use this configuration for production workloads. To deploy a more secure environment, consider reading the AKS-LZA docs and/or deploy your environment using a configuration similar to this.
if you used the --recurse-submodules flag when cloning this repo, you wont need to clone the AKSC repository which has the bicep files our deployment depends on into the IaC folder.
cd IaC
# git clone https://github.com/Azure/AKS-Construction
Get the signed in user id so that you can get admin access to the cluster you create
SIGNEDINUSER=$(az ad signed-in-user show --query id --out tsv)
RGNAME=superapp
Create deployment
az group create -n $RGNAME -l EastUs
DEP=$(az deployment group create -g $RGNAME --parameters signedinuser=$SIGNEDINUSER -f main.bicep -o json)
💡 The DEP variable is very important and will be used in subsequent steps. You can save it by running
echo $DEP > test.json
and restore it by runningexport DEP=$(cat test.json)
Get required variables
KVNAME=$(echo $DEP | jq -r '.properties.outputs.kvAppName.value')
OIDCISSUERURL=$(echo $DEP | jq -r '.properties.outputs.aksOidcIssuerUrl.value')
AKSCLUSTER=$(echo $DEP | jq -r '.properties.outputs.aksClusterName.value')
SUPERAPPID=$(echo $DEP | jq -r '.properties.outputs.idsuperappClientId.value')
TENANTID=$(az account show --query tenantId -o tsv)
ACRNAME=$(az acr list -g $RGNAME --query [0].name -o tsv)
Log into AKS and deploy NGINX ingress.
az aks get-credentials -n $AKSCLUSTER -g $RGNAME --overwrite-existing
kubectl get nodes
Run the command in the official nginx docs page for Azure to install the nginx ingress controller that will be used later
Create the superapp namespace that will be required for future steps
kubectl create namespace superapp
cd out of IaC folder
cd ..
We will build images from source code and pull database images from Dockerhub. We will store these images in our container registry to stay in compliance with our policy to only use images in approved registry
Build front end image
cd smartbrain/smartbrain/smartbrainclient
az acr build -t client:v1 -r $ACRNAME --resource-group $RGNAME .
Build api image
cd ../smartbrainapi
az acr build -t server:v1 -r $ACRNAME --resource-group $RGNAME .
Build ML service image
cd ../smartbrainml
az acr build -t worker:v1 -r $ACRNAME --resource-group $RGNAME .
Import redis and postgres images from Dockerhub
az acr import --name $ACRNAME --source docker.io/library/redis:latest --resource-group $RGNAME
az acr import --name $ACRNAME --source docker.io/library/postgres:latest --resource-group $RGNAME
Verify that the 5 required images are in the container registry
az acr repository list --name $ACRNAME --resource-group $RGNAME
Draft is a tool that makes it easy to develop resources required to deploy applications to kubernetes. This includes the creation of Docker files, Kubernetes manifest files, Helm charts, Kustomize files, GitHub Action pipelines, etc. In this section, we will be showcasing the use of Draft to crete Kubernetes manifest files to speed up the creation of resources required to deploy to kubernetes using the Developer Tools for Azure Kubernetes Service extension. For more information about the features of this extension, check out this video that walks you through it later if you are a Microsoft employee or check out the demo repo otherwise.
⚠️ For the Draft and Developer Tools for AKS extension to work properly, you need to ensure your file path is not too long. Make sure you are working off a folder that doesn't have a long file path.
💡 At this point, you might want to switch to a different branch so that the changes you are about to make to the files don't carry on to your main branch allowing you to rerun this workshop in the future easily:
git checkout -b test
- Expand the smartbrain/smartbrain folder
- Right click on the "smartbrainml" folder on the left side of the screen in your repo within vs-code. Hover over "Run AKS DevX Tool" then click on "AKS Developer: Draft a Kubernetes Deployment and Service
- Choose the folder you want your manifest files to be saved in. In this case we will choose the fib-calculator folder which in which the worker folder is
- Click on "Manifests" in the resulting prompt
- Enter the application name, in this case it will be "worker-deployment" and hit enter
- Enter port "2000" and hit enter
- Choose the superapp namespace. This tool has used other extensions to connect to the k8s cluster and identify the available namespaces
- Choose Azure Container Registry
- Choose your Subscription where your ACR subscription is
- Choose the Resource Group where your ACR resides
- Choose your ACR
- Select your worker registry
- Select the v1 tag
AKS DevX tool will automatically create a draft of deployment and service manifest files ready for you to modify to suit your needs within the manifests folder in the folder you selected as your output folder. These files can then be updated to include required environment variables, specify resources and limits, proper labeling for deployment and service selectors, etc.
For the rest of this workshop however, we will use the manifest files already provided in the k8s folder in the root directory.
Change the deployment files to use the proper container registry names using sed commands.
⚠️ If you are using a mac you will need to change the command tosed -i '' "s/<ACR name>/$ACRNAME/" client-deployment.yaml
.
💡 If these sed commands don't work for any reason or if you don't have sed installed, you will need to update these files manually by replacing the placeholders in the files mentioned below.
⚠️ CD to the root directory before proceeding to the next steps
cd k8s
sed -i "s/<ACR name>/$ACRNAME/" client-deployment.yaml
sed -i "s/<ACR name>/$ACRNAME/" postgres-deployment.yaml
sed -i "s/<ACR name>/$ACRNAME/" redis-deployment.yaml
sed -i "s/<ACR name>/$ACRNAME/" api-deployment.yaml
sed -i "s/<ACR name>/$ACRNAME/" worker-deployment.yaml
Update the secret provider class file
sed -i "s/<identity clientID>/$SUPERAPPID/" secret-provider-class.yaml
sed -i "s/<kv name>/$KVNAME/" secret-provider-class.yaml
sed -i "s/<tenant ID>/$TENANTID/" secret-provider-class.yaml
Update the service account files. These service accounts are using workload identity federated identity.
sed -i "s/<identity clientID>/$SUPERAPPID/" svc-accounts.yaml
sed -i "s/<tenant ID>/$TENANTID/" svc-accounts.yaml
Deploy the resources into the superapp namespace.
kubectl apply -f .
💡 Since the API pod depends on the postgres pod being deployed, we included an init container in the API deployment file which will ensure that the postgres pod is ready before starting the api pod. Check out the API pod definition file to see how to use an init container.
So what have we done here? We are using workload identities. Workload identities is a soon to be released AKS feature that allows you to use any of various identity providers as the identity of your pod. In this case we are using Azure AD as the identity provider and using the AKS cluster as the OIDC issuer. In the future, you will be ale to use other identity providers as well. This identity will only be assigned to the pods that are using the service account attached to the identity. This way other pods within the same node wont have the same access. This is important for securing your workloads by providing minimum access. In this case, we are using the identity to get access to the Azure Keyvault. Only this identity and consequently the pods configured to use the identity will be able to pull secrets from it and get the postgres database password. Check out the postgres and server deployment yaml files as well as the svc accounts and secret provider class yaml files for more details.
But how was the workload identity deployed? Check the resources towards the end of the main.bicep file in the IaC folder as well as the workloadid.bicep file. The kvrbac.bicep file shows how the workload identity was granted access to keyvault to pull secrets as well as how the postgres password was created.
You can now access your application on a web browser (or postman) using the nginx ingress controller you deployed. You will need the ip address of the ingress controller's service.
kubectl get ingress -n superapp
💡 You might want to set context of your kubeconfig to use the superapp namespace by default so you dont have to add
-n superapp
at the end of every command. To do this, run the command:kubectl config set-context $(kubectl config current-context) --namespace=superapp
You should be able use the ip address as shown in the screenshot below. You can test the app by entering a number under 40, clicking submit and refreshing the page.
AKS makes it easy to monitor your applications using various tools including Prometheus, Grafana, and Azure monitor. In this workshop, we will be using container insights.
We tested the app using a single user accessing it using the website. But how do we ensure our application will hold when there are hundreds or thousands of users using it at once? We will use Azure load testing using a JMeter test script to test this. We will see if our cluster scales and learn how easy it is to enable scaling. For more general information about this check out the Scalability scenario on AKS-LZA.
Open Azure portal in two tabs. In the first one, navigate to container insights to see the usage of your cluster. AKS resource -> Insights (in the left blade under monitoring) -> "Containers" tab (the containers tab is found in the top middle of the page) -> Time range (which can be found at the top left no in the blade) -> last 30 minutes -> Apply
Here you should be able to see the usage of the pods over the last 30 minutes. Filter to only show pods in your node pool. "Add filter" -> Namespace -> superapp
- On the second Azure portal tab, create a new Azure load testing resource within the same resource group as your AKS cluster.
- Click on "Go to resource" and click "Create" under Upload a JMeter script
- Enter an appropriate test name and hit Next at the bottom
- Click on the folder close to the "Choose files" field, then navigate to folder where you cloned the repo (AKS-Superapp). Under that open the load testing folder and choose the "test-superapp.jmx" file.
- IMPORTANT: Click on the "Upload" button, then click Next
- You will need to enter some parameters. You can enter them as shown in the picture below:
- Move forward to the monitor tab and add the AKS cluster running the app
- Click "Review and Create" then click "Create"
- Watch Azure load testing run the test you should be able to see something similar to below indicating that some of the requests failed Heading over to the monitoring tab however you can see that the worker pod was heavily utilized. The "login" and "create" tests hit the api pod responsible for handling user related routes. This pod hits the postgres pod used to store user data. The "create" test hits the ML pod that determines the coordinates where the face is in the provided picture. Since the same picture is requested every time in the test, this pod hits the redis pod that has the coordinates cached. Container insights shows that even though the api pod is being highly utilized, the ML pod isnt (because it doesnt need to run the pictures through the model every time since the same picture being used for the test's coordinates has already been cached in redis).
To help the api pod with the load, we will deploy a horizontal pod autoscaler for the api deployment. cd to the ./loadtesting folder and deploy the horizontal pod autoscaler. Watch the pods to see if there are any new ones being created as we run the load test again
cd ../loadtesting
kubectl apply -f .
kubectl get pods -n superapp -w
Head back to the load test tab and rerun the test by clicking on the "Rerun" button in the results page. After a minute or so, after the test is completed, you will find that some new pods have been created but many of them are sitting in pending state
kubectl get pods -n superapp
Heading over to Azure monitor will show that the worker pods that were able to be scheduled are all fully utilized And this is happening in spite of the fact that the node itself is not fully utilized as showed in the nodes tab This is happening because the total requests of all the scheduled pods has reached the available cpu available in the node even though the requested CPUs are not being fully utilized by all the pods in the node. To avoid this, set your requests numbers in your deployment manifest files to a lower number. For the sake of this demo however, we will leave it as is.
To allow more worker pods to be scheduled, we will enable Cluster Autoscaler. Cluster autoscaler is an AKS feature that allows the k8s control plane create new nodes and add them to the cluster node pool so that your application can scale automatically without having to worry about that. Run the command below the enable cluster autoscaler. You can also enable it at the time of cluster creation or by updating the bicep deployment scripts and rerunning it.
- Run the following command to change the default cluster autoscaler profile (default values can be found in the AKS Cluster REST API documentation). These parameters enable a rather aggressive scale-down to avoid longer waiting times in this tutorial. Please be mindful when setting these values in your own cluster.
az aks update \ --resource-group $RGNAME \ --name $AKSCLUSTER \ --cluster-autoscaler-profile \ scale-down-unneeded-time=2m \ scale-down-utilization-threshold=0.8
- Enable autoscaling between a number of 1 and 5 nodes on your npuser01 node pool :
az aks nodepool update \ --resource-group $RGNAME \ --cluster-name $AKSCLUSTER \ --name npuser01 \ --enable-cluster-autoscaler \ --min-count 1 \ --max-count 5
Rerun the test again to see pods scheduled to help with the load but this time, let us increase the load. In the test run result screen click on "View all test runs" in the top right corner. Then click on "Configure" -> "Test", then head over to the "Parameters" tab. Update the threads to 50 and loops to 250. Click "Apply" at the bottom left. This will take you to your tests screen. Click on the test you just modified, and click "Run" at the top of the resulting page, then click "Run" to run the test. Click "Refresh" at the top left side of the screen then click on the run you just executed.
You will find that the test completed successfully. Entering kubectl get pods -n superapp after a minute will show pods in the pending state. Wait a couple of minutes and you will see a new node created. Check to see which pods are in the new node.
kubectl get pods -n superapp -o wide
After a couple of minutes, heading over to the "Nodes" tab of Azure monitor shows the new node and its utilization.
Lets test it again but this time with threads set to 300 and loops set to 450. You can also update the loadtesting/worker-hpa.yaml and server-hpa.yaml files to increase the maxReplicas
numbers.
This exercise shows one of the advantages of containers and kubernetes. You can optimize the utilization of your nodes (virtual machines) and scale various components of your application independently to increase and decrease based on the demand on that particular service. With AKS CNI overlay, you don't have to worry about IP exhaustion. The overlay network takes care of that for you. You can have max 250 pods in each node with CNI overlay and those IP addresses will be from a different IP space than your node (and virtual network).
Bridge to kubernetes is an amazing tool that allows developers debug and test their code by running their Microservice locally on their computer and having it connect to other microservices running in their kubernetes cluster. This way, they can test changes they make to their local microservice against the entire application already running on kubernetes. For more information about this, check out this video. For Bridge to work, you need to be able to run the application locally. We begin by installing the packages required to run the worker node express microservice locally.
⚠️ To begin you might have to set the context of kubeconfig to use the superapp namespace in order for bridge to kubernetes to work properly. You would also want to delete the api hpa and redeploy the worker deployment.
kubectl config set-context --current --namespace=superapp
kubectl delete -f api-hpa.yaml
cd ../k8s
kubectl delete -f api-deployment.yaml && kubectl apply -f api-deployment.yaml
-
CD to the directory that has your api server code
cd ../smartbrain/smartbrain/smartbrainapi/
-
Install the required packages
npm install
-
Open the command pallet (you can do this by shortcut ctrl + shift + p)
-
Type "bridge" and select "Bridge to Kubernetes: Configure"
-
Select the "api-service" service
-
Enter "5000" as the port
-
Select "Configure Bridge to Kubernetes without a launch configuration"
-
Select "No" since you are the only one working on this application. You should see a notification similar to below stating your configuration was successful.
-
Click on the "Connect to the cluster" button that pops up at the top of the screen and wait for the connection to be established
-
When you see the pop-up, click on "Continue" and then "Yes" to provide Bridge to kubernetes the required permission
-
Once the connection is complete, take note of the host address of the postgres-service. You might need it if you have service discovery issues
-
You can open the smartbrain\smartbrain\smartbrainapi\server.js file, put a break point on the last line of the script and click on the debug tab to the left
-
Click on the green Play button next to "Launch Program" at the top left side of the screen to begin debugging
-
Click on the "Continue" button of the debugger to complete the run
-
If you head to the "DEBUG CONSOLE" tab and see an error about connecting to postgres, it means you are having service discovery issues.
-
BUGFIX: To fix postgres connection error: In your smartbrain\smartbrain\smartbrainapi\server.js file, find the
connection
variable and replace host in the object with the host address of the postgres-service noted earlier -
Run the debugger and on the browser try detecting the face in the picture. You will that the detect count error persists ![incorrect rank]
-
Head to the smartbrain\smartbrain\smartbrainapi\server.js and fix the error by changing the variable, bug to false
-
Head back to the index file and run the debugger again
-
Refresh the browser, sign in and try detecting a face again. You will see that the detect count now works perperly.
-
You can stop the debugger. Once you are satisfied with the changes, you can disconnect from bridge to kubernetes by clicking on the Kubernetes status bar menu and clicking on "Disconnect current session". You can then head to your Terminal tab and hit any key to close the connection. You might have to delete the current api deployment and redeploy it again to restore connection to the worker pod running on AKS.
kubectl delete deployment api-deployment kubectl apply -f api-deployment.yaml
⚠️ Before you proceed, set the host value in the connection object back to pgHost (assuming your replaced that variable with the postgres IP address to fix the postgres connection error above)
- You can now push your changes to GitHub
git add . git commit -m "fix ranking error and update manifest files" git push origin test # or main assuming that is the branch you are using
Now that we have pushed our changes to GitHub, you can follow the instructions in this section of the demo repo to use the "Automated Deployment" feature on AKS.
You will be using the automated deployment to update the api deployment since that is service that has the bug that was fixed. When prompted to specify your folder ensure you are pointing to the "smartbrain\smartbrain\smartbrainapi" path.
Please note that this is a preview feature that currently has two bugs which you can easily fix to deploy your changes. You can follow the steps below after following the instructions above to deploy your workflow which will fail.
The first one is because DevHub hasn't been updated to use Kubelogin. Add:
- name: Set up kubelogin for non-interactive login
uses: azure/use-kubelogin@v1
with:
kubelogin-version: 'v0.0.24'
to the generated workflow in .github/workflows folder right before aks-set-context. You will also need to add the following to AKS set context (under with):
admin: 'false'
use-kubelogin: 'true'
Your pipeline file should look similar to the picture below:
The second bug only occurs in AKS clusters which use RBAC for authorization (which is the case in the cluster are using in this workshop). It occurs because the OIDC issuer which provides an identity to the GitHub deployment runner isnt providing sufficient permission to that identity.
- You will need to head to Azure portal and find the AKS cluster.
- Click on "Access Control (IAM)" on the left blade and then click on the "Role Assignments" tab at the top of the "Access Control (IAM)" window. If you scroll down you will see that the github workflow identity only has "Contributor" access to the cluster. Contributor access doesn't permit the runner to deploy to AKS for RBAC clusters. It will need to be granted "Azure Kubernetes Service RBAC Cluster Admin" access.
- Click on "Add" at the top left side of the screen then click on "Add role assignment".
- For role, search for and select "Azure Kubernetes Service RBAC Cluster Admin" then click "Next".
- Click "+ Select members" while User, group, or service principal radio option is selected. Search for "workflowapp" and pick the one that was provided contributor access to your cluster. You might want to choose all the workflowapp available to be sure for demo purposes.
- Click "Select" then click "Review + assign"
- Click "Review + assign" again and wait for the assignment to be complete
- Head back to your GH repository and rerun the GitHub Action which you can find in the "Actions" tab in GitHub. Select the failed run then click on "Re-run jobs" on the top right corner of the screen then click on "Re-run all jobs"
- Watch it pass this time. After the deployment has completed, head back to your browser and try running a calculation again by entering a number and hitting submit. You will that the result is now calculated correctly on the live site
As you have seen, AKS has lots of features that help with developer productivity by automating steps for you. These can help you get to production very quickly. There are many tools that were not covered in this workshop including the following:
- Open service mesh extension for AKS for workload security by helping limiting pod traffic allowing pods only talk to other pods they need to reducing attach surface
- Dapr extension for AKS, a portable event-driven runtime that simplifies building resilient, stateless, and stateful applications by abstracting away code required connect microservices to datastores, amongst other things
- Flux GitOps extension which makes it easy to deploy applications to your cluster using GitOps
- We talked about autoscaling your app using horizontal pod autoscaler. With KEDA addon for AKS, you can auto scale using other metrics and events
What AKS features for developer productivity would you like me to showcase next? Let me know by creating issues or commenting on existing ones.