Tutorial: Installing Diffgram on Azure AKS

Pablo Estrada
Diffgram
May 12, 2021

In this tutorial we will show you how to install Diffgram on an Azure AKS cluster. We will go all the way from creating the cluster and the database to installing the Helm chart.

Let’s get started!

Creating and Setting Up an AKS Cluster

Start by going to the Azure Portal and searching for “Kubernetes Service”, then click the + icon and select “Add Kubernetes Cluster”.

Creating the kubernetes cluster

Select a resource group to attach your cluster to. You can create a new one if you like; in our example we added it to “kuberenetesDiffgramResourceGroup”.

After that, give your cluster a name and decide which node type and node count you want for your cluster. For this tutorial we will set the node count to 3 and use the Standard DS2 v2 node type.

You may change this depending on the usage of the cluster, but this is a good starting point.
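If you prefer the command line, the cluster settings above can be sketched with the Azure CLI. This is a minimal sketch rather than part of the original walkthrough: `create_aks_cluster` is a hypothetical helper name, and the cluster name and region you pass in are your own choice. Only the node count and VM size come from this tutorial.

```shell
# Hypothetical helper: Azure CLI equivalent of the portal cluster settings above.
# Arguments: resource group, cluster name, Azure region (all your choice).
create_aks_cluster() {
  local rg="$1" cluster="$2" location="$3"

  # Create (or reuse) the resource group that will hold the cluster
  az group create --name "$rg" --location "$location"

  # 3 nodes of Standard_DS2_v2, matching the portal settings in this tutorial
  az aks create \
    --resource-group "$rg" \
    --name "$cluster" \
    --node-count 3 \
    --node-vm-size Standard_DS2_v2 \
    --generate-ssh-keys
}
```

For example, `create_aks_cluster kuberenetesDiffgramResourceGroup diffgram-aks eastus`, where `diffgram-aks` and `eastus` are placeholders.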

Setting up node count, node types and cluster details.

Click “Review+Create”, check that all the information is correct, and then click “Create”.

You will see a “Deployment in progress” screen. Wait until the deployment finishes; the cluster will then be available on the Kubernetes Service dashboard in the Azure Portal.

Deployment in Progress Example

Connecting to the Cluster

Once your cluster has been deployed, click the cluster name in the list and then click the “Connect” button in the top menu bar:

AKS dashboard for the created cluster.

You will see a series of commands that you will need to run on your host to be able to connect to the Kubernetes cluster.

We assume you have already installed kubectl, Helm, and the Azure CLI (az).

Connection commands to the Kubernetes Cluster.
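The connection commands shown in the portal boil down to selecting your subscription and downloading the cluster credentials. A minimal sketch, wrapped in a hypothetical `connect_to_aks` helper; the subscription ID, resource group, and cluster name are yours to fill in.

```shell
# Hypothetical helper mirroring the portal's "Connect" commands.
# Arguments: subscription ID, resource group, cluster name.
connect_to_aks() {
  local subscription="$1" rg="$2" cluster="$3"

  # Make the cluster's subscription the active one for subsequent az commands
  az account set --subscription "$subscription"

  # Merge the cluster credentials into ~/.kube/config so kubectl can reach the cluster
  az aks get-credentials --resource-group "$rg" --name "$cluster"
}
```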

You can run a sample command to test that the connection was successful:

# List all deployments in all namespaces
kubectl get deployments --all-namespaces=true

Finally, install the NGINX ingress controller:

kubectl create namespace ingress-basic
# Add the ingress-nginx repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
# Use Helm to deploy an NGINX ingress controller
helm install nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress-basic \
  --set controller.replicaCount=2 \
  --set controller.nodeSelector."kubernetes\.io/os"=linux \
  --set defaultBackend.nodeSelector."kubernetes\.io/os"=linux \
  --set controller.admissionWebhooks.patch.nodeSelector."kubernetes\.io/os"=linux

For more info visit: https://docs.microsoft.com/en-us/azure/aks/ingress-basic
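Before moving on, you may want to confirm that the controller actually rolled out. A minimal sketch, assuming the controller deployment shares the name of the `nginx-ingress-ingress-nginx-controller` service that appears later in the Services and Ingresses view; `wait_for_ingress_controller` is a hypothetical helper.

```shell
# Hypothetical helper: block until the NGINX ingress controller pods are ready.
wait_for_ingress_controller() {
  # Assumed deployment name, derived from release "nginx-ingress" and chart "ingress-nginx"
  kubectl rollout status deployment/nginx-ingress-ingress-nginx-controller \
    --namespace ingress-basic \
    --timeout=300s || return 1

  # Show the controller service; EXTERNAL-IP may read <pending> for a few minutes
  kubectl get service nginx-ingress-ingress-nginx-controller --namespace ingress-basic
}
```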

Now your Kubernetes cluster is ready to go!

Setting Up Managed DB Service (Recommended)

Diffgram uses PostgreSQL as its database by default.

Note: If you already have a working Postgres DB and are OK reusing it, you can skip this step.

We always suggest using a managed database service, such as Azure Database for PostgreSQL, so that all the DB maintenance work is done for you.

To set up the DB, go to the Azure Portal and search for “Azure Database for PostgreSQL servers”. You will see a screen similar to this one:

Azure PostgreSQL Managed Servers

We recommend starting with Single Server; for large use cases, consider Hyperscale.

Click on “new” and select Single Server:

Azure Diffgram Database Selection

Similar to the AKS cluster, give your DB server a name and add it to a resource group. Select an appropriate DB instance size and set an admin username and password for connecting to it.

Then click “Review+Create”.

Setting the params for the PostgreSQL DB server.
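The same server can also be created from the Azure CLI. A minimal sketch with a hypothetical `create_postgres_server` helper: the `az postgres server` command group targets Single Server as chosen above, while the `GP_Gen5_2` SKU (General Purpose, 2 vCores) and PostgreSQL version 11 are example choices, not requirements of this tutorial. Azure has since retired Single Server, so on a current subscription you may need `az postgres flexible-server create`, whose options differ.

```shell
# Hypothetical helper: create an Azure Database for PostgreSQL Single Server.
# Arguments: resource group, server name, region, admin user, admin password.
create_postgres_server() {
  local rg="$1" server="$2" location="$3" admin_user="$4" admin_password="$5"

  # SKU and version below are example values; size them for your workload
  az postgres server create \
    --resource-group "$rg" \
    --name "$server" \
    --location "$location" \
    --admin-user "$admin_user" \
    --admin-password "$admin_password" \
    --sku-name GP_Gen5_2 \
    --version 11
}
```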

Once the server is created, you can find its connection details by clicking your new DB server on the Azure dashboard and opening the “Connection strings” section:

Example of the connection strings screen

You will see a handful of connection strings that you can use to connect to your database.

Key Note: For the Diffgram database settings, we don’t copy and paste the entire connection string. (This is different from the blob storage setting, DIFFGRAM_AZURE_CONNECTION_STRING, where we do paste the whole string.)

Instead, extract the following values from the first connection string:

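  • host (the server name, e.g. yourServerName.postgres.database.azure.com)
  • username
  • password (the string only shows a placeholder, so use the password you set when creating the server)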

Later, you will use these values to configure the Diffgram installation.
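If you would rather pull these values out with a script than by eye, note that the first connection string on that screen is, as far as we know, the ADO.NET-style `Key=Value;` format. A minimal sketch under that assumption; `conn_field` is a hypothetical helper, and the sample string uses placeholder server and user names.

```shell
# Hypothetical helper: print one field of a "Key=Value;Key=Value;" connection string.
# Arguments: the connection string, the key name (e.g. Server or "User Id").
conn_field() {
  local conn="$1" key="$2"
  # Put each Key=Value pair on its own line, then print the value of the matching key
  printf '%s\n' "$conn" | tr ';' '\n' | sed -n "s/^${key}=//p"
}

conn='Server=myserver.postgres.database.azure.com;Database={your_database};Port=5432;User Id=myadmin@myserver;Password={your_password};Ssl Mode=Require;'
conn_field "$conn" Server      # prints myserver.postgres.database.azure.com
conn_field "$conn" "User Id"   # prints myadmin@myserver
```

On Single Server the username includes the `@servername` suffix, exactly as it appears in the string.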

Advanced case note: The connection string options are also needed to access your database directly if you wish to directly inspect or query the information Diffgram is storing. This is entirely optional and for advanced cases.

Allowing DB Access to Azure Resources

Once the DB is created, open it, go to “Connection security”, set “Allow access to Azure services” to “Yes”, and save the change.
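This toggle corresponds to a firewall rule whose start and end address are both `0.0.0.0`, which Azure interprets as allowing access from Azure services. A minimal CLI sketch, assuming Single Server as above; `allow_azure_services` and the rule name `AllowAllAzureIps` are our own hypothetical choices.

```shell
# Hypothetical helper: CLI equivalent of "Allow access to Azure services".
# Arguments: resource group, PostgreSQL server name.
allow_azure_services() {
  local rg="$1" server="$2"

  # A 0.0.0.0 - 0.0.0.0 rule is Azure's special case for "allow Azure services"
  az postgres server firewall-rule create \
    --resource-group "$rg" \
    --server-name "$server" \
    --name AllowAllAzureIps \
    --start-ip-address 0.0.0.0 \
    --end-ip-address 0.0.0.0
}
```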

Installing Diffgram on the Kubernetes Cluster

To set up Diffgram in the Kubernetes cluster, first clone the Helm chart from our Git repo:

git clone https://github.com/diffgram/diffgram-helm.git
cd diffgram-helm

Next, install the Helm chart’s dependencies. This is mainly cert-manager, which manages the TLS certificates:

helm repo add jetstack https://charts.jetstack.io
helm install cert-manager --namespace default jetstack/cert-manager --set installCRDs=true
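cert-manager takes a moment to start. A minimal sketch for waiting on it, assuming the chart's default deployment names for a release called `cert-manager` in the `default` namespace (`cert-manager`, `cert-manager-cainjector`, `cert-manager-webhook`); `wait_for_cert_manager` is a hypothetical helper.

```shell
# Hypothetical helper: wait until all cert-manager deployments are ready.
wait_for_cert_manager() {
  local d
  for d in cert-manager cert-manager-cainjector cert-manager-webhook; do
    # Stop at the first deployment that does not become ready within 5 minutes
    kubectl rollout status "deployment/$d" --namespace default --timeout=300s || return 1
  done
}
```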

Now we need to set some of the values in the values.yaml file.

Open values.yaml in a text editor and search for the following keys:

  • diffgramVersion: Set this to the Diffgram version you want to use. You can leave it as “latest” to use the latest version.
  • diffgramEdition: Can be either ‘opencore’ or ‘enterprise’. For this tutorial, set it to ‘opencore’. If you want the enterprise version, please contact us!
  • diffgramDomain: Set this to a domain you own and whose DNS records you can manage. You will also need to create a DNS record that points this domain to your AKS cluster.
  • useCertManager: If you want TLS on your diffgram instance, you can use the certificate manager to automatically generate certificates for you. Set it to true if you want that.
  • dbProvider: Set this to “azure”.
  • azureSQLEndpoint: This should point to your Postgres server, if you have your DB on azure the URL is similar to this: yourServerName.postgres.database.azure.com
  • dbUser: The postgres DB username.
  • dbName: The postgres DB name. Make sure the DB is empty. If it does not exist, Diffgram will create it.
  • dbPassword: The postgres DB password.
  • DIFFGRAM_STATIC_STORAGE_PROVIDER: The cloud provider used for static files. Either ‘azure’, ‘gcp’ or ‘aws’. For our example we’ll set it to ‘azure’.
  • DIFFGRAM_AZURE_CONNECTION_STRING: The connection string to your blob storage account. This is for the management of the static files.
  • DIFFGRAM_AZURE_CONTAINER_NAME: container name for azure blob storage.
  • ML__DIFFGRAM_AZURE_CONTAINER_NAME: container name for Azure blob storage for ML-related data. You can set it to the same container as DIFFGRAM_AZURE_CONTAINER_NAME.
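Before installing, a quick check that you found every key listed above can save a failed install. A minimal sketch with a hypothetical `check_values_keys` helper; it only assumes each key appears as `key:` at the start of a line (possibly indented) in values.yaml, and it does not validate the values themselves.

```shell
# Hypothetical helper: report any of the keys above that are missing from a values file.
# Returns 1 if at least one key is missing, 0 otherwise.
check_values_keys() {
  local file="$1" key missing=0
  for key in diffgramVersion diffgramEdition diffgramDomain useCertManager \
             dbProvider azureSQLEndpoint dbUser dbName dbPassword \
             DIFFGRAM_STATIC_STORAGE_PROVIDER DIFFGRAM_AZURE_CONNECTION_STRING \
             DIFFGRAM_AZURE_CONTAINER_NAME ML__DIFFGRAM_AZURE_CONTAINER_NAME; do
    # Match "key:" at line start, allowing indentation for nested keys
    if ! grep -Eq "^[[:space:]]*${key}:" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `check_values_keys values.yaml` from the diffgram-helm directory.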

There are other parameters you can tweak for the Diffgram installation, such as the pods’ CPU and memory. To read more about them, visit our GitHub repo. For now, once these parameters have the right values, you can run:

helm install diffgram . -f values.yaml

And Diffgram will install on your Kubernetes cluster!
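A slightly more defensive variant of the install step is sketched below; it is our suggestion rather than the tutorial's exact command. `helm lint` catches chart and values errors before anything touches the cluster, and `helm upgrade --install` behaves like `helm install` on the first run but can be re-run after you edit values.yaml. `install_diffgram` is a hypothetical helper meant to be run from the diffgram-helm directory.

```shell
# Hypothetical helper: lint, then install or upgrade the "diffgram" release.
# Argument: path to the values file (defaults to values.yaml).
install_diffgram() {
  local values="${1:-values.yaml}"

  # Validate the chart in the current directory against your values file
  helm lint . -f "$values" || return 1

  # Install the release, or upgrade it if it already exists
  helm upgrade --install diffgram . -f "$values"

  # List pods in the current namespace so you can watch them start
  kubectl get pods
}
```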

Accessing Diffgram

To access Diffgram, you need the public IP of the NGINX ingress controller running in your cluster.

Go to the Azure dashboard, open Kubernetes services, and click your cluster. Then click “Services and ingresses”. We need to access Diffgram through the NGINX ingress controller installed earlier with the ingress-nginx Helm chart.

Nginx Ingress Controller External IP

Look for the row named “nginx-ingress-ingress-nginx-controller”. Its “External IP” column shows an IP address; visit that address and you will be able to access your Diffgram instance :)
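You can also read the external IP from the command line instead of the dashboard. A minimal sketch, assuming the service name and namespace used earlier in this tutorial; `wait_for_ingress_ip` is a hypothetical helper that polls, because Azure can take a few minutes to assign the load balancer IP.

```shell
# Hypothetical helper: poll until the ingress controller has an external IP, then print it.
# Gives up (returns 1) after 30 attempts, 10 seconds apart.
wait_for_ingress_ip() {
  local ip="" attempt=0
  while [ -z "$ip" ] && [ "$attempt" -lt 30 ]; do
    ip=$(kubectl get service nginx-ingress-ingress-nginx-controller \
      --namespace ingress-basic \
      --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
    attempt=$((attempt + 1))
    [ -n "$ip" ] || sleep 10
  done
  [ -n "$ip" ] && printf '%s\n' "$ip"
}
```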

Domain Name Step (Optional)

If you are using a domain name, point it to the NGINX ingress controller IP address (for example, with an A record) so users can reach your Diffgram instance through your domain.
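If your domain's DNS zone is hosted in Azure DNS (an assumption; with another DNS provider, create the equivalent A record in its console), the record can be added from the CLI. A minimal sketch; `point_domain_to_ingress` is a hypothetical helper, and the resource group, zone, record name, and IP are placeholders.

```shell
# Hypothetical helper: add an A record in an Azure DNS zone pointing at the ingress IP.
# Arguments: DNS resource group, zone name, record name, IPv4 address.
point_domain_to_ingress() {
  local rg="$1" zone="$2" record="$3" ip="$4"

  # Creates the record set if needed and adds the IPv4 address to it
  az network dns record-set a add-record \
    --resource-group "$rg" \
    --zone-name "$zone" \
    --record-set-name "$record" \
    --ipv4-address "$ip"
}
```

For example, `point_domain_to_ingress myDnsResourceGroup example.com diffgram 203.0.113.10` would make `diffgram.example.com` resolve to that IP; the resulting name should match the diffgramDomain value in values.yaml.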

Congrats!

Congratulations! You’ve set up an AKS cluster and a Postgres DB, and installed Diffgram on top of them, for a fully working Diffgram installation inside Kubernetes.

Support

For any help or support, please contact us or create an issue on GitHub.

About Diffgram

Diffgram is open source annotation and training data software. Diffgram is the best tool for data labeling for machine learning. Learn more about Diffgram’s features on our GitHub repo.
