Managing the cluster — Kubernetes in ACS
In my previous post, “Containerizing a .Net core application using Docker, ACS and kubernetes”, we saw how to create a Kubernetes cluster using Azure Container Service (ACS) and run our applications inside Docker containers within the cluster.
In this post we will look at a few troubleshooting steps we can follow to fix the cluster when an error is related to its Azure resources.
A Kubernetes cluster keeps its Azure configuration in an azure.json file on every node. So if we want to change something like the subscription under which the ACS resources were created, this is the file we need to modify.
To access this file we need to SSH into the master node, where the file is located at /etc/kubernetes/azure.json.
Steps to find the file:
1. SSH into the master node using any SSH client.
2. Switch to the root user: sudo su
3. cd /etc/kubernetes
4. cat azure.json
The file contains the following details of our azure cluster:
tenantId, subscriptionId, aadClientId, aadClientSecret, resourceGroup, location, subnetName, securityGroupName, vnetName, routeTableName, primaryAvailabilitySetName.
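As a quick sanity check, the list above can be turned into a small script that reports which of these properties are missing or empty in a copy of the file. This is only a sketch: the function names are my own, it checks only the keys named in this post, and the real file may contain other keys as well.

```python
import json

# Keys listed in this post; the real file may contain others as well.
EXPECTED_KEYS = [
    "tenantId", "subscriptionId", "aadClientId", "aadClientSecret",
    "resourceGroup", "location", "subnetName", "securityGroupName",
    "vnetName", "routeTableName", "primaryAvailabilitySetName",
]


def missing_keys(config):
    """Return the expected keys that are absent or empty in the parsed config."""
    return [key for key in EXPECTED_KEYS if not config.get(key)]


def check_file(path="/etc/kubernetes/azure.json"):
    """Load azure.json from disk and return its missing or empty keys."""
    with open(path) as handle:
        return missing_keys(json.load(handle))
```

Running check_file() as root on a node should return an empty list when every property listed above has a value.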
Most of these properties are self-explanatory, but I will quickly go through them so that we get a better picture of what is going on under the hood between Kubernetes and the Azure resources.
The values for resourceGroup, location, subnetName, securityGroupName, vnetName, routeTableName and primaryAvailabilitySetName come from the resource group that was used to create the resources needed to set up the cluster.
For tenantId and subscriptionId, the easiest way to get the values is to log in to the Azure portal, click the help icon and then the Show diagnostics link. Your browser will open a JSON file that lists all the subscriptions you own and the corresponding tenantIds for all the directories you own or are a member of.
The aadClientId and aadClientSecret can be found in the Azure Active Directory application or the Active Directory service principal that was created along with the other resources.
In Azure Container Service, a Kubernetes cluster requires an Azure Active Directory service principal to interact with Azure APIs. The service principal is needed to dynamically manage resources such as user-defined routes and the Layer 4 Azure Load Balancer.
For more detailed information about Azure Active Directory Service Principal please visit here.
Let's take a look at the Azure Active Directory service principal which we created for our cluster.
In our case we created an Active Directory application named ACS_Kubernetes, so the aadClientId is the ApplicationId of that application, and for the aadClientSecret we will need to create a key for the application and copy the key value.
Note: Once the key is generated, we need to copy its value immediately; it cannot be recovered afterwards.
The last thing to check is the permission of the app in the subscription. Navigate to Subscriptions and select the subscription which you used to create the cluster. There, assign the app a role of Contributor or higher.
In our case I have assigned the Owner role just for the sake of debugging; the Contributor role is sufficient for the Active Directory application to communicate with the subscription resources.
So if we ever need to migrate our Kubernetes cluster and the resources it uses to another subscription, or if we have accidentally deleted something from the portal, we can recreate those items and update their values in the azure.json file to fix the cluster.
Note: after changing the file we need to restart the node for the changes to take effect and for the Kubernetes cluster to communicate with the Azure resources.
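Editing the file by hand works, but the change is easy to script. Here is a minimal sketch, assuming the file is plain JSON as shown by cat; the function name and the .bak backup convention are my own choices, not something Kubernetes or ACS prescribes.

```python
import json
import shutil


def update_azure_config(path, changes):
    """Back up the file, apply the changes, and write it back."""
    shutil.copy2(path, path + ".bak")  # keep a copy of the original file
    with open(path) as handle:
        config = json.load(handle)
    config.update(changes)  # overwrite only the keys passed in
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)
        handle.write("\n")
    return config
```

For example, update_azure_config("/etc/kubernetes/azure.json", {"subscriptionId": "<new subscription id>"}) run as root would replace only the subscriptionId and leave the other values untouched. The node still has to be restarted afterwards.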
For the Kubernetes agent nodes we have to repeat the same process: access the azure.json file on each node, at the same location as on the master node, update it and restart the node.
This is not a good solution for a cluster with multiple agent nodes, which is a very common case, but as far as I know Kubernetes does not currently provide an option to update the azure.json file on multiple nodes with a single command.
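Until something like that exists, one workaround is to generate the per-node commands with a small script. The sketch below only builds the scp and ssh command strings so that they can be reviewed before anything is run. The node names, the azureuser login and the /tmp staging path are assumptions for illustration; adjust them to your own cluster.

```python
import shlex

REMOTE_PATH = "/etc/kubernetes/azure.json"


def rollout_commands(nodes, local_file="azure.json", user="azureuser"):
    """Return the shell commands that copy the file to each node and reboot it."""
    commands = []
    for node in nodes:
        target = f"{user}@{node}"
        # Stage the file in /tmp first, because the final location needs root.
        commands.append(f"scp {shlex.quote(local_file)} {target}:/tmp/azure.json")
        commands.append(f"ssh {target} sudo cp /tmp/azure.json {REMOTE_PATH}")
        commands.append(f"ssh {target} sudo reboot")
    return commands
```

Calling rollout_commands(["agent-0", "agent-1"]) returns three commands per node, which can be printed and pasted into a shell one node at a time.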
Misconfigured Service Principal:
If your service principal is misconfigured, none of the Kubernetes components will come up healthy, and you will see an error message like this:
“failed to get external ID from cloud provider”
Misconfigured Azure resources:
If your Azure resources, the subscriptionId or the tenantId are not specified properly, the Kubernetes cluster will not be able to find the resources in your Azure account, which leads to the following error messages:
“failed to get instances from cloud provider”
“failed to get instance ID from cloud provider”
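The two cases above can be captured in a small lookup that points from a log line to the part of azure.json to check first. This is just a sketch built from the messages quoted in this post; the hints are my own summary and the list is not exhaustive.

```python
# Error messages quoted in this post, mapped to the area to check first.
KNOWN_ERRORS = {
    "failed to get external ID from cloud provider":
        "service principal (aadClientId, aadClientSecret, role assignment)",
    "failed to get instances from cloud provider":
        "Azure resources, subscriptionId or tenantId",
    "failed to get instance ID from cloud provider":
        "Azure resources, subscriptionId or tenantId",
}


def diagnose(log_line):
    """Return a hint for a log line, or None if no known message is found."""
    lowered = log_line.lower()
    for message, hint in KNOWN_ERRORS.items():
        if message.lower() in lowered:
            return hint
    return None
```

Feeding the node's log lines through diagnose() gives a first hint about whether to look at the service principal or at the resource and subscription values.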
I hope this post helps you quickly debug your cluster when you run into such errors. If you find it useful, do share it with friends or colleagues who are planning to work with Kubernetes.
Till then happy troubleshooting :)