Automate GitHub Enterprise operational tasks where API is the key and Python is the glue

Bikash Karmakar
Published in GSK Tech
Nov 21, 2019

Background

A year ago, we started with self-hosted GitHub Enterprise on Azure cloud at GSK. Within a few months, the number of active users grew from 50 to 900+, with more than 750 repositories and 100+ teams: a clear use case for it.

To increase collaboration, visibility, and self-service, we decided to have all users in a single organization, with top-level teams for each line of business. The organization and each team have a few users representing the line of business who hold admin or elevated privileges to enable self-service.

However, this requires regular operational work for new users and leavers:

New Users - Add any new user to the organization and the appropriate top-level team in GitHub, and provide them with useful and mandatory information about system usage as per GSK policies.

Leavers - There is no mechanism in place to notify us when someone leaves the company, so when a user leaves, we have to manually suspend them in GitHub to free up the license. Since we only have a limited number of licenses, it also becomes necessary to free up licenses allocated to dormant users (not active in the last 90 days) who no longer require access. We wanted to notify dormant users over email prior to suspension and manage exception requests.

With the number of users increasing, these regular and repetitive onboarding and offboarding tasks were distracting us from our usual engineering activities. So, we attempted to automate the above operational tasks as much as possible to save effort and provide a better service to our end users.

Here’s how:

We used GitHub Enterprise APIs to automate many of our manual operational tasks, and Python 3 to implement our playbooks for New Users and Leavers. A playbook is meant to automate (fully or partially) or document a set of repeated, manual tasks. We chose Python because of our familiarity with the language from previous projects and because its packages for JSON, LDAP, SMTP, etc. make our lives easy. It is also an excellent glue language for integrating with various enterprise systems.

New users’ playbook

In the new users’ playbook implementation, the code was expected to perform the tasks below:

  1. Find new users in GitHub from the last 4 (configurable) days by using the search-users API, and iterate over each user as below.
  2. Check if the user is already part of the GitHub organization using the members-list API (one way to implement this check is sketched right after this list).
  3. If the user is not part of the GitHub organization, find the user’s department by querying Active Directory (a sketch appears after the Search API code below).
  4. Add the user to the GitHub organization and team using the add-or-update-team-membership API.
  5. Send an email to the new user with details of the FAQ and policies (a sketch appears after the team-membership code).
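
One way to implement the membership check in step 2 is the organization-membership endpoint, which returns 204 when the user is already a member and 404 when they are not. Below is a minimal sketch, not our production code; it reuses the getGHEToken() helper from the snippets further down, and the function name, organization name and server are placeholders.

import requests

def isOrgMember(userId):
    # 204 = already a member, 404 = not a member (the requester must be an org member)
    memberAPIUri = "https://<gitserver>/api/v3/orgs/<org-name>/members/" + userId
    headers = {'Authorization': 'token %s' % getGHEToken()}
    response = requests.get(memberAPIUri, headers=headers)
    return response.status_code == 204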

A sample Python code to call the Search API is below, wrapped in a function so that the return statements are valid. Note that the API expects a personal access token with appropriate scopes or permissions in the Authorization header, sent over HTTPS.

import json
import requests

# Wrapper name is illustrative; the body is the original snippet
def findNewUsers(createdDateSince):
    try:
        usersAPIUri = "https://<gitserver>/api/v3/search/users"
        params = {
            "q": "created:>=" + str(createdDateSince),
            "sort": "joined"
        }
        headers = {
            'Authorization': 'token %s' % getGHEToken(),
            'Content-Type': 'application/json'
        }
        response = requests.get(usersAPIUri, params=params, headers=headers)
        print('API response code ' + str(response.status_code))
        if 200 <= response.status_code <= 205:
            json_data = json.loads(response.text)
            return json_data
        else:
            exit(1)
    except requests.exceptions.HTTPError as errh:
        print("Http Error:", errh)
        exit(1)
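
Step 3 looks up the user’s department so we can map them to the right top-level team. That code is not shown here, but a minimal sketch using the ldap3 package could look like the following; the server address, bind credentials, search base and function name are placeholders, not our actual configuration.

from ldap3 import Server, Connection, SUBTREE

def getUserDepartment(userId):
    # Bind to Active Directory over LDAPS (placeholder server and credentials)
    server = Server("ldaps://<ad-server>")
    conn = Connection(server, user="<bind-dn>", password="<bind-password>", auto_bind=True)
    # Fetch the department attribute for the given account name
    conn.search(search_base="<base-dn>",
                search_filter="(sAMAccountName=%s)" % userId,
                search_scope=SUBTREE,
                attributes=["department"])
    if conn.entries:
        return conn.entries[0].department.value
    return None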

The add-or-update-team-membership API was available only in preview mode at the time, and hence we needed to provide the custom media type “application/vnd.github.hellcat-preview+json” in the Accept header.

# Wrapper name is illustrative; <github-team-id> and <userId> are placeholders as in the original
def addUserToTeam():
    try:
        teamAPIUri = "https://<gitserver>/api/v3/teams/<github-team-id>/memberships/<userId>"
        headers = {
            'Authorization': 'token %s' % getGHEToken(),
            'Content-Type': 'application/json',
            'Accept': 'application/vnd.github.hellcat-preview+json'
        }
        response = requests.put(teamAPIUri, headers=headers)
        print('API response code ' + str(response.status_code))

        if response.status_code != 200:
            return False
        else:
            return True
    except requests.exceptions.HTTPError as errh:
        print("Http Error:", errh)
        exit(1)
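
Step 5 is plain SMTP. A minimal sketch using the Python standard library is below; the relay host, sender address, subject and message body are placeholders rather than what we actually send.

import smtplib
from email.message import EmailMessage

def sendWelcomeEmail(recipientAddress):
    # Compose the welcome mail pointing at the FAQ and policies (placeholder content)
    msg = EmailMessage()
    msg["Subject"] = "Welcome to GitHub Enterprise"
    msg["From"] = "<noreply-address>"
    msg["To"] = recipientAddress
    msg.set_content("Welcome! Please read the FAQ and usage policies: <faq-url>")
    # Relay through the internal SMTP server (placeholder host)
    with smtplib.SMTP("<smtp-relay>") as smtp:
        smtp.send_message(msg)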

Now, all we needed was somewhere to execute the above new joiners’ playbook continuously. Since we are part of the Cloud Hosting team, we decided to leverage Azure Cloud as much as possible and attempted to run the code as a Docker container in our Azure Kubernetes Service.

We decided to use the Python 3 Alpine Docker image as the base image because of its tiny size. Our Dockerfile runs crond as a foreground process, and our Python code for new joiners is invoked by the startjob script every 15 minutes.

FROM python:3.7-alpine
COPY requirements.txt /
RUN pip install -r /requirements.txt
COPY cronscript.sh /etc/periodic/15min/startjob
RUN chmod a+x /etc/periodic/15min/startjob
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
COPY src/ /app
CMD [ "crond", "-l", "2", "-f" ]

One thing that we learned the hard way was not to put file extensions in the script names: startjob will run, but startjob.sh will not, as per the FAQ on Alpine Linux.

The Docker image was built and published to Azure Container Registry (ACR). The deployment to Azure Kubernetes Service (AKS) was done using the YAML manifest below, with a Kubernetes secret holding the ACR credentials. ACR is a private container registry with out-of-the-box integration with AKS, which made it easy for us to deploy the container directly. It is also good practice to specify limits and requests for your Kubernetes pod.

---
kind: Pod
apiVersion: v1
metadata:
  name: cloudlink-app-pod
  namespace: cloudops-devtest
spec:
  containers:
    - name: cloudlink-app-container
      image: <ACR-url>/<container-name>:<tag>
      resources:
        limits:
          cpu: 100m
          memory: 512Mi
        requests:
          cpu: 100m
          memory: 256Mi
  imagePullSecrets:
    - name: <kubernetes-secret-ACR>

Monitoring and Logging

For monitoring and alerting of the application running in AKS, we decided to go with basic logging to stdout, which was published to a Log Analytics workspace. A log query, combined with alert rules, notifies us of errors such as API authentication failures or Active Directory bind errors.

let ContainerIdList = KubePodInventory
| where TimeGenerated > now() - 40m
| where ContainerName =~ 'xxx-xxx-xxx/<container-name>'
| where ClusterId =~ '/subscriptions/xxx-xxx-xxx/resourceGroups/<rgname>/providers/Microsoft.ContainerService/managedClusters/<cluster-name>'
| distinct ContainerID;
ContainerLog
Alerting with Azure Action Groups — looking for certain keywords in the logs
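
The query above is cut off after the let statement, so the ContainerLog half is not shown. As an illustration only (not our original query), the standard pattern is to filter the log lines by those container IDs and the keywords of interest:

ContainerLog
| where ContainerID in (ContainerIdList)
| where LogEntry contains "Error"
| project TimeGenerated, LogEntry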

Leavers’ playbook

In the leavers’ playbook implementation, the code was expected to perform the tasks below:

  1. Extract the all-users report and iterate over all users to find those who are missing in Active Directory, then suspend them using the suspend-a-user API.
  2. Extract the dormant-users report and send those users an email. After giving notice to the user, suspend the user using the suspend-a-user API.

The leavers’ tasks are generally performed once a month, and as a minimum viable product (MVP) we decided to extract the reports manually and feed the files to the code for steps 1 and 2 above.
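
The suspend call itself is a single PUT to the suspend-a-user endpoint, which requires a site-admin token. A minimal sketch, again reusing the getGHEToken() helper; the function name and the reason text are placeholders.

import requests

def suspendUser(userId):
    # PUT /users/:username/suspended suspends the account (site admin only); 204 = success
    suspendAPIUri = "https://<gitserver>/api/v3/users/" + userId + "/suspended"
    headers = {'Authorization': 'token %s' % getGHEToken(),
               'Content-Type': 'application/json'}
    payload = {"reason": "Dormant for 90+ days"}  # placeholder reason
    response = requests.put(suspendAPIUri, headers=headers, json=payload)
    return response.status_code == 204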

What’s next?

We’re still early in our automation journey, and there is a lot of scope for improvement in the MVP solution with regards to security, build and release pipelines, user experience, effort reduction, etc. The MVP allows us to test our services and ultimately make them better for our end customers.

Feedback

We’d love your feedback! Our aim is to describe our approach and its implementation as well as possible, including example code that not only demonstrates the implementation but also allows you to take it away and reproduce it. Did we do a good job? We’d love to hear from you in the comments below ❤️
