How to find changes made in a GitHub organization within the last few days

Using Python and the GitHub APIs to reduce the time it takes to consolidate and review changes

Suresh Devalapalli
Gaussian Solutions
9 min read · Jun 13, 2023



I lead a group of very active developer engineers who commit code, raise PRs, and merge almost daily. I do not often have the time to go through each of the changes. However, from time to time, I monitor the progress, do code reviews, and comment on the changes I need the engineers to make. One of the challenges I often face is tracking down the commits and PRs across multiple repositories (we have over 15). I usually dedicate a day in the week to go through the changes and spot-check them. I needed a way to consolidate all the changes in one place where I could review them. The GitHub UI doesn't provide any such facility, and I have to go through each repository separately to check the commits, which is a painful process. I needed a way to fetch the changes on demand, and quickly.

The GitHub APIs provided the solution. This article is about how I used them with Python to get:

  • A quick summary of all the pull requests from the last week, or more generically, from the last ‘x’ days. The information I want to see includes who the author of the PR is, what state it is in, and the URL I can follow to see the full details of the PR.
  • A quick summary of all the commits made in the last ‘x’ days. The details I am interested in are the author, a link to the commit, and the commit message.

Figuring out how to consolidate the PRs

GitHub APIs are documented at GitHub REST API documentation — GitHub Docs

What we are interested in is Pulls — GitHub Docs (‘REST API to interact with pull requests’), more specifically the ‘list pull requests’ API specified at List Pulls — GitHub Docs:

https://api.github.com/repos/OWNER/REPO/pulls

The above API takes two main inputs:

  1. OWNER: the owner of the repos we are interested in querying. In my case, OWNER is my organization.
  2. REPO: the repository to fetch the pull requests for.

As you can see, GitHub doesn't provide a way to fetch all the PRs under an organization; rather, it only gives the ability to pull them from one repository at a time.
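For instance, here is a minimal sketch of calling this endpoint for a single public repository with Python's requests library (OWNER and REPO are placeholders; authorization, which is needed for private repositories, is covered below):

import requests

# List the open pull requests of one repository (placeholders: OWNER, REPO)
url = "https://api.github.com/repos/OWNER/REPO/pulls"
headers = {"Accept": "application/vnd.github+json"}
response = requests.get(url, headers=headers)
for pr in response.json():
    print(pr["html_url"], pr["title"], pr["state"])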

Uh oh! How do we then get the PRs for all the repositories? Well, is there an API to list all the repositories within an organization? If so, we can iterate over the repositories and fetch the PRs for each one.

How to fetch repositories under an organization?

Sure enough, there is an API for that: Repositories — GitHub Docs

https://api.github.com/orgs/ORG/repos

ORG is the organization name we are interested in pulling the information for. The curl command to fetch the repositories under the ORG is as follows:

curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer <YOUR-TOKEN>"\
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/orgs/ORG/repos

Wait a minute, what is <YOUR-TOKEN> in the above call?

Authorization requirements for using the APIs

To use the APIs effectively, the GitHub API endpoints require an access token. As a good security measure, it is always helpful to create access tokens with just enough permissions to carry out what we want to achieve. In this case, we want to read the repositories, their PRs, and their commits. GitHub allows the creation of ‘Personal Access Tokens’ with fine-grained permissions. The procedure is:

  1. Go to your GitHub profile. Profile → Settings → Developer Settings → Personal Access Tokens → Generate new token.
  2. Give the token a name and a description you can use to remember why you created it, and choose the access level. Since I have some private repositories, I chose ‘All repositories’ as the repository access level.
  3. Choose the permissions to grant. In my case, I need to read the ‘commit statuses’ and the ‘pull requests’, so I chose those two to create a personal access token. Keep this token in a secure place: you don't want to share it with anyone, and you will need it to access the endpoints.

Note: it seems that for public repositories, one doesn't need a token to access the repository, its PRs, or its commits.
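By the way, rather than hard-coding the token, you can keep it out of your source entirely. A small sketch, assuming you export the token in an environment variable (GITHUB_TOKEN is just a name I picked for illustration):

import os

# Read the PAT from the environment instead of embedding it in code.
# GITHUB_TOKEN is an arbitrary variable name chosen for this example.
access_token = os.environ["GITHUB_TOKEN"]
headers = {
    "Authorization": f"Bearer {access_token}",
    "X-GitHub-Api-Version": "2022-11-28",
}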

Just a quick check on what we did so far:

  • We got a PAT (Personal Access Token)
  • Figured out the APIs to fetch the repositories under an organization, and the pull requests under a repository.

With this information, we are ready to move on to writing the code to get the PRs under the organization. The logic is pretty simple:

  1. Fetch the repositories under an organization.
  2. For each repository, fetch its PRs.
  3. Filter the ones that were created or updated in the last ‘x’ days.
  4. Format the output to show the URL, the title, the author, and the state the PR is in (open or closed).

I chose Python as it is an elegant language and something I enjoy working with. Here is the Python code:

def get_pr_changes(num_past_days):

    # Construct the API URL to get all repositories
    api_url = f"https://api.github.com/orgs/{organization}/repos"

    # Set the headers with your access token
    headers = {
        'Authorization': f'Bearer {access_token}',
        # 'Accept': 'application/vnd.github+json',
        'X-GitHub-Api-Version': '2022-11-28'
    }

    # Send a GET request to the API to get all repositories
    response = requests.get(api_url, headers=headers)
    repositories = response.json()

    # Get the current date and the date num_past_days ago
    current_date = datetime.datetime.now().date()
    num_days_ago = current_date - datetime.timedelta(days=num_past_days)
    change_log = []

    # Iterate over the repositories and retrieve the pull requests
    # made within the last num_past_days days
    for repo in repositories:
        repo_name = repo['name']
        # Construct the API URL to get the repository's pull requests
        changes_ep = "pulls"
        changes_url = f"https://api.github.com/repos/{organization}/{repo_name}/{changes_ep}"

        # Send a GET request to the API to get the repository's pull requests
        changes_response = requests.get(changes_url, headers=headers)
        if changes_response.ok:
            pull_requests = changes_response.json()
        else:
            print(f"Issue getting info from {repo_name}, response status = {changes_response.status_code}")
            continue

        for pr in pull_requests:
            pr_created_at = datetime.datetime.strptime(pr['created_at'], "%Y-%m-%dT%H:%M:%SZ").date()
            pr_updated_at = datetime.datetime.strptime(pr['updated_at'], "%Y-%m-%dT%H:%M:%SZ").date()
            pr_closed_at = datetime.datetime.strptime(pr['closed_at'], "%Y-%m-%dT%H:%M:%SZ").date() if pr['closed_at'] else None

            # Keep PRs created, updated, or closed within the window
            if pr_created_at >= num_days_ago or pr_updated_at >= num_days_ago or (pr_closed_at and pr_closed_at >= num_days_ago):
                pr_url = pr['html_url']
                pr_title = pr['title']
                pr_author = pr['user']['login']
                pr_state = pr['state']
                print(f'url = {pr_url}, title={pr_title}, author={pr_author}, state={pr_state}')
                change_log.append((pr_url, pr_title, pr_author, pr_state))

    return change_log

I initialized organization and access_token as global variables, though one can choose to pass them as arguments to the function. I went with globals because we are going to write another function to retrieve commits as well.

We will also talk about the Python modules we will need to run the above code in a bit.

Issue with the above code

When I ran the above code, I observed that it was NOT retrieving all the PRs created in the last ‘x’ days. Many repositories showed zero PRs even though I knew there were PRs that had been raised and merged. Further debugging (read: reading the docs carefully) revealed that the endpoint by default only returns PRs in the open state. To fetch all the PRs, including the ones that were closed, we need to add a parameter to the request:

changes_url = f"https://api.github.com/repos/{organization}/{repo_name}/{changes_ep}?state=all"

With the above change, I was able to fetch all the PRs including the ones that were closed.

The full code then is as follows:

def get_pr_changes(num_past_days):

    # Construct the API URL to get all repositories
    api_url = f"https://api.github.com/orgs/{organization}/repos"

    # Set the headers with your access token
    headers = {
        'Authorization': f'Bearer {access_token}',
        # 'Accept': 'application/vnd.github+json',
        'X-GitHub-Api-Version': '2022-11-28'
    }

    # Send a GET request to the API to get all repositories
    response = requests.get(api_url, headers=headers)
    repositories = response.json()

    # Get the current date and the date num_past_days ago
    current_date = datetime.datetime.now().date()
    num_days_ago = current_date - datetime.timedelta(days=num_past_days)
    change_log = []

    # Iterate over the repositories and retrieve the pull requests
    # made within the last num_past_days days
    for repo in repositories:
        repo_name = repo['name']
        # print("Repo being investigated: ", repo_name)
        # Construct the API URL to get the repository's pull requests,
        # including closed ones (state=all)
        changes_ep = "pulls"
        changes_url = f"https://api.github.com/repos/{organization}/{repo_name}/{changes_ep}?state=all"

        # Send a GET request to the API to get the repository's pull requests
        changes_response = requests.get(changes_url, headers=headers)
        if changes_response.ok:
            pull_requests = changes_response.json()
        else:
            print(f"Issue getting info from {repo_name}, response status = {changes_response.status_code}")
            continue

        for pr in pull_requests:
            pr_created_at = datetime.datetime.strptime(pr['created_at'], "%Y-%m-%dT%H:%M:%SZ").date()
            pr_updated_at = datetime.datetime.strptime(pr['updated_at'], "%Y-%m-%dT%H:%M:%SZ").date()
            pr_closed_at = datetime.datetime.strptime(pr['closed_at'], "%Y-%m-%dT%H:%M:%SZ").date() if pr['closed_at'] else None

            # Keep PRs created, updated, or closed within the window
            if pr_created_at >= num_days_ago or pr_updated_at >= num_days_ago or (pr_closed_at and pr_closed_at >= num_days_ago):
                pr_url = pr['html_url']
                pr_title = pr['title']
                pr_author = pr['user']['login']
                pr_state = pr['state']
                print(f'url = {pr_url}, title={pr_title}, author={pr_author}, state={pr_state}')
                change_log.append((pr_url, pr_title, pr_author, pr_state))

    return change_log
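A caveat worth knowing: these endpoints are paginated, and GitHub returns only 30 items per page by default. If your organization has more repositories (or a repository has more PRs) than that, the code above only sees the first page. Here is a sketch of collecting every page by following the Link headers, which requests exposes as response.links:

import requests

def get_all_pages(url, headers):
    # Follow GitHub's Link headers to collect every page of results.
    results = []
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        results.extend(response.json())
        # The 'next' link is absent on the last page, ending the loop.
        url = response.links.get("next", {}).get("url")
    return results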

Figuring out how to consolidate the commits done in the last ‘x’ days

Let’s figure out how to consolidate the commits now.

Reading the docs carefully this time (now that we have experience with the missing state parameter for the PRs), Commits — GitHub Docs reveals that by default the API only returns the commits on the default branch, which is usually main. The API provides a parameter called sha to indicate the branch we are interested in pulling the commits for.

Wait a minute, how do we then pull the commits for all the branches? Well, we have to iterate over the branches in a repository. The API to get the branches of a repository is Branches — GitHub Docs:

curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer <YOUR-TOKEN>"\
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/OWNER/REPO/branches

The logic to fetch the commits:

  1. Get list of repositories under the organization
  2. For each repository, get the full list of the branches
  3. For each of the branches, fetch the commits
  4. Filter the commits that were done in the last ‘x’ days

Interestingly, the API for commits gives the ability to filter commits within a date range using the since and until parameters, so that's what we will use for step 4 instead of writing our own date-filter logic. (I am not sure why GitHub didn't implement this for PRs.)
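For instance, a minimal sketch of building the query string with a date filter (OWNER, REPO, and BRANCH are placeholders; until works the same way if you want an upper bound):

import datetime

# ISO 8601 timestamp for 7 days ago, used as the 'since' parameter
since = (datetime.datetime.now() - datetime.timedelta(days=7)).strftime("%Y-%m-%dT%H:%M:%SZ")
changes_url = f"https://api.github.com/repos/OWNER/REPO/commits?sha=BRANCH&since={since}"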

The full code for fetching the commits is:

def get_commit_changes(num_past_days):

    # Construct the API URL to get all repositories
    api_url = f"https://api.github.com/orgs/{organization}/repos"

    # Set the headers with your access token
    headers = {
        'Authorization': f'Bearer {access_token}',
        # 'Accept': 'application/vnd.github+json',
        'X-GitHub-Api-Version': '2022-11-28'
    }

    # Send a GET request to the API to get all repositories
    response = requests.get(api_url, headers=headers)
    repositories = response.json()

    # ISO 8601 timestamp for the 'since' query parameter
    current_date = datetime.datetime.now()
    num_days_ago = current_date - datetime.timedelta(days=num_past_days)
    num_days_ago_iso = num_days_ago.strftime("%Y-%m-%dT%H:%M:%SZ")
    change_log = []

    # Iterate over the repositories and retrieve the commits
    # made within the last num_past_days days
    for repo in repositories:
        repo_name = repo['name']
        # print("Repo being investigated: ", repo_name)
        changes_ep = "commits"

        # Fetch the branches, then the commits on each branch
        branches_url = f"https://api.github.com/repos/{organization}/{repo_name}/branches"
        branches = requests.get(branches_url, headers=headers).json()
        for branch in branches:
            # Send a GET request to the API to get the branch's commits
            branch_name = branch['name']
            changes_url = f"https://api.github.com/repos/{organization}/{repo_name}/{changes_ep}?since={num_days_ago_iso}&sha={branch_name}"
            changes_response = requests.get(changes_url, headers=headers)
            if changes_response.ok:
                commits = changes_response.json()
            else:
                print(f"Issue getting info from {repo_name}, response status = {changes_response.status_code}")
                continue

            for commit in commits:
                # print(commit)
                commit_url = commit['commit']['url']
                commit_title = commit['commit']['message']
                commit_author = commit['commit']['author']['name']
                print(f'url = {commit_url}, title={commit_title}, author={commit_author}')
                change_log.append((commit_url, commit_title, commit_author))

    return change_log
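Note that commit['commit']['url'] points at the API resource for the underlying git commit rather than a page you can open in the browser. If you would rather record the web link, the top-level html_url field of each list item holds it:

commit_url = commit['html_url']  # web URL instead of the API URL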

I added code to take as arguments which changes I am interested in and how many past days I want to see, and to optionally write the output to a CSV file for further analysis. The code for that, along with the imports, is as follows:

import csv
import argparse
import requests
import datetime

# Global configuration: the organization to query and a personal access
# token (see above). Replace the placeholders with your own values.
organization = "<YOUR-ORG>"
access_token = "<YOUR-TOKEN>"

# The kinds of changes this script knows how to fetch
github_changes = {"prs": "pull requests", "commits": "commits"}

def get_n_save_changes(args):

    if args.changes == "prs":
        changes = get_pr_changes(args.num_past_days)
    elif args.changes == "commits":
        changes = get_commit_changes(args.num_past_days)
    else:
        print("Unsupported type of changes specified")
        exit(1)

    if args.save_in:
        with open(args.save_in, "w") as f:
            writer = csv.writer(f)
            writer.writerows(changes)


if __name__ == "__main__":
    # Create the argument parser
    parser = argparse.ArgumentParser(description='Arg parser for getting recent changes from GitHub.')

    # Argument for looking at the last 'x' days
    parser.add_argument("-d", "--num_past_days", type=int, help='Number of days to look back into')
    # Type of changes we want to get
    parser.add_argument("-c", "--changes", choices=github_changes.keys(), help=f'Type of changes to fetch: {list(github_changes.keys())}')

    # Save info in a csv file
    parser.add_argument("-s", "--save_in", type=str, help="csv file to store the output in")

    # Parse the arguments, and bail out with usage info if any are missing
    args = parser.parse_args()
    if args.num_past_days is None or args.changes is None:
        print("Invalid arguments provided... See the usage below\n\n")
        parser.print_help()
        exit(1)

    get_n_save_changes(args)
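Assuming the script is saved as github_changes.py (a filename I picked for illustration), fetching the PRs from the last 7 days and saving them to a CSV looks like this:

python github_changes.py -d 7 -c prs -s pr_changes.csv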

You can get the full code here:

https://gist.github.com/sureshgaussian/c78159e2b8a7027423827340cb828cbe

🙌 Enjoyed this article? If you found it helpful, insightful, or inspiring, consider supporting me on BuyMeACoffee! Your support goes a long way in fueling my passion for creating valuable content and empowering more readers like you.
