Monitoring traffic of your Github repositories using Python and Google Cloud Platform — Part 1

Artem Rys
Sep 11 · 4 min read
Photo by Paweł Czerwiński on Unsplash

This article is about monitoring traffic to your open-source GitHub repositories. Unfortunately, GitHub shows these statistics only per repository, so you have to open each one in turn. You may not want to look at them at all, but if you do, this small tool can help.

Technical stack: Python, PyGithub, Google Cloud Functions, and Cloud Firestore.

In terms of cost, this solution is free because it fits within the free quota of Google Cloud Platform. This is not an ad; I just like making use of free offerings.

The main idea is to fetch the top referrers from GitHub for each of your public repositories and store this data in Firestore keyed by date, so that a weekly report can be built from it (in the next part).
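
Since the data is keyed by date, the format of the document ID matters once you start listing or sorting documents. Here is a minimal sketch comparing the unpadded format used by the function in this article with a zero-padded ISO date. The helper names are hypothetical, and the padded variant is only a possible alternative, not what the function does.

```python
from datetime import date


def unpadded_document_id(day: date) -> str:
    """Mirrors the ID format used in this article: no zero padding."""
    return f"{day.year}-{day.month}-{day.day}"


def padded_document_id(day: date) -> str:
    """Hypothetical alternative: ISO 8601 date with zero padding."""
    return day.isoformat()


days = [date(2019, 9, 11), date(2019, 10, 1)]

# Unpadded IDs compare as plain strings, so October sorts before September.
unpadded_sorted = sorted(unpadded_document_id(d) for d in days)

# Padded IDs sort in calendar order.
padded_sorted = sorted(padded_document_id(d) for d in days)

print(unpadded_sorted)
print(padded_sorted)
```

If you only ever fetch a document by its exact date, either format works. The difference shows up when you rely on the string order of the IDs.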

This part covers only collecting the top referrers from GitHub. Here is all of the code:

import os
from datetime import date, datetime
from typing import Generator, List

from flask import jsonify
from github import Github
from github.Referrer import Referrer
from github.Repository import Repository
from google.cloud import firestore


def _build_github() -> Github:
    """Builds Github client."""
    return Github(os.getenv("GITHUB_TOKEN"))


def _build_firestore() -> firestore.Client:
    """Builds Firestore client."""
    return firestore.Client()


def _get_public_repos(github: Github) -> Generator[Repository, None, None]:
    """Gets public repositories from Github for your user.

    The user is that user which owns GITHUB_TOKEN.
    """
    for repo in github.get_user().get_repos(visibility="public"):
        yield repo


def _get_repo_top_referrers(repo: Repository) -> List[Referrer]:
    """Gets repository top referrers."""
    return repo.get_top_referrers()


def _add_top_referrers_to_firestore(
        firestore_client: firestore.Client,
        repos_top_referrers) -> None:
    """Adds top referrers for each repository to Firestore."""
    today = date.today()
    document = f"{today.year}-{today.month}-{today.day}"
    doc_ref = (
        firestore_client
        .collection("referrers")
        .document(document)
    )
    doc_ref.set(repos_top_referrers)


def parse_github_repos_traffic(request):
    github = _build_github()
    firestore_client = _build_firestore()

    repos_top_referrers = dict()
    repos_top_referrers["created_at"] = datetime.now()
    for repo in _get_public_repos(github):
        repo_top_referrers = _get_repo_top_referrers(repo)
        if repo_top_referrers:
            repos_top_referrers[repo.full_name] = [
                {
                    "referrer": referrer.referrer,
                    "count": referrer.count,
                    "uniques": referrer.uniques,
                }
                for referrer in repo_top_referrers
            ]
    _add_top_referrers_to_firestore(firestore_client, repos_top_referrers)
    return jsonify(repos_top_referrers)

Cloud Function to get top referrers for each of your open-source Github repositories.
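
To make the shape of the stored document easier to picture, here is a small offline sketch of the conversion step from the function above. `FakeReferrer` is a hypothetical stand-in that exposes only the three attributes the function reads, and the repository name and numbers are made-up sample data, so nothing here talks to GitHub or Firestore.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass
class FakeReferrer:
    """Hypothetical stand-in for a referrer object (not the PyGithub class)."""
    referrer: str
    count: int
    uniques: int


def referrers_to_dicts(
        referrers: Iterable[FakeReferrer]) -> List[Dict[str, Union[str, int]]]:
    """Converts referrer objects to plain dicts that can be stored as-is."""
    return [
        {
            "referrer": referrer.referrer,
            "count": referrer.count,
            "uniques": referrer.uniques,
        }
        for referrer in referrers
    ]


# Made-up sample data for a single repository.
sample = [FakeReferrer("Google", 4, 3), FakeReferrer("github.com", 2, 1)]
document = {"octocat/hello-world": referrers_to_dicts(sample)}
print(document)
```

Each repository becomes one key in the document, and its value is a list of plain dictionaries.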

requirements.txt file:

PyGithub==1.43.8
google-cloud-firestore==1.4.0

You need a GitHub personal access token to make requests to the GitHub API: first to list your public repositories, then to get traffic data for each of them. You can create one on the Personal access tokens page in your GitHub settings.

Github personal access token page.

Click Generate new token.

Github new personal access token page. Select only the `public_repo` scope.

Click Generate token.
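
The function reads the token from the `GITHUB_TOKEN` environment variable. If the variable is missing, `os.getenv` simply returns `None` and the problem only shows up later, when the first API call fails. Below is a minimal sketch of failing early instead. `read_github_token` is a hypothetical helper and is not part of the code above.

```python
import os
from typing import Mapping


def read_github_token(env: Mapping[str, str] = os.environ) -> str:
    """Returns the token from the environment or raises a clear error.

    Hypothetical helper: the function in this article calls os.getenv directly.
    """
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set or is empty")
    return token


# Usage with an explicit mapping, so the example does not depend on your shell.
print(read_github_token({"GITHUB_TOKEN": "example-token"}))
```

Passing the environment in as a parameter is just a convenience that makes the helper easy to try out without touching your real variables.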

On the GCP side, you need an existing project with the Cloud Functions API enabled and a Cloud Firestore database created. That is all.

The Cloud Functions documentation describes several ways to deploy a function.

I use the gcloud command-line tool from my local machine.

gcloud config set core/project <your-project-name>
gcloud functions deploy parse_github_repos_traffic --runtime python37 --trigger-http --set-env-vars=GITHUB_TOKEN=<your-github-token>

Newly created Cloud function.
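
Because the function is deployed with an HTTP trigger, it can also be called over HTTPS. Here is a rough sketch using only the standard library. It assumes the usual URL pattern for HTTP-triggered functions (region, then project ID, then function name), the default `us-central1` region, and a function that accepts unauthenticated calls. `my-project` is a placeholder, and both helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def function_url(project_id: str, function_name: str,
                 region: str = "us-central1") -> str:
    """Builds the HTTPS URL of an HTTP-triggered function (assumed pattern)."""
    return f"https://{region}-{project_id}.cloudfunctions.net/{function_name}"


def call_function(url: str, timeout: float = 60.0) -> Dict[str, Any]:
    """Sends a GET request and decodes the JSON body of the response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder project ID; replace it with your own before calling.
url = function_url("my-project", "parse_github_repos_traffic")
print(url)
# result = call_function(url)  # uncomment once the function is deployed
```

If your function requires authentication, the plain request above will be rejected and you will need to send an identity token with it.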

To test the function, click on it, then go to the Testing tab.

A testing tab of your Cloud Function.

Then press Test the function. You should see something like this.

The testing tab of your Cloud Function after pressing the Test the function button.

The data is also available in Cloud Firestore (for future analytics).

Cloud Firestore with data.
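
As a small taste of those analytics, here is a sketch that sums the referrer counts per repository for one stored document. It assumes the document has already been loaded as a plain dictionary (for example with `to_dict()` on a Firestore document snapshot) and that it has the same shape the function writes. `total_views_per_repo` is a hypothetical helper, and the sample values are made up.

```python
from typing import Any, Dict


def total_views_per_repo(document: Dict[str, Any]) -> Dict[str, int]:
    """Sums the "count" field of all referrers for each repository."""
    totals = {}
    for key, value in document.items():
        if key == "created_at":
            # Metadata field written by the function, not a repository.
            continue
        totals[key] = sum(entry["count"] for entry in value)
    return totals


# Made-up document in the shape the function stores.
sample_document = {
    "created_at": "2019-09-11T10:00:00",
    "octocat/hello-world": [
        {"referrer": "Google", "count": 4, "uniques": 3},
        {"referrer": "github.com", "count": 2, "uniques": 1},
    ],
}
print(total_views_per_repo(sample_document))
```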

That is all for this part. In the next part, we are going to set up a scheduler to call this function weekly and analyze the collected data to create a report.


Thanks for your attention. Feel free to leave your questions in the comments for discussion.

python4you

Articles about general Python, best practices and interviews.

Written by

Artem Rys

Senior Python Developer @ EPAM Poland
