The Easy Way to Integrate GitHub with Sagemaker

Muhammad Umar Amanat
Jul 12, 2023

GitHub with Sagemaker, Source: Author

GitHub is a version control hosting platform that also offers robust CI/CD, letting you automate your build, test, and deployment pipelines. Workflows are triggered by events, such as pull requests or deployments, and can be used to build and test code, deploy it to production, or even send notifications. In simple terms, it tracks your code and smoothly handles the typical development workflow.

When working with Sagemaker notebooks, we can easily save our code on the Notebook instance. But this approach handles neither code versioning nor the development workflow. Besides, if someone accidentally deletes the Notebook instance, we can lose our code in a matter of seconds. This is why the best approach is to integrate Sagemaker with GitHub or another version control system.

In this article, I am going to show an easy way of integrating Sagemaker with GitHub, along with a basic Git workflow.

Integrating GitHub

I am assuming that you have set up Sagemaker in your AWS account and know how to navigate it. To start the GitHub integration, go to the left pane of the Sagemaker UI, expand the “Notebook” dropdown, and select “Git Integration”.

A new input form opens and asks for the Git details. Follow the flow shown in the image below, from point 3 onwards.

Make sure you have created a secret. If not, select “Create secret” as shown in step 6 and fill in the details. Once it is created, select it and click the “Add repository” option.

Git integration from Sagemaker UI, Source: Author
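
For readers who prefer scripting over clicking, the same registration can be sketched with boto3. This is only a minimal sketch under assumptions: the repository name, URL, branch, secret name, and credentials are placeholders that do not come from this article, and AWS credentials and a region are assumed to be configured already. The two small helpers only build the request data, so they can be inspected without touching AWS.

```python
import json


def build_secret_string(username: str, token: str) -> str:
    """Return the JSON body used for a Git credentials secret."""
    # The secret is stored as {"username": ..., "password": ...};
    # for GitHub, the "password" should be a personal access token.
    return json.dumps({"username": username, "password": token})


def build_repository_request(name: str, url: str, branch: str, secret_arn: str) -> dict:
    """Return keyword arguments for the SageMaker create_code_repository call."""
    return {
        "CodeRepositoryName": name,
        "GitConfig": {
            "RepositoryUrl": url,
            "Branch": branch,
            "SecretArn": secret_arn,
        },
    }


def register_repository(name, url, branch, username, token, secret_name):
    """Create the secret, then register the repository with SageMaker."""
    import boto3  # imported lazily so the helpers above work without boto3

    secrets = boto3.client("secretsmanager")
    secret = secrets.create_secret(
        Name=secret_name,
        SecretString=build_secret_string(username, token),
    )
    sagemaker = boto3.client("sagemaker")
    return sagemaker.create_code_repository(
        **build_repository_request(name, url, branch, secret["ARN"])
    )
```

Calling `register_repository(...)` with your own values should have the same effect as the “Create secret” and “Add repository” steps in the UI, although the UI may apply naming conventions that this sketch does not.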

Jupyter lab overview

Before moving on to the next steps, let’s explore the basics of Jupyter lab. The main points of interaction are highlighted in the following image, and their purposes are as follows:

  1. Shows the directory of your connected repo
  2. Shows running notebooks, terminals, etc.
  3. Manages the repository workflow: staging, committing, pushing code, etc.
  4. Table of contents for the selected notebook, built from the headings used in your notebook
  5. Examples by AWS
  6. Plugins, rarely used

Jupyter lab UI, Source: Author

First Commit

It’s time to make the first commit. In this example, I cloned my repo and then deleted some empty cells from the end of my notebook. The Jupyter lab UI detected this small change and shows it under the changed section in the UI.

New change detected, Source: Author
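
The same information is available from the command line through `git status --porcelain`. The sketch below is an illustration, not what the Jupyter lab UI runs internally: it parses that output into staged, changed, and untracked files. The function names are my own, and paths containing spaces or renames are not handled.

```python
import subprocess


def parse_porcelain(output: str) -> dict:
    """Split `git status --porcelain` output into staged, changed, untracked."""
    result = {"staged": [], "changed": [], "untracked": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY path"
        # Column 1 is the index (staged) state, column 2 the working-tree state.
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state == "?" and worktree_state == "?":
            result["untracked"].append(path)
            continue
        if index_state != " ":
            result["staged"].append(path)
        if worktree_state != " ":
            result["changed"].append(path)
    return result


def repo_status(repo_dir: str = ".") -> dict:
    """Run git in repo_dir and return the parsed status."""
    completed = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(completed.stdout)
```

A notebook that was edited but not yet staged appears under `changed`, which corresponds to the situation in the screenshot above.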

The Git push workflow starts with staging your changes. The following image shows how to stage them.

Source: Author

Once changes are staged, we can either unstage them or commit them. In this article, we make our first commit by giving the commit message a title and a description of what we are committing. It is best practice to always describe your commits so that, later in the pull request phase, the repository’s collaborators know the purpose of each commit.

Staged changes, Source: Author
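
For comparison, here is a rough command-line equivalent of the stage and commit steps, wrapped in Python. It is a sketch under assumptions: the helper names are hypothetical, and it assumes `git` is installed and that `repo_dir` points at your cloned repository.

```python
import subprocess


def stage_command(paths):
    """Build the git command that stages the given paths."""
    return ["git", "add", "--", *paths]


def commit_command(title: str, description: str = ""):
    """Build a git commit command with a title and optional description."""
    command = ["git", "commit", "-m", title]
    if description:
        # A second -m becomes a separate paragraph in the commit message.
        command += ["-m", description]
    return command


def stage_and_commit(paths, title, description="", repo_dir="."):
    """Stage the paths and commit them in repo_dir."""
    subprocess.run(stage_command(paths), cwd=repo_dir, check=True)
    subprocess.run(commit_command(title, description), cwd=repo_dir, check=True)
```

Keeping the title and the description as separate arguments mirrors the two fields in the Jupyter lab commit box.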

After committing the changes to your local repository, the next step is to push them to the remote server. The cloud icon with an upward arrow, shown in the image below, performs the push operation.

push operation, Source: Author

NOTE: When prompted with the input box after clicking push, make sure to provide your GitHub personal access token. If you provide your account password instead, you will get the following error.

Error on push, Source: Author
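
A small guard can catch this mistake before the push is attempted. The sketch below is only a heuristic and rests on an assumption: it checks for the `ghp_` and `github_pat_` prefixes that GitHub uses for newly issued personal access tokens, so older tokens without a prefix will not be recognized. The function names are hypothetical.

```python
import subprocess

# Prefixes of GitHub personal access tokens (classic and fine-grained).
TOKEN_PREFIXES = ("ghp_", "github_pat_")


def looks_like_personal_token(credential: str) -> bool:
    """Heuristic: does the credential start with a known token prefix?"""
    return credential.startswith(TOKEN_PREFIXES)


def push_command(remote: str = "origin", branch: str = "main"):
    """Build the git command that pushes a branch to a remote."""
    return ["git", "push", remote, branch]


def push(remote: str = "origin", branch: str = "main", repo_dir: str = "."):
    """Push the branch; git itself prompts for credentials if needed."""
    subprocess.run(push_command(remote, branch), cwd=repo_dir, check=True)
```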

Once you have pushed your changes to the remote server, you can see them by opening GitHub and clicking on the repository. In our case, the “remove empty cells” change was pushed to the remote.

Pushed changes, Source: Author

Create a branch

Creating a branch is an important part of the daily Git routine. A Git branch gives you an isolated version of your code where you can incorporate new changes without affecting your main codebase, and then merge them into the main branch.

In Sagemaker, we can create a branch from the Jupyter lab UI. Move to the Git repository tab in the left menu, click “New Branch”, and fill in the relevant details. In my case, I created a “del_resource” branch, in which I am going to add logic for deleting Sagemaker resources.

create a branch, Source: Author

Once you have created a new branch, a small orange dot appears on the “cloud with an upward arrow” icon, which pushes your changes to the remote server. Click this icon and repeat the push step explained above.

Pushing new branch, Source: Author

If this was done correctly, the GitHub UI confirms that the new branch has been created.

New branch created in GitHub, Source: Author
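
The branch steps above have a straightforward command-line counterpart. The sketch below builds the two commands involved, using the article’s “del_resource” branch as the example. The helper names and the `origin` remote name are assumptions, and this is not necessarily what the Jupyter lab UI executes.

```python
import subprocess


def create_branch_command(name: str):
    """Build the git command that creates a branch and switches to it."""
    return ["git", "checkout", "-b", name]


def push_new_branch_command(name: str, remote: str = "origin"):
    """Build the git command that publishes a new branch to the remote."""
    # --set-upstream links the local branch to its remote counterpart,
    # so later pushes and pulls need no extra arguments.
    return ["git", "push", "--set-upstream", remote, name]


def create_and_publish_branch(name: str, repo_dir: str = ".", remote: str = "origin"):
    """Create the branch locally, then push it to the remote."""
    subprocess.run(create_branch_command(name), cwd=repo_dir, check=True)
    subprocess.run(push_new_branch_command(name, remote), cwd=repo_dir, check=True)
```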

Create a Pull request

The final step of this article is creating a pull request and merging our newly created branch into the main branch. We repeat the steps explained above: stage the code, commit it, and push it to the remote server. This time, our changes are pushed to the newly created branch on the remote server.

commit code for PR, Source: Author

Once the changes on our new branch are pushed to the remote, the GitHub repository page shows a message that our branch (not main) has recent changes and offers to create a pull request. Click “compare & pull request” to create a new PR.

PR option, Source: Author

After clicking “compare & pull request”, fill in the details. I have shown a simple PR title and description here, but the best approach is to provide more detail about the fix, feature, or whatever logic you incorporated in your new branch.

Source: Author
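
A pull request can also be opened without the browser, through the GitHub REST API. The following is a minimal sketch using only the Python standard library. The owner, repository, and token values are placeholders you would supply yourself, the function names are hypothetical, and error handling is left out.

```python
import json
import urllib.request


def build_pull_request(owner, repo, token, title, head, base="main", body=""):
    """Build (but do not send) the request that opens a pull request."""
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def create_pull_request(*args, **kwargs):
    """Send the request and return GitHub's JSON response as a dict."""
    request = build_pull_request(*args, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Here `head` is the branch that holds your changes (“del_resource” in this article) and `base` is the branch you want to merge into.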

After merging the pull request, make sure to delete the branch, as it has served its purpose. Always create a branch for a single fix or feature, and never use one pull request to address multiple unrelated issues.

Branch deleted after merging PR, Source: Author

Back in Sagemaker, inside the Jupyter lab UI, click the pull option to pull the latest version of your repository from the remote. I added new logic for deleting Sagemaker resources, and I can confirm it by opening the newly pulled notebook inside Jupyter lab.

Main branch updated, Source: Author
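
The clean-up after a merge can be sketched as a short sequence of git commands. The helper below is hypothetical and assumes the remote is called `origin` and the default branch is `main`. Note that `git branch -d` refuses to delete a branch that git does not consider fully merged, which can happen if the pull request was squash-merged.

```python
import subprocess


def sync_after_merge_commands(branch: str, main: str = "main", remote: str = "origin"):
    """Build the commands that update main and remove the merged branch locally."""
    return [
        ["git", "checkout", main],          # switch back to the main branch
        ["git", "pull", remote, main],      # bring in the merged changes
        ["git", "fetch", "--prune", remote],  # drop references to deleted remote branches
        ["git", "branch", "-d", branch],    # delete the local copy of the merged branch
    ]


def sync_after_merge(branch: str, repo_dir: str = "."):
    """Run the clean-up commands in order, stopping at the first failure."""
    for command in sync_after_merge_commands(branch):
        subprocess.run(command, cwd=repo_dir, check=True)
```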

Conclusion

Integrating GitHub with Sagemaker is a great way to improve the collaboration, reproducibility, and scalability of your machine learning projects. By using GitHub to store your code, you can easily share it with others and track changes over time. You can also use GitHub Actions to automate your machine learning workflows, such as training and deploying models. This can save you time and effort, and help you to release models more quickly.

About Author

I am Muhammad Umar Amanat, working as a Sr. Data & AI Consultant. I have more than 5 years of experience in the Data & AI domain. I have been working with AWS Sagemaker since 2018 and have successfully deployed a number of projects on the Sagemaker platform. I learned a lot from the open-source community, and now I am trying my best to give back to the community.

Follow me on Medium to keep updated with new articles. You can also find me on LinkedIn.

Need advice on the AWS platform? You can book a 1:1 call with me.
