Stata and GitHub Integration

Asjad Naqvi
Published in The Stata Guide, Apr 27, 2021

Last updated: October 2022

In this guide, learn how to synchronize your work to GitHub directly from Stata using syntax-based Git commands.

Why is this necessary? More and more projects are moving online and data sharing is now commonplace. In economics and other fields, having some online repository with data and code for replication is also becoming a norm. While some websites and journals provide their own platforms for data sharing, GitHub is now also slowly gaining traction as a hosting service. Furthermore, GitHub excels at two services that other online data sharing platforms lack: version control and the seamless ability to collaborate on code.

Additionally, on GitHub one can also follow other code development projects, access data sets, and set up interactive websites using a whole suite of languages which have the ability to showcase functionality of the code. And a plus point is that all of this is free of cost.

Preamble

Like other guides, a basic knowledge of Stata is assumed. In order to make the graphs exactly as they are shown here, several additional items are required:

ssc install schemepack, replace

You can use any scheme from the suite. For example, let’s set “black_w3d”:

set scheme black_w3d

Set the default graph font to Arial Narrow (see the Font guide on customizing fonts):

graph set window fontface "Arial Narrow"

Narrower fonts work better for longer texts.

Introduction to GitHub

Before we start, the first step is to set up a GitHub account:

This is fairly straightforward. Here I would suggest that as you put more and more stuff online, switch to two-factor authentication to boost the account security.

Once you log in, you need to set up a new repository by clicking on either the + sign or the “New” icon:

On this page fill in the following fields:

You can give it whatever name and description but keep the repository public. Usually one would check “Add a README file” but we skip this for now. If you click on “Create repository”, you will get this screen:

Here you can see the URL of your folder. It will be in this format: https://github.com/<username>/<foldername>.git

Below it are a bunch of commands starting with git that we need to get started. Copy these somewhere if you want. Note that GitHub recommends creating a README.md file. In fact without this file, one cannot see the contents of the folder.

We did not check the README.md option in the earlier step but we will interactively set it up within Stata. The file README.md contains the information that is shown in the description of a repository. For example, the screenshot below shows the README.md from my Stata schemes repository:

Here the extension .md stands for “markdown”, a language for quick formatting of text commonly used in blogging platforms and websites. You can click on the README.md file and click on the edit icon (pen) to see what the markdown entries look like:

Basically one can view the README.md or any other file in any public repository on GitHub. This goes to the heart of information sharing on GitHub. Everything is open for use, copying, and adapting.

There are dozens of online guides that one can follow to learn markdown. Here is one for example:

We won’t go into the details of markdown here but it is important to learn if you plan to use GitHub. It is also used in Dropbox Paper, another online collaborative tool for document writing.

Setting up GitHub on the desktop

While GitHub allows one to use the interactive interface to add and modify basic stuff, some things cannot be done online. For example, files cannot be deleted online. Additionally, if you have dozens of files that you want to sync in various folders, it is fairly messy to do all of this manually via drag and drop.

That is why we need a local setup on our computer where we can sync or “push” the local desktop to the online GitHub folder. Here one can also do some version control locally in terms of the folder one wants to synchronize.

One way of synchronizing the folders is to use the GitHub Desktop application:

https://desktop.github.com/

Once you install and open this program, you can login to your GitHub account and copy or “clone” your online repositories:

This basically creates a local folder on your computer. You can right click on a repository and click on “Show in Explorer” to go to the contents on your computer:

If I make any changes in this local folder, for example, if I change the README.md file or add new stuff to the schemes, the app lists them:

Here you can see in the screenshot that I have updated the README.md file. It shows me a comparison of the old version and the new one. I also made some minor fixes to the scheme files. Since these are significant changes, I need to give this upload a version name. Once this is done, I can press the “Commit to main” icon. This brings me to this screen:

Now if I press “Push origin”, my files will be synchronized on the GitHub server. I can refresh the online repository, where both the name of the version and the time of the change now show up:

This is one way of synchronizing files from the desktop to the online repository. Copy paste stuff to the local GitHub desktop folder and synchronize it. Why do I use this manual copy-pasting for some folders? First, I have way more scripts and files than I want to put online. Second, for larger projects if you don’t set up the step for directly synchronizing from your work directory at the very beginning, the sunk costs are too high to script all of this at a later stage. Therefore, by writing this guide I am forcing myself to start directly synchronizing files as well :)

Coming back to the issue at hand, here the aim is to not discuss the GitHub desktop app, but to show synchronization from your desktop to GitHub and introduce some terms like “fork”, “push”, “pull”, “commit”, “markdown”, etc. Next step, let’s do all of this from a syntax.

How to Git from command lines

Git in its core essence is syntax based. This means that we can issue commands in the DOS shell and push information directly into GitHub.

To start with git, we need to install another software, Git:

https://git-scm.com/

Here, I would also highly recommend the “Documentation folder”, which has a comprehensive introduction to Git. Once you install it, you will get three new programs added to your computer:

Git GUI, Git CMD, and Git Bash. Feel free to explore these and read the documentation in detail. What is important here is that Git commands are now available in the Windows shell. You can access the shell by either clicking on Git CMD or Git Bash, or by going to Windows search and typing cmd or powershell:

I personally use PowerShell for most syntax-based stuff since it looks nicer than cmd, which has an old DOS feel. If you open PowerShell, type “git”, and press enter, you will get a long list of commands you can use:

These commands allow us to call in the shell from any software and push the git code from that software. Next step, let’s do this in Stata.

Stata-to-GitHub integration

Stata has built-in functionality to call the Windows shell. I have used it in several of my previous guides. For example, I use it in the Eurostat guide to call in 7-zip, and in the Animations guide to call in ffmpeg.

In fact, in Stata if we type:

help shell

we get this entry which shows how Stata can interface with various operating systems:

This is an extremely powerful (and also dangerous) tool that allows one to create, modify, and delete files outside the Stata environment. The shell can be invoked either by typing shell or simply starting the line using !. I will stick to the exclamation sign ! for brevity.
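
As a quick illustration of the two forms, the lines below send the same command to the operating system. This is only a sketch and assumes Git is already installed and available on the system path:

```stata
* Both lines pass the same command to the operating system shell.
shell git --version
! git --version
```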

First we start off by creating a dummy project folder:

Here we assume that we have three subfolders: data contains the data, dofiles contains the Stata scripts, and figures contains the graphs etc. What is important is that we know the path of the root folder. For my computer, I start my dofile as follows:

clear
cap cd "D:/Dropbox/STATA - MEDIUM/graphs/github"
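
The three subfolders can also be created from within Stata rather than by hand. A minimal sketch, assuming the working directory is already the project root and that the folder names match the ones used in this guide:

```stata
* Create the three subfolders if they do not exist yet.
* capture suppresses the error Stata raises when a folder already exists.
capture mkdir data
capture mkdir dofiles
capture mkdir figures
```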

We can save this dofile in the dofiles folder above as setup.do. Let’s make some graphs using the auto dataset:

sysuse auto, clear
scatter price mpg
graph export ./figures/figure1.png, replace wid(1000)
scatter length weight
graph export ./figures/figure2.png, replace wid(1000)
scatter price weight
graph export ./figures/figure3.png, replace wid(1000)
scatter length mpg
graph export ./figures/figure4.png, replace wid(1000)

which saves these four figures in the figures directory:

Now our folders are populated with the files we need. We are ready to push these to GitHub.

First time GitHub setup

Here I would like to circle back to the instructions given on the GitHub website:

Since we have installed Git, we can now use these commands in Stata as well. First we need to make sure we are in the correct directory. You can check by typing:

dir  // for Windows
ls // for Mac and Unix

This command lists the contents of the directory you are working in. If your code has switched to some sub-folder, make sure you are back in the root folder you want to synchronize.
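
To see the path itself rather than the folder contents, Stata's pwd command can be used. A minimal sketch that works the same way on Windows, Mac, and Unix:

```stata
* Print the current working directory.
pwd

* The same path is stored in c(pwd), which is handy inside a dofile.
display c(pwd)
```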

Next we need to use a bunch of Git commands. Here is an official cheat sheet to get you started:

https://training.github.com/downloads/github-git-cheat-sheet.pdf

Since we are using this for the first time, we need to generate the README.md file, point to the folder online, and connect to it. We do this using the following steps:

Generate the README.md file using this command:

! echo # github-tutorial  >> README.md

The operator # is basically saying that the title of the readme file is “github-tutorial”. In Stata don’t enclose the # and the title in double quotation " marks. However single quotations ' work.

Next we initialize the Git code:

! git init

These two commands should create the README.md file and add a hidden .git directory to your folder:

Both are essential for your folder to correctly synchronize online.
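
If you prefer to verify this from Stata rather than from the file explorer, something along these lines should work. It is only a sketch and assumes the working directory is still the project root:

```stata
* Stop with an error if README.md was not created.
confirm file README.md

* direxists() is a Mata function; it displays 1 if the hidden .git folder exists.
mata: direxists(".git")
```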

Next we add the README.md file and commit it:

! git add README.md
! git commit -m 'my first upload'

In the “commit” command we also need to add a small description. The more accurate the description, the better for version control. For example, this can be something like “v1.015 added on 26042021”.

Next we define the directory where we want to add this file:

! git remote add origin https://github.com/asjadnaqvi/github-tutorial.git

The path is exactly the one that appears on your GitHub setup page above. And then we push these changes to the directory:

! git push -u origin main

These last two commands are the key ones to remember. At some point you will be asked to login and authenticate the app with GitHub to give it access to your account:

This is an approved app so it should be fine. But in general one should be careful in giving third-party apps control to your data repositories. Once this is done, your README.md should appear on the website:

Synchronizing all files on GitHub

Once the README.md file is created, it can be modified in any text editor. For Windows, I prefer to use Notepad++ since it auto-selects the markdown language, making it easy to see the changes:

We also need to add the remaining folders and their contents. While these can be done fairly seamlessly using the command shell prompt and using the following set of basic commands:

git remote add origin "https://github.com/asjadnaqvi/github-tutorial.git"
git status
git add --all
git commit -m "minor fixes"
git push

in Stata we run into a technical issue. Every time we invoke the DOS shell using ! or shell, the instance is executed and closed. For git, we need to make sure all the commands run in the same instance since every command stores a new piece of information on what to synchronize, how to synchronize it, and how to deal with the version control. As one moves towards more advanced synchronization, various commands can be added as well.

To circumvent this problem of running several commands together, we go back to the basics of how DOS functions. This is actually where I started learning programming in the 80s as a kid. Here we can define a DOS batch or .bat file, which contains a host of different syntax-based commands. We can write this batch file in Stata, and also execute it in Stata using the shell command. Currently, this is the only hack I can think of without making life too complicated.

First, let’s write the batch file by making use of the file (see help file) command:

file close _all
file open git using mygit.bat, write replace
file write git "git remote add origin " `"""' "https://github.com/asjadnaqvi/github-tutorial.git" `"""' _n
file write git "git add --all" _n
file write git "git commit -m "
file write git `"""' "minor fixes" `"""' _n
file write git "git push" _n
file close git

Here we close all open files (also works with Stata logs). Next we open a new batch file called mygit.bat and give it the handle git. This is just a reference name in case several text files are open. In the next few lines, we basically write a bunch of git syntax that we want to run. Here I would like to highlight how double quotations are defined `"""' in Stata. This was quite a pain to figure out. The delimiter _n closes a line and moves to the next one.
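
The same file can also be written by wrapping each whole line in Stata's compound double quotes, which some may find easier to read. The version below is a sketch rather than the method used above: it assumes the same mygit.bat file and git handle, and the local macro msg is a hypothetical name introduced here to hold the commit message:

```stata
* Hypothetical local macro holding the commit message.
local msg "minor fixes"

file close _all
file open git using mygit.bat, write replace
* Compound double quotes allow plain double quotes inside the string.
file write git `"git add --all"' _n
file write git `"git commit -m "`msg'""' _n
file write git `"git push"' _n
file close git
```

Keeping the message in a macro means only one line changes between uploads.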

Also note that if your default GitHub branch is not “main”, you might need to define it as well in the script above:

file write git "git branch -M main" _n

Note here that since we will be executing all of this in the DOS shell, we don’t need to worry about Stata-specific issues like the use of ! or single quotations. Once all the commands are there, we close the git file. We can also view the mygit.bat file in a text editor. It is just a bunch of git commands that we want to run sequentially:

We can execute the mygit.bat file in Stata by simply typing:

! mygit.bat

And if everything runs correctly, we should get a screen showing the git synchronization:

If everything is synchronized properly, the changes will show up on GitHub as well. Here we can also use git commands to pull, merge, delete, and modify files. We can also define which folders and files we want to synchronize. All of this requires a one-time setup cost to write some batch files. But once they are done, they can be recycled fairly easily for various projects.
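
One common way to control which files are synchronized is a .gitignore file in the root folder, which Git reads before adding files. The sketch below writes one from Stata with the same file commands used for the batch file. The handle name gi and the two patterns are illustrative assumptions, not part of the setup above:

```stata
* Write a .gitignore file in the project root.
* The patterns below are illustrative only; adjust them to your project.
file close _all
file open gi using ".gitignore", write replace
file write gi "*.log" _n
file write gi "data/" _n
file close gi
```

With these two lines, Stata log files and everything in the data folder would be left out of the next git add --all. Files that Git already tracks are not affected by patterns added later.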

And that is it for this guide. Git is very extensive, and this guide is just an introduction since I am learning about it myself. If I come up with other helpful git tips, I will add them here as well. Or if you have other suggestions, comments, and feedback, then please feel free to share. Hope you found this guide useful!

About the author

I am an economist by profession and I have been using Stata since 2003. I am currently based in Vienna, Austria. You can see my profile, research, and projects on GitHub or my website. You can connect with me via Medium, Twitter, LinkedIn, or simply via email: asjadnaqvi@gmail.com. If you have questions regarding the Guide or Stata in general post them on The Code Block Discord server.

The Stata Guide releases awesome new content regularly. Subscribe, Clap, and/or Follow the guide if you like the content!
