Automating Git from Python for non-interactive applications

In my previous post, I wrote about how to use SSH-agent from Python to interactively add basic SSH support to Python applications.

Git repositories are typically checked out in one of two ways. The first is over SSH, where running under ssh-agent as described in the previous blog post helps.

The second is over HTTP or HTTPS.

When we want to run git checkouts and keep full control over the command line tools, we really don’t want a library in the way, so we call the git binary ourselves.

However, in doing so, we also must be very careful to keep git from going interactive. We want to avoid having to write any complex expect scripts, in particular.

Vespene handles this with its git module (linked at the end of this post). I’ll highlight key parts.

SSH-Agent

First off, we’re running under SSH-agent as described in that previous blog post. Any SSH_AUTH_SOCK environment variable has been preserved, and we should already have access to any SSH keys. Any keys with unlock passwords have already had their passwords supplied.

Where’s The Server

There’s a good chance we haven’t talked to this server before. While it MAY seem like a good idea to say, “hey user, go ahead and add this IP to known hosts”, in a real build system the build workers may autoscale frequently, and IP addresses of sites like GitHub may change. We want to avoid breaking production builds over a host key prompt.

To do this, we can supply an environment variable to git to tell it to ignore the Known Hosts file in SSH.

"GIT_SSH_COMMAND" : "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
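As a minimal sketch of how that variable might be passed from Python (the helper names `git_environment` and `run_git` are made up for this post, and closing stdin is my own extra precaution rather than something the Vespene module is confirmed to do):

```python
import os
import subprocess

# The setting described above: skip the known hosts file and host key prompt.
GIT_SSH_COMMAND = "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"


def git_environment(base_env=None):
    """Return a copy of the environment with GIT_SSH_COMMAND added.

    Copying keeps SSH_AUTH_SOCK intact, so ssh-agent keys stay available.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_SSH_COMMAND"] = GIT_SSH_COMMAND
    return env


def run_git(args, **kwargs):
    """Run git with the environment above (hypothetical helper)."""
    return subprocess.run(
        ["git"] + list(args),
        env=git_environment(),
        stdin=subprocess.DEVNULL,  # assumption: nothing should read from a terminal
        check=True,
        **kwargs
    )

# Hypothetical usage; the repository and destination are placeholders:
# run_git(["clone", "git@git.example.com:some-directory/foo.git", "/tmp/foo"])
```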

We do this knowing we are only using SSH keys, and some site spoofing GitHub would be all kinds of evil bad for the entire world already. Yes, we should probably have an option to turn this off. Yes, turning this off will make you hate yourself.

From past experience, hanging git checkouts were a massive source of bug tickets, so it’s better to do it this way. If you just blindly execute ssh-keyscan on new instances, you are doing the same thing anyway: ignoring whether the host is the same or not.

HTTP/HTTPS

In HTTP or HTTPS mode, git contains some annoying helper programs that will occasionally ask the user for a password. On OS X, this can even be a desktop popup, which makes no sense for a non-interactive application.

This can be disabled by supplying a command line flag to the git binary:

--config core.askpass=some_program

Where some_program is a program that echoes back the password, and nothing except the password.

To generate this program in the most basic of ways, we temporarily write a bash script to a secure location, and delete it when we are done.
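Here is a rough sketch of that temporary script, assuming Python’s tempfile directory counts as the “secure location”. The name `askpass_script` is hypothetical, and using printf instead of echo is my own choice so that odd-looking passwords print unchanged:

```python
import contextlib
import os
import shlex
import stat
import tempfile


@contextlib.contextmanager
def askpass_script(password):
    """Yield the path of a throwaway script that prints only the password."""
    # mkstemp creates the file readable and writable by the current user only.
    fd, path = tempfile.mkstemp(prefix="askpass-", suffix=".sh")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/bash\n")
            # shlex.quote keeps quotes and spaces in the password intact.
            handle.write("printf '%s\\n' " + shlex.quote(password) + "\n")
        os.chmod(path, stat.S_IRWXU)  # owner-only read, write, and execute
        yield path
    finally:
        os.remove(path)  # clean up even if the git command failed

# Hypothetical usage; repo_url and dest are placeholders:
# with askpass_script("s3cret") as helper:
#     subprocess.run(["git", "clone", "--config", "core.askpass=" + helper,
#                     repo_url, dest], check=True)
```

The context manager is there so the password file never outlives the git command, which is the “delete it when we are done” part.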

If no password is supplied, we just say:

--config core.askpass=''

This still isn’t enough on its own. There’s a good chance the user didn’t supply their username when describing the address of the https:// repo.

If the repo looks like this:

https://git.example.com/some-directory/foo.git

We have to modify the string to insert the username into the repository address:

https://username@git.example.com/some-directory/foo.git
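One way to do that rewrite is with the standard library’s URL parser. This is a sketch, not necessarily how Vespene does it; `add_username` is a made-up name, and leaving non-HTTP addresses and addresses that already carry a username untouched is my assumption about the sensible behavior:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def add_username(repo_url, username):
    """Insert username@ into an http(s) repo URL that does not have one yet."""
    parts = urlsplit(repo_url)
    if parts.scheme not in ("http", "https"):
        return repo_url  # SSH-style addresses are left alone
    if "@" in parts.netloc:
        return repo_url  # a username is already present
    # Percent-encode the username so characters like "@" cannot break the URL.
    netloc = quote(username, safe="") + "@" + parts.netloc
    return urlunsplit(parts._replace(netloc=netloc))


print(add_username("https://git.example.com/some-directory/foo.git", "username"))
# prints https://username@git.example.com/some-directory/foo.git
```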

I believe we could *technically* write some user dotfiles and pass in the location of those dotfiles, but then we’d have to clean up after ourselves more and that would be more work.

Efficiency

To avoid checking out extra branches (we’re a build system), we can add

-b branch --single-branch
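Put together as an argument list, that might look like the sketch below. `clone_command` is a hypothetical helper, and building a list instead of a shell string is my own preference here since it avoids quoting problems:

```python
def clone_command(repo_url, dest, branch=None):
    """Build the git clone argument list, limited to one branch when given."""
    cmd = ["git", "clone"]
    if branch:
        # Only fetch the history of the branch we intend to build.
        cmd += ["-b", branch, "--single-branch"]
    cmd += [repo_url, dest]
    return cmd
```

With no branch supplied the flags are simply left out and git clones the remote’s default branch as usual.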

Paranoia

Despite all we’ve done, the git command might still hang. The timeout utility from coreutils is useful to set a maximum time for the git operation.

timeout 600 git …

Important: If coreutils was installed with “brew install coreutils”, the timeout command is installed as “gtimeout”.
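A small sketch of coping with both names from Python follows. `timeout_prefix` is a hypothetical helper, and the `which` parameter exists only so the lookup can be swapped out in tests:

```python
import shutil


def timeout_prefix(seconds, which=shutil.which):
    """Return [timeout_binary, seconds], or [] if neither name is installed."""
    for name in ("timeout", "gtimeout"):
        path = which(name)
        if path:
            return [path, str(seconds)]
    return []

# Hypothetical usage; the git arguments are placeholders:
# cmd = timeout_prefix(600) + ["git", "clone", repo_url, dest]
```

Python’s own subprocess.run also accepts a timeout argument, which would be another way to get a hard limit without the external utility. That is an alternative to what this post describes, not a description of it.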

What Didn’t Work

Git seemed to have some command line flags or environment variables that should have allowed passing the SSH unlock passwords to git. It ended up being a lot easier to just rely on SSH-agent, which was also nice because we could eventually reuse that feature to enable other types of SSH-based automation inside build scripts.

Other Git Commands

We also execute some other git commands once we get a checkout, but thankfully these commands, like getting the head version and the most recent commit user, are all local. At that point we are done with the repository setup.
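For illustration, local queries like those could be run as below. The exact commands are my assumption (git rev-parse HEAD and git log -1 --format=%an are standard git, but I am not claiming they are the ones Vespene runs), and `git_output` and `head_info` are made-up names:

```python
import subprocess


def git_output(args, cwd, run=subprocess.run):
    """Run a local git command in the checkout and return its trimmed output."""
    result = run(
        ["git"] + list(args),
        cwd=cwd,
        check=True,
        stdout=subprocess.PIPE,
        stdin=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip()


def head_info(checkout_dir, run=subprocess.run):
    """Collect the current revision and the author of the latest commit."""
    return {
        "revision": git_output(["rev-parse", "HEAD"], checkout_dir, run=run),
        "author": git_output(["log", "-1", "--format=%an"], checkout_dir, run=run),
    }
```

Neither command touches the network, so none of the SSH or password handling above is needed for them.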

The Code

Take a look at https://github.com/vespene-io/vespene/blob/master/vespene/plugins/scm/git.py for the most recent version. Upgrades are always welcome!