Building a Git Server

A dive into Twisted, SSH, & Git

Emil Davtyan
Joltem — An Open Incubator

--

Our intention was simple — enable people to openly collaborate on projects that could turn into startups. Groups would be open, dynamic, and grow with the popularity of a project.

We needed a place to store our code. The fork and pull model did not meet our needs, so integrating with GitHub was not an option. Our proposed access model required branch level permissions. A makeshift Gitolite solution proved to be buggy, difficult to test, and hard to set up.

Building our own git server was the best option. Our main requirement was that it would need to enforce branch level push permissions. Additionally, credentials such as public RSA keys would need to be stored in the database.

Tasked with building the Git server, I quickly found myself knee-deep in Twisted’s source. Several SSH RFCs helped me piece it together, while the specifications of Git’s transfer protocols guided me towards a suitable solution. What follows is a distillation of what I learned.

Connections — Protocols, Factories, Services

With some Twisted basics we can make the server listen. Protocols are used to manage a connection and receive data. Connections are represented by ITransport implementations.

Factory instances create a Protocol instance for each connection through their buildProtocol method. A service is used to bind a Factory instance to a port.

SSHFactory is the base factory class for SSH. It has to be extended to provide the host keys used in the SSH handshake. The Key class represents RSA or DSA keys.

The factory for our Git server ( we call it a “gateway” ) provides its RSA keys.

from twisted.conch.ssh.keys import Key
from twisted.conch.ssh.factory import SSHFactory
from django.conf import settings

# Host keys are loaded once, when the module is imported.
SERVER_PRIVATE_KEY = Key.fromFile(settings.GATEWAY_PRIVATE_KEY_FILE_PATH)
SERVER_PUBLIC_KEY = Key.fromFile(settings.GATEWAY_PUBLIC_KEY_FILE_PATH)

class GatewayFactory(SSHFactory):

    def getPrivateKeys(self):
        return {'ssh-rsa': SERVER_PRIVATE_KEY}

    def getPublicKeys(self):
        return {'ssh-rsa': SERVER_PUBLIC_KEY}

We use a TCPServer service to bind our factory to a port. To start the server as a daemon we put the following in a gateway.tac file :

import sys
import os
# Django settings must be configured before anything imports them.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "joltem.settings.local")
from django.conf import settings
from gateway.libs.factory import factory
from twisted.application import service, internet
application = service.Application("Gateway")
internet.TCPServer(settings.GATEWAY_PORT, factory).setServiceParent(application)

And then run twistd :

twistd -y gateway.tac

Authentication — Credentials, Realms, Portals

A user must present credentials when attempting to open a connection. Twisted stores them in implementations of the ICredentials interface.

An implementation of ICredentialsChecker is responsible for verifying the credentials. The requestAvatarId method should produce either an identifier for the user or an UnauthorizedLogin exception. To fit Twisted’s asynchronous nature, a returned value should be wrapped with the succeed function ( which produces an already fired Deferred ), while an exception should be wrapped with the Failure class.

IRealm implementations are responsible for creating an avatar instance representing the connected user. The identifier provided by an ICredentialsChecker is passed to the realm’s requestAvatar method to instantiate the avatar.

A Portal instance is used to attach multiple credential checkers to a realm. Credentials are funneled through the portal and checked against the first capable credentials checker. If the login is successful, the portal creates an avatar using the realm.

Since our users provide RSA keys as credentials, we use the ISSHPrivateKey credentials interface in our credential checker.

class GatewayCredentialChecker(object):
    implements(ICredentialsChecker)
    credentialInterfaces = (ISSHPrivateKey, )

    @staticmethod
    def requestAvatarId(credentials):
        … # check source ;)

The requestAvatarId method checks whether the credentials contain a valid RSA key. If so, it computes a fingerprint and uses it to query the database for matching credentials. If a match is found, it returns an instance of our Authentication model representing the RSA key.
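The fingerprint step can be sketched with the standard library alone. This is an illustration under assumptions, not the gateway's actual code: the helper names read_string, key_type, and md5_fingerprint are hypothetical, the key blob is a dummy built for the example, and the colon-separated MD5 form is only one common fingerprint format.

```python
import base64
import hashlib
import struct


def pack_string(data):
    """Encode bytes as an SSH wire-format string: uint32 length, then the bytes."""
    return struct.pack(">I", len(data)) + data


def read_string(blob, offset=0):
    """Read one SSH wire-format string and return (bytes, next offset)."""
    (length,) = struct.unpack_from(">I", blob, offset)
    start = offset + 4
    return blob[start:start + length], start + length


def key_type(blob):
    """Return the algorithm name stored in the first field of a public key blob."""
    name, _ = read_string(blob)
    return name.decode("ascii")


def md5_fingerprint(blob):
    """Colon-separated MD5 fingerprint of a raw public key blob."""
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Dummy blob in the "ssh-rsa" layout (name, exponent, modulus). NOT a usable key.
dummy_blob = (
    pack_string(b"ssh-rsa")
    + pack_string(b"\x01\x00\x01")
    + pack_string(b"\x00" + b"\xab" * 16)
)
# A public key line carries the blob base64 encoded between the type and a comment.
public_line = "ssh-rsa " + base64.b64encode(dummy_blob).decode("ascii") + " emil@example"

blob = base64.b64decode(public_line.split()[1])
if key_type(blob) == "ssh-rsa":
    fingerprint = md5_fingerprint(blob)
    print(fingerprint)  # this string would be the database lookup key
```

Storing the fingerprint rather than the full key keeps the lookup to a single indexed column, at the cost of trusting the hash to identify the key.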

The realm then uses an extension of the ConchUser ( SSH specific avatar ) class to create an avatar instance.

class GatewayUser(ConchUser):

    def __init__(self, key):
        ConchUser.__init__(self)
        self.user = key.user
        self.project = key.project

    @staticmethod
    def logout():
        pass

class GatewayRealm(object):
    implements(IRealm)

    @staticmethod
    def requestAvatar(authentication, mind, *interfaces):
        user = GatewayUser(authentication)
        # Returns the ( interface, avatar, logout callable ) triple.
        return interfaces and interfaces[0] or None, user, user.logout

Authentication is enabled by attaching our portal to our GatewayFactory instance.

portal = Portal(GatewayRealm())
portal.registerChecker(GatewayCredentialChecker())
factory = GatewayFactory()
factory.portal = portal

SSH — Channels, Sessions, Requests

A small primer on SSH and how Twisted implements it is required before proceeding.

In SSH, requests and data are transferred over channels :

All terminal sessions, forwarded connections, etc., are channels. Either side may open a channel. Multiple channels are multiplexed into a single connection.

To execute something on the remote end a session channel will have to be opened.

A session is a remote execution of a program. The program may be a shell, an application, a system command, or some built-in subsystem. It may or may not have a tty, and may or may not involve X11 forwarding. Multiple sessions can be active simultaneously.

Once the session is opened a request is sent to begin the session’s process. It can either run the default shell, execute a command, or open a subsystem.

Since Git uses SSH to execute a command on the remote end, we are interested in the “exec” request :

byte      SSH_MSG_CHANNEL_REQUEST
uint32    recipient channel
string    “exec”
boolean   want reply
string    command

This message will request that the server start the execution of the given command. The ‘command’ string may contain a path. Normal precautions MUST be taken to prevent the execution of unauthorized commands.
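To make the layout concrete, here is a small sketch that builds and parses an “exec” request payload by hand. Twisted does this internally, so the function names build_exec_request and parse_exec_request are hypothetical and exist only for illustration. The message number 98 and the length-prefixed string encoding come from the SSH RFCs.

```python
import struct

SSH_MSG_CHANNEL_REQUEST = 98  # message number assigned by the SSH connection protocol


def pack_string(data):
    """SSH string: uint32 big-endian length followed by the raw bytes."""
    return struct.pack(">I", len(data)) + data


def read_string(payload, offset):
    """Read an SSH string at offset and return (bytes, next offset)."""
    (length,) = struct.unpack_from(">I", payload, offset)
    start = offset + 4
    return payload[start:start + length], start + length


def build_exec_request(channel, command, want_reply=True):
    """Assemble the fields in the order shown in the message layout."""
    return (
        struct.pack(">BI", SSH_MSG_CHANNEL_REQUEST, channel)
        + pack_string(b"exec")
        + struct.pack(">?", want_reply)
        + pack_string(command)
    )


def parse_exec_request(payload):
    """Return (recipient channel, want reply, command) or raise ValueError."""
    msg_type, channel = struct.unpack_from(">BI", payload, 0)
    if msg_type != SSH_MSG_CHANNEL_REQUEST:
        raise ValueError("not a channel request")
    request_type, offset = read_string(payload, 5)  # 1 byte type + 4 byte channel
    if request_type != b"exec":
        raise ValueError("not an exec request")
    want_reply = payload[offset] != 0
    command, _ = read_string(payload, offset + 1)
    return channel, want_reply, command


payload = build_exec_request(0, b"git-upload-pack '1'")
parsed = parse_exec_request(payload)
print(parsed)
```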

Twisted represents channels and sessions with SSHChannel and SSHSession respectively. SSHChannel contains general methods to write and receive data from channels. SSHSession extends SSHChannel and handles session specific channel requests.

Instances of SSHSession contain a session attribute holding an ISession implementation, which provides the callbacks for processed session requests.

Our ISession implementation is initialized with a GatewayUser avatar and handles “exec” requests.

class GatewaySessionInterface():
    implements(ISession)

    def __init__(self, avatar):
        self.avatar = avatar

    def execCommand(self, protocol, command_string):
        log.msg("Execute command : %s" % command_string)

We extend the SSHSession class so it will use GatewaySessionInterface as an ISession. There is another way to do this mentioned in the docs, but it is less explicit.

class GatewaySession(SSHSession):

    def channelOpen(self, specificData):
        self.session = GatewaySessionInterface(self.avatar)

To enable our new GatewaySession class we override our avatar’s channel lookup “session” entry.

class GatewayUser(ConchUser):

    def __init__(self, key):
        ConchUser.__init__(self)
        self.user = key.user
        self.project = key.project
        self.channelLookup['session'] = GatewaySession

Git — Processes, Reference Discovery, Packfile Negotiation

To serve up Git we will need to understand its protocol and transport mechanisms. Git can utilize several transports, but we are only interested in SSH.

Each time a user fetches or pushes data a command is executed on the server using SSH. The process spawned from the command allows the server and client to communicate.

All commands are handled by two executables on the server : git-upload-pack and git-receive-pack. Sending data to the server fires up the git-receive-pack executable. Fetching data, which includes cloning, executes git-upload-pack.

For example, the command :

git clone emil@joltem.com:1

Is mapped to this command :

ssh emil@joltem.com "git-upload-pack '1'"

Which establishes an SSH connection and executes the command :

git-upload-pack '1'
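On the server side, the command string that arrives in the “exec” request has to be taken apart before anything is run. A minimal sketch using only the standard library, assuming a hypothetical helper named parse_git_command and a whitelist limited to the two executables named above:

```python
import shlex

# Only the two Git executables are ever allowed to run.
ALLOWED_COMMANDS = {"git-upload-pack", "git-receive-pack"}


def parse_git_command(command_string):
    """Split an exec command into (executable, repository) or raise ValueError."""
    parts = shlex.split(command_string)  # handles the quoting around the path
    if len(parts) != 2 or parts[0] not in ALLOWED_COMMANDS:
        raise ValueError("unsupported command: %r" % command_string)
    return parts[0], parts[1]


executable, repository = parse_git_command("git-upload-pack '1'")
print(executable, repository)
```

Rejecting everything outside the whitelist is one way to follow the RFC's advice about preventing the execution of unauthorized commands.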

Once the line of communication is opened Git’s protocol proceeds. The protocol uses packet lines ( or pkt-lines ) to communicate :

A pkt-line is a variable length binary string. The first four bytes of the line, the pkt-len, indicates the total length of the line, in hexadecimal. The pkt-len includes the 4 bytes used to contain the length’s hexadecimal representation.

Examples (as C-style strings):

pkt-line          actual value
---------------------------------
"0006a\n"         "a\n"
"0005a"           "a"
"000bfoobar\n"    "foobar\n"
"0004"            ""

A 0000 flush packet indicates an end of a group of statements.
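The format is simple enough to encode and decode in a few lines. The following sketch is not taken from our gateway; the names encode_pkt_line and decode_pkt_lines are hypothetical, and a flush packet is represented as None purely as a convention of this example.

```python
FLUSH_PKT = b"0000"


def encode_pkt_line(payload):
    """Prefix payload with its total length (payload + 4) as four hex digits."""
    length = len(payload) + 4
    if length > 0xFFFF:
        raise ValueError("payload does not fit in a four digit hex length")
    return ("%04x" % length).encode("ascii") + payload


def decode_pkt_lines(data):
    """Yield each payload in data; a flush packet is yielded as None."""
    offset = 0
    while offset < len(data):
        length = int(data[offset:offset + 4].decode("ascii"), 16)
        if length == 0:
            yield None  # flush packet carries no payload
            offset += 4
            continue
        if length < 4:
            raise ValueError("invalid pkt-len: %d" % length)
        yield data[offset + 4:offset + length]
        offset += length


stream = encode_pkt_line(b"a\n") + encode_pkt_line(b"foobar\n") + FLUSH_PKT
decoded = list(decode_pkt_lines(stream))
print(decoded)
```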

The first order of business is reference discovery :

When the client initially connects the server will immediately respond with a listing of each reference it has (all branches and tags) along with the object name that each reference currently points to.

Along with the first reference in the list the server informs the client of its capabilities.

009bb8915a670ec46e39f25ff700eb260ef5e8d045b1 HEAD\x00multi_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress include-tag multi_ack_detailed
0040cbb40cbc1f82c5d823484b1ccaf742fe210bcf52 refs/heads/develop
003fb8915a670ec46e39f25ff700eb260ef5e8d045b1 refs/heads/master
0040964838da3cd6e95f71d6bf6b425431769f76c22e refs/tags/1.3.2^{}
003d101c5cb5601d9ac04533bf3db810095d0faf7568 refs/tags/1.3.3
003d5da834fffb7e252418b080cca78041517bcd9734 refs/tags/1.3.4
0000
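An advertisement like this can be read back into a mapping of references plus a list of capabilities. The sketch below is illustrative only: parse_advertisement and the pkt helper are hypothetical names, the sample stream reuses three of the references above with a shortened capability list, and the lengths are computed by the helper rather than copied from the listing.

```python
def pkt(payload):
    """Wrap payload in a pkt-line (four hex digit length prefix)."""
    return ("%04x" % (len(payload) + 4)).encode("ascii") + payload


def read_pkt_lines(data):
    """Yield payloads until the flush packet."""
    offset = 0
    while offset < len(data):
        length = int(data[offset:offset + 4].decode("ascii"), 16)
        if length == 0:
            return
        yield data[offset + 4:offset + length]
        offset += length


def parse_advertisement(data):
    """Return ({reference name: object name}, [capabilities])."""
    refs = {}
    capabilities = []
    for index, line in enumerate(read_pkt_lines(data)):
        line = line.rstrip(b"\n")
        if index == 0 and b"\x00" in line:
            # Capabilities follow a NUL byte on the first line only.
            line, caps = line.split(b"\x00", 1)
            capabilities = caps.decode("ascii").split()
        sha, name = line.decode("ascii").split(" ", 1)
        refs[name] = sha
    return refs, capabilities


stream = (
    pkt(b"b8915a670ec46e39f25ff700eb260ef5e8d045b1 HEAD\x00multi_ack side-band-64k ofs-delta\n")
    + pkt(b"cbb40cbc1f82c5d823484b1ccaf742fe210bcf52 refs/heads/develop\n")
    + pkt(b"b8915a670ec46e39f25ff700eb260ef5e8d045b1 refs/heads/master\n")
    + b"0000"
)
refs, capabilities = parse_advertisement(stream)
print(sorted(refs), capabilities)
```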

The next step depends on whether the client is fetching or pushing. If the client is fetching, a process called pack file negotiation follows. During this process the client informs the server what it has and what it wants. The server uses this to compile the necessary objects into the smallest pack file possible.

Since our main goal is to enforce branch level push permissions, let us examine what happens during a push.

Once the client knows what references the server is at, it can send a list of reference update requests. For each reference on the server that it wants to update, it sends a line listing the obj-id currently on the server, the obj-id the client would like to update it to and the name of the reference.

An example, pushing the develop and master branches :

00859d4c1ed658ee2617521d846bf601cec06001a2c8 cbb40cbc1f82c5d823484b1ccaf742fe210bcf52 refs/heads/develop\x00report-status side-band-64k
00676860ded8258f538407e9eb1d2017ffc7a3015abe b8915a670ec46e39f25ff700eb260ef5e8d045b1 refs/heads/master
0000

Immediately after this, the client compiles and sends a pack file. The file contains all the objects the server will need to fill the gap between the old and new commit.
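The update requests are the part a permission check cares about. As a rough sketch, assuming the pkt-len prefixes have already been stripped, the two lines from the example can be parsed like this. RefUpdate and parse_update_requests are hypothetical names, not part of Git or Twisted.

```python
from collections import namedtuple

RefUpdate = namedtuple("RefUpdate", "old new ref")


def parse_update_requests(payloads):
    """Turn pkt-line payloads into RefUpdate tuples plus the client capabilities."""
    updates = []
    capabilities = []
    for index, line in enumerate(payloads):
        line = line.rstrip(b"\n")
        if index == 0 and b"\x00" in line:
            # The client lists its capabilities after a NUL byte on the first line.
            line, caps = line.split(b"\x00", 1)
            capabilities = caps.decode("ascii").split()
        old, new, ref = line.decode("ascii").split(" ", 2)
        updates.append(RefUpdate(old, new, ref))
    return updates, capabilities


payloads = [
    b"9d4c1ed658ee2617521d846bf601cec06001a2c8 "
    b"cbb40cbc1f82c5d823484b1ccaf742fe210bcf52 "
    b"refs/heads/develop\x00report-status side-band-64k",
    b"6860ded8258f538407e9eb1d2017ffc7a3015abe "
    b"b8915a670ec46e39f25ff700eb260ef5e8d045b1 "
    b"refs/heads/master",
]
updates, capabilities = parse_update_requests(payloads)
for update in updates:
    print(update.ref, update.old[:7], "->", update.new[:7])
```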

To enforce branch level permissions we decided to create a middleware: something that sits between the client and server, listens to Git’s transfer protocols, and interrupts when necessary.

Twisted uses ProcessProtocol instances to connect to processes and receive data from them. Since an SSH session intends to execute something, it is represented as a process. An instance of SSHSessionProcessProtocol is created by SSHSession and passed to our GatewaySessionInterface execCommand method as protocol.

def execCommand(self, protocol, command_string):

The commands the server receives require running git-receive-pack and git-upload-pack in their own separate processes. We extend ProcessProtocol for each of these to mediate the protocol.

Our GitReceivePackProcessProtocol buffers the client’s input and processes packet lines. If a client attempts to push to a prohibited branch the server ignores their pack file and reports back with an informative error message.
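The buffering idea can be shown without any Twisted machinery. The class below is a simplified stand-in written for this article, not our GitReceivePackProcessProtocol: the name ReceivePackGate, its feed method, and the allowed_refs argument are all hypothetical. It collects client input until the flush packet that ends the update requests, then reports which references the user may not push to.

```python
class ReceivePackGate(object):
    """Buffers client input until the flush packet, then checks each reference."""

    def __init__(self, allowed_refs):
        self.allowed_refs = set(allowed_refs)
        self.buffer = b""
        self.denied = None  # stays None until the request list is complete

    def feed(self, chunk):
        """Add client bytes; return True once the full request list has been seen."""
        self.buffer += chunk
        refs = []
        offset = 0
        while offset + 4 <= len(self.buffer):
            length = int(self.buffer[offset:offset + 4].decode("ascii"), 16)
            if length == 0:  # flush packet: the request list is complete
                self.denied = [r for r in refs if r not in self.allowed_refs]
                return True
            if length < 4:
                raise ValueError("invalid pkt-len: %d" % length)
            if offset + length > len(self.buffer):
                break  # partial line, wait for more data
            line = self.buffer[offset + 4:offset + length]
            line = line.split(b"\x00", 1)[0].rstrip(b"\n")  # drop capabilities
            refs.append(line.decode("ascii").split(" ", 2)[2])
            offset += length
        return False


def pkt(payload):
    return ("%04x" % (len(payload) + 4)).encode("ascii") + payload


stream = (
    pkt(b"9d4c1ed658ee2617521d846bf601cec06001a2c8 "
        b"cbb40cbc1f82c5d823484b1ccaf742fe210bcf52 "
        b"refs/heads/develop\x00report-status side-band-64k")
    + pkt(b"6860ded8258f538407e9eb1d2017ffc7a3015abe "
          b"b8915a670ec46e39f25ff700eb260ef5e8d045b1 "
          b"refs/heads/master")
    + b"0000"
)

gate = ReceivePackGate(allowed_refs=["refs/heads/develop"])
first_done = gate.feed(stream[:50])  # data may arrive in arbitrary chunks
done = gate.feed(stream[50:])
print(done, gate.denied)
```

An empty denied list means the push can be forwarded to git-receive-pack. Anything else is a reason to discard the pack file and report an error to the client.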

Demonstrating our ProcessProtocol subclasses would require a lot of code, and I didn’t want to complicate the article too much. Luckily, we are an open company and our code is publicly visible. You can clone the repository on our site and test the server out.

If you want to get involved in the development read over this and jump right in.

Caveat — Terminal Access

When a user connects to our server directly over SSH, a terminal appears that is implemented entirely in Twisted. It only handles a few basic commands and I won’t cover it here, but some may find it interesting nonetheless.

Terminal access to joltem.com

Joltem didn’t work out, but the code is open sourced here. Feel free to fork and put it to use.
