Laying out a git repository
How littleman.co organises projects as well as why we do it in that way
Version control is one of the more fundamental pieces of software development. It allows developers to navigate through a projects history to understand who implemented each change, as well as why they did so. It is an invaluable tool for use while understanding any given issue.
littleman.co uses git as its version control tool of choice.
git is the defacto standard of the software industry, having replaced Mercurial, Subversion and CVS. The majority of our development tools and our workflow builds on top of
git primitives such as:
And so forth. That said
git, for all its opinions, is remarkably silent about how to lay out a project.
This is a good thing for the tool but not necessarily for the developer. When first reading a project to understand and debug it a developer needs to build a model of that project as quickly as possible. They can then use that model to make predictions about how the software should behave; as well, spotting things that violate such predictions. If we are able to keep projects consistent we are able to reduce the number of odd things developers need to investigate to find the desired issue.
Accordingly it’s a good idea to structure all projects in the in the same way and that developers can easily understand and search through them.
Defining a standard for how a project should be laid out is hardly a new endeavour. There is:
- The Linux Filesystem Hierarchy Standard
- The standard go project layout
- The Maven standard directory layout
- The Python standard project layout
If one of these standards is in wide use in your organisation its best to continue with that, rather than adopt yet another standard. However each of those standards have the limitation they’re only used in the context of the language or build tooling they’re defined in. In an environment such as littleman.co that includes many different languages, applications and other types of development these standards either do not define enough behaviour to be useful or define things that do not propagate well between languages.
Determining the boundaries of a repository
There are usually many different components of a project that need to come together to have that project user facing and doing useful work. Things like the:
These things must all be coordinated in some way that allows developers to make changes to a project in a predictable way and with predictable timing and have those changes be pushed to users.
Traditionally each of these components would be kept separate, handled by different teams. However with the advent of continuous delivery developers can push code to production in a “self service” manner, and have a robot take care of tasks such as:
- Ensuring the application works as expected before it hits users
- Replacing the existing application in production with the new application
- Rolling back the application to its previous version in the case of failure
- Creating testing environments
And so forth.
Deployments are the boundary that seems most useful to determine what should be in a single repository. For example, if the application is the only thing that should change in a single deployment it can be the only thing in that repository. However, if the application is changing and requires an underlying infrastructure change, that infrastructure should also be in the repository. If the application requires a new set of tests and those tests should be in the CI/CD configuration that also belongs in the repository.
However, this also provides good boundaries as to what does not belong in the repository. The application should never require Kubernetes to be in a specific configuration, and Kubernetes configuration and life cycle should thus be managed in a separate repository. If the application requires new TLS certificates but those certificates are handled in a process outside the normal application development process they should also not be stored in the repository.
By using the deployment as our boundary to determine what goes in and out of the project we see a number of benefits:
Democratised project tooling
Even though things such as Docker or CI/CD may require specialised knowledge that the application developers do not have any reason to learn, by seeing those changes in the same place and subject to the same standards as other parts of the application those developers get a better understanding of their own project lifecycle. They can use that knowledge to decrease the time required to understand and resolve issues that are associated with any changes in that process, such as configuration changes in CI/CD breaking asset compilation in the application.
Additionally those developers can contribute application specific insights to the CI/CD process, such as the best place to store configuration or environment specific application configuration that must be applied.
Single view of changes
When understanding how and when a bug was introduced into a service the fewer places we must look and correlate the change the faster we can find and resolve the issue.
By having all changes associated with the project down to the next “deployment layer” we can quickly see whether it was an application code change, configuration change or environment change that was introduced at the same time as an issue started hitting users.
There are times in which an application change and a configuration or environment change must happen at the same time. Examples include:
- The addition of a new data store (Redis)
- Newly exposed application configuration
- A new application feature that requires a system library
By having both the application and the infrastructure in a single repository we can review both the application changes and the infrastructure changes in a single pull request and ensure they’re released and tested in a coordinated way.
Additionally any deployment artifacts generated can be directly traced back to a change in the
git repository allowing operations team members to know exactly what code is running in production at any given time.
The littleman.co standard is derived from the requirements as above. The directory layout is as follows:
$ tree .├── bin
│ ├── ci
│ └── container
│ ├── Dockerfile
│ └── etc
│ ├── ansible
│ │ └── playbook.yml
│ ├── docker-compose
│ │ ├── docker-compose.yml
│ │ └── mnt
│ │ └── app
│ └── helm
└── web14 directories, 5 files
There are various files that are either required by convention or by project tooling to be in the root of the project.
- LICENSE.txt: The project license
- README.adoc: Some basic description about the project
- .drone.yml: The task runner / CI configuration for the project
- .arclint: Configuration for the Arcanist lint runner
Build configuration is expected to produce some sort of artifact, either consumed later in the build or deployed to some sort of environment.
Sometimes there are limitations with the build system that require additional procedural scripts to do some
Containers are the canonical deployment artifact used by littleman.co. They’re build from the
Generally there is only one production container per project, though other containers may be used to assist with bespoke application build tasks.
The deployment folder contains any “infrastructure as code” configuration. There are various kinds that are in common use, including:
Helm is a project for managing the definitions and lifecycle of Kubernetes objects. It is an opinionated way of packaging and vendoring software and there are a number of pre-packaged bits of software.
Each bit of software is packaged into a “chart”. This chart includes:
- Some metadata describing the software
- The deployment definitions
- The deployment definition configuration
Usually a project only has a single chart. However, where there are multiple charts required to launch this project each chart is nested in its own subdirectory:
Generally speaking however, it is an anti-pattern to need multiple services for a single project. The project should be deployed as a single, atomic change. These services are better organised in the subchart pattern.
Ansible is a tool for defining machine specifications and having them enforced. The layout within this folder should be the layout defined by Ansible upstream, with the exception that each project is expected to only define one role.
docker-compose is a tool that is useful for spinning up a "production like" environment in a limited way in the local development environment.
Its scope is limited to local development by design.
Project specific documentation
All files associated with the application.
If the application is interpreted this should be called “app”.
The generated web application
Our tools shape our conceptual model of a project. When developing keeping things consistent reduces the amount we need to investigate given each different project before we can start diagnosing issues or adding features to that project and adopting a single project layout keeps things as consistent as possible. The things included in a
git repository in littleman.co projects are all the things that are needed to deploy a project to users or subsequently change that project’s behaviour, given consistent underlying infrastructure. The layout is fairly straight forward but is subject to iteration, and has thus been pushed to GitHub. Hopefully understanding how we structure projects will give you some guidance on how to structure your own projects, or invite questions as to whether your projects are currently structured to maximise clarity and consistency in your team.