This is Part 1 of the Comprehensive Guide to Running GitLab on AWS. In the intro post to this series, we discussed why Alchemy uses GitLab. In this post, we’re going to discuss at a high level what we need to build to install and run GitLab on AWS. We will cover much of the same material as GitLab’s official guide to running a highly available GitLab on AWS, but we will also go over some caveats and modifications that improve durability, reliability, and performance.
Above is GitLab’s architecture diagram for running a highly available GitLab on AWS. We will start by setting up the application servers, Postgres, Redis, and a Network File System, and we will use that foundation to build out the rest of the architecture later on in the series.
GitLab offers everything in the diagram above as part of its Omnibus installation, and you could follow its documentation and run every component yourself. But Alchemy is a small team, so we use managed services whenever we can. We will leverage RDS and ElastiCache to easily create a Postgres database and a Redis cluster, and we will use Auto Scaling groups so that the GitLab application servers can scale up and down to meet demand. With these services, the burden of managing durability and scaling falls on AWS (and they’re really good at it).
The one piece of the architecture where we cannot use a managed AWS service is the NFS server: Elastic File System (EFS) is not a good fit for GitLab. GitLab users have documented many issues with EFS, and the GitLab team gives the following warnings:
EFS bases allowed IOPS on volume size. The larger the volume, the more IOPS are allocated. For smaller volumes, users may experience decent performance for a period of time due to ‘Burst Credits’. Over a period of weeks to months credits may run out and performance will bottom out.
For larger volumes, allocated IOPS may not be the problem. Workloads where many small files are written in a serialized manner are not well-suited for EFS. EBS with an NFS server on top will perform much better.
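To make the burst-credit warning concrete, here is a minimal Python sketch of how a credit-based throughput model behaves under sustained load. It is a simplified, hypothetical model, not AWS’s actual EFS accounting: the `BurstModel` class, the per-GiB baseline, the burst ceiling, and the starting credit balance are all illustrative assumptions, and the sketch ignores the cap that real systems place on the credit balance.

```python
from dataclasses import dataclass


@dataclass
class BurstModel:
    """Hypothetical burst-credit model (NOT AWS's exact EFS accounting).

    Assumptions: baseline throughput scales with stored size, unused baseline
    is banked as credits, running above baseline spends credits, and once the
    balance hits zero throughput falls back to baseline. No credit cap.
    """
    size_gib: float                # stored data; the baseline scales with this
    baseline_mib_s_per_gib: float  # illustrative baseline throughput per GiB
    burst_mib_s: float             # illustrative maximum burst throughput
    credits_mib: float             # starting credit balance, in MiB

    @property
    def baseline_mib_s(self) -> float:
        return self.size_gib * self.baseline_mib_s_per_gib

    def step(self, demand_mib_s: float, seconds: float) -> float:
        """Serve `demand_mib_s` for `seconds`; return average MiB/s delivered."""
        baseline = self.baseline_mib_s
        wanted = min(demand_mib_s, self.burst_mib_s)
        if wanted <= baseline:
            # Below baseline: serve everything and bank the unused baseline.
            self.credits_mib += (baseline - wanted) * seconds
            return wanted
        # Above baseline: the extra throughput is paid for with credits.
        extra_needed = (wanted - baseline) * seconds
        extra_served = min(extra_needed, self.credits_mib)
        self.credits_mib -= extra_served
        return baseline + extra_served / seconds


# A small (20 GiB) file system under a steady 10 MiB/s of demand for two weeks.
model = BurstModel(size_gib=20, baseline_mib_s_per_gib=0.05,
                   burst_mib_s=100, credits_mib=1_000_000)
daily = [model.step(demand_mib_s=10, seconds=86_400) for _ in range(14)]
print([round(x, 2) for x in daily])
```

With these made-up numbers the first day runs at the full 10 MiB/s, the second day drains the remaining credits, and every day after that is stuck at the 1 MiB/s baseline, which is the "performance will bottom out" pattern the warning describes.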
You can read more about the ongoing issue here.
Instead of EFS, you will create a plain old NFS server running on CentOS.
NOTE: There are other, more advanced products you can use as a network file system; Ceph, for example, is a great choice. However, as part of a small team, I try to meet our needs with tools that keep overhead and complexity low (Ceph is fairly complex to set up and manage). Furthermore, NFS has been around for a while and has proven it can scale to handle as much traffic as we will ever need at Alchemy. Bitbucket uses an NFS server in its Bitbucket Data Center package, and GitLab infamously ran gitlab.com on a single server that served 100,000 repositories and 20,000 users.
Because Alchemy’s GitLab servers will see fairly high traffic, we want to make sure the NFS server’s disk IOPS and network bandwidth are tuned for the best performance. According to the NFS and CIFS Options for AWS session from re:Invent 2013, there are two simple things you can do to greatly improve performance. First, attach multiple EBS volumes to your instance and combine them into a RAID array. Striping data across several EBS volumes in a RAID 0 array can multiply the IOPS available to you:
The resulting size of a RAID 0 array is the sum of the sizes of the volumes within it, and the bandwidth is the sum of the available bandwidth of the volumes within it … For example, two 500 GiB Amazon EBS io1 volumes with 4,000 provisioned IOPS each will create a 1000 GiB RAID 0 array with an available bandwidth of 8,000 IOPS and 1,000 MB/s of throughput…
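The arithmetic in the quote is simple enough to sketch in Python. `EbsVolume` and `raid0_totals` are hypothetical helpers, and the 500 MB/s per-volume throughput is inferred from the quoted 1,000 MB/s total rather than taken from the documentation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EbsVolume:
    size_gib: int
    iops: int
    throughput_mb_s: int


def raid0_totals(volumes: list[EbsVolume]) -> tuple[int, int, int]:
    """Return (size GiB, IOPS, MB/s) for a RAID 0 stripe across `volumes`.

    RAID 0 stripes data with no parity or mirroring, so capacity, IOPS, and
    throughput are simple sums (before any instance-level bandwidth limit).
    """
    size = sum(v.size_gib for v in volumes)
    iops = sum(v.iops for v in volumes)
    throughput = sum(v.throughput_mb_s for v in volumes)
    return size, iops, throughput


# The quoted example: two 500 GiB io1 volumes at 4,000 provisioned IOPS each.
# 500 MB/s per volume is inferred from the quoted 1,000 MB/s total.
pair = [EbsVolume(size_gib=500, iops=4_000, throughput_mb_s=500)] * 2
print(raid0_totals(pair))  # (1000, 8000, 1000)
```

Keep in mind that RAID 0 has no redundancy: if any one volume in the stripe fails, the data on the whole array is lost, so regular snapshots or backups of the member volumes matter.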
According to the same NFS and CIFS Options for AWS session, eight EBS volumes is the sweet spot for RAID performance. With more than eight volumes in an array, the network bandwidth to EBS becomes the bottleneck.
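Why adding volumes eventually stops helping can be sketched as the minimum of two limits: the summed throughput of the striped volumes and the EBS bandwidth available to the instance. The function names and figures here (125 MB/s per volume, 1,000 MB/s for the instance) are hypothetical, chosen so the cap lands at eight volumes; real numbers depend on the volume type and the instance type.

```python
import math


def effective_throughput_mb_s(n_volumes: int, per_volume_mb_s: float,
                              instance_ebs_mb_s: float) -> float:
    """Striped throughput, capped by the instance's EBS bandwidth."""
    return min(n_volumes * per_volume_mb_s, instance_ebs_mb_s)


def volumes_to_saturate(per_volume_mb_s: float, instance_ebs_mb_s: float) -> int:
    """Smallest volume count whose summed throughput reaches the instance cap."""
    return math.ceil(instance_ebs_mb_s / per_volume_mb_s)


# Hypothetical numbers: 125 MB/s per volume, 1,000 MB/s of instance EBS bandwidth.
for n in (1, 2, 4, 8, 12, 16):
    print(n, effective_throughput_mb_s(n, 125, 1_000))
print("saturates at", volumes_to_saturate(125, 1_000), "volumes")
```

Past the saturation point, extra volumes add capacity but no throughput, which is why the instance’s EBS bandwidth matters as much as the array itself.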
The second thing you can do to ensure good performance on your NFS server is to pick an instance type that is EBS-optimized and has plenty of network bandwidth. The EBS RAID array can be extremely fast, but if all of your network bandwidth is eaten up, things will slow to a crawl. At Alchemy we are starting with an h1.2xlarge instance, and we can always move to a larger instance type later if we need to. In this tutorial we will use a t2.micro instance to save money, but please don’t do that in production.
Putting it all together
Head over to Part 2 of our series, Automating running GitLab on AWS using Terraform, Packer, and Ansible, to learn how to create this yourself.
Alchemy is always hiring great engineers! If you are excited about building great software with some of the world’s largest brands, email email@example.com.