Building the NeverLAN CTF Infrastructure

Over the past year I worked as a mentor with a group of five high-school students who were building a CTF aimed at middle-school students. The students decided early on to open the CTF to the Internet and allow anyone who wanted to play to register and participate.

After working on the project for many months, the students submitted a talk to SAINTCON, a medium-sized regional security conference, and were accepted.

From an engineering viewpoint, opening the CTF to the Internet presented some issues, as we were unsure whether there would be just a few local players or thousands of players from all around the world. We knew that the infrastructure needed to be repeatable, scalable and flexible in terms of resource usage. There were also monetary concerns, as we were working on a small budget.

In the beginning, the plan was to use a Mesosphere installation with Marathon running on Amazon Web Services (AWS) infrastructure. For those not familiar, Mesosphere is a fancy front-end for the Apache Mesos cloud framework, and Marathon is a Mesos orchestration platform that allows you to manage Docker containers in your cloud. As the CTF got closer, we decided that this setup was overkill. We still used AWS, but chose EC2 Container Service (ECS) to run Docker containers on Elastic Compute Cloud (EC2) virtual servers, with Ansible as our automation tool.

Before diving too deep into the infrastructure setup, let’s touch on each of the core technologies used for the CTF.

Docker was chosen because it allowed us to codify the environment needed for each challenge. This made it possible to quickly duplicate each challenge for use in a cloud or load-balanced environment.

Our secondary reason for choosing Docker was for the security offered through process and resource isolation. It can be argued that Docker containers can and have been broken out of, but it certainly raises the bar for a malicious actor.

Each challenge that needed to run processes was set up with a Dockerfile and put into a private GitHub repository. The GitHub repository was then linked to Docker Hub as an automated build, so that each time code was pushed to GitHub the container image was built and made available on Docker Hub.

All of the GitHub and Docker Hub repositories used for the CTF have now been made public.

For the scoreboard we used CTFd, the software used for the CSAW CTF. A Dockerfile was built for CTFd, setting it up to work with a data container and a database container. The Docker Compose file included in the CTFd code was used as a reference.

AWS was an easy choice, as it wasn’t reasonable from a cost perspective for us to run the CTF on our own hardware. As an added bonus, the students were exposed to a few of the multitude of services that AWS offers. Because AWS only charges for the time the services are running, and resources can be scaled up and down as needed, it was a very cost-effective option for our infrastructure needs. We only spent ~$40 total in AWS charges for February, the month the CTF was held.

AWS is well supported by Ansible, so we were able to codify the infrastructure and deploy and scale resources as needed.

Any time services are exposed to the Internet, there is a concern that they may be compromised, destroyed or misused. This is even truer when you are setting up vulnerable services on purpose, as a CTF does. We knew some sort of DevOps automation tool was needed so that the environment could be quickly recreated from scratch if something went wrong. We also wanted to be able to scale the environment with as little manual interaction as possible. If something went down in the middle of the night (and it did), none of us wanted to spend much time fixing, scaling or rebuilding the challenges.

Ansible was chosen over other DevOps tools such as Chef, Puppet or SaltStack because it is minimalist, powerful and agentless, making for very rapid development.

The Infrastructure

The infrastructure environment was divided between the scoreboard servers and the challenge servers. Both environments used almost identical setups with different Docker containers.

The AWS ECS service was used to deploy and maintain the Docker containers needed for each environment. ECS deploys virtual servers from a specific EC2 AMI; these servers run the Docker containers, map internal ports to external ports and restart containers if they go down. The ECS setup can be easily managed through the AWS console or, in our case, with Ansible scripts.
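
As a rough illustration of what ECS needs to know about each container, the sketch below builds a single container definition of the kind that goes into an ECS task definition. The field names follow the ECS task-definition format; the container name, image, ports and memory limit are hypothetical placeholders, not the values used for the CTF.

```python
def container_definition(name, image, container_port, host_port, memory_mib=128):
    """Build one ECS-style container definition as a plain dict.

    Every value passed in below is a hypothetical placeholder.
    """
    return {
        "name": name,
        "image": image,
        "memory": memory_mib,  # hard memory limit in MiB
        "essential": True,     # if this container stops, the whole task stops
        "portMappings": [
            {
                "containerPort": container_port,  # port inside the container
                "hostPort": host_port,            # port exposed on the EC2 host
                "protocol": "tcp",
            }
        ],
    }


example = container_definition(
    name="example-challenge",
    image="exampleorg/example-challenge:latest",
    container_port=80,
    host_port=8001,
)
```

A list of such dicts forms the container definitions of a task definition. An ECS service that runs the task with a desired count is what replaces containers that go down.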

The EC2 instances running the ECS services sat in a Virtual Private Cloud (VPC) and were accessed through Elastic Load Balancing (ELB). An Elastic IP was assigned to the ELB for the scoreboard and challenges.

Note that even though only one EC2 instance was used for the scoreboard, a load balancer was still used, as this was the only way we saw to use the AWS Certificate Manager to add an SSL/TLS certificate and make the scoreboard available over HTTPS.

As for building the infrastructure with Ansible, we started by using the AWS EC2 inventory script to pull a dynamic inventory directly through the AWS API. The EC2 instances were tagged (with Ansible), and these tags were used to differentiate between the scoreboard and challenge pieces of the infrastructure.
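
The idea of splitting a dynamic inventory by tag can be sketched in a few lines. This is not the EC2 inventory script itself, only a minimal illustration that groups instances by the value of one tag. The input is shaped like the reservations in an EC2 DescribeInstances response; the tag key "role", its values and the IP addresses are assumptions made for the example.

```python
from collections import defaultdict


def group_by_tag(reservations, tag_key="role"):
    """Map each value of `tag_key` to the public IPs of instances carrying it."""
    groups = defaultdict(list)
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            address = instance.get("PublicIpAddress")
            # Skip instances without the tag or without a public address.
            if tag_key in tags and address:
                groups[tags[tag_key]].append(address)
    return dict(groups)


# Hypothetical sample data; addresses come from a documentation-only range.
sample = [
    {"Instances": [
        {"PublicIpAddress": "203.0.113.10",
         "Tags": [{"Key": "role", "Value": "scoreboard"}]},
        {"PublicIpAddress": "203.0.113.20",
         "Tags": [{"Key": "role", "Value": "challenges"}]},
    ]},
    {"Instances": [
        {"PublicIpAddress": "203.0.113.21",
         "Tags": [{"Key": "role", "Value": "challenges"}]},
    ]},
]

inventory = group_by_tag(sample)
```

With groups like these, a playbook can target only the scoreboard hosts or only the challenge hosts without any hard-coded addresses.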

A challenge role and a scoreboard role were written to build their respective pieces of the infrastructure, and playbooks were written to use these roles. We now had the ability to build the CTF infrastructure from scratch and to add or update challenges with minimal effort. The playbooks were fairly straightforward, as there are Ansible modules for all the AWS components used in the CTF infrastructure.

The load balancer port mappings and the VPC subnet were set up manually in AWS, as we waited too long to get started on the automation scripts and had already started manually building and testing the infrastructure.

All of the Ansible code for the CTF is now publicly available on GitHub.

Lessons learned

  1. Use more variables in Ansible playbooks to avoid tweaking things by hand.
  2. Automate everything. The manual steps were easy to forget as challenges were being added to the server.
  3. Start earlier. We ran into immediate problems with the SQL challenges chewing up all the resources on the t2.micro EC2 instances. This could have been avoided with even a minimal amount of testing with something like sqlmap.
  4. Set the ELB heartbeat (health check) to something representative of what is being load balanced. As luck would have it, the heartbeat was set to monitor the sql_fun1 challenge, and when that challenge went down the ELB removed both challenge servers from the pool, taking down all the challenges.
  5. Ansible saved the day when the SQL challenges became a problem and we were able to switch from t2.micro to t2.medium in a matter of minutes with no downtime on the challenges.
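
The heartbeat problem from lesson 4 can be reproduced with a toy model. The sketch below is not how ELB is implemented; it only assumes that a load balancer drops every server on which the single probed service is down. The server names and the "web1" and "status" services are made up for illustration; sql_fun1 is the challenge named above.

```python
def healthy_pool(servers, check):
    """Return the servers a load balancer would keep in rotation.

    `servers` maps a server name to the set of services currently up on it.
    `check` is the single service that the health check probes.
    """
    return [name for name, up in servers.items() if check in up]


# Hypothetical state: sql_fun1 has crashed on both challenge servers,
# while every other service on them is still running.
servers = {
    "challenge-a": {"web1", "status"},
    "challenge-b": {"web1", "status"},
}

# Probing one specific challenge empties the pool although the hosts are fine.
pool_probing_challenge = healthy_pool(servers, "sql_fun1")

# Probing a lightweight host-level service keeps both servers in rotation.
pool_probing_status = healthy_pool(servers, "status")
```

In this model the first pool comes back empty and the second still holds both servers, which is why a check that represents the host as a whole is the safer choice.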

In the end, it was a fun and challenging CTF, made all the more worthwhile by comments such as this.