Accelerating the debugging process by using a hybrid environment

Boris Churzin
Fundbox Engineering
6 min read · Jun 28, 2021

As developers working in continuous integration environments, we constantly strive to accelerate the development and testing process. As part of the widespread adoption of the "Shift Left" approach, we have realized that testing from day one helps us find issues early in the software development lifecycle (SDLC). Fixing these issues early takes significantly less time and fewer resources, thereby reducing development costs. But as systems grow bigger and more complex, involving countless microservices and databases, maintaining speed throughout the development and debugging processes becomes challenging. In this blog post, I will share the challenges we faced and how we solved them with an innovative approach developed during our latest hackathon at Fundbox.

A little background

At Fundbox, we offer an AI-powered financial platform for small businesses that provides fast and intuitive access to business credit. Since our founding in 2013, nearly 300,000 small businesses have put their trust in us. Because we operate a financial system, our production environment is relatively large and complex, involving a multitude of interdependent microservices, databases, load balancers, and countless other elements.

As we develop additional features, they are debugged and tested before the new code is pushed to the main branch and deployed. To understand how each feature interacts with the rest of the system, we need a way to simulate the production environment in which the various microservices run.

There are two options for simulating this production environment, each with its advantages and drawbacks.

Option 1 — running a comprehensive test machine in a remote cloud-based environment

This option involves launching a testing machine in a remote cloud environment that replicates our entire production ecosystem, with all of the services, empty databases, and so on. Unfortunately, this option has several limitations:

  • No debugging — it is impossible to debug in this environment because the Docker images are bootstrapped without development or debugging capabilities, and they do not have the right environment parameters. We might go in this direction in the future, but currently it is too complicated to make it work.
  • Long feedback loop — even if you only want to test something small and the test machine is already up, you still need to upload the code, replace the Docker image, restart the machine, check the migrations, and so on. Typically, this entire process takes around 5–10 minutes. For a fast-paced agile CI/CD organization, this feedback loop is simply too long.

Option 2 — running the test machine locally in a virtual development environment

In this option, the developer runs a Vagrant-managed virtual machine locally. This VM is a development environment that mimics production by providing a similar operating system, packages, users, and configuration, while still giving developers the flexibility to use their favorite IDE. Although this solution solves the latency problem, it has issues of its own:

  • High utilization of local machine resources — while running on the developer’s laptop, the VM consumes much of the CPU power, drains the battery, and so on.
  • Slow image updates — when the image needs to be updated, stopping the machine and applying the update may take a considerable amount of time.
  • Different tooling — while the production environment utilizes many cloud solutions, our VM is managed by Puppet, which takes considerable time and effort to maintain.
  • Black box — the solution is a closed black box with many moving parts managed by the infrastructure team. So if something goes wrong, it is typically very complicated for a developer to find the root cause of the issue and fix it.

The solution — a hybrid environment

We tried to think of a solution that would give us the best of both worlds — on the one hand, the ability to work locally and enjoy fast response times, and on the other hand, a lightweight local environment (i.e., one that does not need to simulate our entire production environment). We came up with a container-based hybrid solution.

The solution includes a "generic" test machine that runs in the cloud with all the microservices and is ready to be used at any time. The developer runs a script that disables the desired service in the cloud-based environment and then runs it in a Docker container on the local machine. When the local service needs to interact with another service or a database that lives in the cloud-based environment, it communicates through a local port on the local machine, and this local port connects to the remote cloud-based test machine through a forward or reverse SSH tunnel.
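Conceptually, the developer-facing script does three things: stop the service on the remote test machine, start the same service in a local container, and open the tunnels in both directions. Below is a minimal Python sketch of that flow; the service name, host name, image, ports, and the command used to disable the remote service are hypothetical placeholders rather than our actual tooling.

```python
import subprocess

SERVICE = "billing-api"                  # hypothetical service name
TEST_MACHINE = "test-machine.internal"   # hypothetical remote test host
SERVICE_PORT = 8000                      # port the service listens on

# 1. Disable the service on the remote test machine (hypothetical command;
#    in practice this would ask the orchestrator to stop the container).
subprocess.run(["ssh", TEST_MACHINE, f"sudo systemctl stop {SERVICE}"], check=True)

# 2. Run the same service image locally, publishing its port.
subprocess.run(
    ["docker", "run", "-d", "--name", SERVICE,
     "-p", f"{SERVICE_PORT}:{SERVICE_PORT}",
     f"registry.example.com/{SERVICE}:dev"],   # hypothetical image name
    check=True,
)

# 3. Reverse tunnel: requests hitting SERVICE_PORT on the test machine
#    are forwarded to the local container.
subprocess.Popen(["ssh", "-N", "-R", f"{SERVICE_PORT}:localhost:{SERVICE_PORT}", TEST_MACHINE])

# 4. Forward tunnel: the local service reaches a remote dependency
#    (e.g., the database on port 5432) through a local port.
subprocess.Popen(["ssh", "-N", "-L", "5432:localhost:5432", TEST_MACHINE])
```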

On the remote side, the only wait time is the initial setup of the remote test machine, which is done once and which we plan to automate in the future by managing a fleet of test machines. On the local side, instead of running the resource-intensive VM environment, the developer only needs to run a single Docker container and can use the IDE normally. When debugging, the IDE connects to the process’s debug port.
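For example, if the service being debugged were a Python application, its entrypoint could open a debug port for the IDE to attach to. This is a rough sketch assuming the debugpy debugger and port 5678, neither of which is specified by our setup; other stacks would expose their own debugger port instead.

```python
# Sketch of a service entrypoint that opens a debug port when started locally
# (assumes the debugpy package; port 5678 is an arbitrary choice).
import os

if os.getenv("DEBUG") == "1":
    import debugpy
    # Bind to all interfaces so the port published with `docker run -p 5678:5678`
    # is reachable by the IDE running on the host machine.
    debugpy.listen(("0.0.0.0", 5678))
    debugpy.wait_for_client()  # optionally block until the IDE attaches
```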

Of course, this solution will not work for everybody. For it to work, you need to be able to run the entire environment (with all the microservices and databases) on a single testing machine, which is not necessarily the norm among development/production environments. It also helped that we already run Docker containers in production, which allowed us to move services seamlessly between the remote and local environments.

Developing an MVP during the Fundbox hackathon

The Fundbox hackathon is a two-day event in which employees can pitch an idea and, if it is approved, a small team develops an MVP. As part of this effort, a small team of developers and infrastructure engineers wrote the scripts that disable the remote service and enable it locally. Additional scripts were responsible for dynamically creating SSH tunnels for all the ports in use.
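To give a flavor of the dynamic tunneling scripts, here is a rough Python sketch that assembles a single ssh command with one -L (forward) or -R (reverse) flag per port. The host name and port lists are hypothetical; the real scripts discover the ports in use rather than hard-coding them.

```python
import subprocess

TEST_MACHINE = "test-machine.internal"   # hypothetical remote test host

# Remote ports the local service needs to reach (hypothetical examples:
# Postgres, Redis, Elasticsearch).
FORWARD_PORTS = [5432, 6379, 9200]
# Ports on which remote services call back into the locally running service.
REVERSE_PORTS = [8000]

def build_tunnel_command(forward_ports, reverse_ports, host):
    """Build one ssh command that opens all forward (-L) and reverse (-R) tunnels."""
    cmd = ["ssh", "-N", host]
    for port in forward_ports:
        cmd += ["-L", f"{port}:localhost:{port}"]
    for port in reverse_ports:
        cmd += ["-R", f"{port}:localhost:{port}"]
    return cmd

if __name__ == "__main__":
    cmd = build_tunnel_command(FORWARD_PORTS, REVERSE_PORTS, TEST_MACHINE)
    print("Opening tunnels:", " ".join(cmd))
    subprocess.run(cmd)   # blocks while the tunnels are open
```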

One of the main challenges we faced stemmed from the load balancers that sit in front of the remote machine. These load balancers listen on a well-known port (the one the local service knows about) and reroute traffic to a randomly generated port that the orchestrator assigns to each container. This meant we had to find a way to pass through the load balancers to reach the remote services. The problem is that the load balancers are internal and do not accept connections from outside their local network. To work around this, we created another SSH tunnel, from the local machine to the remote machine and then on to the load balancer (Docker→Mac→Tunnel→Load balancer→Remote machine).
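The workaround boils down to a standard SSH port forward that uses the test machine as a hop into its internal network. The host names and port below are hypothetical placeholders, not our actual infrastructure.

```python
import subprocess

TEST_MACHINE = "test-machine.internal"   # hypothetical remote test host
INTERNAL_LB = "internal-lb.internal"     # hypothetical internal load balancer
LB_PORT = 8080                           # well-known port the services call

# Forward tunnel that hops through the test machine to reach the internal
# load balancer: local port 8080 -> test machine -> load balancer.
subprocess.Popen(
    ["ssh", "-N", "-L", f"{LB_PORT}:{INTERNAL_LB}:{LB_PORT}", TEST_MACHINE]
)

# The local Docker container can then reach the load balancer through the host,
# e.g. via http://host.docker.internal:8080 on Docker Desktop for Mac.
```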

We put together a working solution during the hackathon, and we are now considering adopting it as a replacement for our current development workflow. Future developments will include automatic installation and setup of the local ecosystem (Docker, credentials, etc.), support for multiple local services that can communicate without routing through the remote machine, and more.
