Feedback Loops and Solving the Right Problem

Early and regular feedback keeps us on track when solving difficult problems

Michael Fitzmaurice
Arup’s City Modelling Lab
7 min read · Jun 1, 2020


Image by OpenClipart-Vectors from Pixabay

The Question

There is a question we regularly ask ourselves in the City Modelling Lab:

“What is the problem we are trying to solve?”

It is a device to keep us focused on the right goals. We need it because it’s easy to disappear down a series of rabbit holes and lose sight of what exactly we set out to do in the first place.

Plus, as we work our way through a problem, we may realise that the problem we thought we wanted to solve isn’t really the right thing. Our understanding grows, and our target moves accordingly.

You are solving the wrong problem

“The Question” helps keep us honest, but one particular pattern repeatedly arises in answering it: feedback loops.

Aza Raskin’s excellent exploration of what it means to solve the right problem is very dear to our hearts. Using the example of Paul MacCready’s attempt to build an entirely human-powered aircraft to fly across the English Channel, Aza illustrates how solving a complex problem requires measuring progress and making adjustments at regular intervals: a feedback loop.

A generic feedback loop — image by author

As Aza writes (emphasis mine):

MacCready’s insight was that everyone working on solving human-powered flight would spend upwards of a year building an airplane on conjecture and theory without the grounding of empirical tests.

Triumphantly, they’d complete their plane and wheel it out for a test flight. Minutes later, a year’s worth of work would smash into the ground. With that single new data point, the team would work for another year to rebuild, retest, relearn.

Paul realized that the problem to be solved was not, in fact, human powered flight. That was a red herring. The problem was the process itself, and along with it the blind pursuit of a goal without a deeper understanding of how to tackle deeply difficult challenges.

He came up with a new problem that he set out to solve: how can you build a plane that could be rebuilt in hours not months?

One of the biggest reasons MacCready succeeded where so many others had failed was that he shortened his feedback cycle enormously.

Although we’re not in the business of building human-powered aircraft, we recognise the importance of feedback loops in staying on the right path and making faster progress, so we work hard to create, tighten and amplify them.

What does this have to do with developing software?

I’m long enough in the tooth to remember when Extreme Programming entered the scene. Before software engineers talked about Kanban, Scrum or Lean Startups, we talked about Extreme Programming.

The idea at the heart of XP, and all agile approaches that have since followed, is to adapt your processes so as to receive feedback early and often. This feedback allows you to course-correct in small, regular steps, giving you a better chance of reaching a distant goal (or “solving a difficult problem”, as Aza would have it).

For example, if testing our software to prove its functional correctness provides valuable feedback — what if we tested it all the time, even as we write it? And what if we automated that testing? That seems like a faster, more powerful feedback loop.

Here in the City Modelling Lab, we strive for high levels of unit test coverage. For Python code, we use pytest and set code coverage thresholds that scream at us if we drop below them.
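As a sketch of what this looks like in practice (the function and module names below are illustrative, not real Lab code): pytest discovers `test_*` functions automatically, and the pytest-cov plugin can fail the run when coverage drops below a threshold.

```python
# network_utils.py -- a toy function standing in for real Lab code
def link_capacity(lanes: int, per_lane: float = 1800.0) -> float:
    """Hourly vehicle capacity of a road link (illustrative example)."""
    if lanes < 1:
        raise ValueError("a link needs at least one lane")
    return lanes * per_lane


# test_network_utils.py -- pytest collects and runs test_* functions
def test_two_lane_link():
    assert link_capacity(2) == 3600.0
```

Running `pytest --cov=network_utils --cov-fail-under=90` then fails the build whenever coverage of the module slips below 90% — the threshold that “screams at us”.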

All is well

Writing automated unit tests has never been easier — the toolchain is now well-established and polished for practically every programming language. But what if similar tools don’t exist for the exact feedback loop we have in mind? Well, sometimes, we build those tools.

Taming a puma

We have built an application called Puma that takes public transit schedule and OpenStreetMap data as input, combining them into a graph that maps transit services onto the physical network. We use the resultant network as input to MATSim, an open-source Agent-Based Modelling tool.

Generating large networks is memory-hungry and computationally expensive, so we run such jobs in the cloud on AWS. However, even on decent-sized boxes, larger jobs would run for a few hours and then keel over, having exhausted the memory on the box. We jumped back into the code, looking for opportunities to be more memory-efficient.

“UK Motorways and A-roads” by tommyh, licensed under CC BY-SA 2.0

We had an existing feedback loop, but it was long and baggy: modify the code, theorising that we’ve made it more memory-efficient, deploy to AWS, re-run our job, then wait several hours (all the while burning money on AWS costs) to see if we have solved the problem.

This felt like the software engineering equivalent of the years of failed efforts to conquer human-powered flight described in Aza’s blog post. Just as Paul MacCready needed to create a plane that could be built and rebuilt very quickly, we also needed to shorten our feedback cycle to make faster progress.

To this end, we created a shell script that uses mprof to profile Puma running with small test inputs and compares memory usage to a supplied benchmark. This gives us a quick way to see the effects of our code changes — is the dial moving in the right direction?
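The real script lives alongside Puma, but the core of the comparison can be sketched like this: pull the peak from mprof’s `.dat` output (lines of the form `MEM <MiB> <timestamp>`) and check it against a benchmark figure. The 5% tolerance here is an illustrative choice, not our actual setting.

```python
def peak_memory_mb(mprof_dat: str) -> float:
    """Peak memory from the contents of an mprof .dat file.

    mprof samples memory and writes lines like 'MEM 42.0 1590000001.0'.
    """
    samples = [float(line.split()[1])
               for line in mprof_dat.splitlines()
               if line.startswith("MEM ")]
    if not samples:
        raise ValueError("no MEM samples found in profile")
    return max(samples)


def within_benchmark(peak_mb: float, benchmark_mb: float,
                     tolerance: float = 0.05) -> bool:
    """Pass if peak memory is no more than `tolerance` above the benchmark."""
    return peak_mb <= benchmark_mb * (1.0 + tolerance)
```

A failing comparison is the “dial moving in the wrong direction” — the script exits non-zero and we know within seconds, not hours.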

Our feedback loop went from hours to seconds.

Fast feedback on Puma’s memory footprint

We added this tool to Puma’s Continuous Integration build, right alongside our unit tests, so the build fails if a commit increases the memory footprint. A fast feedback mechanism nested inside another fast feedback mechanism.

Puma’s Continuous Integration Build

Whatsim?

In a similar quest for early and regular feedback, we developed a tool called BitSim using Python, AWS Batch and AWS Step Functions.

MATSim simulations are configured with a number of iterations — the number of times you want to run through the simulated day. Agents in the simulation can modify their transport plans with each iteration, based on learning from previous iterations, in an attempt to optimise for things like time and money spent travelling.

Some simulations “go bad” early, meaning utility scores for agent plans are pathologically bad from an early point and degrade further with each iteration. It is apparent a long time before the final iteration that these models will never converge. We also see cases where a model converges early, and subsequent iterations add nothing further.

Before BitSim, we would discover these scoring trends only after the simulation had run to completion. As with Puma, we run these jobs in AWS, which can take many hours. We realised we could save ourselves time and money by automatically detecting simulations that had either already converged or were never going to and taking early action to halt the job.

Enter BitSim. BitSim terminates simulations early if they match conditions that indicate pathology or early convergence.

Early stopping in BitSim

BitSim++

Although BitSim’s automated early stopping works well for us, we aim to go further. What if we could automatically pause the simulation every few iterations, inspect the output, stop if need be, or… modify the configuration for the next set of iterations in an attempt to reach earlier convergence?

In the team, we refer to this more complex kind of intervention as “full BitSim”, and it’s a great example of tightening a feedback loop to make faster progress towards your goal.
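Sketched as code, a “full BitSim” control loop might look roughly like this — every callable here is a placeholder for work we would delegate to AWS Batch and Step Functions, not a real BitSim or MATSim API:

```python
def run_with_interventions(run_iterations, inspect, adjust,
                           config, chunk=10, max_iters=200):
    """Hypothetical control loop: run a few iterations, inspect the
    output, then stop or tweak the config before continuing.

    All four callables are placeholders for illustration only.
    """
    done = 0
    while done < max_iters:
        results = run_iterations(config, chunk)  # e.g. submit a MATSim batch job
        done += chunk
        verdict = inspect(results)               # "stop", "adjust", or "continue"
        if verdict == "stop":
            break
        if verdict == "adjust":
            config = adjust(config, results)     # nudge toward earlier convergence
    return config, done
```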

How we approach a problem changes when we have fast and regular feedback to guide us. New possibilities appear. For instance, with “full BitSim”, we could confidently and (relatively) cheaply launch many simulations in parallel to explore the hyperparameter space. This would not be feasible without BitSim’s fast feedback loop.

The TL;DR

When solving a complex problem, we often need to reframe the problem before we can make progress. In many cases, the first problem we must solve is creating a fast feedback loop to guide us towards our overall goal.

As Aza concludes:

When you are solving a difficult problem, re-ask the problem so that your solution helps you learn faster. Find a faster way to fail, recover, and try again.



I'm a software engineer in Arup's City Modelling Lab, where we use agent-based models to help improve transport and cities.