The Deep Tech part of building a Deep Tech company (part 2)

In the previous episode, I told the story of how we collected and created a giant dataset to prove that our problem was solvable, and that construction schedules were a learnable input. In this episode, I’ll tell you more about how we built our first models, which let us show that the approach works and raise capital to build the company.

Can we do some ML now please?

We had to train one component at a time, and find an objective for each one that would correlate to the predictive ability of its successors in the pipeline. We struggled for a while on what would be the best architectures and the best strategy, and finally converged on a technique that I call “proxy evaluation”. For every step that is not being currently trained, we optimise hyper-parameters by attaching the rest of the trained architecture and seeing if that improves the results of the final output. It’s indirect, it’s inefficient, and it’s just plain wrong if you are looking for a truly optimal solution. However, at this particular point in our history we were looking for something that would just do the job, and this did the job just fine. We’ve since found ways of improving this simple greedy strategy, and we continue to constantly work on improving our pipeline.
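To make the idea concrete, here is a minimal sketch of proxy evaluation as a plain grid search. It is an illustration under assumptions, not our actual pipeline: `build_stage`, `downstream`, `final_metric` and the grid are all hypothetical names, and the already-trained stages are represented as ordinary Python callables.

```python
from itertools import product


def proxy_evaluate(build_stage, downstream, final_metric, grid, data, targets):
    """Choose hyper-parameters for one stage by scoring the final pipeline output.

    build_stage  -- hypothetical factory: build_stage(**params) returns a callable stage
    downstream   -- list of already-trained callables applied after the stage, in order
    final_metric -- final_metric(outputs, targets) -> float, higher is better
    grid         -- dict mapping each hyper-parameter name to candidate values
    """
    best_params, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        stage = build_stage(**params)
        # Attach the rest of the trained architecture to the candidate stage.
        outputs = stage(data)
        for step in downstream:
            outputs = step(outputs)
        # Judge the candidate only by what comes out at the very end.
        score = final_metric(outputs, targets)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy example: the "stage" scales its inputs, the downstream step adds one.
def build_scaler(scale):
    return lambda xs: [scale * x for x in xs]


def add_one(xs):
    return [x + 1 for x in xs]


def negative_squared_error(outputs, targets):
    return -sum((o - t) ** 2 for o, t in zip(outputs, targets))


best_params, best_score = proxy_evaluate(
    build_scaler, [add_one], negative_squared_error,
    {"scale": [1, 2, 3]}, [1, 2, 3], [3, 5, 7],
)
```

The search is greedy: each stage is tuned with the others held fixed, which is why it cannot promise a jointly optimal pipeline.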

Some of the scaling problems we had to face on training individual models were (in no particular order):

  • The training set did not fit in main memory, so we had to stream from disk
  • H5 does not deal well with shuffling instances, so we had to find a workaround. Not shuffling our dataset was not an option: we tried it, and the models stopped learning almost completely. We settled initially on pre-shuffling into shards and then shuffling the shards themselves, which is similar to how shards work in TFRecords.
  • Making a change to the dataset took over 2 days and was very expensive.
  • Optimising and training a model took another 3 days of GPU time, which meant that experiments had to be carefully designed.
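The shard workaround from the list above can be sketched roughly as follows. This is an assumption-laden illustration: in-memory lists stand in for shard files on disk, the function names are hypothetical, and shuffling both the shard order and the contents of each shard is one reading of "shuffling the shards themselves".

```python
import random


def make_shards(instances, num_shards, seed=0):
    """One-off step: shuffle the whole dataset once and split it into shards."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    # Round-robin split so the shards end up with near-equal sizes.
    return [shuffled[i::num_shards] for i in range(num_shards)]


def stream_epoch(shards, rng):
    """Yield every instance once, holding only one shard in memory at a time."""
    order = list(range(len(shards)))
    rng.shuffle(order)  # a different shard order on every epoch
    for index in order:
        shard = list(shards[index])  # in practice: read one shard file from disk
        rng.shuffle(shard)  # cheap in-memory shuffle inside the shard
        yield from shard


shards = make_shards(range(20), num_shards=4)
epoch = list(stream_epoch(shards, random.Random(1)))
```

The result is not a perfectly uniform shuffle of the full dataset, but each epoch sees the data in a different order without ever loading more than one shard.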

We did fortunately manage to overcome these obstacles, and create a model that predicts the outcomes of construction activities very well. Sometimes I am still surprised by how well our latest model performs, and I can only thank our amazing engineering team for this.

There are some other scale-related issues that we had to resolve later on, which we consciously decided to file under “acceptable tech debt for now”:

  • Switching to data streaming to avoid using hundreds of GB of memory
  • Using serverless training (we eventually did this on GCP AI Platform)
  • Automating the calculation of metrics over different slices of the dataset (e.g. getting per-client performance)
  • Automatically detecting data leakage between the training and test set — we had a special case where it was possible to get repeated instances in both.
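The last two items in that list are simple to prototype. The sketch below is only an assumption about how one might do it: instances are taken to be JSON-serialisable dicts, leakage is limited to exact duplicates, and the field names (`client`, `error`) are hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(instance):
    """Stable hash of an instance, independent of key order."""
    payload = json.dumps(instance, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def find_leaks(train, test):
    """Return the test instances that also appear, exactly, in the training set."""
    train_hashes = {fingerprint(x) for x in train}
    return [x for x in test if fingerprint(x) in train_hashes]


def metric_by_slice(rows, slice_key, metric):
    """Group rows by a field (e.g. client) and compute the metric for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[slice_key]].append(row)
    return {name: metric(group) for name, group in groups.items()}


def mean_error(rows):
    return sum(r["error"] for r in rows) / len(rows)


leaks = find_leaks(
    train=[{"a": 1, "b": 2}, {"a": 3}],
    test=[{"b": 2, "a": 1}, {"a": 9}],
)
per_client = metric_by_slice(
    [
        {"client": "A", "error": 2.0},
        {"client": "A", "error": 4.0},
        {"client": "B", "error": 1.0},
    ],
    "client",
    mean_error,
)
```

Exact hashing only catches repeated instances; near-duplicates would need a looser notion of equality.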

Building a product

What follows naturally is that we should build something that is useful to a project director, planner or portfolio manager. When we asked our prospective clients what information they would be interested in seeing, so that we could then figure out how to generate it, the answer we heard the most was about risk. More specifically, they wanted to know the risk of the entire project overrunning, how every activity affected that risk, and how their project portfolio might be disproportionately affected by one of the many projects happening concurrently.

Anonymised typical output from our project overview

To produce accurate risk measures, we had to build an entire simulation engine based on the inputs of the ML model. This was not an easy task. If one thinks of a schedule as a Directed Acyclic Graph (DAG), one can see how it can be simulated with a Monte Carlo-style approach. This is also, quite conveniently, what the industry already does, but using subjective inputs and some incorrect assumptions about tail risk. I wrote more about that here.
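As a rough illustration of the idea (not our engine), a schedule can be simulated by sampling a duration for every activity, propagating finish times through the DAG in topological order, and repeating many times. Everything named here is an assumption: the `samplers` are hypothetical stand-ins for whatever the ML model predicts, and `graphlib` requires Python 3.9 or later.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def finish_time(durations, predecessors):
    """Project end time: each activity starts once all its predecessors finish."""
    graph = {act: predecessors.get(act, ()) for act in durations}
    finish = {}
    for act in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[act]), default=0.0)
        finish[act] = start + durations[act]
    return max(finish.values())


def simulate(samplers, predecessors, runs=10_000, seed=0):
    """Monte Carlo: draw every duration, recompute the end time, repeat."""
    rng = random.Random(seed)
    ends = []
    for _ in range(runs):
        durations = {act: draw(rng) for act, draw in samplers.items()}
        ends.append(finish_time(durations, predecessors))
    return ends


def overrun_probability(ends, deadline):
    """Fraction of simulated projects that finish after the deadline."""
    return sum(end > deadline for end in ends) / len(ends)


# Hypothetical three-activity schedule: "walls" and "drains" both follow "dig".
predecessors = {"walls": ["dig"], "drains": ["dig"]}
samplers = {
    "dig": lambda rng: rng.triangular(4, 10, 5),  # low, high, mode (days)
    "walls": lambda rng: rng.triangular(8, 20, 10),
    "drains": lambda rng: rng.triangular(3, 6, 4),
}
ends = simulate(samplers, predecessors, runs=2_000)
risk = overrun_probability(ends, deadline=18)
```

The triangular distributions are exactly the kind of subjective input the industry tends to use; the point of the ML model is to replace them with learned ones.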

We built a Monte Carlo engine that worked in a similar way, and we added our own risk measures along the way. The one that we like the most we call “Risk Share”. It’s the independent potential that each individual activity has to change the end date of the entire project, expressed as a percentage of all the total possible deviations. So, for example, an activity that consistently moves the end date by a small amount will have a lower Risk Share than one that consistently moves it by a lot. It will also have a similar Risk Share to one that moves the end date infrequently but, when it does, moves it by a lot.
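One way to read that definition, offered purely as an assumption and not as our production formula, is to vary one activity at a time while holding every other activity at its planned duration, take the mean absolute shift in the end date, and normalise the shifts so they sum to 100%. All names below are hypothetical.

```python
import random


def end_date(durations, predecessors, order):
    """End of the project, given durations and a topological order of activities."""
    finish = {}
    for act in order:
        start = max((finish[p] for p in predecessors.get(act, ())), default=0.0)
        finish[act] = start + durations[act]
    return max(finish.values())


def risk_share(planned, samplers, predecessors, order, runs=5_000, seed=0):
    """Percentage of the total expected end-date deviation caused by each activity."""
    rng = random.Random(seed)
    baseline = end_date(planned, predecessors, order)
    impact = {}
    for act, draw in samplers.items():
        total = 0.0
        for _ in range(runs):
            durations = dict(planned)
            durations[act] = draw(rng)  # only this activity deviates from plan
            total += abs(end_date(durations, predecessors, order) - baseline)
        impact[act] = total / runs
    grand_total = sum(impact.values())
    if grand_total == 0:
        return {act: 0.0 for act in impact}
    return {act: 100.0 * value / grand_total for act, value in impact.items()}


# "a" then "b" form the critical chain; "c" runs in parallel with plenty of float.
planned = {"a": 5.0, "b": 5.0, "c": 1.0}
samplers = {
    "a": lambda rng: 6.0,  # always one day late
    "b": lambda rng: 15.0 if rng.random() < 0.1 else 5.0,  # rarely ten days late
    "c": lambda rng: 1.0 + rng.random(),  # varies, but never reaches the end date
}
shares = risk_share(planned, samplers, {"b": ["a"]}, order=["a", "b", "c"])
```

In this toy schedule the small-but-constant activity and the rare-but-large one end up with similar shares, while the activity with float gets none, which matches the behaviour described above.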

Anonymised typical output from our milestone predictions

Our clients use this information to find the most relevant mitigation strategies or plan changes that reduce the risk of delaying the project. This means that less risk margin is required, and it is less likely that the project will be over budget because of any potential delays.

We now run regular workshops and learning sessions to better understand what our clients need at every step of using the product. It has been incredible to see how, by engaging the community in the right way, we have been able to focus on building a valuable product from the very beginning.

Conclusion

If I have inspired you to work on something difficult, start your own company, and rise up to whatever your challenges may be, I have done more than I expected to with this post.

Get in touch

You can also get in touch at twitter.com/nplanHQ and twitter.com/nitbix

Co-founder & CTO @ nPlan