In the previous episode, I told the story of how we collected and created a giant dataset to prove that our problem was solvable, and that construction schedules were a learnable input. In this episode, I’ll tell you more about how we built our first models, which let us show that the approach works and raise capital to build the company.
Can we do some ML now please?
We gave ourselves the mammoth task of creating a pipeline we could use to run experiments, and to create a finalised model that would go into the product, whilst facing the same scale problems that affected all our other endeavours. We were in fact not building one single model, but a structure made up of multiple modules, such as word and paragraph embeddings, and multiple classifiers on various targets.
We had to train one component at a time, and find an objective for each one that would correlate with the predictive ability of its successors in the pipeline. We struggled for a while to decide on the best architectures and the best strategy, and finally converged on a technique that I call “proxy evaluation”. For every step that is not currently being trained, we optimise hyper-parameters by attaching the rest of the trained architecture and seeing if that improves the results of the final output. It’s indirect, it’s inefficient, and it’s just plain wrong if you are looking for a truly optimal solution. However, at this particular point in our history we were looking for something that would just do the job, and this did the job just fine. We’ve since found ways of improving this simple greedy strategy, and we keep working on improving our pipeline.
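To make the idea concrete, here is a toy sketch of proxy evaluation in Python. It is not our actual pipeline: the "upstream" component is a hypothetical bucketing step with a single hyper-parameter (`width`), the "downstream" component is a hypothetical majority-vote classifier, and the data is synthetic. The only point is that the upstream setting is chosen by the score of the final output, not by any objective of its own.

```python
import random
from collections import Counter, defaultdict

def fit_upstream(width):
    """Hypothetical upstream component: maps a raw value to a bucket id."""
    return lambda x: int(x // width)

def fit_downstream(features, labels):
    """Hypothetical downstream classifier: majority label seen per bucket."""
    votes = defaultdict(Counter)
    for feature, label in zip(features, labels):
        votes[feature][label] += 1
    fallback = Counter(labels).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in votes.items()}
    return lambda feature: table.get(feature, fallback)

def proxy_score(width, train, valid):
    """Score an upstream setting by the accuracy of the whole pipeline."""
    embed = fit_upstream(width)
    classify = fit_downstream([embed(x) for x, _ in train], [y for _, y in train])
    hits = sum(classify(embed(x)) == y for x, y in valid)
    return hits / len(valid)

def select_by_proxy(candidates, train, valid):
    """Greedy choice: keep the upstream setting with the best final score."""
    scores = {width: proxy_score(width, train, valid) for width in candidates}
    return max(scores, key=scores.get), scores

# Synthetic data (an assumption for illustration): the label flips every 10 units of x.
rng = random.Random(0)
data = [(x, int(x // 10) % 2) for x in (rng.uniform(0, 100) for _ in range(1200))]
train, valid = data[:1000], data[1000:]

best_width, scores = select_by_proxy([10, 30, 50], train, valid)
print(best_width, scores)
```

Each candidate requires refitting everything downstream, which is exactly why this strategy is inefficient, and tuning one component at a time is why it is greedy.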
Some of the scaling problems we had to face when training individual models were (in no particular order):
- The training set did not fit in main memory, so we had to stream from disk.
- H5 does not deal well with shuffling instances, so we had to find a workaround. Not shuffling our dataset was not an option: we tried it, and the models stopped learning almost completely. We settled initially on pre-shuffling into shards and then shuffling the shards themselves, which is similar to how shards work in TFRecords.
- Making a change to the dataset took over 2 days and was very expensive.
- Optimising and training a model took another 3 days of GPU time, which meant that experiments had to be carefully designed.
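The shard-based shuffling workaround from the list above can be sketched as follows. This is a minimal illustration, not our production code: the function names are made up, in-memory lists stand in for shard files on disk, and the one-off pre-shuffle is done in memory only for brevity (at our scale it would be an offline job).

```python
import random

def write_shards(instances, num_shards, seed=0):
    """One-off global shuffle, then deal the instances out into shards.

    Each returned list stands in for a shard file written to disk.
    """
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    return [shuffled[i::num_shards] for i in range(num_shards)]

def stream_epoch(shards, epoch, seed=0):
    """Yield every instance once, visiting the shards in a fresh random order.

    Only one shard needs to be read at a time, so memory use stays bounded.
    """
    rng = random.Random(seed + epoch)
    order = list(range(len(shards)))
    rng.shuffle(order)
    for shard_id in order:
        for instance in shards[shard_id]:
            yield instance

shards = write_shards(range(20), num_shards=4)
for epoch in range(2):
    print(epoch, list(stream_epoch(shards, epoch)))
```

The order inside each shard is fixed after the pre-shuffle, so reads stay sequential; only the shard order changes between epochs.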
We did fortunately manage to overcome these obstacles, and create a model that predicts the outcomes of construction activities very well. Sometimes I am still surprised by how well our latest model performs, and I can only thank our amazing engineering team for this.
There are some other scale-related issues that we had to resolve later on, which we consciously decided to file under “acceptable tech debt for now”:
- Switching to data streaming to avoid using hundreds of GB of memory
- Using serverless training (we eventually did this on GCP AI Platform)
- Automating the calculation of metrics over different slices of the dataset (e.g. getting per-client performance)
- Automatically detecting data leakage between the training and test sets. We had a special case where it was possible to get repeated instances in both.
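As an illustration of the last item, a simple check for repeated instances across splits could look like the sketch below. It assumes that each instance is a dictionary of JSON-serialisable fields and that "repeated" means identical content; both are assumptions made for this example, and the function names are hypothetical.

```python
import hashlib
import json

def fingerprint(instance):
    """Stable content hash of an instance, independent of key order."""
    canonical = json.dumps(instance, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def find_leaks(train, test):
    """Return the test instances whose content also appears in the training set."""
    train_prints = {fingerprint(instance) for instance in train}
    return [instance for instance in test if fingerprint(instance) in train_prints]

train_set = [
    {"activity": "pour concrete", "planned_days": 5},
    {"activity": "install cladding", "planned_days": 12},
]
test_set = [
    {"planned_days": 5, "activity": "pour concrete"},  # same content, different key order
    {"activity": "fit windows", "planned_days": 7},
]

leaks = find_leaks(train_set, test_set)
print(len(leaks), "leaked instance(s)")
```

Storing fingerprints instead of full instances keeps the check cheap enough to run on every dataset rebuild.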
Building a product
Having a model that can predict the outcomes of construction activities is, by itself, not particularly useful. No matter how well it predicts those outcomes, there are plenty of reasons why nobody will be interested in that output alone. Top of this list would be the fact that construction companies, and their clients, are not interested in buying a trained model saved as an HDF5 file.
What follows naturally is that we should build something that is useful to a project director, planner or portfolio manager. We asked our prospective clients what information they would be interested in seeing, so that we could then figure out how to generate it, and the answer we heard the most was about risk. More specifically, they wanted to know the risk of the entire project overrunning, how each activity contributed to that risk, and how their project portfolio might be disproportionately affected by one of the many projects happening concurrently.
To produce accurate risk measures, we had to build an entire simulation engine that takes the predictions of the ML model as its inputs. This was not an easy task. If one thinks of a schedule as a directed acyclic graph (DAG), one can see how the simulation can be done in a Monte Carlo style. This is also, quite conveniently, what the industry already does, but using subjective inputs and some incorrect assumptions about tail risk. I wrote more about that here.
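For readers who have not seen schedule simulation before, here is a minimal sketch of the idea in Python. It is not our engine: the four-activity schedule, the lognormal duration samplers (stand-ins for model-predicted distributions) and the function names are all assumptions made for this example. Each run samples a duration for every activity and walks the DAG in topological order to get the project end date.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def project_end(durations, predecessors):
    """End date of the project: longest path through the schedule DAG."""
    finish = {}
    for activity in TopologicalSorter(predecessors).static_order():
        start = max((finish[p] for p in predecessors.get(activity, ())), default=0.0)
        finish[activity] = start + durations[activity]
    return max(finish.values())

def simulate(samplers, predecessors, runs=10_000, seed=0):
    """Monte Carlo: sample all durations, recompute the end date, repeat."""
    rng = random.Random(seed)
    ends = []
    for _ in range(runs):
        durations = {activity: sample(rng) for activity, sample in samplers.items()}
        ends.append(project_end(durations, predecessors))
    return ends

# Hypothetical schedule: each activity maps to the activities it must wait for.
predecessors = {"dig": [], "pour": ["dig"], "paint": ["dig"], "handover": ["pour", "paint"]}
planned = {"dig": 10.0, "pour": 5.0, "paint": 1.0, "handover": 1.0}

# Assumed duration distributions: planned duration times a lognormal factor.
samplers = {
    activity: (lambda rng, days=days: days * rng.lognormvariate(0.0, 0.3))
    for activity, days in planned.items()
}

ends = simulate(samplers, predecessors, runs=5000)
deadline = project_end(planned, predecessors)
p_overrun = sum(end > deadline for end in ends) / len(ends)
print(f"planned end: day {deadline}, estimated overrun probability: {p_overrun:.2f}")
```

The list of simulated end dates is the raw material for any risk measure: overrun probability, percentiles, or per-activity attributions.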
We built a Monte Carlo engine that worked in a similar way, and we added our own risk measures along the way. The one that we like the most we call “Risk Share”. It’s the independent potential that each individual activity has to change the end date of the entire project, expressed as a percentage of the total of all possible deviations. So, for example, an activity that consistently moves the end date by a small amount will have a lower Risk Share than one that consistently moves it by a lot, and the latter will have a similar Risk Share to one that moves the end date infrequently, but by a lot when it does.
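The sketch below shows one possible reading of that description; it is not the formula used in the product. Under this reading, which is an assumption, each activity is varied on its own while all others stay at their planned durations, the average absolute shift of the end date is recorded, and the shifts are normalised to percentages. The schedule, samplers and function names are hypothetical.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def project_end(durations, predecessors):
    """End date of the project: longest path through the schedule DAG."""
    finish = {}
    for activity in TopologicalSorter(predecessors).static_order():
        start = max((finish[p] for p in predecessors.get(activity, ())), default=0.0)
        finish[activity] = start + durations[activity]
    return max(finish.values())

def risk_share(planned, samplers, predecessors, runs=5000, seed=0):
    """Assumed definition: each activity's mean absolute end-date shift, as a % of the total."""
    rng = random.Random(seed)
    baseline = project_end(planned, predecessors)
    impact = {}
    for activity, sample in samplers.items():
        total_shift = 0.0
        for _ in range(runs):
            trial = dict(planned)
            trial[activity] = sample(rng)  # vary this activity only
            total_shift += abs(project_end(trial, predecessors) - baseline)
        impact[activity] = total_shift / runs
    overall = sum(impact.values())
    return {a: (100.0 * v / overall if overall else 0.0) for a, v in impact.items()}

# Three activities in series, each planned at 10 days.
predecessors = {"small": [], "steady": ["small"], "rare": ["steady"]}
planned = {"small": 10.0, "steady": 10.0, "rare": 10.0}
samplers = {
    "small": lambda rng: 11.0,                                    # always 1 day late
    "steady": lambda rng: 14.0,                                   # always 4 days late
    "rare": lambda rng: 50.0 if rng.random() < 0.1 else 10.0,     # 40 days late, 10% of the time
}

shares = risk_share(planned, samplers, predecessors)
print({activity: round(share, 1) for activity, share in shares.items()})
```

In this toy schedule the consistent small mover gets the lowest share, while the consistent large mover and the infrequent very large mover end up close to each other, which mirrors the example in the paragraph above.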
Our clients use this information to find the most relevant mitigation strategies or plan changes that reduce the risk of delaying the project. This means that less risk margin is required, and it is less likely that the project will be over budget because of any potential delays.
We now run regular workshops and learning sessions to better understand what our clients need at every step of using the product. It has been incredible to see how, by engaging the community in the right way, we have been able to focus on building a valuable product from the very beginning.
When I set out to write this series of blog entries, I didn’t really have a clear objective. I wanted to tell a story about how many difficult problems we have had to overcome, both as engineers and as a company. There are many more obstacles ahead of us, and more will come in the future. I fully trust our team to take them head on and solve them as best we can.
If I have inspired you to work on something difficult, start your own company, and rise up to whatever your challenges may be, I have done more than I expected to with this post.
Get in touch
If you think that doing machine learning at scale is exciting, we are hiring! Look at nplan.io/careers for current opportunities.