Monte Carlo or Bust?

UK-based readers of a certain age may remember a weather forecast that gained notoriety for the reporter, who reassured viewers that rumours of an impending hurricane were unfounded.

The report went out on the evening of October 15th, 1987, and the next day a storm hit the South East of England, resulting in the loss of 18 lives and £2 billion worth of damage. In 2012, researchers took the exact same data that was used to make the prediction for that evening’s news in 1987 and used it to generate a prediction with today’s technology. The resulting forecast showed a 40% chance of a storm, a likelihood that was completely missed by the 1987 forecast.

Why was there such a difference between the forecasts of 1987 and 2012, given the exact same data? First of all, consider how weather forecasting is done: essentially, you create and apply a model that simulates (or predicts) what the weather is going to be tomorrow. This involves gathering input data, feeding it to the model, and getting an output, which is your answer, or forecast. In the case of weather forecasting, observed data such as wind speed and direction, pressure, temperature and humidity are the inputs to the model, which then uses that data to simulate what the weather is going to be like.

The fundamentals of weather forecasting have not changed since 1987; in both cases modelling is used to determine the forecast. The key difference is that with today’s technology, the forecast model creates many simulations (or possible tomorrows) and uses these to estimate the probability of events such as storms, hurricanes, rain and snow. The re-forecast in 2012 ran 100 different simulations of the weather conditions in south-east England for October 16th, 1987, and showed a storm in about 40 of them. Back in 1987, the forecast would have been based on one or two simulations; today, running multiple simulations makes it possible to determine the likelihood (or probability) of key events and warn interested parties.
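The idea of running many "possible tomorrows" and counting how often a storm appears can be sketched in a few lines of Python. This is only a toy illustration, not a real weather model: the function names, the pressure-to-gust rule, the 3 hPa observation error and the 70 mph storm threshold are all assumptions made up for this example.

```python
import random


def simulate_peak_gust(observed_pressure_hpa, rng):
    """Run one 'possible tomorrow' through a toy stand-in for a forecast model."""
    # Perturb the observation to reflect measurement uncertainty
    # (assumed standard deviation of 3 hPa).
    pressure = rng.gauss(observed_pressure_hpa, 3.0)
    # Hypothetical rule: lower pressure gives stronger gusts, plus random turbulence.
    gust_mph = 2.5 * (1000.0 - pressure) + rng.gauss(0.0, 10.0)
    return max(gust_mph, 0.0)


def storm_probability(observed_pressure_hpa, n_runs=100,
                      storm_gust_mph=70.0, seed=1987):
    """Estimate the chance of a storm as the fraction of runs that produce one."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    storms = sum(
        simulate_peak_gust(observed_pressure_hpa, rng) >= storm_gust_mph
        for _ in range(n_runs)
    )
    return storms / n_runs


if __name__ == "__main__":
    chance = storm_probability(observed_pressure_hpa=975.0)
    print(f"Estimated chance of a storm: {chance:.0%}")
```

A single run of this toy model would answer only "storm" or "no storm"; the ensemble of 100 runs turns the same input into a probability, which is the difference the 2012 re-forecast illustrates.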

This technique of running many simulations is known as the Monte Carlo method, a category of modelling techniques that has gained, and is still gaining, currency in many different areas, particularly as computational capability increases. Today’s technology landscape is significantly more advanced than that of 1987, and the benefit is illustrated by the weather forecasting anecdote above. Monte Carlo methods have been found to be particularly useful for modelling phenomena with a high degree of uncertainty, and weather forecasting is notoriously prone to variability or uncertainty in its output, as we are probably all too familiar with. But it is also an example of an activity whose quality has advanced significantly.

Another area where Monte Carlo methods have been applied is the calculation of risk. At Creme Global we have been applying Monte Carlo to determine risk in different scenarios, such as consumers’ exposure to pesticides in foods or chemicals in cosmetics. It has been particularly useful in “what if” scenarios: what if we increase the amount of this particular preservative in our food product, or what if we don’t know a key statistic about our target population and must use an estimate? With traditional deterministic modelling, each uncertain variable is assigned a single “best guess”, often chosen to reflect a worst-case scenario. In contrast, Monte Carlo simulations apply a probability distribution to each unknown variable and generate hundreds or thousands of possible outcomes. The results are then aggregated to show the probabilities of different outcomes.
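To make the contrast concrete, here is a minimal sketch comparing a deterministic worst-case estimate with a Monte Carlo estimate of exposure. It is not Creme Global's model: the exposure formula (intake times concentration divided by body weight), the distributions and every numeric parameter are hypothetical values chosen purely for illustration.

```python
import random
import statistics


def exposure(intake_g, conc_mg_per_g, body_weight_kg):
    """Daily exposure in mg per kg of body weight (assumed simple formula)."""
    return intake_g * conc_mg_per_g / body_weight_kg


def deterministic_worst_case():
    """One 'best guess' per variable, each set to a pessimistic value."""
    return exposure(intake_g=300.0, conc_mg_per_g=0.05, body_weight_kg=50.0)


def monte_carlo_exposures(n=10_000, seed=42):
    """Draw each uncertain variable from a (hypothetical) distribution, n times."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        intake = rng.lognormvariate(4.5, 0.5)      # median of about 90 g/day
        conc = rng.uniform(0.01, 0.05)             # mg of substance per g of food
        weight = max(rng.gauss(70.0, 12.0), 30.0)  # kg, floored at 30 kg
        results.append(exposure(intake, conc, weight))
    return results


def summarize(results):
    """Aggregate the simulated outcomes into a few percentiles."""
    cuts = statistics.quantiles(results, n=100)  # 99 percentile cut points
    return {"median": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    print("Worst case:", deterministic_worst_case())
    print("Monte Carlo:", summarize(monte_carlo_exposures()))
```

The deterministic calculation returns a single number, while the Monte Carlo run returns a distribution from which one can read off, for example, the exposure that 95% of simulated consumers stay below.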

It’s also worth contrasting Monte Carlo with traditional data sampling. It has been well known for many years that a sample of about a thousand, taken from a population of any size, can be representative of the overall population to within a 3% margin of error. However, the sample must be truly random for the results to be reliable, and this has proved very challenging in practice (the exit polls from the 2008 US presidential elections being a case in point). With Monte Carlo, the algorithm samples from a distribution in a way that is statistically consistent with that distribution, i.e. there will be proportionally fewer samples from the very low probability regions. Repeating this many times (i.e. running many simulations) gives us an orders-of-magnitude increase in the quantity of data to be analyzed, ultimately enabling the kind of picture to emerge that we saw from the 2012 weather re-forecast.
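Both points in this paragraph can be checked with a short sketch. The first function uses the standard margin-of-error formula for a proportion at 95% confidence, which assumes a simple random sample from a population much larger than the sample. The second draws from a standard normal distribution (chosen here only as an example) and measures how few samples land in the low-probability tails. The function names and defaults are illustrative assumptions.

```python
import math
import random


def margin_of_error(sample_size, z=1.96, p=0.5):
    """Margin of error for a proportion; p=0.5 is the most pessimistic case."""
    return z * math.sqrt(p * (1 - p) / sample_size)


def tail_fraction(n_samples=100_000, threshold=2.0, seed=7):
    """Fraction of standard-normal draws more than `threshold` from the mean."""
    rng = random.Random(seed)
    hits = sum(abs(rng.gauss(0.0, 1.0)) > threshold for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    print(f"Margin of error for n=1000: {margin_of_error(1000):.1%}")
    print(f"Share of draws beyond 2 standard deviations: {tail_fraction():.1%}")
```

For a sample of 1,000 the formula gives a margin of roughly 3%, matching the figure quoted above, and only a small share of the random draws falls in the tails, as expected when sampling is consistent with the distribution.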

Applying a methodology like Monte Carlo is computationally intensive; it is an example of using data to generate more data, and then extracting knowledge from that data by boiling it down to the key information that enables the user to make an informed decision.

To cope with the computational demands, Creme Global runs in the cloud. In fact, it has been running in the cloud for many years, initially in the form of a private cloud, whereby the physical servers that ran the computations were hosted and managed by Creme Global. In 2011, Creme Global made the decision to switch to Amazon’s cloud, and this has opened up a new dimension in scaling the service that it provides. Creme Global now runs in an elastic computing environment, which means the environment automatically responds to increases in demand by provisioning more computational resources, without the need for manual intervention or service downtime. At any given time, the computational resources running the Creme Global service scale with the usage of the service at that time, and are adjusted automatically as the usage (or workload) varies over the day.

In conclusion, technology like Monte Carlo is just one of the factors that underpin Creme Global’s core value proposition: to generate the best possible information from the available data, and ultimately to enable better decisions to be made. By running our service on Amazon’s cloud platform, we deliver this value in a dynamic and responsive fashion, and ensure our customers can focus on making the best possible decisions.

_______

The Predict Conference is an annual data science conference held in Dublin, Ireland. Visit us at http://www.predictconference.com