Machine Learning for Surf Forecasting

Ben Freeston
Surfline Labs
Dec 6, 2018

Surf forecasting is really a pattern recognition game. As a surfer you watch the charts and the data, you go surf, and you piece together the correlations and causations. Refining this process can take years, even if you dig deep into the ocean physics and learn to walk around the traps and pitfalls. The data itself is complex and multilayered, and confirmation bias means we tend to hang onto the experiences that confirm our theories while staying somewhat blind to those that don’t. Our world is rife with the ‘bro science’ that results.

Decades of careful observation are the key to both a great surf forecaster, and a great machine learning surf forecast system.

At Surfline we’ve long solved this problem with a team of professional forecasters whose collective experience spans decades of careful observation and prediction. This team already relies on extensive computer modelling to make the call, but now we’ve also started putting our computer systems to work on the pattern recognition problem itself. This is what machine learning is: the process of allowing computers to teach themselves to find the nuanced relationships in huge, complex data sets. The beauty of this approach is that we don’t need to understand every aspect of the physics, the subtleties of the bathymetry or the local ocean currents; we simply allow the system to find the threads that connect what we know to what we see happening in the surf.

Significant improvements from the ML system, with reductions in error of almost 70% for some locations.

Building a surf forecasting AI

A single long range forecast for one location from our ocean models covers almost a million data points. Using our proprietary nearshore modelling system (which is primarily interested in how waves refract as they encounter uneven bathymetry near the shore), we can accurately transform that offshore data into uniform shallow water near the beach and compress it into a smaller set of summary information describing the waves just before they start to shoal and break. It’s here we couple our model data with our human observations: decades of careful reports for surf spots around the globe.

The AI system, a fairly simple neural net in this first implementation, is tasked with learning how to most accurately predict the human surf observations from the computer model data. It’s doing exactly what an experienced forecaster would do, but instead of having to build that knowledge slowly over the years, it’s able to absorb the hard work of every surf reporter who has ever worked for Surfline in a few hours. It’s also doing this in a mathematically optimal way; there’s no room for bias or prejudice, and it works with nothing in mind except minimising its own predictive error. It has no one to impress by under-calling and no chest to thump by exaggerating the best days.

In fact we have full, explicit control over how the system measures its success. For some locations we might be more tolerant of errors on large, onshore days and prioritise accuracy for the sort of days surfers actually care about. While this may be implicit in the way surf forecasters think, with this system we can, where it makes sense, make it explicit.
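To make the idea concrete, here is a minimal sketch of the setup described above: a small neural net trained by gradient descent to map nearshore model summaries to observed surf heights. Everything here is invented for illustration — the feature set, network size, and the synthetic "observations" are assumptions, not Surfline's actual architecture or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nearshore summary features per forecast: swell height (m),
# peak period (s), swell direction (deg), wind speed (m/s).
X = rng.uniform([0.5, 6, 180, 0], [4.0, 18, 300, 15], size=(2000, 4))

# Stand-in "human observation": an unknown nonlinear function of the model
# features plus observer noise (the real target is reporter data).
y = 0.6 * X[:, 0] * np.sqrt(X[:, 1]) / 3 + rng.normal(0, 0.2, 2000)

# Normalise inputs, then fit a one-hidden-layer network by full-batch
# gradient descent on mean squared error — the net has "nothing in mind
# except minimising its own predictive error".
Xn = (X - X.mean(0)) / X.std(0)
W1 = rng.normal(0, 0.1, (4, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, 16);      b2 = 0.0
lr = 0.05
for _ in range(500):
    h = np.tanh(Xn @ W1 + b1)                 # hidden activations
    err = (h @ W2 + b2) - y                   # prediction error
    gW2 = h.T @ err / len(y); gb2 = err.mean()
    gh = np.outer(err, W2) * (1 - h**2)       # backprop through tanh
    gW1 = Xn.T @ gh / len(y); gb1 = gh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = np.mean((np.tanh(Xn @ W1 + b1) @ W2 + b2 - y) ** 2)
```

After training, `mse` should sit well below the variance of the observations, i.e. the net has learnt structure beyond simply predicting the average day.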

Winning the data game

Sean Collins doing the hard work of gathering surf forecast data in the 1970s

The winning ticket here isn’t the system we’ve built. It’s the data we’ve used. Surfline, uniquely, has a decades-long data set of surf observations for hundreds of locations around the globe, going back to Sean Collins’ meticulous personal journals. An AI is only as good as the knowledge it absorbs, and we’re confident that our more than one million human observations mean we are on the road to building the best surf prediction system possible. To put that into context, if you were to surf every morning and every afternoon and keep a detailed record of what you saw, it’d take 1,300 years to gain as much experience as we’ve built into this AI!

That said, human observations aren’t an absolute ground truth, and it’s often the case that surfers disagree on exact wave heights. We work around that in two ways. First, we recognise that we’re building a system that emulates the best a human observer can do; this is reasonable for our use case because our product is designed to tell surfers what to expect. Second, we can dive into the data by comparing multiple observations for the same time and place, and use the differences we see to adjust how the network chooses to punish itself for mistakes. If we know that once conditions hit 10ft at a given beach it’s really hard to accurately judge wave heights, we can either remove this data or simply ask the network to care a little less about how it performs in these circumstances.
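One common way to "ask the network to care a little less" is a sample-weighted loss. The sketch below assumes invented heights, predictions, and a 0.25 weight for 10ft+ days — the actual weighting scheme and thresholds are not described in detail here.

```python
import numpy as np

# Hypothetical observations and predictions (ft) for five report times.
heights = np.array([2.0, 4.0, 6.0, 11.0, 12.0])   # reporter observations
preds   = np.array([2.5, 3.5, 6.5,  9.0, 14.0])   # forecast predictions

# Reports on 10ft+ days are harder to judge, so down-weight them rather
# than removing them outright.
weights = np.where(heights >= 10.0, 0.25, 1.0)

# Weighted mean squared error: big-day mistakes still count, just less.
wmse = np.average((preds - heights) ** 2, weights=weights)
```

With the down-weighting in place, `wmse` is lower than the plain mean squared error, because the two largest misses both fall on hard-to-judge 10ft+ days.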

The latest machine learning forecast for Pipeline, Hawaii

This system is already, on average, 30% more accurate than our original LOLA surf height forecasts. We can test this by training the system with the bulk of our archives but setting some aside, unseen, to test how it would have performed in the real world. Because we sort these training and test data sets carefully, we can be sure that the test conditions mirror a full range of likely ocean conditions.
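The evaluation described above can be sketched as a simple hold-out split plus an error-reduction metric. The archive size, split ratio, and error numbers below are all illustrative assumptions, not Surfline's real figures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical archive of 10,000 (model input, reporter observation) pairs.
n = 10_000
idx = rng.permutation(n)

# Set a fifth of the archive aside, unseen during training, so the test
# estimates how the system would perform on new days in the real world.
test_idx, train_idx = idx[: n // 5], idx[n // 5 :]
assert not set(test_idx) & set(train_idx)   # no leakage between the sets

# Accuracy gain reported as the reduction in mean absolute error versus
# the old forecast on the held-out days (error values invented).
lola_mae, ml_mae = 1.10, 0.77               # ft, illustrative only
reduction = 1 - ml_mae / lola_mae           # ~30% improvement
```

Shuffling before the split is the simplest way to make the held-out days span the full range of ocean conditions in the record; a stratified split by condition would make that guarantee explicit.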

Improvements cover all spots we have reporter data for and even those we don’t. UK improvements are typically in the 15–25% range.

Just the first step

One of the challenges for any machine learning system is making sure it will work with conditions it hasn’t seen. It’s easy to ‘predict’ information you’ve already been given, but this ‘overfitting’ is a scourge of poorly designed machine learning systems. It’s particularly acute for forecasting big waves, where the ’50 year storm’ might be something we need to predict despite never having seen it before. To avoid this, the first generation of our system is generic across all locations. That’s to say the Pipeline forecast has learnt from HB Pier, and vice versa. This ensures the system is generalised and should have seen a little bit of everything, although it equally requires that all local differences are well captured by our nearshore modelling. It also means we can make useful predictions for locations for which there is no record of observation, which is really important in this first iteration given we cover thousands of locations worldwide. We’ve really replaced just the last step of the physics, as waves start to break, because that’s the hardest part for our existing models. But it isn’t the only place we can see benefits.
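In practice, a location-generic model of this kind amounts to pooling every spot's examples into one training set and keying only on nearshore features, never on spot identity. The spot names and array shapes below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-spot archives: (nearshore features, reporter heights).
pipeline = (rng.random((500, 4)), rng.random(500))
hb_pier  = (rng.random((800, 4)), rng.random(800))

# One generic model: pool every spot's examples into a single training
# set, so Pipeline learns from HB Pier and vice versa.
X = np.vstack([pipeline[0], hb_pier[0]])
y = np.concatenate([pipeline[1], hb_pier[1]])

# Because the inputs carry no spot identity, the trained model can also
# predict for spots with no reporter history at all.
```

The trade-off named in the text follows directly: since the model never sees a spot label, every local difference has to be carried by the nearshore features themselves.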

A future version, already in progress, gently introduces location specific forecasting and uses models that understand how conditions evolve over time. There are many more ways we can build on this system, but this is the start of a journey that finally combines the full potential of our wave modelling technology with our huge archive of observational records.

Machine learning surf forecasts will be live on surfline.com soon.

If you have any questions about our forecast technology, please fire away in the comments below, or find me on Twitter.


VP of data science at Surfline + Magicseaweed. Checking charts and chasing waves.