Surf Forecast Accuracy

Ben Freeston
Published in Surfline Labs
4 min read · Feb 26, 2019

Here’s your two-step guide to setting up a new surf forecast website. First, go grab some generic free ocean model data and build a website. Second, tell everyone you now have the ‘most accurate’ surf forecast around.

It’s a pattern we’ve seen repeated over and over again, but what we’ve not seen is anyone explain how they’re actually evaluating that accuracy. What does it mean?

We carefully test every time we make a change, but we also use real-time dashboards to keep an eye on our operational systems. Here you see the reports for HB Pier compared to our LOLA and expert forecast team predictions 48 hours in advance. If your favourite surf forecast company isn’t doing this, they’re not doing their job right!

Before we get into the nuance here, let’s start with the real basics. To judge a forecast accurately you need two things: a forecast and something to measure it against. The first red flag we see when we hear these accuracy claims is that there doesn’t appear to be a measuring stick. Accuracy isn’t a ‘feeling’ — it’s the result of measuring the gap between what you thought would happen and what did. It can be relatively easy to make that prediction, and sometimes actually harder to check if it’s right.

Surfline has solved this by employing trained surfer observers to take detailed daily reports from a large number of beaches. It’s not as perfect a ground truth as you could want; there’s always some subjectivity and bias in observed reports. But you should appreciate that when we use the word accuracy, it means the measurable gap between what we said and what we saw. After all, if you can’t measure it you can’t improve it; all you can do is use it as a marketing sound bite.

Getting technical

So how do we do? This is where the nuance comes in, and there’s no one answer. A typical measurement of forecast accuracy is RMSE, or Root Mean Squared Error. Here, we penalise infrequent larger errors more than frequent smaller ones. It feels sensible at a glance — we should care most about being most wrong — but it has some issues for our specific data. We know that it’s harder for a human observer to judge the size of large surf, so there’s likely to be a gap between our forecast and our observation that isn’t all forecast error; some of it will be observation error once we get to bigger waves.
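For the curious, here’s roughly what that calculation looks like. This is a minimal illustrative sketch in Python, not our production code, and the names are ours for the example:

```python
import numpy as np

def rmse(observed, forecast):
    """Root Mean Squared Error: squaring each residual before averaging
    means one large miss counts for more than several small ones."""
    err = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return np.sqrt(np.mean(err ** 2))
```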

Our starting point is often to use MAE, or Mean Absolute Error, for understanding surf height forecast error. Using this, a 1ft error on five reports is the same as a 5ft error on one report. It’s not perfect: if those errors were all on 2ft days, we’d probably call the one big error more of a problem, but it avoids some of the pitfalls of RMSE.
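Continuing the sketch above (and reusing that rmse() helper), here’s MAE alongside the example just described — five reports each missed by 1ft versus one report missed by 5ft. The 2ft observations are a made-up week purely for illustration:

```python
import numpy as np

def mae(observed, forecast):
    """Mean Absolute Error: every foot of error counts the same,
    however it is distributed across reports."""
    err = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return np.mean(np.abs(err))

# Hypothetical week of 2ft observations.
obs = np.array([2.0, 2.0, 2.0, 2.0, 2.0])
many_small = obs + 1.0                           # 1ft high on every report
one_big = np.array([7.0, 2.0, 2.0, 2.0, 2.0])    # 5ft high on a single report

print(mae(obs, many_small), mae(obs, one_big))    # 1.0 and 1.0 -> identical MAE
print(rmse(obs, many_small), rmse(obs, one_big))  # 1.0 and ~2.24 -> RMSE flags the big miss
```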

Mean absolute error for HB Pier. While our forecasts get less accurate over time, we’re still averaging less than 1ft of error out at six days.

Our next problem is benchmarking in a changeable wave climate. Imagine an El Niño winter with many very large swells. However you measure error, you could be doing as good a job as usual and still see a larger miss overall, simply because the waves were bigger. For this reason we baseline our accuracy metrics by comparing them to a naive forecast (in our case, a simple shoaling algorithm that bases surf heights on swell height and period, rather than the one-step forecast often used). Because the naive forecast is proportional to the overall wave height, it removes some of the noise in the seasonal climate. It also helps us understand whether we’re doing better than this simpler kind of forecast, which is the first target we need to hit. It makes our MAE relative to the overall wave heights without completely squashing the small, repetitive errors that can be problematic in some places and that we’d want to fix. This MASE, or Mean Absolute Scaled Error, is a great way for us to evaluate our performance improvements over time, regardless of the ocean’s mood.
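To make that concrete, here’s the general shape of the scaled metric. The naive_forecast argument stands in for that shoaling-based baseline; since the baseline calculation isn’t spelled out above, treat this as an illustrative sketch rather than our exact implementation:

```python
import numpy as np

def mase(observed, forecast, naive_forecast):
    """Mean Absolute Scaled Error: the forecast's MAE divided by the MAE
    of a naive baseline over the same observations. Values below 1.0
    mean the forecast is beating the baseline."""
    observed = np.asarray(observed, dtype=float)
    mae_forecast = np.mean(np.abs(observed - np.asarray(forecast, dtype=float)))
    mae_naive = np.mean(np.abs(observed - np.asarray(naive_forecast, dtype=float)))
    return mae_forecast / mae_naive
```

The textbook formulation scales by the MAE of a one-step naive forecast; as noted above, we swap in a shoaling-based baseline so the denominator tracks the overall size of the waves.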

Hopefully you’re starting to understand that we care a lot about being right, measuring how well we’re doing, and then doing it better. If we say a new model or forecast is more accurate than an old one, we’ve actually measured that to be the case, and measured it in a way that we believe makes it meaningful for surfers. That’s not to say our forecasts are perfect, just that they’re getting better all the time, and we know by exactly how much!

Comparing LOLA forecasts (pink) to our human team’s adjusted forecasts (blue) gives us confidence that our team adds significant value to our forecasts. Our forecast team are more than halving the error of our sophisticated nearshore modelling approach.

Finally, if you do set up that new website, or devise a new surf forecasting methodology, and you want to benchmark it properly, we’re happy to help with ground truth data that allows for some proper analysis.

Check this paper for more information on MASE.

Any questions about how we’re measuring our surf forecast accuracy? Fire away in the comments below, or find me on Twitter.


Ben Freeston is VP of data science at Surfline + Magicseaweed. Checking charts and chasing waves.