Explorer Data Science Stream Review

Jess Robertson
Unearthed Community
8 min read · May 3, 2019

One of the best parts of crowdsourcing is uncovering completely new approaches to problems. But to do this you need a diversity of views in your community — if all you have are geologists, all you’ll get are geology solutions!

For the main Explorer Challenge OZ Minerals released over 4 TB of private exploration data, but understanding this avalanche of information is a challenge without prior experience. When we reached out to our community of thousands of innovators globally, we realised very quickly that we needed to provide an easier way for non-geoscience innovators to get on board.

To help teams get started with exploration data, we created a ‘data science stream’ in parallel with the main Explorer Challenge, and asked innovators to use publicly available exploration data to predict mine locations across Australia. For more details on the challenge, scoring and dataset, see the full challenge description.

As a base dataset we provided data from 25 x 25 km ‘stamp’ areas selected from across the continent. Each stamp dataset contained known deposit locations alongside aligned geophysical and remote sensing layers drawn from publicly available exploration data covering the Australian continent (over 47,000 raster and vector layers in total). These were taken from national datasets managed by Geoscience Australia and the national research data collection hosted at NCI.

We also provided a publicly available Python package for teams to generate their own coverages, and support for innovators through our community platforms, where non-geoscientists could get access to geoscience or data science expertise to help them compete.

Left — the locations of all the 25 x 25 km stamps across Australia. Stamps were randomly rotated. Right — the locations of all the deposits used across Australia, sourced from Geoscience Australia (http://www.geoscience.gov.au).
An example of some of the datasets available for each stamp: left — VRTP magnetic anomaly, centre — radiometric map from potassium sources, right — simplified geology showing basement lithology (purple, pink and green), regolith (in yellow), and fault lines. Geospatial data was regridded onto a local oblique Mercator projection, so all datasets have the same grid resolution and are geospatially referenced to the same coordinate reference system (essentially metres from the centre of the stamp).
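For anyone who wants to build similar stamps from the national rasters themselves, that regridding step amounts to a reprojection onto a local, rotated grid. The sketch below shows one way to do it with rasterio; the centre coordinates, rotation angle, 50 m cell size and input file name are hypothetical stand-ins rather than the actual parameters used to build the challenge data.

```python
import numpy as np
import rasterio
from rasterio.crs import CRS
from rasterio.transform import from_origin
from rasterio.warp import Resampling, reproject

# Hypothetical stamp parameters: centre coordinates and a random rotation angle.
lon_c, lat_c, rotation = 135.0, -29.5, 30.0
stamp_crs = CRS.from_proj4(
    f"+proj=omerc +lat_0={lat_c} +lonc={lon_c} +alpha={rotation} "
    "+gamma=0 +k=1 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs"
)

# A 25 x 25 km stamp at an assumed 50 m cell size is a 500 x 500 pixel grid,
# with coordinates measured in metres from the stamp centre.
cell, n_pixels = 50.0, 500
dst_transform = from_origin(-12_500, 12_500, cell, cell)
dst = np.zeros((n_pixels, n_pixels), dtype=np.float32)

# 'magnetics_national.tif' is a placeholder for one of the national rasters.
with rasterio.open("magnetics_national.tif") as src:
    reproject(
        source=rasterio.band(src, 1),
        destination=dst,
        dst_transform=dst_transform,
        dst_crs=stamp_crs,
        resampling=Resampling.bilinear,
    )
# dst now holds the layer on the stamp's local oblique Mercator grid.
```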

Although we provided a benchmarking/leaderboard facility for teams to test their predictions against other teams’ approaches, our aim wasn’t to create a pure machine learning competition, but rather to provide feedback on each team’s total data science approach. We judged submissions not just on technical accuracy, but also on how teams broke down the problem, how they handled data quality and uncertainty, and how well they communicated outcomes to non-data science audiences. All of these challenges are familiar to anyone who has had to convince an exploration manager to drill a hole into a given target.

We don’t want to give away too much information about the technical details that teams used since that might give away their advantage in the main Explorer Challenge. But we thought it worthwhile to consider the approaches in aggregate to see if there were any interesting commonalities in approach or direction.

This wasn’t a straightforward challenge to complete. There were a few known problems that we were interested in seeing teams tackle. For example, there was a massive imbalance in the numbers of each deposit class. This was due to the different ways that deposits get recorded by Geoscience Australia (our source for deposit locations). Many gold lodes, for instance, are recorded as separate deposits, even when these can be closely related. In contrast, larger deposits such as Broken Hill or Prominent Hill are recorded as a single deposit. Teams needed to handle this issue both in defining a good metric for their models and when stratifying samples for cross-validation. Most teams went with the resource groupings provided by Unearthed, but there were some interesting unsupervised approaches to label grouping as well.
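As a concrete illustration of the cross-validation point (not any particular team's pipeline), stratifying folds on the commodity grouping is one simple way to stop the imbalance from distorting validation scores. The class counts below are made up:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up commodity groupings for a set of stamps, with the kind of heavy
# class imbalance described above (many Au occurrences, few of everything else).
commodities = np.array(["Au"] * 80 + ["Pb-Zn"] * 12 + ["Fe"] * 6 + ["PGE"] * 2)
stamp_ids = np.arange(len(commodities))

# Stratifying on commodity keeps the rare classes represented in every fold.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(stamp_ids, commodities)):
    labels, counts = np.unique(commodities[test_idx], return_counts=True)
    print(f"fold {fold} test counts:", dict(zip(labels, counts)))
```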

Unsurprisingly, teams that did well on the benchmark also produced submissions which showed they had spent a lot of time exploring the datasets to deal with the garbage-in, garbage-out problem. Although we provided a relatively ‘clean’ set of data, teams still had to contend with missing values and skewed distributions, not to mention the problem of extracting useful features.
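As a small example of the sort of clean-up involved, the sketch below imputes missing values and squashes a heavy-tailed distribution using scikit-learn. The feature matrix, the 5% missing rate and the lognormal values are invented purely for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer

# Toy stand-in for per-pixel geophysical features: strongly right-skewed
# values with gaps where a layer has no coverage.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=2.0, size=(1000, 4))
X[rng.random(X.shape) < 0.05] = np.nan  # roughly 5% missing values

cleaner = make_pipeline(
    SimpleImputer(strategy="median"),  # median is robust to the skew
    QuantileTransformer(output_distribution="normal",
                        n_quantiles=500, random_state=0),  # tame heavy tails
)
X_clean = cleaner.fit_transform(X)
```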

Teams took a range of approaches to data/feature selection, from relatively manual criteria based on geological expertise, through to fully automated ML feature selection. In particular, there were a number of interesting ways to encode the vector data before training a model, including some clever dimensionality reduction techniques. We didn’t see as much feature engineering on the raster layers themselves, possibly because a lot of gradient features fall naturally out of the neural net approach, but also possibly because of time limitations (some teams reported using hundreds of hours of CPU time just to preprocess the data).
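As one illustration of the vector-encoding idea (not any team's actual method), a categorical layer such as lithology can be sampled per pixel, one-hot encoded and then compressed with something like PCA before it goes into a model. The category names here are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder

# Placeholder per-pixel lithology codes sampled onto the 500 x 500 stamp grid.
rng = np.random.default_rng(0)
lithology = rng.choice(["basement", "regolith", "cover", "intrusive"],
                       size=(500 * 500, 1))

# One-hot encode the categories, then reduce to a few dense components that a
# downstream model can take alongside the raster features.
onehot = OneHotEncoder().fit_transform(lithology).toarray()
components = PCA(n_components=3).fit_transform(onehot)
print(components.shape)  # (250000, 3)
```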

For the ML models themselves, teams generally took one of two approaches — either deep nets using the stamp layers as channels, or a pixel- or superpixel-based feature extraction/boosted tree approach (CatBoost was particularly prominent — thanks, Kagglers). The best accuracies in this competition came from the neural net approach, although the judging panel thought that being able to mix and match some of the more interesting feature engineering approaches used by the point-based models would improve this further.
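A bare-bones sketch of the pixel-based route is shown below. The feature matrix, labels and CatBoost settings are invented for illustration and skip the real work of building per-pixel (or superpixel) features:

```python
import numpy as np
from catboost import CatBoostClassifier

# Toy per-pixel training set: each row is a pixel with engineered features,
# and the target marks whether a deposit sits at (or near) that pixel.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))             # invented engineered features
y = (rng.random(5000) < 0.02).astype(int)   # deposits are rare, so labels are imbalanced

model = CatBoostClassifier(
    iterations=200,
    depth=6,
    learning_rate=0.1,
    auto_class_weights="Balanced",  # counteract the rare positive class
    verbose=False,
)
model.fit(X, y)
prospectivity = model.predict_proba(X)[:, 1]  # per-pixel deposit probability
```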

It’s interesting to look at the scoring of submissions to the benchmark as a whole and see what it tells us. We provided a metric that had two components: (a) a commodity label metric, which measured how well a model predicts the likely deposit type(s) in a given stamp, and (b) a location metric, which measured how close the predicted deposit locations were to the actual deposit locations. Both components and the combined metric had a range of [0, 1], with higher values being better. These metrics were designed to penalize both false positives (saying there is a deposit when there isn’t one) and false negatives (saying there’s no deposit when there is one), with false negatives being penalized much more heavily.

We expected that part (a) would be relatively easy to achieve since there are lots of geological signals (in lithology etc.) that tell you whether a particular region is prospective for a particular commodity. Part (b) was much harder since the location information is very sparse (the same number of deposits, but now spread over 250,000 pixels per stamp). Roughly, these parts correspond to the way we approach exploration in industry — it’s pretty easy to get yourself into a prospective area, but really challenging to decide exactly where to put the drill rig.

For the combined benchmark score we weighted the commodity label metric at 75% and the location metric at 25%; in the plots below, however, we show each of these metrics on a separate axis. We also provided three ‘benchmark’ situations to see how far random guessing gets you (a toy sketch of this kind of scoring follows the list):

  1. Optimist: The optimist says there is a single gold deposit at the centre of every stamp in the dataset. The overall score for the optimist was 0.34, with a distance score of 0.56 but a commodity label score of 0.27. A lot of the submissions were worse than random guessing based on these metrics, which gives you a clue as to the difficulty of the task.
  2. Pessimist: The pessimist says there are no deposits in any of the stamps. All scores are 0, since it fails to predict any deposits at all and hence gets penalised on all stamps which actually do have deposits.
  3. Super-optimist: The super-optimist places four deposits (Au, Pb-Zn, PGE and Fe) about the centre of the stamps. You can see that the super-optimist does worse than the optimist on both labelling and distance metrics, since it is penalized for more false positives than the optimist in the commodity label, but also because the distances to actual deposits are likely to be higher given there are four deposits rather than one.
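To make the scoring structure concrete, here is a toy two-part score with asymmetric penalties and the 75/25 weighting, with the three baselines written as prediction rules for a single stamp. The functional forms, the weights inside each component and the distance scale are assumptions for illustration, so the numbers it prints will not match the actual benchmark values above.

```python
import numpy as np

def label_score(predicted, actual, fn_weight=2.0, fp_weight=1.0):
    # Toy commodity-label score in [0, 1]: missed deposit types (false
    # negatives) cost more than spurious extra ones (false positives).
    predicted, actual = set(predicted), set(actual)
    penalty = fn_weight * len(actual - predicted) + fp_weight * len(predicted - actual)
    worst = fn_weight * len(actual) + fp_weight * len(predicted)
    return 1.0 - penalty / worst if worst else 1.0

def location_score(pred_xy, true_xy, scale=5_000.0):
    # Toy location score in [0, 1]: decays with the distance (in metres) from
    # each true deposit to its nearest predicted location.
    if len(pred_xy) == 0:
        return 0.0
    pred_xy, true_xy = np.atleast_2d(pred_xy), np.atleast_2d(true_xy)
    nearest = np.linalg.norm(true_xy[:, None] - pred_xy[None, :], axis=-1).min(axis=1)
    return float(np.exp(-nearest / scale).mean())

def combined_score(labels_pred, xy_pred, labels_true, xy_true):
    # 75% commodity label, 25% location, matching the benchmark weighting.
    return 0.75 * label_score(labels_pred, labels_true) + 0.25 * location_score(xy_pred, xy_true)

# The three baselines, written as prediction rules for a single stamp
# (coordinates are metres from the stamp centre).
baselines = {
    "optimist": (["Au"], [(0.0, 0.0)]),
    "pessimist": ([], []),
    "super-optimist": (["Au", "Pb-Zn", "PGE", "Fe"], [(0.0, 0.0)] * 4),
}

# Hypothetical ground truth: a single Au deposit 3 km east of the centre.
true_labels, true_xy = ["Au"], [(3_000.0, 0.0)]
for name, (labels, xy) in baselines.items():
    print(name, round(combined_score(labels, xy, true_labels, true_xy), 3))
```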
Overall submission scores for the challenge: left — benchmarks (for random guesses) shown with red boxes, right — all submissions colored by submission time from early submissions (black) to late submissions (yellow). Best scores are up and to the right.

The best-scoring teams’ submissions fall in the top right-hand corner of this plot. The best overall marks were just over 0.5, but a lot of submissions fell in the 0.4 to 0.5 range. From the overall scatter you can see that while most teams were able to effectively learn a commodity classification, a lot of teams struggled to get the location information much above random guesses (the optimist), and a few submissions noted that their models basically learnt to place deposits at the centre of the stamp. The teams that did do better on location used some interesting priors to force their models away from the centre of the stamp, and it will be interesting to see whether this improves in the main competition.
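One plausible reading of that idea (purely illustrative, not any team's actual approach) is to down-weight a model's probability surface near the stamp centre before picking predicted locations, so the degenerate centre prediction stops being the easy answer. The map, cell size and prior width below are all invented:

```python
import numpy as np

# Invented 500 x 500 prospectivity map in [0, 1] standing in for model output.
rng = np.random.default_rng(0)
prob = rng.random((500, 500))

# Radial prior that suppresses the centre of the stamp: with 50 m cells, pixels
# within roughly 2.5 km (50 px) of the centre are progressively down-weighted.
yy, xx = np.mgrid[:500, :500]
r = np.hypot(yy - 249.5, xx - 249.5)
prior = 1.0 - np.exp(-(r / 50.0) ** 2)

# Pick the predicted deposit location from the re-weighted surface.
weighted = prob * prior
iy, ix = np.unravel_index(np.argmax(weighted), weighted.shape)
print("predicted deposit pixel:", iy, ix)
```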

So if you’re a data scientist looking at this and thinking about the approach for the main Explorer Challenge, what should be your takeaway?

  • Sampling bias and covariate shift are not your friends. There is a strong bias towards areas with low cover (cover == regolith and other stuff that hides deposits) in Australia — simply because it’s easier for a geologist to fall over something if it’s sticking out at the surface. In contrast, the area under Mt Woods has thicker cover sequences, requiring you to use geophysical features rather than surface information like ASTER. Most teams found the geophysics more useful than the remote sensing data at a national scale anyway, but this is something to keep in mind.
  • Higher resolution data will make a big difference. The datasets used in this competition cover the Mt Woods tenement, but you’re likely to get better results using the detailed regional geophysics (requires competition signup) in the data release for the main challenge. As most of the tenement is under cover, the geophysical potential field datasets (magnetics, gravity, IP) are likely to be of more use than some of the remote sensing datasets used in the data science stream.
  • Find some geological expertise. Teams which scored well in this competition had found a geologist to bounce ideas off. If you’re looking for some extra help, come and ask questions on our community Slack — we’ll try and point you in the right direction.
