Tin-Tungsten Prospecting with Machine Learning in Northeast Tasmania, Australia

Jul 27, 2020

This article presents the results of a machine-learning-driven tin-tungsten (Sn-W) prospectivity analysis in a part of the world that is close to my heart, the Northeast of Tasmania. The project started off as a Covid-19 lockdown procrastination exercise inspired by a preprint, since published as Yeomans et al. (2020), describing a machine-learning-based approach to tungsten exploration in Southwest England. After releasing an initial Google Colab notebook I received some generous feedback from the author and decided to refine the workflow by including new data layers and increasing the spatial resolution of the input data. All results presented here are taken directly from the updated notebook, which I’ve also made available in a similar fashion here.

Geological Setting

The oldest basement rocks in Northeastern Tasmania belong to a thick sequence of Ordovician-Devonian quartz-rich turbidites, the Mathinna Supergroup. Mathinna sediments were deformed during the Tabberabberan Orogeny, a major Devonian tectonic event that affected much of Southeastern Australia and coincided with voluminous granitic magmatism. Devonian-aged granites cover ~6% of the Tasmanian landmass and are an important source of much of Tasmania’s mineral wealth, which in Northeastern Tasmania includes a series of stockwork and greisen Sn-W deposits. Figure 1 below presents a map of the study area for context.

Figure 1. Context map taken from QGIS using Google’s satellite imagery. The red box in the right hand plot highlights the study area discussed in the article, while the red dots represent locations with known Sn-W mineralisation.

Some historical Sn and Sn-W mines in the area include the Anchor mine in the Blue Tier district, the Briseis mine in Derby (now a mountain biker’s paradise), and the Storeys Creek and Aberfoyle mines around Rossarden. For a comprehensive summary of Tasmania’s economic geology the reader is referred to Seymour et al. (2006).

Mineral System Model

Greisen and stockwork type Sn-W deposits are typically associated with the upper parts of evolved granitoid plutons, where mineralising fluids exsolved from cooling magmas have been focussed and/or ponded (Fig. 2). Aspects of the mineral system exploitable by geophysics include:

Granite domes/cupolas are easily targeted as pronounced gravity lows
Sn-W granites are typically depleted in compatible elements, which can give rise to low Fe-Ti oxide mineral content, making for magnetically ‘quiet’ granite targets
Enrichment in incompatible elements can result in strong radiometric U and K responses when exposed at surface
Contact aureoles around granites may have different weathering properties, giving rise to distinct morphological characteristics visible in topography data

For more on granite related Sn-W mineral systems in the context of mineral exploration, the reader is referred to Blevin & Downes (2017) and Blevin (1998).

Figure 2. Model of Sn-W stockwork and greisen deposits. Scale is ‘very approximate’. Image from Blevin (1998).

Conceptual Modelling Approach

The prospectivity modelling exercise presented herein relies on two types of data: point data representing the spatial location of known Sn-W mineralisation occurrences, and gridded raster data sets containing geological, geophysical and remote sensing information surrounding the known Sn-W occurrences. The goal of modelling is to train an ensemble decision tree model on a two-class problem in which the classes contain raster data from pixels that are either proximal or distal to known occurrences. The hope is that these models are sensitive to the multivariate signature of mineralisation in the raster data, and thus can be applied to all parts of the data sets in order to generate a prospectivity data layer of comparable resolution.

Relevant literature on the application of machine learning to similar mineral exploration problems can be found in Rodriguez-Galiano et al. (2015), Hariharan et al. (2017), Sun et al. (2019) and Roshanravan et al. (2020).

Data Sets

A total of 222 Sn-W occurrences were extracted from the mineral occurrences data set compiled by Mineral Resources Tasmania. These were filtered to ensure only in situ occurrences were included, as transported alluvial deposits would likely introduce spurious signals into the modelling procedure.

Raster data sets were compiled into a multiband virtual raster with a spatial resolution of 30 m using the gdalbuildvrt utility, one of the command-line programs in the GDAL library for geospatial data manipulation. Data sets included geological, geophysical and remote sensing layers sourced from either Mineral Resources Tasmania or Geoscience Australia (Table 1).
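As a rough illustration of how such a stack might be assembled, the sketch below builds a gdalbuildvrt command line from Python. The file names and the `build_vrt_command` helper are hypothetical, and the choice of bilinear resampling is an assumption rather than something taken from the notebook.

```python
import shlex

def build_vrt_command(vrt_path, layer_paths, pixel_size=30.0, resampling="bilinear"):
    """Return the argument list for a gdalbuildvrt call that stacks each
    input file as a separate band at a fixed pixel size."""
    cmd = [
        "gdalbuildvrt",
        "-separate",             # one band per input file rather than a mosaic
        "-resolution", "user",   # needed so that -tr sets the output pixel size
        "-tr", str(pixel_size), str(pixel_size),
        "-r", resampling,        # resampling method (assumed here)
        vrt_path,
    ]
    return cmd + list(layer_paths)

# Hypothetical file names, for illustration only
layers = ["gravity.tif", "magnetics.tif", "dem.tif"]
command = build_vrt_command("evidence_layers.vrt", layers)
print(shlex.join(command))
```

On a machine with GDAL installed, the resulting list could be handed to `subprocess.run(command, check=True)`.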

Table 1. Information regarding the evidence layers used in Sn-W prospectivity analyses.

Figure 3 below presents each data layer from the virtual raster, with colours linearly stretched between the 2.5th and 97.5th percentiles of the data values in each case. Red stars represent Sn-W occurrences in the study area.

Figure 3. Data layers with Sn-W occurrences plotted as red stars. Colour stretches have been clipped to the central 95% of data values (2.5th to 97.5th percentiles) for each layer.
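A percentile stretch of this kind can be sketched in a few lines of NumPy. The band below is synthetic and the function name is my own; it only illustrates the clipping and rescaling, not the plotting code in the notebook.

```python
import numpy as np

def percentile_stretch(band, lower=2.5, upper=97.5):
    """Linearly rescale a band to [0, 1] between two percentiles, clipping
    values that fall outside them. NaNs are treated as no-data."""
    lo, hi = np.nanpercentile(band, [lower, upper])
    if hi == lo:
        return np.zeros_like(band, dtype=float)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)

rng = np.random.default_rng(0)
band = rng.normal(size=(50, 50))   # synthetic stand-in for one data layer
band[0, 0] = 1e6                   # a single extreme outlier
stretched = percentile_stretch(band)
```

Without the clipping, the single outlier would compress every other pixel into a sliver of the colour range.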

Principal Component Analysis

Figure 4 below presents principal component eigenvectors for each data band. The radiometric bands are highly positively correlated with one another, as are the Landsat bands, given the similar magnitude and direction of their eigenvectors. Including highly correlated data layers such as these in ensemble decision tree modelling workflows contributes redundant information that may adversely affect the performance of the models. For this reason a feature extraction procedure was applied to the raster data prior to modelling.

Figure 4. Principal component eigenvectors for each data layer. Note that radiometric and Landsat layers have similar variance and are positively correlated.
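To show why similar eigenvector loadings indicate redundancy, here is a minimal sketch on synthetic data: three bands share a common signal and a fourth is unrelated. None of these numbers come from the Tasmanian rasters.

```python
import numpy as np

rng = np.random.default_rng(42)
n_pixels = 5000

# Synthetic stand-ins: three bands sharing one signal, plus one unrelated band
shared = rng.normal(size=n_pixels)
bands = np.column_stack([
    shared + 0.1 * rng.normal(size=n_pixels),
    shared + 0.1 * rng.normal(size=n_pixels),
    shared + 0.1 * rng.normal(size=n_pixels),
    rng.normal(size=n_pixels),
])

# Eigen-decomposition of the band-to-band correlation matrix
corr = np.corrcoef(bands, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)   # returned in ascending order
order = np.argsort(eigenvalues)[::-1]              # re-sort, largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

first_pc = eigenvectors[:, 0]                # loadings of each band on PC1
explained = eigenvalues / eigenvalues.sum()  # fraction of variance per component
```

The three correlated bands load onto the first component with nearly identical magnitude and the same sign, while the unrelated band barely contributes. The overall sign of an eigenvector is arbitrary, so only the agreement between bands is meaningful.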

Feature Extraction

Feature extraction involved a linear dimensionality reduction procedure in which three principal components were derived from the six Landsat layers, and two principal components from the four radiometric layers. Figure 5 below presents each of the three Landsat principal components as an RGB image. Despite using a Landsat product compiled over 30 years so as to represent the ‘barest earth’ Landsat signal, principal components tend to be sensitive to vegetation signals in NE Tasmania.

Figure 5. Landsat principal components plotted as RGB colour bands with Sn-W occurrences overlain as white stars. Red colours represent cultivated land, while dark blue colours represent dense wet eucalypt forest. Low density sclerophyll forest appears as light to dark green colours, while coastal sand dunes appear white. Yellow colours in the south of the data set probably represent bare earth in the drier parts of the Fingal valley near Avoca.

Of the two radiometric principal components presented in Figure 6 below, the first (left plot) highlights granite-related signatures best, with strong positive responses above known granite outcrops. The second principal component appears to be partly sensitive to cultivated land when compared with the red colours in the earlier Landsat principal component image.

Figure 6. Radiometric principal components.

Dimensionality reduction decreased the total number of data layers to be used in modelling from 21 to 16. The final layers used in modelling are presented in Figure 7 below.

Figure 7. Data layers after dimensionality reduction ready for prospectivity modelling.
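The grouped reduction described above can be sketched as follows. The arrays are random placeholders with the same band counts as the article (6 Landsat, 4 radiometric, 11 other layers), and `pca_reduce` is an illustrative helper, not the notebook's function.

```python
import numpy as np

def pca_reduce(layers, n_components):
    """Project standardised layers, shape (n_pixels, n_layers), onto their
    leading principal components using an SVD."""
    z = (layers - layers.mean(axis=0)) / layers.std(axis=0)
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]

rng = np.random.default_rng(1)
n_pixels = 2000
landsat = rng.normal(size=(n_pixels, 6))       # placeholder Landsat bands
radiometric = rng.normal(size=(n_pixels, 4))   # placeholder radiometric bands
other = rng.normal(size=(n_pixels, 11))        # remaining layers, left untouched

landsat_pcs, landsat_var = pca_reduce(landsat, 3)
radiometric_pcs, radiometric_var = pca_reduce(radiometric, 2)
features = np.hstack([other, landsat_pcs, radiometric_pcs])
print(features.shape)   # (2000, 16)
```

Replacing 6 + 4 bands with 3 + 2 components removes five layers, which is where the drop from 21 to 16 comes from.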

Prospectivity Modelling with CatBoost

Prospectivity modelling relied on the Python implementation of CatBoostClassifier, part of the CatBoost family of gradient boosted decision tree algorithms. I use CatBoost because I am familiar with it, but there are alternative algorithms similarly suited to this application, including Random Forests, XGBoost and LightGBM.

The relevant modelling code can be found in the notebook, but a simple summary of the procedure is as follows:

scale each input raster to unit variance
loop through Sn-W occurrence locations, holding out the current occurrence as well as all other occurrences within 2km
extract evidence layer data from pixels within some box surrounding the occurrences not being held out; this is the proximal data class
extract at random an equivalent number of pixels outside the boxes surrounding occurrences; this is the distal data class
shuffle the classes and train a CatBoostClassifier model on 70% of the data, evaluate the model against 30% of the data and shrink to the iteration with the best evaluation metric (accuracy in this case)
apply the model to every pixel in the evidence layer data set to get a raster with pixel values representing the model’s confidence that the given pixel is in the proximal class
average all prediction rasters generated for each holdout iteration into a single output raster
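The data preparation steps in the list above can be sketched with NumPy alone. Everything here is illustrative: the function names, the 5-pixel box half-width, the toy occurrence positions and the random evidence layers are all assumptions. I have also assumed that distal pixels are drawn from outside the boxes around every occurrence, including held-out ones, which the summary above does not specify.

```python
import numpy as np

def holdout_mask(coords, index, radius=2000.0):
    """True for occurrences held out on this iteration: the current one
    plus any others within `radius` metres of it."""
    distances = np.linalg.norm(coords - coords[index], axis=1)
    return distances <= radius

def label_pixels(shape, rows_cols, half_width):
    """Boolean grid that is True inside a square box around each occurrence."""
    mask = np.zeros(shape, dtype=bool)
    for r, c in rows_cols:
        mask[max(r - half_width, 0):r + half_width + 1,
             max(c - half_width, 0):c + half_width + 1] = True
    return mask

def sample_classes(proximal, excluded, rng):
    """Flat pixel indices for the proximal class and an equally sized
    random draw of distal pixels from outside every excluded box."""
    prox_idx = np.flatnonzero(proximal)
    dist_pool = np.flatnonzero(~excluded)
    dist_idx = rng.choice(dist_pool, size=prox_idx.size, replace=False)
    return prox_idx, dist_idx

rng = np.random.default_rng(0)
pixel_size = 30.0                      # metres, matching the 30 m rasters
grid_shape = (200, 200)
rows_cols = np.array([[20, 20], [25, 30], [150, 160], [100, 40]])  # toy occurrences
coords_m = rows_cols * pixel_size

# Synthetic evidence layers, one row per pixel, scaled to unit variance
features = rng.normal(size=(grid_shape[0] * grid_shape[1], 16))
features = features / features.std(axis=0)

# One holdout iteration: occurrence 0 and its neighbours within 2 km are withheld
held_out = holdout_mask(coords_m, index=0)
proximal = label_pixels(grid_shape, rows_cols[~held_out], half_width=5)
excluded = label_pixels(grid_shape, rows_cols, half_width=5)
prox_idx, dist_idx = sample_classes(proximal, excluded, rng)

X = features[np.concatenate([prox_idx, dist_idx])]
y = np.concatenate([np.ones(prox_idx.size), np.zeros(dist_idx.size)])

# Shuffle, then split 70/30 into training and evaluation rows
order = rng.permutation(y.size)
n_train = int(0.7 * y.size)
train, test = order[:n_train], order[n_train:]
```

The rows `X[train]`, `y[train]` would then go to the classifier, with `X[test]`, `y[test]` used as the evaluation set, and the loop would repeat with each occurrence taking its turn as the holdout.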

The modelling procedure trains a unique model on the multivariate signature of all occurrences with the exception of those within 2km of an excluded ‘holdout’ occurrence. The logic behind this approach relates to the need to investigate the sensitivity of the models to each occurrence without the nefarious influence of spatial autocorrelation. Since the goal here is to find previously unknown prospective areas, it is fitting that we test whether or not models are capable of detecting known mineralisation when they are trained only on data surrounding occurrences remote from it.

Holdout Results

Figure 8 below presents Sn-W occurrences plotted on a digital elevation model and coloured by the averaged model predictions for surrounding pixels in the case where the occurrence was held out. Here, 99 of the 222 occurrences are correctly classified as being proximal to an occurrence, a false negative rate of 55%. The models are largely insensitive to Sn-W occurrences outside of the three main mining districts in which there is a high spatial density of occurrences: Rossarden-Storeys Creek in the south, Blue Tier in the north and Great Pyramid in the east.

Figure 8. Sn-W occurrences overlain onto DEM. Occurrences are coloured by the mean probabilities describing the holdout model’s confidence that the surrounding pixels are proximal to mineralisation.
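For reference, the rates quoted above follow directly from the counts. The sketch below assumes a 0.5 probability threshold for calling an occurrence "detected", which is my assumption, and uses placeholder scores constructed purely to reproduce the 99-of-222 split.

```python
import numpy as np

def holdout_rates(probabilities, threshold=0.5):
    """Number and fraction of held-out occurrences scored at or above
    `threshold`, plus the complementary false negative rate."""
    probabilities = np.asarray(probabilities, dtype=float)
    detected = int(np.count_nonzero(probabilities >= threshold))
    detection_rate = detected / probabilities.size
    return detected, detection_rate, 1.0 - detection_rate

# Placeholder scores chosen only to reproduce the counts quoted above
placeholder = np.concatenate([np.full(99, 0.8), np.full(123, 0.2)])
detected, detection_rate, false_negative_rate = holdout_rates(placeholder)
print(f"{detected} of {placeholder.size}: {detection_rate:.1%} detected, "
      f"{false_negative_rate:.1%} missed")
# prints "99 of 222: 44.6% detected, 55.4% missed"
```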

There are a number of potential reasons for the poor model performance on these isolated occurrences. One possibility is that the data on hand are insensitive to the subtle signatures of some of the Sn-W occurrences. Isolated, narrow vein-type deposits distal to granites might be an example of a deposit subtype to which the data are insensitive. In addition, a number of Sn-W occurrences, especially the older ones, may not have accurate coordinates and/or geologic descriptions, making some error checking procedure here worthwhile. Finally, it is likely that models are biased toward the signature of the main mining districts as these areas contain the majority of occurrences.

Feature Importance Results

Feature importance values describe the degree to which changes in the feature values affect the prediction values. The larger the feature importance value, the greater the effect variation in this feature has on the model’s prediction outputs. All of the 222 holdout models saved feature importance data, which are summarised in the boxplot shown in Figure 9 below.

Figure 9. Feature importance box plots showing feature importance of each data layer across all 222 models.
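A summary like the one behind a box plot can be computed from a table of per-model importances. The values below are random placeholders, not results from the article, and the layer names are hypothetical labels. Each model's importances are normalised to sum to 100 here so that models are comparable.

```python
import numpy as np

rng = np.random.default_rng(3)
layer_names = ["gravity", "dem", "magnetics", "landsat_pc1"]   # hypothetical labels

# Placeholder importances with shape (n_models, n_layers)
raw = rng.gamma(shape=[8.0, 4.0, 3.0, 0.5], scale=1.0, size=(222, 4))
importances = 100 * raw / raw.sum(axis=1, keepdims=True)

# Quartiles per layer across all models: the numbers a box plot draws
q1, median, q3 = np.percentile(importances, [25, 50, 75], axis=0)
ranking = [layer_names[i] for i in np.argsort(median)[::-1]]

for name, lo, mid, hi in zip(layer_names, q1, median, q3):
    print(f"{name:12s} median={mid:5.1f}  IQR=({lo:5.1f}, {hi:5.1f})")
```

Ranking by the median rather than the mean keeps a handful of unusual holdout models from dominating the ordering.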

The isostatic residual Bouguer anomaly gravity layer was the most important data layer to the modelling. This reflects the sensitivity of residual gravity data to low-density Devonian granite bodies in the shallow subsurface that either host or directly underlie Sn-W mineralisation.

The importance ascribed to elevation data in the DEM probably relates to the tendency for in situ Sn-W occurrences to be found in higher altitude areas where bedrock is exposed by erosional processes. Magnetic field data are also important, likely because the magnetically ‘quiet’ signature of the Fe-Ti oxide depleted granites that host mineralisation is radically different from the high amplitude responses of Jurassic dolerites and remanently magnetised Cenozoic basalts in the study area.

The high importance ascribed to the distance to granite outcrop and to the 1st principal component of the radiometric data layers is expected given that these layers, like gravity, are sensitive to the location of granites.

The barest earth Landsat data bands are largely unimportant to the models. This is likely because these data contain little to no bedrock geology information and, in this high rainfall part of the world, are dominated by vegetation-related signals. Repeating the workflow in other Sn-W prospective parts of Australia with barer bare earth data, such as the New England Orogen in New South Wales, might change this result.

Averaged Prediction Results

A final prospectivity map was generated by averaging all 222 model prediction rasters into a single raster layer with pixel values representing the average probability that pixels are proximal to Sn-W mineralisation. Figure 10 presents this layer with occurrences overlain as cyan coloured stars. If you would like to view this raster layer yourself, you can download a GDA94 UTM zone 55 projected version here and a WGS84 Mercator projected version here.

Figure 10. Averaged probabilities that pixels are proximal to a Sn-W occurrence from 222 holdout models.
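Averaging a few hundred prediction rasters does not require holding them all in memory. The sketch below keeps a running mean instead; the small random grids stand in for the real prediction rasters and the function name is my own.

```python
import numpy as np

def running_mean(rasters):
    """Average an iterable of equally shaped rasters one at a time,
    so only the current raster and the running mean are held in memory."""
    mean = None
    for count, raster in enumerate(rasters, start=1):
        raster = np.asarray(raster, dtype=float)
        if mean is None:
            mean = raster.copy()
        else:
            mean += (raster - mean) / count   # incremental mean update
    return mean

rng = np.random.default_rng(7)
predictions = [rng.random((4, 5)) for _ in range(222)]   # placeholder probability grids
averaged = running_mean(predictions)
```

Because `rasters` can be any iterable, a generator that reads one prediction file at a time would work just as well as the in-memory list used here.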

A Cheeky Squizz At Current Exploration Leases

The plot below (Fig. 11) presents the final averaged probability output as a semitransparent overlay on Google satellite imagery. Pink to yellow colours represent pixels classified as being proximal to mineralisation. Current metals exploration leases taken from Mineral Resources Tasmania are overlain as green polygons and Sn-W occurrences used in model training are overlain as red stars.

Figure 11. Final averaged probability raster overlain onto Google satellite imagery in QGIS. Red stars are Sn-W occurrences while green polygons are current metals exploration tenements. Yellow to pink colours in the prospectivity raster represent areas of high prospectivity.

Summary & Conclusion

Machine learning is a powerful tool in the kit of the mineral explorationist. However, it is not magic and does not give you the equivalent of X-ray vision for mineral systems. In this case, holdout models predicted known mineralisation 45% of the time and struggled to identify occurrences outside of the major mining camps, so there are probably a number of prospective areas not identified by this prospectivity exercise. This could potentially be improved by incorporating geochemical information in the form of stream sediment geochemical analyses into the modelling procedure, something that may be released in future iterations of the project.

Future Work

Incorporate geochemical information from stream sediment analyses in some sophisticated way
Follow up some of the prospective areas with a database search to see whether they have been tested by historic drilling
Repeat the entire workflow on the same raster data for structurally hosted gold deposits