Using Machine Learning for GIS-based Mineral Prospect Mapping: A Case Study of the Masvingo Province

Sage Wagner - Lithi-Zim Research Project

6 min read · Jan 29, 2023

Predictive modelling of mineral prospecting using GIS is a valid and progressively more accepted tool for delineating reproducible mineral exploration targets. — Sage Wagner

In this study, we used machine learning methods, including support vector machine (SVM), artificial neural networks (ANN) and random forest (RF), to conduct GIS-based mineral prospect mapping of the Masvingo Province, southern Zimbabwe.

The mineral systems approach was used to translate our understanding of the Li-Be pegmatite mineral system into mappable exploration criteria, resulting in 12 predictor maps that represent source, transport, physical trap and chemical deposition processes critical for ore formation. These predictor maps were used to train our predictive SVM, ANN, and RF models using a 10-fold cross-validation method.
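The 10-fold cross-validation step can be sketched with scikit-learn as follows. This is a minimal illustration, not the study's actual pipeline: the data come from `make_classification` as a synthetic stand-in for the 12 predictor maps, the hyperparameters are library defaults, and scaling the inputs for the SVM and ANN is our own assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 300 samples x 12 predictor values,
# with a binary label (1 = known deposit, 0 = non-deposit).
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           random_state=0)

models = {
    # SVM and ANN are sensitive to feature scale, so inputs are standardized.
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=1000, random_state=0)),
    "RF": RandomForestClassifier(random_state=0),
}

# Stratified folds keep the deposit / non-deposit ratio similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    cv_scores[name] = scores
    print(f"{name}: mean AUC = {scores.mean():.3f} (std {scores.std():.3f})")
```

The spread of the ten fold scores is a rough guide to how stable each model is, which is one way to compare the models beyond a single accuracy figure.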

Exploring the depths of southern Zimbabwe: A gravity map of the Masvingo Province, used as a machine learning dataset to uncover the potential mineral deposits hidden beneath the surface.

The overall performance of the resulting predictive models was assessed on both the training and test datasets using a confusion matrix, a set of statistical measurements, the receiver operating characteristic (ROC) curve, and the success-rate curve. The results indicate that all three machine learning models achieved satisfactory performance, with high predictive accuracy. In addition, all models showed good interpretability, providing consistent rankings of the relative importance of the evidential features contributing to the final predictions.
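To make the confusion-matrix step concrete, here is a small sketch of the statistics that are commonly derived from a 2x2 confusion matrix. The article does not list which measurements the study used, so the choice of accuracy, sensitivity, specificity and Cohen's kappa is an assumption, and the counts are hypothetical.

```python
def confusion_stats(tp, fp, fn, tn):
    """Return common summary statistics for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # share of deposits correctly predicted
    specificity = tn / (tn + fp)  # share of non-deposits correctly predicted
    # Cohen's kappa: agreement beyond what chance alone would produce.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not results from the study).
stats = confusion_stats(tp=18, fp=3, fn=2, tn=17)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```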

In comparison, the RF model outperformed the SVM and ANN models: it was more consistent under variations in the model parameters and achieved better predictive accuracy. Importantly, the RF model also showed the highest predictive efficiency, capturing most of the known deposits within the smallest prospective tracts.
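Predictive efficiency in this sense is usually read off a success-rate curve: cells are ranked from most to least prospective, and the cumulative share of known deposits is plotted against the cumulative share of area. The sketch below shows the idea on a hypothetical toy grid; the function name, the scores and the deposit flags are all made up for illustration.

```python
def success_rate_curve(scores, areas, is_deposit):
    """Return (cumulative area %, cumulative deposits %) points.

    Cells are visited from the highest to the lowest prospectivity score.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_area = sum(areas)
    total_deposits = sum(is_deposit)
    curve = []
    area_seen = 0.0
    deposits_seen = 0
    for i in order:
        area_seen += areas[i]
        deposits_seen += is_deposit[i]
        curve.append((100 * area_seen / total_area,
                      100 * deposits_seen / total_deposits))
    return curve


# Hypothetical toy grid: 10 equal-area cells, 4 of which host a known deposit.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
areas = [1.0] * 10
is_deposit = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

curve = success_rate_curve(scores, areas, is_deposit)
for area_pct, deposit_pct in curve:
    print(f"top {area_pct:5.1f}% of area captures {deposit_pct:5.1f}% of deposits")
```

A curve that climbs steeply at the left means a small share of the area holds most of the deposits, which is what a more efficient model looks like.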

These results suggest that the RF model is the most appropriate model for Li potential mapping in the Masvingo Province ore district. It was therefore used to generate a prospect map containing very-high, high, moderate, and low potential areas in support of follow-up exploration. The prospective areas delineated in this map occupy 13.97% of the study area and capture 80.95% of the known deposits. The fact that two newly discovered deposits fall within the prospective areas predicted by the model suggests that it is robust and effective for exploration target generation.
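One way to read the two percentages together is as a concentration ratio: the share of deposits captured divided by the share of area occupied. The ratio is not reported in the study. It is simple arithmetic on the two figures quoted above, shown here as a hedged illustration.

```python
area_pct = 13.97     # share of the study area mapped as prospective
deposit_pct = 80.95  # share of known deposits inside that area

# A ratio of 1 would mean the map does no better than picking ground at random.
efficiency = deposit_pct / area_pct
print(f"Known deposits are about {efficiency:.1f}x more concentrated "
      f"in the prospective areas than in the study area as a whole.")
```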

Here is an example Python script that sketches the methods described above:

import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
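from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# scikit-learn has no success_rate_score function. As a stand-in, treat the
# success rate as the share of known deposits predicted as prospective, which
# is the recall of the positive class (assumes labels are 0/1 with 1 = deposit).
def success_rate_score(y_true, y_pred):
    return recall_score(y_true, y_pred)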

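# Load data and preprocess
# Assumes one row per sample, one column per predictor map, and a binary "target" column
data = pd.read_csv("mineral_data.csv")
X = data.drop("target", axis=1)
y = data["target"]
# Hold out 10% for testing; stratify keeps the class balance, random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, stratify=y, random_state=42)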

# Train SVM model
svm = SVC()
svm.fit(X_train, y_train)
svm_pred = svm.predict(X_test)

# Train ANN model
ann = MLPClassifier()
ann.fit(X_train, y_train)
ann_pred = ann.predict(X_test)

# Train RF model
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)

# Evaluate models using confusion matrix, accuracy, ROC AUC and success rate
svm_cm = confusion_matrix(y_test, svm_pred)
ann_cm = confusion_matrix(y_test, ann_pred)
rf_cm = confusion_matrix(y_test, rf_pred)
svm_acc = accuracy_score(y_test, svm_pred)
ann_acc = accuracy_score(y_test, ann_pred)
rf_acc = accuracy_score(y_test, rf_pred)
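# ROC AUC is computed from continuous scores rather than hard class labels
svm_roc = roc_auc_score(y_test, svm.decision_function(X_test))
ann_roc = roc_auc_score(y_test, ann.predict_proba(X_test)[:, 1])
rf_roc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])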
svm_sr = success_rate_score(y_test, svm_pred)
ann_sr = success_rate_score(y_test, ann_pred)
rf_sr = success_rate_score(y_test, rf_pred)

# Print results
print("SVM Confusion Matrix:", svm_cm)
print("SVM Accuracy:", svm_acc)
print("SVM ROC AUC:", svm_roc)
print("SVM Success Rate:", svm_sr)
print("ANN Confusion Matrix:", ann_cm)
print("ANN Accuracy:", ann_acc)
print("ANN ROC AUC:", ann_roc)
print("ANN Success Rate:", ann_sr)
print("RF Confusion Matrix:", rf_cm)
print("RF Accuracy:", rf_acc)
print("RF ROC AUC:", rf_roc)
print("RF Success Rate:", rf_sr)

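# Select the best model (RF in this study) and perform 10-fold cross-validation
best_model = rf
best_model_scores = cross_val_score(best_model, X, y, cv=10)
print("RF 10-fold CV accuracy: {:.3f} +/- {:.3f}".format(best_model_scores.mean(), best_model_scores.std()))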

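# Generate prospectivity map using best model
# Assumes prospectivity_data.csv has the same predictor columns as X plus an "Area" column
X_prospectivity = pd.read_csv("prospectivity_data.csv")
proba = best_model.predict_proba(X_prospectivity[X.columns])[:, 1]
# Bin the predicted probability into four classes (cut-offs are illustrative, not from the study)
labels = ["Low", "Moderate", "High", "Very High"]
classes = pd.cut(proba, bins=[0.0, 0.25, 0.5, 0.75, 1.0], labels=labels, include_lowest=True)
prospectivity_map = pd.DataFrame({"Area": X_prospectivity["Area"], "Prospectivity": classes})

# Calculate percentage of study area and known deposits captured by prospectivity map
study_area = prospectivity_map["Area"].sum()
captured_area = prospectivity_map[prospectivity_map["Prospectivity"] != "Low"]["Area"].sum()
percent_area = captured_area / study_area * 100
# Known deposits are the rows labelled 1 in the labelled dataset; a deposit counts as
# captured when its predicted class is above "Low" (probability > 0.25)
deposit_proba = best_model.predict_proba(X[y == 1])[:, 1]
percent_deposits = (deposit_proba > 0.25).mean() * 100

# Print results
print("Prospectivity map captures {:.2f}% of study area and {:.2f}% of known deposits.".format(percent_area, percent_deposits))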

As we can see in the code, the first step in our predictive modelling process is to import the necessary libraries, including pandas, numpy, and sklearn. These libraries provide us with the tools we need to manipulate and analyze our data.

Next, we use the pandas library to read in our dataset, which contains information on the Masvingo Province ore district in southern Zimbabwe. This dataset includes information on 12 different predictor maps that represent different processes critical for ore formation, such as source, transport, and chemical deposition.

Once we have our dataset loaded, we then proceed to split it into training and test datasets using the train_test_split function from the sklearn library. This is an important step as it allows us to evaluate the performance of our models on unseen data, which gives us a more realistic idea of how well they will perform in the real world.

After splitting our dataset, we train our three machine learning models: support vector machine (SVM), artificial neural network (ANN), and random forest (RF). Each model is an sklearn estimator, and we call its fit method on the training dataset.

Once our models are trained, we evaluate their performance using a variety of techniques. These include a confusion matrix, a set of statistical measurements, the receiver operating characteristic (ROC) curve, and the success-rate curve. These techniques allow us to evaluate the accuracy of our models and compare their performance against each other.
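The area under the ROC curve has a simple interpretation that can be computed by hand: it is the probability that a randomly chosen deposit receives a higher score than a randomly chosen non-deposit. The sketch below uses hypothetical labels and scores to show this, and also why the metric is best fed continuous scores rather than hard class labels.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random deposit outranks a random non-deposit."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = deposit) and model scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)

# Thresholding the scores at 0.5 first discards ranking information.
hard = [1 if s >= 0.5 else 0 for s in scores]
auc_hard = roc_auc(labels, hard)

print(f"AUC from scores:      {auc:.3f}")
print(f"AUC from hard labels: {auc_hard:.3f}")
```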

In the study, the random forest (RF) model outperformed the SVM and ANN models in terms of predictive accuracy. The script itself only prints each model's metrics, so the ranking it shows depends on the data it is run on. The RF model also exhibited the highest predictive efficiency, capturing most of the known deposits within the smallest prospective tracts. This suggests that the RF model is the most appropriate model for Li potential mapping in the Masvingo Province ore district.

Finally, we use the RF model to generate a prospect map containing very-high, high, moderate, and low potential areas in support of follow-up exploration. The prospective areas delineated in this map occupy 13.97% of the study area and capture 80.95% of the known deposits. The fact that two newly discovered deposits occur within the prospective areas predicted by the prospectivity model indicates that the model is robust and effective regarding exploration target generation.

In conclusion, this code demonstrates the use of machine learning methods, including support vector machine (SVM), artificial neural networks (ANN) and random forest (RF), for GIS-based mineral prospectivity mapping. By using a mineral systems approach to translate our understanding of the Li-Be pegmatite mineral system into mappable exploration criteria, we were able to train predictive models that achieved satisfactory performance levels characterized by high predictive accuracy. The random forest (RF) model was found to be the most appropriate model for Li potential mapping in the Masvingo Province, and was used to generate a prospect map that captured most of the known deposits within the smallest prospective tracts. This code provides an example of how machine learning can be used to support mineral exploration efforts and help identify new mineral deposit locations.

Author Bio:

Sage Wagner is an accomplished near-surface geophysicist with a passion for building strong client relationships and a focus on innovation and excellence. With years of experience and a wealth of knowledge, Sage has made significant contributions to the field of geology through his dedication to client success and obsession with his craft. Sage is committed to using his expertise to drive positive change and make a meaningful impact in his industry.
