21st Century Paleontology with Machine Learning
Unlock Your Inner Jurassic Park Fan and Learn to Hunt Dinosaurs with Computer Vision
Let’s build an AI fossil-hunting tool based on PyTorch and the Intel AI Analytics Toolkit. This tutorial will help you understand how to decompose an image classification problem, like dinosaur fossil hunting, into a few key components: building context from data, representing the data properly to our model, defining and training the model, and producing actionable insights from model predictions. Please visit the Jurassic repository to run the Jupyter notebooks discussed in this article.
Building Context from Data
Our model will be trained on aerial photos rather than satellite images, which are captured from much higher altitudes and do not have sufficient resolution. The resolution must be high enough to find specific dinosaur bone fragments. The model we train will focus on the landscape’s colors, textures, and shapes (Figure 1).
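To see why capture altitude matters, a common way to express image resolution is the ground sample distance (GSD), the width of ground covered by one pixel: GSD = altitude × sensor pixel size / focal length. The sketch below computes it; the camera values (4.5 µm pixels, 100 mm lens) are hypothetical and chosen only to show how quickly detail is lost as altitude grows. They are not the parameters of the imagery used in this tutorial.

```python
def ground_sample_distance(altitude_m: float, pixel_size_um: float, focal_length_mm: float) -> float:
    """Return the ground distance (in meters) covered by one image pixel.

    GSD = altitude * pixel_size / focal_length, with every length converted to meters.
    """
    pixel_size_m = pixel_size_um * 1e-6
    focal_length_m = focal_length_mm * 1e-3
    return altitude_m * pixel_size_m / focal_length_m


# Hypothetical camera: 4.5 micrometer pixels behind a 100 mm lens.
for altitude_m in (1_000, 5_000, 10_000):
    gsd_cm = ground_sample_distance(altitude_m, pixel_size_um=4.5, focal_length_mm=100) * 100
    print(f"{altitude_m:>6} m altitude -> {gsd_cm:.1f} cm per pixel")
```

Because GSD grows linearly with altitude, the same camera resolves ten times less detail from ten times higher up.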
As the images pass through our convolutional neural network (CNN), each successive layer extracts higher-level features, until the model is equipped to recognize the critical ones (Figure 2). Our model will learn to differentiate the colors, textures, and shapes of the depositional environments at Utah’s Dinosaur National Monument sites.
Proper Representation of Data to Our Model
As previously mentioned, we will work with aerial imagery to build our training and testing data. We will use the Google Earth Engine SDK to extract images at specific locations and altitudes; the Jupyter notebook for this step is in the Jurassic repository. After retrieving the aerial imagery, we need to process the images into the format our model expects (Figure 3).
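The exact preprocessing lives in the notebook. As a minimal sketch, assuming the model expects fixed-size RGB tiles normalized with the ImageNet channel statistics used by torchvision’s pretrained ResNets, a large aerial image that has already been downloaded as a NumPy array can be cut into non-overlapping tiles like this. The 224-pixel tile size and the helper name `image_to_tiles` are assumptions for illustration.

```python
import numpy as np

# ImageNet channel statistics commonly used with torchvision's pretrained ResNets.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def image_to_tiles(image: np.ndarray, tile: int = 224) -> tuple[np.ndarray, tuple[int, int]]:
    """Split an (H, W, 3) uint8 image into normalized (N, 3, tile, tile) float32 tiles.

    Edge pixels that do not fill a whole tile are dropped. Also returns the
    (rows, cols) grid shape so predictions can be stitched back into a map later.
    """
    rows, cols = image.shape[0] // tile, image.shape[1] // tile
    cropped = image[: rows * tile, : cols * tile].astype(np.float32) / 255.0
    # (rows*tile, cols*tile, 3) -> (rows, tile, cols, tile, 3) -> (rows, cols, tile, tile, 3)
    tiles = cropped.reshape(rows, tile, cols, tile, 3).transpose(0, 2, 1, 3, 4)
    tiles = (tiles - IMAGENET_MEAN) / IMAGENET_STD
    # Flatten the grid in row-major order and move channels first, as PyTorch expects.
    return tiles.reshape(-1, tile, tile, 3).transpose(0, 3, 1, 2), (rows, cols)
```

Returning the grid shape alongside the tiles keeps the bookkeeping needed to reassemble per-tile predictions into a map at the end of the workflow.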
The images are then manually labeled based on the following criteria and classes (Figure 4):
- Class 0 — Non-bone locations within a few miles of bone sites (no bones possible)
- Class 1 — Any region with similar depositional environments for which bones have not been positively identified (bones are possible, but have not been found)
- Class 2 — Bone locations have been identified via GPS locations and mapped to image coordinates (verified fossil sites)
Upon completing the manual labeling process, the labeled data is separated into training and validation folders for the next step in our tutorial.
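The notebooks handle this split themselves. As a hedged sketch, a stratified split, which samples each class separately so that the label proportions stay similar in both folders, could look like the following. It assumes one subfolder per class label (`0`, `1`, `2`) under each split folder, the layout torchvision’s `datasets.ImageFolder` reads; the helper names and the 20% validation fraction are assumptions.

```python
import random
import shutil
from pathlib import Path


def stratified_split(labeled: dict[str, int], val_fraction: float = 0.2, seed: int = 0):
    """Split {filename: label} into train/val dicts, sampling each class separately."""
    rng = random.Random(seed)
    by_label: dict[int, list[str]] = {}
    for name, label in sorted(labeled.items()):
        by_label.setdefault(label, []).append(name)
    train, val = {}, {}
    for label, names in by_label.items():
        rng.shuffle(names)
        n_val = round(len(names) * val_fraction)
        val.update({n: label for n in names[:n_val]})
        train.update({n: label for n in names[n_val:]})
    return train, val


def copy_into_folders(split: dict[str, int], src_dir: Path, dst_dir: Path) -> None:
    """Copy each file to dst_dir/<label>/<filename> (an ImageFolder-style layout)."""
    for name, label in split.items():
        target = dst_dir / str(label)
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_dir / name, target / name)
```

Fixing the seed makes the split reproducible, so validation scores from different training runs remain comparable.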
Let’s look at our data distributions. These histograms show the total number of samples per label in our training (Figure 5a) and validation (Figure 5b) datasets.
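The counts behind histograms like these can be gathered straight from the folder layout. A minimal sketch, assuming one subfolder per label inside each split folder; the function name and the set of image extensions are assumptions.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_per_label(split_dir: Path) -> Counter:
    """Count image files in each label subfolder of split_dir (e.g. train/0, train/1)."""
    counts = Counter()
    for label_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[label_dir.name] = sum(
            1 for f in label_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling it once for the training folder and once for the validation folder gives the two distributions to plot, and a strongly skewed result is an early hint that class weighting or extra augmentation may be needed.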
Now, let’s spot-check some of the images (Figures 6a and 6b). The augmented samples in our training dataset seem to have negative space along the edges. This is fine for our model, but in the future, we could try pixel rotations rather than rotating the entire image to avoid this negative space. From these sample images, ignoring the vegetation, we can start to see patterns of textures and colors emerge. In Figure 6a, we see that labels 1 and 2 have distinct light-colored streaks mixed in with darker shades, while the label 0 samples in Figure 6b have more homogeneous coloring.
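The negative space comes from rotating a tile by an arbitrary angle: the rotated corners map to points outside the source image and must be filled with something (zeros here). The NumPy sketch below makes that visible with a nearest-neighbor rotation and contrasts it with rotations by multiples of 90 degrees plus flips, which only rearrange existing pixels. Restricting augmentation this way is one possible fix, offered as an assumption; it is not necessarily the pixel-rotation approach mentioned above.

```python
import numpy as np


def rotate_nearest(image: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate an (H, W, C) image about its center; pixels mapped from outside become 0."""
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = np.deg2rad(degrees)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, find the source pixel it came from.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x, src_y = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)  # zero fill is the "negative space"
    out[inside] = image[src_y[inside], src_x[inside]]
    return out


def dihedral_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random 90-degree rotation plus optional horizontal flip: no pixels lost or invented."""
    image = np.rot90(image, k=int(rng.integers(4)), axes=(0, 1))
    return image[:, ::-1] if rng.random() < 0.5 else image


tile = np.full((64, 64, 3), 200, dtype=np.uint8)
print("empty pixels after a 30-degree rotation:", int((rotate_nearest(tile, 30)[..., 0] == 0).sum()))
print("empty pixels after dihedral augmentation:",
      int((dihedral_augment(tile, np.random.default_rng(0))[..., 0] == 0).sum()))
```

The trade-off is a smaller set of orientations (eight per tile), which is often acceptable for top-down aerial imagery where there is no natural "up".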
The patterns present in labels 1 and 2 indicate the Brushy Basin Member (Figure 1), the target bench of the Morrison Formation, where we expect to find fossil-bearing rocks. These distinctions between images are detectable after visual inspection, but our goal is to build a model that can perform this analysis for thousands of images in seconds.
Model Definition and Training
Using our labeled dinosaur hunting dataset, we will apply transfer learning to a pretrained ResNet model through a process called “domain adaptation.” Transfer learning allows us to leverage the pretrained weights of an existing model for new tasks (Figure 7).
We could train our model from scratch, yielding a dedicated fit-for-purpose model, but this would require significantly more data. Let’s unpack these approaches:
- Learning from scratch is your traditional deep learning scheme, where you initialize your network’s weights with zeros, random values, or some other predefined scheme. The model then uses backpropagation to update its weights based on an objective function and an optimizer. This type of training is generally more computationally expensive, requiring accelerators like GPUs and HPUs, and can take hours, days, or weeks to complete.
- Domain Adaptive Transfer Learning is when we start with a model pretrained on an original dataset and retrain it on a completely different dataset. This adapts the model to a new problem while transferring what it learned from the original dataset. In this tutorial, we start with a pretrained ResNet model and domain-adapt it for a fossil likelihood classification task.
- Fine-tuning is another subcategory of transfer learning that doesn’t switch domains but updates an existing model with new data. For example, we could acquire new data from a fossil location to update our dinosaur hunting model after the initial training. For this to make sense, our original dataset would have to be significantly larger than the new data, so that we can justify preserving some of the weights through transfer learning techniques like layer freezing or learning rate drops. If the two datasets are similar in size, it might be worth training a new model from scratch.
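The layer freezing and learning rate drops mentioned for fine-tuning can be sketched in plain PyTorch: freeze the early layers by turning off their gradients, and give the remaining pretrained layers a smaller learning rate than the new head through optimizer parameter groups. Which layers to freeze, the SGD optimizer, and the learning rate values are all assumptions to be tuned, and `make_finetune_optimizer` is a hypothetical helper. A small stand-in network keeps the sketch runnable without downloading weights.

```python
import torch
from torch import nn


def make_finetune_optimizer(
    backbone: nn.Module, head: nn.Module, freeze: list[nn.Module],
    head_lr: float = 1e-3, backbone_lr: float = 1e-4,
) -> torch.optim.Optimizer:
    """Freeze the given submodules; train the rest of the backbone at a lower learning rate."""
    for module in freeze:
        for p in module.parameters():
            p.requires_grad = False  # layer freezing: no gradient updates
    backbone_params = [p for p in backbone.parameters() if p.requires_grad]
    return torch.optim.SGD(
        [
            {"params": backbone_params, "lr": backbone_lr},  # learning rate drop
            {"params": head.parameters(), "lr": head_lr},
        ],
        momentum=0.9,
    )


# Stand-in network: two convolutional "stages" plus a classifier head.
stage1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
stage2 = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
backbone = nn.Sequential(stage1, stage2)
head = nn.Linear(16, 3)
optimizer = make_finetune_optimizer(backbone, head, freeze=[stage1])
```

For a torchvision ResNet, the same helper could take the stem and `layer1` in `freeze` and `model.fc` as `head`, as long as `backbone` excludes `fc` (for example, an `nn.ModuleList` of the other layers), because PyTorch optimizers reject a parameter that appears in two groups.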
Bypassing “learning from scratch” gives us more flexibility when choosing hardware. In this case, transfer learning will enable us to run both training and inference directly on the CPU, eliminating the need for additional accelerator hardware.
We will use the Intel Extension for PyTorch (IPEX) to accelerate training and inference. IPEX allows us to apply the channels-last memory format, graph optimizations, operator optimizations, and automatic mixed precision, all through an easy-to-use Python API. If you’d like to learn more about implementing IPEX, please visit the IPEX GitHub or IPEX PyTorch Documentation.
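By adding the two statements below (plus the IPEX import), we can immediately unlock the latest Intel hardware optimizations for PyTorch:
```python
import torch
import intel_extension_for_pytorch as ipex
self.model = self.model.to(memory_format=torch.channels_last)  # NHWC memory layout
self.model, self.optimizer = ipex.optimize(  # IPEX operator and graph optimizations
    self.model, optimizer=self.optimizer, dtype=torch.float32
)
```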
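Once the model is in channels-last format, the input batches should be converted the same way so the optimized kernels can take advantage of that layout. A minimal training-epoch sketch under that assumption follows; it is plain PyTorch, assumes the `ipex.optimize` call above has already been applied to the model and optimizer passed in, and uses a hypothetical function name and a generic iterable of `(images, labels)` batches.

```python
import torch
from torch import nn


def train_one_epoch(model: nn.Module, loader, optimizer: torch.optim.Optimizer) -> float:
    """Run one epoch over (images, labels) batches and return the mean training loss."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    total, batches = 0.0, 0
    for images, labels in loader:
        # Match the model's channels-last memory format.
        images = images.to(memory_format=torch.channels_last)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total += loss.item()
        batches += 1
    return total / max(batches, 1)
```

Tracking the mean loss per epoch, together with a validation metric, is what produces training curves like the ones used to judge convergence.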
After training our model for enough epochs to stabilize and reach satisfactory accuracy without overfitting (Figure 8), we can begin testing it on unseen data.
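One common way to decide when a model has trained “enough” without overfitting is early stopping on the validation loss: stop once it has not improved for a few epochs and keep the best checkpoint. The article does not say how the notebooks pick the number of epochs, so this is an assumption; `patience` and `min_delta` are illustrative parameters and the loss values below are made up for the demonstration.

```python
class EarlyStopping:
    """Signal a stop when validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record this epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Illustrative validation losses: improving at first, then rising as overfitting sets in.
stopper = EarlyStopping(patience=2)
for epoch, val_loss in enumerate([0.90, 0.60, 0.45, 0.47, 0.52, 0.58]):
    if stopper.step(epoch, val_loss):
        print(f"stop at epoch {epoch}; best epoch {stopper.best_epoch} (loss {stopper.best:.2f})")
        break
```

In a real loop, the weights would be saved whenever `best_epoch` changes, so the checkpoint used for inference comes from before validation loss started to climb.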
Producing Actionable Insights from Model Predictions
Now that we have successfully transfer-learned our model, we can predict the labels of unseen data (Figures 9a and 9b) and stitch together a dinosaur fossil probability map.
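Stitching the map together is mostly bookkeeping: if the tiles were cut in row-major order from a regular grid, the per-tile predictions can be reshaped back into that grid. The sketch below assumes the model outputs three-class logits per tile and summarizes each tile by the softmax probability of class 2 (verified fossil site); using that single class, rather than some other combination such as the sum of classes 1 and 2, is an assumption.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def stitch_predictions(logits: np.ndarray, grid: tuple[int, int]) -> tuple[np.ndarray, np.ndarray]:
    """Turn (rows*cols, 3) tile logits into (rows, cols) label and fossil-probability maps."""
    probs = softmax(logits)
    labels = probs.argmax(axis=1).reshape(grid)
    fossil_prob = probs[:, 2].reshape(grid)  # probability of class 2 (verified fossil site)
    return labels, fossil_prob
```

The label map answers “which class won for each tile,” while the probability map keeps the model’s confidence, which is what a color gradient can display.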
Let’s combine our images and apply a gradient color overlay representative of our model’s predicted labels (Figure 10). The green gradients indicate a higher probability of finding fossils in that location. Now we can put on our boots and begin our dinosaur hunting adventure, starting with the areas of higher fossil likelihood predicted by our model.
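A green gradient overlay like this can be produced by alpha-blending a solid green layer over the stitched image, with each tile’s blend weight proportional to its predicted fossil probability. The NumPy sketch below is one way to do it; the `max_alpha` cap and the nearest-neighbor upsampling with `np.kron` are assumptions, not necessarily how the notebook draws the map.

```python
import numpy as np


def green_overlay(image: np.ndarray, prob_map: np.ndarray, tile: int, max_alpha: float = 0.6) -> np.ndarray:
    """Blend green over an (H, W, 3) uint8 image; tiles with higher probability look greener.

    prob_map holds one value in [0, 1] per tile; H and W must equal the grid size times tile.
    """
    # Upsample each tile's probability to cover its pixels, then scale it to a blend weight.
    alpha = np.kron(prob_map, np.ones((tile, tile)))[..., None] * max_alpha
    green = np.array([0, 255, 0], dtype=np.float32)
    blended = (1 - alpha) * image.astype(np.float32) + alpha * green
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

Capping the blend weight below 1 keeps the underlying terrain visible even in the most likely tiles, which matters when the map is used to plan where to walk.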
Concluding Remarks
We explored a fascinating application of computer vision to the discipline of paleontology. One important point is that the workflow described in this article and the underlying Jupyter notebooks can be applied to any scenario where aerial photographs are used to delineate the properties of a region. Future examples we could explore include forest fire likelihood, coral reef bleaching detection, and agricultural crop performance. If you are interested in the extended work performed with the Intel Distribution of OpenVINO Toolkit, check out the notebooks on CPU and iGPU inference.