Solving automated wildlife taxonomy with AI

Amir Akta
Published in FruitPunchAI
12 min read · Nov 15, 2023

https://www.fruitpunch.ai/

In the wild landscapes of Europe, camera traps have become essential tools for ecologists and wildlife researchers, offering glimpses into the lives of the continent’s diverse and fascinating animals. These devices have revolutionized our understanding of European wildlife, but their use comes with a challenge: the sheer volume of data they generate. To tackle this issue, a team of dedicated FruitPunchers recently developed an AI solution for the European Wildlife Challenge.

With help from Rewilding Europe, we collaborated with researchers from the Swedish University of Agricultural Sciences (SLU). The goal of this challenge was to create precise computer vision models that could help researchers identify European wildlife in their camera trap images. By doing so, population monitoring would become significantly faster and cheaper. To accomplish this, we developed a new workflow. While many researchers currently rely on the MegaDetector, a Microsoft-developed animal detector, our new detector addresses its limitations in terms of accuracy and the number of supported classes. Additionally, our pipeline allows researchers to train and re-train their models using various location-specific datasets, resulting in an increased number of classes and improved accuracy.

Fig 1. Example of a camera trap image of 2 moose

Who would benefit from this tool and why is it important?

The tool we designed assists scientists in their wildlife research activities by automating the process of classifying various animal species in their camera trap images. Camera traps can provide a wealth of information about the European ecosystem when strategically placed. However, sifting through thousands of images can be tedious without the right tools. The proposed machine learning product would greatly improve the accuracy and speed of species identification, allowing conservationists to focus on extracting insights from the data. What can we learn about the environment that supports these creatures? Are any of these species at risk of extinction? Is there a threat from invasive species? While MegaDetector is currently favored by many researchers as a more efficient option than manual identification, it still has significant shortcomings.

The AI for European Wildlife Challenge

A group of engineers with different backgrounds collaborated on this AI for Good project. We participated in two masterclasses that provided insight into ecological issues and how computer vision can support conservation efforts. To tackle the challenge, we divided into two teams: one team to explore various AI models, and another to construct the machine learning pipeline.

The Models Exploration Team

We started off by researching the existing solutions. MegaDetector employs the You Only Look Once model (YOLOv5) for detection, but its classification abilities are limited. In this Challenge, the Model Exploration Team looked for a state-of-the-art computer vision model to expand MegaDetector’s capabilities. We wanted to classify a greater number of species, improve object detection metrics, and generalize the results across diverse environmental conditions.

Criteria used to select a suitable model

  1. Open source with code available online.
  2. A good trade-off between result quality and time efficiency.
  3. Easy replicability with our dataset.

Our search led us to focus on 3 promising model architectures:

YOLOv8 — developed by Ultralytics, this is currently the highest-performing YOLO model. The team proposed it because of its potential to outperform the YOLOv5 architecture on which MegaDetector is built.

YOLO-NAS — developed by Deci to address shortcomings in the YOLO family, namely inadequate quantization support and an insufficient accuracy-latency trade-off, using the company’s Neural Architecture Search technology. YOLO-NAS models have been shown to outperform the rest of the YOLO family in terms of latency.

Fig 2. Comparison of YOLO family models with YOLO-NAS using the mAP metric
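
For readers who want to experiment with YOLO-NAS themselves, loading a COCO-pretrained model through Deci's super-gradients library takes only a few lines. This is a minimal sketch; the model size and confidence threshold are illustrative choices rather than settings from the Challenge.

```python
from super_gradients.training import models

# Load a COCO-pretrained YOLO-NAS model from Deci's super-gradients library
# (model size "yolo_nas_l" and conf=0.5 are illustrative choices)
yolo_nas = models.get("yolo_nas_l", pretrained_weights="coco")

# Quick sanity-check prediction on a single camera trap image
yolo_nas.predict("camera_trap_example.jpg", conf=0.5).show()
```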

MViTv2 — a two-stage detector built on the state-of-the-art transformer architecture. These Improved Multiscale Vision Transformers use attention to perform accurate classification and object detection.

After experimenting, we found the YOLO models to be more promising than MViTv2, so in the final stages of the Challenge we did not pursue further hyperparameter tuning for MViTv2. For this reason, MViTv2 results are not included in this article.

The Generalizable Pipeline Team

This team aimed to create a pipeline that seamlessly integrates with the Trapper API in various ways. One of its main features is allowing users to upload their camera trap images to Trapper and retrieve the resulting data. Alternatively, users can access the location-specific datasets already available in Trapper. The pipeline ships with models that can be trained on this location-specific data and then used to make predictions. Additionally, users are free to upload their own custom models into the pipeline and train them there.

Developing the best model

Selecting a classification and detection model for wild animals in their natural habitat is a complex task due to various challenges: unbalanced datasets, small animals that are hard to detect, and changing environmental factors such as lighting and background. Additionally, the model needs to generalize well, and careful fine-tuning is necessary to prevent biases during the training phase. Our research showed that YOLOv8 performs well on the Rewilding Europe and Trapper images. It is a state-of-the-art model, available on GitHub, actively developed and maintained, and simple to use for training, testing, and inference.

Fig 3. Evaluation metrics

Fig 4. Parallel coordinate chart for hyper-parameter tuning

Building a sophisticated pipeline

The team had the grand idea of delivering the pipeline as a model-serving web service, but this approach was infeasible given the technical complexity involved and the time at our disposal. We settled instead on the architecture shown below.

Fig 5. Current Generalisable Pipeline System Architecture

To begin their project, the team integrated with Trapper and decided to run their pipeline in a Docker container instance. They also set up a local Trapper instance during development and testing to ensure a reliable connection and fast response times for optimal performance. In order to upload data to Trapper, the procedure was:

  1. Create a research project
  2. Add location information (optional)
  3. Set up the Trapper Client
  4. Add deployments to Trapper
  5. Add the resources (images & others) to Trapper

The team utilized the ReWilding Dataset, which contains over 378,000 camera trap images, for training their models, and the LILA dataset for validation. The data was pre-processed twice: once before uploading to Trapper and again before training the models in the pipeline.

Examining the ReWilding dataset revealed a lot of false positives in the images. Biologists provided the species class, but the images lacked bounding boxes. Thus, the first stage of data pre-processing consisted of providing annotations for the ReWilding Dataset, done in seven steps (steps 4 and 5 are sketched in code after the list):

  1. We downloaded the ReWilding Dataset locally
  2. We set up a local instance of the MegaDetector v5 and annotated the images with bounding boxes
  3. We merged the annotated images with the expert species-level annotations
  4. We kept only high-confidence bounding boxes, using a threshold of 0.8 (a heuristic from MegaDetector+DeepLabCut)
  5. We converted the bounding box annotations into COCO, YOLO, and TF formats
  6. We split the dataset into train, validation, and test splits (80%: 10%: 10%)
  7. We created a label map for the species class
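
To make steps 4 and 5 concrete, here is a minimal sketch of filtering MegaDetector's batch output JSON by confidence and writing YOLO-format label files. The species_lookup and label_map arguments are hypothetical stand-ins for the expert species annotations (step 3) and the label map (step 7); they are not the names used in the actual pipeline code.

```python
import json
from pathlib import Path

CONF_THRESHOLD = 0.8  # heuristic from MegaDetector + DeepLabCut

def megadetector_to_yolo(md_json_path, species_lookup, label_map, out_dir):
    """Filter MegaDetector v5 detections and write YOLO .txt label files.

    species_lookup: image file path -> expert species label (one species per
    image, which held for most multi-animal images in the ReWilding data).
    label_map: species label -> integer class id.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with open(md_json_path) as f:
        md_output = json.load(f)

    for image in md_output["images"]:
        species = species_lookup.get(image["file"])
        if species is None:
            continue  # no expert annotation for this image
        class_id = label_map[species]

        lines = []
        for det in image.get("detections", []):
            # category "1" is "animal" in MegaDetector's output; keep confident boxes only
            if det["category"] != "1" or det["conf"] < CONF_THRESHOLD:
                continue
            # MegaDetector boxes are normalized [x_min, y_min, width, height];
            # YOLO expects normalized [x_center, y_center, width, height]
            x, y, w, h = det["bbox"]
            lines.append(f"{class_id} {x + w / 2:.6f} {y + h / 2:.6f} {w:.6f} {h:.6f}")

        (out_dir / (Path(image["file"]).stem + ".txt")).write_text("\n".join(lines))
```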

Fig 6. Dataset size

It should be noted that in most images where multiple animals were spotted, they were of the same species.

The second stage of data pre-processing was to load the data from Trapper. Before we could begin training the YOLO models in the pipeline, we had to do the following (a short download sketch follows the list):

  • Download the image files from Trapper, through HTTP requests
  • Convert the bounding boxes to YOLO format
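
A minimal sketch of the download step, assuming a list of direct media URLs has already been obtained from the Trapper API; the token-based authentication header is an illustrative placeholder, not necessarily how Trapper authenticates requests.

```python
import requests
from pathlib import Path

def download_images(resource_urls, out_dir, token=None):
    """Fetch camera trap images over HTTP and save them locally."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical token-based auth header
    headers = {"Authorization": f"Token {token}"} if token else {}

    for url in resource_urls:
        response = requests.get(url, headers=headers, stream=True, timeout=60)
        response.raise_for_status()
        target = out_dir / url.rstrip("/").split("/")[-1]
        with open(target, "wb") as f:
            for chunk in response.iter_content(chunk_size=1 << 20):
                f.write(chunk)
```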

After pre-processing the data, which included images and YOLO-formatted bounding boxes, we began training our model. We optimized its parameters to ensure that it could accurately detect and classify objects in the images. During training, the model generated several weights files at different stages or iterations. These files encapsulate the learned parameters and features necessary for object detection. To maximize flexibility, we saved these weights files so users could select the most appropriate one during inference.
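
As a rough illustration of this training step, here is a minimal fine-tuning sketch using the Ultralytics YOLOv8 API. The dataset config file, model size, and hyperparameter values are placeholders, not the exact settings used in the Challenge.

```python
from ultralytics import YOLO

# Start from COCO-pretrained weights and fine-tune on the camera trap data.
# "wildlife.yaml" is a placeholder dataset config pointing at the YOLO-formatted
# train/val/test splits and the species label map.
model = YOLO("yolov8m.pt")
model.train(
    data="wildlife.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    project="european-wildlife",
    name="yolov8m-rewilding",
)

# Ultralytics saves checkpoints (best.pt, last.pt) in the run directory, so a
# user can later pick whichever weights file suits their deployment for inference.
metrics = model.val()
```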

To address the limited occurrence of certain rare classes in the training data, we employed data augmentation. We explored several techniques and libraries, including animal pose, mix-up, simulating animals in empty images, and re-sampling rare classes. Ultimately, we chose Albumentations due to its extensive set of augmentations. We developed a wrapper program around Albumentations that loads a .yaml configuration file, which specifies the augmentations to use and their parameters. This program also allows users to control actions like applying augmentations separately or altogether with a certain probability. By augmenting the data, we created variations of the original camera trap images, which improved our model’s performance.
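
A minimal sketch of what such a wrapper could look like, assuming a simple (hypothetical) YAML layout; the transform names and parameters shown are illustrative examples, not the Challenge's actual augmentation configuration.

```python
import albumentations as A
import yaml

def build_augmentations(config_path):
    """Build an Albumentations pipeline from a YAML config.

    Assumed config layout:
      apply_probability: 0.7
      transforms:
        HorizontalFlip: {p: 0.5}
        RandomBrightnessContrast: {brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5}
        MotionBlur: {blur_limit: 5, p: 0.3}
    """
    with open(config_path) as f:
        cfg = yaml.safe_load(f)

    # Look up each transform class by name on the albumentations module
    transforms = [getattr(A, name)(**params) for name, params in cfg["transforms"].items()]

    return A.Compose(
        transforms,
        p=cfg.get("apply_probability", 1.0),
        # Keep YOLO-format boxes consistent with the geometric transforms
        bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
    )

# Usage:
# augment = build_augmentations("augment.yaml")
# out = augment(image=image, bboxes=bboxes, class_labels=labels)
```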

As seen in Fig 5, the team developed two pipelines: a (re)training pipeline and an inference pipeline. The team also maintained these models in their model zoo:

  • YOLOv5
  • YOLOv8
  • YOLO-NAS
  • Vision Transformer

Other models that they researched, but didn’t fully implement include:

  • SAHI
  • MViTv2

With some post-processing, the evaluation metrics of the models were logged using Weights & Biases and MLflow. The inference results are published to Trapper from within the Docker container using a REST API POST call. All HTTP calls in both pipelines were executed through scripts run from the terminal inside the container.
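
To make the logging and publishing steps concrete, here is a rough sketch; the MLflow tracking URI, run names, metric values, Trapper endpoint path, and payload shape are all placeholders rather than the team's actual configuration or the documented Trapper API.

```python
import mlflow
import requests

# Log evaluation metrics to the MLflow tracking server running in its own
# Docker container (URI, names, and values below are illustrative placeholders)
mlflow.set_tracking_uri("http://mlflow:5000")
mlflow.set_experiment("european-wildlife")

with mlflow.start_run(run_name="yolov8m-rewilding"):
    mlflow.log_params({"model": "yolov8m", "imgsz": 640, "max_train_images": 50000})
    mlflow.log_metrics({"mAP50": 0.0, "mAP50-95": 0.0})  # placeholder metric values

# Publish inference results back to Trapper from inside the pipeline container;
# the endpoint path and payload shape are hypothetical, not the real Trapper API
payload = {"deployment": "example-deployment-id", "observations": []}  # filled with predictions
response = requests.post(
    "http://trapper:8000/api/observations/",  # hypothetical endpoint
    json=payload,
    headers={"Authorization": "Token <api-token>"},
    timeout=60,
)
response.raise_for_status()
```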

Challenges and takeaways

Working on the AI for European Wildlife Challenge gave us deep insights into population monitoring and imparted some valuable lessons along the way.

At the start of our project, we were all working in different notebooks. However, we soon realized that we needed to work together more closely to ensure that our goals aligned and that we were not spending time on unnecessary tasks.

We also had to decide whether to prioritize the quality of our data or to move forward with the datasets we had prepared and begin training our models. We discussed the concern that focusing too much on incorporating datasets into the pipeline could take away from training and testing our models. We recognized that Megadetector was performing well for our current needs, so we decided to pursue both approaches simultaneously.

Our plan was to refine the data while also training our models. As we continued to refine the data, our models would be retrained on the updated dataset. This way, we could ensure the quality of our data while also making progress in training our models. Overall, we agreed that this approach would help us move forward efficiently while also ensuring that everyone was happy with the project’s progress.

Apart from these, the various aspects of development came with their own technical challenges and lessons.

We found that implementing certain augmentations on some of the images posed challenges, particularly when segmentation and pose detection were required. We also encountered some difficulty integrating MLflow directly into the model training files, since the source code was wrapped quite tightly. We circumvented this problem by logging only the parameters passed into and returned from the training function. The main focus of logging, though, was the validation process: using MLflow we logged the dataset, model, limits on training size, and other settings. MLflow ran in its own dedicated Docker container (Fig 5). Working with Trapper came with its own problems too: bounding boxes were misplaced when many images (>250) were downloaded at once, possibly due to a bug in the codebase, and we hit network errors when using a local Trapper instance to download from their server.

Where can you find our work?

More details on the pipeline — training, inference, other technical processes, the results, and future work — are provided in this report. The model exploration is covered in this report.

All the code and example notebooks for the development aspect of this project and its use are hosted on these Gitlab projects.

The Weights & Biases project for the pipeline contains things like the model comparison report.

Finally, the presentation is available for viewing.

Personal experiences

Adrian

Participating in the AI for European Wildlife Challenge hosted by FruitPunch AI was an eye-opening experience that reinforced my belief in the importance of wildlife research. As ecosystems face increasing threats from human activities, I was excited to contribute to this cause by using AI to detect wildlife in camera trap images. The challenge introduced me to the wildlife research community, and I was thrilled to collaborate with like-minded individuals who share the same passion for protecting our planet’s biodiversity.

Throughout the challenge, I was impressed by the capabilities of Trapper, a tool widely used in the wildlife research community. Trapper provided an efficient platform to store, organize, and share wildlife datasets. I found it challenging but rewarding to work on enriching Trapper with an ML pipeline, enabling us to leverage AI algorithms to help streamline the research process. It was a fulfilling journey, and I’m grateful for the opportunity to contribute to the advancement of wildlife research and conservation efforts.

I am genuinely grateful to FruitPunch for organizing this impactful challenge, which not only broadened my horizons but also re-ignited a deeper commitment to preserving the natural world.

Icxa

The Challenge enabled me to get out of my comfort zone and try out different things for building ML Pipeline. It was a good learning experience and a pleasure to work with other AI for Good Engineers.

Arnold

I am proud to have been in Team Pipeline of the European Wildlife Challenge. From the very first meeting, I could already tell things were going to go well. It was a warm crowd of people that surrounded me, with diverse backgrounds ranging from researchers to engineers in different disciplines. Everyone was quite sociable, and tried sharing what qualities and skills they felt would help our initiative. I found it very different from the FruitPunch Bootcamp I’d engaged in some months prior. There were so many times I felt out of my depth as I tried to learn from those around me; in the busy weeks, it felt like I could barely keep up with the rest haha… And yet still it was a rewarding endeavor.

From the highs of the enriching masterclasses to the many frustrations derived from the project calling for ad-hoc sub-group Zoom sessions to the sudden need for a presenter during our Tuesday weekly meetings, it was such a worthwhile experience!

Agnethe

The Challenge was such an exciting endeavor. I found myself working with people across multiple disciplines from all over the world. Even though it was sometimes difficult, everyone worked really hard to get across the finish line and it made for a great experience.

Aakash

I was co-lead of the Model Exploration and an active member of the Data Pipeline team. It was interesting to work with participants from different educational and cultural backgrounds. Presentations from Dan of MegaDetector fame and Magali Frauendorf from ReWilding Europe provided an exciting perspective on the challenges faced by biologists in the field. I also want to commend Pilar and Agnethe on their work during the challenge.

One of my key contributions was to create a data pipeline and train different models on the ReWilding dataset. The dataset was the largest in the challenge and also tricky to work with since it didn’t have any data annotations with bounding boxes. I created a pipeline for downloading and automated annotation of the images before the images were piped into a model training pipeline.

I certainly look forward to more collaborations with the FruitPunch Team!

In Closing…

Visit our website to apply AI for Good. Join our Community!

We extend our thanks to the stakeholders who aided us during the challenge: Magali Frauendorf and Tim Hofmeester from SLU.

We would here like to gratefully acknowledge the following participants who made efforts at all hours of the day to realize the goal of this challenge: Aditya Rachman Putra, Agnethe Olsen, Alba Gómez, Julia Krzemień, Mahmoud Kobbi, Miguel Maldonado, Pilar Navarro Ramírez, Ricardo Martínez Prentice, Aakash Gupta, Adrian Azoitei, Arnold Kinabo, Christian Winkelmann, Icxa Khandelwal, Merve Günak, Patryk Neubauer
