Microsoft AI for Earth

Accelerating biodiversity surveys with Azure Machine Learning

Bringing computer vision to millions of camera trap photos

Siyu Yang
Microsoft Azure

--

This post is written by Dan Morris and Siyu Yang at Microsoft AI for Earth.

Breaking the annotation logjam in wildlife surveys

Biodiversity is declining across the globe at a catastrophic rate. Conservation biologists are faced with the daunting — but urgent — task of surveying wildlife populations and making policy recommendations. What species need legal protection? Where will it be most effective to build underpasses as wildlife migration corridors? Where should we deploy anti-poaching resources?

Informed policy decisions on issues like these start with data, which in this case means wildlife population estimates. As digital cameras have become affordable, populations are increasingly surveyed with imaging tools.

But currently, collecting data on how many animals live in an area involves massive amounts of manual annotation to count and classify the wildlife present in each image. It often takes years for an NGO or a government agency to annotate the millions of images collected from a single study. This bottleneck often delays critical conservation insights so long that by the time they’re available, they’re no longer relevant.

Recent advances in computer vision promise to break this annotation logjam and accelerate conservation decision-making. We have seen a number of success stories applying machine learning to specific wildlife datasets, but the resulting models usually do not generalize well to data from other ecosystems. Consequently, the promised efficiency gains from machine learning remain unrealized for the majority of conservation organizations.

This blog post outlines our approach to ML-assisted camera trap image annotation and the collaboration with our first partner, the Idaho Department of Fish and Game (IDFG), with whom we developed an end-to-end, open-source solution that now serves a dozen wildlife conservation organizations worldwide.

We will then go into depth on how we process images in batch using our animal detection model on a cluster of GPU-enabled virtual machines through Azure Machine Learning (AML). We have also exposed this functionality via an API, allowing our model to be integrated into existing annotation and crowdsourcing tools. To date, our batch processing API for animal detection has processed more than 25 million camera trap images from 13 organizations, while we continue to add to our training data repository.

All of the code and models presented here are available on GitHub. Furthermore, in partnership with Zooniverse, the University of Minnesota, and the University of Wyoming, we have brought together camera trap datasets from many organizations and hosted them publicly on Azure, including more than 200,000 bounding box annotations; those datasets are available at lila.science. A set of these images from the Serengeti is used in the DrivenData machine learning competition “Hakuna Ma-data: Identify Wildlife on the Serengeti.”

Project and partnerships

For the past two years, the AI for Earth program has been working with a number of organizations to curate labeled data and develop computer-vision-based pipelines for processing wildlife survey images from motion-triggered, handheld, and aerial cameras. You can follow all our projects in this area here.

The focus of our efforts has been on motion-triggered cameras (camera traps), as they are the most common tools for continuous monitoring of wildlife habitats. Conservation biologists around the world have developed workflows for deploying camera traps and analysis approaches for estimating population characteristics from annotated images.

Camera traps are motion- and heat-triggered cameras installed in the wild to survey animal activity. Image courtesy of US Fish & Wildlife Service.

By the summer of 2018, we had collected a substantial number of camera trap images from different locations to form a diverse training dataset. We also labeled more than 100,000 images with bounding boxes around the animals to allow us to train object detection models. We then experimented with different convolutional neural network architectures and trained our first “MegaDetector” — a model that identifies animals and humans in camera trap images — which worked well on images in a variety of habitats.

The Idaho Department of Fish and Game (IDFG) reached out after hearing about our projects: IDFG was deploying 800 camera traps and was concerned that it would not have the staff-hours to annotate the millions of photos that would come back after snowmelt. Since some organizations, including IDFG, take time-triggered images as opposed to motion-triggered ones to obtain more accurate estimates of animal populations, empty images can make up as much as 98% of all images. The MegaDetector’s detection results give a natural way to separate empty from non-empty images, which could lead to a massive efficiency gain for these studies.

After seeing very encouraging results from the MegaDetector on a small batch of IDFG images, we moved to assess the detector’s performance quantitatively. For this, we received about three million images from IDFG that had been labeled manually in previous years. Since the MegaDetector is a large network that takes about 0.8 seconds to score one image on the latest GPU, scoring three million images on a single node would take 28 days. This motivated us to operationalize our model using Azure Machine Learning and distribute image processing across a scalable cluster.

However, machine learning results alone are not useful to ecologists; it’s critical to incorporate this model into a practical workflow. To this end, we collaborated with the developer of Timelapse, a popular open-source camera trap image labeling tool that IDFG and others already use for manual annotation, to surface our detection results. This way, annotators can visually review our model’s output in a familiar environment and, without leaving their workflow, set a confidence threshold below which images can be safely considered empty. Other tools can do the same with our results, which are provided in a JSON file, specified here, in which each entry corresponds to an image.
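
To give a concrete sense of this output, here is a simplified, illustrative sketch of the structure; the linked specification is authoritative, and the file names, confidence values, and bounding boxes below are invented. In this sketch, bounding boxes are given as normalized [x, y, width, height] coordinates.

{
  "detection_categories": {"1": "animal", "2": "person"},
  "images": [
    {
      "file": "camera01/IMG_0042.JPG",
      "max_detection_conf": 0.97,
      "detections": [
        {"category": "1", "conf": 0.97, "bbox": [0.45, 0.30, 0.26, 0.38]}
      ]
    },
    {
      "file": "camera01/IMG_0043.JPG",
      "max_detection_conf": 0.0,
      "detections": []
    }
  ]
}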

Machine learning approach: a two-stage processing pipeline with a generic detector and project-specific classifier

The computer vision community has been making strides toward automated, species-level labeling for camera trap deployments anywhere in the world. However, a one-model-fits-all approach has been ineffective thus far: accuracy falls off in new regions and for previously unseen species. The same analysis shows, however, that object detectors used for localizing animals in a picture generalize better to new environments than species classifiers do.

Therefore, to accelerate the work of wildlife biologists around the world without having to train individual models for each region of deployment, we have taken the approach of training a two-class detector for a broad range of species and ecosystems (the “MegaDetector”, where each detection is either “animal” or “human”), followed by creating project-specific species classifiers for individual ecosystems. Running our detector prior to species classification means that classifiers can be trained on cropped-out areas containing animals, focusing the attention of the classifier on the animal, rather than the background.
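
A minimal sketch of that second stage in Python, assuming detector output shaped like the illustrative JSON above and a hypothetical classifier object with a predict() method; the cropping logic, not the classifier itself, is the point here.

from PIL import Image

def classify_detections(image_path, detections, classifier, conf_threshold=0.8):
    # Crop each confident detection out of the image and hand only the crop to
    # the species classifier, so it focuses on the animal rather than the background.
    image = Image.open(image_path)
    width, height = image.size
    labels = []
    for det in detections:
        if det["conf"] < conf_threshold:
            continue
        x, y, w, h = det["bbox"]  # normalized [x, y, width, height]
        crop = image.crop((int(x * width), int(y * height),
                           int((x + w) * width), int((y + h) * height)))
        labels.append(classifier.predict(crop))  # hypothetical classifier API
    return labels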

Video showing our MegaDetector (v2) model running in a variety of ecosystems, on locations unseen during training. Images credit: eMammal. Video made by Sara Beery.

Since falsely triggered empty images make up more than 75% of all collected data in motion-triggered deployments (this can be as high as 98% for time-triggered cameras), we can already drastically reduce the number of images requiring manual review even if only the detector is applied to separate empty images from non-empty ones.
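
Separating empty from non-empty images then amounts to thresholding on the highest detection confidence per image. A minimal sketch, again assuming output shaped like the illustrative JSON above:

import json

def split_empty_images(results_file, conf_threshold=0.8):
    # Images whose best detection falls below the reviewer-chosen threshold
    # are treated as empty and can be skipped during manual review.
    with open(results_file) as f:
        results = json.load(f)
    empty, non_empty = [], []
    for im in results["images"]:
        max_conf = max((d["conf"] for d in im.get("detections", [])), default=0.0)
        (non_empty if max_conf >= conf_threshold else empty).append(im["file"])
    return empty, non_empty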

The following section describes how we can apply the MegaDetector to millions of images at a time by creating a batch scoring API powered by Azure Machine Learning.

Detection batch processing API: model operationalization

In choosing our detector model architecture, we favored accuracy over speed, since missing animals would render the automation exercise useless. As such, inference requires about 0.8 seconds per image on an NVIDIA V100 GPU. Consequently, we operationalize the MegaDetector so that large batches of images can be scored quickly on a scalable cluster of GPUs, and so that multiple batches can be queued. We also deploy this functionality as an API so that we and other users can call it from scripts and applications.

We found a good fit for our scaling needs in Azure Machine Learning (AML) Compute, a managed compute infrastructure that can be set as a compute target inside AML workspaces. It allows us to easily create a cluster, which can automatically scale up from 0 nodes to some maximum as jobs are submitted. The job queue is also managed for us, and when it becomes empty, the cluster scales down to 0 nodes after a user-specified delay. The Python SDK also facilitates connecting data stores such as Blob Storage containers to the compute target, downloading model files previously uploaded to the AML workspace, and specifying the execution environment (Python packages to install or customized Docker containers).

Setting up AML workspace and compute target

Provisioning the AML workspace and compute target is a 10-minute process. The steps are demonstrated in this Jupyter Notebook, and a condensed sketch of the corresponding SDK calls follows the list below:

  1. After installing the AML Python SDK, authenticating, and setting the desired subscription to use via the Azure CLI, we create an AML workspace with the Workspace.create() method (documentation; you can also do this on the Azure Portal).
    The storage account specified or created in this step needs to be in the same region as the workspace, as it is used to store AML artifacts such as uploaded models, Docker images, and run snapshots. Note that the input and output data can be in different storage accounts and regions.
    Once created, the AML workspace will be listed under “Machine Learning” on the Azure Portal. You can also manage compute targets, models, and experiments (which we use for processing batches of images) there.
  2. We then create an AML Compute cluster using the AmlCompute.create() method.
    A point of confusion is that the quota in your subscription for each VM SKU for AML Compute is separate from the quota for ordinary VMs. You can check the quota for this purpose and request an increase on the “Usage + quotas” section of the workspace page.
  3. We also need to register the models that our API makes available using the Model.register() method.
    While we can always download the model from remote storage inside the scoring script, registering models with the AML workspace makes them available to the scoring script automatically.
    Each time a model of the same name is uploaded, AML bumps its version number, and we can select which version to use in the scoring script (we use the default/latest version).
  4. We created the workspace above by authenticating to our subscription on the CLI. When our API needs to access the AML instance, it has to authenticate as an application (a service principal). We therefore need to create an application representing our API and give it access to this AML workspace. You can do this in the Azure Portal with these instructions. The password of the application needs to be set as an environment variable (AZUREML_PASSWORD) in the Dockerfile for our containerized API, and the “tenant-id” and “application-id” are stored in the API config file.
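
Here is the condensed sketch of these steps using the azureml-core Python SDK; the subscription, resource group, cluster, and model names are placeholders, and the notebook linked above remains the authoritative walkthrough.

import os
from azureml.core import Workspace, Model
from azureml.core.compute import AmlCompute
from azureml.core.authentication import ServicePrincipalAuthentication

# Step 1: create (or reuse) the workspace; names are placeholders.
ws = Workspace.create(name="camera-trap-aml-ws",
                      subscription_id="<subscription-id>",
                      resource_group="camera-trap-rg",
                      location="eastus",
                      create_resource_group=True,
                      exist_ok=True)

# Step 2: create an auto-scaling GPU cluster that idles down to 0 nodes.
compute_config = AmlCompute.provisioning_configuration(
    vm_size="Standard_NC6s_v3",  # one V100 GPU per node
    min_nodes=0,
    max_nodes=16,
    idle_seconds_before_scaledown=1200)
gpu_cluster = AmlCompute.create(ws, "gpu-cluster", compute_config)
gpu_cluster.wait_for_completion(show_output=True)

# Step 3: register the detector so scoring scripts can load it by name;
# re-registering the same name bumps the version number.
Model.register(workspace=ws,
               model_path="megadetector_v3.pb",  # placeholder file name
               model_name="megadetector")

# Step 4: the API later authenticates as a service principal rather than a user.
sp_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",                  # stored in the API config file
    service_principal_id="<application-id>",  # stored in the API config file
    service_principal_password=os.environ["AZUREML_PASSWORD"])
ws = Workspace.get(name="camera-trap-aml-ws",
                   subscription_id="<subscription-id>",
                   resource_group="camera-trap-rg",
                   auth=sp_auth)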

Using AML Pipelines to score images

We are using the ML Pipelines feature of AML service to carry out batch scoring (for training models, see the Estimator feature). Currently, there is only one step in our Pipeline (applying the MegaDetector), but it allows us to include classifier scoring and other post-processing steps in the future and execute each on the same or different compute targets.

You can see how we’ve set up our scoring step as a PythonScriptStep here (also see the newer batch inference capability using a ParallelRunStep). To utilize the GPUs on the VMs, set the following options in the AML run configuration passed to PythonScriptStep’s constructor:

amlcompute_run_config.environment.docker.enabled = True
amlcompute_run_config.environment.docker.gpu_support = True
amlcompute_run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE

The source_directory parameter should point to a directory containing your scoring script(s). If your scoring code spans multiple files, as ours does, make sure to set hash_paths to ['.'] so that the entire source_directory is hashed to determine changes before starting a new run.
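
A condensed sketch of how such a step might be assembled and submitted; the script name, cluster name, arguments, and pip dependencies are placeholders, and the linked repository contains the actual setup.

from azureml.core import Experiment, Workspace
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_GPU_IMAGE, RunConfiguration
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()  # assumes a saved config.json for the workspace
compute_target = ws.compute_targets["gpu-cluster"]  # placeholder cluster name

# GPU-enabled Docker environment, as shown above, plus illustrative dependencies.
amlcompute_run_config = RunConfiguration(
    conda_dependencies=CondaDependencies.create(
        pip_packages=["tensorflow-gpu==1.12.0", "pillow"]))
amlcompute_run_config.environment.docker.enabled = True
amlcompute_run_config.environment.docker.gpu_support = True
amlcompute_run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE

score_step = PythonScriptStep(
    script_name="score.py",               # placeholder scoring script
    arguments=["--job_id", "batch_000"],  # placeholder arguments
    source_directory="detection/scoring",
    hash_paths=["."],                     # hash the whole directory for change detection
    compute_target=compute_target,
    runconfig=amlcompute_run_config)

pipeline = Pipeline(workspace=ws, steps=[score_step])
run = Experiment(ws, "camera-trap-detection").submit(pipeline)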

We divide each job into batches of 2000 images. The detection results for each batch of images processed on a single node are saved in a JSON file and uploaded to a storage container. Both (1) dividing the images into batches for distribution across the cluster and (2) aggregating the outputs need to be handled by a coordination script, typically a Jupyter notebook; in our case, this happens in the API component.
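
A minimal sketch of that coordination logic, independent of the AML specifics; the field names follow the illustrative output format shown earlier.

import json

BATCH_SIZE = 2000

def chunk_image_list(image_paths, batch_size=BATCH_SIZE):
    # Split the full image list into fixed-size batches, one per pipeline job.
    return [image_paths[i:i + batch_size]
            for i in range(0, len(image_paths), batch_size)]

def aggregate_batch_outputs(output_json_paths, combined_path):
    # Concatenate the per-batch result files into a single JSON for the user.
    combined = {"images": []}
    for path in output_json_paths:
        with open(path) as f:
            batch_result = json.load(f)
        combined["images"].extend(batch_result["images"])
        # Carry over shared metadata (e.g. category names) from any batch.
        combined.setdefault("detection_categories",
                            batch_result.get("detection_categories"))
    with open(combined_path, "w") as f:
        json.dump(combined, f, indent=1)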

Exposing the pipeline via an API

With the batch scoring system set up on AML, we now want to expose this functionality as an API. There are two benefits to wrapping the processing capability in an API (as opposed to calling it from a Jupyter Notebook or script):

  • Each time an external organization uploads data, we can process it with minimal manual operations by making a call to the API
  • Exposing an API allows integration with other applications and use by other organizations to whom we can’t give access to the underlying AML resources

To support such scenarios where our collaborators or ourselves would like to expose a useful ML model via an API, AI for Earth has created an API Framework and an API Platform. The Framework consists of a layer above a Flask app that provides automatic telemetry and a series of customizable Docker images for containerization. The Platform allows APIs created via the Framework to be hosted in a scalable cluster on Azure, leveraging multiple Azure services including Cache for Redis, Application Insights, and Kubernetes Service (AKS).

There are two important files in the API creation process:

  • Dockerfile. Starting with a Python base container from our API Framework, this sets up a simple environment for running the API. The keys and passwords for the Azure Storage account where we store model outputs, and for the AML workspace, are set as environment variables here.
  • runserver.py. This is where our endpoints are defined, incorporating tools from the API Framework.

Our camera trap image batch processing API is asynchronous since it may take hours or days to finish scoring a large drop of images. The API Framework accommodates this by implementing a polling-based asynchronous calling pattern, where a “request ID” is returned to the user after the endpoint is called. The user is then responsible for checking the status of the request via a /task endpoint using that request ID.
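
The Framework provides this plumbing for us; the plain-Flask sketch below only illustrates the calling pattern, with hypothetical endpoint names and an in-memory task store (the production service keeps task status in Azure Cache for Redis).

import threading
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)
tasks = {}  # request_id -> {"status": ..., "result": ...}; in-memory for illustration

def process_batch(request_id, image_list):
    # Placeholder for dividing the images into batches, submitting AML jobs,
    # and monitoring them to completion.
    tasks[request_id]["status"] = "completed"
    tasks[request_id]["result"] = "<SAS URL to the aggregated detections file>"

@app.route("/detect", methods=["POST"])
def detect():
    # Kick off processing in the background and return a request ID immediately.
    request_id = str(uuid.uuid4())
    tasks[request_id] = {"status": "running", "result": None}
    images = request.get_json().get("images", [])
    threading.Thread(target=process_batch, args=(request_id, images)).start()
    return jsonify({"request_id": request_id})

@app.route("/task/<request_id>", methods=["GET"])
def task_status(request_id):
    # The caller polls this endpoint until the status changes to "completed".
    return jsonify(tasks.get(request_id, {"status": "not found"}))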

Batch processing is coordinated by the API, which divides the images into smaller batches and submits jobs to the queue in the AML workspace. After the jobs are submitted, another thread starts, periodically querying the AML workspace to see whether all batches have finished processing. When all images have been scored, the monitoring thread aggregates the individual output files in the storage container shared by all jobs, uploads the concatenated output, and updates the API’s task status with a SAS-keyed URL to the result file for the user to download.
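
As an example of that last step, a read-only SAS URL for the result blob might be generated along these lines, using the azure-storage-blob v12 SDK; the account, container, and expiry values are placeholders, and the production API may use a different SDK version.

from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

def result_download_url(account_name, account_key, container, blob_name):
    # Grant time-limited, read-only access to the aggregated result file.
    sas_token = generate_blob_sas(
        account_name=account_name,
        container_name=container,
        blob_name=blob_name,
        account_key=account_key,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(days=30))
    return ("https://{}.blob.core.windows.net/{}/{}?{}"
            .format(account_name, container, blob_name, sas_token))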

Making use of detector results

In the wildlife conservation community, ecologists use a variety of tools to review camera trap images, and we would like to integrate the results of the MegaDetector into these workflows wherever possible, rather than requiring ecologists to learn new tools. We have defined our output format in JSON so that labeling tools can render bounding boxes on high-confidence animal detections, and separate empty images. We have been working with the developer of Timelapse to build out this integration and test it at IDFG.

Screenshot of the Timelapse software loaded with camera trap images and bounding boxes produced by the detection API. Image courtesy of RSPB.

Next steps

The value of the MegaDetector and the batch processing API is reflected in the number of uninteresting images our collaborators no longer have to look at. The next step is to accelerate species-level labeling, i.e., the processing of the interesting images, by training ecosystem-specific classifiers. For this, we use our library of labeled images and the output of the MegaDetector to train a classifier on the cropped-out detection areas of each image. To learn more, head over to our classification directory.

We are also setting up a database using Azure Cosmos DB to organize our image meta-data so that we can quickly train classifiers for new ecosystems.

Active learning is another area we’re investing in to speed up species label generation, building on tools created in partnership with Conservation Metrics.

The sequence information in bursts of images and the background scene at each camera location are other information we hope to incorporate into our model to make even more accurate predictions.

Conclusion

When we first started the project, our goal was to build a tool that would be used by at least one biologist to do conservation work more efficiently. With the development of the detection batch processing API, we have now scored more than 25 million camera trap images from 13 different organizations, and we hope it will only scale up from here, enabling conservation organizations all over the world to accelerate biodiversity surveys with AI.

Acknowledgments

This project resulted from the contributions of many people inside and outside the AI for Earth team: Sara Beery, Marcel Simon, Saul Greenberg, Annie Enchakattu, Patrick Flickinger, and Neel Joshi.

We would like to thank the following organizations who have worked with us to adopt the ML-assisted workflow and the MegaDetector, and in some cases provided images for model re-training (list updated in January 2021):

  • Idaho Department of Fish and Game
  • San Diego Zoo Global
  • University of Washington Quantitative Ecology Lab
  • University of Idaho
  • Borderlands Research Institute at Sul Ross State University
  • Borneo Nature Foundation
  • Parks Canada
  • Australian Wildlife Conservancy
  • Lab of Dr. Bilal Habib at the Wildlife Institute of India
  • Royal Society for the Protection of Birds (RSPB)
  • Wildlife Protection Solutions
  • Island Conservation
  • Synthetaic
  • School of Natural Sciences, University of Tasmania
  • Arizona Department of Environmental Quality
  • Wildlife Research, Oregon Department of Fish and Wildlife
  • National Wildlife Refuge System, Southwest Region, US Fish and Wildlife
  • Mammal Spatial Ecology and Conservation Lab at Washington State University
  • Point No Point Treaty Council
  • SPEA (Portuguese Society for the Study of Birds)
  • Ghost Cat Analytics
  • EcoLogic Consultants Ltd.
  • Smithsonian Northern Great Plains Program
  • Federal University of Amapá, Ecology and Conservation of Amazonian Vertebrates Research Group
  • Hamaarag, The Steinhardt Museum of Natural History, Tel Aviv University
  • Czech University of Life Sciences Prague
  • Ramat Hanadiv Nature Park, Israel
  • TU Berlin, Department of Ecology
  • DC Cat Count, led by the Humane Rescue Alliance
  • Center for Biodiversity and Conservation at the American Museum of Natural History
  • Camelot
  • Graeme Shannon’s Research Group at Bangor University
  • Snapshot USA

Further reading

“Camera Trap ML Survey: Everything I know about machine learning and camera traps” for a review of technologies in this space (datasets, machine learning models and image review software)
https://agentmorris.github.io/camera-trap-ml-survey/

“Preventing Rhino Poaching with Microsoft Azure” from our friends at the CSE team and Peace Parks Foundation
https://www.microsoft.com/developerblog/2019/05/07/preventing-rhino-poaching-though-microsoft-azure/

Azure Machine Learning: “Run batch predictions on large data sets with Azure Machine Learning service”
https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-run-batch-predictions

Azure Machine Learning: “How to create your first pipeline”
https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-create-your-first-pipeline

AI for Earth’s API Framework for containerizing model inference code and hosting APIs
https://github.com/microsoft/AIforEarth-API-Development
