AI for AG: Production machine learning for agriculture

Published Aug 6, 2020 · 11 min read


Author: Chris Padwick, Director of Computer Vision and Machine Learning at Blue River Technology

How did farming affect your day today? If you live in a city, you might feel disconnected from the farms and fields that produce your food. Agriculture is a core piece of our lives, but we often take it for granted.

A 2017 prototype of See & Spray, Blue River Technology’s precision weed control machine

Farmers today face a huge challenge: feeding a growing global population with less available land. The world’s population is expected to grow to nearly 10 billion by 2050, increasing global food demand by 50%. As this demand grows, land, water, and other resources will come under even more pressure. The variability inherent in farming, such as changing weather conditions, and threats like weeds and pests also have a significant effect on a farmer’s ability to produce food. The only way to produce more food while using fewer resources is through smart machines that help farmers with difficult jobs, offering more consistency, precision, and efficiency.

Alex Marsh, one of our Field Operations Specialists, pictured with a Self Propelled Sprayer. We work with big machines at Blue River — Alex is 6’4” tall and is about level with the top of the tire.

Agricultural robotics

At Blue River Technology, we are building the next generation of smart machines. Farmers use our tools to control weeds and reduce costs in a way that promotes agricultural sustainability. Our weeding robot integrates cameras, computer vision, machine learning and robotics to make an intelligent sprayer that drives through fields (using AutoTrac to minimize the load on the driver) and quickly targets and sprays weeds, leaving the crops intact.

The machine needs to decide in real time what is a crop and what is a weed. As the machine drives through the field, high-resolution cameras collect imagery at a high frame rate. We developed a convolutional neural network (CNN) using PyTorch to analyze each frame and produce a pixel-accurate map of where the crops and weeds are. Once the plants are identified, each weed and crop is mapped to a field location, and the robot sprays only the weeds. This entire process happens in milliseconds, so the farmer can cover as much ground as possible. Here is a great See & Spray Video that explains the process in more detail.

To support the machine learning (ML) and robotics stack, we built an impressive compute unit based on the NVIDIA Jetson AGX Xavier Edge AI platform. Because all our inference happens in real time, uploading to the cloud would take too long, so we bring the server farm to the field. The compute power on board the robot dedicated to visual inference and spray robotics alone is on par with IBM’s supercomputer, Blue Gene (2007). That gives the robot some of the highest compute capacity of any moving machinery in the world!

Building weed detection models

My team of researchers and engineers is responsible for training the neural network model that identifies crops and weeds. This is a challenging problem because many weeds look just like crops. Professional agronomists and weed scientists train our labeling workforce to label the images correctly — can you spot the weeds below?

You are looking at cotton plants and some weeds. Can you tell the difference?

In the image below, the cotton plants are in green and the weeds are in red.

The cotton plants are in green and the weeds are in red.

Machine learning stack

On the machine learning front, we have a sophisticated stack. We use PyTorch for training all our models. We have built a set of internal libraries on top of PyTorch which allow us to perform repeatable machine learning experiments. The responsibilities of my team fall into three categories:

  • Build production models to deploy onto the robots
  • Perform machine learning experiments and research with the goal of continually improving model performance
  • Data analysis / data science related to machine learning, A/B testing, process improvement, software engineering

We chose PyTorch because it’s very flexible and easy to debug. New team members can quickly get up to speed, and the documentation is thorough. Before working with PyTorch, our team used Caffe and TensorFlow extensively. In 2019, we decided to switch to PyTorch, and the transition was seamless. The framework lets us support production model workflows and research workflows simultaneously.

For example, we use the torchvision library for image transforms and tensor transformations. It contains some basic functionality, and it also integrates really nicely with sophisticated augmentation packages like imgaug. The transforms object in torchvision is a piece of cake to integrate with imgaug. In an example built on the Fashion MNIST dataset, a class called CustomAugmentor initializes the iaa.Sequential object in the constructor, then calls augment_image() in the __call__ method. CustomAugmentor() is then added to the call to transforms.Compose(), prior to ToTensor(). The train and val data loaders then apply the augmentations defined in CustomAugmentor() when the batches are loaded for training and validation.
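To make the pattern concrete without pulling in any dependencies, here is a minimal sketch of a callable augmentor chained ahead of a tensor-style conversion. The `Compose` class and `to_unit_range` function are hypothetical stand-ins for `transforms.Compose` and `ToTensor()`, and a simple horizontal flip stands in for the `iaa.Sequential` pipeline. This is an illustration of the structure only, not the original Fashion MNIST example.

```python
import random


class Compose:
    """Hypothetical stand-in for transforms.Compose: applies callables in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for transform in self.transforms:
            img = transform(img)
        return img


class CustomAugmentor:
    """Builds its augmentation once in the constructor, applies it in __call__.

    In a real pipeline the constructor would hold an imgaug iaa.Sequential and
    __call__ would return its augment_image() result; a flip stands in here.
    """

    def __init__(self, flip_prob=0.5, seed=0):
        self.flip_prob = flip_prob
        self.rng = random.Random(seed)

    def __call__(self, img):
        # img is a list of pixel rows; flip left-to-right with probability flip_prob.
        if self.rng.random() < self.flip_prob:
            return [row[::-1] for row in img]
        return img


def to_unit_range(img):
    """Stand-in for the scaling step of ToTensor(): 8-bit values into [0, 1]."""
    return [[px / 255.0 for px in row] for row in img]


# The augmentor is placed before the tensor conversion, as described in the text.
pipeline = Compose([CustomAugmentor(flip_prob=1.0), to_unit_range])
image = [[0, 255], [51, 102]]
augmented = pipeline(image)
```

Because the augmentor is just a callable, swapping the flip for a richer imgaug sequence would not change how the pipeline is assembled or how the data loaders consume it.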

Additionally, PyTorch has emerged as a favorite tool in the computer vision ecosystem (looking at Papers With Code, PyTorch is a common submission). This makes it easy for us to try out new techniques like Debiased Contrastive Learning for semi-supervised training.

On the model training front, we have two standard workflows: production and research. For research applications, our team runs PyTorch on an internal, on-premises compute cluster. Jobs on that cluster are managed by Slurm, a batch job scheduler for HPC systems. It is free, easy to set up and maintain, and provides all the functionality our group needs for running thousands of machine learning jobs. For our production workflows, we use an Argo workflow on top of a Kubernetes (K8s) cluster hosted in AWS. Our PyTorch training code is deployed to the cloud using Docker.

Deploying models on field robots

For production deployment, one of our top priorities is high-speed inference on the edge computing device. If the robot needs to drive more slowly to wait for inferences, it can’t be as efficient in the fields. To this end, we use TensorRT to convert the network to a model optimized for the NVIDIA Jetson AGX Xavier. TensorRT doesn’t accept JIT models as input, so we first convert the JIT model to ONNX format, and from there we use TensorRT to build a TensorRT engine file that we deploy directly to the device. As the toolstack evolves, we expect this process to improve as well. Our models are published to Artifactory by a Jenkins build process, and remote machines in the field pull them from Artifactory.

To monitor and evaluate our machine learning runs, we have found the Weights & Biases platform to be the best solution. Their API makes it fast to integrate W&B logging into an existing codebase. We use W&B to monitor training runs in progress, including live curves of the training and validation loss.

SGD vs Adam Project

As an example of using PyTorch and W&B, I will run an experiment and compare the results of using different solvers in PyTorch. There are a number of different solvers in PyTorch, so the obvious question is: which one should you pick? A popular choice is Adam. It often gives good results without any parameter tuning and is our usual choice for our models. In PyTorch, this solver is available as torch.optim.Adam. Another popular choice among machine learning researchers is Stochastic Gradient Descent (SGD), available in PyTorch as torch.optim.SGD. If you’re not sure of the differences between the two, or if you need a refresher, I suggest reviewing this write up. Momentum is an important concept in machine learning, as it can help the solver find better solutions by avoiding getting stuck in local minima in the optimization space. With SGD and momentum, the question is this: can I find a momentum setting for SGD that beats Adam?
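For readers who want to see what the momentum term does, here is a minimal pure-Python sketch of the SGD-with-momentum update in the form used by torch.optim.SGD when dampening and Nesterov are turned off: the velocity accumulates gradients, and the parameter steps along the velocity. The one-dimensional quadratic loss, the learning rate, and the function names are assumptions chosen for illustration; they are not part of the experiment described here.

```python
def sgd_momentum(grad_fn, w0, lr=0.1, momentum=0.9, steps=200):
    """Minimize a 1-D function with SGD plus momentum.

    Update rule (dampening = 0, no Nesterov):
        v <- momentum * v + grad
        w <- w - lr * v
    """
    w, v = w0, 0.0
    for _ in range(steps):
        g = grad_fn(w)
        v = momentum * v + g
        w = w - lr * v
    return w


def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


w_plain = sgd_momentum(grad, w0=0.0, momentum=0.0)     # plain gradient descent
w_momentum = sgd_momentum(grad, w0=0.0, momentum=0.9)  # heavy-ball momentum
```

Both settings reach the minimum on this convex toy problem. The benefit the article is probing only shows up on a non-convex loss surface such as a neural network's, which is why the comparison has to be run empirically.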

The experimental setup is as follows. I use the same training data for each run and evaluate the results on the same test set. I’m going to compare the F1 score for plants between different runs. I set up a number of runs with SGD as the solver, sweeping through momentum values from 0 to 0.99 (when using momentum, anything greater than 1.0 causes the solver to diverge). First I set up 10 runs with momentum values from 0 to 0.9 in increments of 0.1. Following that, I performed another set of 10 runs, this time with momentum values between 0.90 and 0.99 in increments of 0.01. After looking at these results, I also ran a set of experiments at momentum values of 0.999 and 0.9999. Each run was done with a different random seed and was given a tag of “SGD Sweep” in W&B. The results are shown in Figure 1.
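The sweep described above can be written down as a small grid. This is a sketch of how the run configurations might be enumerated; the dictionary layout and the choice of using the run index as the seed are assumptions, since the article only says that each run used a different random seed.

```python
def momentum_sweep():
    """Return the three groups of momentum values described in the text."""
    coarse = [round(0.1 * i, 1) for i in range(10)]         # 0.0, 0.1, ..., 0.9
    fine = [round(0.90 + 0.01 * i, 2) for i in range(10)]   # 0.90, 0.91, ..., 0.99
    extra = [0.999, 0.9999]                                 # follow-up runs
    return coarse, fine, extra


coarse, fine, extra = momentum_sweep()

# One configuration per run; seeding by run index is a hypothetical choice.
runs = [
    {"optimizer": "SGD", "momentum": m, "seed": seed, "tag": "SGD Sweep"}
    for seed, m in enumerate(coarse + fine + extra)
]
```

Note that 0.9 appears in both the coarse and the fine group, so the grid contains 22 runs but 21 distinct momentum values.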

Figure 1: On the left-hand side, the F1 score for crops is shown on the x-axis and the run name is shown on the y-axis. On the right-hand side, the F1 score for plants is shown as a function of momentum value.

It is very clear from Figure 1 that larger momentum values increase the F1 score. The best F1 score, 0.9447, occurs at a momentum value of 0.999, and the score drops to 0.9394 at a momentum value of 0.9999. The values are shown in the table below.

Table 1: Each run is shown as a row in the table above. The last column is the momentum setting for the run. The F1 score, precision, and recall for class 2 (crops) are shown.
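As a reminder of how the three metrics in the table relate, here is a small sketch that computes per-class precision, recall, and F1 from flattened label masks. The toy masks and the label ids for background and weeds are assumptions made for illustration; only class 2 as crops comes from the table caption.

```python
def precision_recall_f1(y_true, y_pred, cls):
    """Per-class precision, recall, and F1 over paired pixel labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # true positives
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # false positives
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy flattened masks (hypothetical ids: 0 = background, 1 = weed, 2 = crop).
truth = [0, 2, 2, 2, 1, 1, 0, 2]
pred = [0, 2, 2, 1, 1, 2, 0, 2]
precision, recall, f1 = precision_recall_f1(truth, pred, cls=2)
```

In this toy case three crop pixels are found, one is missed, and one weed pixel is mislabeled as crop, so precision, recall, and F1 all come out at 0.75.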

How do these results compare to Adam? To test this I ran 10 identical runs using torch.optim.Adam with just the default parameters. I used the tag “Adam runs” in W&B to identify these runs. I also tagged each set of SGD runs for comparison. Since a different random seed is used for each run, the solver will initialize differently each time and will end up with different weights at the last epoch. This gives slightly different results on the test set for each run. To compare them I will need to measure the spread of values for the Adam and SGD runs. This is easy to do with a box plot grouped by tag in W&B.
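The same comparison a grouped box plot makes visually can be sketched numerically: collect the scores for each tag and summarize their center and spread. The scores below are made-up placeholders under neutral tag names, not measurements from these experiments, and the `spread` helper is a hypothetical name.

```python
import statistics


def spread(scores):
    """Summarize a group of per-run scores the way a box plot would."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "median": median,
        "iqr": q3 - q1,  # height of the box in a box plot
    }


# Placeholder scores grouped by tag (NOT the article's results).
scores_by_tag = {
    "tag_a": [0.80, 0.81, 0.82, 0.83, 0.84],
    "tag_b": [0.70, 0.75, 0.80, 0.85, 0.90],
}
summary = {tag: spread(scores) for tag, scores in scores_by_tag.items()}
```

A group with a higher mean and a smaller interquartile range is both better on average and more consistent across random seeds, which is the pattern to look for when reading the box plot.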

Figure 2: The spread of values for Adam and SGD. The Adam runs are shown in the left of the graph in green. The SGD runs are shown as brown (0.999), teal (0–0.99), blue (0.9999) and yellow (0.95).

The results are shown in graph form in Figure 2, and in tabular form in Table 2. The full report is available online too. You can see that I haven’t been able to beat the results for Adam by just adjusting momentum values with SGD. The momentum setting of 0.999 gives very comparable results, but the variance on the Adam runs is tighter and the average value is higher as well. So Adam appears to be a good choice of solver for our plant segmentation problem!

Table 2: Run table showing F1 score, optimizer, and momentum value for each run.

PyTorch Visualizations

With the PyTorch integration, W&B picks up the gradients at each layer, letting us inspect the network during training.

W&B experiment tracking also makes it easy to visualize PyTorch models during training, so you can see the loss curves in real time in a central dashboard. We use these visualizations in our team meetings to discuss the latest results and share updates.

As the images pass through our PyTorch model, we seamlessly log predictions to Weights & Biases to visualize the results of model training. Here we can see the predictions, ground truth, and labels. This makes it easy to identify scenarios where model performance isn’t meeting our expectations.

The ground truth, predictions and the difference between the two. Crops are shown in green, while weeds are shown in red.

Here we can quickly browse the ground truth, predictions and the difference between the two. We’ve labeled the crops in green and the weeds in red. As you can see, the model is doing a pretty reasonable job of identifying the crops and the weeds in the image.

Here is a short code example of how to work with data frames in W&B:

Reproducible models

Reproducibility and traceability are key features of any ML system, and they are hard to get right. When comparing different network architectures and hyperparameters, the input data needs to be the same to make runs comparable. Individual practitioners on ML teams often save YAML or JSON config files, and it’s excruciating to find a team member’s run and wade through their config file to find out what training set and hyperparameters were used. We’ve all done it, and we all hate it.

A new feature that W&B just released solves this problem. Artifacts allow us to track the inputs and outputs of our training and evaluation runs. This helps us a lot with reproducibility and traceability. By inspecting the Artifacts section of a run in W&B I can tell what datasets were used to train the model, what models were produced (from multiple runs), and the results of the model evaluation.

A typical use case is the following. A data staging process downloads the latest and greatest data and stages it to disk for training and test (separate data sets for each). These datasets are specified as artifacts. A training run takes the training set artifact as input and outputs a trained model as an output artifact. The evaluation process takes the test set artifact as input, along with the trained model artifact, and outputs an evaluation that might include a set of metrics or images. A directed acyclic graph (DAG) is formed and visualized within W&B. This is helpful since it is very important to track the artifacts that are involved with releasing a machine learning model into production. A DAG like this can be formed easily:

One of the big advantages of the Artifacts feature is that you can choose to upload all the artifacts (datasets, models, evaluations) or you can choose to upload only references to the artifacts. This is a nice feature because moving lots of data around is time consuming and slow. With the dataset artifacts, we simply store a reference to those artifacts in W&B. That allows us to maintain control of our data (and avoid long transfer times) and still get traceability and reproducibility in machine learning.
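The lineage described above (staged datasets feeding a training run, whose model feeds an evaluation) is a small directed acyclic graph. As a dependency-free sketch of that structure, and explicitly not W&B's API, the graph can be modeled with the standard library; the node names and the `upstream` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it consumes as inputs.
lineage = {
    "train_run": {"train_set"},
    "model": {"train_run"},
    "eval_run": {"test_set", "model"},
    "evaluation": {"eval_run"},
}

# A valid execution order: every input appears before the step that uses it.
order = list(TopologicalSorter(lineage).static_order())


def upstream(node, graph):
    """Return every artifact and run that contributed to `node`."""
    seen = set()
    stack = [node]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Everything that must be traceable when releasing this evaluation's model.
evaluation_inputs = upstream("evaluation", lineage)
```

Walking the graph backwards from an evaluation answers the traceability question directly: which model, which training run, and which datasets produced this result.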

Leading ML teams

Looking back on the years I’ve spent leading teams of machine learning engineers, I’ve seen some common challenges:

  • Efficiency: As we develop new models, we need to experiment quickly and share results. PyTorch makes it easy for us to add new features fast, and Weights & Biases gives us the visibility we need to debug and improve our models.
  • Flexibility: Working with our customers in the fields, every day can bring a new challenge. Our team needs tools that can keep up with our constantly evolving needs, which is why we chose PyTorch for its thriving ecosystem and W&B for the lightweight, modular integrations.
  • Performance: At the end of the day, we need to build the most accurate and fastest models for our field machines. PyTorch enables us to iterate quickly, then productionize our models and deploy them in the field. We have full visibility and transparency in the development process with W&B, making it easy to identify the most performant models.

I hope you have enjoyed this short tour of how my team uses PyTorch and Weights & Biases to enable the next generation of intelligent agricultural machines!

About the author

I am the Director of Computer Vision and Machine Learning at Blue River Technology. We build robots that distinguish crops from weeds in an agricultural field and then spray only the weeds. I’ve worked at Blue River for four and a half years. My background is in Physics and Astronomy, and in grad school I helped build a telescope to measure the Cosmic Background Radiation. Check out our careers page. We are hiring!



