Red pepper chef: from new training data to deployed system in a few lines of code

The samples for this code are available on GitHub.

Deep learning is continuing to grow in importance.

One of the toughest parts of deep learning is integration: making it actually work. Here we walk through how we deployed a modern object detection system to distinguish between the parts of a red pepper we wish to discard and the parts we wish to keep.

Hypothetically this could be used as a subsystem for a robotic chef. The method we used is generic, so we hope it serves as inspiration for others looking to get started.

Visual of output: distinguishing between the parts of a red pepper we wish to discard and the parts we wish to keep

In this article:

  • How to create new training data for object detection
  • Train a custom object detection model (in one line of code)
  • Deploy and get predictions (also in one line)

Creating the training data


Take photos or video. I took 15 photos with my phone while I was cutting up the red pepper.

Example photos


I created two labels in Diffgram: discard and keep.

Since this is an object detection problem, the system automatically handles the background label (everything that is neither discard nor keep).

I labelled the data by drawing boxes around the areas of interest.

Labelled data

Mark as completed

After all the visible keep and discard parts of the image were labelled, I marked the image as complete.

This matters because training on an image with incorrect or incomplete annotations would be confusing to the network. It would be like grading a test with an answer key that’s wrong.

By default Diffgram only takes complete images for training.

This means I have the flexibility to import all images, and only the completed ones will be used for training. I can also test how the data affects results by marking groups of images complete or not complete.
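
The effect of the complete flag can be pictured with a small sketch. This is not Diffgram's internal code: the `LabelledImage` class, its `complete` field, and the file names are hypothetical, chosen only to illustrate that incomplete images are skipped when a training set is assembled.

```python
from dataclasses import dataclass, field


@dataclass
class LabelledImage:
    """Hypothetical stand-in for an image and its box annotations."""
    name: str
    boxes: list = field(default_factory=list)  # e.g. [("keep", x0, y0, x1, y1), ...]
    complete: bool = False  # set by the annotator once every region is labelled


def select_training_images(images):
    """Keep only the images a human has marked as complete."""
    return [image for image in images if image.complete]


images = [
    LabelledImage("pepper_01.jpg", [("keep", 0.1, 0.1, 0.6, 0.9)], complete=True),
    LabelledImage("pepper_02.jpg", [("discard", 0.4, 0.2, 0.7, 0.5)], complete=False),
    LabelledImage("pepper_03.jpg", [("keep", 0.0, 0.3, 0.5, 0.8)], complete=True),
]

training_set = select_training_images(images)
print([image.name for image in training_set])
```

Flipping `complete` on a group of images and rebuilding the list is the kind of experiment described above.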

Making it practical with Diffgram

Diffgram makes it more practical to train models by acting as a middleware between deep learning implementations, hardware, and data. Diffgram makes it more accessible to normal software engineers by providing useful opinionated defaults.

The integrated automatic object detection training is similar in spirit to Google’s AutoML. This means the entire training process only requires a single command.

High level overview of training middleware

This is new. People may be used to the training process being a lot more complicated. Even with existing implementations, it can be difficult to get the formatting and configurations right.

A goal is to abstract the training process to the point that it becomes as standard as normal code compilation. We see applications using many brains, and needing a strong way to handle the interaction between those brains.

Accessibility with strong defaults

Diffgram suggests defaults.

Personally I think you should not have to read 30 papers and take a bunch of courses just to get started with deep learning. Differences that people working in the field take for granted can be daunting to someone trying this for the first time.

A brief technical overview (Experts, see italics):

  • Diffgram trains best in class open source models. The default model provides a good blend of performance and speed for an average application. At time of writing the default model is SSD MobileNet V2.
  • It fine tunes based on transfer learning. Default is pre-trained on ImageNet, then on MSCOCO.
  • It applies default model settings, for example the length of time the model trains. We set hyperparameters, e.g. global steps, as a function of the number of examples.
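
To make "global steps as a function of examples" concrete, here is a minimal sketch of what such a rule could look like. The function name and every constant in it are assumptions made for illustration; the actual rule Diffgram uses is not given in this article.

```python
def global_steps_for(example_count, steps_per_example=100, minimum=2_000, maximum=50_000):
    """Scale training length with dataset size, within fixed bounds.

    All constants are hypothetical; they are not Diffgram's actual values.
    """
    if example_count < 0:
        raise ValueError("example_count must be non-negative")
    return max(minimum, min(maximum, example_count * steps_per_example))


# A tiny dataset is lifted to the minimum, a huge one is capped at the maximum.
for count in (15, 100, 1000):
    print(count, "examples ->", global_steps_for(count), "steps")
```

The point of a rule like this is that a new user never has to pick a step count by hand, while an expert could still override the bounds.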

In the future we plan to expand this offering with more high level choices for a normal user, and more access to low level choices for experts.

Two options for training

  • User interface
  • SDK

Training through the user interface

The user interface visually displays the result of the pre-checks, such as the number of images, instances (i.e. boxes), and labels. The default requirements are designed to help guide new users to get reasonable initial results.
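
As a rough illustration of what pre-checks of this kind involve, the sketch below counts images, instances, and labels and flags a dataset that looks too small. The function name, the input format, and both thresholds are hypothetical; Diffgram's real requirements are not listed here.

```python
from collections import Counter


def run_pre_checks(annotations, min_images=10, min_instances_per_label=5):
    """Summarise a labelled dataset and list reasons it may be too small.

    annotations maps an image name to the list of label names drawn on it,
    one entry per box. The thresholds are hypothetical defaults.
    """
    per_label = Counter(label for labels in annotations.values() for label in labels)
    report = {
        "images": len(annotations),
        "instances": sum(per_label.values()),
        "labels": len(per_label),
    }
    problems = []
    if report["images"] < min_images:
        problems.append(f"need at least {min_images} images, found {report['images']}")
    for label, count in sorted(per_label.items()):
        if count < min_instances_per_label:
            problems.append(f"label '{label}' has {count} instances, need {min_instances_per_label}")
    return report, problems


annotations = {
    "pepper_01.jpg": ["keep", "discard"],
    "pepper_02.jpg": ["keep"],
    "pepper_03.jpg": [],
}
report, problems = run_pre_checks(annotations)
print(report)
for problem in problems:
    print("-", problem)
```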

Training through the SDK

GitHub sample

Install the library: pip install diffgram

Example code:

from diffgram import Diffgram

# Credentials for an existing Diffgram project
project = Diffgram(
    project_string_id = "replace_with_project_string",
    client_id = "replace_with_client_id",
    client_secret = "replace_with_client_secret")

# Construct new training with all defaults
brain = project.train.start()

# Check on the training run
brain.check_status()

First we define credentials and create a class Project object.

Then we call train.start() which starts training and returns a class Brain object.

We check the status by calling brain.check_status().

The default training process can be expected to take an hour.
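
Because training takes a while, it can be convenient to poll rather than check by hand. The helper below is a generic sketch and not part of the Diffgram SDK. What `brain.check_status()` returns is not described in this article, so the caller supplies the `is_done` test, and the `"complete"` string in the usage note is purely a guess.

```python
import time


def wait_for(check, is_done, interval_seconds=60.0, timeout_seconds=2 * 60 * 60,
             sleep=time.sleep):
    """Call check() repeatedly until is_done(result) is true.

    Returns the last result, or raises TimeoutError once the total time
    spent sleeping reaches timeout_seconds.
    """
    waited = 0.0
    while True:
        result = check()
        if is_done(result):
            return result
        if waited >= timeout_seconds:
            raise TimeoutError(f"still not done after {waited:.0f} seconds")
        sleep(interval_seconds)
        waited += interval_seconds
```

With a real training run this might be used as `wait_for(brain.check_status, lambda status: status == "complete")`, after adjusting the test to whatever the status call actually returns.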

Get status

We can call project.get_model() to get the brain outside of an initial training run. By default get_model() will return the latest model in the project. Optionally, you can provide a named model.

brain = project.get_model(name = "my_model") 


Diffgram has integrated automatic deployment. This means the deploy process requires a single command. This is also new. Usually deploys require a lot more effort.

More strong defaults

Diffgram follows best practices for optimization.

For example, at prediction time networks don’t need quite as much precision, so instead of storing values in 32 bits we use 8 bits. This usually results in a surprisingly small decrease in accuracy in exchange for a 3–4x speedup.

Experts, this is train time quantization of the network. It defaults to starting once 80% of global steps are complete.
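
To see why 8 bits can be enough, here is a small standard-library sketch that rounds a handful of made-up weights onto 256 levels and measures the damage. It is simple after-the-fact rounding, not the train time quantization Diffgram applies, and the weight values are invented for the example.

```python
from array import array


def quantize(values):
    """Map floats onto integers 0..255 using one scale and offset for the whole list."""
    low, high = min(values), max(values)
    scale = (high - low) / 255 or 1.0  # fall back to 1.0 when all values are equal
    codes = array("B", (round((value - low) / scale) for value in values))
    return codes, scale, low


def dequantize(codes, scale, low):
    """Recover approximate floats from the 8-bit codes."""
    return [low + code * scale for code in codes]


weights = [-1.2, -0.4, 0.0, 0.05, 0.7, 1.35]  # made-up example weights
codes, scale, low = quantize(weights)
restored = dequantize(codes, scale, low)

worst_error = max(abs(a - b) for a, b in zip(weights, restored))
full_precision_bytes = len(weights) * array("f").itemsize  # 32-bit floats
quantized_bytes = len(codes) * codes.itemsize              # 8-bit codes

print("worst rounding error:", worst_error)
print("bytes:", full_precision_bytes, "->", quantized_bytes)
```

Each value moves by at most half a quantization step while storage shrinks by a factor of four. The actual speedup depends on the hardware and runtime.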

Red pepper results

I ran the brain on some training and validation data. It worked fairly well!

Left train, right validation

Exception handling

It also had some misses!

On the validation image below and to the left, it didn’t do as well. While there were no false predictions (ie calling a discard a keep), it missed all of the discards. I can now correct this example, and (optionally) use it as a training example going forward.
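
The difference between a false prediction and a miss is the difference between precision and recall. The sketch below works through one image using hypothetical counts (they were not measured on the red pepper photos): two keep boxes found correctly, no wrong boxes, and three discard boxes missed.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall); a value is None when it is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else None
    recall = true_positives / actual if actual else None
    return precision, recall


# Hypothetical counts for a single validation image.
overall = precision_recall(true_positives=2, false_positives=0, false_negatives=3)
discard_only = precision_recall(true_positives=0, false_positives=0, false_negatives=3)

print("overall precision, recall:", overall)
print("discard precision, recall:", discard_only)
```

With these counts every prediction made was right, so precision is perfect, but most of the boxes that should have been found were not, so recall is low. Corrected examples added to the training data are aimed at the recall side.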

Left validation, right human corrected

I can do the same with this other example:

Left, validation inference, Right, corrected inference

As you can see, the work being done on the system is improving the training data.

To oversimplify, imagine instead of cutting a red pepper every day, you only correct the robot when it makes a mistake. To start off with it may be a little messy, but over time it will get better and better to the point that the mistakes may be acceptable and require no further correction.

keep and discard were just a first pass at how to do this. Maybe it would be better to also include a stem label, etc. This is a generic approach, so we can define relatively arbitrary classes here.

UI deploy

Diffgram enables non technical people to experiment and work on the data related to these networks. There is a fully integrated deploy option through the UI. Select the files you want to run and click “inference”.

SDK Deploy

GitHub link

The goal here is to run the model and get our inference results back.

First we call get_model(), assuming we are using this on an already trained model.

brain = project.get_model(name = "my_model")

Then we call one of the predict options, e.g.

inference = brain.predict_from_url(url)

The default deploy option takes a minute to warm up then a second per image. Calling any predict method will start the warm up.

3 different ways to do predictions from the SDK

  • From a local file
  • From a remote file (by url)
  • From a Diffgram file id

# Local
path = ""
inference = brain.predict_from_local(path)

# URL (Recommended over from local, works for cloud providers)
url = ""
inference = brain.predict_from_url(url)

# Diffgram file
inference = brain.predict_from_file(file_id = 111546)

All 3 predict operations will return a class Inference object.
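
If an application receives a mix of sources, a thin wrapper can pick the right call. The `predict` helper and the `RecordingBrain` stand-in below are hypothetical and exist only so the sketch runs without credentials. The three method names and the `file_id` keyword are the ones shown above.

```python
def predict(brain, source):
    """Pick one of the three predict methods based on what `source` looks like."""
    if isinstance(source, int):
        return brain.predict_from_file(file_id=source)
    if source.startswith(("http://", "https://")):
        return brain.predict_from_url(source)
    return brain.predict_from_local(source)


class RecordingBrain:
    """Stand-in that reports which method was called instead of running a model."""

    def predict_from_file(self, file_id):
        return ("file", file_id)

    def predict_from_url(self, url):
        return ("url", url)

    def predict_from_local(self, path):
        return ("local", path)


brain = RecordingBrain()
for source in (111546, "https://example.com/pepper.jpg", "photos/pepper.jpg"):
    print(predict(brain, source))
```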

By default only high probability predictions will be returned. Experts, IoU > .5
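
For readers new to the term, IoU (intersection over union) measures how much two boxes overlap, from 0 for no overlap to 1 for identical boxes. The sketch below is the standard calculation, assuming boxes are given as (x_min, y_min, x_max, y_max). How Diffgram applies the 0.5 threshold internally is not spelled out here.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    overlap_width = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    overlap_height = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = overlap_width * overlap_height
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0


# Two 4x4 boxes shifted by one unit overlap enough to pass a 0.5 threshold.
close_match = intersection_over_union((0, 0, 4, 4), (1, 0, 5, 4))
# Two 2x2 boxes shifted by one unit do not.
loose_match = intersection_over_union((0, 0, 2, 2), (1, 0, 3, 2))

print(close_match > 0.5, loose_match > 0.5)
```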


We can also view the predictions on Diffgram. By default, high probability predictions get saved as draft Instances. Sort by “FAN only” in the media controls.

Inference and Instances

Each Inference object holds a list of Instance objects, which carry the key information on each prediction.

inference = brain.predict_from_url(url)
instance = inference.instance_list[0]

For object detection the information in the instance is:

  • instance.location, length 4 array normalized to the image, e.g. [0.0, 0.3841, 0.5845, 0.6375]
  • instance.score, float, e.g. 0.99145
  • instance.label, integer, e.g. 2
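
A typical next step is to turn these fields into something an application can draw or act on. The sketch below uses a stand-in `Instance` class rather than the SDK's own. It assumes the location order is [y_min, x_min, y_max, x_max] (the article does not state the order), and the mapping from label integers to names is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Stand-in with the three fields listed above; not the SDK's class."""
    location: list
    score: float
    label: int


LABEL_NAMES = {1: "discard", 2: "keep"}  # hypothetical mapping


def to_pixels(location, image_width, image_height):
    """Convert a normalized box to pixel (x_min, y_min, x_max, y_max).

    Assumes location is ordered [y_min, x_min, y_max, x_max], each in 0..1.
    """
    y_min, x_min, y_max, x_max = location
    return (
        round(x_min * image_width),
        round(y_min * image_height),
        round(x_max * image_width),
        round(y_max * image_height),
    )


instance = Instance(location=[0.0, 0.3841, 0.5845, 0.6375], score=0.99145, label=2)
box = to_pixels(instance.location, image_width=800, image_height=600)
print(LABEL_NAMES[instance.label], box, instance.score)
```

If the boxes come out transposed on your images, swap the unpacking order in `to_pixels`.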

Updates are coming to make instances easier to work with.

You can then use the inference information in your application!

Naturally this was meant as a demo. I hope whoever builds a robotic pepper | carrot | vegetable chef will use a little more training data! :)

Try this yourself — on your own data

Thanks for reading!