Don’t Miss Your Target: Object Detection with TensorFlow and Watson

Men In Black, Columbia Pictures (1997)

I’ve been using Watson Visual Recognition for a while now. It gets the job done and is super easy to set up. Out of the box, I can give it a photo and it will give me a good idea of what the image depicts. I can even go further and easily build my own custom classifiers. However, one thing it can’t do is object detection. Sometimes there are multiple objects in an image and we need to know both where they are and what they are.

TensorFlow just recently released an object detection API so I decided to try and leverage it to help out with my Watson classifications. It worked out pretty well so I thought I’d share with you what I have.

Getting Started

To make it easier to follow along, you can start by cloning this GitHub repo. It has all the code you need to run out of the box, except for the Visual Recognition API key.

Note: This project uses Python 3, so make sure you have python3 and pip3 installed on your machine.

First, pip install all the requirements:

pip3 install -r requirements.txt

In the first part of the project, after all the import statements, you’ll find this line of code:

visual_recognition = VisualRecognitionV3('2016-05-20', api_key='API_KEY')

This is where we initialize the Watson Visual Recognition service. You’ll need to replace API_KEY with your actual API key. Go ahead and do that now if you already have one. If not, I’ll go over how to get it later in the Bluemix section.

The next section contains variables you can adjust to fit your needs:


MAX_NUMBER_OF_BOXES is the maximum number of objects to locate. I set it to 10 because the output can get pretty messy if there are a lot of them.

MINIMUM_CONFIDENCE is the minimum confidence score that a box can have. If this value is too low, you may end up with boxes around nothing.
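
To make those two settings a little more concrete, here is a minimal sketch of how they might be applied to a list of detections. The function name `select_boxes`, the 0.6 threshold, and the sample data are my own placeholders for illustration, not values taken from the repo:

```python
MAX_NUMBER_OF_BOXES = 10
MINIMUM_CONFIDENCE = 0.6  # assumed value, tune it for your images


def select_boxes(boxes, scores, max_boxes=MAX_NUMBER_OF_BOXES,
                 min_confidence=MINIMUM_CONFIDENCE):
    """Return (box, score) pairs that pass the threshold, best first."""
    # Pair each box with its score and drop low-confidence detections.
    candidates = [(box, score) for box, score in zip(boxes, scores)
                  if score >= min_confidence]
    # Highest score first, then cap the number of boxes we keep.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:max_boxes]


# Made-up detections: [top, left, bottom, right] boxes and their scores.
boxes = [[0.1, 0.1, 0.5, 0.5], [0.2, 0.6, 0.9, 0.9], [0.0, 0.0, 0.1, 0.1]]
scores = [0.95, 0.72, 0.30]
selected = select_boxes(boxes, scores)
print(len(selected))  # 2
```

The third box is dropped because its score falls below the threshold, which is exactly the "box around nothing" case we want to avoid.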

The middle section of the code downloads the model and then loads it into memory. Downloading the model can take a really long time, so be patient. Don’t worry, though: it only has to download on the first run.
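
The "only downloads on the first run" behavior can be sketched roughly like this. This is not the repo's exact code: `ensure_model` and its parameters are hypothetical names, and the URL and paths are whatever your model needs.

```python
import os
import tarfile
import urllib.request


def ensure_model(url, archive_path, extract_dir,
                 fetch=urllib.request.urlretrieve):
    """Download and unpack the model archive unless it is already on disk.

    Returns True if a download happened, False if the cached copy was used.
    """
    if os.path.exists(archive_path):
        return False  # later runs skip the slow download
    fetch(url, archive_path)
    # tarfile.open detects the compression (e.g. .tar.gz) automatically.
    with tarfile.open(archive_path) as archive:
        archive.extractall(extract_dir)
    return True
```

The `fetch` argument is only there so the download step can be swapped out, for example when trying the function without a network connection.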

Detecting Objects

The next part of the code runs the image through TensorFlow’s object detection. It gives us the coordinates of each box as a NumPy array of edge positions [top, left, bottom, right]. We then crop and save the images based on the boxes.

To crop the correct area, we need to convert the coordinates from fractions of the image size (values between 0 and 1) to pixels by multiplying them by the width and height:

box_x = box[1] * width     # left edge
box_y = box[0] * height    # top edge
box_x2 = box[3] * width    # right edge
box_y2 = box[2] * height   # bottom edge
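
Here is the same conversion wrapped in a small function with a worked example. The name `to_pixel_box` is mine, and I'm assuming you want whole-pixel values in (left, top, right, bottom) order, which is the order Pillow's `Image.crop` expects:

```python
def to_pixel_box(box, width, height):
    """Convert a normalized [top, left, bottom, right] box to pixel
    coordinates ordered (left, top, right, bottom)."""
    top, left, bottom, right = box
    return (int(left * width), int(top * height),
            int(right * width), int(bottom * height))


# A box covering the right half of a 640x480 image.
print(to_pixel_box([0.0, 0.5, 1.0, 1.0], 640, 480))  # (320, 0, 640, 480)
```

Note how the order flips: the detection puts the vertical edge first (top, left), while the crop box puts the horizontal edge first (left, top).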

After we have the saved image portions, we can pass each of them to Watson to get classifications:

with open(full_path, 'rb') as images_file:
    results = visual_recognition.classify(images_file=images_file)
    print(json.dumps(results, indent=2))
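
The printed JSON is nested a few levels deep, so a small helper can make it easier to pull out the best label for each cropped image. This is a sketch under the assumption that the response follows the Visual Recognition v3 shape (images, then classifiers, then classes). The `top_class` helper and the sample data are hand-written for illustration, not real Watson output:

```python
def top_class(results):
    """Return (label, score) for the highest-scoring class, or None."""
    classes = [
        cls
        for image in results.get("images", [])
        for classifier in image.get("classifiers", [])
        for cls in classifier.get("classes", [])
    ]
    if not classes:
        return None
    best = max(classes, key=lambda cls: cls["score"])
    return best["class"], best["score"]


# Hand-written sample shaped like a classify response.
sample = {
    "images": [{
        "classifiers": [{
            "classifier_id": "default",
            "classes": [
                {"class": "dog", "score": 0.94},
                {"class": "animal", "score": 0.87},
            ],
        }],
    }],
}
print(top_class(sample))  # ('dog', 0.94)
```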

Note: This uses the default classifier, but you can also use the food classifier or create your own. Read more here.


Bluemix

To use Watson, you need to create the service and generate credentials on IBM Bluemix. Bluemix is IBM’s PaaS offering that lets you deploy and manage your cloud applications.

Once you sign up, you should see the Visual Recognition service. If not, go to the Catalog, where you’ll find Visual Recognition under Watson.

After creating the service, you should be able to find your API key by clicking View Credentials in the Service Credentials section.

Back at the top of our project, try to find this line of code:

visual_recognition = VisualRecognitionV3('2016-05-20', api_key='API_KEY')

We need to replace API_KEY with the actual API key we just got from Bluemix.
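
If you'd rather not paste the key straight into the source (handy if you plan to push your copy to GitHub), one option is to read it from an environment variable. This is just a sketch: the variable name `WATSON_VR_API_KEY` and the `load_api_key` helper are names I made up, not something the repo or the Watson SDK defines.

```python
import os


def load_api_key(env_var="WATSON_VR_API_KEY"):
    """Read the API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError("Set the %s environment variable first" % env_var)
    return key
```

You could then pass `api_key=load_api_key()` on the line above instead of the hard-coded string.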


For this to work we need to supply an image to detect objects in. I’ve included a picture of 4 dogs in the repo that you can use, or replace it with your own.

Note: The project expects the file to be located at “test_image/image1.jpg”.

We can run the project with this command:


Just remember that the first run might take a really long time. If all goes well, you should see something like this:

Final Thoughts

I haven’t used TensorFlow very much, but it seems great! I look forward to learning more about it and diving a little deeper. I hear you can do top level retraining for custom object detection models. That sounds like a good idea for another tutorial. Have fun, be creative and keep on hacking!

Thanks for reading! If you have any questions, feel free to reach out, connect with me on LinkedIn, or follow me here on Medium.

If you found this article helpful, it would mean a lot if you click the 💚 and share with friends.