Don’t know how to run Tensorflow Object Detection? In this tutorial, I will show you 10 simple steps to run it on your own machine! We will use Tensorflow version 1.8. Are you ready to start detecting objects?

This guide will help you install Tensorflow on a GPU-enabled host. You will need an Nvidia GPU with Compute Capability (CC) 3.0 or greater. You can look up your card's CC on Nvidia's CUDA GPUs page. If you don’t have such a GPU, you can skip the CUDA and cuDNN installation and install Tensorflow without GPU support.
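As a rough illustration of the rule above, the sketch below picks between the GPU and CPU builds from a Compute Capability string. The helper names are hypothetical, and the version pin assumes the Tensorflow 1.8 release mentioned earlier; the CC value itself still has to be looked up by hand.

```python
def meets_min_compute_capability(cc, minimum="3.0"):
    """Return True if a CC string such as "6.1" is at least `minimum`."""
    def parse(version):
        major, minor = version.strip().split(".")
        return int(major), int(minor)
    return parse(cc) >= parse(minimum)


def choose_tensorflow_package(cc=None):
    """Pick the pip requirement: the GPU build only when the GPU qualifies.

    `cc` is the Compute Capability of the installed Nvidia GPU,
    or None when there is no Nvidia GPU at all.
    """
    if cc is not None and meets_min_compute_capability(cc):
        return "tensorflow-gpu==1.8.0"
    return "tensorflow==1.8.0"


print(choose_tensorflow_package("6.1"))
print(choose_tensorflow_package(None))
```

The returned string can be passed to `pip install`. The GPU build additionally needs the CUDA and cuDNN libraries installed.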

You can try to use these steps to install Tensorflow on Windows, but…

Instance segmentation, object detection, drivable areas and lane markings: you can find them all in the Berkeley DeepDrive 100K Dataset. It consists of more than 100 000 HD videos recorded at various times, in various seasons and weather conditions. The dataset includes localization, timestamp and IMU data.

Data were collected in 4 locations, 3 of which are close to each other (SF, Berkeley and Bay Area); the last one is New York.

On 30th April 2018 a new version of the Open Images Dataset, V4, was released. A challenge for the best object detection results on this dataset was also announced.

Here you can see data examples: Open Images Dataset V4

ECCV 2018 Open Images Challenge

During the ECCV 2018 conference there will be a workshop dedicated to the Open Images Challenge (presented by Vittorio Ferrari, Alina Kuznetsova, Jasper Uijlings, Rodrigo Benenson, Victor Gomes, Matteo Malloci). They will announce the challenge results.

The Challenge has two tracks:

  1. Object Class Detection: predicting a tight bounding box around all instances of the 500 classes.
  2. Visual Relationship Detection: detecting pairs of objects in particular relations, e.g. …
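To make "tight bounding box" concrete, here is a minimal sketch of intersection over union (IoU), a common measure of how well a predicted box matches a ground-truth box. IoU is not named in the text above, and the challenge's exact evaluation protocol is not described here, so treat the function and the box format `(x_min, y_min, x_max, y_max)` as assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A tighter prediction gives a value closer to 1.0; disjoint boxes give 0.0.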

4K dashcam videos versus state-of-the-art object detection deep nets such as YOLO, SSD or Mask RCNN.

Object detection with YOLO and 4K dataset
Mask RCNN result for video #2

I want to share the datasets I use for testing deep neural networks. I have already tested the following on 4K videos:

  • Darknet YOLO
  • Tensorflow Object Detection API: SSD, Faster RCNN
  • Segmentation
  • Mask RCNN


I am using the original implementation (Darknet by Joseph Redmon) with 4 different trained weight files. These weights are:

Weights are downloaded from:
Source code:



Google offers a machine learning REST API for image content understanding.

Recognize place, faces, category, web entities, text and more [source image:]

In this post I would like to show how to easily run image recognition in the cloud with a little help from powerful deep learning models. Several models are accessible through one REST API. You can upload your image and request specific insights. You can choose from the following:

  • Face detection [33 landmarks, emotional state, headwear; facial recognition is NOT supported]
  • Landmark detection [detects popular structures, predicts location]
  • Logo detection [detects popular product logos]
  • Label detection [image category]
  • OCR [Optical Character Recognition: detects text, automatic language identification]
  • Document text
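The feature list above maps onto the `features` field of a Cloud Vision `images:annotate` request. The sketch below only builds the JSON body and makes no network call; the short feature keys and the helper name are hypothetical, and the placeholder bytes stand in for a real image file.

```python
import base64
import json

# REST endpoint of the Cloud Vision API (v1).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"

# Hypothetical short names mapped to the API's feature type identifiers.
FEATURE_TYPES = {
    "face": "FACE_DETECTION",
    "landmark": "LANDMARK_DETECTION",
    "logo": "LOGO_DETECTION",
    "label": "LABEL_DETECTION",
    "ocr": "TEXT_DETECTION",
    "document": "DOCUMENT_TEXT_DETECTION",
}


def build_annotate_request(image_bytes, features, max_results=10):
    """Build the JSON-serializable body for one image and several features."""
    content = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": content},
                "features": [
                    {"type": FEATURE_TYPES[name], "maxResults": max_results}
                    for name in features
                ],
            }
        ]
    }


# Placeholder bytes; in practice read them with open(path, "rb").read().
body = build_annotate_request(b"fake image bytes", ["face", "label"])
payload = json.dumps(body)
print(payload)
```

The payload would then be sent as a POST request to `ANNOTATE_URL`, typically with an API key passed in the `key` query parameter. One request can combine several feature types, which is what makes the single interface convenient.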

Distributed file system demo using Docker containers.


What is MooseFS?

MooseFS is a distributed file system. It spreads data over several physical commodity servers, which are visible to the user as one resource. For standard file operations MooseFS acts like any other Unix-like file system:

  • A hierarchical structure (directory tree)
  • Stores POSIX file attributes (permissions, last access and modification times)
  • Supports special files (block and character devices, pipes and sockets)
  • Symbolic links (file names pointing to target files, not necessarily on MooseFS) and hard links (different names of files which refer to the same data on MooseFS)
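The difference between hard links and symbolic links can be seen with a few standard calls. This is a minimal sketch that runs in a temporary directory on any POSIX file system; running it inside a MooseFS mount point is an assumption based on the description above, not something tested here. The file names are hypothetical.

```python
import os
import tempfile


def demo_links(directory):
    """Create a file, a hard link and a symbolic link, then delete the original."""
    target = os.path.join(directory, "data.txt")
    hard = os.path.join(directory, "hard.txt")
    soft = os.path.join(directory, "soft.txt")

    with open(target, "w") as f:
        f.write("hello")

    os.link(target, hard)     # second name for the same data
    os.symlink(target, soft)  # file that stores the path to the target

    same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
    link_count = os.stat(target).st_nlink

    os.remove(target)

    # The hard link still reaches the data; the symlink now points nowhere.
    with open(hard) as f:
        hard_survives = f.read() == "hello"
    soft_dangling = os.path.islink(soft) and not os.path.exists(soft)

    return {
        "same_inode": same_inode,
        "link_count": link_count,
        "hard_survives": hard_survives,
        "soft_dangling": soft_dangling,
    }


with tempfile.TemporaryDirectory() as tmp:
    result = demo_links(tmp)
    print(result)
```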

Distinctive features of MooseFS

  • High availability (i.e. redundant metadata servers)

IMM’s Husky platform with Mandala 3D Mapping Unit

Team IMM approach to European Robotics League Emergency 2017

The European Robotics League is funded by the European Union’s Horizon 2020 Program. It is a continuation of three earlier projects:

  • RoCKIn@Home (now: ERL Service Robots) tournament focuses on the domain of service robotics for home application.
  • RoCKIn@Work (now: ERL Industrial Robots) tournament focuses on the domain of industrial robotics in the Factory of the Future and also deals with modern automation issues.
  • euRathlon (now: ERL Emergency Robots) is a civilian, outdoor robotics competition, with a focus on realistic, multi-domain emergency response scenarios.

ERL Emergency 2017 is a continuation of euRathlon 2015 robotic…

Semantic segmentation is one of the projects in the 3rd term of Udacity’s Self-Driving Car Nanodegree program. The goal is to train a deep neural network to identify road pixels using part of the KITTI dataset.

This solution uses VGG16 with 3 skip layers. The size of the input image is 576 x 160. Results for 4K video are generated by resizing the prediction to 4K. The results are not perfect because of two factors:

  • a really small training dataset (<400 images)
  • short training (<120k samples, with 16 samples per batch)
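Two of the numbers above can be made concrete. The sketch below converts the sample budget into a batch count and upscales a low-resolution road mask to a larger frame. Nearest-neighbour resizing and the function names are assumptions; the actual resizing method used for the 4K results is not stated.

```python
def training_batches(num_samples, batch_size):
    """Number of full batches processed for a given sample budget."""
    return num_samples // batch_size


def upscale_nearest(mask, out_h, out_w):
    """Resize a 2D mask (list of rows) with nearest-neighbour sampling."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# 120k samples at 16 samples per batch is the upper bound quoted above.
print(training_batches(120000, 16))

# Tiny stand-in for a 576 x 160 prediction: 1 marks road pixels.
small_mask = [
    [0, 1],
    [1, 0],
]
for row in upscale_nearest(small_mask, 4, 4):
    print(row)
```

Every output pixel copies the value of the closest input pixel, so the upscaled mask has blocky edges. This is one reason a low-resolution prediction looks coarse on 4K video.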

Data augmentation

For this result I used only these 3 techniques:

  • flipping every image
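Flipping can be sketched as follows, assuming a horizontal (left-right) flip and that the label mask is flipped together with the image so the road pixels stay aligned. Images are plain lists of rows here, and the function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def augment_with_flips(pairs):
    """Return every (image, label) pair plus its mirrored copy."""
    augmented = []
    for image, label in pairs:
        augmented.append((image, label))
        # The label must be flipped the same way as the image.
        augmented.append((flip_horizontal(image), flip_horizontal(label)))
    return augmented


image = [[1, 2, 3],
         [4, 5, 6]]
label = [[0, 0, 1],
         [0, 1, 1]]

dataset = augment_with_flips([(image, label)])
print(len(dataset))
print(dataset[1])
```

Flipping every image doubles the number of training pairs, which helps with a dataset of fewer than 400 images.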

Input video | Detected Face | Generated Pix2Pix output

Inspired by this work by Dat Tran, I prepared my own dataset and trained an improved Pix2Pix net to generate the Polish YouTuber Krzysztof Gonciarz, who creates the show “Zapytaj Beczkę”.

But first let’s give it a try:

In my first approach I just trained the original net from face2face-demo. It worked! But its resolution was only 256 x 256, which was not enough for me, so I decided to increase the resolution to 1024 x 1024.

High resolution mod

To increase the resolution, one needs to add layers to the encoder and the decoder; there is no simpler way to do it. …
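A back-of-the-envelope sketch of why extra layers are needed, assuming that every encoder layer halves the spatial resolution (stride 2), that the encoder goes all the way down to 1 x 1, and that the decoder mirrors it. These assumptions and the function names are illustrative; the exact modification is not shown here.

```python
def stride2_layers_to_1x1(resolution):
    """Number of stride-2 layers needed to reduce `resolution` to 1."""
    if resolution < 1 or resolution & (resolution - 1):
        raise ValueError("resolution must be a positive power of two")
    return resolution.bit_length() - 1


def extra_layers(old_resolution, new_resolution):
    """Additional layers per side (encoder and decoder each) for the larger input."""
    return stride2_layers_to_1x1(new_resolution) - stride2_layers_to_1x1(old_resolution)


print(stride2_layers_to_1x1(256))
print(stride2_layers_to_1x1(1024))
print(extra_layers(256, 1024))
```

Under these assumptions, going from 256 x 256 to 1024 x 1024 means two more halving steps in the encoder and two more doubling steps in the decoder.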

Karol Majek

Self-Driving Car Engineer. Mentor at Udacity Self-Driving Car Nanodegree. Mobile Robotics Engineer.
