Instance Segmentation with Mask R-CNN and TensorFlow on Onepanel

Published in Onepanel · 7 min read · Sep 5, 2019

This project can be easily forked from Onepanel.

What is Instance Segmentation?

Instance segmentation is the task of identifying object outlines at the pixel level. It is one of the hardest computer vision tasks, as a comparison with related tasks shows:

Classification: There is a balloon in this image.
Semantic Segmentation: These are all the balloon pixels.
Object Detection: There are 7 balloons in this image at these locations. We’re starting to account for objects that overlap.
Instance Segmentation: There are 7 balloons at these locations, and these are the pixels that belong to each one.
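To make the difference between these four tasks concrete, here is a minimal sketch on a made-up 4x6 "image" that contains two balloons. The `instance_map` data and all variable names are invented for illustration; a real model works on full-size images and predicts these outputs instead of reading them from a label map.

```python
# Toy 4x6 instance label map: 0 = background, 1 and 2 = two separate balloons.
instance_map = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 2],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

instance_ids = sorted({v for row in instance_map for v in row if v != 0})

# Classification: is there a balloon in the image at all?
has_balloon = len(instance_ids) > 0

# Semantic segmentation: which pixels are balloon, regardless of which balloon?
semantic_mask = [[int(v != 0) for v in row] for row in instance_map]

# Object detection would report one box per balloon; here we only count them.
num_balloons = len(instance_ids)

# Instance segmentation: the (y, x) pixels that belong to each individual balloon.
pixels_per_instance = {
    i: [(y, x) for y, row in enumerate(instance_map)
        for x, v in enumerate(row) if v == i]
    for i in instance_ids
}
```

Note that the semantic mask cannot tell the two balloons apart, while the per-instance pixel lists can.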

Let’s Build a Balloon Filter

Training Dataset

We’ve created this Python script:

import csv
from datetime import datetime
import glob
import json
import os
import re
import sys
import time
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup
import requests
from requests.exceptions import ConnectionError, ReadTimeout
from typing import List


def queries_from_other_sources(func):
    # Optional decorator for main() (not applied below): expands a query file
    # or a glob of directories into one main() call per query.
    def wrapper(*args, **kwargs):
        params = list(args[0])  # [target name, number of images, save dir]
        if len(params) != 3:
            print('Invalid argument\n> [target name] [number of images to be downloaded] [save dir]')
            return None
        if os.path.isfile(params[0]):
            # The target is a text file with one query per line
            with open(params[0], 'r') as f:
                queries = [q.rstrip('\n') for q in f.readlines()]
            dirnames = [q.replace(' ', '_') for q in queries]
            params.append('')  # extra slot for the per-query directory name
            for query, dirname in zip(queries, dirnames):
                params[0] = query
                params[3] = dirname
                func(params, **kwargs)
        elif os.path.isdir(os.path.split(params[0])[0]):
            # The target is a glob pattern of directories named after the queries
            dirnames = [os.path.split(p)[1] for p in glob.glob(params[0])]
            queries = [re.sub(r'^n\d{8}-', '', s.replace('_', ' ')) for s in dirnames]
            params.append('')
            for query, dirname in zip(queries, dirnames):
                params[0] = query
                params[3] = dirname
                func(params, **kwargs)
        else:
            func(params, **kwargs)
        return None
    return wrapper


class Google(object):
    def __init__(self):
        self.GOOGLE_SEARCH_URL = 'https://www.google.co.in/search'
        self.session = requests.session()
        self.session.headers.update({'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0'})

    def search(self, keyword, maximum):
        print('Beginning search for', keyword)
        query = self.query_gen(keyword)
        return self.image_search(query, maximum)

    def query_gen(self, keyword):
        # Search query generator
        page = 0
        while True:
            params = urllib.parse.urlencode({
                'q': keyword,
                'tbm': 'isch',
                'ijn': str(page)})
            yield self.GOOGLE_SEARCH_URL + '?' + params
            page += 1

    def image_search(self, query_gen, maximum):
        # Search image
        result = []
        total = 0
        while True:
            # Search
            query = next(query_gen)
            try:
                html = self.session.get(query, timeout=20).text
            except (ConnectionError, ReadTimeout) as e:
                print(e)
                print('retry in 10 sec...')
                time.sleep(10)
                try:
                    html = self.session.get(query, timeout=20).text
                except Exception as e:
                    print(e)
                    continue
            soup = BeautifulSoup(html, 'lxml')
            elements = soup.select('.rg_meta.notranslate')
            jsons = [json.loads(e.get_text()) for e in elements]
            imageURLs = [js['ou'] for js in jsons]

            if not len(imageURLs):
                break
            elif len(imageURLs) > maximum - total:
                result += imageURLs[:maximum - total]
                break
            else:
                result += imageURLs
                total += len(imageURLs)

        print('-> Found', str(len(result)), 'images')
        return result


def main(args: List):
    google = Google()
    req_headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}
    if len(args) < 3:
        print('Invalid argument')
        print(' [target name] [download number] [save dir]')
        sys.exit()
    else:
        # Save location
        name = args[0]
        data_dir = args[2]
        if len(args) > 3:
            # A fourth element, if present, names the per-query subdirectory
            dest_dir_path = os.path.join(data_dir, args[3])
            urls_dest_dir_path = os.path.join(data_dir, 'urls')
            urls_file = os.path.join(urls_dest_dir_path, args[3] + '.csv')
        else:
            dest_dir_path = os.path.join(data_dir,name.replace(' ', '_'))
            urls_dest_dir_path = os.path.join(data_dir, 'urls')
            urls_file = os.path.join(urls_dest_dir_path, name.replace(' ', '_') + '.csv')
        print(dest_dir_path)
        os.makedirs(dest_dir_path, exist_ok=True)
        #os.makedirs(urls_dest_dir_path, exist_ok=True)

        result = google.search(
            name, maximum=int(args[1]))
        result_logs = []
        download_error = []
        for i in range(len(result)):
            try:
                request = urllib.request.Request(url=result[i], headers=req_headers)
                data = urllib.request.urlopen(request, timeout=15).read()
                with open(os.path.join(dest_dir_path, str(i + 1).zfill(4) + '.jpg'), "wb") as f:
                    f.write(data)
                downloaded = 1
            except requests.exceptions.ConnectionError as e:
                print(e)
                download_error.append(i + 1)
                downloaded = 0
                time.sleep(10)  # pause so the next request is less likely to be refused
            except Exception as e:
                print(e)
                download_error.append(i + 1)
                downloaded = 0
            result_logs.append((i + 1, result[i], downloaded))

        print('Complete download')
        print('├─ Download', len(result) - len(download_error), 'images')
        print('└─ Could not download', len(
            download_error), 'images', download_error)


if __name__ == '__main__':
    main(['partybaloons','100','/onepanel/output/baloons'])

The script is connected to a Node-RED flow. It scrapes images from Google Image Search and saves them to the desired location.

I picked a total of 100 images and divided them into a training set and a validation set. Finding images is easy. Annotating them is the hard part.
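A reproducible way to do this split might look like the sketch below. The 80/20 ratio, the seed, and the `split_dataset` helper are assumptions made for illustration; the file names simply follow the zero-padded naming used by the scraper.

```python
import random

def split_dataset(filenames, val_fraction=0.2, seed=42):
    """Shuffle file names reproducibly and split them into (train, val)."""
    shuffled = sorted(filenames)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    num_val = int(len(shuffled) * val_fraction)
    return shuffled[num_val:], shuffled[:num_val]

# File names as written by the scraper: 0001.jpg ... 0100.jpg
images = [str(i + 1).zfill(4) + '.jpg' for i in range(100)]
train_files, val_files = split_dataset(images)
```

Fixing the seed keeps the validation set stable between runs, so results from different training jobs stay comparable.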

Wait! Don’t we need, like, a million images to train a deep learning model? Sometimes you do, but often you don’t. I’m relying on two main points to reduce my training requirements significantly:

First, transfer learning. Instead of training a model from scratch, I start with a weights file that’s been trained on the COCO dataset (we provide that in the GitHub repo). Although the COCO dataset does not contain a balloon class, it contains a lot of other images (~120K), so the trained weights have already learned many of the features common in natural images, which really helps.

Second, given the simple use case here, I’m not demanding high accuracy from this model, so the tiny dataset should suffice.
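The core idea of transfer learning can be sketched in a few lines: reuse every pre-trained layer except the output heads, whose shapes depend on the number of classes (81 for COCO including background, versus 2 here). This is a plain-Python illustration, not the library's loading code. The `select_transferable` helper and the toy weight dictionary are hypothetical, and the head layer names are an assumption based on the Mask R-CNN implementation listed in the references.

```python
# Layers whose shape depends on the number of classes (names are an assumption).
HEAD_LAYERS = frozenset(
    ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]
)

def select_transferable(pretrained, exclude=HEAD_LAYERS):
    """Keep every pre-trained layer except the class-specific heads."""
    return {name: weights for name, weights in pretrained.items()
            if name not in exclude}

# Toy stand-in for a COCO weights file: layer name -> weights.
coco_weights = {
    "conv1": [0.1, 0.2],
    "res2a_branch2a": [0.3],
    "mrcnn_class_logits": [0.0] * 81,  # one logit per COCO class + background
    "mrcnn_mask": [0.0] * 81,
}

reused = select_transferable(coco_weights)
```

The excluded heads start from fresh weights and are trained on the balloon images, while the reused backbone layers already encode general image features.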

There are a lot of tools to annotate images. Onepanel has CVAT built in, which is a simple interactive video and image annotation tool for computer vision. Annotating the first few images can be very slow, but once you get used to the user interface, you can annotate at a rate of around one object per minute.

Loading the Dataset

There isn’t a universally accepted format to store segmentation masks. Some datasets save them as PNG images, others store them as polygon points, and so on. To handle all these cases, our implementation provides a Dataset class that you inherit from and then override a few functions to read your data in whichever format it happens to be.

Code Tip:
An easy way to write code for a new dataset is to copy coco.py and modify it to your needs, which is what I did. I saved the new file as balloon.py.

The BalloonDataset class looks like this:

class BalloonDataset(utils.Dataset):
    def load_balloons(self, dataset_dir, subset):
        ...

    def load_mask(self, image_id):
        ...

    def image_reference(self, image_id):
        ...

load_balloons reads the JSON file, extracts the annotations, and iteratively calls the internal add_class and add_image functions to build the dataset. load_mask generates bitmap masks for every object in the image by drawing the polygons.

image_reference simply returns a string that identifies the image for debugging purposes. Here it simply returns the path of the image file.
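To illustrate what drawing the polygons in load_mask amounts to, here is a dependency-free sketch that turns polygon points into bitmap masks using an even-odd ray casting test on pixel centers. It is not the project's implementation, which may rely on an image library instead. The function names and the `all_points_x` / `all_points_y` keys are assumptions about how the annotations are stored.

```python
def point_in_polygon(px, py, xs, ys):
    """Even-odd ray casting test for the point (px, py)."""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        # Does the edge from vertex j to vertex i straddle the line y = py?
        if (ys[i] > py) != (ys[j] > py):
            # x coordinate where the edge crosses that horizontal line
            x_cross = xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
            if px < x_cross:
                inside = not inside
        j = i
    return inside

def polygon_to_mask(xs, ys, height, width):
    """Rasterize one polygon into a height x width binary mask."""
    return [[int(point_in_polygon(x + 0.5, y + 0.5, xs, ys))
             for x in range(width)]
            for y in range(height)]

def polygons_to_masks(polygons, height, width):
    """One binary mask per annotated object."""
    return [polygon_to_mask(p["all_points_x"], p["all_points_y"], height, width)
            for p in polygons]

# A single rectangular "balloon" in a 4x6 image.
annotations = [{"all_points_x": [1, 4, 4, 1], "all_points_y": [1, 1, 3, 3]}]
masks = polygons_to_masks(annotations, height=4, width=6)
```

Each object gets its own mask, which is what lets the model tell overlapping balloons apart.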

You might have noticed that my class doesn’t contain functions to load images or return bounding boxes. The default load_image function in the base Dataset class handles loading images. And, bounding boxes are generated dynamically from the masks.
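Deriving a bounding box from a mask is straightforward: find the first and last rows and columns that contain any mask pixels. The sketch below shows the idea on nested lists; the `extract_bbox` name and the (y1, x1, y2, x2) convention with exclusive end coordinates are assumptions made for illustration.

```python
def extract_bbox(mask):
    """Return (y1, x1, y2, x2) for one binary mask; y2 and x2 are exclusive.

    An empty mask yields (0, 0, 0, 0).
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return (0, 0, 0, 0)
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)

balloon_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
bbox = extract_bbox(balloon_mask)
```

Computing boxes from masks means only the masks need to be annotated, and the boxes stay consistent with them even after the masks are resized or augmented.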

Verify the Dataset

To verify that the dataset code is implemented correctly, use this Jupyter notebook. It loads the dataset, visualizes masks and bounding boxes, and visualizes the anchors so you can check that the anchor sizes are a good fit for your object sizes.

Here is an example of what you should expect to see:

Configurations

The configurations for this project are similar to the base configuration used to train the COCO dataset, so I just needed to override 3 values. As I did with the Dataset class, I inherit from the base Config class and add my overrides:

class BalloonConfig(Config):
    # Give the configuration a recognizable name
    NAME = "balloons"

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # Background + balloon

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

The base configuration uses input images of size 1024x1024 px for best accuracy. I kept it that way. My images are a bit smaller, but the model resizes them automatically.
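The arithmetic behind this kind of automatic resizing can be sketched as follows: scale the image so it fits the square while preserving aspect ratio, then pad the remainder. The `square_resize_plan` helper is hypothetical, and the 800 px minimum side is an assumed default; only the 1024 px target comes from the text above.

```python
def square_resize_plan(height, width, min_dim=800, max_dim=1024):
    """Return (scale, (new_h, new_w), (top, bottom, left, right) padding)."""
    # First, scale up (never down) so the shorter side reaches min_dim
    scale = max(1.0, min_dim / min(height, width))
    # Then shrink if the longer side would no longer fit into max_dim
    if round(max(height, width) * scale) > max_dim:
        scale = max_dim / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # Center the resized image inside a max_dim x max_dim square
    top = (max_dim - new_h) // 2
    left = (max_dim - new_w) // 2
    padding = (top, max_dim - new_h - top, left, max_dim - new_w - left)
    return scale, (new_h, new_w), padding

scale, new_size, padding = square_resize_plan(600, 800)
```

For a 600x800 image, this gives a 768x1024 resized image with 128 rows of padding above and below. The same scale and padding have to be applied to the masks so they stay aligned with the image.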

Training

Mask R-CNN is a fairly large model, especially since our implementation uses ResNet101 and FPN, so you need a modern GPU with 12GB of memory. It might work on less, but I haven’t tried. I used Onepanel’s K80 instances to train this model, and given the small dataset, training takes less than an hour.

Start the training with this command, running from the balloon directory. Here, we’re specifying that training should start from the pre-trained COCO weights. The code will download the weights from our repository automatically:

python3 balloon.py train --dataset=/path/to/dataset --model=coco

And to resume training if it stopped:

python3 balloon.py train --dataset=/path/to/dataset --model=last
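Resuming from the last run boils down to locating the newest checkpoint in the logs directory. The sketch below shows one way that lookup could work. The directory layout (one timestamped folder per run, with checkpoint files named `mask_rcnn_<name>_<epoch>.h5`) and the `find_last_checkpoint` helper are assumptions, so check how your logs are actually organized.

```python
import os
import tempfile

def find_last_checkpoint(logs_dir, name="balloons"):
    """Return the newest checkpoint path under logs_dir, or None if there is none."""
    run_dirs = sorted(
        d for d in os.listdir(logs_dir)
        if d.startswith(name) and os.path.isdir(os.path.join(logs_dir, d))
    )
    # Walk the runs from newest to oldest until one contains a checkpoint
    for run in reversed(run_dirs):
        run_path = os.path.join(logs_dir, run)
        checkpoints = sorted(
            f for f in os.listdir(run_path)
            if f.startswith("mask_rcnn") and f.endswith(".h5")
        )
        if checkpoints:
            return os.path.join(run_path, checkpoints[-1])
    return None

# Demo on a throwaway directory with made-up run and file names
with tempfile.TemporaryDirectory() as logs:
    runs = [("balloons20190901T1200", [1, 2]), ("balloons20190902T0900", [1, 2, 3])]
    for run, epochs in runs:
        os.makedirs(os.path.join(logs, run))
        for epoch in epochs:
            path = os.path.join(logs, run, "mask_rcnn_balloons_%04d.h5" % epoch)
            open(path, "w").close()
    latest = os.path.basename(find_last_checkpoint(logs))
```

Sorting by name works here only because the timestamps and epoch numbers are zero-padded.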

Using Onepanel Jobs for Parallel Training

Use this script to spin up multiple jobs with different sets of hyperparameters, train your models in parallel, and choose the best one.

from onepanel.models import DatasetMount
from onepanel.models import ProjectRepository
from onepanel.models import Job
from onepanel.sdk import Client

client = Client()
job = Job()
job.project.uid = 'maskrcnn'
# Adjacent string literals are concatenated, so the command stays on one line
# and the dataset path is not broken up by indentation whitespace
job.command = (
    'cd /onepanel/code/samples/balloon/ && ls && python balloon.py train '
    '--dataset=/onepanel/input/datasets/onepanel-demo/baloon-mrcnn/1/ '
    '--weights=/onepanel/code/samples/balloon/mask_rcnn_balloon.h5 '
    '--logs=/onepanel/output/ --gpu_count=1 --images_per_gpu=2 '
    '--learning_rate=0.002 --valid_steps=100 --train_steps=2000'
)
job.machine_type.uid = 'gpu-8-52-1k80'
job.environment.uid = 'jupyter-py3-tensorflow1.13.1'
job.volume_type.uid = 'ssd-100gb'
job.dataset_mount_claims = [
    DatasetMount(account_uid='onepanel-demo', dataset_uid='baloon-mrcnn', version=1, destination='/onepanel/input/')
]
job.code_repository = ProjectRepository(url='https://git.onepanel.io/onepanel-demo/maskrcnn.git', branch='master')
job_uid = client.jobs.create(job)
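The script above creates a single job. To cover several hyperparameter combinations, one option is to generate one command string per combination and create a job for each. The sketch below only builds the command strings and makes no Onepanel SDK calls. The `build_commands` helper and the example grid values are made up for illustration, while the flag names are taken from the command above.

```python
from itertools import product

BASE_COMMAND = (
    'cd /onepanel/code/samples/balloon/ && python balloon.py train '
    '--dataset=/onepanel/input/datasets/onepanel-demo/baloon-mrcnn/1/ '
    '--logs=/onepanel/output/ --gpu_count=1'
)

def build_commands(grid):
    """Return one training command per combination of hyperparameter values."""
    names = sorted(grid)  # fixed flag order keeps the commands reproducible
    commands = []
    for values in product(*(grid[name] for name in names)):
        flags = ' '.join('--{}={}'.format(n, v) for n, v in zip(names, values))
        commands.append(BASE_COMMAND + ' ' + flags)
    return commands

# Example grid: 2 learning rates x 2 batch sizes = 4 jobs
grid = {'learning_rate': [0.001, 0.002], 'images_per_gpu': [1, 2]}
commands = build_commands(grid)
```

Each entry in `commands` could then be assigned to `job.command` on a fresh `Job` before calling `client.jobs.create(job)`, as in the script above.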

Read more about the Onepanel SDK/CLI here.

References:

https://github.com/matterport/Mask_RCNN