🔥Supervisely: end-to-end web-platform for Deep Learning and Computer Vision


Introduction

Building an AI-powered product in computer vision is a long, complex and expensive process. The source of complexity is the very large number of tasks to perform during development: data collection, annotation, thousands of experiments with Deep Learning models, continuous model improvement, sharing and collaboration. From a high-level perspective, it is just a huge pile of tasks to be solved by people with various expertise, by means of dozens of software packages.

Figure 1. Pile of tasks to solve during AI product development in computer vision

A general solution is needed to put all the people, data and algorithms inside a single ecosystem and to provide tools for efficient interaction and development.

The enterprise world understands the problem and the potential market opportunities, so Amazon, Google, Nvidia and Microsoft offer their own cloud-based AI platforms. However, the tech giants intend to conquer the entire AI market: computer vision, NLP, speech recognition and so on. Supervisely, on the contrary, is completely focused on computer vision, which allows us to provide an end-to-end solution from image annotation to deployment of neural network models.

More concretely, thanks to our complete focus on computer vision, our users get the following unique advantages that no other available solution currently offers:

  • Get from idea to a toy prototype in minutes. It will take you about 5 minutes to manually label 10 images, run a data preparation script, and train and apply a model.
  • Leverage the largest collection of Deep Learning models available. You can use the models in a unified, framework-independent way, so experiments are fast and cheap and it is easy to compare the performance of different models on your task.
  • Iterate fast. Active learning to improve your models continuously is a huge benefit of our platform.
  • Get a ready-to-use ecosystem. Organizing the workflow of data annotators, reviewers, data scientists and domain experts so that results are shareable and available, with an emphasis on fast iterations, usually implies building a complex front-end / back-end infrastructure; we provide it out of the box.

Above we have outlined several advantages of the platform. But the purpose of this post is to give an overview of how to move from raw data to a production application in a systematic way. To accomplish that, we introduce the major components of Supervisely (figure 2) one by one, each addressing a particular set of tasks. As we keep stacking the components, you will see that more and more tasks are covered and, hopefully, by the end we will have shown that, with Supervisely, the path from raw data to a production application is shorter than you might think.

Figure 2. Major components of Supervisely

Let’s start.

1. Labeling interface

Development of a deep learning application starts with data. Images, videos and DICOMs should be labeled in a precise and efficient way. In Supervisely you can annotate either with vector graphics or at the pixel level:

  • Vector graphics tools: polygon, rectangle, polyline, point
  • Pixel-level tools: brush, eraser, Smart Tool

Once the objects are labeled, you can attach more information to them with descriptions, key-value tags and custom data (figure 3).

Figure 3. Basic annotation tools overview

To make labeling more efficient for individual annotators, hotkeys can be defined for the majority of user actions (figure 4, left). Another way to simplify the annotation process is to customize the visualization: for instance, users have control over the brightness and contrast of an image, the opacity of annotated figures, the sizes of points and more (figure 4, right).

Figure 4. Left: hotkeys to speed up the labeling process. Right: custom visualizations

Efficient navigation through labeled data and visual inspection are a good way to identify annotation mistakes early. With the filtering features (figure 5) you can ask to show, for example:

  • Images with more than 4 cars
  • Unlabeled images
  • Images that are associated with a given tag

Figure 5. Navigation through labeled images based on custom criteria

Lastly, the labeling interface is where the work begins with creating training data; it is also the place to visualize model predictions and gain intuition and insights into the next steps to take to make the model even better.

Additional materials:

✏️ Advanced annotation tools in Deep Learning: training data for computer vision with Supervisely

What’s next

Now that we can label images, we need a structured way to store and enhance the training data.

2. Data organization

The next step is to keep images along with their annotations in a structured, well-organized way. To accomplish this, we introduce the notions of workspace, project and dataset (figure 6).

Figure 6. Workspaces, Projects and Datasets

A workspace includes data related to a particular research project or experiment. You can create several workspaces and quickly switch between them. Uploaded and generated datasets are grouped into projects, and datasets within a project share the same class definitions.

In the example above (figure 6), Workspace 2 contains data related to road signs. Inside the workspace there are two projects, San Jose Project and London Project, which contain the corresponding images and annotations. Road signs in San Jose differ slightly from the ones in London, so putting the data into separate projects makes it possible to handle the differences in object classes.

Once the data is imported, you always know how many images a given project contains, how many of those images are labeled and with what classes (figure 7, left). As you work with Supervisely more extensively, you can also monitor disk space usage and remove unnecessary objects (figure 7, right).

Figure 7. Left: basic project statistics. Right: disk space usage

Additional materials:

  1. Here is more information on how the data is organized in Supervisely

What’s next

Now it is time to scale up labeling by allowing many workers to be involved in the annotation process.

3. Team management

With the labeling interface and data organization in place, we can scale up the labeling process itself by allowing many workers to be involved in annotation.

Users are organized into teams, and within a team each user has a specific role: viewer, labeler, reviewer, developer or admin. This approach makes it possible to manage access rights (figure 8).

Figure 8. Team management and user roles diagram

Once teams, users and their roles are defined, annotation at scale is straightforward to set up.

A labeling job is a way to assign an annotation task to a user with strict constraints on which images to label, with which object classes and with which instruments (figure 9, left).

After labeling jobs are created and people start annotating, the team manager can monitor the workers' activity and always see the "big picture" (figure 9, right).

Figure 9. Left: labeling jobs to create a task for a given user. Right: activity monitoring interface

Additional materials:

  1. Teams & Labeling jobs — an example of creating a team and distributing annotation tasks between several team members via Labeling Jobs.

What’s next

The previous components provide the functionality necessary to label at scale. In the next section we introduce the Python SDK, a great way to manipulate training data and customize the platform for various use cases.

4. SDK & Python notebooks

The Python SDK is the key to automation, data manipulation and assembling custom data processing pipelines.

Jupyter Notebooks for repetitive data processing tasks and data exploration

We have made Jupyter Notebooks a part of the platform to allow interactive data manipulation. For many repetitive data processing tasks, pre-defined templates are available, so there is no need to write any code at all (figure 10).

Figure 10. Example of a Python template applied to the lemons_annotated project that outputs a training_data project with data augmentation and training / validation tagging performed

In the example above, a template for training data preparation (data augmentation and training / validation split) is executed to produce a new project that can be used to train a segmentation model.
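
What exactly such a template does is up to its author; as a rough illustration of one augmentation step, here is a minimal sketch of a horizontal flip applied consistently to an image and its segmentation mask. It is plain NumPy, independent of any Supervisely-specific API:

```python
import numpy as np

def hflip_pair(image: np.ndarray, mask: np.ndarray):
    """Horizontally flip an image and its segmentation mask together.

    Applying the same transform to both keeps pixel-level labels aligned.
    image: H x W x 3 array; mask: H x W array of class ids.
    """
    return image[:, ::-1, :].copy(), mask[:, ::-1].copy()

# Toy usage: a 2x2 image and mask.
img = np.arange(12).reshape(2, 2, 3)
msk = np.array([[0, 1], [1, 0]])
flipped_img, flipped_msk = hflip_pair(img, msk)
print(flipped_msk)  # columns swapped: [[1 0], [0 1]]
```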

Training data querying

Below are examples of data querying that you might do from time to time (the tagging case is sketched in code after the list):

  • Sampling. Annotation quality estimation, building models from a small portion of the data
  • Filtering. Keeping only the classes of interest for the current experiment
  • Merging / splitting projects. Merging publicly available datasets with in-house labeled data
  • De-noising. Dropping small irrelevant objects
  • Tagging. Automatically tagging images with train / val tags
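
For instance, the train / val tagging from the last item reduces to a reproducible random split. Here is a minimal sketch in plain Python; writing the resulting tags back (via SDK calls or a notebook template) is left out, so image_ids is just any iterable of identifiers:

```python
import random

def split_train_val(image_ids, val_fraction=0.2, seed=42):
    """Randomly assign a 'train' or 'val' tag name to every image id."""
    rng = random.Random(seed)  # fixed seed: the split is reproducible
    ids = list(image_ids)
    rng.shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return {img_id: ("val" if i < n_val else "train")
            for i, img_id in enumerate(ids)}

tags = split_train_val(range(100))
print(sum(1 for t in tags.values() if t == "val"))  # -> 20
```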

Custom statistics and visualization

Statistics of interest can be extremely task-specific. For example, you might want to know how many objects are labeled on each image in order to identify potential outliers in the training data (figure 11).

Figure 11. Custom statistics example

With the Python SDK and integrated Jupyter notebooks, any statistics and visualizations are straightforward to implement.
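
As an illustration, the statistic from figure 11 boils down to counting the objects array in every annotation file. The sketch below assumes annotations stored locally as Supervisely-format JSON files (one per image, each with a top-level objects list) under a dataset's ann/ directory; adjust the paths to your own layout:

```python
import json
from collections import Counter
from pathlib import Path

def objects_per_image(ann_dir: str) -> Counter:
    """Count labeled objects in every Supervisely-format annotation JSON."""
    counts = Counter()
    for ann_path in Path(ann_dir).glob("*.json"):
        with open(ann_path) as f:
            ann = json.load(f)
        counts[ann_path.stem] = len(ann.get("objects", []))
    return counts

counts = objects_per_image("my_project/my_dataset/ann")
# Images with unusually many objects are outlier candidates.
for name, n in counts.most_common(5):
    print(name, n)
```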

Consensus labeling scenario

It is quite common for several people to be given the task of labeling the same set of images. Images for which the manual annotations are similar are considered successfully labeled, while images annotated differently are subject to manual revision. The way to measure annotation differences is task-specific: tag matching for classification, Intersection over Union (IoU) for object detection and semantic segmentation. The Python SDK is used to build a script that compares annotations and associates a specific tag ("pending review") with images for which the annotations disagree. This can be done via the integrated Jupyter Notebook in a couple of clicks using prepared cookbooks.

Figure 12. Example of Python code to perform automatic comparison of annotations
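
The core of such a comparison script is tiny. Figure 12 shows the notebook version; below is a minimal sketch for the segmentation case, assuming both annotations have already been rasterized to boolean NumPy masks of the same shape (how you rasterize them depends on your setup):

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over Union of two boolean masks of the same shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return 1.0 if union == 0 else float(inter) / float(union)

def needs_review(mask_a, mask_b, threshold=0.9) -> bool:
    """Flag the image for manual review when annotators disagree."""
    return iou(mask_a, mask_b) < threshold

a = np.zeros((4, 4), bool); a[1:3, 1:3] = True
b = np.zeros((4, 4), bool); b[1:4, 1:4] = True
print(iou(a, b))           # 4/9 ~= 0.44
print(needs_review(a, b))  # True -> tag as "pending review"
```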

A reviewer then goes through the images automatically marked with the "pending review" tag and manually resolves the conflicts.

Figure 13. In this example "Alex" made a correct annotation, while "Dima"'s labeling lacks precision

Additional materials:

  1. Collection of Jupyter Notebooks — to automate common data processing workflows.
  2. Supervisely SDK overview — an outline of available modules and areas they cover.
  3. Working with annotation primitives with SDK — the basics of working with annotated images in Supervisely format using Python SDK.

What’s next

In the next section we further extend the range of tasks covered by the platform by introducing neural networks. As you will see, active learning, AI-assisted labeling and multi-stage processing pipelines are natural extensions of the platform functionality described above.

5. Neural networks

A unique feature of Supervisely is the largest collection of Deep Learning models for computer vision available online, which can be used for training and inference in a unified way without any coding (figure 14).

Below is a brief summary of models available:

Figure 14. Overview of Deep Learning models available

The underlying reason we can provide so many production-level and state-of-the-art models is that Supervisely is framework-agnostic.

Figure 15. Docker and the Supervisely SDK make it possible to support Deep Learning models from a variety of frameworks

We rely on Docker and the Supervisely SDK to integrate neural networks from TensorFlow, PyTorch, Keras, Darknet and other frameworks that provide a Python interface (figure 15).

We also understand that machine learning experts have already implemented tons of task-specific neural network architectures, so here are the guidelines on integrating your own model into the platform.

Once a model is chosen (either one from our library or one you have integrated yourself), it is time to leverage more training data to improve its performance.

Active learning

Once the model is chosen, it’s time to leverage the data to improve recognition quality.

Here is where Supervisely shines: since it incorporates the labeling interface, data manipulation, team management and neural networks into a single environment, implementing the active learning approach becomes straightforward.

Each time model improvement is needed, the following steps are taken:

  1. Training step. Use the existing Training Set to train a model.
  2. Evaluation step. Run the model on labeled images from the Test Set and calculate performance metrics. Run the model on unlabeled images and save its predictions.
  3. Labeling step. Identify mislabeled images, send them for manual labeling and then add the newly labeled images to the Training Set. Go to step 1.

Figure 16. Continuous model improvement via Active Learning

Steps 1–3 are repeated each time the model needs improvement (figure 16). If dozens of tools are involved in the process, the time spent is high and mistakes due to data conversions and transformations may arise. These kinds of mistakes are extremely expensive and hard to identify.
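
To make the loop concrete, here is a minimal sketch of it as plain Python. Every *_fn argument is a hook you supply yourself; in Supervisely these would be backed by SDK calls and labeling jobs, but the names here are placeholders, not real API names:

```python
def active_learning_loop(train_fn, eval_fn, select_fn, annotate_fn,
                         train_set, test_set, unlabeled,
                         target_score=0.9, max_rounds=10):
    """Run the train -> evaluate -> label cycle from figure 16.

    All *_fn arguments are caller-supplied hooks (placeholder names).
    """
    model = None
    for _ in range(max_rounds):
        model = train_fn(train_set)                    # step 1: training
        if eval_fn(model, test_set) >= target_score:   # step 2: evaluation
            break                                      # good enough, stop
        hard = select_fn(model, unlabeled)             # uncertain images
        train_set = train_set + annotate_fn(hard)      # step 3: labeling
        unlabeled = [x for x in unlabeled if x not in hard]
    return model
```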

Training data verification

Applying a model to annotated images and tagging the images for which the prediction error is high is an easy way to identify annotation mistakes in the training data. If these mistakes are systematic, identifying and correcting them early on may save months of a data scientist's time.
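
In code, this verification pass is the same IoU comparison as in the consensus example, but between the model's prediction and the existing ground truth. A minimal sketch, where predict_mask is a placeholder for your inference call:

```python
import numpy as np

def find_suspicious(images, gt_masks, predict_mask, iou_threshold=0.5):
    """List images where the model strongly disagrees with the ground truth;
    those annotations deserve a second look (e.g. tag them for review).

    images / gt_masks: dicts keyed by image name; predict_mask is a
    placeholder for a model returning a boolean mask for an image.
    """
    suspicious = []
    for name, gt in gt_masks.items():
        pred = predict_mask(images[name])
        inter = np.logical_and(gt, pred).sum()
        union = np.logical_or(gt, pred).sum()
        iou = 1.0 if union == 0 else inter / union
        if iou < iou_threshold:
            suspicious.append(name)
    return suspicious
```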

Multi-stage data processing pipelines

A common scenario in production settings is for several models to be applied sequentially, with each model relying on the results of the previous one. Figure 17 illustrates a road quality inspection pipeline in which semantic segmentation and detection models are applied one after another to identify road defects.

Figure 17. Roads quality inspection pipeline

A similar pipeline, but for a nutrition facts recognition task, is shown in figure 18. Here a semantic segmentation model is applied to find the nutrition facts label on the box. Then a rectangle is put around each line of text. Finally, a CNN-LSTM OCR model processes the image within each rectangle to recognize the text.

Figure 18. Nutrition facts recognition pipeline
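
Structurally, both pipelines are just function composition over image regions. Here is a minimal sketch of the nutrition facts case, where all three model calls are placeholders for your own inference wrappers:

```python
def read_nutrition_facts(image, segment_label, detect_text_lines, ocr_line):
    """Chain three models: segmentation -> text line detection -> OCR.

    segment_label, detect_text_lines and ocr_line are placeholders for
    your inference wrappers; each stage consumes the previous output.
    """
    label_crop = segment_label(image)           # crop of the facts label
    line_boxes = detect_text_lines(label_crop)  # one rectangle per text line
    return [ocr_line(label_crop, box) for box in line_boxes]
```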

The pipelines above are production systems that solve the task in an end-to-end manner and can be continuously improved inside the platform as more labeled data becomes available.

AI-assisted labeling

AI-assisted labeling is the key to getting more high-quality training data in a shorter period of time. There are several ways to leverage deep learning models to label data more efficiently.

1. Leverage available models for classification, object / landmark detection and semantic segmentation.

A model is applied to unlabeled images, and the automatically generated predictions are used as a starting point in the annotation process (figure 19).

Figure 19. Using a pre-trained detection model to speed up the labeling process

2. Leverage Deep Learning models that were designed to interact with the user and minimize the number of clicks required to label an object.

A great example of this approach is our Smart Tool: basically, it is a neural network trained in a class-agnostic way to produce a segmentation mask for the dominant object inside a specified rectangle (figures 20, 21).

Figure 20. Smart tool for pixel-wise annotation of car
Figure 21. Smart tool for pixel-wise food annotation
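
Conceptually, the interaction behind the Smart Tool is simple: crop the user's rectangle, run the class-agnostic network on the crop, and paste the predicted mask back into place. A minimal sketch, with predict_object_mask standing in for the actual network:

```python
import numpy as np

def smart_tool_mask(image: np.ndarray, rect, predict_object_mask):
    """Segment the dominant object inside a user-drawn rectangle.

    rect is (x0, y0, x1, y1) in pixel coordinates; predict_object_mask is
    a placeholder for the class-agnostic network, mapping a crop to a
    boolean mask of that crop.
    """
    x0, y0, x1, y1 = rect
    crop = image[y0:y1, x0:x1]
    full_mask = np.zeros(image.shape[:2], dtype=bool)
    full_mask[y0:y1, x0:x1] = predict_object_mask(crop)
    return full_mask
```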

The interactive approach above is applicable not only to semantic segmentation but to landmark detection as well. This feature is currently at the research stage and will be released in the future.

The Smart Tool is yet another neural network and can be trained the same way other Deep Learning models are trained inside the platform. So it is usually a matter of several mouse clicks to run the data preparation scripts and start the training procedure to obtain a version of the Smart Tool optimized for your specific objects and domain.

What’s next

Let’s add compute power on top of the previous components. As you will see, adding more computational resources is a matter of running a single command.

6. Computational cluster

Figure 22 illustrates the process of adding one more machine with GPUs to the computational cluster. Essentially, it is a one-step procedure: just execute the autogenerated command on your Linux machine with Docker and nvidia-docker installed. Running the command installs a Supervisely Agent (an open-sourced, tiny Python program) on the machine, after which it can be used to run training or inference tasks.

Figure 22. Attaching GPU machine to the platform

Here is an important note:

It does not really matter whether you use AWS machines, your home or office computers, or any combination of these resources. As soon as the Supervisely Agent is installed, your computational cluster is available 24/7.

What’s next

The last section is devoted to potential Enterprise customers with developed internal infrastructure, for whom custom integration is required.

7. Custom integration options

For our corporate customers we offer Supervisely Enterprise Edition. Supervisely EE was designed to run inside a private network without any access to the internet, so you can be sure that no services or data are exposed to potential threats from the web.

You can also use any S3-compatible storage if you want to keep your data in a local cloud, or attach your own custom storage backend. Your existing user management system can be integrated via OpenID or LDAP.

But even if the existing integration mechanisms do not meet your needs, you can always extend Supervisely via the low-level RESTful API, attach a custom plugin as a Docker image or use the integrated Python Notebooks.
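
As a taste of the API, here is a minimal request sketch using Python's requests library. The endpoint path and the x-api-token header follow the pattern of the public v3 API as we understand it; treat both as assumptions and verify against the API documentation for your installation:

```python
import requests

SERVER = "https://app.supervise.ly"  # or the address of your private instance
TOKEN = "YOUR_API_TOKEN"             # issued in your account settings

# Assumed endpoint and header naming; verify against the API docs.
resp = requests.post(
    f"{SERVER}/public/api/v3/projects.list",
    headers={"x-api-token": TOKEN},
    json={},
)
resp.raise_for_status()
print(resp.json())  # the response shape depends on the endpoint
```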

Figure 23. Supervisely infrastructure overview

Next we will outline the future directions of our work.

Future work

There are a number of directions we can follow to make the platform even stronger.

1. More Deep Learning models available

Even today Supervisely provides the largest collection of models for semantic segmentation, object detection and classification tasks. Ideally, every state-of-the-art neural network related to computer vision should be available on our platform. We will keep monitoring the latest research to keep the neural network library up to date, as well as to support more computer vision tasks and the corresponding Deep Learning models.

2. AI assisted labeling

There are categories of tasks for which traditional ways of labeling do not work at all. Take portrait segmentation as an example: it is practically impossible to manually create high-quality segmentation masks for objects like hair, and today probably the only way to do it is with Photoshop. But Photoshop is not designed to work with training data, so we see AI-powered photo-editing tools becoming part of the platform in the future.

3. Additional annotation primitives

In the next release we are going to include a tool for annotating images with custom graphs, an ideal instrument for landmark annotation. As deep learning methods are used more extensively in AI products, more and more complex annotation structures need to be supported.

4. Further workflow simplification

Time-to-market is critical, so we will keep simplifying common workflows to further shorten the path from raw data to production models.

Conclusions

The Supervisely platform is a good starting point for AI-powered, production-level applications in computer vision. The platform is designed to address a wide range of tasks, from data annotation to building and deploying the latest Deep Learning models.

Thanks to our complete focus on computer vision, we were able to design a platform that allows annotators, reviewers, machine learning engineers and domain experts to work together within a single web-based platform, use and customize data processing pipelines, and move fast from raw images to a production application.

We have introduced the seven major components of the platform that address, in a systematic manner, the tasks on the way to a product. We highly encourage you to try the Community Edition of Supervisely for free or speak with us about an Enterprise solution for your business.

Also, most components of Supervisely are open-sourced, so everyone is very welcome to contribute to our GitHub repo.

If you found this post interesting, then help others find it too: more people will see it if you give it some 👏.
