Midas: A Machine Learning Model for Depth Estimation

David Cochard
3 min read · May 28, 2021

This is an introduction to "Midas", a machine learning model that can be used with ailia SDK. You can easily use this model, along with many other ready-to-use ailia MODELS, to create AI applications with ailia SDK.


Midas is a machine learning model that estimates depth from an arbitrary input image.

Source: https://arxiv.org/pdf/1907.01341v3.pdf


Datasets containing depth information are not mutually compatible in terms of scale and bias, because they are captured with different measuring tools, including stereo cameras, laser scanners, and structured light sensors. Midas introduces a new loss function that is invariant to these differences, which eliminates the compatibility issues and allows multiple datasets to be used for training simultaneously.
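To make the idea concrete, here is a minimal sketch of a loss that ignores scale and bias, assuming a simple least-squares alignment between two depth maps. The function names (`align_scale_shift`, `ssi_mse`) and the two toy "datasets" are hypothetical and only for illustration; this is not the training code of Midas, which also uses a more robust variant and additional terms described in the paper.

```python
import numpy as np

def align_scale_shift(pred, target):
    """Return the scale s and shift t minimizing sum((s * pred + t - target)^2)."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    # Each row of A is [pred_i, 1], so A @ [s, t] = s * pred_i + t.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(s), float(t)

def ssi_mse(pred, target):
    """Mean squared error (halved) after aligning pred to target in scale and shift."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    s, t = align_scale_shift(pred, target)
    residual = s * pred + t - target
    return float(np.mean(residual ** 2) / 2.0)

# The same toy scene, stored by two hypothetical datasets with different scale and bias.
rng = np.random.default_rng(0)
scene = rng.uniform(0.1, 1.0, size=(4, 4))
dataset_a = 2.0 * scene + 0.5
dataset_b = 10.0 * scene - 3.0

plain_mse = float(np.mean((dataset_a - dataset_b) ** 2))  # penalizes the unit mismatch
aligned = ssi_mse(dataset_a, dataset_b)                   # mismatch is absorbed
print(plain_mse, aligned)
```

A plain mean squared error treats the two encodings as very different, while the aligned loss is numerically zero because both describe the same scene up to scale and bias.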

Midas uses multiple datasets for training, as shown in the table below. Therefore, it can estimate the depth of images in various conditions and environments.

Source: https://arxiv.org/pdf/1907.01341v3.pdf

In addition, 3D movies were used for training to complement the existing datasets.

Source: https://arxiv.org/pdf/1907.01341v3.pdf

Below is the loss function introduced by Midas.

Source: https://arxiv.org/pdf/1907.01341v3.pdf
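As a rough guide to the core of this loss, the least-squares form of the scale- and shift-invariant term can be written as below. This is a sketch whose notation may differ slightly from the cited paper: M is the number of valid pixels, d the prediction, d* the ground truth, s and t the aligning scale and shift, and ρ a per-pixel penalty such as ρ(x) = x². The paper also describes a more robust variant and a gradient-matching regularizer, which are not shown here.

```latex
\mathcal{L}_{ssi}\left(\hat{\mathbf{d}}, \hat{\mathbf{d}}^{*}\right)
  = \frac{1}{2M} \sum_{i=1}^{M} \rho\left(\hat{d}_i - \hat{d}_i^{*}\right),
\qquad
(s, t) = \arg\min_{s,\,t} \sum_{i=1}^{M} \left(s\, d_i + t - d_i^{*}\right)^{2},
\qquad
\hat{\mathbf{d}} = s\,\mathbf{d} + t, \quad \hat{\mathbf{d}}^{*} = \mathbf{d}^{*}
```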

The architecture of the network is based on ResNet.

Source: https://arxiv.org/pdf/1907.01341v3.pdf


You can use the following command to run Midas on a webcam video stream with ailia SDK.

$ python3 midas.py -v 0

You can also choose the higher-precision v2.1 model or the faster v2.1 small model. The small model runs five times faster than the regular model and enables real-time processing.

$ python3 midas.py -v 0 -v21
$ python3 midas.py -v 0 -v21 -t small
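A model trained with a scale- and shift-invariant loss produces relative values rather than metric depth, so the raw output generally needs rescaling before it can be shown as an image. The following is a minimal sketch of such a step, assuming the model output is a 2-D float array. The function name `depth_to_uint8` and the sample array are hypothetical, and this is not taken from midas.py.

```python
import numpy as np

def depth_to_uint8(depth):
    """Min-max normalize a relative depth map to the 0..255 range for display."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Flat map: there is no range to stretch, so return a black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return (255.0 * scaled).round().astype(np.uint8)

# Stand-in for a model output (assumption: any 2-D float array).
fake_output = np.array([[0.2, 0.4], [0.6, 1.0]], dtype=np.float32)
image = depth_to_uint8(fake_output)
print(image)
```

The resulting 8-bit array can be passed to an image viewer or a color map. In the cited paper, predictions are made in disparity (inverse depth) space, so larger values generally correspond to closer surfaces.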

Here are some results.


ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services, from consulting and model creation to the development of AI-based applications and SDKs. Feel free to contact us with any inquiries.


