Unleashing Depth Anything v2: SOTA Monocular Depth Estimation on Intel CPU with OpenVINO and NNCF
In this article, we’ll dive into the latest advancements in monocular depth estimation, focusing on the state-of-the-art Depth Anything V2 model. We’ll walk through how to convert this model to OpenVINO, run inference with OpenVINO 2024.1, and further optimize it by quantizing to INT8 with the Neural Network Compression Framework (NNCF). By the end of this guide, you’ll be able to run the converted models for accurate and efficient monocular depth estimation on an Intel CPU.
Let’s get started! 🚀
You can find the complete code and all the utility functions used here on my GitHub.
Depth Anything V2
Depth Anything V2 is a cutting-edge model for monocular depth estimation. Developed by Yang et al. (2024), it significantly advances the accuracy and efficiency of depth estimation from a single image.
About OpenVINO
OpenVINO™ (Open Visual Inference and Neural Network Optimization) is an open-source toolkit that optimizes and accelerates AI inference. The latest version, OpenVINO 2024.1, offers enhanced support for various deep learning models, making it a perfect choice for deploying depth estimation models in real-world applications.
Python Environment
For this project, I’m using a Python virtual environment with Python 3.11.
The requirements.txt file can be found in the GitHub repo.
Download Pre-trained Weights and Inference with PyTorch
First, let’s download the pre-trained weights for Depth Anything V2 and run inference using the original PyTorch implementation.
Official weights provided by the authors
The code below shows how to build the PyTorch model, load the pre-trained weights, and run inference.
import numpy as np
import torch
from depth_anything_v2.dpt import DepthAnythingV2

import utils


class DepthAnythingV2Pytorch:
    def __init__(self, model_type="vits", device="cpu"):
        self.model_configs = {
            'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
            'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
            'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
        }
        self.device = device
        self.weights_path = f"weights/depth_anything_v2_{model_type}.pth"
        self.model = DepthAnythingV2(**self.model_configs[model_type])
        self.model.load_state_dict(torch.load(self.weights_path, map_location=device))
        # move the model to the selected device and switch to inference mode
        self.model = self.model.to(device).eval()

    def predict(self, image):
        """Depth estimation prediction method from an RGB image.

        Args:
            image (numpy): RGB image of shape (height, width, 3)
        """
        input_tensor, image_size = utils.image_preprocess(image)
        # the input must live on the same device as the model
        input_tensor = torch.from_numpy(input_tensor).to(self.device)
        with torch.no_grad():  # gradients are not needed for inference
            out = self.model(input_tensor)
        depth = utils.postprocess(out.cpu().numpy(), image_size)
        return depth


# example of how to use it
if __name__ == "__main__":
    # load model with pretrained weights - choosing the small version and using CUDA
    model = DepthAnythingV2Pytorch(model_type="vits", device="cuda")

    # download image from a URL and convert to numpy (RGB PIL image to numpy)
    image_url = "https://images.pexels.com/photos/5740792/pexels-photo-5740792.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"
    image = np.array(utils.download_image(image_url))

    # prediction
    depth = model.predict(image)

    # colorful depth map with values, check the utils module on GitHub to see the full code
The authors have published three model versions, the small, base and large. With this code we can choose which one to use, but to keep it as fast as possible, I’m going to use the small version for all analysis here.
Prediction using original Pytorch model running on my RTX3060
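The `utils.image_preprocess` and `utils.postprocess` helpers called above live in the GitHub repo and aren’t shown in this article. To give a rough idea of what such helpers have to do, here is a minimal NumPy-only sketch. It is not the code from the repo, and everything in it is an assumption made for illustration: the 518x518 target size (taken from the conversion step in this article), the ImageNet mean/std normalization, the nearest-neighbor resize (a real implementation would more likely use OpenCV or PIL interpolation), and the min-max scaling of the output.

```python
import numpy as np

# Assumed constants: 518 matches the input shape used for the OpenVINO
# conversion in this article; the ImageNet mean/std values are an assumption
# about the normalization, not copied from the repository.
INPUT_SIZE = 518
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(array, height, width):
    """Nearest-neighbor resize of an (H, W, ...) array using index lookup."""
    rows = np.arange(height) * array.shape[0] // height
    cols = np.arange(width) * array.shape[1] // width
    return array[rows][:, cols]


def image_preprocess(image):
    """RGB uint8 image (H, W, 3) -> float32 tensor (1, 3, 518, 518) plus the original size."""
    image_size = image.shape[:2]
    resized = resize_nearest(image, INPUT_SIZE, INPUT_SIZE).astype(np.float32) / 255.0
    normalized = (resized - MEAN) / STD
    # HWC -> CHW, then add the batch dimension
    input_tensor = np.ascontiguousarray(normalized.transpose(2, 0, 1)[None])
    return input_tensor, image_size


def postprocess(output, image_size):
    """Model output (1, h, w) -> depth map at the original resolution, scaled to [0, 1]."""
    depth = resize_nearest(output[0], *image_size)
    span = depth.max() - depth.min()
    return (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)
```

The important design point is that both model classes in this article share the same pre- and post-processing, so any difference between their outputs comes from the model itself and not from the data handling.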
Converting PyTorch Model to OpenVINO
from pathlib import Path

import numpy as np
import openvino as ov
import torch
from depth_anything_v2.dpt import DepthAnythingV2

import utils

##########################
# Load Pre-trained model #
##########################
model_select = "vits"
model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
}
weights_path = f"weights/depth_anything_v2_{model_select}.pth"
model = DepthAnythingV2(**model_configs[model_select]).eval()
model.load_state_dict(torch.load(weights_path, map_location='cpu'))

########################
# Get Sample RGB Image #
########################
image_url = "https://images.pexels.com/photos/5740792/pexels-photo-5740792.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"
image = np.array(utils.download_image(image_url))

########################
# Preprocess RGB Image #
########################
input_tensor, image_size = utils.image_preprocess(image)

#######################
# Convert to OpenVINO #
#######################
# e.g. weights/depth_anything_v2_vits.pth -> models_ov/depth_anything_v2_vits.xml
ov_model_path = Path("models_ov") / Path(weights_path).with_suffix(".xml").name
if not ov_model_path.exists():
    # the example input is used to trace the model; `input` fixes a static input shape
    ov_model = ov.convert_model(model, example_input=input_tensor, input=[1, 3, 518, 518])
    ov.save_model(ov_model, ov_model_path)
With a recent OpenVINO release, converting a PyTorch model is simple: load the pre-trained PyTorch model and pass it to ov.convert_model together with an example input.
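A note on the `input=[1, 3, 518, 518]` argument: it fixes the converted model to a static input shape, and 518 is not an arbitrary number. It equals 37 x 14, and the ViT encoders behind Depth Anything V2 split the image into 14x14 patches, so each spatial side should be a multiple of 14. If you want to experiment with another static resolution, a small helper like the one below can pick a valid size. The function names are hypothetical and the patch size is hard-coded as an assumption, so treat it as a sketch.

```python
PATCH_SIZE = 14  # assumed ViT patch size of the encoder


def snap_to_patch_multiple(size, patch=PATCH_SIZE):
    """Round a side length to the nearest positive multiple of the patch size."""
    return max(patch, round(size / patch) * patch)


def static_input_shape(height, width, batch=1, channels=3):
    """Build an [N, C, H, W] shape whose spatial dims are multiples of the patch size."""
    return [batch, channels, snap_to_patch_multiple(height), snap_to_patch_multiple(width)]


# The shape used in this article is already valid and stays unchanged
print(static_input_shape(518, 518))
```

Keep in mind that a different resolution changes both speed and output quality, so any new shape would need to be checked on your own images.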
Note: By default, ov.save_model compresses the model weights to FP16 (compress_to_fp16=True), so the OpenVINO Intermediate Representation (IR) saved this way stores float16 weights.
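To get a feel for what storing weights in float16 implies, the snippet below casts a random float32 matrix to float16 and measures the storage ratio and the rounding error. The matrix is a made-up stand-in for a layer’s weights, not data from the real checkpoint, so the numbers only illustrate the order of magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a layer's FP32 weights; real weights come from the checkpoint
weights_fp32 = rng.normal(0.0, 0.02, size=(384, 384)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# FP16 uses 2 bytes per value instead of 4
size_ratio = weights_fp16.nbytes / weights_fp32.nbytes
# Worst-case rounding error introduced by the cast
max_abs_error = float(np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max())

print(f"storage ratio: {size_ratio}")
print(f"max abs rounding error: {max_abs_error:.2e}")
```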
Using OpenVINO model for inference
Here’s how we can load the converted OpenVINO IR model and run inference. As before, I’m creating a class so the code is easy to reuse later.
Note that the input and output of the predict method remain the same as in the PyTorch version.
import numpy as np
import openvino as ov

import utils


class DepthAnythingV2OpenVINO:
    def __init__(self, ov_model_path="models_ov/depth_anything_v2_vits.xml", device="AUTO"):
        self.ov_model_path = ov_model_path
        self.core = ov.Core()
        # read the IR file and compile it for the selected device
        self.compiled_model = self.core.compile_model(self.ov_model_path, device)

    def predict(self, image):
        """Depth estimation prediction method from an RGB image.

        Args:
            image (numpy): RGB image of shape (height, width, 3)
        """
        input_tensor, image_size = utils.image_preprocess(image)
        # index 0 selects the first (and only) model output
        out = self.compiled_model(input_tensor)[0]
        depth = utils.postprocess(out, image_size)
        return depth


# example of how to use it
if __name__ == "__main__":
    model = DepthAnythingV2OpenVINO()

    # download image from a URL and convert to numpy (RGB PIL image to numpy)
    image_url = "https://images.pexels.com/photos/5740792/pexels-photo-5740792.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"
    image = np.array(utils.download_image(image_url))

    # prediction
    depth = model.predict(image)

    # colorful depth map with values, check the utils module on GitHub to see the full code
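One detail worth knowing about the class above: with `device="AUTO"`, OpenVINO chooses the device itself, so on a machine with a supported GPU the model may not end up on the CPU. To reproduce CPU-only numbers, pass `device="CPU"` explicitly. The sketch below shows one way to pick a device from what OpenVINO reports. `pick_device` and `detect_devices` are hypothetical helper names, and the preference order is just an example.

```python
def pick_device(available, preferred=("GPU", "NPU", "CPU")):
    """Return the first available device whose name matches a preferred prefix."""
    for prefix in preferred:
        for name in available:
            # Device names can carry an index suffix such as "GPU.0"
            if name == prefix or name.startswith(prefix + "."):
                return name
    return "AUTO"  # let OpenVINO decide when nothing matches


def detect_devices():
    """List the devices visible to OpenVINO (imported lazily so pick_device stays standalone)."""
    import openvino as ov

    return ov.Core().available_devices
```

For example, `DepthAnythingV2OpenVINO(device=pick_device(detect_devices(), preferred=("CPU",)))` would force CPU execution whenever a CPU device is reported.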
Quantization to INT8 using NNCF
To run this step you’ll need to install two more packages
pip install nncf
pip install datasets
NNCF is the Neural Network Compression Framework for enhanced OpenVINO™ inference.
The datasets package is used to download the calibration dataset that NNCF needs to quantize our OpenVINO model (currently at FP16 precision).
First, NNCF needs a few images, a calibration dataset, so that it can quantize the model while degrading the output quality as little as possible. We are going to use the dataset created for the Depth Anything V2 work, which is published in the authors’ Hugging Face collection under the name “depth-anything/DA-2K”.
import datasets  # to get the images for calibration
import nncf
import numpy as np
import openvino as ov
from tqdm import tqdm

import utils  # to use the image_preprocess function

#################################
# Creating the calibration data #
#################################
calibration_data = []
dataset = datasets.load_dataset("depth-anything/DA-2K",
                                split="train",
                                streaming=True)

# let's shuffle and take just a small portion of it
dataset = dataset.shuffle(seed=2024).take(300)
for batch in tqdm(dataset):
    # keep only the RGB channels in case the image has an alpha channel
    image = np.array(batch["image"])[..., :3]
    input_tensor, _ = utils.image_preprocess(image)
    calibration_data.append(input_tensor)

##############################
# Load the openvino ir model #
##############################
ov_model_path = "models_ov/depth_anything_v2_vits.xml"
# output path
ov_model_int8_path = "models_ov/depth_anything_v2_vits_INT8.xml"
print("[INFO] Reading input ov model ...")
core = ov.Core()
model = core.read_model(ov_model_path)
And finally, let’s run the quantization process itself:
print("[INFO] Running quantization process ...")
subset_size = 300  # number of calibration samples NNCF will use
quantized_model = nncf.quantize(
    model=model,
    subset_size=subset_size,
    model_type=nncf.ModelType.TRANSFORMER,
    calibration_dataset=nncf.Dataset(calibration_data),
)
print("[INFO] Saving quantized model at {} ...".format(ov_model_int8_path))
ov.save_model(quantized_model, ov_model_int8_path)
print("[INFO] Done!")
Note: This process takes some time and needs a fair amount of computational resources.
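If INT8 quantization is new to you, the core idea can be shown in a few lines of NumPy: float values are mapped to 8-bit integer codes through a scale factor, and mapped back with a small rounding error. The sketch below uses simple symmetric per-tensor quantization on random data. It is only an illustration of the concept and not what NNCF does internally, since NNCF derives its quantization parameters from statistics collected on the calibration dataset.

```python
import numpy as np


def quantize_int8(values):
    """Symmetric per-tensor quantization: float32 -> (int8 codes, scale)."""
    max_abs = float(np.abs(values).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float32 values."""
    return codes.astype(np.float32) * scale


rng = np.random.default_rng(0)
# Random stand-in for a tensor of activations
activations = rng.normal(0.0, 1.0, size=1000).astype(np.float32)
codes, scale = quantize_int8(activations)
restored = dequantize(codes, scale)

# The rounding error is bounded by roughly half a quantization step
max_error = float(np.abs(activations - restored).max())
print(f"scale: {scale:.5f}, max error: {max_error:.5f}")
```

This also hints at why calibration data matters: the scale depends on the range of values actually seen, so representative images lead to better-fitting ranges.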
Prediction using converted model to OpenVINO IR + Int8 Quantization
We can use the same code we already have to run inference with the OpenVINO model; we just need to provide the path to the desired INT8 model.
Below we can visually check a result from that model running on my Intel Core i7-12700H.
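A visual check is a good start, but it can be complemented with a simple number. The sketch below compares two depth maps, for example the outputs of the FP16 and INT8 models on the same image, after scaling both to [0, 1]. The metric (mean absolute difference) and the helper names are my own choice for illustration and are not part of the original code or of any benchmark protocol.

```python
import numpy as np


def normalize(depth):
    """Scale a depth map to [0, 1] so maps from different models are comparable."""
    depth = depth.astype(np.float32)
    span = depth.max() - depth.min()
    return (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)


def mean_abs_difference(reference, candidate):
    """Mean absolute difference between two normalized depth maps of equal shape."""
    if reference.shape != candidate.shape:
        raise ValueError("depth maps must have the same shape")
    return float(np.abs(normalize(reference) - normalize(candidate)).mean())
```

With the classes from this article, usage could look like `mean_abs_difference(fp16_model.predict(image), int8_model.predict(image))`, where a value close to 0 means the two models agree closely on that image.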
Findings
By converting the Depth Anything V2 model to OpenVINO and quantizing it to INT8 with NNCF, inference ran almost 3x faster using only the CPU, without much visible degradation in output quality (this is only a qualitative assessment, of course).
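If you want to measure the speedup on your own machine, a small timing helper is enough. The sketch below is a generic way to do it and not the exact script behind the number above: it runs a few warm-up predictions, times several runs, and reports the median latency. The function names, the number of runs, and the use of the median are all assumptions for illustration.

```python
import statistics
import time


def benchmark(predict_fn, sample, warmup=3, runs=20):
    """Return the median latency in milliseconds of predict_fn(sample)."""
    for _ in range(warmup):  # warm-up runs are excluded from the timing
        predict_fn(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized model is than the baseline."""
    return baseline_ms / optimized_ms
```

For example, `speedup(benchmark(baseline_model.predict, image), benchmark(int8_model.predict, image))` compares two models on the same image. Running both on the same device with nothing else loading the CPU makes the comparison fairer.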
You can find the full code and the converted models on my GitHub.
Acknowledgments
This work is heavily based on the Depth Anything notebook by the OpenVINO Toolkit team. Special thanks to Yang et al. (2024) for their groundbreaking research.
Licenses:
- Depth-Anything-V2-Small: Apache 2.0
- Other variants: CC-BY-NC-4.0
References
- https://github.com/DepthAnything/Depth-Anything-V2
- https://huggingface.co/spaces/depth-anything/Depth-Anything-V2/tree/main
- https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/depth-anything/depth-anything.ipynb
- https://github.com/heyoeyo/muggled_dpt/blob/main/.readme_assets/results_explainer.md
- https://docs.openvino.ai/2024/home.html
- https://github.com/kijai/ComfyUI-DepthAnythingV2
- https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/convert-to-openvino/convert-to-openvino.ipynb
Thanks for reading! Happy coding! 💻✨
#IntelSoftwareInnovator #openvino