Using YOLO in C++

Shahriar Rezghi
6 min read · Sep 14, 2023


Introduction

I recently came across the new YOLO model and experimented with using it from the C++ programming language. YOLO (You Only Look Once) is a great model for real-time object detection, and it has been under constant development in recent years. The original YOLO paper was released on arXiv in 2015, the second version (YOLO9000) in 2016, the third in 2018, and the fourth in 2020; there is apparently no paper for the fifth version, while the sixth and the seventh versions both came out in 2022. There have been other variants of this model as well, but we will focus on YOLOv7. You can see how it compares to other models in the image below, taken from its GitHub repository.

Performance comparison of YOLOv7 vs other object detection models, taken from YOLOv7 GitHub repository

The official GitHub repository contains Python scripts that you can use to detect objects in images or video streams, train the model from scratch or fine-tune it on other datasets, and export it to inference platforms like ONNX and TensorRT. In this article, we are going to talk about using YOLOv7 from C++. C++ is a powerful and fast language, and there are many reasons to run YOLO from it: faster, real-time object detection, integrating detection into an existing C++ application, and running on embedded systems.

Using YOLO in C++

We are going to use ONNX to bring the model into our C++ code. ONNX is an open ecosystem for running neural network models developed with deep learning libraries like PyTorch and TensorFlow. We will be using ONNX Runtime, which has interfaces for several programming languages, including Python, C++, C#, Java, and JavaScript. I’m using (Arch) Linux to write these instructions; the steps on Windows will be a little different but not hard to figure out.

Step 1: Exporting

We will use the official repository to export the model to ONNX. The result of this step will be a model file with the .onnx suffix. First, we clone the YOLOv7 repository:

git clone https://github.com/WongKinYiu/yolov7.git
cd yolov7

Afterward, we need to install the Python dependencies listed in the requirements.txt file provided in the repo. The dependencies needed for exporting are commented out in that file, so we have to fix that first. Change the export section of the file to this:

# Export --------------------------------------
coremltools>=4.1 # CoreML export
onnx>=1.9.0 # ONNX export
onnx-simplifier>=0.3.6 # ONNX simplifier
# scikit-learn==0.19.2 # CoreML quantization
# tensorflow>=2.4.1 # TFLite export
# tensorflowjs>=3.9.0 # TF.js export
# openvino-dev # OpenVINO export

It is recommended to create a new virtual environment to install these dependencies:

python -m venv yolo_env
source yolo_env/bin/activate
python -m pip install -r requirements.txt

Now that we have our dependencies installed, we can export the model to ONNX:

python export.py --weights yolov7-tiny.pt --grid --end2end --simplify --include-nms \
--topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640

You can replace yolov7-tiny.pt with other models to increase accuracy (at the cost of slower runtime): yolov7.pt, yolov7x.pt, yolov7-d6.pt, yolov7-e6.pt, yolov7-e6e.pt, yolov7-w6.pt. The tiny model, yolov7.pt, and yolov7x.pt use an image size of 640, while the rest use 1280 (replace every 640 in the export command with 1280). We now have our model with the .onnx extension in the current directory.

Step 2: C++ Programming

Now that we have our model file, we can write the C++ program that uses it. Our code will depend on ONNX Runtime and OpenCV, and if you want to run the model on the GPU, you will also need the CUDA libraries. We will download the pre-built ONNX Runtime binaries and put them in /opt, but you can place them in a directory of your liking:

wget https://github.com/microsoft/onnxruntime/releases/download/v1.15.1/onnxruntime-linux-x64-gpu-1.15.1.tgz
tar -xf onnxruntime-linux-x64-gpu-1.15.1.tgz
sudo mv onnxruntime-linux-x64-gpu-1.15.1 /opt/onnxruntime

You can install the OpenCV library using your package manager (Arch Linux: sudo pacman -S opencv, Ubuntu: sudo apt install libopencv-dev). Let’s create the C++ project files:

mkdir yolo-cxx
cd yolo-cxx
touch main.cpp
touch CMakeLists.txt

We will use the CMake build system to manage our C++ project. Let’s edit the CMakeLists.txt file:

cmake_minimum_required(VERSION 3.5)
project(YOLO_CXX LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(OpenCV REQUIRED)
set(ONNX_RUNTIME_PATH "/opt/onnxruntime")

add_executable(YOLO_CXX main.cpp)
target_include_directories(YOLO_CXX PUBLIC
    "${OpenCV_INCLUDE_DIRS}"
    "${ONNX_RUNTIME_PATH}/include")
target_link_directories(YOLO_CXX PUBLIC
    "${ONNX_RUNTIME_PATH}/lib")
target_link_libraries(YOLO_CXX PUBLIC
    ${OpenCV_LIBS}
    "onnxruntime")

Now let’s edit main.cpp and write some actual C++ code. First, we have the includes, a few typedefs, and parameter definitions to make things easier:

#include <cassert>
#include <onnxruntime_cxx_api.h>
#include <opencv2/opencv.hpp>

using Array = std::vector<float>;
using Shape = std::vector<long>;

// Set use_cuda to true to run on the GPU, and point the two
// paths at the exported model and a test image.
bool use_cuda = false;
int image_size = 640;
std::string model_path = "<path/to/model.onnx>";
std::string image_path = "<path/to/image.jpg>";

We will also need the names of the classes; the pretrained models are trained on the COCO dataset, which has these 80 categories:

const char *class_names[] = {
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
"truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
"bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
"bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie",
"suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
"baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
"fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich",
"orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
"chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv",
"laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
"toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
"teddy bear", "hair drier", "toothbrush"};

Then we read the input image:

std::tuple<Array, Shape, cv::Mat> read_image(const std::string &path, int size)
{
    // Load the image and make sure it is a 3-channel BGR image.
    auto image = cv::imread(path);
    assert(!image.empty() && image.channels() == 3);
    cv::resize(image, image, {size, size});
    Shape shape = {1, image.channels(), image.rows, image.cols};
    // Convert HWC/BGR to NCHW/RGB and scale the pixels to [0, 1].
    cv::Mat nchw = cv::dnn::blobFromImage(image, 1.0, {}, {}, true) / 255.f;
    Array array(nchw.ptr<float>(), nchw.ptr<float>() + nchw.total());
    return {array, shape, image};
}
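
One caveat: cv::resize stretches the image to a square, while the official YOLOv7 preprocessing letterboxes it (resizes with the aspect ratio preserved and pads the borders with gray). The plain resize works, but boxes can be slightly less accurate on very wide or tall images. Here is a minimal letterbox sketch if you want to match the original pipeline; it pads only the right and bottom edges so the box coordinates stay anchored at the top-left corner:

// A minimal letterbox sketch: scale so the longer side fits `size`, then
// pad the right and bottom borders with gray (114) to get a square image.
cv::Mat letterbox(const cv::Mat &image, int size)
{
    float scale = size / float(std::max(image.cols, image.rows));
    cv::Mat resized;
    cv::resize(image, resized, {}, scale, scale);
    cv::copyMakeBorder(resized, resized, 0, size - resized.rows, 0, size - resized.cols,
                       cv::BORDER_CONSTANT, {114, 114, 114});
    return resized;
}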

Then we create an ONNX Runtime session:

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "YOLOv7");
Ort::SessionOptions options;
if (use_cuda) Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CUDA(options, 0));
Ort::Session session(env, model_path.c_str(), options);
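
The next function hardcodes the input and output names as "images" and "output", which is what the export script produces. If your model differs, you can query the names from the session instead; a small sketch, assuming a single input and a single output:

// Query the input/output tensor names from the loaded model
// instead of hardcoding them.
Ort::AllocatorWithDefaultOptions allocator;
auto input_name = session.GetInputNameAllocated(0, allocator);
auto output_name = session.GetOutputNameAllocated(0, allocator);
std::cout << "input: " << input_name.get() << ", output: " << output_name.get() << std::endl;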

We then pass the image tensor to the model and run inference:

std::pair<Array, Shape> process_image(Ort::Session &session, Array &array, Shape shape)
{
    // Wrap the existing CPU buffer in an ONNX Runtime tensor without copying.
    auto memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
    auto input = Ort::Value::CreateTensor<float>(
        memory_info, (float *)array.data(), array.size(), shape.data(), shape.size());

    const char *input_names[] = {"images"};
    const char *output_names[] = {"output"};
    auto output = session.Run({}, input_names, &input, 1, output_names, 1);
    // Copy the output tensor into a vector and return it with its shape.
    shape = output[0].GetTensorTypeAndShapeInfo().GetShape();
    auto ptr = output[0].GetTensorData<float>();
    return {Array(ptr, ptr + shape[0] * shape[1]), shape};
}
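
It helps to know what the output contains: the --end2end --include-nms export returns a single tensor of shape [num_detections, 7], where each row holds the batch index, the box corners, the class index, and the confidence score. Written as a struct purely for illustration (the code below just indexes the raw floats):

// Layout of one output row, for illustration only:
struct Detection
{
    float batch_id;       // index of the image in the batch (always 0 here)
    float x0, y0, x1, y1; // top-left and bottom-right corners in pixels
    float class_id;       // index into class_names
    float score;          // confidence in [0, 1]
};
static_assert(sizeof(Detection) == 7 * sizeof(float), "one row is 7 floats");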

Finally, we can display the image with the output bounding boxes:

void display_image(cv::Mat image, const Array &output, const Shape &shape)
{
    for (long i = 0; i < shape[0]; ++i)
    {
        // ptr[1..4] are the box corners, ptr[5] the class, ptr[6] the score.
        auto ptr = output.data() + i * shape[1];
        int x = ptr[1], y = ptr[2], w = ptr[3] - x, h = ptr[4] - y, c = ptr[5];
        auto color = CV_RGB(255, 255, 255);
        auto name = std::string(class_names[c]) + ":" + std::to_string(int(ptr[6] * 100)) + "%";
        cv::rectangle(image, {x, y, w, h}, color);
        cv::putText(image, name, {x, y}, cv::FONT_HERSHEY_DUPLEX, 1, color);
    }

    cv::imshow("YOLOv7 Output", image);
    cv::waitKey(0);
}
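
To tie the pieces together, a minimal main could look like this; it repeats the session setup from above and uses the parameters defined at the top of the file:

int main()
{
    // Load the image and convert it to a normalized NCHW float tensor.
    auto [array, shape, image] = read_image(image_path, image_size);

    // Create the session once; it can be reused for many images.
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "YOLOv7");
    Ort::SessionOptions options;
    if (use_cuda) Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CUDA(options, 0));
    Ort::Session session(env, model_path.c_str(), options);

    // Run inference and draw the resulting boxes.
    auto [output, output_shape] = process_image(session, array, shape);
    display_image(image, output, output_shape);
    return 0;
}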

We can see the output of the code in the image below:

Annotated output image of the C++ code with yolov7-tiny model, original image taken from YOLOv7 GitHub repository

The Library

I have also created a repository that you can use as a library in your projects. It contains a demo program that you can run to detect objects in an image or a video stream. You can clone and build the code with:

git clone git@github.com:ShahriarRezghi/yolov7-cxx.git
cd yolov7-cxx
mkdir build
cd build
cmake \
-DONNX_RUNTIME_PATH=<path/to/onnx/runtime/root> \
-DCUDA_LIBRARIES_PATH=<path/to/cuda/toolkit/root> \
..
cmake --build .

This will give you an executable file named yolov7_demo that you can run to perform object detection:

# For image detection:
./yolov7_demo -m <path/to/model.onnx> -i <path/to/image.jpg>
# Or for video stream detection:
./yolov7_demo -m <path/to/model.onnx> -v </path/to/video/device> --cuda 0

You must pass a video device like /dev/video0 when detecting from a video stream. If you want to use the project as a library, you can add it as a CMake subdirectory like this:

# Set the paths before add_subdirectory so the subproject sees them.
set(ONNX_RUNTIME_PATH "<path/to/onnx/runtime/root>")
set(CUDA_LIBRARIES_PATH "<path/to/cuda/toolkit/root>")
add_subdirectory(yolov7_cxx)

add_executable(MyTarget main.cpp)
target_link_libraries(MyTarget PUBLIC yolov7_cxx)

Please take a look at the minimal example in the repository to learn how to use the library.

Conclusion

YOLOv7 is a powerful object detection model, and this article has shown how to use it from C++. Doing so enables faster detection times and easier integration with other applications and languages. The code from this article is also available as a library that you can easily use for object detection tasks in C++.


Shahriar Rezghi

I'm Shahriar, a Machine Learning Engineer passionate about optimizing algorithms for CPUs and GPUs using C++, CUDA, and Python.