How to Use YOLOv8 with a CSI Camera on a Jetson Xavier

Sep 4, 2023

It was not a smooth start. I just wanted to run YOLOv8 on my newly bought Jetson Xavier to take advantage of its powerful GPU, but booting the OS and installing the necessary software was hard work. It took me about two days to finish the installation and get the Jetson working properly.

I recommend installing jtop first. It reports the versions of the installed software, especially CUDA, as shown in the screenshot:

The profile also shows that L4T is working well. Next, use the commands below to find the CSI camera and list the formats it supports.

$ v4l2-ctl --list-devices
# NVIDIA Tegra Video Input Device (platform:tegra-camrtc-ca):
#        /dev/media0

# vi-output, imx219 10-0010 (platform:tegra-capture-vi:2):
#        /dev/video0
$ v4l2-ctl --device=/dev/video0 --list-formats-ext

The second command lists the resolutions and frame rates the camera supports. Pick one of them for the code. Here I chose the 1920x1080 mode at 30 fps as the capture configuration (the scripts below request a lower frame rate from that mode).
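
Picking a mode by eye works, but the listing can also be checked in code. The following is a minimal sketch, assuming the output of `v4l2-ctl --list-formats-ext` uses the usual `Size: Discrete WxH` and `(N fps)` lines. `SAMPLE_OUTPUT` is a hypothetical example in that layout, not the listing from my board, and `parse_modes` and `supports` are illustrative helper names.

```python
import re

# Hypothetical sample in the layout v4l2-ctl usually prints.
# It is NOT the real listing from the author's camera.
SAMPLE_OUTPUT = """
    Pixel Format: 'RG10'
        Size: Discrete 3264x2464
            Interval: Discrete 0.048s (21.000 fps)
        Size: Discrete 1920x1080
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 1280x720
            Interval: Discrete 0.017s (60.000 fps)
"""

SIZE_RE = re.compile(r"Size: Discrete (\d+)x(\d+)")
FPS_RE = re.compile(r"\(([\d.]+) fps\)")


def parse_modes(text):
    """Return a list of (width, height, fps) tuples found in the listing."""
    modes = []
    size = None  # most recent "Size:" line seen
    for line in text.splitlines():
        size_match = SIZE_RE.search(line)
        if size_match:
            size = (int(size_match.group(1)), int(size_match.group(2)))
            continue
        fps_match = FPS_RE.search(line)
        if fps_match and size is not None:
            modes.append((size[0], size[1], float(fps_match.group(1))))
    return modes


def supports(modes, width, height, fps):
    """True if some mode has this exact size and at least this frame rate."""
    return any(w == width and h == height and f >= fps for w, h, f in modes)


if __name__ == "__main__":
    modes = parse_modes(SAMPLE_OUTPUT)
    print(modes)
    print(supports(modes, 1920, 1080, 30))
```

On the Jetson, the real text could be obtained with `subprocess.run(["v4l2-ctl", "--device=/dev/video0", "--list-formats-ext"], capture_output=True, text=True).stdout` and passed to `parse_modes`.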

With some help from Google, I found documentation for the CSI camera and got it working with the following code:

import cv2


# set the config of gstreamer-pipeline
def gstreamer_pipeline(
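        capture_width=1280, # width captured from the camera
        capture_height=720, # height captured from the camera
        display_width=1280, # width of the frame handed to OpenCV
        display_height=720, # height of the frame handed to OpenCV
        framerate=60,       # capture frame rate (fps)
        flip_method=0,      # nvvidconv flip/rotation mode (0 = none)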
    ):
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )

if __name__ == "__main__":
    capture_width = 1920
    capture_height = 1080

    display_width = 640
    display_height = 640

    framerate = 10      # fps
    flip_method = 0     # flip/rotation mode (0 = none)
    # create the pipeline
    gp = gstreamer_pipeline(capture_width, capture_height, display_width, display_height, framerate, flip_method)
    # bind the video stream and pipeline
    cap = cv2.VideoCapture(gp, cv2.CAP_GSTREAMER)

    if cap.isOpened():
        window_handle = cv2.namedWindow("CSI Camera", cv2.WINDOW_AUTOSIZE)

        while cv2.getWindowProperty("CSI Camera", 0) >= 0:
            ret_val, img = cap.read()
            if not ret_val:  # stop if the camera returned no frame
                break
            cv2.imshow("CSI Camera", img)

            keyCode = cv2.waitKey(30) & 0xFF
            if keyCode == 27:# ESC to quit
                break

        cap.release()
        cv2.destroyAllWindows()
    else:
        print("Failed to open the camera")

Before running the code, verify that GStreamer is installed with this command:

$ gst-inspect-1.0 --version

If you get an error or GStreamer is not installed, the GStreamer documentation may help. After installing it, run the command again to confirm that the installation succeeded.
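
Having GStreamer on the system is only half of the check: OpenCV itself must have been built with GStreamer support, otherwise `cv2.VideoCapture(..., cv2.CAP_GSTREAMER)` cannot open the pipeline. The sketch below reads the report from `cv2.getBuildInformation()` and assumes it contains a line of the form `GStreamer: YES (version)`. The helper names `backend_enabled` and `opencv_has_gstreamer` are my own, chosen for illustration.

```python
import re


def backend_enabled(build_info, backend="GStreamer"):
    """Return True if the build report lists `backend` with a YES value."""
    pattern = r"^\s*" + re.escape(backend) + r":\s+(\S+)"
    match = re.search(pattern, build_info, flags=re.MULTILINE)
    return bool(match) and match.group(1).upper().startswith("YES")


def opencv_has_gstreamer():
    """Check the installed OpenCV; cv2 is imported only when this is called."""
    import cv2

    return backend_enabled(cv2.getBuildInformation())
```

Calling `opencv_has_gstreamer()` on the Jetson should return `True` before you try the camera script. If it returns `False`, the pipeline string is not the problem; the OpenCV build is.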

Also make sure that your torch package supports the GPU: install the wheel (.whl) package built by NVIDIA for Jetson, then install the matching version of torchvision. Check the version pairing carefully.
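
A quick way to confirm that the installed torch build can actually see the GPU is to collect a few facts in one place. This is a minimal sketch; `torch_cuda_report` is a hypothetical helper name, and it only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def torch_cuda_report():
    """Collect PyTorch/CUDA facts without crashing if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Name of the first visible GPU
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None` or `cuda_available` is `False`, the wheel is most likely a CPU-only build and YOLOv8 will run on the CPU.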

The OpenCV that comes preinstalled does not support CUDA. Making it work with CUDA means rebuilding OpenCV from source, which is another complex job. I will describe that process in a separate post one day.

Run the code above and a window showing the camera video appears. It worked well. At least that was good news!

Everything seemed OK, so I added the YOLOv8 model and the code to run detection. The code is as follows:

from ultralytics import YOLO
import cv2
import math
import torch

print(torch.cuda.is_available())

# model
model = YOLO("deploy.pt")

# object classes
classNames = [
    "person",
    "bicycle",
    "car",
    "motorbike",
    "aeroplane",
    "bus",
    "train",
    "truck",
    "boat",
    "traffic light",
    "fire hydrant",
    "stop sign",
    "parking meter",
    "bench",
    "bird",
    "cat",
    "dog",
    "horse",
    "sheep",
    "cow",
    "elephant",
    "bear",
    "zebra",
    "giraffe",
    "backpack",
    "umbrella",
    "handbag",
    "tie",
    "suitcase",
    "frisbee",
    "skis",
    "snowboard",
    "sports ball",
    "kite",
    "baseball bat",
    "baseball glove",
    "skateboard",
    "surfboard",
    "tennis racket",
    "bottle",
    "wine glass",
    "cup",
    "fork",
    "knife",
    "spoon",
    "bowl",
    "banana",
    "apple",
    "sandwich",
    "orange",
    "broccoli",
    "carrot",
    "hot dog",
    "pizza",
    "donut",
    "cake",
    "chair",
    "sofa",
    "pottedplant",
    "bed",
    "diningtable",
    "toilet",
    "tvmonitor",
    "laptop",
    "mouse",
    "remote",
    "keyboard",
    "cell phone",
    "microwave",
    "oven",
    "toaster",
    "sink",
    "refrigerator",
    "book",
    "clock",
    "vase",
    "scissors",
    "teddy bear",
    "hair drier",
    "toothbrush",
]


# build the GStreamer pipeline string for the CSI camera
def gstreamer_pipeline(
    capture_width=1280,  # captured width
    capture_height=720,  # captured height
    display_width=1280,  # display width
    display_height=720,  # display height
    framerate=10,  # captured fps
    flip_method=0,  # whether rotate
):
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )


if __name__ == "__main__":
    capture_width = 1920
    capture_height = 1080

    display_width = 640
    display_height = 640

    framerate = 5  # fps
    flip_method = 0  # direction

    # pipeline
    gp = gstreamer_pipeline(
        capture_width,
        capture_height,
        display_width,
        display_height,
        framerate,
        flip_method,
    )

    # bind pipeline and stream
    cap = cv2.VideoCapture(gp, cv2.CAP_GSTREAMER)

    if cap.isOpened():
        window_handle = cv2.namedWindow("CSI Camera", cv2.WINDOW_AUTOSIZE)

        while cv2.getWindowProperty("CSI Camera", 0) >= 0:
            ret_val, img = cap.read()
            if not ret_val:  # stop if the camera returned no frame
                break
            results = model(img, stream=True)
            for r in results:
                boxes = r.boxes

                for box in boxes:
                    # bounding box
                    x1, y1, x2, y2 = box.xyxy[0]
                    x1, y1, x2, y2 = (
                        int(x1),
                        int(y1),
                        int(x2),
                        int(y2),
                    )  # convert to int values

                    # put box in cam
                    cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
                    # class name
                    cls = int(box.cls[0])
                    print("Class name -->", classNames[cls])

                    # object details
                    org = [x1, y1]
                    font = cv2.FONT_HERSHEY_SIMPLEX
                    fontScale = 1
                    color = (255, 0, 0)
                    thickness = 2

                    cv2.putText(
                        img, classNames[cls], org, font, fontScale, color, thickness
                    )

            cv2.imshow("CSI Camera", img)
            keyCode = cv2.waitKey(30) & 0xFF
            if keyCode == 27:  # ESC to quit
                break

        cap.release()
        cv2.destroyAllWindows()
    else:
        print("Failed to open camera")

Oops, an error occurred!

It seemed that GStreamer could not find and load the plugin. After some searching, I found a similar problem and its solution on the NVIDIA forum: just set this environment variable.

$ export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvvidconv.so
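
The library path in the command above may differ between systems, so it can help to search for the file first. The sketch below is only an illustration: `find_plugin` and `export_line` are hypothetical helpers, and the search pattern assumes the plugin lives in a `gstreamer-1.0` directory one level below `/usr/lib`, as in the path above. Note that `LD_PRELOAD` is read when a process starts, so the variable has to be exported in the shell before launching Python; setting it from inside the running script is too late.

```python
import glob
import os


def find_plugin(lib_root="/usr/lib", name="libgstnvvidconv.so"):
    """Return sorted paths matching <lib_root>/*/gstreamer-1.0/<name>."""
    pattern = os.path.join(lib_root, "*", "gstreamer-1.0", name)
    return sorted(glob.glob(pattern))


def export_line(path):
    """Format the shell command that preloads the given library."""
    return f"export LD_PRELOAD={path}"


if __name__ == "__main__":
    matches = find_plugin()
    if matches:
        print(export_line(matches[0]))
    else:
        print("libgstnvvidconv.so not found under /usr/lib")
```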

If the library is at a different path on your system, adjust the path accordingly. Then run the code again; the result is shown in the following picture:

At last it worked well. Thank God!
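
One possible refinement: the detection script indexes a hand-written `classNames` list, which only matches the model if `deploy.pt` was trained with exactly those classes in that order. As far as I know, an Ultralytics model exposes its own class names as `model.names` (a mapping from class id to name) and each box carries a confidence in `box.conf`. The sketch below is a small, hypothetical helper, `format_label`, that builds the caption from either a dict or a list and falls back to the numeric id when the id is unknown.

```python
def format_label(names, cls_id, conf):
    """Build a caption such as 'person 0.87' from a class id and confidence.

    `names` may be a dict {id: name} or a list of names indexed by id.
    Unknown ids fall back to the id itself so drawing never raises.
    """
    if isinstance(names, dict):
        name = names.get(cls_id, str(cls_id))
    elif 0 <= cls_id < len(names):
        name = names[cls_id]
    else:
        name = str(cls_id)
    return f"{name} {conf:.2f}"


if __name__ == "__main__":
    print(format_label({0: "person", 2: "car"}, 2, 0.912))
    print(format_label(["person", "bicycle"], 7, 0.4))
```

Inside the detection loop this could be used as `label = format_label(model.names, int(box.cls[0]), float(box.conf[0]))`, with `label` passed to `cv2.putText` in place of `classNames[cls]`.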


Written by Dean.Du