Turning a laptop into an “IP camera”

Artem Tarasov
Jul 3, 2019

In Xi IoT, a common source of video input is an IP camera, which streams data over the network, typically using RTSP. But do you need a real one when you're just starting to set up a bare-metal edge? Not at all, as long as your laptop is connected to the same local network!

Although RTSP is an open standard, finding a reliable implementation can be tricky and frustrating. After trying out a bunch of open-source tools, I discovered a blog post documenting the journey of the RTSPATT author, who had a similar experience.

RTSPAllTheThings is a tiny wrapper (less than 1 kLOC, 10% of which is the lovely ASCII logo you see in the terminal) around gst-rtsp-server, which is built on top of GStreamer. The wrapper attempts to build the GStreamer pipeline for you, but you really, really want to learn how to use gst-launch (the official tutorials will take you quite far) to unlock its full power, or sometimes simply to access your laptop webcam.
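
If GStreamer is new to you, the quickest sanity check that the tooling is installed is the stock test pattern (nothing camera-specific yet):

$ gst-launch-1.0 videotestsrc ! autovideosink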

You can get rtspatt up and running using docker:

docker run --rm --device=/dev/video0 -p8554:8554 -e INPUT="/dev/video0" ullaakut/rtspatt

If this worked for you just like that, congratulations, you are lucky! Chances are something went wrong for one reason or another, and all you can see are obscure 5xx errors when you try to play the stream. What you can do, though, is supply the GST_PIPELINE environment variable to the container, but you'd need to build that pipeline first.
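
Before diving into the pipeline, it helps to see what the server actually returns. You can probe the published URL from another terminal (the /live.sdp path is rtspatt's default, used again further down):

$ ffprobe -rtsp_transport tcp rtsp://localhost:8554/live.sdp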

1. Find Your Camera’s Output Formats

First, you need to understand what format your camera outputs. On Ubuntu, the way to go is to install the v4l-utils package and query the device:

$ sudo apt install v4l-utils
$ v4l2-ctl -d /dev/video0 --list-formats-ext
ioctl: VIDIOC_ENUM_FMT
    Index : 0
    Type : Video Capture
    Pixel Format: 'MJPG' (compressed)
    Name : Motion-JPEG
        Size: Discrete 1280x720
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 320x180
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 320x240
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 352x288
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 424x240
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 640x360
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 640x480
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 848x480
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 960x540
            Interval: Discrete 0.033s (30.000 fps)
    Index : 1
    Type : Video Capture
    Pixel Format: 'YUYV'
    Name : YUYV 4:2:2
        Size: Discrete 1280x720
            Interval: Discrete 0.100s (10.000 fps)
        Size: Discrete 320x180
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 320x240
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 352x288
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 424x240
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 640x360
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 640x480
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 848x480
            Interval: Discrete 0.050s (20.000 fps)
        Size: Discrete 960x540
            Interval: Discrete 0.067s (15.000 fps)

One interesting fact here is that 30fps at 1280x720 is only available with MJPEG, presumably due to bandwidth limitations, so we'd prefer that format over raw pixel data (YUYV).
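
If you'd like to confirm that the driver really accepts that mode before involving GStreamer, v4l2-ctl can set the format and read it back (purely an optional check; GStreamer will negotiate this itself later):

$ v4l2-ctl -d /dev/video0 --set-fmt-video=width=1280,height=720,pixelformat=MJPG
$ v4l2-ctl -d /dev/video0 --get-fmt-video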

2. Build the Video Streaming Pipeline

Let’s start building the GStreamer pipeline. The two basic ingredients are v4l2src for capturing and autovideosink for showing output in a window.

$ gst-launch-1.0 v4l2src ! autovideosink
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Internal data stream error.
Additional debug info:
gstbasesrc.c(3055): gst_base_src_loop (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
streaming stopped, reason not-negotiated (-4)
Execution ended after 0:00:00.000188964
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

That didn't quite work, because the source and sink couldn't agree on a raw video format ("not-negotiated"), so what you need is a videoconvert element between them:

$ gst-launch-1.0 --gst-debug-level=4 v4l2src ! videoconvert ! autovideosink

This works, and I added a flag to output some debugging information, which shows that GStreamer selected raw pixel data instead of MJPEG, along with the corresponding 10fps at the highest available resolution:

0:00:00.171038081 18722 0x55f9739aa0f0 INFO                    v4l2 gstv4l2object.c:3614:gst_v4l2_object_set_format_full:<v4l2src0:src> Set capture framerate to 10/1
0:00:00.171095066 18722 0x55f9739aa0f0 INFO v4l2 gstv4l2object.c:2931:gst_v4l2_object_setup_pool:<v4l2src0:src> accessing buffers via mode 4
0:00:00.171450933 18722 0x55f9739aa0f0 INFO v4l2bufferpool gstv4l2bufferpool.c:557:gst_v4l2_buffer_pool_set_config:<v4l2src0:pool:src> increasing minimum buffers to 2
0:00:00.171491240 18722 0x55f9739aa0f0 INFO v4l2bufferpool gstv4l2bufferpool.c:570:gst_v4l2_buffer_pool_set_config:<v4l2src0:pool:src> reducing maximum buffers to 32
0:00:00.171577833 18722 0x55f9739aa0f0 INFO GST_EVENT gstevent.c:814:gst_event_new_caps: creating caps event video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)10/1, format=(string)YUY2, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:7:1, interlace-mode=(string)progressive
0:00:00.172427918 18722 0x55f9739aa0f0 INFO GST_EVENT gstevent.c:814:gst_event_new_caps: creating caps event video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)10/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)YV12

Not exactly what we wanted. To specify our preferences, we need a caps filter followed by a JPEG decoder that turns those JPEG frames back into raw video:

gst-launch-1.0 v4l2src ! image/jpeg ! jpegdec ! autovideosink

You can specify the format more precisely by adding parameters like this (they should match one of the modes from the v4l2-ctl output above):

image/jpeg,width=640,height=360,framerate=30/1
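
Plugged into the full command, that caps string gives a complete preview pipeline (a sketch; swap in any mode your camera listed):

gst-launch-1.0 v4l2src ! image/jpeg,width=640,height=360,framerate=30/1 ! jpegdec ! autovideosink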

Now we have a decent-resolution raw 30fps video stream, which we can encode to H.264 and wrap into RTP packets, just like the RTSPATT code does in pipeline.cpp. This pipeline won't run under gst-launch anymore, since the rtph264pay payloader isn't connected to any sink; gst-rtsp-server attaches it to its clients, which is why it works as the GST_PIPELINE argument:

docker run --rm --device=/dev/video0 -p8554:8554 -e INPUT="/dev/video0" -e GST_PIPELINE="v4l2src ! image/jpeg ! jpegdec ! x264enc tune=zerolatency ! rtph264pay name=pay0 pt=96" ullaakut/rtspatt

You should be able to see the output using ffplay:

ffplay -rtsp_transport tcp rtsp://localhost:8554/live.sdp

If you want to stream over the network, you should add the RTSP_USERNAME/RTSP_PASSWORD environment variables and use host networking when launching the container, or just run the executable without Docker.
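
For instance, a variant with credentials and host networking might look like this (a sketch; the username and password values are placeholders, and the pipeline is the same one built above):

docker run --rm --device=/dev/video0 --network host -e INPUT="/dev/video0" -e RTSP_USERNAME=viewer -e RTSP_PASSWORD=secret -e GST_PIPELINE="v4l2src ! image/jpeg ! jpegdec ! x264enc tune=zerolatency ! rtph264pay name=pay0 pt=96" ullaakut/rtspatt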

Enjoy your home-brewed “IP camera”!
