A Guide to Making Suitable Video Footage for Video-Analytics AI
Automated traffic data collection from camera footage is a game-changer: it cuts costs and fuels the majority of processes in the traffic management space. At GoodVision we like to say: “Just upload the video and you’re good to go.” But how can you make sure that your videos have sufficient quality to be processed by video-analytics software? Is there a way to maximize the quality of your results? This guide aims to help you once and for all.
The following guidelines apply to outdoor camera footage. Four main quality aspects of the footage should be considered before providing it to a system like GoodVision Video Insights in order to achieve the highest possible accuracy:
- Camera view: position, zoom and tilt
- Lens quality and light conditions
- Video resolution
- Video framerate
There is one general rule of thumb with video-analytics AI: “If your eyes don’t see it, don’t expect any computer system to see it”.
1. Camera view
It always helps if you can adjust the position, tilt or zoom of your camera when the original view is poor. Below are four crucial aspects that affect the overall readability of the scene:
A) Distance from monitored objects
To achieve the maximum guaranteed accuracy, make sure the objects to be monitored measure at least 5% of the scene size. For example, a vehicle should be around 60 pixels long in a 1280 × 720 px scene. Smaller objects may be harder to detect in some cases (this also depends on other factors such as lens quality and blur). Also make sure that a single object does not cover a substantial portion of the scene (around 33%), or it will be treated as a false detection. How to solve this? Zoom or move.
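The size rule above can be turned into a quick sanity check. The sketch below is only an illustration, not GoodVision's actual logic: it assumes that the 5% minimum refers to the longer side of the object relative to the frame width (which matches the 60-pixel example) and that the 33% maximum refers to the share of the frame area. The function name and the box measurements are hypothetical.

```python
def check_object_size(box_w, box_h, frame_w=1280, frame_h=720,
                      min_length_ratio=0.05, max_area_ratio=0.33):
    """Classify an object's bounding box (in pixels) against the size rule.

    Assumptions (not stated in the guide): the minimum is measured as the
    longer box side relative to the frame width, the maximum as box area
    relative to frame area.
    """
    length_ratio = max(box_w, box_h) / frame_w
    area_ratio = (box_w * box_h) / (frame_w * frame_h)
    if length_ratio < min_length_ratio:
        return "too_small"   # zoom in or move the camera closer
    if area_ratio > max_area_ratio:
        return "too_large"   # zoom out or move the camera away
    return "ok"


if __name__ == "__main__":
    # Hypothetical vehicle boxes measured by hand in a 1280 x 720 frame.
    for box in [(30, 15), (70, 35), (900, 500)]:
        print(box, check_object_size(*box))
```

Measuring a few typical vehicles in a still frame and running them through a check like this is usually enough to decide whether to zoom or move the camera.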
Below are the GOOD and the BAD examples with short explanation:
B) Camera height
Let’s assume your camera is now at a perfect distance. What about the height of the camera installation? Honestly, camera height mainly affects your visual experience, not the system’s detection ability. A problem arises when the camera is positioned too low, so that objects in the front cover the objects behind them, which are then not detected, not tracked → not collected.
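The occlusion effect described above follows from simple geometry. As a rough sketch, assume flat ground and a camera looking along the road: a line of sight that grazes the top of a vehicle of height v at distance d from a camera mounted at height H reaches the ground at H·d / (H − v), so the stretch of road hidden behind the vehicle is d·v / (H − v). The function name and the example numbers below are illustrative assumptions, not values from GoodVision.

```python
import math


def hidden_ground_length(camera_height_m, object_height_m, object_distance_m):
    """Length of ground (in meters) hidden behind an object, by similar triangles.

    Assumes flat ground and measures distances horizontally from the camera.
    """
    if camera_height_m <= object_height_m:
        # The camera cannot see over the object at all.
        return math.inf
    return object_distance_m * object_height_m / (camera_height_m - object_height_m)


if __name__ == "__main__":
    # A 1.5 m tall car standing 20 m from the camera (illustrative numbers).
    for height in (3.0, 10.0):
        hidden = hidden_ground_length(height, 1.5, 20.0)
        print(f"camera at {height} m hides {hidden:.1f} m of road behind the car")
```

With these numbers, a camera at 3 m hides 20 m of road behind the car, while a camera at 10 m hides only about 3.5 m, which is why a higher mount keeps more objects visible.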
This issue is really simple to fix, and fixing it gives great results. If your camera is equipped with a standard 2.8 mm lens, place it between 5 and 25 meters above the ground (depending on how broad a space you want to monitor). Below are the GOOD and the BAD examples with short explanation:
C) Camera tilt
Long story short: if the previous two conditions are fulfilled, camera tilt does not affect the detection ability of a system like GoodVision’s, as it is trained for a wide variety of conditions. You can tilt the camera anywhere from horizontal to a straight-down bird’s-eye view if needed.
D) Obstacles
Obstacles are tricky. Some of them seriously affect the detection ability (trees, other cars, buildings, bridges, big traffic signs, ...) while others might not (thin poles, standard traffic signs, wires, ...). The system loses an object while it is covered by an obstacle, and when the object appears again it is often treated as a new object, which splits the object’s trajectory. Below are the GOOD (not ideal but suitable) and the BAD examples with short explanation:
2. Lens quality and light conditions
The following lens defects quickly reduce the usability of the scene:
- Poor, dirty, scratched or smudged lens: causes a blurry image, removes object contours or deforms them
- Raindrops on the lens: distort the image or cause light reflections, acting like a physical obstacle
- Barrel distortion: deforms and bends the objects, so that they look nothing like what the system was trained on
- Frontal light causing flare or reflection: covers the image and decreases object clarity
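Blur from a poor or dirty lens can be estimated before uploading footage. One widely used heuristic is the variance of the Laplacian: sharp contours produce strong second-derivative responses, while a blurry frame produces weak ones. The sketch below is a plain-Python illustration of that heuristic on a grayscale frame given as a list of rows. It is not something the guide says GoodVision uses, and any threshold on the score would have to be calibrated on your own footage.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray: list of rows, each a list of brightness values (e.g. 0 to 255).
    Higher values indicate crisper contours; values near zero indicate blur.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("frame must be at least 3 x 3 pixels")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


if __name__ == "__main__":
    # Synthetic 6 x 6 frames: a hard edge versus a smooth brightness ramp.
    sharp = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
    soft = [[0, 51, 102, 153, 204, 255] for _ in range(6)]
    print("sharp edge:", laplacian_variance(sharp))
    print("smooth ramp:", laplacian_variance(soft))
```

Comparing the score of a suspicious camera against a known-good camera viewing a similar scene is more informative than any absolute number.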
Scene lighting plays an important role in video analytics as well; however, modern systems are trained to recognize objects even in the dark. The only condition is that the objects must be illuminated at least a little, so that they are visible in the image to the naked eye.
3. Video resolution
The equation here is simple: “SHIT IN → SHIT OUT” (sorry for the language). The more image data (pixels) you provide to the system, the better it recognizes the objects in it. GoodVision Video Insights is trained to deliver the best performance at 1280 × 720 px and 1920 × 1080 px (Full HD) resolutions. These are considered the ideal resolutions.
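A resolution check is easy to automate before uploading. The minimal sketch below encodes the two ideal resolutions named above and the VGA lower bound (640 × 480) mentioned in this section. The labels are hypothetical, and treating every other resolution at or above VGA as "supported" is an assumption, since the guide does not discuss such resolutions individually.

```python
# Resolutions the guide calls ideal.
IDEAL_RESOLUTIONS = {(1280, 720), (1920, 1080)}
# VGA, the lowest resolution the guide says can be handled.
VGA = (640, 480)


def classify_resolution(width, height):
    """Label a video resolution as 'ideal', 'supported' or 'too_low'."""
    if (width, height) in IDEAL_RESOLUTIONS:
        return "ideal"
    # Assumption: anything at least as large as VGA in both dimensions
    # is processable, though not necessarily with the best accuracy.
    if width >= VGA[0] and height >= VGA[1]:
        return "supported"
    return "too_low"


if __name__ == "__main__":
    for res in [(1920, 1080), (1280, 720), (640, 480), (352, 288)]:
        print(res, classify_resolution(*res))
```

Note that a resolution passing this check says nothing about optics or bitrate, so the footage should still be inspected visually.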
In general, GoodVision can handle lower resolutions all the way down to VGA. However, lower resolutions go hand in hand with low-quality optics and low bitrate, so object contours are blurry rather than crisp, or do not resemble the real-world object. Set a resolution that displays the objects’ contours clearly. These are the BAD examples of unsuitable low resolutions:
4. Video framerate (FPS) and shutter speed
The framerate of the video defines how fluent the objects’ motion is and greatly affects the tracking ability of the video-analytics system. Tracking means preserving the identity of an object across the frames in which it was detected. Tracking is crucial for obtaining solid object trajectories, e.g. for origin-destination counting of traffic. Moreover, the faster the objects in the video move, the more a low FPS hurts their tracking.
The ideal FPS for video analytics, which works well in most scenarios, is between 10 and 30 frames per second. Higher is better, but going above 30 FPS has no visible impact on tracking quality. Framerates below 10 FPS cause tracking problems, especially in crowded scenes and with fast-moving objects, which literally “jump” from place to place across the scene.
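The “jumping” effect is easy to quantify: the distance an object travels between two consecutive frames is its speed divided by the framerate. The sketch below is a simple illustration with hypothetical function names and labels; the 10 and 30 FPS bounds come from the text above, while the 50 km/h example speed is an assumption.

```python
def classify_fps(fps, low=10, high=30):
    """Label a framerate against the 10 to 30 FPS range given in the text."""
    if fps < low:
        return "too_low"
    if fps > high:
        return "no_extra_benefit"
    return "recommended"


def displacement_per_frame_m(speed_kmh, fps):
    """Distance in meters an object travels between two consecutive frames."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return speed_ms / fps


if __name__ == "__main__":
    for fps in (5, 10, 25):
        step = displacement_per_frame_m(50, fps)
        print(f"{fps} FPS: {classify_fps(fps)}, {step:.2f} m per frame at 50 km/h")
```

At 50 km/h a vehicle moves about 2.8 m between frames at 5 FPS but only about 0.56 m at 25 FPS, so at low framerates the same vehicle may appear more than half a car length away from its previous position.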
Camera shutter speed affects the clarity of moving objects’ contours, especially in low-light conditions and close to the camera. Some cameras switch to slower shutter speeds (longer exposure times) in order to keep the same overall brightness of the scene at night. Try to avoid this and preserve the clarity of the objects instead. Modern video-analytics systems are trained to recognize objects in the dark, but if the objects are smudged and lack contours, they are very hard to detect.
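The smearing caused by a slow shutter can be estimated as the distance the object travels while the shutter is open, converted to pixels. The sketch below is a rough approximation that assumes motion across the image at constant speed; the function name and the scale of 14 pixels per meter (roughly a 4.5 m car spanning 64 pixels) are illustrative assumptions, not values from the guide.

```python
def motion_blur_px(speed_kmh, exposure_s, pixels_per_meter):
    """Approximate motion blur length in pixels for one exposure.

    Assumes the object moves across the image plane at constant speed and
    that pixels_per_meter describes the image scale at the object's position.
    """
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return speed_ms * exposure_s * pixels_per_meter


if __name__ == "__main__":
    # Hypothetical scene scale: 14 px per meter; vehicle at 50 km/h.
    for exposure in (1 / 30, 1 / 500):
        blur = motion_blur_px(50, exposure, 14)
        print(f"exposure {exposure:.4f} s: about {blur:.1f} px of blur")
```

Under these assumptions, a 1/30 s exposure smears the vehicle over roughly 6.5 pixels, while 1/500 s keeps the blur under half a pixel. The effect grows for objects closer to the camera, where the scale in pixels per meter is larger.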
And if everything goes well …
As you can see, it is actually easy to prepare suitable video footage for modern video analytics. All of the described conditions are reasonable. To summarize it all, I would say: “What is not visible cannot be seen.” For example, if the conditions above are fulfilled, you can expect close to 100% traffic data collection accuracy from GoodVision Video Insights. So it is also in your hands: modern technology is here for everyone, accessible as never before, so use it to the last drop.
And if everything goes well, your system will reward you with amazing performance like this:
Daniel Stofan is a co-founder and CEO of GoodVision, a company devoted to innovating the ways traffic data is collected. Our product, GoodVision Video Insights, is an autonomous traffic data collection cloud service that provides highly reliable traffic data from standard surveillance cameras via Artificial Intelligence. It provides advanced data analytics, data visualization and managerial reporting, all in a single platform covering the whole process from data collection to decision making, with 1-hour delivery of results.