Vision AI for Intelligent Transportation (ITS): What’s Your Edge-Cloud (SaaS) Strategy?

--

I would like to start this discussion with a short story. A few weeks ago, I wrote RTFM on one of my slides, and one of my team members asked me what it means. So I said, you know, RTFM?… A few other team members overheard us, and they were also clueless about the meaning of the acronym. This made me wonder. Years ago it was one of the most common acronyms in the tech world. Every time you bought a new DVD player and tried to install it without reading the manual first (which obviously didn’t work), people would shout at you: “hey, RTFM”, i.e. read the fuxxing manual, or… read the factory manual, read the field manual… Some people believe that this acronym dates back to WW2.

So how come nobody uses this acronym anymore…?

Well, perhaps people are just a lot nicer (it’s quite an aggressive acronym, isn’t it?), or more likely, it reflects a significant improvement in the quality of products on the market.

So, who needs a manual??
(say it in your head like Steve Jobs said: “Who needs a stylus…”).

You have to download manuals (from where??) or print them, find the right version for the right model for the right topic 😰, read them, and follow a long list of steps…

Yuck. Nobody wants a manual. Users want and expect great UX, one that doesn’t require a manual.

How is this related to Vision AI for Transport?

Lately, I’ve had the opportunity to attend numerous global conferences focused on intelligent transportation systems (ITS). It’s truly captivating to witness the abundance of vendors offering cutting-edge intelligent cameras that are capable of running AI algorithms directly on the device. The demonstrations are impressive, showcasing vibrant bounding boxes and dynamic trajectories that seem to materialize around vehicles and pedestrians, tracking their movements with an almost ethereal quality.

Object detection on the edge

However, when you finally decide to give them a try, you encounter a different reality. Countless manuals and lengthy PDF documents filled with outdated screenshots resembling the user interfaces of Windows from the 90s. It becomes a tedious process of setting up, installing various components, navigating through multiple menus, configuring settings, and following a seemingly endless series of steps. And what do you ultimately find? The performance is not great, and you end up using just the basic functionality of the camera, not leveraging any of those shiny and advanced features.

Video analytics manual (120 pages), one of ~100 PDFs covering installation, settings, and calibration…

What does this tell us…

  1. Video analytics is hard and complex. Even though the technology exists, it’s not usable out of the box and still requires manual calibrations, tuning, and masking. Using it effectively is almost impossible without professional expertise.
  2. Working with our road operators around the world, we quickly realized that the majority of them lack the time and resources to navigate through extensive manuals and acquire the necessary skills, expertise, and patience to fully harness the capabilities of these advanced cameras.
  3. Let’s face it, the core business of CCTV camera vendors is security. When you read the manuals, you soon realize that transport is a side business; the tools are there, but the whole solution and functionality are focused on security, law enforcement, and surveillance.

Vision AI for Transport: Introduction

Computer vision has revolutionized the field of intelligent transport systems (ITS) by providing valuable data insights for various use cases. Let’s explore some key applications of vision AI in ITS:

  1. Event Detection: Real-time detection of hazards, stopped vehicles, wrong-way driving, and pedestrians/animals on the road. Detecting these events quickly enables a fast response and appropriate measures, minimizing potential accidents or disruptions.
  2. Risk Prediction: Identifying near misses, such as harsh braking and anti-social driving behavior, which can be used as early warning indicators.
  3. Weather Data: Monitoring weather conditions like fog, rain, wind, and ice/snow for better road management.
  4. Queue Length and Junction Optimization: Estimating queue lengths and optimizing traffic signal timings for improved traffic flow.
  5. Protecting Work Zones: Monitoring work zones and detecting safety violations or unauthorized intrusion.
  6. License Plate Recognition: Reading and interpreting license plate information for toll collection, parking management, and law enforcement.
  7. Parking Lot Occupancy Detection: Real-time detection of available or occupied parking spaces.
  8. Traffic Flow Analysis: Analyzing vehicle movement, speed, and classification for traffic planning and congestion reduction.
  9. Security, Law Enforcement, and Vandalism: Enhancing security, law enforcement, and tackling vandalism through automated detection and alert systems.
  10. GIS (Geographic Information System) Integration: Locating and assessing assets such as signs, road markings, potholes, and crash fences for efficient maintenance and safer roads.

Types of CCTV cameras

  • PTZ (Pan, Tilt, Zoom) cameras: While typically not used for AI applications, they are the most commonly used CCTV cameras in transport systems.
  • Static (fixed) day/night cameras: Mostly used for AI, as they are easier to tune, calibrate, and mask.
  • Thermal cameras: Particularly useful in complete darkness and in specialized use cases where measuring body temperature is important.
  • Hybrid thermal + standard: Adds body temperature to detected objects.
  • Hybrid radar + standard: Adds accurate speed measurements to detected objects.
  • ANPR cameras: Fast shutter speed plus IR illumination; can read the license plate of a fast-moving car day or night. Commonly used in applications such as toll collection, parking management, and law enforcement.
  • 360° hemispheric cameras: Provide a wide field of view.

Vision AI for Transport: Edge-Cloud (SaaS) Strategy

Technology strategy is hard, confusing, and paralyzing. There are many trends, sometimes conflicting; many vendors who knock on your door and try to convince you, with high-quality videos and slides, that they know where the industry is going; and then you have your own team members, who have their own personal agendas. But the strategy discussion is important, and now is the time to stop and look at the trends, landscape, and alternatives; otherwise, your existing inertia might bring you to the edge of a cliff.

So how do you decide where to invest: edge or cloud?

Transport Operators Needs and Expectations

What do transport operators need and expect to get from AI?

  • Full coverage with no blind spots
  • Affordable TCO: Cameras + AI + Service + Support over the years
  • Accuracy: minimal missed detections, false alarms, and low error rates
  • Automatic: Accuracy should be good enough to initiate an automatic response plan
  • Locations: Road operators are not interested in the location of events in image space; they want to know where events are on the map
  • Low-touch installation and usage: No manuals, great UX
  • Innovation: They don’t want to stay behind
  • Low latency: most use cases can tolerate 30 seconds, but some do need a latency of 1–2 seconds
  • Standards and compliance
  • Privacy and security

Transport Operator's Vision AI Reality

What can transport operators get today from both edge and cloud?

  • Limited coverage, as edge AI typically works only on static cameras.
    Cloud AI solutions like Valerann support PTZ cameras, which increases the coverage of AI analytics.
  • Expensive TCO: Both edge and cloud are expensive, with no big difference over a lifetime of 5 years.
  • Accuracy: The accuracy of computer vision may be OK in good conditions, but it deteriorates significantly in bad weather or at nighttime, which are exactly the conditions when you need it the most. Cloud vision AI is generally more accurate than edge, as it’s easier to innovate, customize, and allocate more resources in the cloud.
  • Automatic: As accuracy is limited and noisy, you can’t initiate an automatic response based on vision AI alone.
  • Low-touch installation and usage: Configuring AI on the edge is really hard; SaaS solutions actually make integration and usage much easier.
  • Standards and compliance: Unlike video compression (H.264, MPEG) and IP camera access (ONVIF), there is still no standard for vision AI.
  • Privacy, security, and latency: Both solutions meet the needs, with some advantages to edge devices. There are a few ways to address privacy in the cloud: blurring out privacy-related data (registration numbers, people’s faces), reducing the resolution, or stopping the feed at high zoom levels (PTZ scenario only).
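To make the first of those privacy measures concrete, here is a toy sketch of redacting a detected region (a plate or a face) before video leaves the operator’s network. The frame format, a 2D list of grayscale values, is purely illustrative; a real pipeline would use OpenCV or similar on actual frames.

```python
def pixelate_region(frame, top, left, height, width, block=4):
    """Redact a rectangular region by replacing each block x block
    tile with its average value. `frame` is a 2D list of grayscale
    pixel values (toy format for illustration)."""
    for by in range(top, top + height, block):
        for bx in range(left, left + width, block):
            tile = [frame[y][x]
                    for y in range(by, min(by + block, top + height))
                    for x in range(bx, min(bx + block, left + width))]
            avg = sum(tile) // len(tile)
            for y in range(by, min(by + block, top + height)):
                for x in range(bx, min(bx + block, left + width)):
                    frame[y][x] = avg
    return frame
```

Because each tile is collapsed to its average, the redaction is irreversible in the exported stream, which is the point of doing it before upload.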

Vision AI Related trends

What are the related trends (Technology, Social, Economic)?

  • More technology is shifting to the cloud, and there are many good reasons for this; generally, cloud technology and SaaS solutions are just a lot better from almost any perspective. This is a trend that cuts across all industries without exception, and for vision AI in particular, the benefits of SaaS and cloud over edge devices are significant.
  • Road operators and departments of transportation (DOTs) will face challenges in recruiting and retaining internal technical expertise, certainly in areas such as AI.
  • Other sources of visual data are around the corner: connected vehicles, drones, satellites.
  • The balance between data sources that are under your control and on-premise (1st party) and data sources that are generated by other companies and come from the cloud (3rd party) is shifting towards the cloud.
  • Other sensor types, like lidar, are entering the market; they can work at nighttime and in bad weather conditions, and can also accurately measure speed.
Lidar image by Innoviz to prevent bridge collision
  • Network bandwidth will continue to increase thanks to technologies such as 5G and better, redundant broadband connectivity.
  • You can’t rely on one data source. That’s the core idea of digital twins and complex systems. Each data source has its benefits but also challenges. A good and robust system must intelligently fuse many types of data sources together: vision, CAV, sensors, social media.
  • There is a growing concern about the negative consequences of AI and surveillance, related to privacy, but also to controlling field devices like traffic lights, drones, etc.

Vision AI in transport: Porter’s five forces analysis

Vision AI Possible Scenarios

It’s hard to predict the future of vision AI in the transport industry. On one hand, existing CCTV vendors put up a fight and try to convince you to spend more on AI on the edge; on the other hand, other sensors like lidar enter the market and provide more robustness and accuracy compared to standard cameras. Both solutions are expensive and don’t give you the additional coverage that you need. Additionally, this data needs to go somewhere, as vision AI is not the only source of data out there, so you need a strategy that brings all the data together into what some companies call a digital twin or ATMS (Advanced Traffic Management System). If you are going to run your digital twin in the cloud anyway, why not run vision AI there as well?

With all these forces coming together, I believe a possible scenario would be a hybrid one, like the one below.

Hybrid Edge-Cloud Vision AI with Digital Twin in the cloud

One trend that is just starting, and has similar concepts to the IoT world, is CCTV cameras as a service: CCTV cameras that include a cellular (4G/5G) connection owned by the CCTV vendor, transmitting the video to the vendor’s cloud, which then provides the live video as a service. I am sure that this trend will eventually become more and more common.

Comparing Edge and Cloud: Technical Details

Pricing

A camera that supports AI is more expensive than a standard CCTV camera. If you just buy one camera, the difference is insignificant, but normally you buy hundreds, so the numbers add up. Let’s imagine that an AI camera costs $4,000 (including all license fees, etc.) while a standard CCTV camera costs $500, and let’s say that it costs $100 per month per camera to run AI processing in the cloud (**not real numbers, don’t use these as a reference for any decision). Ignoring any other costs or optimizations, after about 3 years the cloud becomes more expensive than the edge.

If, for instance, you don’t need to run any vision AI at nighttime (or daytime), you can scale down the services in the cloud and reduce the cost to $50/month, which pushes the break-even point to about 6 years. This simple example shows one of the key benefits of the cloud: elasticity plus usage-based pricing. But these are not the only benefits that cloud computing provides.
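The break-even arithmetic above can be sketched in a few lines, using the article’s illustrative (explicitly not real) numbers:

```python
import math

def break_even_months(edge_cam_cost, std_cam_cost, cloud_monthly):
    """Months until cumulative cloud fees exceed the edge camera's
    upfront price premium (toy model, ignores all other costs)."""
    premium = edge_cam_cost - std_cam_cost
    return math.ceil(premium / cloud_monthly)

print(round(break_even_months(4000, 500, 100) / 12, 1))  # 2.9 -> roughly 3 years
print(round(break_even_months(4000, 500, 50) / 12, 1))   # 5.8 -> roughly 6 years
```

Halving the monthly fee by scaling down idle hours doubles the break-even horizon, which is the elasticity argument in one line.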

As for compute and AI prices dropping over the next 5–10 years: I believe this will happen, but it will affect both edge and cloud pricing.

Flexibility

Another key strength of cloud vision AI is that you don’t need to know in advance which AI capabilities you need for each camera. You can just buy standard cameras of the same model and brand, install them, and decide later, or even dynamically, which one needs which capability. This way you can also double or triple the number of cameras that you install, increasing your coverage and field of view.

Cloud Vision AI provides dynamic allocation of AI resources to more cameras
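A minimal sketch of what such dynamic allocation could look like (the class and capability names below are made up, not any vendor’s API): a registry that attaches and detaches cloud AI capabilities per camera, long after the hardware is installed.

```python
class CloudAIAllocator:
    """Toy registry illustrating the idea of assigning cloud AI
    capabilities to plain cameras after installation."""

    def __init__(self):
        self.assignments = {}  # camera_id -> set of AI capabilities

    def attach(self, camera_id, capability):
        """Turn on a capability for a camera (e.g. at peak hours)."""
        self.assignments.setdefault(camera_id, set()).add(capability)

    def detach(self, camera_id, capability):
        """Turn it off again to save cost when not needed."""
        self.assignments.get(camera_id, set()).discard(capability)

alloc = CloudAIAllocator()
alloc.attach("cam-17", "stopped-vehicle")
alloc.attach("cam-17", "queue-length")
alloc.detach("cam-17", "queue-length")  # scale down outside peak hours
```

With edge AI the equivalent decision is baked into the hardware purchase; here it is a runtime call.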

Innovation

The diagram below shows the innovation over the last 8 years in object detection and classification within the YOLO family. The performance improvements are staggering.

Yolo Object Detection and Classification Timeline

Upgrading a cloud-based solution to use a new model takes a few days, as adding more CPU and memory is instantaneous; it’s also easier to run A/B testing in the cloud, i.e. run both models together and compare the results. However, upgrading edge devices can sometimes take years, or may even be impossible. Once a specific camera model is installed, the CPU, hardware, and memory are fixed, so the solution will always be limited by these constraints. Also, testing new firmware to make sure that it doesn’t impact other functionality, and comparing/benchmarking performance side by side, is a long process. To summarise: edge devices will always lag 3–4 years behind the cloud regarding innovation.
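As a rough sketch of what cloud-side A/B testing can start from (the two bounding boxes below are invented, not real model output), comparing two model versions on the same frame often begins with intersection-over-union:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Hypothetical detections from model A and model B on the same frame:
old_model_box = (100, 100, 200, 180)
new_model_box = (105, 102, 205, 182)
print(round(iou(old_model_box, new_model_box), 2))  # 0.86
```

In the cloud, both models can process the identical stream in parallel and log this agreement score per detection; on a fixed edge device you usually cannot even load the second model.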

A few brands opened their cameras to third-party apps (an app store model) to encourage innovation, but most of them realized the complexity of doing so and shut these programs down.

Budget

Public road operators face a challenge related to government funding, which often comes with a specific timeframe for utilization. For instance, if they have $500,000 remaining in their budget before the end of the financial year, it becomes easier to purchase hardware as they can place an order and have the goods delivered within weeks. However, adopting SaaS solutions with a pay-as-you-go approach presents a different scenario. The same budget would need to be spent over a period of 5 years, which may not align with government policies.

While some cloud providers offer the option to pay upfront for a 3-year period with a significant discount of over 60%, creating a win-win situation, SaaS providers would need to introduce a similar pricing model to leverage this approach effectively and cater to the requirements of public road operators.
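A back-of-the-envelope comparison makes the appeal of such a model obvious (the monthly fee and the 60% discount here are hypothetical, for illustration only):

```python
def upfront_vs_monthly(monthly_fee, months=36, discount=0.60):
    """Total cost of pay-as-you-go vs. prepaying the full term
    at a discount (hypothetical figures; real discounts vary)."""
    pay_as_you_go = monthly_fee * months
    prepaid = pay_as_you_go * (1 - discount)
    return pay_as_you_go, prepaid

print(upfront_vs_monthly(100))  # prepaying 3 years costs well under half
```

The upfront option turns a recurring fee into a one-off purchase within the financial year, which is exactly the shape of spend that government budgets allow.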

Performance

Edge devices normally use dedicated hardware to run AI algorithms (e.g. GPUs). These devices have unique hardware capabilities for AI-style computations; however, they are still limited and constrained by cost, power consumption (heat), and size. One way chip manufacturers fit AI algorithms onto processing chips within these constraints is to reduce the algorithms from floating-point to fixed-point (integer) arithmetic, but this can significantly reduce accuracy: in deep learning models, rounding 12.05 to 12 at the first layer is like a snowball that propagates and compounds through every neural network layer, until it reaches the final layer with a massive error.
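The snowball effect can be demonstrated with a toy forward pass. The per-layer scales below are deliberately exaggerated to make the drift visible in six layers; real quantization schemes are far more careful than naive rounding, but the compounding mechanism is the same.

```python
def forward(x, layers, quantize=False):
    """Toy forward pass: each 'layer' just scales its input.
    With quantize=True, the value is rounded to an integer after
    every layer, mimicking naive fixed-point hardware."""
    for scale in layers:
        x = x * scale
        if quantize:
            x = round(x)
    return x

layers = [1.9] * 6                             # exaggerated scales
exact = forward(12.05, layers)                 # ~566.9
fixed = forward(12.05, layers, quantize=True)  # 578
print(fixed - exact)                           # drift of ~11 after 6 layers
```

Each rounding step is off by at most 0.5, yet after only six multiplicative layers the accumulated drift is around 11, because every earlier error gets scaled by every later layer.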

Standardization

Standards are important in the CCTV world. Most of our clients don’t have just one brand of camera, but a collection of many types and models that they install over the years from a few brands. There are many reasons for this:

  • Most public organizations have to go through a bid process when purchasing new cameras. Each bid can result in a different brand.
  • Normal transition: Sometimes road operators just can’t replace all the cameras at once, and it takes years until they get the resources and time to finish the migration.

ONVIF is a global CCTV camera industry body that aims to standardize access to IP-based products. This means that every camera that follows ONVIF has the same interface to connect to it, get the video feed, set up parameters, etc. (There are still restrictions on Chinese brands like HikVision, and Dahua)

ONVIF is still working on Profile M to support analytics capabilities, but it’s still early days, and the number of AI use cases and options is growing too fast for the standard to catch up. In the meantime, every camera provider has a proprietary solution for publishing analytics: some use MQTT, some use WebSocket, and each uses a different data schema.

If you decide to use edge devices, you will need to gain skills across multiple devices, while AI SaaS providers connect to all the cameras to get the video in a standard way, and then apply the same AI across all of them.
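To illustrate the schema fragmentation, here is what a normalizer for vendor-specific analytics events might look like. All vendor names and field names below are invented; real payloads differ per brand, which is exactly the problem.

```python
def normalize_event(vendor, payload):
    """Map vendor-specific analytics payloads (hypothetical field
    names; real schemas vary per brand) into one common format."""
    if vendor == "vendor_a":          # e.g. delivered over MQTT
        return {"type": payload["eventType"],
                "camera": payload["camId"],
                "timestamp": payload["ts"]}
    if vendor == "vendor_b":          # e.g. delivered over WebSocket
        return {"type": payload["alarm"]["kind"],
                "camera": payload["source"],
                "timestamp": payload["time"]}
    raise ValueError(f"unknown vendor: {vendor}")

event = normalize_event("vendor_a",
                        {"eventType": "stopped_vehicle",
                         "camId": "cam-3", "ts": 1718000000})
```

Until Profile M matures, someone has to maintain a branch like this per brand; a cloud AI provider maintains it once for all its customers, while an operator running edge analytics maintains it alone.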

Network Bandwidth

Although this is becoming less of an issue, some operators still don’t have enough bandwidth to send all their video live to the cloud from their data centers (assuming wired CCTV rather than 5G). A low network bandwidth for an operator is 100 Mbps, while an acceptable bitrate for video that is good enough for AI is 200–500 Kbps. This means that operators can only send around 200–300 live video feeds to the cloud before they need to top up their internet connectivity.
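The capacity estimate above is simple division, sketched here:

```python
def max_live_feeds(uplink_mbps, per_feed_kbps):
    """Rough number of live feeds that fit in the uplink
    (no headroom reserved for other traffic)."""
    return (uplink_mbps * 1000) // per_feed_kbps

print(max_live_feeds(100, 500))  # 200 feeds at the high end (500 Kbps/feed)
print(max_live_feeds(100, 300))  # 333 feeds at a mid bitrate (300 Kbps/feed)
```

Leaving headroom for everything else on the link, 200–300 feeds is a realistic ceiling for a 100 Mbps connection.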

What about On-Premise?

As you can see, I deliberately ignored any comparison to data center deployments, as I think this approach is the worst of both worlds. You don’t have elasticity or agility, and you spend double upfront, both for cameras and for servers. You now have two sets of manuals, one for the cameras and the other for the AI product, and you need double the expertise.

Summary

There is growing momentum for moving more and more technology to the cloud and to SaaS models, a trend that I call “shift right” (in Wardley-map terms). The main reason for this is not pricing, but skills, expertise, and focus!

For all road operators and DOTs out there, you have a limited budget and long roads with many blind spots. Instead of spending a lot of money on sophisticated CCTV cameras, use your budget to install more low-cost cameras, and use a cloud-based AI only when and where you need it! You can even experiment and find the right balance between performance, coverage, and pricing by dynamically scaling the solution based on your needs. This is exactly what Cloud means: freedom and agility.

The final stage of the shift-right trend is to move the cameras themselves to the cloud. This may sound like a joke, but in 10–20 years we will have connected vehicles that can provide live video streams on demand and satellites that can broadcast HD feeds from anywhere on Earth. When this happens, we won’t need physical CCTV anymore. Instead, we will rely on a SaaS service that gives us a live video feed from any road, at any time.

And I’ll finish with a Jewish joke:

A rabbi was called upon to settle a dispute between two of his followers. The first man poured out his complaints to the rabbi, and when he finished, the rabbi said, “You’re right.” Then it was the second one’s turn. When he finished, the rabbi said, “You’re also right.” The rabbi’s wife, who had been listening to the conversation, said incredulously to her husband, “What do you mean, ‘You’re also right’? They can’t both be right!” The rabbi thought for a few moments, and then replied, “You know, my dear, you’re also right.”

--

Ran Katzir (Valerann CTO: Digital Twins for ITS)

Experienced CTO with a long track record of building digital and physical products. Likes to write about tech leadership and to bring clarity to it.