System Design for Autonomous Vehicle Systems — Part 2
In the second installment of this series, we delve into the communication dynamics between the components outlined in Part 1.
How the components will communicate with each other
TCP will serve as the communication protocol between the Vehicles and the Telemetry Gateway. This choice follows the common practice of using TCP for remote IoT devices that report over the internet to a central server. HTTP is not a good fit here: it layers request-response semantics and per-message overhead on top of the payload, while all we need is reliable delivery of a raw telemetry stream, which is exactly what TCP concentrates on. Given the constraints of limited bandwidth and high data volume, TCP is also more effective because it handles congestion control and flow control for us.
For data transfer between the Operational and Archive databases, as well as from the Operational database to the Data Warehouse, we will employ the ETL (Extract, Transform, and Load) process. This involves extracting data from one database, converting it into a different format, and then loading it into another database. This method aligns perfectly with our requirements for data movement between these databases.
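The three ETL steps can be sketched as plain functions. The record shapes below (an operational row with nested telemetry, a flattened warehouse row) are illustrative assumptions, not a schema from this series.

```javascript
// Sketch of Extract-Transform-Load as three small functions.
// Record shapes are assumptions for illustration.

// Extract: pull raw rows from the operational store (stubbed here;
// a real job would read from the Operational database).
function extract(operationalRows) {
  return operationalRows;
}

// Transform: reshape each operational row into the warehouse format,
// e.g. flattening nested telemetry and normalizing the timestamp.
function transform(rows) {
  return rows.map((row) => ({
    vehicleId: row.vehicleId,
    recordedAt: new Date(row.ts).toISOString(),
    speedKmh: row.telemetry.speed,
  }));
}

// Load: write the transformed rows to the target store (stubbed as an
// in-memory array; a real job would batch-insert into the warehouse).
function load(target, rows) {
  target.push(...rows);
  return target.length;
}
```

A real ETL job would run on a schedule and extract in batches, but the pipeline shape stays the same: `load(warehouse, transform(extract(source)))`.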
The communication protocol on the other components will be determined by their technology stack.
Now let’s design the system architecture for each of the other components starting with the Telemetry gateway.
Telemetry Gateway
Application type
For this component, the application type will be either a console application or a service. Because it relies on raw TCP communication and does not operate on a request-response model, it cannot be a web app, web API, mobile app, or desktop app. Either of the two remaining options would work, but we will opt for a service.
Technology stack
When determining the technology stack for this component, we need to consider various factors: the data load this component will handle, performance requirements for managing this load, the team’s existing knowledge and skills, and the operating system environment. It’s crucial that the chosen technology is compatible with SWV’s current environment. Therefore, we’ll consult with SWV to understand their existing technology landscape and team expertise.
According to SWV, their developers are proficient in Python, highly skilled in JavaScript, and exclusively use Linux servers. Given this context, Python is not a viable option for the Telemetry Gateway service due to its relatively lower performance; we need a high-performance language that runs on Linux and matches the team's skills. JavaScript, however, is a strong candidate: with NodeJS, we can write the service in JavaScript and still get the performance we need.
NodeJS operates smoothly on Linux, leverages the team's JavaScript expertise, and is fast. Therefore JavaScript on NodeJS, running on Linux servers, emerges as the optimal technology stack for this component.
Architecture
Our architecture will adopt a Service Interface and Pipeline model, as this gateway component does not involve a user interface, business logic, or data access requirements.
The Service Interface captures the data and promptly channels it into the pipeline. Being straightforward and minimalistic, this architecture is ideally suited to applications under high data load, which is exactly what this service needs.
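The two-part shape can be sketched as below. The in-memory queue is only a stand-in for the real Telemetry Pipeline, and the class names are our own; the point is that the Service Interface contains no business logic or data access, it just hands each reading straight to the pipeline.

```javascript
// Sketch of the Service Interface + Pipeline shape. The in-memory
// queue stands in for the real pipeline; names are illustrative.
class TelemetryPipeline {
  constructor() {
    this.queue = [];
  }
  publish(reading) {
    this.queue.push(reading);
  }
}

class ServiceInterface {
  constructor(pipeline) {
    this.pipeline = pipeline;
  }
  // Capture a raw reading and channel it into the pipeline immediately;
  // deliberately no validation or transformation happens at this layer.
  onReading(reading) {
    this.pipeline.publish(reading);
  }
}
```

Keeping the interface this thin is what lets the gateway absorb a high ingest rate: heavier work (validation, storage) happens downstream in the Telemetry Processor.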
Redundancy
To handle the heavy load on this service, and to keep it available if an instance crashes, we simply place a Load Balancer in front of the Telemetry Gateway service.
Notice there are 3 instances of the Telemetry Gateway: we want to handle the load by distributing it equally among all instances. Given the substantial load, starting with 3 instances is a practical approach, and we can scale out with more instances in the future.
Telemetry Pipeline
The Telemetry Pipeline acts as a queue, so we need some kind of queuing mechanism. The first step is to ask our client SWV whether there is an existing queue system we can reuse.
According to SWV, there is no queue mechanism at the moment. That leaves two options: develop our own queue system or use a third-party one. Developing our own would be a bad idea, so we will use a third-party queuing system instead. Fortunately, there are plenty of mature queuing systems to choose from.
What queue system are we going to use? As a Solution Architect, we need to find the best tool that works for this system.
We are going to use Apache Kafka.
There are several reasons for picking Apache Kafka; the two main ones are that it can handle massive amounts of data and that it is highly available.
Telemetry Processor
Application type
The application type for this component will be a console application or a service, because the component needs to maintain continuous communication with Apache Kafka: a persistently running process that regularly polls Kafka for the most recent data. This setup involves neither a request-response model nor the HTTP protocol. Between the two options, Console or Service, we would again go for Service.
Technology stack
For the processor's development stack, we will utilize NodeJS. NodeJS is highly effective for this kind of processing work, it is already used elsewhere in the system (so there is no compelling reason to introduce another technology), and it has great Kafka support through well-maintained client libraries.
Datastore
Regarding the Operational Database, MongoDB will be our choice. Recall from Part 1 that our telemetry data lacks a fixed structure, so we need a data store that is schema-less and offers rapid data retrieval. We also won't need complex queries: all we do is query the database for the latest information, with no joins or complex filtering. MongoDB fulfills these requirements effectively.
As for the Archive database, which needs to accommodate a large volume of data and doesn't require frequent access, our priority is not fast retrieval but cost-effectiveness, given the substantial amount of data to store. To meet these needs, we will opt for cloud storage, which provides the necessary scalability, cost efficiency, and capacity for huge data volumes.
Specifically, we will use Microsoft Azure cloud storage, as it aligns well with all our requirements for the Archive database.
Architecture
The Service Interface is responsible for getting the data from Apache Kafka. The Business Logic is responsible for the validation and processing of the data. The Data Access will store the processed data in the MongoDB Data Store.
Redundancy
To handle the heavy load on this service, and to keep it available if an instance crashes, we simply place a Load Balancer in front of the Telemetry Processor service.
Just like the Telemetry Gateway, we run 3 instances of the Telemetry Processor so the load is distributed equally among them; if one instance goes down, the others keep functioning.
In the final part, Part 3, we will discuss the Telemetry Viewer component, which presents real-time data in the user's browser. We will cover its application type and technology stack, how to shape its architecture, and how to ensure redundancy. Additionally, we will explore selecting a Business Intelligence tool for analyzing telemetry data and generating trends and reports. All these topics will be covered in Part 3.