Link aggregation via mobile carriers

Original article available at https://habrahabr.ru/company/mailru/blog/332950/

Introduction

It’s always nice to have a fast and reliable Internet connection while on the road, especially on long-haul train trips! If your route lies through a densely populated area, any modern smartphone with 4G support can easily pick up a signal, so you just keep on surfing the Internet as usual. Of course, it’s a completely different story once you leave those areas behind. There are only two ways to provide Internet connectivity for transportation:

  1. Aerial:
    Satellites, stratospheric balloons and other technologies that enable over-the-air data transfer.
  2. Ground-based facilities:
    Any means of passing a radio signal between base stations. It might be Wi-Fi, good old Radio Ethernet, some equipment used by mobile carriers or anything along those lines.

In our case, nobody in their right mind would allocate funds for building a custom data transfer infrastructure from scratch, so we were left with two options: a satellite channel and mobile carrier infrastructure.

Our options were narrowed down further when it became clear that our customer’s financial model didn’t allow for satellite usage. Therefore, this article details how to use mobile carriers to provide the transportation industry with the most reliable channel possible.

Link aggregation is all about combining the capacity of several physical channels. Suppose we have four channels with a capacity of 1 Mbit/s each: in theory, after aggregating them, we should get a single 4 Mbit/s channel. In our case, there are four mobile carriers with a maximum capacity of 70 Mbit/s each. Under ideal conditions, the aggregation could produce a single 280 Mbit/s channel.

You might argue that 280 Mbit/s isn’t enough for all the train passengers (around 700 people) and such capacity can’t be achieved outside of residential areas. Moreover, in places with no coverage whatsoever, no means of transportation will miraculously have Internet connectivity. And you’ll be absolutely right. That’s why we focus not on providing a comfortably fast Internet connection for everyone, but on creating a channel with at least some capacity in places where regular smartphones can’t pick up a signal.

This article is going to be about how we had to reinvent the wheel to bring the Internet to vehicles and railroad transport managed by an Indian railroad operator. I’ll also tell you how we tracked transport movements and monitored the quality of the data transfer channel at each checkpoint along the route, as well as how we stored the gathered data in a Tarantool cluster.

Link aggregation

The main goal of link aggregation is to ensure fault tolerance and/or to accumulate the capacity of several data transfer channels. Based on the aggregation type, network topology and the equipment used, concrete implementations may vary a lot.

Each aggregation type is worth a separate article, but we have a specific task at hand: to provide certain means of transportation with the most reliable and “broad” data transfer channel possible.

Mobile carrier infrastructure lays the foundation for the problem statement:

  1. Convergent environment. Data is transferred across the open Internet by various carriers. It means that EtherChannel and other hardware-supported protocols won’t work.
  2. Highly unstable data transfer channels while the vehicle is moving. For each channel, capacity and latency change quickly and unpredictably, since these parameters depend on the distance to the carrier’s base station, its workload, interference and so on.

From a technical standpoint, link aggregation is a very simple and well-documented procedure, so you can easily find information with any level of detail. In a nutshell, it looks like this:

On the client side:

  1. Network configuration. Independent data transfer channels are used to create L3 tunnels (one tunnel per channel) leading to a single aggregation point (any external server with NAT configured). An interface that serves as the default gateway for the whole network is also set up.
  2. Specialized software that monitors key quality metrics of channels and tunnels and then, based on these metrics, distributes the NAT traffic across these tunnels.

It’s necessary to analyze the channels of the various mobile carriers and to monitor their signal strength, communication type, base station workload and any errors in a carrier’s data transfer network (not to be confused with an L3 tunnel). The data flow is then distributed based on the gathered metrics. That’s why we decided we had to come up with a custom solution.

To be fair, there are more or less acceptable solutions that offer a working implementation of aggregation; for example, standard interface bonding in Linux. We could use any available tool, be it a VPN or an SSH tunnel, to create an L3 tunnel, manually configure routing and then add the virtual tunnel interfaces to the bond. Everything works as long as all the tunnels have the same capacity at any given point in time. The thing is, given this network topology, the only working aggregation mode is balance-rr, whereby each tunnel gets an equal share of the traffic, one after another. Suppose we have three channels with capacities of 100 Mbit/s, 100 Mbit/s and 1 Mbit/s. The resulting capacity will be only 3 Mbit/s: the minimum channel capacity multiplied by the number of channels. If each of the three channels has a capacity of 100 Mbit/s, the resulting capacity is 300 Mbit/s.
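The arithmetic above fits in a few lines of Python. This is a minimal sketch of the simplified model used in this section (the slowest link limits every other link), not a simulation of the Linux bonding driver; the function names are made up for illustration.

```python
def balance_rr_capacity(capacities_mbit):
    """Rough aggregate capacity when every link gets an equal share of traffic.

    Simplified model from the text: the slowest link limits all the others.
    """
    if not capacities_mbit:
        return 0.0
    return float(min(capacities_mbit)) * len(capacities_mbit)


def ideal_capacity(capacities_mbit):
    """Upper bound: the plain sum of all link capacities."""
    return float(sum(capacities_mbit))


for links in ([100, 100, 1], [100, 100, 100]):
    print(links, balance_rr_capacity(links), ideal_capacity(links))
```

For the 100/100/1 case the gap between the two numbers (3 versus 201 Mbit/s) is exactly why equal striping is a poor fit for channels whose capacity differs widely.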

Another solution is a great open-source project called vtrunkd, which, after quite some time, was dragged out of oblivion in 2016. It has almost everything we need, and we sent the developers an honest email saying that we’re ready to pay for the project to be enhanced with the ability to monitor quality metrics of communication services provided by mobile carriers and to distribute the traffic based on these metrics. Since we didn’t hear back from the developers, we decided to create our own solution from scratch.

Qedr Train

We began with monitoring the quality metrics of mobile carriers’ services (signal strength, network type, network errors and so on). When choosing modems, the main criterion was how easy it is to gather those metrics from them. Eventually we chose the SIM7100 produced by Simcom. All the necessary metrics can easily be collected via its serial port. This modem also provides quite accurate GPS/GLONASS coordinates. Apart from the quality metrics above, we needed to monitor computer metrics (CPU and SSD temperature, available RAM and disk space, S.M.A.R.T. parameters) and, separately, network interface statistics (send/receive errors, send queue length, bytes transferred). Since modem throughput is severely limited, the monitoring data being transferred has to be as compact as possible. For that reason, and to simplify collecting these metrics on Linux via /proc/sys, we developed the whole monitoring module from scratch as well.
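To give an idea of what collecting network interface statistics on Linux can look like, here is a minimal sketch that parses text in the standard /proc/net/dev layout. It is an illustration only, not our monitoring module: the choice of /proc/net/dev, the InterfaceStats class and the sample counters are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class InterfaceStats:
    name: str
    rx_bytes: int
    rx_errors: int
    tx_bytes: int
    tx_errors: int


def parse_proc_net_dev(text):
    """Parse /proc/net/dev-style text into {interface name: InterfaceStats}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:  # 8 receive counters followed by 8 transmit counters
            continue
        name = name.strip()
        stats[name] = InterfaceStats(
            name=name,
            rx_bytes=int(fields[0]),
            rx_errors=int(fields[2]),
            tx_bytes=int(fields[8]),
            tx_errors=int(fields[10]),
        )
    return stats


# Made-up sample in the /proc/net/dev layout (not real measurements).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  ppp0: 5000000    4000    3    0    0     0          0         0   250000    3000    1    0    0     0       0          0
"""

for iface in parse_proc_net_dev(SAMPLE).values():
    print(iface)
```

On a real device the same function can be fed the contents of the file itself, for example `parse_proc_net_dev(open("/proc/net/dev").read())`.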

With the monitoring subsystem out of the way, we started working on link aggregation proper. Unfortunately, the detailed algorithm is a trade secret, so I can’t make it publicly available. What I can do, though, is give you a high-level overview of what happens in the aggregation module installed on board the vehicle:

  1. At startup, it reads a JSON configuration file that contains settings for virtual interfaces. Addresses of aggregation servers are dynamically obtained from the central system. This ensures server-side load balancing and a relatively seamless handover in case any aggregation server is down.
  2. Based on the configuration file it read, the module creates L3 tunnels leading to aggregation servers and configures routing. Optionally, tunnels may have data compression and encryption.
  3. Based on the data received from the monitoring module, it assigns each tunnel a “weight”. The greater it is, the more traffic will pass through this tunnel. All the weights are updated once per second.
  4. Device statistics, its geoposition, and business metrics are accumulated for 10 minutes and then packaged into a transaction. All transactions are stored in a local Tarantool database and sent to the central database via Tarantool’s built-in replication mechanism. The developers and active community of this DBMS deserve a separate kudos.
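Since the real algorithm can’t be published, the following is only a generic illustration of step 3: turning per-tunnel scores into weights and spreading packets proportionally to them. It uses the well-known smooth weighted round-robin technique, which is not necessarily what Qedr Train does; the score values, tunnel names and class names are hypothetical.

```python
from collections import Counter


def normalize_weights(scores):
    """Turn non-negative quality scores into weights that sum to 1."""
    total = sum(max(s, 0.0) for s in scores.values())
    if total == 0:
        # No usable scores: fall back to equal weights.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: max(s, 0.0) / total for name, s in scores.items()}


class SmoothWeightedScheduler:
    """Smooth weighted round-robin: picks tunnels in proportion to their weights."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.current = {name: 0.0 for name in self.weights}

    def update(self, weights):
        """Replace the weights, e.g. once per second after fresh metrics arrive."""
        self.weights = dict(weights)
        self.current = {name: self.current.get(name, 0.0) for name in self.weights}

    def next_tunnel(self):
        """Return the tunnel that should carry the next packet."""
        total = sum(self.weights.values())
        for name, weight in self.weights.items():
            self.current[name] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best


# Hypothetical scores: tun0 is three times "better" than tun1.
scheduler = SmoothWeightedScheduler({"tun0": 3, "tun1": 1})
picks = Counter(scheduler.next_tunnel() for _ in range(8))
print(picks)
```

A production scheduler would also have to deal with packet reordering and with tunnels that disappear mid-flight, which this sketch ignores.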

On the server side, link aggregation is much more straightforward. At startup, the aggregation module queries the configuration server, receives a JSON configuration file and, based on it, creates L3 interfaces. Simple as that.
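As an illustration of the "read a JSON configuration, then create interfaces" step, here is a minimal sketch that only covers parsing. The schema (a "tunnels" list with name, local_ip, remote_ip, compression and encryption fields) is invented for this example and is not the real Qedr Train format; the addresses come from the ranges reserved for documentation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelConfig:
    name: str
    local_ip: str
    remote_ip: str
    compression: bool = False
    encryption: bool = False


def load_tunnels(raw_json):
    """Parse a (hypothetical) JSON configuration into TunnelConfig objects."""
    doc = json.loads(raw_json)
    tunnels = []
    for item in doc.get("tunnels", []):
        tunnels.append(
            TunnelConfig(
                name=item["name"],
                local_ip=item["local_ip"],
                remote_ip=item["remote_ip"],
                compression=bool(item.get("compression", False)),
                encryption=bool(item.get("encryption", False)),
            )
        )
    return tunnels


# Hypothetical configuration as it might arrive from a configuration server.
EXAMPLE_CONFIG = """
{
  "tunnels": [
    {"name": "tun0", "local_ip": "192.0.2.1", "remote_ip": "198.51.100.10", "encryption": true},
    {"name": "tun1", "local_ip": "192.0.2.2", "remote_ip": "198.51.100.10", "compression": true}
  ]
}
"""

for tunnel in load_tunnels(EXAMPLE_CONFIG):
    print(tunnel)
```

Creating the actual interfaces and routes is system-specific and is deliberately left out.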

A system for metric collection and visualization deserves a separate mention. It consists of two large parts: one monitors systems that maintain client-side and server-side hardware, and the other monitors business metrics of the project.

Our technology stack is pretty standard: Grafana and OpenStreetMap for visualization, and Go and Tarantool for the client-side and server-side application servers.

Tarantool

For our projects, we’ve tried a number of DBMSs over the years. It all started in 2009 with PostgreSQL, which we used to store geospatial data generated by on-board devices installed in specialized vehicles. The PostGIS module was doing a good job. As time went on, we needed ever higher performance when processing schemaless data. We switched to MongoDB and stuck with it for a while, from version 2.4 to version 3.2. There were a couple of times when we were unable to recover the data after a power outage (we had insufficient resources to fully duplicate all the data). Then we turned to ArangoDB. Since our backend was written in JavaScript at the time, the technology stack was pretty sweet. Within two years, however, this database became a thing of the past for us as well, since we couldn’t control RAM consumption when dealing with massive amounts of data. For this project, Tarantool caught our attention. A few deciding factors cemented our choice:

  1. Built-in transaction mechanism.
  2. Ability to store non-relational data.
  3. Both in-memory and on-disk (Vinyl) storage engines.
  4. Master-slave replication.
  5. High performance both on powerful datacenter hardware and on small devices installed in vehicles.

At first glance, Tarantool is all-around perfect, except that it uses only a single CPU core. To make sure this wasn’t a showstopper for our project, we ran a series of tests on the target hardware and found that Tarantool performed very well.

We have three major data profiles: financial operations, time series (system logs) and geospatial data.

Financial operations are money flow data per device. Each device has at least three SIM cards from different mobile carriers, so it’s necessary to keep track of the current account balance for each carrier and, in order to avoid service suspension, to know in advance when exactly to refill a specific account associated with a specific device.
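A very rough way to "know in advance" when an account needs a refill is to extrapolate its recent spending. The sketch below assumes a constant daily spend, which is a simplification of my own for illustration; the field names, device and carrier labels and the minimum balance threshold are all hypothetical.

```python
def days_until_refill(balance, daily_spend, min_balance=0.0):
    """Days left before the balance drops to min_balance at a constant daily spend.

    Returns None if nothing is being spent, so no date can be estimated.
    """
    if daily_spend <= 0:
        return None
    remaining = balance - min_balance
    if remaining <= 0:
        return 0.0
    return remaining / daily_spend


def accounts_to_refill(accounts, horizon_days):
    """List (device, carrier, days_left) for accounts due within the horizon."""
    due = []
    for acc in accounts:
        days = days_until_refill(
            acc["balance"], acc["daily_spend"], acc.get("min_balance", 0.0)
        )
        if days is not None and days <= horizon_days:
            due.append((acc["device"], acc["carrier"], days))
    return sorted(due, key=lambda item: item[2])


# Made-up accounts: one SIM card per carrier per device.
accounts = [
    {"device": "dev1", "carrier": "A", "balance": 300, "daily_spend": 30, "min_balance": 60},
    {"device": "dev1", "carrier": "B", "balance": 1000, "daily_spend": 10},
    {"device": "dev2", "carrier": "B", "balance": 50, "daily_spend": 20, "min_balance": 60},
]
print(accounts_to_refill(accounts, horizon_days=10))
```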

Time series are just monitoring logs for all the subsystems, including an aggregate bandwidth log for each device installed in each vehicle. They allow us to see, for each mobile carrier, what channel was used at each point along the route. This data comes in handy when analyzing network coverage, which, in turn, is used for handover and for distributing traffic among the mobile carriers’ channels. If we know beforehand that at a given point a certain carrier has the worst service, we simply omit this channel during the aggregation.
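The idea of skipping a carrier that is known to perform worst at a given point can be sketched as follows. This is an assumption-heavy illustration: it keys the coverage map by distance along the route in fixed-size segments and averages the logged bandwidth, whereas the real system may index and score coverage quite differently. Carrier names and numbers are made up.

```python
from collections import defaultdict


def build_coverage_map(samples, segment_km=1.0):
    """Average logged bandwidth per (route segment, carrier).

    samples: iterable of (route_km, carrier, mbit_per_s) tuples.
    Returns {segment index: {carrier: average Mbit/s}}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for route_km, carrier, mbit in samples:
        key = (int(route_km // segment_km), carrier)
        sums[key][0] += mbit
        sums[key][1] += 1
    coverage = defaultdict(dict)
    for (segment, carrier), (total, count) in sums.items():
        coverage[segment][carrier] = total / count
    return dict(coverage)


def carriers_to_use(coverage, segment, all_carriers):
    """Drop the historically worst carrier for this segment, if we know one."""
    known = coverage.get(segment, {})
    if len(known) < 2:
        return list(all_carriers)  # not enough history to rank carriers
    worst = min(known, key=known.get)
    return [c for c in all_carriers if c != worst]


# Made-up log entries: (kilometres from the start of the route, carrier, Mbit/s).
samples = [(0.2, "A", 10), (0.7, "A", 20), (0.5, "B", 1), (0.4, "C", 5), (1.5, "A", 3)]
coverage = build_coverage_map(samples)
print(carriers_to_use(coverage, 0, ["A", "B", "C"]))
```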

Geospatial data is generated by tracking a vehicle along its path. A GPS sensor built into the modem is queried every second and gives us the coordinates and altitude. Every ten minutes the collected data is packaged and sent to the datacenter. At the customer’s request, this data must be stored indefinitely, so, given the ever-expanding fleet, it’s essential to plan the whole infrastructure carefully and well in advance. Luckily, Tarantool’s implementation of sharding is pretty straightforward, so we didn’t have to rack our brains over scaling to accommodate even the fastest data growth rates.
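The "one sample per second, one package every ten minutes" pattern can be sketched like this. It covers only the in-memory batching; storing the package in Tarantool and replicating it are out of scope here. The TrackBatcher class and the sample coordinates are hypothetical.

```python
class TrackBatcher:
    """Collects per-second GPS samples and emits one batch per time window."""

    def __init__(self, window_seconds=600):
        self.window_seconds = window_seconds
        self._window_start = None
        self._points = []

    def add(self, timestamp, lat, lon, alt):
        """Add a sample; return the finished batch when a window closes, else None."""
        batch = None
        if self._window_start is None:
            self._window_start = timestamp
        elif timestamp - self._window_start >= self.window_seconds:
            # The window is over: hand out what we have and start a new one.
            batch = {"start": self._window_start, "points": self._points}
            self._window_start = timestamp
            self._points = []
        self._points.append((timestamp, lat, lon, alt))
        return batch


# Simulate 20 minutes of one-second samples with made-up coordinates.
batcher = TrackBatcher(window_seconds=600)
batches = []
for second in range(1201):
    finished = batcher.add(second, 55.0, 37.0, 150.0)
    if finished is not None:
        batches.append(finished)
print(len(batches), [len(b["points"]) for b in batches])
```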

We used the on-disk storage engine called Vinyl for all three data profiles. Financial operations are not that numerous, so it doesn’t make much sense to keep them in RAM. Logs, of course, don’t need to be stored in memory, and neither does geospatial data until the need to analyze it arises. When it’s time to analyze the gathered geospatial data, depending on the speed requirements, it might make sense to keep pre-aggregated data in RAM and analyze that instead of the raw data. But the customer has yet to finalize their requirements.

It’s worth mentioning that Tarantool has solved our problem efficiently. It demonstrated equally high performance both on small on-board devices with limited resources and in the datacenter with ten shards.

Hardware used in devices installed in vehicles:

CPU: ARMv8, 4 cores, 1.1 GHz
RAM: 2 GB
Storage: 32 GB SSD

Hardware used in datacenter servers:

CPU: x64 Intel Core i7, 8 cores, 3.2 GHz
RAM: 32 GB
Storage: 2 × 512 GB (software RAID 0)

As I mentioned, we used Tarantool’s built-in mechanism to replicate all the data from vehicles to the datacenter. Data sharding across multiple instances is also something available out of the box. Details can be found on the official site, so I guess I don’t need to give any additional information here.

Currently, our system serves 866 vehicles, and the customer plans to bring this number up to 8,000.

You might wonder why I haven’t mentioned the site of the company that I work for. Well, the reason is simple: we’re in the process of moving to a new hosting provider, so for now we’re trying to avoid any unexpected spikes in the number of visitors.

Thanks for taking the time to read this! If you have any questions, drop me a line at a.rodin@r-t-s.ru and I’ll try to get back to you as soon as I can. My name is Alexander Rodin. I’m CIO and the principal developer (it doesn’t happen in startups only, and I just love programming!) of the Qedr Train project at Regional Telematic Systems.