Atlas Device Sync: Defining and measuring performance

Tyler Kaye
Published in Realm Blog
9 min read · May 18, 2023

Atlas Device Sync (ADS) is a platform for offline-first data synchronization between mobile/edge devices and MongoDB. It is built from the ground up to provide:

  1. Real-time synchronization from device to cloud and back
  2. Seamless handling of network connectivity issues
  3. Built-in conflict resolution
  4. An intuitive developer experience across all of the Realm SDKs

[Figure: ADS Overview]

Read the following article for more information about ADS and how it works.

Overview:

Testing the performance of Device Sync has typically been a daunting task. On top of the fact that it is difficult to define what performance means exactly, there are a lot of factors that affect that performance.

The aim of this article is to provide:

  1. A common vocabulary for defining the performance of ADS
  2. A summary of the factors that affect performance
  3. A repository with a CLI experience for testing various workloads
  4. The results of some initial tests

Defining the Performance of ADS

Latency vs. Throughput

When speaking about performance, the first thing to highlight is the difference between throughput and latency.

  • Latency: The time it takes for changes to propagate through the system
  • Throughput: The amount of data that can propagate through the system

ADS strives to be real-time; however, many of its design decisions prioritize throughput over latency. This is in line with how most of our customers want to use sync, and it ensures that ADS can keep up with a high write load.
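
To make the distinction concrete, here is a minimal sketch that computes both metrics from per-change timestamps. The `ChangeTiming` type, the function names, and the sample numbers are made up for illustration; they are not measurements and not part of ADS or the Realm SDKs.

```typescript
// One synced change, with hypothetical timestamps in milliseconds.
interface ChangeTiming {
  sentAtMs: number;     // when the writer committed the change
  receivedAtMs: number; // when the reader observed the change
}

// Latency: the average time a single change takes to propagate.
function averageLatencyMs(timings: ChangeTiming[]): number {
  if (timings.length === 0) throw new Error("at least one timing is required");
  const total = timings.reduce((sum, t) => sum + (t.receivedAtMs - t.sentAtMs), 0);
  return total / timings.length;
}

// Throughput: changes delivered per second over the whole observation window.
function throughputPerSecond(timings: ChangeTiming[]): number {
  if (timings.length === 0) throw new Error("at least one timing is required");
  const windowStart = Math.min(...timings.map((t) => t.sentAtMs));
  const windowEnd = Math.max(...timings.map((t) => t.receivedAtMs));
  if (windowEnd <= windowStart) throw new Error("observation window must be positive");
  return (timings.length * 1000) / (windowEnd - windowStart);
}

// Made-up sample 1: four changes sent together, all arriving 400 ms later.
const batchedTimings: ChangeTiming[] = [
  { sentAtMs: 0, receivedAtMs: 400 },
  { sentAtMs: 0, receivedAtMs: 400 },
  { sentAtMs: 0, receivedAtMs: 400 },
  { sentAtMs: 0, receivedAtMs: 400 },
];

// Made-up sample 2: four changes sent one after another, each taking 100 ms.
const sequentialTimings: ChangeTiming[] = [
  { sentAtMs: 0, receivedAtMs: 100 },
  { sentAtMs: 100, receivedAtMs: 200 },
  { sentAtMs: 200, receivedAtMs: 300 },
  { sentAtMs: 300, receivedAtMs: 400 },
];

console.log(averageLatencyMs(batchedTimings), throughputPerSecond(batchedTimings));       // 400 10
console.log(averageLatencyMs(sequentialTimings), throughputPerSecond(sequentialTimings)); // 100 10
```

Both samples deliver the same throughput, but the per-change latency differs by a factor of four, which is why the two numbers need to be measured and reported separately.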

Types of Performance

Performance for ADS is a hard thing to define, let alone measure.

The first step in testing the performance of ADS is to home in on exactly which aspect you are concerned about. This can include:

  1. Realm To Realm: Change propagating from one device to another
  2. Realm To Mongo: Change from a device propagating to MongoDB
  3. Mongo To Realm: MongoDB write propagating to device
  4. Bootstrap: Subscribing to a query and receiving the initial state
  5. Initial Sync: Turning on sync for the first time and populating the necessary metadata
  6. Load: How many clients are able to connect to ADS at the same time and (a) bootstrap, (b) upload writes, and (c) download changes

Each of these is a different workload with different patterns and performance characteristics. The last section briefly describes each of them and how it is tested.

Factors that affect performance

  1. Cluster Tier: The size of your MongoDB Atlas cluster will very much affect the performance of ADS. Please use at least an M10 (ideally an M30) to run performance tests on ADS. This has to do both with the performance of the cluster as well as various thresholds ADS uses based on cluster tier.
  2. Cluster Region vs. App Services Deployment Region: ADS performance is greatly increased by colocating the Atlas Cluster with the App Services Region (Local App). This is much more important than putting the App Services region close to the end-user because the number of roundtrips made to the database far outweighs that of the data transfer to the device.
  3. Size and Shape of Data: The size of the average document, the number of documents, and the schemas of the documents (lists, links, embedded objects) all play a part in the performance of ADS. Be sure to test with a similar data model as you will use in your application.
  4. Subscriptions and Permissions: ADS is like any database: it is only as good as its inputs allow it to be. Defining subscriptions and permissions that use indexes efficiently (one on each of the queryable fields) will greatly increase the performance of some operations.
  5. Conflict Resolution: ADS is designed to handle conflicts but it is optimized to not have to. Avoiding document-level and field-level conflicts will lead to hitting many of the fast paths in the code that bypass the complexities of dealing with conflicts.

Measuring the Performance

Repository: https://github.com/tkaye407/DeviceSyncPerfTesting

This repository contains a CLI-like interface that lets you run performance tests on an ADS application. Follow the README for steps on how to set things up properly. Once you have an Atlas cluster and App Service application with sync enabled, you can interact with the CLI:

▶ npm start

Ensure that Flexible Sync is enabled and developer mode is turned on.
Enter your App Services App ID: *************
Enter developer mode database name: PerfDB
Initializing connection and syncing up schemas and queryable fields. Might take a few seconds

*************************
Select a number:
(1) Realm to Realm
(2) Mongo to Realm
(3) Realm to Mongo
(4) Realm Bootstrap
(5) Exit
*************************

Select number for test you want to run: 1
Number of tests to run: 5
Number objects to insert per test: 100
Number objects to insert per write: 20
Iteration 0 completed in 1300 ms
Iteration 1 completed in 1421 ms
Iteration 2 completed in 1511 ms
Iteration 3 completed in 1245 ms
Iteration 4 completed in 1198 ms
Results:
{ test: realm-to-realm, min: 1198, max: 1511, avg: 1335 }
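
The summary line is simple arithmetic over the per-iteration timings. The repository's own implementation is not shown here, so the following is only a sketch of how such a summary could be computed; `RunSummary` and `summarizeRuns` are hypothetical names.

```typescript
// Shape of the summary line; the field names mirror the output shown above.
interface RunSummary {
  test: string;
  min: number;
  max: number;
  avg: number;
}

// Reduce a list of iteration durations (in ms) to min, max, and rounded average.
function summarizeRuns(test: string, durationsMs: number[]): RunSummary {
  if (durationsMs.length === 0) throw new Error("at least one iteration is required");
  const total = durationsMs.reduce((sum, ms) => sum + ms, 0);
  return {
    test,
    min: Math.min(...durationsMs),
    max: Math.max(...durationsMs),
    avg: Math.round(total / durationsMs.length),
  };
}

// The five iteration times from the sample run above.
const iterationTimesMs = [1300, 1421, 1511, 1245, 1198];
console.log(summarizeRuns("realm-to-realm", iterationTimesMs));
```

With only five iterations the average is sensitive to a single slow run, so it can be worth increasing the number of tests before comparing configurations.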

This repository is intended to serve as a jumping off point to begin testing the performance of Device Sync. You can run it without any changes, or you can update the code to be more in line with what you want to test. This includes things like:

  1. Update the schema being used to more closely resemble your data models. This can be found in src/schemas.tsx
  2. Update the subscription being used (see src/schemas.tsx)
  3. Update any of the test workloads to more closely resemble your workload (types of updates, etc)
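
As an illustration of the first point, a schema that exercises lists and embedded objects might look like the sketch below. It uses the plain-object schema form accepted by the Realm JavaScript SDK, but every model and field name here is hypothetical and does not come from the repository's src/schemas.tsx.

```typescript
// Hypothetical embedded object: stored inside its parent document.
const LineItemSchema = {
  name: "LineItem",
  embedded: true,
  properties: {
    sku: "string",
    quantity: "int",
  },
};

// Hypothetical top-level object with a primary key, a list of primitives,
// and a list of embedded objects.
const OrderSchema = {
  name: "Order",
  primaryKey: "_id",
  properties: {
    _id: "objectId",
    customerId: "string", // a candidate queryable field for subscriptions
    createdAt: "date",
    tags: "string[]",     // list of primitives
    items: "LineItem[]",  // list of embedded objects
  },
};

// The list you would pass to the Realm configuration in place of the defaults.
const demoSchemas = [OrderSchema, LineItemSchema];
console.log(demoSchemas.map((s) => s.name).join(", ")); // Order, LineItem
```

If your production documents are much larger or more deeply nested than the default test schema, results from the unmodified tests are unlikely to transfer.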

Individual Workloads and Results

Each section below defines what each workload is, why you should care about it, and what the results of the base set of tests are. These tests were run against an M30 cluster in us-east-1 and each test was run 5 times and averaged.

Realm To Realm

Definition: How long it takes for a change made by one Realm client to reach another Realm client.

Why Care: This is important for collaborative apps in which you want to measure how quickly changes are sent between devices.

How To Test It:

  1. Open 2 Realms: a reader and a writer
  2. Initialize a listener on the reader realm for the signal document (the last one to be inserted)
  3. Insert N objects into the writer realm such that the last document is the signal document
  4. When the progress listener resolves it means the signal document has made its way from the writer realm to the reader realm
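
The signal-document pattern in the steps above can be sketched independently of any SDK. In the sketch below, `SignalWriter`, `SignalReader`, and `FakeSyncLink` are hypothetical stand-ins for the two Realm clients (they are not Realm SDK types), and the simulated delay is arbitrary.

```typescript
// Hypothetical stand-ins for the writer and reader clients.
interface SignalWriter<T> {
  write(batch: T[]): Promise<void>;
}
interface SignalReader<T> {
  // Resolves once a document satisfying `isSignal` is visible to the reader.
  waitFor(isSignal: (doc: T) => boolean): Promise<void>;
}

// Time how long it takes for the signal document to reach the reader.
async function timeSignalPropagation<T>(
  writer: SignalWriter<T>,
  reader: SignalReader<T>,
  docs: T[],
  isSignal: (doc: T) => boolean,
  nowMs: () => number = () => Date.now(),
): Promise<number> {
  // Register the listener before writing so the signal cannot be missed.
  const arrived = reader.waitFor(isSignal);
  const startedAt = nowMs();
  await writer.write(docs);
  await arrived;
  return nowMs() - startedAt;
}

// In-memory fake that delivers written batches to waiting readers after a delay.
class FakeSyncLink<T> implements SignalWriter<T>, SignalReader<T> {
  private waiters: Array<{ isSignal: (doc: T) => boolean; resolve: () => void }> = [];
  private delayMs: number;

  constructor(delayMs: number) {
    this.delayMs = delayMs;
  }

  async write(batch: T[]): Promise<void> {
    // Simulate sync by delivering the batch later instead of immediately.
    setTimeout(() => this.deliver(batch), this.delayMs);
  }

  waitFor(isSignal: (doc: T) => boolean): Promise<void> {
    return new Promise<void>((resolve) => {
      this.waiters.push({ isSignal, resolve: () => resolve() });
    });
  }

  private deliver(batch: T[]): void {
    // Resolve and drop every waiter whose signal document is in this batch.
    this.waiters = this.waiters.filter((w) => {
      if (batch.some((doc) => w.isSignal(doc))) {
        w.resolve();
        return false;
      }
      return true;
    });
  }
}

interface DemoDoc {
  _id: number;
}

const demoLink = new FakeSyncLink<DemoDoc>(25);
const demoDocs: DemoDoc[] = [{ _id: 1 }, { _id: 2 }, { _id: 3 }]; // last one is the signal
timeSignalPropagation(demoLink, demoLink, demoDocs, (doc) => doc._id === 3).then((ms) =>
  console.log(`signal observed after ${ms} ms`),
);
```

Registering the wait before the write matters: if the listener were attached afterwards, a fast sync could deliver the signal document before anything was watching for it.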

Results:

The chart above shows how effective batching is when writing to Realm. There is a point at which batching has little to no effect, but there is a large gain when inserting several hundred objects at a time compared to one at a time. Additionally, increasing the number of documents by a factor of N generally increases the execution time by a factor much smaller than N.
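
Batching here corresponds to the CLI's "objects to insert per write" setting. A minimal sketch of splitting N objects into batches follows; `chunkIntoBatches` is a hypothetical helper, and the assumption is that each resulting batch is then inserted inside a single write transaction.

```typescript
// Split `items` into consecutive batches of at most `batchSize` elements.
function chunkIntoBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// 100 objects with 20 objects per write, as in the sample CLI run.
const demoObjects = Array.from({ length: 100 }, (_, i) => ({ _id: i }));
const demoBatches = chunkIntoBatches(demoObjects, 20);

// Assumption: each batch would be written in one write transaction.
console.log(demoBatches.length);    // 5
console.log(demoBatches[0].length); // 20
```

Fewer, larger write transactions mean fewer round trips through the sync pipeline, which is consistent with the gains described above.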

Mongo To Realm

Definition: How long it takes for a change made in Atlas to be sent to Realm client devices.

Why Care: If you have a lot of administrative changes or an ETL job pushing data into MongoDB, this defines how long it takes for that data to be propagated to the devices.

How To Test It:

  1. Open a Realm and create a listener that will resolve when it receives a signal document
  2. Insert N documents into MongoDB Atlas directly using a driver (in this case the RemoteMongoClient built into the SDK). The last document inserted will be the signal document.
  3. When the listener resolves we know that the signal document has reached the Realm

Results:

The biggest thing to point out here is how much of a difference batching in the MongoDB driver makes!

Realm to Mongo

Definition: How long it takes for a change made by one Realm client to make it to MongoDB Atlas.

Why Care: If you have other consumers of the application using web clients, or perhaps a trigger executing custom function logic, this is how quickly the write becomes visible in Atlas.

How To Test It:

  1. Open a MongoDB Change Stream on the collection (using the watch API) to resolve when the signal document is inserted
  2. Insert N documents to the synced Realm such that the last document is the signal document
  3. When the change stream resolves that means all of the documents have made it to Atlas

Results:

An important caveat is that this test measures not only how long it takes for a change to make it to MongoDB but also how long it takes for the change stream to observe the change and send it back to us. That adds some latency, but we expect it to be constant across all workloads.

Realm Bootstrap

Definition: How long it takes for a subscription to be fully loaded.

Why Care: This might represent how long it takes when the application is first opened to get the relevant data from ADS.

How To Test It:

  1. Add some initial data to seed the cluster
  2. Remove the subscription from the client and wait for the subscription change to complete
  3. Time how long it takes to re-add a subscription and download all relevant data from the server

Results:

The bootstrap is very much a function of how much data is in the cluster, how much is trying to be synced down, and what the subscription looks like.

Initial Sync

Definition: How long it takes, once you enable sync, to read all documents in Atlas and send them to ADS.

Why Care: In some disaster recovery scenarios it is necessary to terminate and re-enable sync to rebuild the metadata. How long this takes can be important for customers looking to avoid downtime.

How To Test It:

There is no script for this in the repository provided in this article, but an easy way to test this yourself is to:

  1. Seed your Atlas cluster with your data
  2. Define schemas for the collections and enable sync
  3. Time how long it takes for the initial sync to complete. This will be in a banner at the top of the page

Results:

This is very much dependent on the number of documents, the number of collections, and the average size of each document. Some sample results we have seen for larger customers are:

  • 214,804,945 smaller documents in 12 minutes and 52 seconds
  • 1,741,170 medium-sized documents in 10 minutes and 8 seconds
  • 321,890,108 larger documents in 1 hour and 21 minutes
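
To compare these samples, it can help to convert each one into a rate. The sketch below does only that arithmetic on the three figures listed above; the `InitialSyncSample` type and `documentsPerSecond` function are hypothetical names, and the resulting rates should not be read as guarantees for other data sets.

```typescript
// One initial-sync observation: document count and elapsed wall-clock time.
interface InitialSyncSample {
  label: string;
  documents: number;
  hours: number;
  minutes: number;
  seconds: number;
}

// Convert a sample into an average rate of documents per second.
function documentsPerSecond(sample: InitialSyncSample): number {
  const totalSeconds = sample.hours * 3600 + sample.minutes * 60 + sample.seconds;
  if (totalSeconds <= 0) throw new Error("elapsed time must be positive");
  return sample.documents / totalSeconds;
}

// The three samples listed above.
const initialSyncSamples: InitialSyncSample[] = [
  { label: "smaller", documents: 214804945, hours: 0, minutes: 12, seconds: 52 },
  { label: "medium-sized", documents: 1741170, hours: 0, minutes: 10, seconds: 8 },
  { label: "larger", documents: 321890108, hours: 1, minutes: 21, seconds: 0 },
];

for (const sample of initialSyncSamples) {
  const rate = Math.round(documentsPerSecond(sample));
  console.log(`${sample.label}: about ${rate} documents/second`);
}
```

The rates differ widely between samples, which reinforces that document size and collection layout matter at least as much as the raw document count.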

Load Testing

Definition: How many clients can connect to ADS at the same time and issue uploads.

Why Care: This test is intended to be used to load test the server with many clients. See the README for more details.

How To Test It:

This test is in some ways the simplest but also the most difficult to run. Running more than a few instances of it on a single machine will simply lead to the conclusion that the bottleneck is on the device and not the server. In production, we have applications with hundreds of thousands of daily active users connecting to ADS and we will scale our backend appropriately to handle the load.

If you do want to run a test like this, it is important to distribute the load across many machines. In this case, it is best to look into external tools such as JMeter and Kubernetes to test this properly.

The repository comes with a bash script that runs the test automatically, without the CLI interface:

sh load.sh  client-app-id  5

Conclusion

ADS is a data synchronization platform. Like any data service, there are performance tradeoffs you can make in designing your application and workload. Because ADS is more of a black box than traditional systems, its performance can be more difficult to evaluate. Hopefully, this article pulls back the curtain a bit and empowers you to run your own tests.

Tyler Kaye
Lead Engineer @ MongoDB working on Atlas Device Sync with Realm